Is this due to the insert command not having metrics? It's a problem we
should fix.


On Mon, Nov 27, 2017 at 10:45 AM, Jason White <jason.wh...@shopify.com>
wrote:

> I'd like to use the SparkListenerInterface to listen for some metrics for
> monitoring/logging/metadata purposes. The first ones I'm interested in
> hooking into are recordsWritten and bytesWritten as a measure of
> throughput.
> I'm using PySpark to write Parquet files from DataFrames.
>
> I'm able to extract a rich set of metrics this way, but for some reason the
> two that I want are always 0. This mirrors what I see in the Spark
> Application Master: the "# records written" field is always missing.
>
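For anyone wiring this up: the listener side itself is straightforward once
the metrics are populated. Here's a minimal sketch in Scala (the listener API
lives on the JVM side; the class name and log format are mine, the fields are
the public Spark 2.x SparkListener/TaskMetrics API):

import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Logs each task's output metrics as it finishes.
class OutputMetricsListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    val metrics = taskEnd.taskMetrics
    if (metrics != null) {  // metrics can be null for failed tasks
      println(s"stage=${taskEnd.stageId} " +
        s"recordsWritten=${metrics.outputMetrics.recordsWritten} " +
        s"bytesWritten=${metrics.outputMetrics.bytesWritten}")
    }
  }
}

From PySpark you'd register it via spark.extraListeners (with the class on the
driver classpath) rather than calling sc.addSparkListener from Python.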
> I've filed a JIRA already for this issue:
> https://issues.apache.org/jira/browse/SPARK-22605
>
> I _think_ what happens is that inside ResultTask.runTask, the rdd.iterator
> call increments the bytes read & records read metrics via RDD.getOrCompute.
> Where would the equivalent be for the records written metrics?
>
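That matches my reading of core. The read-side counting is done by wrapping
the record iterator so each next() bumps the task's InputMetrics (the real
setters, e.g. incRecordsRead, are private[spark]). A toy stand-in for the
pattern, not Spark's actual classes:

// Each next() fires a callback; this is what feeds recordsRead.
class CountingIterator[T](underlying: Iterator[T], onRecord: () => Unit)
    extends Iterator[T] {
  override def hasNext: Boolean = underlying.hasNext
  override def next(): T = { onRecord(); underlying.next() }
}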
> These metrics are populated properly if I save the data as an RDD via
> df.rdd.saveAsTextFile, so the code path exists somewhere. Any hints as to
> where I should be looking?
>
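On the write side, the RDD path does the equivalent around the Hadoop output
writer: it counts records as it writes and refreshes the task's OutputMetrics
periodically (the real code is in the SparkHadoopWriter machinery in core, and
the setters are private[spark]). A paraphrased, self-contained sketch of the
pattern:

// Write all records, refreshing metrics every 256 records (roughly core's
// update interval) so the byte count isn't polled per record. The callbacks
// stand in for the private[spark] OutputMetrics API and the Hadoop writer's
// getPos().
def writeAll[T](records: Iterator[T], write: T => Unit,
                bytesSoFar: () => Long,
                setBytes: Long => Unit,
                setRecords: Long => Unit): Unit = {
  var recordsWritten = 0L
  while (records.hasNext) {
    write(records.next())
    recordsWritten += 1
    if (recordsWritten % 256 == 0) {
      setBytes(bytesSoFar())
      setRecords(recordsWritten)
    }
  }
  setBytes(bytesSoFar())      // final flush of the counters
  setRecords(recordsWritten)
}

The DataFrame/Parquet write doesn't go through that code; it goes through
FileFormatWriter in sql/core, which appears to skip this bookkeeping -
presumably that's what the JIRA needs to fix.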