Is this due to the insert command not having metrics? It's a problem we should fix.
On Mon, Nov 27, 2017 at 10:45 AM, Jason White <jason.wh...@shopify.com> wrote:
> I'd like to use the SparkListenerInterface to listen for some metrics for
> monitoring/logging/metadata purposes. The first ones I'm interested in
> hooking into are recordsWritten and bytesWritten as a measure of
> throughput. I'm using PySpark to write Parquet files from DataFrames.
>
> I'm able to extract a rich set of metrics this way, but for some reason the
> two that I want are always 0. This mirrors what I see in the Spark
> Application Master - the # records written field is always missing.
>
> I've filed a JIRA already for this issue:
> https://issues.apache.org/jira/browse/SPARK-22605
>
> I _think_ how this works is that inside the ResultTask.runTask method, the
> rdd.iterator call is incrementing the bytes read & records read via
> RDD.getOrCompute. Where would the equivalent be for the records written
> metrics?
>
> These metrics are populated properly if I save the data as an RDD via
> df.rdd.saveAsTextFile, so the code path exists somewhere. Any hints as to
> where I should be looking?
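For anyone following along: the two counters in question hang off TaskMetrics.outputMetrics on the task-end events a SparkListener receives. Below is a minimal Python sketch of the extraction being described, using hypothetical stand-in classes — the real objects live on the JVM side, so this shows only the shape of the access path, not Spark's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins mirroring the shape of the JVM-side
# TaskMetrics / OutputMetrics objects; illustration only, not Spark's API.
@dataclass
class OutputMetrics:
    bytesWritten: int = 0
    recordsWritten: int = 0

@dataclass
class TaskMetrics:
    outputMetrics: OutputMetrics = field(default_factory=OutputMetrics)

def on_task_end(metrics: TaskMetrics) -> dict:
    """The two throughput counters an onTaskEnd handler would read."""
    om = metrics.outputMetrics
    return {"bytesWritten": om.bytesWritten,
            "recordsWritten": om.recordsWritten}

# The symptom from SPARK-22605: a DataFrame Parquet write surfaces zeros
# here, while df.rdd.saveAsTextFile populates the same counters correctly.
parquet_write = on_task_end(TaskMetrics())
text_write = on_task_end(
    TaskMetrics(OutputMetrics(bytesWritten=2048, recordsWritten=100)))
```

So the question reduces to: which write path increments outputMetrics, and why does the DataFrame Parquet path skip it?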