OutputMetrics with data frames (spark-avro)

Tim Moran Mon, 17 Oct 2016 05:37:54 -0700

Hi,

I'm using the Databricks spark-avro library to save some DataFrames as
Avro (with Spark 1.6.1). When I do this, however, the Spark events no
longer include the number of records and the size of the data written to
HDFS for each partition. That information is available when I save an RDD
as a text file.
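
To make concrete which figures are meant, here is a minimal sketch that
pulls the per-task output metrics out of a Spark event log. It assumes
event logging is enabled (spark.eventLog.enabled) and that the log is one
JSON object per line, with SparkListenerTaskEnd events carrying
"Task Metrics" -> "Output Metrics" -> "Records Written" / "Bytes Written",
as in 1.6-era logs. The function name and the sample lines are
hypothetical and only for illustration; they are not real job output.

```python
import json


def output_metrics_per_task(lines):
    """Yield (stage_id, partition_index, records_written, bytes_written)
    for every SparkListenerTaskEnd event in an iterable of log lines.

    Both counts are None when the event has no "Output Metrics" entry.
    Field names are assumed from the event log JSON layout.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("Event") != "SparkListenerTaskEnd":
            continue  # skip job, stage and other non-task events
        info = event.get("Task Info") or {}
        metrics = event.get("Task Metrics") or {}
        out = metrics.get("Output Metrics")
        yield (
            event.get("Stage ID"),
            info.get("Index"),
            out.get("Records Written") if out else None,
            out.get("Bytes Written") if out else None,
        )


# Hypothetical sample lines, built with json.dumps to keep them valid JSON.
sample_log = [
    json.dumps({"Event": "SparkListenerJobStart", "Job ID": 0}),
    json.dumps({
        "Event": "SparkListenerTaskEnd",
        "Stage ID": 0,
        "Task Info": {"Task ID": 0, "Index": 0},
        "Task Metrics": {
            "Output Metrics": {"Bytes Written": 52341, "Records Written": 1000}
        },
    }),
    json.dumps({
        "Event": "SparkListenerTaskEnd",
        "Stage ID": 1,
        "Task Info": {"Task ID": 1, "Index": 0},
        "Task Metrics": {},  # no output metrics recorded for this task
    }),
]

rows = list(output_metrics_per_task(sample_log))
for row in rows:
    print(row)
```

In this sketch, a task without an "Output Metrics" entry shows up with
None for both counts, which is the situation described above for the
Avro DataFrame write.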


Is this just a limitation of data frames, or is there a way of making this
information available? It's really useful for performance monitoring.

Thanks,

Tim.

