Hi all,

I noticed that the Spark UI shows input/output read sizes for some jobs. I am implementing a custom RDD that reads files, and I would like to report these metrics to Spark since they are available to me.
I looked through the RDD source code and a couple of different implementations, and the best I could find was some Hadoop metrics. Is there a way to simply report the number of bytes a partition has read so that Spark can display it in the UI?

Thanks,

Pedro Rodriguez
PhD Student in Large-Scale Machine Learning | CU Boulder
Systems Oriented Data Scientist
UC Berkeley AMPLab Alumni
pedrorodriguez.io | 909-353-4423
github.com/EntilZha | LinkedIn