Re: Custom RDD: Report Size of Partition in Bytes to Spark

2016-07-04 Thread Pedro Rodriguez
Just realized I had been replying back to only Takeshi. Thanks for the tip, as it got me on the right track. I'm running into an issue with private[spark] methods, though. It looks like the input metrics start out as None and are never initialized (verified by throwing a new Exception on pattern match
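One common way around the private[spark] visibility Pedro mentions is to place a small shim inside the org.apache.spark package namespace, from which private[spark] members are accessible. This is only a rough sketch: the shim package name is made up, and the method names (getInputMetricsForReadMethod, incBytesRead) reflect the Spark 1.6-era TaskMetrics API as I recall it; they may differ in your version.

```scala
// Hypothetical shim: classes declared under org.apache.spark can see
// private[spark] members. Package and API names here are assumptions.
package org.apache.spark.metricshim

import org.apache.spark.TaskContext
import org.apache.spark.executor.DataReadMethod

object InputMetricsShim {
  // Record bytes read by a custom RDD's compute() for the current task.
  def addBytesRead(context: TaskContext, bytes: Long): Unit = {
    // In Spark 1.6.x this call lazily initializes the task's
    // Option[InputMetrics], which otherwise stays None.
    val metrics = context.taskMetrics
      .getInputMetricsForReadMethod(DataReadMethod.Hadoop)
    metrics.incBytesRead(bytes)
  }
}
```

A custom RDD's compute(split, context) could then call InputMetricsShim.addBytesRead(context, n) after each read. The trade-off of this approach is that it depends on internal APIs that can change between Spark releases.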

Re: Custom RDD: Report Size of Partition in Bytes to Spark

2016-07-03 Thread Takeshi Yamamuro
How about using `SparkListener`? You can collect I/O statistics through TaskMetrics#inputMetrics by yourself. // maropu
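Takeshi's suggestion could be sketched roughly as below. This assumes the Spark 1.6.x API, where SparkListenerTaskEnd exposes taskMetrics and inputMetrics is an Option; field names may differ in later releases, and the listener class name is made up.

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Sketch of a listener that sums bytes read across finished tasks
// (Spark 1.6.x API; inputMetrics is an Option there).
class InputSizeListener extends SparkListener {
  @volatile var totalBytesRead: Long = 0L

  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics can be null for some failed tasks, so guard it.
    Option(taskEnd.taskMetrics).flatMap(_.inputMetrics).foreach { in =>
      totalBytesRead += in.bytesRead
    }
  }
}

// Assumed registration on an existing SparkContext `sc`:
// sc.addSparkListener(new InputSizeListener)
```

The listener runs on the driver, so the accumulated total is only as current as the last completed task; it will not reflect in-flight reads.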

Custom RDD: Report Size of Partition in Bytes to Spark

2016-07-03 Thread Pedro Rodriguez
Hi All, I noticed that some Spark jobs show input/output read sizes. I am implementing a custom RDD which reads files, and I would like to report these metrics to Spark since they are available to me. I looked through the RDD source code and a couple of different implementations, and the best I