Yes, sure. Tracking records per split and UDF exec time per call (min, max, avg, or histogram) would be valuable information when debugging the performance of a program.
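
To make the idea concrete, here is a rough sketch of what such per-split tracking could look like. It is not Flink code: the class name SplitStats and the scan() helper are made up for illustration, and a split is modelled as a plain Iterator rather than through the real InputFormat interface. It only keeps count, min, max, and average; a histogram would need extra bookkeeping.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical helper (not part of Flink): record count and per-call timing for one split. */
public class SplitStats {
    private long records;
    private long totalNanos;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos;

    /** Register one read or UDF call that took the given number of nanoseconds. */
    public void record(long elapsedNanos) {
        records++;
        totalNanos += elapsedNanos;
        minNanos = Math.min(minNanos, elapsedNanos);
        maxNanos = Math.max(maxNanos, elapsedNanos);
    }

    public long getRecords() { return records; }
    public long getMinNanos() { return records == 0 ? 0 : minNanos; }
    public long getMaxNanos() { return maxNanos; }
    public double getAvgNanos() { return records == 0 ? 0.0 : (double) totalNanos / records; }

    /** Reads every element of a split (assumed here to be an Iterator) and times each read. */
    public static <T> SplitStats scan(Iterator<T> split) {
        SplitStats stats = new SplitStats();
        while (split.hasNext()) {
            long start = System.nanoTime();
            split.next();
            stats.record(System.nanoTime() - start);
        }
        return stats;
    }

    public static void main(String[] args) {
        SplitStats stats = scan(Arrays.asList("a", "b", "c").iterator());
        System.out.println(stats.getRecords() + " records, avg "
                + stats.getAvgNanos() + " ns per call");
    }
}
```

Whether the counting lives in the input format itself or in the caller of nextRecord is exactly the open question from the thread; the sketch puts it in the caller so that the format stays unchanged.
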
2014-12-02 22:08 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:

> In my specific use case I was interested in understanding why the scans
> of the splits were taking a long time, so I was interested in getting
> statistics about the number of records contained in each split and the
> rate/speed of its reading. Do you think it could be something useful in
> general?
> On Dec 2, 2014 9:56 PM, "Fabian Hueske" <fhue...@apache.org> wrote:
>
> > Hi Flavio,
> >
> > we have a few recently started efforts to implement the collection of
> > monitoring and runtime/data statistics.
> > Counting the number of elements emitted by an operator (or data source)
> > will be included.
> >
> > Do you want to count the number of produced tuples for monitoring the
> > progress or do you see a different use case?
> >
> > 2014-11-28 9:37 GMT+01:00 Flavio Pompermaier <pomperma...@okkam.it>:
> >
> > > Hi guys,
> > >
> > > I was debugging an inputFormat and I discovered that there's no way to
> > > understand how many records have been processed in a split.
> > > So I added a counter in my input format, incremented on every
> > > nextRecord. Do you think adding something similar to "public int
> > > getProcessedRecordsCount()" to the InputFormat interface could be useful?
> > > Or are you going to manage this count stat from the caller of
> > > nextRecord?
> > >
> > > Best,
> > > Flavio