Please send me the patch; I will apply and test it.

On Fri, 17 Jan 2020 at 10:33 PM, Ryan Blue <rb...@netflix.com> wrote:
> We've implemented these metrics in the RDD (for input metrics) and in the
> v2 DataWritingSparkTask. That approach gives you the same metrics in the
> stage views that you get with v1 sources, regardless of the v2
> implementation.
>
> I'm not sure why they weren't included from the start. It looks like the
> way metrics are collected is changing. There are a couple of metrics for
> the number of rows; it looks like one goes to the Spark SQL tab and one is
> used for the stages view.
>
> If you'd like, I can send you a patch.
>
> rb
>
> On Fri, Jan 17, 2020 at 5:09 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>
>> I think there are a few details we need to discuss.
>>
>> How frequently should a source update its metrics? For example, if the
>> file source needs to report size metrics per row, it'll be super slow.
>>
>> What metrics should a source report? Data size? numFiles? Read time?
>>
>> Shall we show metrics in the SQL web UI as well?
>>
>> On Fri, Jan 17, 2020 at 3:07 PM Sandeep Katta <
>> sandeep0102.opensou...@gmail.com> wrote:
>>
>>> Hi Devs,
>>>
>>> Currently, DS V2 does not update any input metrics. SPARK-30362 aims
>>> to solve this problem.
>>>
>>> We could take the following approach: introduce a marker interface,
>>> say "ReportMetrics".
>>>
>>> If a data source implements this interface, it will be easy to collect
>>> the metrics.
>>>
>>> For example, FilePartitionReaderFactory could support metrics, so
>>> collecting them would be straightforward if FilePartitionReaderFactory
>>> implements ReportMetrics.
>>>
>>> Please share your views, or suggest a different solution or design.
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
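[For readers following the thread: below is a minimal, hypothetical sketch of the marker-interface idea Sandeep proposes. The `ReportMetrics` interface, its `currentMetrics()` method, and the simplified reader factory are illustrative assumptions, not the actual Spark DSv2 API; the real `FilePartitionReaderFactory` has a different shape. It also illustrates Wenchen's frequency concern by accumulating counters locally and only materializing them when the engine asks.]

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical opt-in marker interface: a source that implements it
// agrees to expose its current metric values on request.
interface ReportMetrics {
    Map<String, Long> currentMetrics();
}

// Simplified stand-in for a reader factory that opts in to reporting.
// Counters are updated cheaply per row; the map is built only on demand,
// so there is no per-row reporting cost.
class SketchReaderFactory implements ReportMetrics {
    private long bytesRead = 0;
    private long rowsRead = 0;

    void recordRow(long rowSizeBytes) {
        rowsRead++;
        bytesRead += rowSizeBytes;
    }

    @Override
    public Map<String, Long> currentMetrics() {
        Map<String, Long> m = new HashMap<>();
        m.put("bytesRead", bytesRead);
        m.put("rowsRead", rowsRead);
        return m;
    }
}

public class MetricsSketch {
    // Engine side: collect metrics only from factories that opted in,
    // so sources that don't implement the interface are unaffected.
    static Map<String, Long> collect(Object factory) {
        if (factory instanceof ReportMetrics) {
            return ((ReportMetrics) factory).currentMetrics();
        }
        return new HashMap<>();
    }

    public static void main(String[] args) {
        SketchReaderFactory f = new SketchReaderFactory();
        f.recordRow(128);
        f.recordRow(256);
        System.out.println(collect(f)); // counters after two rows
    }
}
```

The `instanceof` check keeps the change backward compatible: existing sources that do not implement the interface simply report nothing, which matches the opt-in design discussed above.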