We've implemented these metrics in the RDD (for input metrics) and in the
v2 DataWritingSparkTask. That approach gives you the same metrics in the
stage views that you get with v1 sources, regardless of the v2
implementation.
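
For reference, the input-side hook is roughly the following shape (a
simplified sketch, not the actual patch; the class name is illustrative,
and the InputMetrics setters are private[spark], so this has to live
inside Spark itself):

    import org.apache.spark.TaskContext
    import org.apache.spark.sql.catalyst.InternalRow

    // Sketch: wrap a v2 reader's row iterator and bump the task's input
    // metrics as rows are consumed, so the stage UI sees them like v1.
    class InputMetricsIterator(
        delegate: Iterator[InternalRow],
        context: TaskContext)
      extends Iterator[InternalRow] {

      private val inputMetrics = context.taskMetrics().inputMetrics

      override def hasNext: Boolean = delegate.hasNext

      override def next(): InternalRow = {
        // incRecordsRead is private[spark], callable from Spark's own code
        inputMetrics.incRecordsRead(1L)
        delegate.next()
      }
    }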

I'm not sure why they weren't included from the start. It looks like the
way metrics are collected is changing. There are a couple of metrics for
the number of rows: one appears to go to the Spark SQL tab and the other
is used for the stages view.
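
To make the distinction concrete, here is a hedged sketch (the object and
metric names are illustrative): the SQL tab reads SQLMetric accumulators
attached to physical plan nodes, while the stages view reads the per-task
counters on TaskMetrics.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.execution.metric.SQLMetrics

    object RowMetricsSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local")
          .appName("row-metrics-sketch")
          .getOrCreate()

        // SQL tab: an SQLMetric (a Spark accumulator) registered on a
        // physical plan node; the SQL page aggregates and displays it.
        val numOutputRows =
          SQLMetrics.createMetric(spark.sparkContext, "number of output rows")
        numOutputRows += 1L // an operator would increment this per row

        // Stages view: fed by the per-task counters on TaskMetrics,
        // e.g. taskMetrics().inputMetrics.recordsRead.

        spark.stop()
      }
    }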

If you'd like, I can send you a patch.

rb

On Fri, Jan 17, 2020 at 5:09 AM Wenchen Fan <cloud0...@gmail.com> wrote:

> I think there are a few details we need to discuss.
>
> How frequently should a source update its metrics? For example, if the
> file source had to report size metrics per row, it would be super slow.
>
> What metrics should a source report? Data size? numFiles? Read time?
>
> Shall we show metrics in the SQL web UI as well?
>
> On Fri, Jan 17, 2020 at 3:07 PM Sandeep Katta <
> sandeep0102.opensou...@gmail.com> wrote:
>
>> Hi Devs,
>>
>> Currently DS V2 does not update any input metrics. SPARK-30362 aims to
>> solve this problem.
>>
>> We could take the following approach: add a marker interface, say
>> "ReportMetrics".
>>
>> If a data source implements this interface, it will be easy to collect
>> the metrics.
>>
>> For example, FilePartitionReaderFactory could support metrics.
>>
>> So collecting the metrics will be straightforward whenever
>> FilePartitionReaderFactory implements ReportMetrics (see the sketch
>> below).
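>>
>> A rough sketch of what such a marker interface might look like (the
>> name and shape are illustrative only, not a final design):
>>
>>     // Mixed into v2 readers/factories that can report metrics; Spark
>>     // would check for it and fold the values into the task's metrics.
>>     trait ReportMetrics {
>>       /** Metric name to its current value, e.g. "recordsRead" -> 100L. */
>>       def currentMetrics(): Map[String, Long]
>>     }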
>>
>>
>> Please share your views, and let me know if you would prefer a
>> different solution or design.
>>
>

-- 
Ryan Blue
Software Engineer
Netflix
