Re: [Discuss] Metrics Support for DS V2

2020-01-20 Thread Ryan Blue
I sent them to you. I had to send them directly because the ASF mailing list
strips attachments. I'm happy to send them to others if needed as well.

On Sun, Jan 19, 2020 at 9:01 PM Sandeep Katta <
sandeep0102.opensou...@gmail.com> wrote:

> Please send me the patch, and I will apply and test it.
>
> On Fri, 17 Jan 2020 at 10:33 PM, Ryan Blue  wrote:
>
>> We've implemented these metrics in the RDD (for input metrics) and in the
>> v2 DataWritingSparkTask. That approach gives you the same metrics in the
>> stage views that you get with v1 sources, regardless of the v2
>> implementation.
>>
>> I'm not sure why they weren't included from the start. It looks like the
>> way metrics are collected is changing. There are a couple of metrics for the
>> number of rows; it looks like one goes to the Spark SQL tab and one is used
>> for the stages view.
>>
>> If you'd like, I can send you a patch.
>>
>> rb
>>
>> On Fri, Jan 17, 2020 at 5:09 AM Wenchen Fan  wrote:
>>
>>> I think there are a few details we need to discuss.
>>>
>>> How frequently should a source update its metrics? For example, if the file
>>> source needs to report size metrics per row, it'll be super slow.
>>>
>>> What metrics should a source report? Data size? numFiles? Read time?
>>>
>>> Shall we show metrics in the SQL web UI as well?
>>>
>>> On Fri, Jan 17, 2020 at 3:07 PM Sandeep Katta <
>>> sandeep0102.opensou...@gmail.com> wrote:
>>>
 Hi Devs,

 Currently DS V2 does not update any input metrics. SPARK-30362 aims at
 solving this problem.

 We can take the approach below: have a marker interface, let's say
 "ReportMetrics".

 If the DataSource implements this interface, then it will be easy to
 collect the metrics.

 For example, FilePartitionReaderFactory can support metrics.

 So it will be easy to collect the metrics if FilePartitionReaderFactory
 implements ReportMetrics.


 Please let me know your views, or whether we want a new solution or
 design.

>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix


Re: [Discuss] Metrics Support for DS V2

2020-01-19 Thread Sandeep Katta
Please send me the patch, and I will apply and test it.

On Fri, 17 Jan 2020 at 10:33 PM, Ryan Blue  wrote:

> We've implemented these metrics in the RDD (for input metrics) and in the
> v2 DataWritingSparkTask. That approach gives you the same metrics in the
> stage views that you get with v1 sources, regardless of the v2
> implementation.
>
> I'm not sure why they weren't included from the start. It looks like the
> way metrics are collected is changing. There are a couple of metrics for the
> number of rows; it looks like one goes to the Spark SQL tab and one is used
> for the stages view.
>
> If you'd like, I can send you a patch.
>
> rb
>
> On Fri, Jan 17, 2020 at 5:09 AM Wenchen Fan  wrote:
>
>> I think there are a few details we need to discuss.
>>
>> How frequently should a source update its metrics? For example, if the file
>> source needs to report size metrics per row, it'll be super slow.
>>
>> What metrics should a source report? Data size? numFiles? Read time?
>>
>> Shall we show metrics in the SQL web UI as well?
>>
>> On Fri, Jan 17, 2020 at 3:07 PM Sandeep Katta <
>> sandeep0102.opensou...@gmail.com> wrote:
>>
>>> Hi Devs,
>>>
>>> Currently DS V2 does not update any input metrics. SPARK-30362 aims at
>>> solving this problem.
>>>
>>> We can take the approach below: have a marker interface, let's say
>>> "ReportMetrics".
>>>
>>> If the DataSource implements this interface, then it will be easy to
>>> collect the metrics.
>>>
>>> For example, FilePartitionReaderFactory can support metrics.
>>>
>>> So it will be easy to collect the metrics if FilePartitionReaderFactory
>>> implements ReportMetrics.
>>>
>>>
>>> Please let me know your views, or whether we want a new solution or
>>> design.
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: [Discuss] Metrics Support for DS V2

2020-01-17 Thread Ryan Blue
We've implemented these metrics in the RDD (for input metrics) and in the
v2 DataWritingSparkTask. That approach gives you the same metrics in the
stage views that you get with v1 sources, regardless of the v2
implementation.

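For reference, here is a rough sketch of the read-side shape of that kind of
change (not the actual patch; InputMetrics' increment methods are
private[spark], so this only compiles from code inside Spark itself, e.g. the
v2 scan RDD):

import org.apache.spark.TaskContext
import org.apache.spark.sql.catalyst.InternalRow

// Wrap the partition reader's iterator and bump the task-level input
// metrics, so the stages page shows records read for v2 scans just like
// it does for v1 sources. Illustrative only; incRecordsRead is Spark-internal.
def withInputMetrics(rows: Iterator[InternalRow]): Iterator[InternalRow] = {
  val inputMetrics = TaskContext.get().taskMetrics().inputMetrics
  rows.map { row =>
    inputMetrics.incRecordsRead(1L)
    row
  }
}
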
I'm not sure why they weren't included from the start. It looks like the
way metrics are collected is changing. There are a couple of metrics for the
number of rows; it looks like one goes to the Spark SQL tab and one is used
for the stages view.

If you'd like, I can send you a patch.

rb

On Fri, Jan 17, 2020 at 5:09 AM Wenchen Fan  wrote:

> I think there are a few details we need to discuss.
>
> How frequently should a source update its metrics? For example, if the file
> source needs to report size metrics per row, it'll be super slow.
>
> What metrics should a source report? Data size? numFiles? Read time?
>
> Shall we show metrics in the SQL web UI as well?
>
> On Fri, Jan 17, 2020 at 3:07 PM Sandeep Katta <
> sandeep0102.opensou...@gmail.com> wrote:
>
>> Hi Devs,
>>
>> Currently DS V2 does not update any input metrics. SPARK-30362 aims at
>> solving this problem.
>>
>> We can take the approach below: have a marker interface, let's say
>> "ReportMetrics".
>>
>> If the DataSource implements this interface, then it will be easy to
>> collect the metrics.
>>
>> For example, FilePartitionReaderFactory can support metrics.
>>
>> So it will be easy to collect the metrics if FilePartitionReaderFactory
>> implements ReportMetrics.
>>
>>
>> Please let me know your views, or whether we want a new solution or
>> design.
>>
>

-- 
Ryan Blue
Software Engineer
Netflix


Re: [Discuss] Metrics Support for DS V2

2020-01-17 Thread Wenchen Fan
I think there are a few details we need to discuss.

How frequently should a source update its metrics? For example, if the file
source needs to report size metrics per row, it'll be super slow.

What metrics should a source report? Data size? numFiles? Read time?

Shall we show metrics in the SQL web UI as well?

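For illustration, one way to keep the per-row cost down is to update a plain
local counter on every record and only publish a snapshot every N records
(and once more when the reader closes). A rough sketch, where the report
callback and the metric names are hypothetical, not an existing Spark API:

// Illustrative only: batch metric updates so per-row accounting stays cheap.
class BatchedMetricsReporter(report: Map[String, Long] => Unit,
                             flushEvery: Int = 1000) {
  private var recordsRead = 0L
  private var bytesRead = 0L
  private var sinceLastFlush = 0

  def onRecord(sizeInBytes: Long): Unit = {
    recordsRead += 1
    bytesRead += sizeInBytes
    sinceLastFlush += 1
    if (sinceLastFlush >= flushEvery) flush()
  }

  // Call this once more on close so the tail of the counts is not lost.
  def flush(): Unit = {
    report(Map("recordsRead" -> recordsRead, "bytesRead" -> bytesRead))
    sinceLastFlush = 0
  }
}
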
On Fri, Jan 17, 2020 at 3:07 PM Sandeep Katta <
sandeep0102.opensou...@gmail.com> wrote:

> Hi Devs,
>
> Currently DS V2 does not update any input metrics. SPARK-30362 aims at
> solving this problem.
>
> We can take the approach below: have a marker interface, let's say
> "ReportMetrics".
>
> If the DataSource implements this interface, then it will be easy to
> collect the metrics.
>
> For example, FilePartitionReaderFactory can support metrics.
>
> So it will be easy to collect the metrics if FilePartitionReaderFactory
> implements ReportMetrics.
>
>
> Please let me know your views, or whether we want a new solution or
> design.
>


[Discuss] Metrics Support for DS V2

2020-01-16 Thread Sandeep Katta
Hi Devs,

Currently DS V2 does not update any input metrics. SPARK-30362 aims at
solving this problem.

We can take the approach below: have a marker interface, let's say
"ReportMetrics".

If the DataSource implements this interface, then it will be easy to
collect the metrics.

For example, FilePartitionReaderFactory can support metrics.

So it will be easy to collect the metrics if FilePartitionReaderFactory
implements ReportMetrics.

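To make the idea concrete, here is a minimal sketch of what such a marker
interface could look like. The trait name, the method, and the metric keys
below are only an illustration of the proposal, not an existing Spark API:

// Hypothetical marker interface proposed above; the framework would detect
// it (e.g. via a pattern match) and pull the current values periodically or
// at task completion.
trait ReportMetrics {
  // Snapshot of the source's metrics, e.g. Map("bytesRead" -> 1048576L).
  def currentMetrics(): Map[String, Long]
}

// A reader factory that opts in simply mixes the trait in and keeps
// running counters while it reads.
class ExamplePartitionReaderFactory extends ReportMetrics {
  private var bytesRead = 0L
  private var recordsRead = 0L

  override def currentMetrics(): Map[String, Long] =
    Map("bytesRead" -> bytesRead, "recordsRead" -> recordsRead)
}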

Please let me know your views, or whether we want a new solution or
design.