Re: [Discuss] Metrics Support for DS V2
I sent them to you. I had to go direct because the ASF mailing list removes attachments. I'm happy to send them to others as well if needed.

On Sun, Jan 19, 2020 at 9:01 PM Sandeep Katta <sandeep0102.opensou...@gmail.com> wrote:

> Please send me the patch; I will apply it and test.

--
Ryan Blue
Software Engineer
Netflix
Re: [Discuss] Metrics Support for DS V2
Please send me the patch; I will apply it and test.

On Fri, 17 Jan 2020 at 10:33 PM, Ryan Blue wrote:

> We've implemented these metrics in the RDD (for input metrics) and in the
> v2 DataWritingSparkTask. That approach gives you the same metrics in the
> stage views that you get with v1 sources, regardless of the v2
> implementation.
>
> I'm not sure why they weren't included from the start. It looks like the
> way metrics are collected is changing. There are a couple of metrics for
> number of rows; it looks like one goes to the Spark SQL tab and one is
> used for the stages view.
>
> If you'd like, I can send you a patch.
>
> rb
Re: [Discuss] Metrics Support for DS V2
We've implemented these metrics in the RDD (for input metrics) and in the v2 DataWritingSparkTask. That approach gives you the same metrics in the stage views that you get with v1 sources, regardless of the v2 implementation.

I'm not sure why they weren't included from the start. It looks like the way metrics are collected is changing. There are a couple of metrics for number of rows; it looks like one goes to the Spark SQL tab and one is used for the stages view.

If you'd like, I can send you a patch.

rb

On Fri, Jan 17, 2020 at 5:09 AM Wenchen Fan wrote:

> I think there are a few details we need to discuss.
>
> How frequently should a source update its metrics? For example, if a file
> source needs to report size metrics per row, it will be very slow.
>
> What metrics should a source report? Data size? numFiles? Read time?
>
> Should we show metrics in the SQL web UI as well?

--
Ryan Blue
Software Engineer
Netflix
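For readers following along, the engine-side idea in this message (count in the RDD, so the numbers do not depend on the v2 source) can be pictured with a small wrapper. This is only a sketch and not the patch mentioned in this thread: RecordCountingIterator and RecordCountingDemo are hypothetical names, and a plain java.util.Iterator stands in for whatever the RDD actually iterates over.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical wrapper: counts rows as the engine pulls them, so the
// wrapped source does not need to know anything about metrics.
class RecordCountingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private long recordsRead = 0;

    RecordCountingIterator(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        T value = delegate.next();
        recordsRead++; // counted once per row handed to the caller
        return value;
    }

    long recordsRead() {
        return recordsRead;
    }
}

class RecordCountingDemo {
    public static void main(String[] args) {
        RecordCountingIterator<String> rows =
            new RecordCountingIterator<>(Arrays.asList("r1", "r2", "r3").iterator());
        while (rows.hasNext()) {
            rows.next();
        }
        System.out.println("recordsRead = " + rows.recordsRead());
    }
}
```

Because the counting lives in the wrapper, any source that can hand back an iterator gets a row count without changes on its side.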
Re: [Discuss] Metrics Support for DS V2
I think there are a few details we need to discuss.

How frequently should a source update its metrics? For example, if a file source needs to report size metrics per row, it will be very slow.

What metrics should a source report? Data size? numFiles? Read time?

Should we show metrics in the SQL web UI as well?

On Fri, Jan 17, 2020 at 3:07 PM Sandeep Katta <sandeep0102.opensou...@gmail.com> wrote:

> Hi Devs,
>
> Currently DS V2 does not update any input metrics. SPARK-30362 aims to
> solve this problem.
>
> We could take the following approach: add a marker interface, say
> "ReportMetrics". If a data source implements this interface, it will be
> easy to collect its metrics.
>
> For example, FilePartitionReaderFactory can support metrics, so collecting
> them becomes easy if FilePartitionReaderFactory implements ReportMetrics.
>
> Please let me know your views, or whether we want a different solution or
> design.
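One possible answer to the update-frequency question is to accumulate sizes locally and publish them only every N records, plus once at the end. The sketch below assumes that design; InputMetricsSink, BatchedMetricsUpdater, and the interval of 1000 are hypothetical and are not Spark classes or settings.

```java
// Hypothetical stand-in for the task-level metrics object that is
// expensive to update on every row.
class InputMetricsSink {
    long bytesRead = 0;
    int updates = 0;

    void setBytesRead(long bytes) {
        bytesRead = bytes;
        updates++;
    }
}

// Accumulates bytes locally and publishes every intervalRecords records.
class BatchedMetricsUpdater {
    private final InputMetricsSink sink;
    private final int intervalRecords;
    private long totalBytes = 0;
    private long records = 0;

    BatchedMetricsUpdater(InputMetricsSink sink, int intervalRecords) {
        if (intervalRecords <= 0) {
            throw new IllegalArgumentException("intervalRecords must be positive");
        }
        this.sink = sink;
        this.intervalRecords = intervalRecords;
    }

    void onRecord(long sizeInBytes) {
        totalBytes += sizeInBytes; // cheap local bookkeeping per row
        records++;
        if (records % intervalRecords == 0) {
            sink.setBytesRead(totalBytes); // periodic publish
        }
    }

    // Final publish so that the tail of the partition is not lost.
    void close() {
        sink.setBytesRead(totalBytes);
    }
}

class BatchedMetricsDemo {
    public static void main(String[] args) {
        InputMetricsSink sink = new InputMetricsSink();
        BatchedMetricsUpdater updater = new BatchedMetricsUpdater(sink, 1000);
        for (int i = 0; i < 2500; i++) {
            updater.onRecord(10);
        }
        updater.close();
        System.out.println(sink.bytesRead + " bytes in " + sink.updates + " updates");
    }
}
```

The trade-off is that the visible metric can lag by up to one interval while a task is running, in exchange for far fewer updates than one per row.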
[Discuss] Metrics Support for DS V2
Hi Devs,

Currently DS V2 does not update any input metrics. SPARK-30362 aims to solve this problem.

We could take the following approach: add a marker interface, say "ReportMetrics". If a data source implements this interface, it will be easy to collect its metrics.

For example, FilePartitionReaderFactory can support metrics, so collecting them becomes easy if FilePartitionReaderFactory implements ReportMetrics.

Please let me know your views, or whether we want a different solution or design.
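To make the proposal concrete, here is a rough sketch of how such an opt-in interface might look. Only the name ReportMetrics comes from this thread. Its two methods, CountingLineReader, and ReportMetricsDemo are hypothetical and not part of Spark's API, and a list of strings stands in for a real partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Proposed opt-in interface; the method names are assumptions.
interface ReportMetrics {
    long bytesRead();
    long recordsRead();
}

// Hypothetical stand-in for a DS V2 reader that chooses to report metrics.
class CountingLineReader implements ReportMetrics {
    private final Iterator<String> lines;
    private String current;
    private long bytes = 0;
    private long records = 0;

    CountingLineReader(List<String> lines) {
        this.lines = lines.iterator();
    }

    boolean next() {
        if (!lines.hasNext()) {
            return false;
        }
        current = lines.next();
        bytes += current.getBytes(StandardCharsets.UTF_8).length;
        records++;
        return true;
    }

    String get() {
        return current;
    }

    @Override
    public long bytesRead() {
        return bytes;
    }

    @Override
    public long recordsRead() {
        return records;
    }
}

class ReportMetricsDemo {
    // Engine-side check: collect metrics only when the reader opts in.
    static long recordsReadOrMinusOne(Object reader) {
        if (reader instanceof ReportMetrics) {
            return ((ReportMetrics) reader).recordsRead();
        }
        return -1L; // reader does not report metrics
    }

    public static void main(String[] args) {
        CountingLineReader reader = new CountingLineReader(Arrays.asList("a", "bc"));
        while (reader.next()) {
            reader.get();
        }
        System.out.println(recordsReadOrMinusOne(reader) + " records, "
            + reader.bytesRead() + " bytes");
    }
}
```

Strictly speaking, an interface with methods is no longer a pure marker. The sketch assumes the engine needs some way to read the values once it knows the source opted in.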