Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics
Hi,

Just wanted to mention that I updated this document with one detail: https://s.apache.org/beam-gcp-debuggability

Changes:
Sept 8, 2020 - Clarified that the InstructionRequest/Control Channel will be used in "Proposal: SDKHs to Report non-bundle metrics."
May 15, 2020 - Completed review with beam dev list.

PTAL, and LMK what you think.
Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)
Hi again,

Just reviving this thread to mention that I updated the doc with a few sections: https://s.apache.org/beam-histogram-metrics

Changes:
Sept 8, 2020
- Added alternative section: "Collect Moment Sketch Variables Instead of Bucket Counts" (recommend not pursuing, due to opposing trade-offs and a significant implementation/maintenance challenge, but it may be worth pursuing in a future MonitoringInfo type).
- Added distribution variables: min, max, sum, count.
- Added alternative section: "Update all distribution metrics to be Histograms" (recommend not pursuing; update to a histogram distribution on a case-by-case basis, due to performance concerns).
May 15, 2020
- Completed review with beam dev list.

PTAL and LMK what you think :)
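The combination described in the changelog above — per-bucket counts alongside the distribution variables min, max, sum, and count — can be sketched roughly as follows. This is a hypothetical illustration, not the actual Beam implementation; the `Histogram` class name and the explicit-boundary bucketing scheme are assumptions for the sake of the example:

```python
import math

class Histogram:
    """Sketch of a histogram metric that tracks per-bucket counts plus
    the distribution variables min, max, sum, and count."""

    def __init__(self, bucket_boundaries):
        # Boundaries are the upper bounds of each bucket, sorted ascending.
        self.boundaries = sorted(bucket_boundaries)
        # One bucket per boundary, plus an overflow bucket at the end.
        self.buckets = [0] * (len(self.boundaries) + 1)
        self.min = math.inf
        self.max = -math.inf
        self.sum = 0.0
        self.count = 0

    def record(self, value):
        self.min = min(self.min, value)
        self.max = max(self.max, value)
        self.sum += value
        self.count += 1
        # Place the value in the first bucket whose upper bound exceeds it.
        for i, bound in enumerate(self.boundaries):
            if value < bound:
                self.buckets[i] += 1
                return
        self.buckets[-1] += 1  # overflow bucket

h = Histogram([10, 100, 1000])
for v in [3, 50, 50, 2000]:
    h.record(v)
# h.buckets == [1, 2, 0, 1]; h.min == 3; h.max == 2000; h.count == 4
```

Keeping min/max/sum/count next to the bucket counts means percentile estimates from the buckets can be cross-checked against exact aggregate values.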
Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics
Thanks everyone. I was able to collect a lot of good feedback from everyone who contributed. I am going to wrap it up for now and label the design as "Design Finalized (Unimplemented)".

I really believe we have made a much better design than I initially wrote up. I couldn't have done it without the help of everyone who offered their time, energy and viewpoints. :)

Thanks again; please let me know if you still see any major issues with the design. I think I have enough information to begin some implementation as soon as I have some time in the coming weeks.
Alex

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics
Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics
Thanks to all who have spent their time on this; there were many great suggestions. Just another reminder that tomorrow I will be finalizing the documents, unless there are any major objections left. Please take a look if you are interested.

I will still welcome feedback at any time :).

But I believe we have gathered enough information to produce a good design, which I will start to work on soon. I will begin by building the necessary subset of the new features proposed to support the BigQueryIO metrics use case. I will likely start with the Python SDK first.

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics
Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics
Thanks again for more feedback :). I have iterated on things again. I'll report back at the end of the week. If there are still no major disagreements, I'll close the discussion, believing it to be in a good enough state to start some implementation. But I welcome feedback.

The latest changes: changing the exponential format to allow denser buckets; using only two MonitoringInfoSpecs now for all of the IOs to use; requiring some labels, but allowing optional ones for specific IOs to provide more context.

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics
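For context on the "denser buckets" change mentioned above: an exponential bucketing scheme typically assigns a value to a bucket by taking a logarithm with a configurable base, and a base closer to 1 yields denser buckets. A rough sketch, with all names and parameters hypothetical rather than taken from the doc:

```python
import math

def exponential_bucket_index(value, scale=1.0, base=2.0):
    """Map a positive value to an exponential bucket index.

    Bucket i (for i >= 1) covers [scale * base**(i-1), scale * base**i).
    Choosing a base closer to 1 produces denser buckets.
    """
    if value < scale:
        return 0  # underflow bucket
    return 1 + int(math.log(value / scale, base))

# With base=2, buckets cover [1,2), [2,4), [4,8), ... so 2 and 3 share a bucket.
assert exponential_bucket_index(2) == exponential_bucket_index(3)
# With a denser base of 1.5, 2 and 3 land in different buckets.
assert exponential_bucket_index(2, base=1.5) != exponential_bucket_index(3, base=1.5)
```

The appeal of an exponential format is that a handful of integer parameters (scale, base, bucket count) fully describes the boundaries, so they never need to be transmitted explicitly.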
Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics
Thanks for the great feedback so far :). I've included many new ideas and made some revisions. Both docs have changed a fair bit since the initial mail out.

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics

PTAL and let me know what you think, and hopefully we can resolve the major issues by the end of the week. I'll try to finalize things by then, but of course I always stay open to your great ideas. :)
Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics
Thanks everyone for taking a look so far :).

I am hoping to finalize the two reviews by the end of next week, May 15th.

I'll continue to follow up on feedback and make changes, and I will add some more mentions of the documents to draw attention.

https://s.apache.org/beam-gcp-debuggability
https://s.apache.org/beam-histogram-metrics
Re: [Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics
Thanks, also took a look and left some comments.
Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)
Thanks Alex, I had some minor comments.
[Proposal] Apache Beam Fn API - GCP IO Debuggability Metrics
Hello,

I created another design document, this time for GCP IO Debuggability Metrics, which defines some new metrics to collect in the GCP IO libraries. This is for monitoring request counts and request latencies.

Please take a look and let me know what you think: https://s.apache.org/beam-gcp-debuggability

I also sent out a separate design yesterday (https://s.apache.org/beam-histogram-metrics), which is related, as this document uses a Histogram style metric :).

I would love some feedback to make this feature the best possible :D,
Alex
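To illustrate the kind of per-request instrumentation being proposed (request counts and request latencies), here is a minimal sketch. The metric structures, the label tuple, and the `record_request` wrapper are all invented for illustration, not the API from the doc:

```python
import time
from collections import Counter

request_counts = Counter()   # (service, method, status) -> count
request_latencies_ms = {}    # (service, method) -> list of latencies in ms

def record_request(service, method, fn):
    """Run fn(), counting the request by status and recording its latency."""
    start = time.monotonic()
    status = "ok"
    try:
        return fn()
    except Exception:
        status = "error"
        raise
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000.0
        request_counts[(service, method, status)] += 1
        request_latencies_ms.setdefault((service, method), []).append(elapsed_ms)

# Example: wrap a (stubbed) BigQuery insert call.
record_request("BigQuery", "insertAll", lambda: None)
```

In practice the latency list would feed a histogram-style metric rather than growing unboundedly, which is where the companion histogram design comes in.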
Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)
Thanks Ismaël :). Done
Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)
Moving the short link to this thread: https://s.apache.org/beam-histogram-metrics

Alex, can you add this link (and any other of your documents that may not be there) to https://cwiki.apache.org/confluence/display/BEAM/Design+Documents
Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)
FYI +Boyuan Zhang worked on implementing a histogram metric that was performance-optimized into outer space for Python : ) - I don't recall if she ended up getting it merged, but it's worth looking at the work. I also remember Scott Wegner wrote the metrics for Java.

Best
-P.
[Proposal] Apache Beam Fn API - Histogram Style Metrics (Correct link this time)
Hello,

I have created a proposal for Apache Beam Fn API to support Histogram Style Metrics <https://docs.google.com/document/d/1kiNG2BAR-51pRdBCK4-XFmc0WuIkSuBzeb__Zv8owbU/edit#>, which defines a method to collect Histogram style metrics and pass them over the Fn API.

I would love to hear your feedback in order to improve this proposal; please let me know what you think. Thanks for taking a look :)
Alex
Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics
Sorry, wrong link. Let's close this thread and I'll send another...
Re: [Proposal] Apache Beam Fn API - Histogram Style Metrics
Hi Alex!
Thanks for the proposal. I've created https://s.apache.org/beam-histogram-metrics
[Proposal] Apache Beam Fn API - Histogram Style Metrics
Hello,

I have created a proposal for Apache Beam Fn API to support Histogram Style Metrics <https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit#heading=h.c6fjf0g6rsbc>, which defines a method to collect Histogram style metrics and pass them over the Fn API.

Also, I would appreciate it if someone could generate an s.apache.org link for this document, unless there is some way for me to do it myself.

I would love to hear your feedback in order to improve this proposal; please let me know what you think. Thanks for taking a look :)
Alex
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Hello,

I have rewritten most of the proposal, though I think there is some more research that needs to be done to get the metric specification perfect. I plan to do more research, and would like to ask you all for more help to make this proposal better.

In particular, now that the metrics format is by default designed to allow metrics to pass through to monitoring collection systems such as Dropwizard and Stackdriver, the metrics need to be complete enough to be compatible with these systems. I think some changes will be needed to fulfill this, but I wanted to send out this document, which contains the general idea, and continue refining it. Please take a look and let me know what you think.

https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit

Major Revision: April 17, 2018 - The design has been reworked to use a metric format which resembles the Dropwizard and Stackdriver formats, allowing metrics to be passed through. The generic bytes payload style of metrics is still available, but is reserved for complex use cases which do not fit into these typical metrics collection systems.

Note: This document isn't 100% complete; there are a few areas which need to be improved. Through our discussion and more research I want to complete these details. Please share any thoughts that you have.

1. The metric specification and Metric proto schemas may need revisions:
   1. The distribution format needs to be refined so that it's compatible with Stackdriver and Dropwizard; a second distribution format may be needed.
   2. Annotations need to be examined in detail, to determine whether there are first-class annotations which should be supported to pass through properly to Dropwizard and Stackdriver.
   3. Aggregation functions may need parameters. For example, Top(n) may need to be parameterized. How should this best be supported?

On Tue, Apr 17, 2018 at 11:10 AM Ben Chambers wrote:

> That sounds like a very reasonable choice -- given the discussion seemed to be focusing on the differences between these two categories, separating them will allow the proposal (and implementation) to address each category in the best way possible without needing to make compromises.
>
> Looking forward to the updated proposal.
>
> On Tue, Apr 17, 2018 at 10:53 AM Alex Amato wrote:
>
>> Hello,
>>
>> I just wanted to give an update.
>>
>> After some discussion, I've realized that it's best to break up the two concepts, with two separate ways of reporting monitoring data. These two categories are:
>>
>> 1. Metrics - Counters, Gauges, Distributions. These are well defined concepts for monitoring information and need to integrate with existing metrics collection systems such as Dropwizard and Stackdriver. Most metrics will go through this model, which will allow runners to process new metrics without adding extra code to support them, forwarding them to metric collection systems.
>> 2. Monitoring State - This supports general monitoring data which may not fit into the standard model for Metrics. For example, an I/O source may provide a table of filenames+metadata for files which are old and blocking the system. I will propose a general approach, similar to the URN+payload approach used in the doc right now.
>
> One thing to keep in mind -- even though it makes sense to allow each I/O source to define their own monitoring state, this then shifts responsibility for collecting that information to each runner and displaying that information to every consumer. It would be reasonable to see if there could be a set of 10 or so that covered most of the cases that could become the "standard" set (e.g., watermark information, performance information, etc.).
>
>> I will rewrite most of the doc and propose separating these two very different use cases, one which optimizes for integration with existing monitoring systems, the other which optimizes for flexibility, allowing more complex and custom metrics formats for other debugging scenarios.
>>
>> I just wanted to give a brief update on the direction of this change, before writing it up in full detail.
>>
>> On Mon, Apr 16, 2018 at 10:36 AM Robert Bradshaw wrote:
>>
>>> I agree that the user/system dichotomy is false; the real question is how counters can be scoped to avoid accidental (or even intentional) interference. A system that entirely controls the interaction between the "user" (from its perspective) and the underlying system can do this by prefixing all requested "user" counters with a prefix it will not use itself. Of course this breaks down whenever the wrapping isn't complete (either on the production or consumption side), but may be worth doing for some components (like the SDKs that value being able to provide this isolation for better behavi
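The URN+payload approach discussed in this thread can be sketched as a small data structure: well-known metrics (counters, gauges, distributions) use typed URNs that runners understand, while Monitoring State rides in an opaque payload. The URN strings, field names, and encodings below are invented for illustration:

```python
import json
from dataclasses import dataclass, field

@dataclass
class MonitoringInfo:
    """Sketch of a URN+payload monitoring record."""
    urn: str        # identifies the metric, e.g. "beam:metric:element_count:v1" (hypothetical)
    type: str       # payload encoding, e.g. "int64" or "json"
    payload: bytes  # opaque bytes, interpreted according to `type`
    labels: dict = field(default_factory=dict)

# A standard counter-style metric that a runner can forward
# to Dropwizard/Stackdriver without custom code.
elements = MonitoringInfo(
    urn="beam:metric:element_count:v1",
    type="int64",
    payload=(42).to_bytes(8, "big"),
    labels={"PTRANSFORM": "Read"},
)

# Monitoring State that doesn't fit the standard metric model,
# e.g. a table of stale files blocking an I/O source.
stale_files = MonitoringInfo(
    urn="myio:monitoring:stale_files:v1",
    type="json",
    payload=json.dumps([{"file": "gs://bucket/a.txt", "age_s": 9000}]).encode(),
)
```

The split mirrors the two categories above: runners only need generic handling for the typed URNs, while anything in the second category stays opaque until a consumer that knows the URN decodes it.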
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
That sounds like a very reasonable choice -- given the discussion seemed to be focusing on the differences between these two categories, separating them will allow the proposal (and implementation) to address each category in the best way possible without needing to make compromises. Looking forward to the updated proposal. On Tue, Apr 17, 2018 at 10:53 AM Alex Amato wrote: > Hello, > > I just wanted to give an update. > > After some discussion, I've realized that it's best to break up the two > concepts, with two separate ways of reporting monitoring data. These two > categories are: > > 1. Metrics - Counters, Gauges, Distributions. These are well-defined > concepts for monitoring information and need to integrate with existing > metrics collection systems such as Dropwizard and Stackdriver. Most metrics > will go through this model, which will allow runners to process new metrics > without adding extra code to support them, forwarding them to metric > collection systems. > 2. Monitoring State - This supports general monitoring data which may > not fit into the standard model for Metrics. For example, an I/O source may > provide a table of filenames+metadata, for files which are old and blocking > the system. I will propose a general approach, similar to the URN+payload > approach used in the doc right now. > > One thing to keep in mind -- even though it makes sense to allow each I/O source to define their own monitoring state, this then shifts responsibility for collecting that information to each runner and displaying that information to every consumer. It would be reasonable to see if there could be a set of 10 or so that covered most of the cases, which could become the "standard" set (e.g., watermark information, performance information, etc.). > I will rewrite most of the doc and propose separating these two very > different use cases, one which optimizes for integration with existing > monitoring systems. 
The other which optimizes for flexibility, allowing > more complex and custom metric formats for other debugging scenarios. > > I just wanted to give a brief update on the direction of this change, > before writing it up in full detail. > > > On Mon, Apr 16, 2018 at 10:36 AM Robert Bradshaw > wrote: > >> I agree that the user/system dichotomy is false; the real question is how >> counters can be scoped to avoid accidental (or even intentional) >> interference. A system that entirely controls the interaction between the >> "user" (from its perspective) and the underlying system can do this by >> prefixing all requested "user" counters with a prefix it will not use >> itself. Of course this breaks down whenever the wrapping isn't complete >> (either on the production or consumption side), but may be worth doing for >> some components (like the SDKs that value being able to provide this >> isolation for better behavior). Actual (human) end users are likely to be >> much less careful about avoiding conflicts than library authors, who in turn >> are generally less careful than authors of the system itself. >> >> We could alternatively allow for specifying fully qualified URNs for >> counter names in the SDK APIs, and letting "normal" user counters be in the >> empty namespace rather than something like beam:metrics:{user,other,...}, >> perhaps with SDKs prohibiting certain conflicting prefixes (which is less >> than ideal). A layer above the SDK that has similar absolute control over >> its "users" would have a similar decision to make. >> >> >> On Sat, Apr 14, 2018 at 4:00 PM Kenneth Knowles wrote: >> >>> One reason I resist the user/system distinction is that Beam is a >>> multi-party system with at least SDK, runner, and pipeline. Often there may >>> be a DSL like SQL or Scio, or similarly someone may be building a platform >>> for their company where there is no user authoring the pipeline. Should >>> Scio, SQL, or MyCompanyFramework metrics end up in "user"? 
Who decides to >>> tack on the prefix? It looks like it is the SDK harness? Are there just >>> three namespaces "runner", "sdk", and "user"? Most of what you'd think >>> of as "user" versus "system" should simply be the difference between >>> dynamically defined & typed metrics and fields in control plane protos. If >>> that layer of the namespaces is not finite and limited, who can make >>> a valid extension? Just some questions that I think would flesh out the >>> meaning of the "user" prefix. >>> >>> Kenn >>> >>> On Fri, Apr 13, 2018 at 5:26 PM Andrea Foegler >>> wrote: >>> On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw wrote: > On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler > wrote: > >> Thanks, Robert! >> >> I think my lack of clarity is around the MetricSpec. Maybe what's in >> my head and what's being proposed are the same thing. When I read that >> the >> MetricSpec describes the proto structure, that sounds kind of complicated >> to me. 
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Hello, I just wanted to give an update. After some discussion, I've realized that it's best to break up the two concepts, with two separate ways of reporting monitoring data. These two categories are: 1. Metrics - Counters, Gauges, Distributions. These are well-defined concepts for monitoring information and need to integrate with existing metrics collection systems such as Dropwizard and Stackdriver. Most metrics will go through this model, which will allow runners to process new metrics without adding extra code to support them, forwarding them to metric collection systems. 2. Monitoring State - This supports general monitoring data which may not fit into the standard model for Metrics. For example, an I/O source may provide a table of filenames+metadata, for files which are old and blocking the system. I will propose a general approach, similar to the URN+payload approach used in the doc right now. I will rewrite most of the doc and propose separating these two very different use cases: one which optimizes for integration with existing monitoring systems, the other which optimizes for flexibility, allowing more complex and custom metric formats for other debugging scenarios. I just wanted to give a brief update on the direction of this change, before writing it up in full detail. On Mon, Apr 16, 2018 at 10:36 AM Robert Bradshaw wrote: > I agree that the user/system dichotomy is false; the real question is how > counters can be scoped to avoid accidental (or even intentional) > interference. A system that entirely controls the interaction between the > "user" (from its perspective) and the underlying system can do this by > prefixing all requested "user" counters with a prefix it will not use > itself. Of course this breaks down whenever the wrapping isn't complete > (either on the production or consumption side), but may be worth doing for > some components (like the SDKs that value being able to provide this > isolation for better behavior). 
Actual (human) end users are likely to be > much less careful about avoiding conflicts than library authors, who in turn > are generally less careful than authors of the system itself. > > We could alternatively allow for specifying fully qualified URNs for > counter names in the SDK APIs, and letting "normal" user counters be in the > empty namespace rather than something like beam:metrics:{user,other,...}, > perhaps with SDKs prohibiting certain conflicting prefixes (which is less > than ideal). A layer above the SDK that has similar absolute control over > its "users" would have a similar decision to make. > > > On Sat, Apr 14, 2018 at 4:00 PM Kenneth Knowles wrote: > >> One reason I resist the user/system distinction is that Beam is a >> multi-party system with at least SDK, runner, and pipeline. Often there may >> be a DSL like SQL or Scio, or similarly someone may be building a platform >> for their company where there is no user authoring the pipeline. Should >> Scio, SQL, or MyCompanyFramework metrics end up in "user"? Who decides to >> tack on the prefix? It looks like it is the SDK harness? Are there just >> three namespaces "runner", "sdk", and "user"? Most of what you'd think >> of as "user" versus "system" should simply be the difference between >> dynamically defined & typed metrics and fields in control plane protos. If >> that layer of the namespaces is not finite and limited, who can make >> a valid extension? Just some questions that I think would flesh out the >> meaning of the "user" prefix. >> >> Kenn >> >> On Fri, Apr 13, 2018 at 5:26 PM Andrea Foegler >> wrote: >> >>> >>> >>> On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw >>> wrote: >>> On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler wrote: > Thanks, Robert! > > I think my lack of clarity is around the MetricSpec. Maybe what's in > my head and what's being proposed are the same thing. 
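Robert's prefixing scheme above (qualify all "user" counters so they cannot clash with system namespaces) might look like the following sketch. The helper name, the reserved-prefix list, and the exact URN layout are illustrative assumptions, not the SDK's actual API; the URN shape matches the `beam:metric:user:my_namespace:my_timer` example discussed later in the thread.

```python
# Assumed system namespace prefix; a real SDK would own this list.
RESERVED_PREFIXES = ("beam:metric:",)

def user_metric_urn(namespace: str, name: str) -> str:
    """Qualify a user counter under a 'user' prefix so it cannot
    collide with system metrics, rejecting reserved prefixes."""
    for part in (namespace, name):
        if any(part.startswith(p) for p in RESERVED_PREFIXES):
            raise ValueError(
                "user metric names may not use reserved system prefixes")
    return f"beam:metric:user:{namespace}:{name}"

urn = user_metric_urn("my_namespace", "my_timer")
# urn == "beam:metric:user:my_namespace:my_timer"
```

As Robert notes, this only works where the wrapping layer fully controls production and consumption of the names; a DSL sitting above the SDK would face the same decision for its own users.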
When I read that > the > MetricSpec describes the proto structure, that sounds kind of complicated > to > me. But I may be misinterpreting it. What I picture is something like a > MetricSpec that looks like (note: my picture looks a lot like Stackdriver > :): > > { > name: "my_timer" > name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep requiring namespaces). Or "beam:metric:[some non-user designation]" >>> >>> Sure. Looks good. >>> >>> labels: { "ptransform" } > How does an SDK act on this information? >>> >>> The SDK is obligated to submit any metric values for that spec with a >>> "ptransform" -> "transformName" in the labels field. Autogenerating code >>> from the spec to avoid typos should be easy. >>> >>> > type: GAUGE > value_type: int64 > I was lumping type and value_type into the same field, as a URN for possible extensibility, as they're tightly coupled (e.g. quantiles, > distributions). 
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
I agree that the user/system dichotomy is false; the real question is how counters can be scoped to avoid accidental (or even intentional) interference. A system that entirely controls the interaction between the "user" (from its perspective) and the underlying system can do this by prefixing all requested "user" counters with a prefix it will not use itself. Of course this breaks down whenever the wrapping isn't complete (either on the production or consumption side), but may be worth doing for some components (like the SDKs that value being able to provide this isolation for better behavior). Actual (human) end users are likely to be much less careful about avoiding conflicts than library authors, who in turn are generally less careful than authors of the system itself. We could alternatively allow for specifying fully qualified URNs for counter names in the SDK APIs, and letting "normal" user counters be in the empty namespace rather than something like beam:metrics:{user,other,...}, perhaps with SDKs prohibiting certain conflicting prefixes (which is less than ideal). A layer above the SDK that has similar absolute control over its "users" would have a similar decision to make. On Sat, Apr 14, 2018 at 4:00 PM Kenneth Knowles wrote: > One reason I resist the user/system distinction is that Beam is a > multi-party system with at least SDK, runner, and pipeline. Often there may > be a DSL like SQL or Scio, or similarly someone may be building a platform > for their company where there is no user authoring the pipeline. Should > Scio, SQL, or MyCompanyFramework metrics end up in "user"? Who decides to > tack on the prefix? It looks like it is the SDK harness? Are there just > three namespaces "runner", "sdk", and "user"? Most of what you'd think > of as "user" versus "system" should simply be the difference between > dynamically defined & typed metrics and fields in control plane protos. 
If > that layer of the namespaces is not finite and limited, who can make > a valid extension? Just some questions that I think would flesh out the > meaning of the "user" prefix. > > Kenn > > On Fri, Apr 13, 2018 at 5:26 PM Andrea Foegler wrote: > >> >> >> On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw >> wrote: >> >>> On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler >>> wrote: >>> Thanks, Robert! I think my lack of clarity is around the MetricSpec. Maybe what's in my head and what's being proposed are the same thing. When I read that the MetricSpec describes the proto structure, that sounds kind of complicated to me. But I may be misinterpreting it. What I picture is something like a MetricSpec that looks like (note: my picture looks a lot like Stackdriver :): { name: "my_timer" >>> >>> name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep >>> requiring namespaces). Or "beam:metric:[some non-user designation]" >>> >> >> Sure. Looks good. >> >> >>> >>> labels: { "ptransform" } >>> >>> How does an SDK act on this information? >>> >> >> The SDK is obligated to submit any metric values for that spec with a >> "ptransform" -> "transformName" in the labels field. Autogenerating code >> from the spec to avoid typos should be easy. >> >> >>> >>> type: GAUGE value_type: int64 >>> >>> I was lumping type and value_type into the same field, as a URN for >>> possible extensibility, as they're tightly coupled (e.g. quantiles, >>> distributions). >>> >> >> My inclination is that keeping this set relatively small and fixed to a >> set that can be readily exported to external monitoring systems is more >> useful than the added indirection to support extensibility. Lumping >> together seems reasonable. >> >> >>> >>> units: SECONDS description: "Times my stuff" >>> >>> Are both of these optional metadata, in the form of key-value fields, or >>> flattened into the field itself (along with every other kind of metadata >>> you may want to attach)? 
>>> >> >> Optional metadata in the form of fixed fields. Is there a use case for >> arbitrary metadata? What would you do with it when exporting? >> >> >>> >>> } Then metrics submitted would look like: { name: "my_timer" labels: {"ptransform": "MyTransform"} int_value: 100 } >>> >>> Yes, or value could be a bytes field that is encoded according to >>> [value_]type above, if we want that extensibility (e.g. if we want to >>> bundle the pardo sub-timings together, we'd need a proto for the value, but >>> that seems too specific to hard-code into the basic structure). >>> >>> >> The simplicity comes from the fact that there's only one proto format for the spec and for the value. The only things that vary are the entries in the map and the value field set. It's pretty easy to establish contracts around this type of spec and even generate protos for use in the SDK that make the expectations explicit.
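The spec-plus-value contract discussed above ("the SDK is obligated to submit any metric values for that spec with a 'ptransform' label") can be checked mechanically. Here is a hedged sketch: the spec and value dictionaries mirror the examples in the thread, but the `validate` helper and the exact field names are hypothetical, not part of any agreed proto.

```python
# A spec shaped like Andrea's example: one flat format for all metrics.
spec = {
    "name": "beam:metric:user:my_namespace:my_timer",
    "labels": {"ptransform"},        # labels the SDK must supply
    "type": "GAUGE",
    "value_type": "int64",
    "units": "SECONDS",
    "description": "Times my stuff",
}

def validate(metric: dict, spec: dict) -> None:
    """Check a submitted metric against its spec: matching name,
    all required labels present, and the right value field set."""
    if metric["name"] != spec["name"]:
        raise ValueError("metric name does not match spec")
    missing = spec["labels"] - metric["labels"].keys()
    if missing:
        raise ValueError(f"missing required labels: {missing}")
    if spec["value_type"] == "int64" and "int_value" not in metric:
        raise ValueError("int64-valued metric must set int_value")

metric = {
    "name": "beam:metric:user:my_namespace:my_timer",
    "labels": {"ptransform": "MyTransform"},
    "int_value": 100,
}
validate(metric, spec)  # passes: the required "ptransform" label is set
```

This is the "contract" point in the thread: because there is only one format, a runner (or generated code) can validate and export every metric the same way, without per-metric logic.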
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
One reason I resist the user/system distinction is that Beam is a multi-party system with at least SDK, runner, and pipeline. Often there may be a DSL like SQL or Scio, or similarly someone may be building a platform for their company where there is no user authoring the pipeline. Should Scio, SQL, or MyCompanyFramework metrics end up in "user"? Who decides to tack on the prefix? It looks like it is the SDK harness? Are there just three namespaces "runner", "sdk", and "user"? Most of what you'd think of as "user" versus "system" should simply be the difference between dynamically defined & typed metrics and fields in control plane protos. If that layer of the namespaces is not finite and limited, who can make a valid extension? Just some questions that I think would flesh out the meaning of the "user" prefix. Kenn On Fri, Apr 13, 2018 at 5:26 PM Andrea Foegler wrote: > > > On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw > wrote: > >> On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler >> wrote: >> >>> Thanks, Robert! >>> >>> I think my lack of clarity is around the MetricSpec. Maybe what's in my >>> head and what's being proposed are the same thing. When I read that the >>> MetricSpec describes the proto structure, that sounds kind of complicated to >>> me. But I may be misinterpreting it. What I picture is something like a >>> MetricSpec that looks like (note: my picture looks a lot like Stackdriver >>> :): >>> >>> { >>> name: "my_timer" >>> >> >> name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep >> requiring namespaces). Or "beam:metric:[some non-user designation]" >> > > Sure. Looks good. > > >> >> labels: { "ptransform" } >>> >> >> How does an SDK act on this information? >> > > The SDK is obligated to submit any metric values for that spec with a > "ptransform" -> "transformName" in the labels field. Autogenerating code > from the spec to avoid typos should be easy. 
> > >> >> >>> type: GAUGE >>> value_type: int64 >>> >> >> I was lumping type and value_type into the same field, as a URN for >> possible extensibility, as they're tightly coupled (e.g. quantiles, >> distributions). >> > > My inclination is that keeping this set relatively small and fixed to a > set that can be readily exported to external monitoring systems is more > useful than the added indirection to support extensibility. Lumping > together seems reasonable. > > >> >> >>> units: SECONDS >>> description: "Times my stuff" >>> >> >> Are both of these optional metadata, in the form of key-value fields, or >> flattened into the field itself (along with every other kind of metadata >> you may want to attach)? >> > > Optional metadata in the form of fixed fields. Is there a use case for > arbitrary metadata? What would you do with it when exporting? > > >> >> >>> } >>> >>> Then metrics submitted would look like: >>> { >>> name: "my_timer" >>> labels: {"ptransform": "MyTransform"} >>> int_value: 100 >>> } >>> >> >> Yes, or value could be a bytes field that is encoded according to >> [value_]type above, if we want that extensibility (e.g. if we want to >> bundle the pardo sub-timings together, we'd need a proto for the value, but >> that seems too specific to hard-code into the basic structure). >> >> > The simplicity comes from the fact that there's only one proto format for >>> the spec and for the value. The only things that vary are the entries in >>> the map and the value field set. It's pretty easy to establish contracts >>> around this type of spec and even generate protos for use in the SDK that >>> make the expectations explicit. >>> >>> >>> On Fri, Apr 13, 2018 at 2:23 PM Robert Bradshaw >>> wrote: >>> On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote: > > Or just "beam:counter::" or even > "beam:metric::" since metrics have a type separate from > their name. > I proposed keeping the "user" in there to avoid possible clashes with the system namespaces. 
(No preference on counter vs. metric, I wasn't trying to imply counter = SumInts) On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler wrote: > I like the generalization from entity -> labels. I view the purpose > of those fields as providing context. And labels feel like they support a > richer set of contexts. > If we think such a generalization provides value, I'm fine with doing that now, as sets or key-value maps, if we have good enough examples to justify this. > The URN concept gets a little tricky. I totally agree that the > context fields should not be embedded in the name. > There's a "name" which is the identifier that can be used to > communicate what context values are supported / allowed for metrics with > that name (for example, element_count expects a ptransform ID). But then > there's the context. In Stackdriver, this context is a map of key-value > pairs; the type is considered metadata associated with the name, but not > communicated with the value. 
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
On Fri, Apr 13, 2018 at 5:00 PM Robert Bradshaw wrote: > On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler wrote: > >> Thanks, Robert! >> >> I think my lack of clarity is around the MetricSpec. Maybe what's in my >> head and what's being proposed are the same thing. When I read that the >> MetricSpec describes the proto structure, that sounds kind of complicated to >> me. But I may be misinterpreting it. What I picture is something like a >> MetricSpec that looks like (note: my picture looks a lot like Stackdriver >> :): >> >> { >> name: "my_timer" >> > > name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep > requiring namespaces). Or "beam:metric:[some non-user designation]" > Sure. Looks good. > > labels: { "ptransform" } >> > > How does an SDK act on this information? > The SDK is obligated to submit any metric values for that spec with a "ptransform" -> "transformName" in the labels field. Autogenerating code from the spec to avoid typos should be easy. > > >> type: GAUGE >> value_type: int64 >> > > I was lumping type and value_type into the same field, as a URN for > possible extensibility, as they're tightly coupled (e.g. quantiles, > distributions). > My inclination is that keeping this set relatively small and fixed to a set that can be readily exported to external monitoring systems is more useful than the added indirection to support extensibility. Lumping together seems reasonable. > > >> units: SECONDS >> description: "Times my stuff" >> > > Are both of these optional metadata, in the form of key-value fields, or > flattened into the field itself (along with every other kind of metadata > you may want to attach)? > Optional metadata in the form of fixed fields. Is there a use case for arbitrary metadata? What would you do with it when exporting? 
> > >> } >> >> Then metrics submitted would look like: >> { >> name: "my_timer" >> labels: {"ptransform": "MyTransform"} >> int_value: 100 >> } >> > > Yes, or value could be a bytes field that is encoded according to > [value_]type above, if we want that extensibility (e.g. if we want to > bundle the pardo sub-timings together, we'd need a proto for the value, but > that seems too specific to hard-code into the basic structure). > > The simplicity comes from the fact that there's only one proto format for >> the spec and for the value. The only things that vary are the entries in >> the map and the value field set. It's pretty easy to establish contracts >> around this type of spec and even generate protos for use in the SDK that >> make the expectations explicit. >> >> >> On Fri, Apr 13, 2018 at 2:23 PM Robert Bradshaw >> wrote: >> >>> On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote: >>> Or just "beam:counter::" or even "beam:metric::" since metrics have a type separate from their name. >>> >>> I proposed keeping the "user" in there to avoid possible clashes with >>> the system namespaces. (No preference on counter vs. metric, I wasn't >>> trying to imply counter = SumInts) >>> >>> >>> On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler >>> wrote: >>> I like the generalization from entity -> labels. I view the purpose of those fields as providing context. And labels feel like they support a richer set of contexts. >>> >>> If we think such a generalization provides value, I'm fine with doing >>> that now, as sets or key-value maps, if we have good enough examples to >>> justify this. >>> >>> The URN concept gets a little tricky. I totally agree that the context fields should not be embedded in the name. There's a "name" which is the identifier that can be used to communicate what context values are supported / allowed for metrics with that name (for example, element_count expects a ptransform ID). But then there's the context. 
In Stackdriver, this context is a map of key-value pairs; the type is considered metadata associated with the name, but not communicated with the value. >>> >>> I'm not quite following you here. If context contains a ptransform id, >>> then it cannot be associated with a single name. >>> >>> Could the URN be "beam:namespace:name" and every metric have a map of key-value pairs for context? >>> >>> The URN is the name. Something like >>> "beam:metric:ptransform_execution_times:v1." >>> >>> Not sure where this fits in the discussion or if this is handled somewhere, but allowing for a metric configuration that's provided independently of the value allows for configuring "type", "units", etc in a uniform way without having to encode them in the metric name / value. Stackdriver expects each metric type to have been configured ahead of time with these annotations / metadata. Then values are reported separately. For system metrics, the definitions can be packaged with the SDK. For user metrics, they'd be defined at runtime. >>> >>> This feels like the metrics spec, that specifies that the metric with name/URN X has this type plus a bunch of other metadata.
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
On Fri, Apr 13, 2018 at 4:30 PM Alex Amato wrote: > There are a few more confusing concepts in this thread. > *Name* > >- Name can mean a *"string name"* used to refer to a metric in a >metrics system such as Stackdriver, i.e. "ElementCount", "ExecutionTime". >- Name can mean a set of *context* fields added to a counter, either >embedded in a complex string, or in a structured name. Typically referring >to *aggregation entities*, which define how the metric updates get >aggregated into final metric values, i.e. all metric updates with the same >field are aggregated together. > - e.g. my_ptransform_id-ElementCount > - e.g. { name : 'ElementCount', 'ptransform_name' : > 'my_ptransform_id' } >- The *URN* of a Metric, which identifies a proto to use in a payload >field for the Metric and MetricSpec. Note: The string name can literally >be the URN value in most cases, except for metrics which can specify a >separate name (i.e. user counters). > > @Robert, > You have proposed that metrics should contain the following parts; I still > don't fully understand what you mean by each one. > >- Name - Why is a name a URN + bytes payload? What type of name are >you referring to: *string name*? *context*? *URN*? Or something else? > > As you say above, the URN can literally be the string name. I see no reason why this can't be the case for user counters as well (the user counter name becoming part of the urn). The payload, should we decide to keep it, is "part" of the name because it helps identify what exactly we're counting. I.e. {urnX, payload1} would be distinct from {urnX, payload2}. The only reason to have a payload is to avoid sticking stuff that would be ugly to parse into the URN. > >- Entity - This is how the metric is aggregated together, if I >understand you correctly. And you correctly point out that a singular >entity is not sufficient; a set of labels may be more appropriate. 
> > Alternatively, the entity/labels specifies possible sub-partitions of the metric identified by its name (as above). > >- Value - *Are you saying this is just the metric value, not including >any fields related to entity or name?* > > Exactly. Like "5077." For some types it would be composite. The type also indicates how it's encoded (e.g. as bytes, or which field of a oneof should be populated). > >- Type - I am not clear at all on what this is or what it would look >like. Are you referring to units, like milliseconds/seconds? Why wouldn't it >be part of the value payload? Is this some sort of reason to >separate it out from the value? What if the value has multiple fields, for >example? > > Type would be "beam:metric_type:sum:ints" or "beam:metric_type:distribution:doubles." We could separate "data type" from "aggregation type" if desired, though of course the full cross-product doesn't make sense. We could put the unit in the type (e.g. sum_durations != sum_ints), but, preferably, I'd put this as metadata on the counter spec. It is often fully determined by the URN, but provided so one can reason about the metric without having to interpret the URN. It also means we don't have to have a separate URN for each user metric type. (In fact, any metric the runner doesn't understand would be treated as a user metric, and aggregated as such if it understands the type.) Some pros and cons as I see them > Pros: > >- More separation and flexibility for an SDK to specify labels >separately from the value/type. Though, maybe I don't understand enough, >and I am not so sure this is a con over just having the URN payload contain >everything in itself. > > We can't interpret a URN payload unless we know the URN. Separating things out allows us to act on metrics without interpreting the URN (both for unknown URNs, and simplifying the logic by not having to do lookups on the URN everywhere). 
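Robert's point above, that a runner can aggregate a metric it has never seen as long as it understands the *type*, can be sketched as follows. The type URNs match the ones named in the thread ("beam:metric_type:sum:ints"); everything else (the combiner table, the update tuple shape, the `aggregate` helper) is an illustrative assumption, not a proposed API.

```python
# Combiners keyed by type URN: a runner can fold together updates for
# any metric whose TYPE it understands, even if the metric URN itself
# is unknown (e.g. a user metric).
COMBINERS = {
    "beam:metric_type:sum:ints": lambda a, b: a + b,
    "beam:metric_type:max:ints": max,
}

def aggregate(updates):
    """Fold (metric_urn, labels, type_urn, value) updates into totals,
    keyed by metric identity. Updates with an unknown type URN are
    skipped, since there is no way to combine them."""
    totals = {}
    for urn, labels, type_urn, value in updates:
        combine = COMBINERS.get(type_urn)
        if combine is None:
            continue  # cannot aggregate a type we do not understand
        key = (urn, labels, type_urn)
        totals[key] = combine(totals[key], value) if key in totals else value
    return totals

updates = [
    ("beam:metric:element_count:v1", ("ptransform", "Read"),
     "beam:metric_type:sum:ints", 10),
    ("beam:metric:element_count:v1", ("ptransform", "Read"),
     "beam:metric_type:sum:ints", 5),
]
totals = aggregate(updates)
# The two updates sharing the same (urn, labels, type) key sum to 15.
```

Note how the labels act as the sub-partition of the metric: updates for the same URN but a different ptransform would land under a different key rather than being summed together.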
> Cons: > >- I think this means that the SDK must properly pick two separate >payloads and populate them correctly. We can run into issues where. > - Having one URN which specifies all the fields you would need to > populate for a specific metric avoids this; this was a concern brought > up > by Luke. The runner would then be responsible for packaging metrics up > to > send to external monitoring systems. > > I'm not following you here. We'd return exactly what Andrea suggested. > > @Andrea, please correct me if I misunderstand. > Thank you for the metric spec example in your last response, I think that > makes the idea much more clear. > > Using your approach I see the following pros and cons. > Pros: > >- Runners have a cleaner, more reusable codepath for forwarding metrics >to external monitoring systems. This will mean less work on the runner side >to support each metric (perhaps none in many cases). >- SDKs may need less code as well to package up new metrics. >- As long as
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
On Fri, Apr 13, 2018 at 3:28 PM Andrea Foegler wrote: > Thanks, Robert! > > I think my lack of clarity is around the MetricSpec. Maybe what's in my > head and what's being proposed are the same thing. When I read that the > MetricSpec describes the proto structure, that sounds kind of complicated to > me. But I may be misinterpreting it. What I picture is something like a > MetricSpec that looks like (note: my picture looks a lot like Stackdriver > :): > > { > name: "my_timer" > name: "beam:metric:user:my_namespace:my_timer" (assuming we want to keep requiring namespaces). Or "beam:metric:[some non-user designation]" labels: { "ptransform" } > How does an SDK act on this information? > type: GAUGE > value_type: int64 > I was lumping type and value_type into the same field, as a URN for possible extensibility, as they're tightly coupled (e.g. quantiles, distributions). > units: SECONDS > description: "Times my stuff" > Are both of these optional metadata, in the form of key-value fields, or flattened into the field itself (along with every other kind of metadata you may want to attach)? > } > > Then metrics submitted would look like: > { > name: "my_timer" > labels: {"ptransform": "MyTransform"} > int_value: 100 > } > Yes, or value could be a bytes field that is encoded according to [value_]type above, if we want that extensibility (e.g. if we want to bundle the pardo sub-timings together, we'd need a proto for the value, but that seems too specific to hard-code into the basic structure). > The simplicity comes from the fact that there's only one proto format for > the spec and for the value. The only things that vary are the entries in > the map and the value field set. It's pretty easy to establish contracts > around this type of spec and even generate protos for use in the SDK that > make the expectations explicit. 
> > > On Fri, Apr 13, 2018 at 2:23 PM Robert Bradshaw > wrote: > >> On Fri, Apr 13, 2018 at 1:32 PM Kenneth Knowles wrote: >> >>> >>> Or just "beam:counter::" or even >>> "beam:metric::" since metrics have a type separate from >>> their name. >>> >> >> I proposed keeping the "user" in there to avoid possible clashes with the >> system namespaces. (No preference on counter vs. metric, I wasn't trying to >> imply counter = SumInts) >> >> >> On Fri, Apr 13, 2018 at 2:02 PM Andrea Foegler >> wrote: >> >>> I like the generalization from entity -> labels. I view the purpose of >>> those fields as providing context. And labels feel like they support a >>> richer set of contexts. >>> >> >> If we think such a generalization provides value, I'm fine with doing >> that now, as sets or key-value maps, if we have good enough examples to >> justify this. >> >> >>> The URN concept gets a little tricky. I totally agree that the context >>> fields should not be embedded in the name. >>> There's a "name" which is the identifier that can be used to communicate >>> what context values are supported / allowed for metrics with that name (for >>> example, element_count expects a ptransform ID). But then there's the >>> context. In Stackdriver, this context is a map of key-value pairs; the >>> type is considered metadata associated with the name, but not communicated >>> with the value. >>> >> >> I'm not quite following you here. If context contains a ptransform id, >> then it cannot be associated with a single name. >> >> >>> Could the URN be "beam:namespace:name" and every metric have a map of >>> key-value pairs for context? >>> >> >> The URN is the name. Something like >> "beam:metric:ptransform_execution_times:v1." 
>> >> >>> Not sure where this fits in the discussion or if this is handled >>> somewhere, but allowing for a metric configuration that's provided >>> independently of the value allows for configuring "type", "units", etc in a >>> uniform way without having to encode them in the metric name / value. >>> Stackdriver expects each metric type has been configured ahead of time with >>> these annotations / metadata. Then values are reported separately. For >>> system metrics, the definitions can be packaged with the SDK. For user >>> metrics, they'd be defined at runtime. >>> >> >> This feels like the metrics spec, that specifies that the metric with >> name/URN X has this type plus a bunch of other metadata (e.g. units, if >> they're not implicit in the type? This gets into whether the type should be >> Duration{Sum,Max,Distribution,...} vs. Int{Sum,Max,Distribution,...} + >> units metadata). >> >
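The trade-off Robert raises here, typed metrics like Duration{Sum,Max,Distribution} versus generic Int{Sum,Max,Distribution} plus a units field, can be made concrete with two hypothetical spec shapes (neither spelling is from the actual proposal):

```python
# Two ways to say "a summed duration in seconds". With a typed URN the unit
# is implicit in the type; with a generic URN it rides along as metadata.

typed_spec = {"type": "beam:metrics:duration_sum:v1"}            # unit implicit
generic_spec = {"type": "beam:metrics:int_sum:v1",
                "units": "SECONDS"}                              # unit explicit

def unit_of(spec):
    # A consumer of typed specs must special-case each typed URN...
    if "duration" in spec["type"]:
        return "SECONDS"
    # ...whereas generic specs carry the unit uniformly.
    return spec.get("units", "DIMENSIONLESS")
```

The generic shape keeps the set of type URNs small (one per aggregation/value-type pair) at the cost of one more metadata field per spec.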
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
:25 AM Kenneth Knowles >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Agree with all of this. It echoes a thread on the doc that I >>>>>>>>>>>>>> was going to bring here. Let's keep it simple and use concrete >>>>>>>>>>>>>> use cases to >>>>>>>>>>>>>> drive additional abstraction if/when it becomes compelling. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Kenn >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers < >>>>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sounds perfect. Just wanted to make sure that "custom >>>>>>>>>>>>>>> metrics of supported type" didn't include new ways of >>>>>>>>>>>>>>> aggregating ints. As >>>>>>>>>>>>>>> long as that means we have a fixed set of aggregations (that >>>>>>>>>>>>>>> align with >>>>>>>>>>>>>>> what users want and metrics backends support) it seems >>>>>>>>>>>>>>> like we are >>>>>>>>>>>>>>> doing user metrics right. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> - Ben >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau < >>>>>>>>>>>>>>> rmannibu...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters >>>>>>>>>>>>>>>> are used a lot but others are less mainstream so being too >>>>>>>>>>>>>>>> fine from the >>>>>>>>>>>>>>>> start can just add complexity and bugs in impls IMHO. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 12 Apr 2018 08:06, "Robert Bradshaw" < >>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> By "type" of metric, I mean both the data types (including >>>>>>>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be >>>>>>>>>>>>>>>>> a type, as >>>>>>>>>>>>>>>>> would double-distribution. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> When you say type do you mean accumulator type, result >>>>>>>>>>>>>>>>>> type, or accumulator strategy? 
Specifically, what is the >>>>>>>>>>>>>>>>>> "type" of sumint, >>>>>>>>>>>>>>>>>> sumlong, meanlong, etc? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the >>>>>>>>>>>>>>>>>>> road (and may >>>>>>>>>>>>>>>>>>> never get to). What I'm suggesting is that we support >>>>>>>>>>>>>>>>>>> custom metrics of >>>>>>>>>>>>>>>>>>> standard type. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The metric api is designed to prevent user defined >>>>>>>>>>>>>>>>>>>> metric types based on the fact they just weren't used >>>>>>>>>>>>>>>>>>>> enough to justify >>>>>>>>>>>>>>>>>>>> support. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set >>>>>>>>>>>>>>>>>>>> plus any special >>>>>>>>>>>>>>>>>>>> system metrics? >>>>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks. I think this has simplified things. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> One thing that has occurred to me is that we're >>>>>>>>>>>>>>>>>>>>> conflating the idea of custom metrics and custom metric >>>>>>>>>>>>>>>>>>>>> types. I would >>>>>>>>>>>>>>>>>>>>> propose the MetricSpec field be augmented with an >>>>>>>>>>>>>>>>>>>>> additional field "type" >>>>>>>>>>>>>>>>>>>>> which is a urn specifying the type of metric it is (i.e. >>>>>>>>>>>>>>>>>>>>> the contents of >>>>>>>>>>>>>>>>>>>>> its payload, as well as the form of aggregation). 
Summing >>>>>>>>>>>>>>>>>>>>> or maxing over >>>>>>>>>>>>>>>>>>>>> ints would be a typical example. Though we could pursue >>>>>>>>>>>>>>>>>>>>> making this opaque >>>>>>>>>>>>>>>>>>>>> to the runner in the long run, that's a more speculative >>>>>>>>>>>>>>>>>>>>> (and difficult) >>>>>>>>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at >>>>>>>>>>>>>>>>>>>>> least aggregate and >>>>>>>>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself >>>>>>>>>>>>>>>>>>>>> understand the >>>>>>>>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of >>>>>>>>>>>>>>>>>>>>> the specialization >>>>>>>>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand >>>>>>>>>>>>>>>>>>>>> as well.) >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for >>>>>>>>>>>>>>>>>>>>> every type X, one would have a single URN for UserMetric, >>>>>>>>>>>>>>>>>>>>> and its spec would >>>>>>>>>>>>>>>>>>>>> designate the type and its payload would designate the >>>>>>>>>>>>>>>>>>>>> (qualified) name. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> - Robert >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato < >>>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thank you everyone for your feedback so far. >>>>>>>>>>>>>>>>>>>>>> I have made a revision today, which is to make all >>>>>>>>>>>>>>>>>>>>>> metrics refer to a primary entity, so I have >>>>>>>>>>>>>>>>>>>>>> restructured some of the >>>>>>>>>>>>>>>>>>>>>> protos a little bit. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> The point of this change was to futureproof the >>>>>>>>>>>>>>>>>>>>>> possibility of allowing custom user metrics, with custom >>>>>>>>>>>>>>>>>>>>>> aggregation >>>>>>>>>>>>>>>>>>>>>> functions for their metric updates. 
>>>>>>>>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity >>>>>>>>>>>>>>>>>>>>>> associated with it (e.g. PCollection, PTransform), we >>>>>>>>>>>>>>>>>>>>>> can design an >>>>>>>>>>>>>>>>>>>>>> approach which forwards the opaque bytes metric updates, >>>>>>>>>>>>>>>>>>>>>> without >>>>>>>>>>>>>>>>>>>>>> deserializing them. These are forwarded to user provided >>>>>>>>>>>>>>>>>>>>>> code which then >>>>>>>>>>>>>>>>>>>>>> would deserialize the metric update payloads and perform >>>>>>>>>>>>>>>>>>>>>> the custom >>>>>>>>>>>>>>>>>>>>>> aggregations. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric >>>>>>>>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform >>>>>>>>>>>>>>>>>>>>>> names inside >>>>>>>>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for >>>>>>>>>>>>>>>>>>>>>> the metrics as the >>>>>>>>>>>>>>>>>>>>>> entities are pulled outside of the metric. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to >>>>>>>>>>>>>>>>>>>>>> draw attention to this particular revision. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato < >>>>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to >>>>>>>>>>>>>>>>>>>>>>> make a decision by Friday, and begin working on related >>>>>>>>>>>>>>>>>>>>>>> PRs next week. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Please make sure that you provide your feedback >>>>>>>>>>>>>>>>>>>>>>> before then and I will post the final decisions made to >>>>>>>>>>>>>>>>>>>>>>> this thread Friday >>>>>>>>>>>>>>>>>>>>>>> afternoon. 
>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía < >>>>>>>>>>>>>>>>>>>>>>> ieme...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to >>>>>>>>>>>>>>>>>>>>>>>> it easily in >>>>>>>>>>>>>>>>>>>>>>>> future discussions, website, etc. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for sharing. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw < >>>>>>>>>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some >>>>>>>>>>>>>>>>>>>>>>>> comments. >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato < >>>>>>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> Hello beam community, >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on >>>>>>>>>>>>>>>>>>>>>>>> this proposal so far. I >>>>>>>>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback. >>>>>>>>>>>>>>>>>>>>>>>> There were some larger >>>>>>>>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of >>>>>>>>>>>>>>>>>>>>>>>> these I have added a >>>>>>>>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed >>>>>>>>>>>>>>>>>>>>>>>> my recommendation as well >>>>>>>>>>>>>>>>>>>>>>>> >> as a few other choices we considered. >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised >>>>>>>>>>>>>>>>>>>>>>>> proposal. 
Please take >>>>>>>>>>>>>>>>>>>>>>>> >> another look and let me know >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could >>>>>>>>>>>>>>>>>>>>>>>> please take another look after >>>>>>>>>>>>>>>>>>>>>>>> >> the revisions I have made as well. >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> >> Thanks again, >>>>>>>>>>>>>>>>>>>>>>>> >> Alex >>>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>
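The opaque-payload mechanism Alex describes above, where the runner routes encoded metric updates by (metric URN, aggregation entity) without decoding them and only user-provided code deserializes and aggregates, can be sketched as follows. Every class and function name here is illustrative, not the actual Fn API surface:

```python
import struct
from collections import defaultdict

class OpaqueMetricRouter:
    """Runner-side sketch: collects payloads as opaque bytes per entity."""

    def __init__(self):
        # (metric urn, aggregation entity) -> list of encoded updates
        self._updates = defaultdict(list)

    def report(self, urn, entity, payload):
        # The runner never decodes the payload; it only routes it.
        self._updates[(urn, entity)].append(payload)

    def aggregate(self, urn, entity, user_aggregator):
        # Only the user-provided aggregator knows the payload encoding.
        return user_aggregator(self._updates[(urn, entity)])

def sum_int64(payloads):
    """Example user aggregator: payloads are little-endian int64s."""
    return sum(struct.unpack("<q", p)[0] for p in payloads)

router = OpaqueMetricRouter()
router.report("beam:metric:user:ns:my_counter", "ptransform/MyDoFn",
              struct.pack("<q", 40))
router.report("beam:metric:user:ns:my_counter", "ptransform/MyDoFn",
              struct.pack("<q", 2))
total = router.aggregate("beam:metric:user:ns:my_counter",
                         "ptransform/MyDoFn", sum_int64)
```

Because the entity is carried outside the payload, the router needs no per-metric knowledge, which is exactly the simplification Alex notes for the URN metric protos.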
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
>>>> they'll be 100% redundant for all entities for a given metric convinces >>>>>>> me >>>>>>> that it's not worth creating and tracking an enum for the type alongside >>>>>>> the id. >>>>>>> >>>>>>>> *}* >>>>>>>> >>>>>>>> On Fri, Apr 13, 2018 at 9:14 AM Robert Bradshaw < >>>>>>>> rober...@google.com> wrote: >>>>>>>> >>>>>>>>> On Fri, Apr 13, 2018 at 8:31 AM Kenneth Knowles >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> To Robert's proto: >>>>>>>>>> >>>>>>>>>> // A mapping of entities to (encoded) values. >>>>>>>>>>> map<string, bytes> values; >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Are the keys here the names of the metrics, aka what is used for >>>>>>>>>> URNs in the doc? >>>>>>>>>> >>>>>>>>>>> >>>>>>>>> They're the entities to which a metric is attached, e.g. a >>>>>>>>> PTransform, a PCollection, or perhaps a process/worker. >>>>>>>>> >>>>>>>>> >>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> On Thu, Apr 12, 2018 at 9:25 AM Kenneth Knowles >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Agree with all of this. It echoes a thread on the doc that I >>>>>>>>>>>>> was going to bring here. Let's keep it simple and use concrete >>>>>>>>>>>>> use cases to >>>>>>>>>>>>> drive additional abstraction if/when it becomes compelling. >>>>>>>>>>>>> >>>>>>>>>>>>> Kenn >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers < >>>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics >>>>>>>>>>>>>> of supported type" didn't include new ways of aggregating ints. >>>>>>>>>>>>>> As long as >>>>>>>>>>>>>> that means we have a fixed set of aggregations (that align with >>>>>>>>>>>>>> what >>>>>>>>>>>>>> users want and metrics backends support) it seems like we are >>>>>>>>>>>>>> doing user >>>>>>>>>>>>>> metrics right. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> - Ben >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau < >>>>>>>>>>>>>> rmannibu...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters >>>>>>>>>>>>>>> are used a lot but others are less mainstream so being too fine >>>>>>>>>>>>>>> from the >>>>>>>>>>>>>>> start can just add complexity and bugs in impls IMHO. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 12 Apr 2018 08:06, "Robert Bradshaw" < >>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> By "type" of metric, I mean both the data types (including >>>>>>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a >>>>>>>>>>>>>>>> type, as would double-distribution. 
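Robert's entity-keyed map, "a mapping of entities to (encoded) values", can be illustrated with a small sketch. The entity id strings and the int64 encoding below are hypothetical, chosen only to show that the keys are entities, not metric names:

```python
import struct

# For a single metric URN (say, an element count), values are keyed by the
# entity they are attached to: a PTransform, a PCollection, or a worker.
element_count_values = {
    "ptransform/ParDo(MyDoFn)": struct.pack("<q", 1000),
    "pcollection/my_pcoll": struct.pack("<q", 950),
}

# Decoding happens per the metric's declared [value_]type; here, int64.
decoded = {entity: struct.unpack("<q", raw)[0]
           for entity, raw in element_count_values.items()}
```

This answers Kenn's question in the exchange above: the map keys are never metric names (the URN identifies the metric), so the same structure serves every metric of a given type.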
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
>>>>> >>>>>>>>>>>>>>>> By "type" of metric, I mean both the data types (including >>>>>>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a >>>>>>>>>>>>>>>> type, as >>>>>>>>>>>>>>>> would double-distribution. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> When you say type do you mean accumulator type, result >>>>>>>>>>>>>>>>> type, or accumulator strategy? Specifically, what is the >>>>>>>>>>>>>>>>> "type" of sumint, >>>>>>>>>>>>>>>>> sumlong, meanlong, etc? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the >>>>>>>>>>>>>>>>>> road (and may >>>>>>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom >>>>>>>>>>>>>>>>>> metrics of >>>>>>>>>>>>>>>>>> standard type. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The metric api is designed to prevent user defined >>>>>>>>>>>>>>>>>>> metric types based on the fact they just weren't used >>>>>>>>>>>>>>>>>>> enough to justify >>>>>>>>>>>>>>>>>>> support. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set >>>>>>>>>>>>>>>>>>> plus any special >>>>>>>>>>>>>>>>>>> system metrivs? >>>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks. I think this has simplified things. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> One thing that has occurred to me is that we're >>>>>>>>>>>>>>>>>>>> conflating the idea of custom metrics and custom metric >>>>>>>>>>>>>>>>>>>> types. I would >>>>>>>>>>>>>>>>>>>> propose the MetricSpec field be augmented with an >>>>>>>>>>>>>>>>>>>> additional field "type" >>>>>>>>>>>>>>>>>>>> which is a urn specifying the type of metric it is (i.e. >>>>>>>>>>>>>>>>>>>> the contents of >>>>>>>>>>>>>>>>>>>> its payload, as well as the form of aggregation). Summing >>>>>>>>>>>>>>>>>>>> or maxing over >>>>>>>>>>>>>>>>>>>> ints would be a typical example. Though we could pursue >>>>>>>>>>>>>>>>>>>> making this opaque >>>>>>>>>>>>>>>>>>>> to the runner in the long run, that's a more speculative >>>>>>>>>>>>>>>>>>>> (and difficult) >>>>>>>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least >>>>>>>>>>>>>>>>>>>> aggregate and >>>>>>>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself >>>>>>>>>>>>>>>>>>>> understand the >>>>>>>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of >>>>>>>>>>>>>>>>>>>> the specialization >>>>>>>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand >>>>>>>>>>>>>>>>>>>> as well.) >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for >>>>>>>>>>>>>>>>>>>> every type X one would have a single URN for UserMetric >>>>>>>>>>>>>>>>>>>> and it spec would >>>>>>>>>>>>>>>>>>>> designate the type and payload designate the (qualified) >>>>>>>>>>>>>>>>>>>> name. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> - Robert >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato < >>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thank you everyone for your feedback so far. 
>>>>>>>>>>>>>>>>>>>>> I have made a revision today which is to make all >>>>>>>>>>>>>>>>>>>>> metrics refer to a primary entity, so I have restructured >>>>>>>>>>>>>>>>>>>>> some of the >>>>>>>>>>>>>>>>>>>>> protos a little bit. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> The point of this change was to futureproof the >>>>>>>>>>>>>>>>>>>>> possibility of allowing custom user metrics, with custom >>>>>>>>>>>>>>>>>>>>> aggregation >>>>>>>>>>>>>>>>>>>>> functions for its metric updates. >>>>>>>>>>>>>>>>>>>>> Now that each metric has an aggregation_entity >>>>>>>>>>>>>>>>>>>>> associated with it (e.g. PCollection, PTransform), we can >>>>>>>>>>>>>>>>>>>>> design an >>>>>>>>>>>>>>>>>>>>> approach which forwards the opaque bytes metric updates, >>>>>>>>>>>>>>>>>>>>> without >>>>>>>>>>>>>>>>>>>>> deserializing them. These are forwarded to user provided >>>>>>>>>>>>>>>>>>>>> code which then >>>>>>>>>>>>>>>>>>>>> would deserialize the metric update payloads and perform >>>>>>>>>>>>>>>>>>>>> the custom >>>>>>>>>>>>>>>>>>>>> aggregations. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I think it has also simplified some of the URN metric >>>>>>>>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform >>>>>>>>>>>>>>>>>>>>> names inside >>>>>>>>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the >>>>>>>>>>>>>>>>>>>>> metrics as the >>>>>>>>>>>>>>>>>>>>> entities are pulled outside of the metric. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to >>>>>>>>>>>>>>>>>>>>> draw attention to this particular revision. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato < >>>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to >>>>>>>>>>>>>>>>>>>>>> make a decision by Friday, and begin working on related >>>>>>>>>>>>>>>>>>>>>> PRs next week. 
>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Please make sure that you provide your feedback >>>>>>>>>>>>>>>>>>>>>> before then and I will post the final decisions made to >>>>>>>>>>>>>>>>>>>>>> this thread Friday >>>>>>>>>>>>>>>>>>>>>> afternoon. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía < >>>>>>>>>>>>>>>>>>>>>> ieme...@gmail.com> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to >>>>>>>>>>>>>>>>>>>>>>> it easily in >>>>>>>>>>>>>>>>>>>>>>> future discussions, website, etc. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks for sharing. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw < >>>>>>>>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments. >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato < >>>>>>>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> Hello beam community, >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on >>>>>>>>>>>>>>>>>>>>>>> this proposal so far. I >>>>>>>>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback. >>>>>>>>>>>>>>>>>>>>>>> There were some larger >>>>>>>>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of >>>>>>>>>>>>>>>>>>>>>>> these I have added a >>>>>>>>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed >>>>>>>>>>>>>>>>>>>>>>> my recommendation as well >>>>>>>>>>>>>>>>>>>>>>> >> as as few other choices we considered. >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised >>>>>>>>>>>>>>>>>>>>>>> proposal. 
Please take >>>>>>>>>>>>>>>>>>>>>>> >> another look and let me know >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could >>>>>>>>>>>>>>>>>>>>>>> please take another look after >>>>>>>>>>>>>>>>>>>>>>> >> the revisions I have made as well. >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> >> Thanks again, >>>>>>>>>>>>>>>>>>>>>>> >> Alex >>>>>>>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
; cases to >>>>>>>>>>>> drive additional abstraction if/when it becomes compelling. >>>>>>>>>>>> >>>>>>>>>>>> Kenn >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Apr 12, 2018 at 9:21 AM Ben Chambers < >>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Sounds perfect. Just wanted to make sure that "custom metrics >>>>>>>>>>>>> of supported type" didn't include new ways of aggregating ints. >>>>>>>>>>>>> As long as >>>>>>>>>>>>> that means we have a fixed set of aggregations (that align with >>>>>>>>>>>>> what what >>>>>>>>>>>>> users want and metrics back end support) it seems like we are >>>>>>>>>>>>> doing user >>>>>>>>>>>>> metrics right. >>>>>>>>>>>>> >>>>>>>>>>>>> - Ben >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau < >>>>>>>>>>>>> rmannibu...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters >>>>>>>>>>>>>> are used a lot but others are less mainstream so being too fine >>>>>>>>>>>>>> from the >>>>>>>>>>>>>> start can just add complexity and bugs in impls IMHO. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" >>>>>>>>>>>>>> a écrit : >>>>>>>>>>>>>> >>>>>>>>>>>>>>> By "type" of metric, I mean both the data types (including >>>>>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a >>>>>>>>>>>>>>> type, as >>>>>>>>>>>>>>> would double-distribution. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> When you say type do you mean accumulator type, result >>>>>>>>>>>>>>>> type, or accumulator strategy? Specifically, what is the >>>>>>>>>>>>>>>> "type" of sumint, >>>>>>>>>>>>>>>> sumlong, meanlong, etc? 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road >>>>>>>>>>>>>>>>> (and may >>>>>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom >>>>>>>>>>>>>>>>> metrics of >>>>>>>>>>>>>>>>> standard type. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The metric api is designed to prevent user defined metr
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
; >>>>>>>>>>>>>> By "type" of metric, I mean both the data types (including >>>>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a >>>>>>>>>>>>>> type, as >>>>>>>>>>>>>> would double-distribution. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> When you say type do you mean accumulator type, result type, >>>>>>>>>>>>>>> or accumulator strategy? Specifically, what is the "type" of >>>>>>>>>>>>>>> sumint, >>>>>>>>>>>>>>> sumlong, meanlong, etc? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road >>>>>>>>>>>>>>>> (and may >>>>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom >>>>>>>>>>>>>>>> metrics of >>>>>>>>>>>>>>>> standard type. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The metric api is designed to prevent user defined metric >>>>>>>>>>>>>>>>> types based on the fact they just weren't used enough to >>>>>>>>>>>>>>>>> justify support. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus >>>>>>>>>>>>>>>>> any special >>>>>>>>>>>>>>>>> system metrivs? >>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks. I think this has simplified things. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> One thing that has occurred to me is that we're >>>>>>>>>>>>>>>>>> conflating the idea of custom metrics and custom metric >>>>>>>>>>>>>>>>>> types. 
I would >>>>>>>>>>>>>>>>>> propose the MetricSpec field be augmented with an additional >>>>>>>>>>>>>>>>>> field "type" >>>>>>>>>>>>>>>>>> which is a urn specifying the type of metric it is (i.e. the >>>>>>>>>>>>>>>>>> contents of >>>>>>>>>>>>>>>>>> its payload, as well as the form of aggregation). Summing or >>>>>>>>>>>>>>>>>> maxing over >>>>>>>>>>>>>>>>>> ints would be a typical example. Though we could pursue >>>>>>>>>>>>>>>>>> making this opaque >>>>>>>>>>>>>>>>>> to the runner in the long run, that's a more speculative >>>>>>>>>>>>>>>>>> (and difficult) >>>>>>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least >>>>>>>>>>>>>>>>>> aggregate and >>>>>>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself >>>>>>>>>>>>>>>>>> understand the >>>>>>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the >>>>>>>>>>>>>>>>>> specialization >>>>>>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as >>>>>>>>>>>>>>>>>> well.) >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for >>>>>>>>>>>>>>>>>> every type X one would have a single URN for UserMetric and >>>>>>>>>>>>>>>>>> it spec would >>>>>>>>>>>>>>>>>> designate the type and payload designate the (qualified) >>>>>>>>>>>>>>>>>> name. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> - Robert >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato < >>>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thank you everyone for your feedback so far. >>>>>>>>>>>>>>>>>>> I have made a revision today which is to make all >>>>>>>>>>>>>>>>>>> metrics refer to a primary entity, so I have restructured >>>>>>>>>>>>>>>>>>> some of the >>>>>>>>>>>>>>>>>>> protos a little bit. 
> The point of this change was to futureproof the possibility of allowing
> custom user metrics, with custom aggregation functions for their metric
> updates. Now that each metric has an aggregation_entity associated with it
> (e.g. PCollection, PTransform), we can design an approach which forwards
> the opaque bytes metric updates without deserializing them. These are
> forwarded to user-provided code, which then deserializes the metric update
> payloads and performs the custom aggregations.
>
> I think this has also simplified some of the URN metric protos, as they no
> longer need to keep track of PTransform names inside themselves. The result
> is simpler structures for the metrics, as the entities are pulled outside
> of the metric.
>
> I have mentioned this in the doc now, and wanted to draw attention to this
> particular revision.
>
> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato <ajam...@google.com> wrote:
>
>> I've gathered a lot of feedback so far and want to make a decision by
>> Friday, and begin working on related PRs next week.
>>
>> Please make sure that you provide your feedback before then; I will post
>> the final decisions made to this thread Friday afternoon.
>>
>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía <ieme...@gmail.com> wrote:
>>
>>> Nice, I created a short link so people can refer to it easily in future
>>> discussions, website, etc.
>>>
>>> https://s.apache.org/beam-fn-api-metrics
>>>
>>> Thanks for sharing.
>>>
>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <rober...@google.com> wrote:
>>> > Thanks for the nice writeup. I added some comments.
>>> >
>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <ajam...@google.com> wrote:
>>> >>
>>> >> Hello beam community,
>>> >>
>>> >> Thank you everyone for your initial feedback on this proposal so far.
>>> >> I have made some revisions based on the feedback. There were some
>>> >> larger questions asking about alternatives. For each of these I have
>>> >> added a section tagged with [Alternatives] and discussed my
>>> >> recommendation as well as a few other choices we considered.
>>> >>
>>> >> I would appreciate more feedback on the revised proposal. Please take
>>> >> another look and let me know.
>>> >>
>>> >> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>> >>
>>> >> Etienne, I would appreciate it if you could please take another look
>>> >> after the revisions I have made as well.
>>> >>
>>> >> Thanks again,
>>> >> Alex
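The aggregation_entity change described above can be sketched in a few lines. This is a hypothetical illustration only, not actual Beam code: the class and function names (`OpaqueMetricRouter`, `sum_varint64`) and the int64-delta payload encoding are all assumptions made for the example. The key property it shows is that the routing layer never deserializes the payload bytes; only user-provided code does.

```python
# Hypothetical sketch (not a Beam API): a runner-side router forwards opaque
# metric-update payloads, keyed by (metric URN, aggregation entity such as a
# PCollection or PTransform id), to user code that decodes and combines them.
import struct
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

class OpaqueMetricRouter:
    """Buffers raw payload bytes per (metric_urn, aggregation_entity)."""

    def __init__(self) -> None:
        self._updates: Dict[Tuple[str, str], List[bytes]] = defaultdict(list)

    def record(self, metric_urn: str, entity: str, payload: bytes) -> None:
        # The router only stores and forwards bytes; it never decodes them.
        self._updates[(metric_urn, entity)].append(payload)

    def aggregate(self, metric_urn: str, entity: str,
                  user_fn: Callable[[List[bytes]], int]) -> int:
        # User-provided code deserializes the payloads and aggregates them.
        return user_fn(self._updates[(metric_urn, entity)])

def sum_varint64(payloads: List[bytes]) -> int:
    # Example user aggregation: each payload is a big-endian int64 delta.
    return sum(struct.unpack(">q", p)[0] for p in payloads)

router = OpaqueMetricRouter()
router.record("user:my_counter", "ptransform/ParDo1", struct.pack(">q", 3))
router.record("user:my_counter", "ptransform/ParDo1", struct.pack(">q", 4))
total = router.aggregate("user:my_counter", "ptransform/ParDo1", sum_varint64)
# total is 7
```

Because the entity key lives outside the metric payload, the same router works for any custom metric type without the runner understanding its contents.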
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
The metric api is designed to prevent user defined metric types, based on the
fact that they just weren't used enough to justify support.

Is there a reason we are bringing that complexity back? Shouldn't we just need
the ability for the standard set plus any special system metrics?

On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw <rober...@google.com> wrote:

> Thanks. I think this has simplified things.
>
> One thing that has occurred to me is that we're conflating the idea of
> custom metrics and custom metric types. I would propose the MetricSpec
> field be augmented with an additional field "type", which is a URN
> specifying the type of metric it is (i.e. the contents of its payload, as
> well as the form of aggregation). Summing or maxing over ints would be a
> typical example. Though we could pursue making this opaque to the runner
> in the long run, that's a more speculative (and difficult) feature to
> tackle. This would allow the runner to at least aggregate, and
> report/return to the SDK, metrics whose semantic meaning it did not
> itself understand. (It would probably simplify much of the specialization
> in the runner itself for metrics that it *did* understand as well.)
>
> In addition, rather than having UserMetricOfTypeX for every type X, one
> would have a single URN for UserMetric; its spec would designate the type
> and its payload the (qualified) name.
>
> - Robert
>
> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato <ajam...@google.com> wrote:
>
>> Thank you everyone for your feedback so far.
>> I have made a revision today, which is to make all metrics refer to a
>> primary entity, so I have restructured some of the protos a little bit.
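Robert's suggestion of a separate "type" URN on the MetricSpec can be sketched as follows. This is an illustrative mock-up, not the actual Beam protos: the field layout and all URN strings (`beam:metric:user:v1`, `beam:metric_type:sum_int64:v1`, etc.) are invented for the example. It shows why a single UserMetric URN suffices once the type is carried separately: the runner needs one aggregator per type, not one per metric.

```python
# Hypothetical sketch of a MetricSpec carrying a "type" URN that identifies
# the payload encoding plus the form of aggregation. Names and URNs are
# illustrative assumptions, not real Beam identifiers.
from dataclasses import dataclass
from typing import Dict, Callable, List

@dataclass(frozen=True)
class MetricSpec:
    urn: str        # identifies the metric, e.g. a single user-metric URN
    type: str       # identifies encoding + aggregation form
    payload: bytes  # for a user metric, the (qualified) metric name

# The runner aggregates by type URN, with no knowledge of the metric's
# semantic meaning.
TYPE_AGGREGATORS: Dict[str, Callable[[List[int]], int]] = {
    "beam:metric_type:sum_int64:v1": sum,
    "beam:metric_type:max_int64:v1": max,
}

spec = MetricSpec(
    urn="beam:metric:user:v1",
    type="beam:metric_type:sum_int64:v1",
    payload=b"my.namespace.elements_processed",
)
updates = [3, 5, 4]  # decoded metric updates reported for this metric
result = TYPE_AGGREGATORS[spec.type](updates)
# result is 12
```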
… "supported type" didn't include new ways of aggregating ints. As long as
that means we have a fixed set of aggregations (that align with what users
want and metrics back ends support), it seems like we are doing user metrics
right.

- Ben

On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau <rmannibu...@gmail.com> wrote:

> Maybe leave it out until proven it is needed. ATM counters are used a lot,
> but others are less mainstream, so being too fine-grained from the start
> can just add complexity and bugs in impls IMHO.
>
> On Apr 12, 2018 08:06, "Robert Bradshaw" wrote:
>
>> By "type" of metric, I mean both the data types (including their
>> encoding) and the accumulator strategy. So sumint would be a type, as
>> would double-distribution.
>>
>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers <bjchamb...@gmail.com> wrote:
>>
>>> When you say type, do you mean accumulator type, result type, or
>>> accumulator strategy? Specifically, what is the "type" of sumint,
>>> sumlong, meanlong, etc.?
>>>
>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw <rober...@google.com> wrote:
>>>
>>>> Fully custom metric types is the "more speculative and difficult"
>>>> feature that I was proposing we kick down the road (and may never get
>>>> to). What I'm suggesting is that we support custom metrics of standard
>>>> type.
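Robert's definition of a metric "type" as data type (and its encoding) plus accumulator strategy can be illustrated with a small registry. This is a sketch under assumed names only; the type identifiers (`sum_int64`, `mean_double`) and big-endian encodings are invented for the example and are not Beam's actual coders or URNs. Each standard type pairs a decoder with a combine function, which is all a runner would need to aggregate it.

```python
# Hypothetical illustration: "type" = payload encoding + accumulator
# strategy. A fixed registry of standard types, each pairing a decoder
# with a combiner. Identifiers and encodings are assumptions.
import struct
from typing import Any, Callable, List, NamedTuple

class MetricType(NamedTuple):
    decode: Callable[[bytes], Any]
    combine: Callable[[List[Any]], Any]

def decode_int64(b: bytes) -> int:
    return struct.unpack(">q", b)[0]

def decode_double(b: bytes) -> float:
    return struct.unpack(">d", b)[0]

STANDARD_TYPES = {
    "sum_int64": MetricType(decode_int64, sum),
    "max_int64": MetricType(decode_int64, max),
    "mean_double": MetricType(decode_double, lambda xs: sum(xs) / len(xs)),
}

def aggregate(type_id: str, payloads: List[bytes]) -> Any:
    # Decode each raw payload with the type's coder, then combine.
    t = STANDARD_TYPES[type_id]
    return t.combine([t.decode(p) for p in payloads])

values = [struct.pack(">q", v) for v in (2, 9, 5)]
max_value = aggregate("max_int64", values)
# max_value is 9
```

A user-defined metric of "standard type" would pick an entry from such a registry rather than shipping its own aggregation code, which is the fixed-set-of-aggregations position Ben describes above.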
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
want and metrics back end support) it seems like we are doing >>>>>>>>>> user >>>>>>>>>> metrics right. >>>>>>>>>> >>>>>>>>>> - Ben >>>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau < >>>>>>>>>> rmannibu...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are >>>>>>>>>>> used a lot but others are less mainstream so being too fine from >>>>>>>>>>> the start >>>>>>>>>>> can just add complexity and bugs in impls IMHO. >>>>>>>>>>> >>>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" >>>>>>>>>>> a écrit : >>>>>>>>>>> >>>>>>>>>>>> By "type" of metric, I mean both the data types (including >>>>>>>>>>>> their encoding) and accumulator strategy. So sumint would be a >>>>>>>>>>>> type, as >>>>>>>>>>>> would double-distribution. >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> When you say type do you mean accumulator type, result type, >>>>>>>>>>>>> or accumulator strategy? Specifically, what is the "type" of >>>>>>>>>>>>> sumint, >>>>>>>>>>>>> sumlong, meanlong, etc? >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road >>>>>>>>>>>>>> (and may >>>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom >>>>>>>>>>>>>> metrics of >>>>>>>>>>>>>> standard type. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The metric api is designed to prevent user defined metric >>>>>>>>>>>>>>> types based on the fact they just weren't used enough to >>>>>>>>>>>>>>> justify support. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Is there a reason we are bringing that complexity back? 
>>>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus >>>>>>>>>>>>>>> any special >>>>>>>>>>>>>>> system metrivs? >>>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. I think this has simplified things. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> One thing that has occurred to me is that we're conflating >>>>>>>>>>>>>>>> the idea of custom metrics and custom metric types. I would >>>>>>>>>>>>>>>> propose >>>>>>>>>>>>>>>> the MetricSpec field be augmented with an additional field >>>>>>>>
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
om> wrote: >>>>>>>>> >>>>>>>>>> Maybe leave it out until proven it is needed. ATM counters are >>>>>>>>>> used a lot but others are less mainstream so being too fine from the >>>>>>>>>> start >>>>>>>>>> can just add complexity and bugs in impls IMHO. >>>>>>>>>> >>>>>>>>>> Le 12 avr. 2018 08:06, "Robert Bradshaw" a >>>>>>>>>> écrit : >>>>>>>>>> >>>>>>>>>>> By "type" of metric, I mean both the data types (including their >>>>>>>>>>> encoding) and accumulator strategy. So sumint would be a type, as >>>>>>>>>>> would >>>>>>>>>>> double-distribution. >>>>>>>>>>> >>>>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> When you say type do you mean accumulator type, result type, or >>>>>>>>>>>> accumulator strategy? Specifically, what is the "type" of sumint, >>>>>>>>>>>> sumlong, >>>>>>>>>>>> meanlong, etc? >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>>>> difficult" feature that I was proposing we kick down the road >>>>>>>>>>>>> (and may >>>>>>>>>>>>> never get to). What I'm suggesting is that we support custom >>>>>>>>>>>>> metrics of >>>>>>>>>>>>> standard type. >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> The metric api is designed to prevent user defined metric >>>>>>>>>>>>>> types based on the fact they just weren't used enough to justify >>>>>>>>>>>>>> support. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus any >>>>>>>>>>>>>> special >>>>>>>>>>>>>> system metrivs? >>>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks. 
I think this has simplified things. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> One thing that has occurred to me is that we're conflating >>>>>>>>>>>>>>> the idea of custom metrics and custom metric types. I would >>>>>>>>>>>>>>> propose >>>>>>>>>>>>>>> the MetricSpec field be augmented with an additional field >>>>>>>>>>>>>>> "type" which is >>>>>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents of >>>>>>>>>>>>>>> its >>>>>>>>>>>>>>> payload, as well as the form of aggregation). Summing or maxing >>>>>>>>>>>>>>> over ints >>>>>>>>>>>>>>> would be a typical example. Though we could pursue making this >>>>>>>>>>>>>>> opaque to >>>>>&
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
on the fact they just weren't used enough to justify >>>>>>>>>>>>> support. >>>>>>>>>>>>> >>>>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus any >>>>>>>>>>>>> special >>>>>>>>>>>>> system metrivs? >>>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks. I think this has simplified things. >>>>>>>>>>>>>> >>>>>>>>>>>>>> One thing that has occurred to me is that we're conflating >>>>>>>>>>>>>> the idea of custom metrics and custom metric types. I would >>>>>>>>>>>>>> propose >>>>>>>>>>>>>> the MetricSpec field be augmented with an additional field >>>>>>>>>>>>>> "type" which is >>>>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents of >>>>>>>>>>>>>> its >>>>>>>>>>>>>> payload, as well as the form of aggregation). Summing or maxing >>>>>>>>>>>>>> over ints >>>>>>>>>>>>>> would be a typical example. Though we could pursue making this >>>>>>>>>>>>>> opaque to >>>>>>>>>>>>>> the runner in the long run, that's a more speculative (and >>>>>>>>>>>>>> difficult) >>>>>>>>>>>>>> feature to tackle. This would allow the runner to at least >>>>>>>>>>>>>> aggregate and >>>>>>>>>>>>>> report/return to the SDK metrics that it did not itself >>>>>>>>>>>>>> understand the >>>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the >>>>>>>>>>>>>> specialization >>>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as >>>>>>>>>>>>>> well.) >>>>>>>>>>>>>> >>>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every >>>>>>>>>>>>>> type X one would have a single URN for UserMetric and it spec >>>>>>>>>>>>>> would >>>>>>>>>>>>>> designate the type and payload designate the (qualified) name. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> - Robert >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato < >>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thank you everyone for your feedback so far. >>>>>>>>>>>>>>> I have made a revision today which is to make all metrics >>>>>>>>>>>>>>> refer to a primary entity, so I have restructured some of the >>>>>>>>>>>>>>> protos a >>>>>>>>>>>>>>> little bit. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The point of this change was to futureproof the possibility >>>>>>>>>>>>>>> of allowing custom user metrics, with custom aggregation >>>>>>>>>>>>>>> functions for its >>>>>>>>>>>>>>> metric updates. >>>>>>>>>>>>>>> Now that each metric has an aggregation_entity associated >>>>>>>>>>>>>>> with it (e.g. PCollection, PTransform), we can design an >>>>>>>>>>>>>>> approach which >>>>>>>>>>>>>>> forwards the opaque bytes metric updates, without deserializing >>>>>>>>>>>>>>> them. These >>>>>>>>>>>>>>> are forwarded to user provided code which then would >>>>>>>>>>>>>>> deserialize the metric >>>>>>>>>>>>>>> update payloads and perform the custom aggregations. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think it has also simplified some of the URN metric >>>>>>>>>>>>>>> protos, as they do not need to keep track of ptransform names >>>>>>>>>>>>>>> inside >>>>>>>>>>>>>>> themselves now. The result is simpler structures, for the >>>>>>>>>>>>>>> metrics as the >>>>>>>>>>>>>>> entities are pulled outside of the metric. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw >>>>>>>>>>>>>>> attention to this particular revision. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato < >>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a >>>>>>>>>>>>>>>> decision by Friday, and begin working on related PRs next week. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Please make sure that you provide your feedback before then >>>>>>>>>>>>>>>> and I will post the final decisions made to this thread Friday >>>>>>>>>>>>>>>> afternoon. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía < >>>>>>>>>>>>>>>> ieme...@gmail.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it >>>>>>>>>>>>>>>>> easily in >>>>>>>>>>>>>>>>> future discussions, website, etc. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for sharing. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw < >>>>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments. >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato < >>>>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> Hello beam community, >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this >>>>>>>>>>>>>>>>> proposal so far. I >>>>>>>>>>>>>>>>> >> have made some revisions based on the feedback. There >>>>>>>>>>>>>>>>> were some larger >>>>>>>>>>>>>>>>> >> questions asking about alternatives. For each of these >>>>>>>>>>>>>>>>> I have added a >>>>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my >>>>>>>>>>>>>>>>> recommendation as well >>>>>>>>>>>>>>>>> >> as as few other choices we considered. >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised >>>>>>>>>>>>>>>>> proposal. 
Please take >>>>>>>>>>>>>>>>> >> another look and let me know >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please take >>>>>>>>>>>>>>>>> another look after >>>>>>>>>>>>>>>>> >> the revisions I have made as well. >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> Thanks again, >>>>>>>>>>>>>>>>> >> Alex >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
t;>>>> By "type" of metric, I mean both the data types (including their >>>>>>>>> encoding) and accumulator strategy. So sumint would be a type, as >>>>>>>>> would >>>>>>>>> double-distribution. >>>>>>>>> >>>>>>>>> On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers < >>>>>>>>> bjchamb...@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> When you say type do you mean accumulator type, result type, or >>>>>>>>>> accumulator strategy? Specifically, what is the "type" of sumint, >>>>>>>>>> sumlong, >>>>>>>>>> meanlong, etc? >>>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw < >>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>> >>>>>>>>>>> Fully custom metric types is the "more speculative and >>>>>>>>>>> difficult" feature that I was proposing we kick down the road (and >>>>>>>>>>> may >>>>>>>>>>> never get to). What I'm suggesting is that we support custom >>>>>>>>>>> metrics of >>>>>>>>>>> standard type. >>>>>>>>>>> >>>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>>> >>>>>>>>>>>> The metric api is designed to prevent user defined metric types >>>>>>>>>>>> based on the fact they just weren't used enough to justify support. >>>>>>>>>>>> >>>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>>> Shouldn't we just need the ability for the standard set plus any >>>>>>>>>>>> special >>>>>>>>>>>> system metrivs? >>>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Thanks. I think this has simplified things. >>>>>>>>>>>>> >>>>>>>>>>>>> One thing that has occurred to me is that we're conflating the >>>>>>>>>>>>> idea of custom metrics and custom metric types. I would propose >>>>>>>>>>>>> the MetricSpec field be augmented with an additional field "type" >>>>>>>>>>>>> which is >>>>>>>>>>>>> a urn specifying the type of metric it is (i.e. 
the contents of >>>>>>>>>>>>> its >>>>>>>>>>>>> payload, as well as the form of aggregation). Summing or maxing >>>>>>>>>>>>> over ints >>>>>>>>>>>>> would be a typical example. Though we could pursue making this >>>>>>>>>>>>> opaque to >>>>>>>>>>>>> the runner in the long run, that's a more speculative (and >>>>>>>>>>>>> difficult) >>>>>>>>>>>>> feature to tackle. This would allow the runner to at least >>>>>>>>>>>>> aggregate and >>>>>>>>>>>>> report/return to the SDK metrics that it did not itself >>>>>>>>>>>>> understand the >>>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the >>>>>>>>>>>>> specialization >>>>>>>>>>>>> in the runner itself for metrics that it *did* understand as >>>>>>>>>>>>> well.) >>>>>>>>>>>>> >>>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every >>>>>>>>>>>>> type X one would have
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
es is the "more speculative and difficult" >>>>>>>>>> feature that I was proposing we kick down the road (and may never >>>>>>>>>> get to). >>>>>>>>>> What I'm suggesting is that we support custom metrics of standard >>>>>>>>>> type. >>>>>>>>>> >>>>>>>>>> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers < >>>>>>>>>> bchamb...@apache.org> wrote: >>>>>>>>>> >>>>>>>>>>> The metric api is designed to prevent user defined metric types >>>>>>>>>>> based on the fact they just weren't used enough to justify support. >>>>>>>>>>> >>>>>>>>>>> Is there a reason we are bringing that complexity back? >>>>>>>>>>> Shouldn't we just need the ability for the standard set plus any >>>>>>>>>>> special >>>>>>>>>>> system metrivs? >>>>>>>>>>> On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw < >>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> Thanks. I think this has simplified things. >>>>>>>>>>>> >>>>>>>>>>>> One thing that has occurred to me is that we're conflating the >>>>>>>>>>>> idea of custom metrics and custom metric types. I would propose >>>>>>>>>>>> the MetricSpec field be augmented with an additional field "type" >>>>>>>>>>>> which is >>>>>>>>>>>> a urn specifying the type of metric it is (i.e. the contents of its >>>>>>>>>>>> payload, as well as the form of aggregation). Summing or maxing >>>>>>>>>>>> over ints >>>>>>>>>>>> would be a typical example. Though we could pursue making this >>>>>>>>>>>> opaque to >>>>>>>>>>>> the runner in the long run, that's a more speculative (and >>>>>>>>>>>> difficult) >>>>>>>>>>>> feature to tackle. This would allow the runner to at least >>>>>>>>>>>> aggregate and >>>>>>>>>>>> report/return to the SDK metrics that it did not itself understand >>>>>>>>>>>> the >>>>>>>>>>>> semantic meaning of. (It would probably simplify much of the >>>>>>>>>>>> specialization >>>>>>>>>>>> in the runner itself for metrics that it *did* understand as well.) 
>>>>>>>>>>>> >>>>>>>>>>>> In addition, rather than having UserMetricOfTypeX for every >>>>>>>>>>>> type X one would have a single URN for UserMetric and it spec would >>>>>>>>>>>> designate the type and payload designate the (qualified) name. >>>>>>>>>>>> >>>>>>>>>>>> - Robert >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Wed, Apr 11, 2018 at 5:12 PM Alex Amato >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Thank you everyone for your feedback so far. >>>>>>>>>>>>> I have made a revision today which is to make all metrics >>>>>>>>>>>>> refer to a primary entity, so I have restructured some of the >>>>>>>>>>>>> protos a >>>>>>>>>>>>> little bit. >>>>>>>>>>>>> >>>>>>>>>>>>> The point of this change was to futureproof the possibility of >>>>>>>>>>>>> allowing custom user metrics, with custom aggregation functions >>>>>>>>>>>>> for its >>>>>>>>>>>>> metric updates. >>>>>>>>>>>>> Now that each metric has an aggregation_entity associated with >>>>>>>>>>>>> it (e.g. PCollection, PTransform), we can design an approach >>>>>>>>>>>>> which forwards >>>>>>>>>>>>> the opaque bytes metric updates, without deserializing them. >>>>>>>>>>>>> These are >>>>>>>>>>>>> forwarded to user provided code which then would deserialize the >>>>>>>>>>>>> metric >>>>>>>>>>>>> update payloads and perform the custom aggregations. >>>>>>>>>>>>> >>>>>>>>>>>>> I think it has also simplified some of the URN metric protos, >>>>>>>>>>>>> as they do not need to keep track of ptransform names inside >>>>>>>>>>>>> themselves >>>>>>>>>>>>> now. The result is simpler structures, for the metrics as the >>>>>>>>>>>>> entities are >>>>>>>>>>>>> pulled outside of the metric. >>>>>>>>>>>>> >>>>>>>>>>>>> I have mentioned this in the doc now, and wanted to draw >>>>>>>>>>>>> attention to this particular revision. 
>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I've gathered a lot of feedback so far and want to make a >>>>>>>>>>>>>> decision by Friday, and begin working on related PRs next week. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Please make sure that you provide your feedback before then >>>>>>>>>>>>>> and I will post the final decisions made to this thread Friday >>>>>>>>>>>>>> afternoon. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía < >>>>>>>>>>>>>> ieme...@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Nice, I created a short link so people can refer to it >>>>>>>>>>>>>>> easily in >>>>>>>>>>>>>>> future discussions, website, etc. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> https://s.apache.org/beam-fn-api-metrics >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for sharing. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw < >>>>>>>>>>>>>>> rober...@google.com> wrote: >>>>>>>>>>>>>>> > Thanks for the nice writeup. I added some comments. >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato < >>>>>>>>>>>>>>> ajam...@google.com> wrote: >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> Hello beam community, >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> Thank you everyone for your initial feedback on this >>>>>>>>>>>>>>> proposal so far. I >>>>>>>>>>>>>>> >> have made some revisions based on the feedback. There >>>>>>>>>>>>>>> were some larger >>>>>>>>>>>>>>> >> questions asking about alternatives. For each of these I >>>>>>>>>>>>>>> have added a >>>>>>>>>>>>>>> >> section tagged with [Alternatives] and discussed my >>>>>>>>>>>>>>> recommendation as well >>>>>>>>>>>>>>> >> as as few other choices we considered. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> I would appreciate more feedback on the revised proposal. 
>>>>>>>>>>>>>>> Please take >>>>>>>>>>>>>>> >> another look and let me know >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> Etienne, I would appreciate it if you could please take >>>>>>>>>>>>>>> another look after >>>>>>>>>>>>>>> >> the revisions I have made as well. >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> Thanks again, >>>>>>>>>>>>>>> >> Alex >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> >>>>>>>>>>>>>>
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Thanks. I think this has simplified things.

One thing that has occurred to me is that we're conflating the idea of custom metrics and custom metric types. I would propose that the MetricSpec field be augmented with an additional field, "type", which is a URN specifying the type of metric it is (i.e. the contents of its payload, as well as the form of aggregation). Summing or maxing over ints would be a typical example. Though we could pursue making this opaque to the runner in the long run, that's a more speculative (and difficult) feature to tackle. This would allow the runner to at least aggregate, and report/return to the SDK, metrics whose semantic meaning it did not itself understand. (It would probably simplify much of the specialization in the runner itself for metrics that it *did* understand as well.)

In addition, rather than having UserMetricOfTypeX for every type X, one would have a single URN for UserMetric; its spec would designate the type, and its payload would designate the (qualified) name.

- Robert

On Wed, Apr 11, 2018 at 5:12 PM Alex Amato wrote:

> Thank you everyone for your feedback so far.
> I have made a revision today, which is to make all metrics refer to a primary entity, so I have restructured the protos a little bit.
>
> The point of this change was to future-proof the possibility of allowing custom user metrics, with custom aggregation functions for their metric updates. Now that each metric has an aggregation_entity associated with it (e.g. PCollection, PTransform), we can design an approach which forwards the opaque-bytes metric updates without deserializing them. These are forwarded to user-provided code, which then deserializes the metric update payloads and performs the custom aggregations.
>
> I think it has also simplified some of the URN metric protos, as they no longer need to keep track of PTransform names inside themselves. The result is simpler structures for the metrics, as the entities are pulled outside of the metric.
>
> I have mentioned this in the doc now, and wanted to draw attention to this particular revision.
>
> On Tue, Apr 10, 2018 at 9:53 AM Alex Amato wrote:
>
>> I've gathered a lot of feedback so far and want to make a decision by Friday, and begin working on related PRs next week.
>>
>> Please make sure that you provide your feedback before then, and I will post the final decisions made to this thread Friday afternoon.
>>
>> On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía wrote:
>>
>>> Nice, I created a short link so people can refer to it easily in future discussions, website, etc.
>>>
>>> https://s.apache.org/beam-fn-api-metrics
>>>
>>> Thanks for sharing.
>>>
>>> On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw <rober...@google.com> wrote:
>>> > Thanks for the nice writeup. I added some comments.
>>> >
>>> > On Wed, Apr 4, 2018 at 1:53 PM Alex Amato <ajam...@google.com> wrote:
>>> >>
>>> >> Hello beam community,
>>> >>
>>> >> Thank you everyone for your initial feedback on this proposal so far. I have made some revisions based on the feedback. There were some larger questions asking about alternatives. For each of these I have added a section tagged with [Alternatives] and discussed my recommendation as well as a few other choices we considered.
>>> >>
>>> >> I would appreciate more feedback on the revised proposal. Please take another look and let me know.
>>> >>
>>> >> https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit
>>> >>
>>> >> Etienne, I would appreciate it if you could please take another look after the revisions I have made as well.
>>> >>
>>> >> Thanks again,
>>> >> Alex
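Robert's suggestion above — a single UserMetric URN whose spec carries a type URN, with the aggregation entity held outside the metric so the runner can combine opaque payloads — could be sketched roughly as follows. All names, URNs, and encodings here are illustrative stand-ins, not the actual Beam proto definitions:

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the proto messages under discussion;
# the real definitions would live in the Fn API protos.
@dataclass(frozen=True)
class MetricSpec:
    urn: str          # a single "user metric" URN for all user metrics (hypothetical)
    type: str         # type URN naming encoding + aggregation, e.g. a sum over int64
    payload: bytes    # for user metrics, the (qualified) metric name

@dataclass(frozen=True)
class MetricUpdate:
    spec: MetricSpec
    entity: str       # aggregation entity, e.g. a PCollection or PTransform id
    payload: bytes    # opaque encoded value; the runner need not know its semantics

# A runner that recognizes only the *type* URN can still aggregate
# (illustrative type URNs, big-endian int64 payloads assumed):
AGGREGATIONS = {
    "beam:metric_type:sum_int64": lambda values: sum(values),
    "beam:metric_type:max_int64": lambda values: max(values),
}

def aggregate(updates):
    """Group updates by (spec, entity) and combine them by type URN."""
    grouped = {}
    for u in updates:
        grouped.setdefault((u.spec, u.entity), []).append(
            int.from_bytes(u.payload, "big"))  # decode per the type; int64 here
    return {key: AGGREGATIONS[key[0].type](vals) for key, vals in grouped.items()}
```

Because combining is keyed by the type URN rather than by the metric itself, a runner can merge updates for user metrics it has never seen before, which is the point of the single-UserMetric-URN design.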
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Sounds perfect. Just wanted to make sure that "custom metrics of supported type" didn't include new ways of aggregating ints. As long as that means we have a fixed set of aggregations (that align with what users want and what metrics backends support), it seems like we are doing user metrics right.

- Ben

On Wed, Apr 11, 2018, 11:30 PM Romain Manni-Bucau wrote:

> Maybe leave it out until proven it is needed. ATM counters are used a lot, but others are less mainstream, so being too fine-grained from the start can just add complexity and bugs in impls, IMHO.
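The "fixed set of aggregations" Ben describes can be pictured as a user-metric factory that only exposes standard types, with no way to register a custom combine function. This is a hypothetical sketch, not the actual apache_beam.metrics API:

```python
# Illustrative sketch of user metrics restricted to a fixed set of
# standard aggregation types (counter = sum, distribution = count/sum/min/max).

class Counter:
    """Standard type: sum over int64."""
    def __init__(self):
        self.value = 0
    def inc(self, n=1):
        self.value += n

class Distribution:
    """Standard type: count/sum/min/max over int64."""
    def __init__(self):
        self.count, self.sum = 0, 0
        self.min, self.max = None, None
    def update(self, v):
        self.count += 1
        self.sum += v
        self.min = v if self.min is None else min(self.min, v)
        self.max = v if self.max is None else max(self.max, v)

class Metrics:
    """Users pick a (namespace, name) and one of the standard types;
    no custom aggregation functions can be registered."""
    _registry = {}

    @classmethod
    def counter(cls, namespace, name):
        return cls._registry.setdefault((namespace, name, "counter"), Counter())

    @classmethod
    def distribution(cls, namespace, name):
        return cls._registry.setdefault((namespace, name, "distribution"), Distribution())
```

Keeping the aggregation set closed is what lets every runner and metrics backend understand and merge any user metric it receives.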
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Maybe leave it out until proven it is needed. ATM counters are used a lot, but others are less mainstream, so being too fine-grained from the start can just add complexity and bugs in impls, IMHO.

On Apr 12, 2018 at 08:06, Robert Bradshaw wrote:

> By "type" of metric, I mean both the data types (including their encoding) and the accumulator strategy. So sumint would be a type, as would double-distribution.
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
By "type" of metric, I mean both the data types (including their encoding) and the accumulator strategy. So sumint would be a type, as would double-distribution.

On Wed, Apr 11, 2018 at 10:39 PM Ben Chambers wrote:

> When you say type, do you mean accumulator type, result type, or accumulator strategy? Specifically, what is the "type" of sumint, sumlong, meanlong, etc.?
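Robert's definition of a metric type — the pair of (data type + encoding, accumulator strategy) — might be sketched like this. The URNs and wire encodings below are made up for illustration; they are not the actual Beam type URNs or coders:

```python
import struct
from typing import Callable, NamedTuple

class MetricType(NamedTuple):
    """A 'type' in Robert's sense: how values are encoded on the wire,
    plus how encoded updates are combined."""
    urn: str
    decode: Callable[[bytes], object]
    encode: Callable[[object], bytes]
    combine: Callable[[object, object], object]

# "sumint": int64 values, combined by summation.
SUM_INT64 = MetricType(
    urn="beam:metric_type:sum_int64",            # hypothetical URN
    decode=lambda b: struct.unpack(">q", b)[0],
    encode=lambda v: struct.pack(">q", v),
    combine=lambda a, b: a + b,
)

# "double-distribution": (count, sum, min, max) over doubles.
DOUBLE_DISTRIBUTION = MetricType(
    urn="beam:metric_type:distribution_double",  # hypothetical URN
    decode=lambda b: struct.unpack(">qddd", b),  # (count, sum, min, max)
    encode=lambda v: struct.pack(">qddd", *v),
    combine=lambda a, b: (a[0] + b[0], a[1] + b[1],
                          min(a[2], b[2]), max(a[3], b[3])),
)

def merge(mtype, payloads):
    """Runner-side merge over opaque payloads of a known type URN."""
    values = [mtype.decode(p) for p in payloads]
    acc = values[0]
    for v in values[1:]:
        acc = mtype.combine(acc, v)
    return mtype.encode(acc)
```

Under this framing, sumint and double-distribution differ in both coordinates, while sumint and sumlong would share an accumulator strategy but differ in data type.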
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
When you say type, do you mean accumulator type, result type, or accumulator
strategy? Specifically, what is the "type" of sumint, sumlong, meanlong,
etc.?

On Wed, Apr 11, 2018, 9:38 PM Robert Bradshaw wrote:
> Fully custom metric types is the "more speculative and difficult" feature
> that I was proposing we kick down the road (and may never get to). What
> I'm suggesting is that we support custom metrics of standard type.
>
> On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers wrote:
>
>> The metric API was designed to prevent user-defined metric types because
>> they just weren't used enough to justify support.
>>
>> Is there a reason we are bringing that complexity back? Shouldn't we
>> just need the ability for the standard set plus any special system
>> metrics?
>>
>> [...]
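The question above turns on the difference between accumulator type, result type, and accumulation strategy. A minimal sketch of that distinction for a "meanlong"-style metric (all function names here are illustrative, not Beam APIs): the accumulator is a (sum, count) pair, the result is a float, and the strategy is how accumulators fold values and merge with each other.

```python
def mean_add(acc, value):
    """Fold one long value into a (sum, count) accumulator."""
    total, count = acc
    return (total + value, count + 1)

def mean_merge(a, b):
    """Merge two accumulators, e.g. partial results from two workers."""
    return (a[0] + b[0], a[1] + b[1])

def mean_result(acc):
    """Extract the reported result (a float) from the accumulator."""
    total, count = acc
    return total / count if count else 0.0
```

For sumint or sumlong the accumulator and result types happen to coincide, which is what makes the "type" terminology ambiguous.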
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Fully custom metric types is the "more speculative and difficult" feature
that I was proposing we kick down the road (and may never get to). What I'm
suggesting is that we support custom metrics of standard type.

On Wed, Apr 11, 2018 at 5:52 PM Ben Chambers wrote:
> The metric API was designed to prevent user-defined metric types because
> they just weren't used enough to justify support.
>
> Is there a reason we are bringing that complexity back? Shouldn't we just
> need the ability for the standard set plus any special system metrics?
>
> [...]
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
The metric API was designed to prevent user-defined metric types because
they just weren't used enough to justify support.

Is there a reason we are bringing that complexity back? Shouldn't we just
need the ability for the standard set plus any special system metrics?

On Wed, Apr 11, 2018, 5:43 PM Robert Bradshaw wrote:
> Thanks. I think this has simplified things.
>
> One thing that has occurred to me is that we're conflating the idea of
> custom metrics and custom metric types. I would propose the MetricSpec
> field be augmented with an additional field "type", which is a URN
> specifying the type of metric it is (i.e. the contents of its payload, as
> well as the form of aggregation).
>
> [...]
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Thanks. I think this has simplified things.

One thing that has occurred to me is that we're conflating the idea of
custom metrics and custom metric types. I would propose the MetricSpec
field be augmented with an additional field "type", which is a URN
specifying the type of metric it is (i.e. the contents of its payload, as
well as the form of aggregation). Summing or maxing over ints would be a
typical example. Though we could pursue making this opaque to the runner in
the long run, that's a more speculative (and difficult) feature to tackle.
This would allow the runner to at least aggregate, and report/return to the
SDK, metrics whose semantic meaning it did not itself understand. (It would
probably simplify much of the specialization in the runner itself for
metrics that it *did* understand as well.)

In addition, rather than having UserMetricOfTypeX for every type X, one
would have a single URN for UserMetric; its spec would designate the type
and its payload the (qualified) name.

- Robert

On Wed, Apr 11, 2018 at 5:12 PM Alex Amato wrote:
> Thank you everyone for your feedback so far.
> I have made a revision today which is to make all metrics refer to a
> primary entity, so I have restructured some of the protos a little bit.
>
> [...]
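A hypothetical sketch of the proposal above: the runner aggregates metric payloads purely by looking up the declared "type" URN, without understanding what the metric semantically means. The URN strings and function names here are illustrative, not actual Beam definitions.

```python
# Map from (hypothetical) metric-type URN to a binary combine function.
# The runner only needs this table; it never inspects metric meaning.
AGGREGATIONS = {
    "beam:metric_type:sum_int64:v1": lambda a, b: a + b,
    "beam:metric_type:max_int64:v1": max,
}

def aggregate_by_type(type_urn, updates):
    """Combine metric updates using only the declared type URN."""
    combine = AGGREGATIONS[type_urn]
    result = updates[0]
    for update in updates[1:]:
        result = combine(result, update)
    return result
```

A runner holding updates [3, 1, 4] for any metric whose spec declares the sum type would report 8, whatever that metric represents to the SDK or user.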
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Thank you everyone for your feedback so far.
I have made a revision today, which is to make all metrics refer to a
primary entity, so I have restructured some of the protos a little bit.

The point of this change was to future-proof the possibility of allowing
custom user metrics, with custom aggregation functions for their metric
updates. Now that each metric has an aggregation_entity associated with it
(e.g. PCollection, PTransform), we can design an approach which forwards
the opaque-bytes metric updates without deserializing them. These are
forwarded to user-provided code, which would then deserialize the metric
update payloads and perform the custom aggregations.

I think it has also simplified some of the URN metric protos, as they do
not need to keep track of PTransform names inside themselves now. The
result is simpler structures for the metrics, as the entities are pulled
outside of the metric.

I have mentioned this in the doc now, and wanted to draw attention to this
particular revision.

On Tue, Apr 10, 2018 at 9:53 AM Alex Amato wrote:
> I've gathered a lot of feedback so far and want to make a decision by
> Friday, and begin working on related PRs next week.
>
> [...]
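An illustrative sketch (not actual Beam code) of the forwarding scheme described above: metric updates travel as (aggregation_entity, urn, payload) triples, where the payload stays as opaque bytes until user-registered code deserializes it and performs the custom aggregation. All names and the JSON payload encoding are assumptions for the example.

```python
import json
from collections import defaultdict

USER_AGGREGATORS = {}  # urn -> fn(list_of_payload_bytes) -> result

def register_aggregator(urn, fn):
    USER_AGGREGATORS[urn] = fn

def aggregate_updates(updates):
    """Group opaque payloads by (entity, urn), then apply user aggregation."""
    grouped = defaultdict(list)
    for entity, urn, payload in updates:
        grouped[(entity, urn)].append(payload)  # bytes stay opaque here
    return {
        key: USER_AGGREGATORS[key[1]](payloads)
        for key, payloads in grouped.items()
    }

# Only the user-side aggregator knows how to decode its own payloads:
register_aggregator(
    "myapp:metrics:record_count",
    lambda payloads: sum(json.loads(p) for p in payloads),
)
```

The runtime code above never parses the payload bytes; it only routes them by entity and URN, which is what lets custom aggregation live entirely in user code.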
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
I've gathered a lot of feedback so far and want to make a decision by
Friday, and begin working on related PRs next week.

Please make sure that you provide your feedback before then, and I will
post the final decisions made to this thread Friday afternoon.

On Thu, Apr 5, 2018 at 1:38 AM Ismaël Mejía wrote:
> Nice, I created a short link so people can refer to it easily in
> future discussions, website, etc.
>
> https://s.apache.org/beam-fn-api-metrics
>
> [...]
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Nice, I created a short link so people can refer to it easily in
future discussions, website, etc.

https://s.apache.org/beam-fn-api-metrics

Thanks for sharing.

On Wed, Apr 4, 2018 at 11:28 PM, Robert Bradshaw wrote:
> Thanks for the nice writeup. I added some comments.
>
> [...]
Re: Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Thanks for the nice writeup. I added some comments.

On Wed, Apr 4, 2018 at 1:53 PM Alex Amato wrote:
> Hello beam community,
>
> [...]
Updated [Proposal] Apache Beam Fn API : Defining and adding SDK Metrics
Hello beam community,

Thank you everyone for your initial feedback on this proposal so far. I
have made some revisions based on the feedback. There were some larger
questions asking about alternatives. For each of these I have added a
section tagged with [Alternatives] and discussed my recommendation as well
as a few other choices we considered.

I would appreciate more feedback on the revised proposal. Please take
another look and let me know.

https://docs.google.com/document/d/1MtBZYV7NAcfbwyy9Op8STeFNBxtljxgy69FkHMvhTMA/edit

Etienne, I would appreciate it if you could please take another look after
the revisions I have made as well.

Thanks again,
Alex
Re: Beam Fn API
Thanks! Looks good. I've added some comments to the doc.

On Wed, May 31, 2017 at 7:00 AM, Etienne Chauchot wrote:
> Thanks for all these docs! They are exactly what was needed for new
> contributors as discussed in this thread
>
> https://lists.apache.org/thread.html/ac93d29424e19d57097373b78f3f5bcbc701e4b51385a52a6e27b7ed@%3Cdev.beam.apache.org%3E
>
> Etienne
>
> [...]
Re: Beam Fn API
Thanks for all these docs! They are exactly what was needed for new
contributors as discussed in this thread

https://lists.apache.org/thread.html/ac93d29424e19d57097373b78f3f5bcbc701e4b51385a52a6e27b7ed@%3Cdev.beam.apache.org%3E

Etienne

On 31/05/2017 at 11:12, Aljoscha Krettek wrote:
> Thanks for banging these out Lukasz. I'll try and read them all this week.
>
> We're also planning to add support for the Fn API to the Flink Runner so
> that we can execute Python programs. I'm sure we'll get some valuable
> feedback for you while doing that.
>
>> On 26. May 2017, at 22:49, Lukasz Cwik wrote:
>>
>> I would like to share another document about the Fn API. This document
>> specifically discusses how to access side inputs, access remote
>> references (e.g. large iterables for hot keys produced by a GBK), and
>> support user state.
>> https://s.apache.org/beam-fn-state-api-and-bundle-processing
>>
>> The document does require a strong foundation in the Apache Beam model
>> and a good understanding of the prior shared docs:
>> * How to process a bundle: https://s.apache.org/beam-fn-api-processing-a-bundle
>> * How to send and receive data: https://s.apache.org/beam-fn-api-send-and-receive-data
>>
>> I could really use the help of runner contributors to review the caching
>> semantics within the SDK harness and whether they would work well for
>> the runner they contribute to the most.
>>
>> On Sun, May 21, 2017 at 6:40 PM, Lukasz Cwik wrote:
>>
>>> Manu, the goal is to share here initially, update the docs addressing
>>> people's comments, and then publish them on the website once they are
>>> stable enough.
>>>
>>> On Sun, May 21, 2017 at 5:54 PM, Manu Zhang wrote:
>>>
>>>> Thanks Lukasz. The following two links were somehow incorrectly
>>>> formatted in your mail.
>>>>
>>>> * How to process a bundle:
>>>> https://s.apache.org/beam-fn-api-processing-a-bundle
>>>> * How to send and receive data:
>>>> https://s.apache.org/beam-fn-api-send-and-receive-data
>>>>
>>>> By the way, is there a way to find them from the Beam website?
>>>>
>>>> On Fri, May 19, 2017 at 6:44 AM Lukasz Cwik wrote:
>>>>
>>>>> Now that I'm back from vacation and the 2.0.0 release is not taking
>>>>> all my time, I am focusing my attention on working on the Beam
>>>>> Portability framework, specifically the Fn API, so that we can get
>>>>> Python and other language integrations to work with any runner.
>>>>>
>>>>> For newcomers, I would like to reshare the overview:
>>>>> https://s.apache.org/beam-fn-api
>>>>>
>>>>> And for those of you who have been following this thread and
>>>>> contributors focusing on Runner integration with Apache Beam:
>>>>> * How to process a bundle: https://s.apache.org/beam-fn-api-processing-a-bundle
>>>>> * How to send and receive data: https://s.apache.org/beam-fn-api-send-and-receive-data
>>>>>
>>>>> If you want to dive deeper, you should look at:
>>>>> * Runner API Protobuf: https://github.com/apache/beam/blob/master/sdks/common/runner-api/src/main/proto/beam_runner_api.proto
>>>>> * Fn API Protobuf: https://github.com/apache/beam/blob/master/sdks/common/fn-api/src/main/proto/beam_fn_api.proto
>>>>> * Java SDK Harness: https://github.com/apache/beam/tree/master/sdks/java/harness
>>>>> * Python SDK Harness: https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/worker
>>>>>
>>>>> Next I'm planning on talking about the Beam Fn State API and will
>>>>> need help from Runner contributors to talk about caching semantics
>>>>> and key spaces, and whether the integrations mesh well with current
>>>>> Runner implementations. The State API is meant to support user state,
>>>>> side inputs, and re-iteration for large values produced by GroupByKey.
>>>>>
>>>>> On Tue, Jan 24, 2017 at 9:46 AM, Lukasz Cwik wrote:
>>>>>
>>>>>> Yes, I was using a Pipeline that was:
>>>>>> Read(10 GiBs of KV (10,000,000 values)) -> GBK -> IdentityParDo (a
>>>>>> batch pipeline in the global window using the default trigger)
>>>>>>
>>>>>> In Google Cloud Dataflow, the shuffle step uses the binary
>>>>>> representation to compare keys, so the above pipeline would normally
>>>>>> be converted to the following two stages:
>>>>>> Read -> GBK Writer
>>>>>> GBK Reader -> IdentityParDo
>>>>>> Note that the GBK Writer and GBK Reader need to use a coder to
>>>>>> encode and decode the value.
>>>>>>
>>>>>> When using the Fn API, those two stages expanded because of the Fn
>>>>>> API crossings using a gRPC Write/Read pair:
>>>>>> Read -> gRPC Write -> gRPC Read -> GBK Writer
>>>>>> GBK Reader -> gRPC Write -> gRPC Read -> IdentityParDo
>>>>>> In my naive prototype implementation, the coder was used to encode
>>>>>> elements at the gRPC steps. This meant that the coder was
>>>>>> encoding/decoding/encoding in the first stage and
>>>>>> decoding/encoding/decoding in the second stage. This tripled the
>>>>>> number of times the coder was being invoked per element.
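The coder-overhead point quoted above can be illustrated with a toy counting coder. The class below is a stand-in for a coder, not Beam's actual Coder API: each gRPC crossing adds an encode/decode pair on top of the GBK Writer's encode, tripling invocations per element in the first stage.

```python
class CountingCoder:
    """A toy coder that just counts how many times it is invoked."""

    def __init__(self):
        self.calls = 0

    def encode(self, value):
        self.calls += 1
        return str(value).encode()

    def decode(self, data):
        self.calls += 1
        return int(data.decode())

def first_stage(coder, element, fn_api=True):
    """Read -> [gRPC Write -> gRPC Read ->] GBK Writer, for one element."""
    if fn_api:
        # The Fn API crossing: encode onto the wire, decode on the other side.
        element = coder.decode(coder.encode(element))
    coder.encode(element)  # The GBK Writer always encodes the value.
    return coder.calls
```

With fn_api=True the coder runs three times per element versus once without the crossing, matching the encoding/decoding/encoding pattern described in the quote.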
Re: Beam Fn API
Thanks for banging these out Lukasz. I’ll try and read them all this week. We’re also planning to add support for the Fn API to the Flink Runner so that we can execute Python programs. I’m sure we’ll get some valuable feedback for you while doing that. > On 26. May 2017, at 22:49, Lukasz Cwik wrote: > > I would like to share another document about the Fn API. This document > specifically discusses how to access side inputs, access remote references > (e.g. large iterables for hot keys produced by a GBK), and support user > state. > https://s.apache.org/beam-fn-state-api-and-bundle-processing > > The document does require a strong foundation in the Apache Beam model and > a good understanding of the prior shared docs: > * How to process a bundle: https://s.apache.org/beam-fn-api > -processing-a-bundle > * How to send and receive data: https://s.apache.org/beam-fn-api > -send-and-receive-data > > I could really use the help of runner contributors to review the caching > semantics within the SDK harness and whether they would work well for the > runner they contribute to the most. > > On Sun, May 21, 2017 at 6:40 PM, Lukasz Cwik wrote: > >> Manu, the goal is to share here initially, update the docs addressing >> people's comments, and then publish them on the website once they are >> stable enough. >> >> On Sun, May 21, 2017 at 5:54 PM, Manu Zhang >> wrote: >> >>> Thanks Lukasz. The following two links were somehow incorrectly formatted >>> in your mail. >>> >>> * How to process a bundle: >>> https://s.apache.org/beam-fn-api-processing-a-bundle >>> * How to send and receive data: >>> https://s.apache.org/beam-fn-api-send-and-receive-data >>> >>> By the way, is there a way to find them from the Beam website ? 
>>> >>> >>> On Fri, May 19, 2017 at 6:44 AM Lukasz Cwik >>> wrote: >>> >>>> Now that I'm back from vacation and the 2.0.0 release is not taking all >>> my >>>> time, I am focusing my attention on working on the Beam Portability >>>> framework, specifically the Fn API so that we can get Python and other >>>> language integrations work with any runner. >>>> >>>> For new comers, I would like to reshare the overview: >>>> https://s.apache.org/beam-fn-api >>>> >>>> And for those of you who have been following this thread and >>> contributors >>>> focusing on Runner integration with Apache Beam: >>>> * How to process a bundle: https://s.apache.org/beam-fn-a >>> pi-processing-a- >>>> bundle >>>> * How to send and receive data: https://s.apache.org/ >>>> beam-fn-api-send-and-receive-data >>>> >>>> If you want to dive deeper, you should look at: >>>> * Runner API Protobuf: https://github.com/apache/beam/blob/master/sdks/ >>>> common/runner-api/src/main/proto/beam_runner_api.proto >>>> <https://github.com/apache/beam/blob/master/sdks/common/runn >>> er-api/src/main/proto/beam_runner_api.proto> >>>> * Fn API Protobuf: https://github.com/apache/beam/blob/master/sdks/ >>>> common/fn-api/src/main/proto/beam_fn_api.proto >>>> <https://github.com/apache/beam/blob/master/sdks/common/fn- >>> api/src/main/proto/beam_fn_api.proto> >>>> * Java SDK Harness: https://github.com/apache/beam/tree/master/sdks/ >>>> java/harness >>>> <https://github.com/apache/beam/tree/master/sdks/java/harness> >>>> * Python SDK Harness: https://github.com/apache/beam/tree/master/sdks/ >>>> python/apache_beam/runners/worker >>>> <https://github.com/apache/beam/tree/master/sdks/python/apac >>> he_beam/runners/worker> >>>> >>>> Next I'm planning on talking about Beam Fn State API and will need help >>>> from Runner contributors to talk about caching semantics and key spaces >>> and >>>> whether the integrations mesh well with current Runner implementations. 
>>> The >>>> State API is meant to support user state, side inputs, and re-iteration >>> for >>>> large values produced by GroupByKey. >>>> >>>> On Tue, Jan 24, 2017 at 9:46 AM, Lukasz Cwik wrote: >>>> >>>>> Yes, I was using a Pipeline that was: >>>>> Read(10 GiBs of KV (10,000,000 values)) -> GBK -> IdentityParDo (a >>> batch >>>>> pipeline in the global window using the default trigger)
Re: Beam Fn API
I would like to share another document about the Fn API. This document specifically discusses how to access side inputs, access remote references (e.g. large iterables for hot keys produced by a GBK), and support user state. https://s.apache.org/beam-fn-state-api-and-bundle-processing The document does require a strong foundation in the Apache Beam model and a good understanding of the prior shared docs: * How to process a bundle: https://s.apache.org/beam-fn-api -processing-a-bundle * How to send and receive data: https://s.apache.org/beam-fn-api -send-and-receive-data I could really use the help of runner contributors to review the caching semantics within the SDK harness and whether they would work well for the runner they contribute to the most. On Sun, May 21, 2017 at 6:40 PM, Lukasz Cwik wrote: > Manu, the goal is to share here initially, update the docs addressing > people's comments, and then publish them on the website once they are > stable enough. > > On Sun, May 21, 2017 at 5:54 PM, Manu Zhang > wrote: > >> Thanks Lukasz. The following two links were somehow incorrectly formatted >> in your mail. >> >> * How to process a bundle: >> https://s.apache.org/beam-fn-api-processing-a-bundle >> * How to send and receive data: >> https://s.apache.org/beam-fn-api-send-and-receive-data >> >> By the way, is there a way to find them from the Beam website ? >> >> >> On Fri, May 19, 2017 at 6:44 AM Lukasz Cwik >> wrote: >> >> > Now that I'm back from vacation and the 2.0.0 release is not taking all >> my >> > time, I am focusing my attention on working on the Beam Portability >> > framework, specifically the Fn API so that we can get Python and other >> > language integrations work with any runner. 
>> > >> > For new comers, I would like to reshare the overview: >> > https://s.apache.org/beam-fn-api >> > >> > And for those of you who have been following this thread and >> contributors >> > focusing on Runner integration with Apache Beam: >> > * How to process a bundle: https://s.apache.org/beam-fn-a >> pi-processing-a- >> > bundle >> > * How to send and receive data: https://s.apache.org/ >> > beam-fn-api-send-and-receive-data >> > >> > If you want to dive deeper, you should look at: >> > * Runner API Protobuf: https://github.com/apache/beam/blob/master/sdks/ >> > common/runner-api/src/main/proto/beam_runner_api.proto >> > <https://github.com/apache/beam/blob/master/sdks/common/runn >> er-api/src/main/proto/beam_runner_api.proto> >> > * Fn API Protobuf: https://github.com/apache/beam/blob/master/sdks/ >> > common/fn-api/src/main/proto/beam_fn_api.proto >> > <https://github.com/apache/beam/blob/master/sdks/common/fn- >> api/src/main/proto/beam_fn_api.proto> >> > * Java SDK Harness: https://github.com/apache/beam/tree/master/sdks/ >> > java/harness >> > <https://github.com/apache/beam/tree/master/sdks/java/harness> >> > * Python SDK Harness: https://github.com/apache/beam/tree/master/sdks/ >> > python/apache_beam/runners/worker >> > <https://github.com/apache/beam/tree/master/sdks/python/apac >> he_beam/runners/worker> >> > >> > Next I'm planning on talking about Beam Fn State API and will need help >> > from Runner contributors to talk about caching semantics and key spaces >> and >> > whether the integrations mesh well with current Runner implementations. >> The >> > State API is meant to support user state, side inputs, and re-iteration >> for >> > large values produced by GroupByKey. 
>> > >> > On Tue, Jan 24, 2017 at 9:46 AM, Lukasz Cwik wrote: >> > >> > > Yes, I was using a Pipeline that was: >> > > Read(10 GiBs of KV (10,000,000 values)) -> GBK -> IdentityParDo (a >> batch >> > > pipeline in the global window using the default trigger) >> > > >> > > In Google Cloud Dataflow, the shuffle step uses the binary >> representation >> > > to compare keys, so the above pipeline would normally be converted to >> the >> > > following two stages: >> > > Read -> GBK Writer >> > > GBK Reader -> IdentityParDo >> > > >> > > Note that the GBK Writer and GBK Reader need to use a coder to encode >> and >> > > decode the value. >> > > >> > > When using the Fn API, those two stages expanded because of the Fn Api >> > > crossings using a gRPC Write/Read pair: >> > > Read -> gRPC Write -> gRPC Read -> GBK Writer >> > > GBK Reader -> gRPC Write -> gRPC Read -> IdentityParDo
Re: Beam Fn API
Manu, the goal is to share here initially, update the docs addressing people's comments, and then publish them on the website once they are stable enough. On Sun, May 21, 2017 at 5:54 PM, Manu Zhang wrote: > Thanks Lukasz. The following two links were somehow incorrectly formatted > in your mail. > > * How to process a bundle: > https://s.apache.org/beam-fn-api-processing-a-bundle > * How to send and receive data: > https://s.apache.org/beam-fn-api-send-and-receive-data > > By the way, is there a way to find them from the Beam website ? > > > On Fri, May 19, 2017 at 6:44 AM Lukasz Cwik > wrote: > > > Now that I'm back from vacation and the 2.0.0 release is not taking all > my > > time, I am focusing my attention on working on the Beam Portability > > framework, specifically the Fn API so that we can get Python and other > > language integrations work with any runner. > > > > For new comers, I would like to reshare the overview: > > https://s.apache.org/beam-fn-api > > > > And for those of you who have been following this thread and contributors > > focusing on Runner integration with Apache Beam: > > * How to process a bundle: https://s.apache.org/beam-fn- > api-processing-a- > > bundle > > * How to send and receive data: https://s.apache.org/ > > beam-fn-api-send-and-receive-data > > > > If you want to dive deeper, you should look at: > > * Runner API Protobuf: https://github.com/apache/beam/blob/master/sdks/ > > common/runner-api/src/main/proto/beam_runner_api.proto > > <https://github.com/apache/beam/blob/master/sdks/common/ > runner-api/src/main/proto/beam_runner_api.proto> > > * Fn API Protobuf: https://github.com/apache/beam/blob/master/sdks/ > > common/fn-api/src/main/proto/beam_fn_api.proto > > <https://github.com/apache/beam/blob/master/sdks/common/ > fn-api/src/main/proto/beam_fn_api.proto> > > * Java SDK Harness: https://github.com/apache/beam/tree/master/sdks/ > > java/harness > > <https://github.com/apache/beam/tree/master/sdks/java/harness> > > * 
Python SDK Harness: https://github.com/apache/beam/tree/master/sdks/ > > python/apache_beam/runners/worker > > <https://github.com/apache/beam/tree/master/sdks/python/ > apache_beam/runners/worker> > > > > Next I'm planning on talking about Beam Fn State API and will need help > > from Runner contributors to talk about caching semantics and key spaces > and > > whether the integrations mesh well with current Runner implementations. > The > > State API is meant to support user state, side inputs, and re-iteration > for > > large values produced by GroupByKey. > > > > On Tue, Jan 24, 2017 at 9:46 AM, Lukasz Cwik wrote: > > > > > Yes, I was using a Pipeline that was: > > > Read(10 GiBs of KV (10,000,000 values)) -> GBK -> IdentityParDo (a > batch > > > pipeline in the global window using the default trigger) > > > > > > In Google Cloud Dataflow, the shuffle step uses the binary > representation > > > to compare keys, so the above pipeline would normally be converted to > the > > > following two stages: > > > Read -> GBK Writer > > > GBK Reader -> IdentityParDo > > > > > > Note that the GBK Writer and GBK Reader need to use a coder to encode > and > > > decode the value. > > > > > > When using the Fn API, those two stages expanded because of the Fn Api > > > crossings using a gRPC Write/Read pair: > > > Read -> gRPC Write -> gRPC Read -> GBK Writer > > > GBK Reader -> gRPC Write -> gRPC Read -> IdentityParDo > > > > > > In my naive prototype implementation, the coder was used to encode > > > elements at the gRPC steps. This meant that the coder was > > > encoding/decoding/encoding in the first stage and > > > decoding/encoding/decoding in the second stage. This tripled the amount > > of > > > times the coder was being invoked per element. This additional use of > the > > > coder accounted for ~12% (80% of the 15%) of the extra execution time. 
> > This > > > implementation is quite inefficient and would benefit from merging the > > gRPC > > > Read + GBK Writer into one actor and also the GBK Reader + gRPC Write > > into > > > another actor allowing for the creation of a fast path that can skip > > parts > > > of the decode/encode cycle through the coder. By using a byte array > view > > > over the logical stream, one can minimize the number of byte array > copies > > > which plagued my naive implementation.
Re: Beam Fn API
Thanks Lukasz. The following two links were somehow incorrectly formatted in your mail. * How to process a bundle: https://s.apache.org/beam-fn-api-processing-a-bundle * How to send and receive data: https://s.apache.org/beam-fn-api-send-and-receive-data By the way, is there a way to find them from the Beam website ? On Fri, May 19, 2017 at 6:44 AM Lukasz Cwik wrote: > Now that I'm back from vacation and the 2.0.0 release is not taking all my > time, I am focusing my attention on working on the Beam Portability > framework, specifically the Fn API so that we can get Python and other > language integrations work with any runner. > > For new comers, I would like to reshare the overview: > https://s.apache.org/beam-fn-api > > And for those of you who have been following this thread and contributors > focusing on Runner integration with Apache Beam: > * How to process a bundle: https://s.apache.org/beam-fn-api-processing-a- > bundle > * How to send and receive data: https://s.apache.org/ > beam-fn-api-send-and-receive-data > > If you want to dive deeper, you should look at: > * Runner API Protobuf: https://github.com/apache/beam/blob/master/sdks/ > common/runner-api/src/main/proto/beam_runner_api.proto > <https://github.com/apache/beam/blob/master/sdks/common/runner-api/src/main/proto/beam_runner_api.proto> > * Fn API Protobuf: https://github.com/apache/beam/blob/master/sdks/ > common/fn-api/src/main/proto/beam_fn_api.proto > <https://github.com/apache/beam/blob/master/sdks/common/fn-api/src/main/proto/beam_fn_api.proto> > * Java SDK Harness: https://github.com/apache/beam/tree/master/sdks/ > java/harness > <https://github.com/apache/beam/tree/master/sdks/java/harness> > * Python SDK Harness: https://github.com/apache/beam/tree/master/sdks/ > python/apache_beam/runners/worker > <https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/worker> > > Next I'm planning on talking about Beam Fn State API and will need help > from Runner contributors to talk 
about caching semantics and key spaces and > whether the integrations mesh well with current Runner implementations. The > State API is meant to support user state, side inputs, and re-iteration for > large values produced by GroupByKey. > > On Tue, Jan 24, 2017 at 9:46 AM, Lukasz Cwik wrote: > > > Yes, I was using a Pipeline that was: > > Read(10 GiBs of KV (10,000,000 values)) -> GBK -> IdentityParDo (a batch > > pipeline in the global window using the default trigger) > > > > In Google Cloud Dataflow, the shuffle step uses the binary representation > > to compare keys, so the above pipeline would normally be converted to the > > following two stages: > > Read -> GBK Writer > > GBK Reader -> IdentityParDo > > > > Note that the GBK Writer and GBK Reader need to use a coder to encode and > > decode the value. > > > > When using the Fn API, those two stages expanded because of the Fn Api > > crossings using a gRPC Write/Read pair: > > Read -> gRPC Write -> gRPC Read -> GBK Writer > > GBK Reader -> gRPC Write -> gRPC Read -> IdentityParDo > > > > In my naive prototype implementation, the coder was used to encode > > elements at the gRPC steps. This meant that the coder was > > encoding/decoding/encoding in the first stage and > > decoding/encoding/decoding in the second stage. This tripled the amount > of > > times the coder was being invoked per element. This additional use of the > > coder accounted for ~12% (80% of the 15%) of the extra execution time. > This > > implementation is quite inefficient and would benefit from merging the > gRPC > > Read + GBK Writer into one actor and also the GBK Reader + gRPC Write > into > > another actor allowing for the creation of a fast path that can skip > parts > > of the decode/encode cycle through the coder. By using a byte array view > > over the logical stream, one can minimize the number of byte array copies > > which plagued my naive implementation. 
This can be done by only parsing > the > > element boundaries out of the stream to produce those logical byte array > > views. I have a very rough estimate that performing this optimization > would > > reduce the 12% overhead to somewhere between 4% and 6%. > > > > The remaining 3% (15% - 12%) overhead went to many parts of gRPC: > > marshalling/unmarshalling protos > > handling/managing the socket > > flow control > > ... > > > > Finally, I did try experiments with different buffer sizes (10KB, 100KB, > > 1000KB), flow control (separate thread[1] vs same thread with phaser[2]), > > and channel type [3] (NIO, epoll, domain socket), but coder overhead easily > > dominated the differences in these other experiments.
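The proposed fast path — parsing only the element boundaries out of the logical stream and exposing the payloads as zero-copy views instead of running them through the coder — could look roughly like the sketch below. This is plain Python for illustration; the varint length-prefix framing and the `element_views` helper are assumptions, not the actual harness code.

```python
def write_varint(n, out):
    """Append n to out as a little-endian base-128 varint."""
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return

def read_varint(buf, pos):
    """Read a varint from buf starting at pos; return (value, new_pos)."""
    shift = result = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def element_views(stream):
    """Yield a zero-copy memoryview over each length-prefixed element.

    Only the boundaries are parsed; the payload bytes are never copied
    or decoded, which is the essence of the fast path described above.
    """
    view, pos = memoryview(stream), 0
    while pos < len(view):
        size, pos = read_varint(view, pos)
        yield view[pos:pos + size]
        pos += size

# Build a logical stream of three encoded elements, then slice it back out.
buf = bytearray()
for payload in (b"red", b"green", b"blue"):
    write_varint(len(payload), buf)
    buf += payload
print([bytes(v) for v in element_views(bytes(buf))])  # [b'red', b'green', b'blue']
```

A fused gRPC Read + GBK Writer actor could hand these views straight to the shuffle writer, skipping the decode/encode round trip entirely for elements that don't need inspection.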
Re: Beam Fn API
Now that I'm back from vacation and the 2.0.0 release is not taking all my time, I am focusing my attention on working on the Beam Portability framework, specifically the Fn API, so that we can get Python and other language integrations working with any runner. For newcomers, I would like to reshare the overview: https://s.apache.org/beam-fn-api And for those of you who have been following this thread and contributors focusing on Runner integration with Apache Beam: * How to process a bundle: https://s.apache.org/beam-fn-api-processing-a- bundle * How to send and receive data: https://s.apache.org/ beam-fn-api-send-and-receive-data If you want to dive deeper, you should look at: * Runner API Protobuf: https://github.com/apache/beam/blob/master/sdks/ common/runner-api/src/main/proto/beam_runner_api.proto * Fn API Protobuf: https://github.com/apache/beam/blob/master/sdks/ common/fn-api/src/main/proto/beam_fn_api.proto * Java SDK Harness: https://github.com/apache/beam/tree/master/sdks/ java/harness * Python SDK Harness: https://github.com/apache/beam/tree/master/sdks/ python/apache_beam/runners/worker Next I'm planning on talking about Beam Fn State API and will need help from Runner contributors to talk about caching semantics and key spaces and whether the integrations mesh well with current Runner implementations. The State API is meant to support user state, side inputs, and re-iteration for large values produced by GroupByKey.
On Tue, Jan 24, 2017 at 9:46 AM, Lukasz Cwik wrote: > Yes, I was using a Pipeline that was: > Read(10 GiBs of KV (10,000,000 values)) -> GBK -> IdentityParDo (a batch > pipeline in the global window using the default trigger) > > In Google Cloud Dataflow, the shuffle step uses the binary representation > to compare keys, so the above pipeline would normally be converted to the > following two stages: > Read -> GBK Writer > GBK Reader -> IdentityParDo > > Note that the GBK Writer and GBK Reader need to use a coder to encode and > decode the value. > > When using the Fn API, those two stages expanded because of the Fn Api > crossings using a gRPC Write/Read pair: > Read -> gRPC Write -> gRPC Read -> GBK Writer > GBK Reader -> gRPC Write -> gRPC Read -> IdentityParDo > > In my naive prototype implementation, the coder was used to encode > elements at the gRPC steps. This meant that the coder was > encoding/decoding/encoding in the first stage and > decoding/encoding/decoding in the second stage. This tripled the amount of > times the coder was being invoked per element. This additional use of the > coder accounted for ~12% (80% of the 15%) of the extra execution time. This > implementation is quite inefficient and would benefit from merging the gRPC > Read + GBK Writer into one actor and also the GBK Reader + gRPC Write into > another actor allowing for the creation of a fast path that can skip parts > of the decode/encode cycle through the coder. By using a byte array view > over the logical stream, one can minimize the number of byte array copies > which plagued my naive implementation. This can be done by only parsing the > element boundaries out of the stream to produce those logical byte array > views. I have a very rough estimate that performing this optimization would > reduce the 12% overhead to somewhere between 4% and 6%. 
> > The remaining 3% (15% - 12%) overhead went to many parts of gRPC: > marshalling/unmarshalling protos > handling/managing the socket > flow control > ... > > Finally, I did try experiments with different buffer sizes (10KB, 100KB, > 1000KB), flow control (separate thread[1] vs same thread with phaser[2]), > and channel type [3] (NIO, epoll, domain socket), but coder overhead easily > dominated the differences in these other experiments. > > Further analysis would need to be done to more accurately distill this > down. > > 1: https://github.com/lukecwik/incubator-beam/blob/ > fn_api/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/ > BufferingStreamObserver.java > 2: https://github.com/lukecwik/incubator-beam/blob/ > fn_api/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/stream/ > DirectStreamObserver.java > 3: https://github.com/lukecwik/incubator-beam/blob/ > fn_api/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/channel/ > ManagedChannelFactory.java > > > On Tue, Jan 24, 2017 at 8:04 AM, Ismaël Mejía wrote: > >> Awesome job Lukasz, Excellent, I have to confess the first time I heard >> about >> the Fn API idea I was a bit incredulous, but you are making it real, >> amazing! >> >> Just one question from your document, you said that 80% of the extra (15%) >> time >> goes into encoding and decoding the data for your test case.
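The "separate thread" flow-control variant linked above ([1], BufferingStreamObserver) boils down to a bounded buffer drained by a single dedicated sender thread. Here is a minimal Python analogue — an illustrative sketch of that pattern, not the actual Java implementation:

```python
import queue
import threading

_DONE = object()  # sentinel marking end of stream

class BufferingStreamObserver:
    """Decouples element production from stream I/O via a bounded queue."""
    def __init__(self, send, buffer_size=100):
        # A bounded queue gives back-pressure: producers block when full.
        self._queue = queue.Queue(maxsize=buffer_size)
        self._thread = threading.Thread(target=self._drain, args=(send,))
        self._thread.start()

    def _drain(self, send):
        # The only thread that touches the underlying stream, so no
        # per-element locking is needed on the send path.
        while True:
            item = self._queue.get()
            if item is _DONE:
                return
            send(item)

    def on_next(self, element):
        self._queue.put(element)  # blocks when the buffer is full

    def on_completed(self):
        self._queue.put(_DONE)
        self._thread.join()

sent = []
observer = BufferingStreamObserver(sent.append, buffer_size=10)
for i in range(25):
    observer.on_next(i)
observer.on_completed()
print(sent == list(range(25)))  # True: order preserved, drained by one thread
```

The "same thread with phaser" variant ([2]) instead sends on the caller's thread and blocks only when the transport signals it isn't ready, trading a thread handoff per element for occasional blocking.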
Re: Beam Fn API
org.apache.beam.fn.harness.fn: Java 8 functional interface extensions > > > > > > > > > > > > On Fri, Jan 20, 2017 at 1:26 PM, Kenneth Knowles > > > > > > > > > wrote: > > > > > > > > > This is awesome! Any chance you could roadmap the PR for us with > some > > > > links > > > > > into the most interesting bits? > > > > > > > > > > On Fri, Jan 20, 2017 at 12:19 PM, Robert Bradshaw < > > > > > rober...@google.com.invalid> wrote: > > > > > > > > > > > Also, note that we can still support the "simple" case. For > > example, > > > > > > if the user supplies us with a jar file (as they do now) a runner > > > > > > could launch it as a subprocesses and communicate with it via > this > > > > > > same Fn API or install it in a fixed container itself--the user > > > > > > doesn't *need* to know about docker or manually manage containers > > > (and > > > > > > indeed the Fn API could be used in-process, cross-process, > > > > > > cross-container, and even cross-machine). > > > > > > > > > > > > However docker provides a nice cross-language way of specifying > the > > > > > > environment including all dependencies (especially for languages > > like > > > > > > Python or C where the equivalent of a cross-platform, > > self-contained > > > > > > jar isn't as easy to produce) and is strictly more powerful and > > > > > > flexible (specifically it isolates the runtime environment and > one > > > can > > > > > > even use it for local testing). > > > > > > > > > > > > Slicing a worker up like this without sacrificing performance is > an > > > > > > ambitious goal, but essential to the story of being able to mix > and > > > > > > match runners and SDKs arbitrarily, and I think this is a great > > > start.
> > > > > > > > > > > > > > > > > > On Fri, Jan 20, 2017 at 9:39 AM, Lukasz Cwik > > > > > > > > > > > > > wrote: > > > > > > > Your correct, a docker container is created that contains the > > > > execution > > > > > > > environment the user wants or the user re-uses an existing one > > > > > (allowing > > > > > > > for a user to embed all their code/dependencies or use a > > container > > > > that > > > > > > can > > > > > > > deploy code/dependencies on demand). > > > > > > > A user creates a pipeline saying which docker container they > want > > > to > > > > > use > > > > > > > (this starts to allow for multiple container definitions > within a > > > > > single > > > > > > > pipeline to support multiple languages, versioning, ...). > > > > > > > A runner would then be responsible for launching one or more of > > > these > > > > > > > containers in a cluster manager of their choice (scaling up or > > down > > > > the > > > > > > > number of instances depending on demand/load/...). > > > > > > > A runner then interacts with the docker containers over the > gRPC > > > > > service > > > > > > > definitions to delegate processing to. > > > > > > > > > > > > > > > > > > > > > On Fri, Jan 20, 2017 at 4:56 AM, Jean-Baptiste Onofré < > > > > j...@nanthrax.net > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > >> Hi Luke, > > > > > > >> > > > > > > >> that's really great and very promising ! > > > > > > >> > > > > > > >> It's really ambitious but I like the idea. Just to clarify: > the > > > > > purpose > > > > > > of > > > > > > >> using gRPC is once the docker container is running, then we > can > > > > > > "interact" > > > > > > >> with the container to spread and delegate processing to the > > docker > > > > > > >> container, correct ? > > > > > > >> The users/devops have to setup the docker containers as > > > > prerequisite. 
> > > > > > >> Then, the "location" of the containers (kind of container > > > registry) > > > > is > > > > > > set > > > > > > >> via the pipeline options and used by gRPC ? > > > > > > >> > > > > > > >> Thanks Luke ! > > > > > > >> > > > > > > >> Regards > > > > > > >> JB > > > > > > >> > > > > > > >> > > > > > > >> On 01/19/2017 03:56 PM, Lukasz Cwik wrote: > > > > > > >> > > > > > > >>> I have been prototyping several components towards the Beam > > > > technical > > > > > > >>> vision of being able to execute an arbitrary language using > an > > > > > > arbitrary > > > > > > >>> runner. > > > > > > >>> > > > > > > >>> I would like to share this overview [1] of what I have been > > > working > > > > > > >>> towards. I also share this PR [2] with a proposed API, > service > > > > > > definitions > > > > > > >>> and partial implementation. > > > > > > >>> > > > > > > >>> 1: https://s.apache.org/beam-fn-api > > > > > > >>> 2: https://github.com/apache/beam/pull/1801 > > > > > > >>> > > > > > > >>> Please comment on the overview within this thread, and any > > > specific > > > > > > code > > > > > > >>> comments on the PR directly. > > > > > > >>> > > > > > > >>> Luke > > > > > > >>> > > > > > > >>> > > > > > > >> -- > > > > > > >> Jean-Baptiste Onofré > > > > > > >> jbono...@apache.org > > > > > > >> http://blog.nanthrax.net > > > > > > >> Talend - http://www.talend.com > > > > > > >> > > > > > > > > > > > > > > > > > > > > >
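The container/runner interaction described in this exchange — each pipeline names the docker container(s) it needs, and the runner launches and scales harnesses per environment — can be modeled with a toy sketch. All names below (the dataclasses, the image URLs, `harnesses_to_launch`) are hypothetical illustrations, not Beam's actual protos or APIs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Environment:
    """An execution environment: a docker image with the user's code/deps."""
    docker_image: str

@dataclass
class Transform:
    """A pipeline step tagged with the environment it must run in."""
    name: str
    environment: Environment

def harnesses_to_launch(transforms):
    """Return the distinct container images a runner must start.

    A real runner would also scale instance counts per environment
    based on load, as described in the thread.
    """
    return sorted({t.environment.docker_image for t in transforms})

# A multi-language pipeline referencing two container definitions.
py_env = Environment("example.org/beam/python-sdk:latest")
java_env = Environment("example.org/beam/java-sdk:latest")
pipeline = [
    Transform("Read", java_env),
    Transform("ParDo(PyFn)", py_env),
    Transform("GBK", java_env),
]
print(harnesses_to_launch(pipeline))
```

Once the containers are running, the runner talks to each harness over the gRPC service definitions (control, data, logging) to delegate bundle processing — the "interact" step JB asks about.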
Re: Beam Fn API
> > > Unless you're really interested in how domain sockets, epoll, nio channel > > > factories or how stream readiness callbacks work in gRPC, I would avoid > > the > > > packages org.apache.beam.fn.harness.channel and > > > org.apache.beam.fn.harness.stream. Similarly I would avoid > > > org.apache.beam.fn.harness.fn and org.apache.beam.fn.harness.fake as > > they > > > don't add anything meaningful to the API. > > > > > > Code package descriptions: > > > > > > org.apache.beam.fn.harness.FnHarness: main entry point > > > org.apache.beam.fn.harness.control: Control service client and > > individual > > > request handlers > > > org.apache.beam.fn.harness.data: Data service client and logical > stream > > > multiplexing > > > org.apache.beam.runners.core: Additional runners akin to the DoFnRunner > > > found in runners-core to support sources and gRPC endpoints > > > org.apache.beam.fn.harness.logging: Logging client implementation and > > JUL > > > logging handler adapter > > > org.apache.beam.fn.harness.channel: gRPC channel management > > > org.apache.beam.fn.harness.stream: gRPC stream management > > > org.apache.beam.fn.harness.fn: Java 8 functional interface extensions > > > > > > > > > On Fri, Jan 20, 2017 at 1:26 PM, Kenneth Knowles > > > > > > wrote: > > > > This is awesome! Any chance you could roadmap the PR for us with some > > > links > > > > into the most interesting bits? > > > > > > > > On Fri, Jan 20, 2017 at 12:19 PM, Robert Bradshaw < > > > > rober...@google.com.invalid> wrote: > > > > > > > > > Also, note that we can still support the "simple" case.
For > example, > > > > > if the user supplies us with a jar file (as they do now) a runner > > > > > could launch it as a subprocesses and communicate with it via this > > > > > same Fn API or install it in a fixed container itself--the user > > > > > doesn't *need* to know about docker or manually manage containers > > (and > > > > > indeed the Fn API could be used in-process, cross-process, > > > > > cross-container, and even cross-machine). > > > > > > > > > > However docker provides a nice cross-language way of specifying the > > > > > environment including all dependencies (especially for languages > like > > > > > Python or C where the equivalent of a cross-platform, > self-contained > > > > > jar isn't as easy to produce) and is strictly more powerful and > > > > > flexible (specifically it isolates the runtime environment and one > > can > > > > > even use it for local testing). > > > > > > > > > > Slicing a worker up like this without sacrificing performance is an > > > > > ambitious goal, but essential to the story of being able to mix and > > > > > match runners and SDKs arbitrarily, and I think this is a great > > start. > > > > > > > > > > > > > > > On Fri, Jan 20, 2017 at 9:39 AM, Lukasz Cwik > > > > > > > > > > wrote: > > > > > > Your correct, a docker container is created that contains the > > > execution > > > > > > environment the user wants or the user re-uses an existing one > > > > (allowing > > > > > > for a user to embed all their code/dependencies or use a > container > > > that > > > > > can > > > > > > deploy code/dependencies on demand). > > > > > > A user creates a pipeline saying which docker container they want > > to > > > > use > > > > > > (this starts to allow for multiple container definitions within a > > > > single > > > > > > pipeline to support multiple languages, versioning, ...). 
> > > > > > A runner would then be responsible for launching one or more of > > these > > > > > > containers in a cluster manager of their choice (scaling up or > down > > > the > > > > > > number of instances depending on demand/load/...). > > > > > > A runner then interacts with the docker containers over the gRPC > > > > service > > > > > > definitions to delegate processing to. > > > > >
Re: Beam Fn API
org.apache.beam.runners.core: Additional runners akin to the DoFnRunner found in runners-core to support sources and gRPC endpoints > > org.apache.beam.fn.harness.logging: Logging client implementation and > JUL > > logging handler adapter > > org.apache.beam.fn.harness.channel: gRPC channel management > > org.apache.beam.fn.harness.stream: gRPC stream management > > org.apache.beam.fn.harness.fn: Java 8 functional interface extensions > > > > > > On Fri, Jan 20, 2017 at 1:26 PM, Kenneth Knowles > > > wrote: > > > > This is awesome! Any chance you could roadmap the PR for us with some > > links > > > into the most interesting bits? > > > > > > On Fri, Jan 20, 2017 at 12:19 PM, Robert Bradshaw < > > > rober...@google.com.invalid> wrote: > > > > > > > Also, note that we can still support the "simple" case. For example, > > > > if the user supplies us with a jar file (as they do now) a runner > > > > could launch it as a subprocesses and communicate with it via this > > > > same Fn API or install it in a fixed container itself--the user > > > > doesn't *need* to know about docker or manually manage containers > (and > > > > indeed the Fn API could be used in-process, cross-process, > > > > cross-container, and even cross-machine). > > > > > > > > However docker provides a nice cross-language way of specifying the > > > > environment including all dependencies (especially for languages like > > > > Python or C where the equivalent of a cross-platform, self-contained > > > > jar isn't as easy to produce) and is strictly more powerful and > > > > flexible (specifically it isolates the runtime environment and one > can > > > > even use it for local testing). > > > > > > > > Slicing a worker up like this without sacrificing performance is an > > > > ambitious goal, but essential to the story of being able to mix and > > > > match runners and SDKs arbitrarily, and I think this is a great > start.
> > > > > > > > > > > > On Fri, Jan 20, 2017 at 9:39 AM, Lukasz Cwik > > > > > > > wrote: > > > > > Your correct, a docker container is created that contains the > > execution > > > > > environment the user wants or the user re-uses an existing one > > > (allowing > > > > > for a user to embed all their code/dependencies or use a container > > that > > > > can > > > > > deploy code/dependencies on demand). > > > > > A user creates a pipeline saying which docker container they want > to > > > use > > > > > (this starts to allow for multiple container definitions within a > > > single > > > > > pipeline to support multiple languages, versioning, ...). > > > > > A runner would then be responsible for launching one or more of > these > > > > > containers in a cluster manager of their choice (scaling up or down > > the > > > > > number of instances depending on demand/load/...). > > > > > A runner then interacts with the docker containers over the gRPC > > > service > > > > > definitions to delegate processing to. > > > > > > > > > > > > > > > On Fri, Jan 20, 2017 at 4:56 AM, Jean-Baptiste Onofré < > > j...@nanthrax.net > > > > > > > > > wrote: > > > > > > > > > >> Hi Luke, > > > > >> > > > > >> that's really great and very promising ! > > > > >> > > > > >> It's really ambitious but I like the idea. Just to clarify: the > > > purpose > > > > of > > > > >> using gRPC is once the docker container is running, then we can > > > > "interact" > > > > >> with the container to spread and delegate processing to the docker > > > > >> container, correct ? > > > > >> The users/devops have to setup the docker containers as > > prerequisite. > > > > >> Then, the "location" of the containers (kind of container > registry) > > is > > > > set > > > > >> via the pipeline options and used by gRPC ? > > > > >> > > > > >> Thanks Luke ! 
> > > > >> > > > > >> Regards > > > > >> JB > > > > >> > > > > >> > > > > >> On 01/19/2017 03:56 PM, Lukasz Cwik wrote: > > > > >> > > > > >>> I have been prototyping several components towards the Beam > > technical > > > > >>> vision of being able to execute an arbitrary language using an > > > > arbitrary > > > > >>> runner. > > > > >>> > > > > >>> I would like to share this overview [1] of what I have been > working > > > > >>> towards. I also share this PR [2] with a proposed API, service > > > > definitions > > > > >>> and partial implementation. > > > > >>> > > > > >>> 1: https://s.apache.org/beam-fn-api > > > > >>> 2: https://github.com/apache/beam/pull/1801 > > > > >>> > > > > >>> Please comment on the overview within this thread, and any > specific > > > > code > > > > >>> comments on the PR directly. > > > > >>> > > > > >>> Luke > > > > >>> > > > > >>> > > > > >> -- > > > > >> Jean-Baptiste Onofré > > > > >> jbono...@apache.org > > > > >> http://blog.nanthrax.net > > > > >> Talend - http://www.talend.com > > > > >> > > > > > > > > > >
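The logging piece listed above (a JUL handler adapter that forwards records to a logging client) can be illustrated in miniature. The sketch below is a toy analogue only — the class and client names are hypothetical and not the actual org.apache.beam.fn.harness.logging API:

```python
import logging

class ForwardingHandler(logging.Handler):
    """Toy analogue of a logging handler adapter: converts each framework
    log record into a plain dict and hands it to a client for transport."""

    def __init__(self, client):
        super().__init__()
        self.client = client  # stand-in for a gRPC logging client

    def emit(self, record):
        # Convert the framework's record into a transport-friendly message.
        self.client.send({
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        })

class BufferingClient:
    """Stand-in for the remote logging service; buffers messages locally."""
    def __init__(self):
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)

client = BufferingClient()
logger = logging.getLogger("harness-demo")
logger.addHandler(ForwardingHandler(client))
logger.setLevel(logging.INFO)
logger.info("bundle started")
```

In the real harness the client side of this adapter would stream records over the gRPC logging channel rather than buffering them in memory.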
Re: Beam Fn API
This is awesome! Any chance you could roadmap the PR for us with some links into the most interesting bits?
Re: Beam Fn API
Also, note that we can still support the "simple" case. For example, if the user supplies us with a jar file (as they do now) a runner could launch it as a subprocess and communicate with it via this same Fn API, or install it in a fixed container itself -- the user doesn't *need* to know about docker or manually manage containers (and indeed the Fn API could be used in-process, cross-process, cross-container, and even cross-machine).

However, docker provides a nice cross-language way of specifying the environment, including all dependencies (especially for languages like Python or C where the equivalent of a cross-platform, self-contained jar isn't as easy to produce), and is strictly more powerful and flexible (specifically, it isolates the runtime environment, and one can even use it for local testing).

Slicing a worker up like this without sacrificing performance is an ambitious goal, but essential to the story of being able to mix and match runners and SDKs arbitrarily, and I think this is a great start.
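The point that the same Fn API can cross a process boundary or none at all can be sketched with a toy version: one bundle-processing function exposed through two "channels" with the same interface, one a direct call and one shipping JSON through a child interpreter. All names below are illustrative, not Beam APIs:

```python
import json
import subprocess
import sys

def process_bundle(elements):
    """The 'SDK side' of a toy Fn API: double every element."""
    return [x * 2 for x in elements]

class InProcessChannel:
    """Same contract, no process boundary: a direct function call."""
    def run(self, elements):
        return process_bundle(elements)

class SubprocessChannel:
    """Same contract across a process boundary: ship JSON through a
    child interpreter's stdin/stdout (a stand-in for gRPC to a
    container or remote machine)."""
    CHILD = (
        "import json,sys;"
        "data=json.load(sys.stdin);"
        "print(json.dumps([x*2 for x in data]))"
    )

    def run(self, elements):
        out = subprocess.run(
            [sys.executable, "-c", self.CHILD],
            input=json.dumps(elements),
            capture_output=True, text=True, check=True,
        )
        return json.loads(out.stdout)

local = InProcessChannel().run([1, 2, 3])
remote = SubprocessChannel().run([1, 2, 3])
```

Because a runner only depends on the channel contract, swapping the in-process path for the cross-process one (or a cross-machine one) changes nothing about how bundles are described or results returned.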
Re: Beam Fn API
You're correct: a docker container is created that contains the execution environment the user wants, or the user re-uses an existing one (allowing a user to embed all their code/dependencies, or to use a container that can deploy code/dependencies on demand).

A user creates a pipeline saying which docker container they want to use (this starts to allow for multiple container definitions within a single pipeline, to support multiple languages, versioning, ...).

A runner would then be responsible for launching one or more of these containers in a cluster manager of their choice (scaling the number of instances up or down depending on demand/load/...).

A runner then interacts with the docker containers over the gRPC service definitions to delegate processing to them.
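The runner-to-harness delegation described above can be sketched as a toy control channel: length-prefixed JSON over a local socket stands in for the gRPC service definitions, and a thread stands in for a launched container. The message shapes and names here are invented for illustration, not the actual Fn API protos:

```python
import json
import socket
import struct
import threading

def send_msg(sock, obj):
    """Length-prefixed JSON framing: a toy stand-in for gRPC."""
    data = json.dumps(obj).encode()
    sock.sendall(struct.pack(">I", len(data)) + data)

def recv_msg(sock):
    (length,) = struct.unpack(">I", sock.recv(4))
    return json.loads(sock.recv(length).decode())

def harness(sock):
    """Toy SDK harness: waits for an instruction, runs the 'user fn',
    and reports the result back over the control channel."""
    request = recv_msg(sock)
    if request["type"] == "process_bundle":
        result = [x * 2 for x in request["elements"]]  # the "user fn"
        send_msg(sock, {"id": request["id"], "output": result})

# The runner end: launch the harness (here a thread; in Beam, a docker
# container) and delegate a bundle to it over the control channel.
runner_sock, harness_sock = socket.socketpair()
worker = threading.Thread(target=harness, args=(harness_sock,))
worker.start()
send_msg(runner_sock, {"type": "process_bundle", "id": 1, "elements": [1, 2, 3]})
response = recv_msg(runner_sock)
worker.join()
```

Nothing in the runner end cares whether the other side of the socket is a thread, a subprocess, or a container on another machine — which is exactly why the runner can scale harness instances up or down independently.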
Re: Beam Fn API
Hi Luke,

that's really great and very promising!

It's really ambitious but I like the idea. Just to clarify: the purpose of using gRPC is that once the docker container is running, we can "interact" with the container to spread and delegate processing to it, correct? The users/devops have to set up the docker containers as a prerequisite. Then, the "location" of the containers (a kind of container registry) is set via the pipeline options and used by gRPC?

Thanks Luke!

Regards
JB

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Beam Fn API
"relatively little extra work" once the base APIs are implemented.

On Thu, Jan 19, 2017 at 11:26 PM, Dan Halperin wrote:
> This is an extremely ambitious part of the technical vision. I think it's
> a lot of work, but well worth it -- Python-SDK-on-Java-runner with
> relatively extra work?
Re: Beam Fn API
This is an extremely ambitious part of the technical vision. I think it's a lot of work, but well worth it -- Python-SDK-on-Java-runner with relatively extra work? I don't care what the overhead is, this is making the impossible possible.
Beam Fn API
I have been prototyping several components towards the Beam technical vision of being able to execute an arbitrary language using an arbitrary runner.

I would like to share this overview [1] of what I have been working towards. I also share this PR [2] with a proposed API, service definitions and partial implementation.

1: https://s.apache.org/beam-fn-api
2: https://github.com/apache/beam/pull/1801

Please comment on the overview within this thread, and any specific code comments on the PR directly.

Luke