Kenneth’s idea of using sketches as state through the State API is really interesting and opens up use cases I had not considered; I think it is an appealing use of the sketches. Note that this work originated on the statistics side: we were interested in data sketches (especially the cardinality ones) as a ‘lightweight’ way to get approximate metrics.

There are two pending subjects to discuss:

1. Having sketches as approximate metrics seems interesting; however, the current Beam Metrics API does not allow user-defined metrics. I don’t know the details of the current metrics implementation. Would it eventually be possible to support this, i.e. to extend metrics so that they can reuse something like the sketches extension?

2. There is also another contribution from Arnaud, in case there is interest: a transform for standard deviation. We decided not to include it in the sketches extension because it is not consistent with the approximate nature of the extension, but I think it could be an interesting contribution as a subsequent PR (if there is interest in this too).

Regards,
Ismaël

On Sat, Aug 12, 2017 at 11:20 AM, Arnaud Fournier <arnaudfournier...@gmail.com> wrote:

> Hello Kenneth, thank you for your answer.
>
> I read your blog post about stateful processing and that is indeed a great
> feature !
>
> So if I understood correctly we could use the combineFns to declare
> combiningStates so it can be used while processing elements in a DoFn. That
> opens up a lot more use cases for the sketches !
>
> Actually this was already possible for 2 sketches but now I refined the
> constructors of the 2 other sketches, and will do so for the other ones to
> come.
>
> Regards,
>
> Arnaud
>
> 2017-08-08 2:07 GMT+02:00 Kenneth Knowles <k...@google.com.invalid>:
>
>> This is a great development! I have wanted Beam to have a library of
>> sketches.
>>
>> What Eugene is referring to is the fact that you can write
>> Combine.perKey(combineFn) to use these in a transform but also
>> StateSpecs.combiningState(combineFn) to use them in a stateful ParDo. So it
>> is good to make the CombineFn public and refine their constructors to be
>> user-friendly.
>>
>> Kenn
>>
>> On Fri, Aug 4, 2017 at 7:45 AM, Arnaud Fournier <arnaudfournier...@gmail.com> wrote:
>>
>> > Thanks for your comments, that is very encouraging !
>> >
>> > I have created a Jira : https://issues.apache.org/jira/browse/BEAM-2728
>> > and a PR : https://github.com/apache/beam/pull/3686
>> >
>> > Eugene and Lucas I saw that you already have some ideas so I put you as
>> > reviewers,
>> > I look forward to hear more from you.
>> >
>> > With Ismael and JB, we already thought about using some of these indicators
>> > as metric cells,
>> > as it can be useful for some kinds of monitoring.
>> > But I have never heard about state cells, is it something like the
>> > QuantileState in ApproximateQuantiles ?
>> >
>> > 2017-08-04 3:14 GMT+02:00 Anand Iyer <iyer.anan...@gmail.com>:
>> >
>> > > This is awesome!! Very exciting to see the addition of statistical and
>> > > data-mining algorithms to Apache Beam.
>> > >
>> > > On Thu, Aug 3, 2017 at 2:32 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote:
>> > >
>> > > > +1, Very exciting! I have some suggestions on the exact API to expose (e.g.
>> > > > I think it makes sense to expose the CombineFn's directly, so that they can
>> > > > also be used for combining state cells and not just as PTransforms), but
>> > > > that can be handled during regular code review.
>> > > >
>> > > > On Thu, Aug 3, 2017 at 2:23 PM Sourabh Bajaj
>> > > > <sourabhba...@google.com.invalid> wrote:
>> > > >
>> > > > > +1 to this.
>> > > > >
>> > > > > On Thu, Aug 3, 2017 at 6:28 AM Lukasz Cwik <lc...@google.com.invalid> wrote:
>> > > > >
>> > > > > > I'm most interested in the frequency / cardinality tools as it could be
>> > > > > > used to help improve performance automatically for combiners by detecting
>> > > > > > the few keys case or automatically handle hot keys without needing users to
>> > > > > > specify the hints when they use a combiner.
>> > > > > >
>> > > > > > On Thu, Aug 3, 2017 at 5:35 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Nice work Arnaud ;)
>> > > > > > >
>> > > > > > > Happy to have been able to help.
>> > > > > > >
>> > > > > > > Let's see what the others will think about this.
>> > > > > > >
>> > > > > > > Regards
>> > > > > > > JB
>> > > > > > >
>> > > > > > > On 08/03/2017 02:32 PM, Arnaud Fournier wrote:
>> > > > > > >
>> > > > > > >> Hello everyone,
>> > > > > > >>
>> > > > > > >> My name is Arnaud Fournier and I am a CS student. I am currently doing an
>> > > > > > >> internship at Talend.
>> > > > > > >>
>> > > > > > >> With the support of Jean-Baptiste Onofre and Ismaël Mejia, I have been
>> > > > > > >> working on statistical analysis of streams with Beam, using probabilistic
>> > > > > > >> data structures like HyperLogLog.
>> > > > > > >>
>> > > > > > >> I would like to share this work with the community, but I wanted first to
>> > > > > > >> show you my work in progress and ask you if this humble contribution could
>> > > > > > >> be interesting as an extension.
>> > > > > > >>
>> > > > > > >> I have made a little doc with more details about what I have done in case
>> > > > > > >> you are interested and want to give me some feedback :
>> > > > > > >> https://docs.google.com/document/d/1Xy6g5RPBYX_HadpIr_2WrUeusiwL0Jo2ACI5PEOP1kc/edit
>> > > > > > >>
>> > > > > > >> You can also find the current work implementation in progress here :
>> > > > > > >> https://github.com/ArnaudFnr/beam/tree/sketching/sdks/java/extensions/sketching
>> > > > > > >>
>> > > > > > >> Thanks !
>> > > > > > >>
>> > > > > > >> Arnaud
>> > > > > > >>
>> > > > > > > --
>> > > > > > > Jean-Baptiste Onofré
>> > > > > > > jbono...@apache.org
>> > > > > > > http://blog.nanthrax.net
>> > > > > > > Talend - http://www.talend.com
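
Editorial sketch: the thread turns on the fact that a CombineFn can back both a Combine transform and a combining state cell. What makes that possible is the accumulator contract Beam's CombineFn defines (createAccumulator, addInput, mergeAccumulators, extractOutput): partial results can be built independently and merged later. The block below is a minimal, Beam-free illustration of that contract, using the standard deviation case mentioned in the thread. The class name `RunningStats` and its methods are hypothetical; this is not Arnaud's transform and not a Beam class. It assumes Welford's update for single values and the pairwise merge of Chan et al. for combining accumulators.

```java
/** Mergeable accumulator for mean and standard deviation (hypothetical, not part of Beam). */
final class RunningStats {
    private long count;   // number of values seen so far
    private double mean;  // running mean
    private double m2;    // sum of squared deviations from the running mean

    /** Folds one value into the accumulator (Welford's update); plays the role of addInput. */
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    /** Merges another accumulator into this one; plays the role of mergeAccumulators. */
    void merge(RunningStats other) {
        if (other.count == 0) {
            return;
        }
        if (count == 0) {
            count = other.count;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = count + other.count;
        double delta = other.mean - mean;
        // Pairwise update: combine both spreads plus the shift between the two means.
        m2 += other.m2 + delta * delta * ((double) count * other.count / total);
        mean += delta * other.count / total;
        count = total;
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }

    /** Population standard deviation; NaN when empty. Plays the role of extractOutput. */
    double stdDev() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        RunningStats left = new RunningStats();
        RunningStats right = new RunningStats();
        // Split the input across two accumulators, as parallel workers would.
        for (int i = 0; i < data.length; i++) {
            (i < 3 ? left : right).add(data[i]);
        }
        left.merge(right);
        System.out.println("mean=" + left.mean() + " stdDev=" + left.stdDev());
    }
}
```

Because `merge` gives the same result as feeding every value into one accumulator (up to floating-point rounding), the same logic could in principle sit behind either a grouped combine or a per-key state cell. An approximate sketch such as HyperLogLog follows the same shape, with the merge step combining registers instead of moments.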