I agree with you, Aljoscha. A data-driven approach, summarizing test results to show which features work and using benchmarks to show which ones scale, seems like a great way to differentiate runners' strengths.
On Mon, Aug 28, 2017 at 8:39 AM, Aljoscha Krettek <aljos...@apache.org> wrote:

> I like where this is going!
>
> Regarding benchmarking, I think we could do this if we had common
> benchmarking infrastructure and pipelines that regularly run on different
> Runners so that we have up-to-date data.
>
> I think we can also have a more technical section where we show stats on
> the level of support via the excluded ValidatesRunner tests. This is hard
> data that we have on every Runner and we can annotate it to explain why a
> certain Runner has a given restriction. This is a bit different from what
> Kenn initially suggested but I think we should have both. Plus, this very
> clearly specifies what feature is (somewhat) validated to work in a given
> Runner.
>
> Regarding PCollectionView support in Flink, I think this actually works
> and the ValidatesRunner tests pass for this. Not sure what is going on in
> that test case yet. For reference, this is the issue:
> https://issues.apache.org/jira/browse/BEAM-2806
>
> Best,
> Aljoscha
>
> > On 23. Aug 2017, at 21:24, Mingmin Xu <mingm...@gmail.com> wrote:
> >
> > I would like to have API compatibility testing. AFAIK there's still a
> > gap to achieving our goal (one job for any runner), which means
> > developers should be aware of the limitations when writing a job. For
> > example, PCollectionView is not well supported in FlinkRunner (not
> > quite sure of the current status, as my test job is broken) or in
> > SparkRunner streaming.
> >
> >> 5. Reorganize the windowing section to be just support for merging /
> >> non-merging windowing.
> > sliding/fixed_window/session is more straightforward to me;
> > merging/non-merging is more about the backend implementation.
> >
> > On Tue, Aug 22, 2017 at 7:28 PM, Kenneth Knowles <k...@google.com.invalid>
> > wrote:
> >
> >> Oh, I missed
> >>
> >> 11. Quantitative properties. This seems like an interesting and
> >> important project all on its own. Since Beam is so generic, we need
> >> pretty diverse measurements for a user to have a hope of extrapolating
> >> to their use case.
> >>
> >> Kenn
> >>
> >> On Tue, Aug 22, 2017 at 7:22 PM, Kenneth Knowles <k...@google.com> wrote:
> >>
> >>> OK, so adding these good ideas to the list:
> >>>
> >>> 8. Plain-English summary that comes before the nitty-gritty.
> >>> 9. Comment on production readiness from maintainers. Maybe
> >>> testimonials are helpful if they can be obtained?
> >>> 10. Versioning of all of the above
> >>>
> >>> Any more thoughts? I'll summarize in a JIRA in a bit.
> >>>
> >>> Kenn
> >>>
> >>> On Tue, Aug 22, 2017 at 10:45 AM, Griselda Cuevas <g...@google.com.invalid>
> >>> wrote:
> >>>
> >>>> Hi, I'd also like to ask if versioning as proposed in BEAM-166
> >>>> <https://issues.apache.org/jira/browse/BEAM-166> is still relevant?
> >>>> If it is, would this be something we want to add to this proposal?
> >>>>
> >>>> G
> >>>>
> >>>> On 21 August 2017 at 08:31, Tyler Akidau <taki...@google.com.invalid>
> >>>> wrote:
> >>>>
> >>>>> Is there any way we could add quantitative runner metrics to this
> >>>>> as well? Like by having some benchmarks that process X amount of
> >>>>> data, and then detailing in the matrix the latency, throughput, and
> >>>>> (where possible) cost numbers for each of the given runners?
> >>>>> Semantic support is one thing, but there are other differences
> >>>>> between runners that aren't captured by just checking feature
> >>>>> boxes. I'd be curious if anyone has other ideas in this vein as
> >>>>> well. The benchmark idea might not be the best way to go about it.
> >>>>>
> >>>>> -Tyler
> >>>>>
> >>>>> On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson <je...@bigdatainstitute.io>
> >>>>> wrote:
> >>>>>
> >>>>>> It'd be awesome to see these updated. I'd add two more:
> >>>>>>
> >>>>>> 1. A plain-English summary of the runner's support in Beam. People
> >>>>>> who are new to Beam won't understand the in-depth coverage and
> >>>>>> need a general idea of how well it is supported.
> >>>>>> 2. The production readiness of the runner. Does the maintainer
> >>>>>> think this runner is production ready?
> >>>>>>
> >>>>>> On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <k...@google.com.invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I want to revamp
> >>>>>>> https://beam.apache.org/documentation/runners/capability-matrix/
> >>>>>>>
> >>>>>>> When Beam first started, we didn't work on feature branches for
> >>>>>>> the core runners, and they had a lot more gaps compared to what
> >>>>>>> goes on `master` today, so this tracked our progress in a way
> >>>>>>> that was easy for users to read. Now it is still our best/only
> >>>>>>> comparison page for users, but I think we could improve its
> >>>>>>> usefulness.
> >>>>>>>
> >>>>>>> For the benefit of the thread, let me inline all the capabilities
> >>>>>>> fully here:
> >>>>>>>
> >>>>>>> ========================
> >>>>>>>
> >>>>>>> "What is being computed?"
> >>>>>>> - ParDo
> >>>>>>> - GroupByKey
> >>>>>>> - Flatten
> >>>>>>> - Combine
> >>>>>>> - Composite Transforms
> >>>>>>> - Side Inputs
> >>>>>>> - Source API
> >>>>>>> - Splittable DoFn
> >>>>>>> - Metrics
> >>>>>>> - Stateful Processing
> >>>>>>>
> >>>>>>> "Where in event time?"
> >>>>>>> - Global windows
> >>>>>>> - Fixed windows
> >>>>>>> - Sliding windows
> >>>>>>> - Session windows
> >>>>>>> - Custom windows
> >>>>>>> - Custom merging windows
> >>>>>>> - Timestamp control
> >>>>>>>
> >>>>>>> "When in processing time?"
> >>>>>>> - Configurable triggering
> >>>>>>> - Event-time triggers
> >>>>>>> - Processing-time triggers
> >>>>>>> - Count triggers
> >>>>>>> - [Meta]data driven triggers
> >>>>>>> - Composite triggers
> >>>>>>> - Allowed lateness
> >>>>>>> - Timers
> >>>>>>>
> >>>>>>> "How do refinements relate?"
> >>>>>>> - Discarding
> >>>>>>> - Accumulating
> >>>>>>> - Accumulating & Retracting
> >>>>>>>
> >>>>>>> ========================
> >>>>>>>
> >>>>>>> Here are some issues I'd like to improve:
> >>>>>>>
> >>>>>>> - Rows that are impossible to not support (ParDo)
> >>>>>>> - Rows where "support" doesn't really make sense (Composite
> >>>>>>>   transforms)
> >>>>>>> - Rows that are actually the same model feature (non-merging
> >>>>>>>   windowfns)
> >>>>>>> - Rows that represent optimizations (Combine)
> >>>>>>> - Rows in the wrong place (Timers)
> >>>>>>> - Rows that have not been designed ([Meta]data driven triggers)
> >>>>>>> - Rows with names that appear nowhere else (Timestamp control)
> >>>>>>> - No place to compare non-model differences between runners
> >>>>>>>
> >>>>>>> I'm still pondering how to improve this, but I thought I'd send
> >>>>>>> the notion out for discussion. Some imperfect ideas I've had:
> >>>>>>>
> >>>>>>> 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window)
> >>>>>>> into one row
> >>>>>>> 2. Make sections as users see them, like "ParDo" / "Side Inputs",
> >>>>>>> not "What?" / "side inputs"
> >>>>>>> 3. Add rows for non-model things, like portability framework
> >>>>>>> support, metrics backends, etc.
> >>>>>>> 4. Drop rows that are not informative, like Composite transforms,
> >>>>>>> or not designed
> >>>>>>> 5. Reorganize the windowing section to be just support for
> >>>>>>> merging / non-merging windowing.
> >>>>>>> 6. Switch to a more distinct color scheme than the solid vs.
> >>>>>>> faded colors currently used.
> >>>>>>> 7. Find a web design to get short descriptions into the
> >>>>>>> foreground to make it easier to grok.
> >>>>>>>
> >>>>>>> These are just a few thoughts, and not necessarily compatible
> >>>>>>> with each other. What do you think?
> >>>>>>>
> >>>>>>> Kenn
> >>>>>>
> >>>>>> --
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Jesse
> >
> > --
> > ----
> > Mingmin
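As a rough illustration of Aljoscha's proposal to derive "hard data" support stats from the excluded ValidatesRunner tests: the sketch below (in Python, for brevity) shows one possible data shape. The category names, exclusion reasons, and data layout here are all hypothetical, invented for illustration; the real exclusions live in each runner's build configuration and use Beam's own test-category annotations.

```python
# Sketch: compute per-runner support stats from an annotated list of
# excluded ValidatesRunner test categories. All names and reasons below
# are hypothetical placeholders, not Beam's actual exclusion lists.

# Per runner: {excluded category -> annotation explaining the restriction}
EXCLUSIONS = {
    "FlinkRunner": {
        "UsesSetState": "state backend integration not finished",
    },
    "SparkRunner": {
        "UsesSplittableParDo": "SDF not implemented",
        "UsesTestStream": "no TestStream support in streaming",
    },
    "DataflowRunner": {},  # no exclusions in this made-up example
}

# The full universe of test categories the matrix would report on.
ALL_CATEGORIES = {
    "UsesSetState",
    "UsesSplittableParDo",
    "UsesTestStream",
    "UsesTimersInParDo",
    "UsesSideInputs",
}


def support_stats(exclusions, all_categories):
    """For each runner, report validated categories, a coverage fraction,
    and the annotated reason for every gap."""
    stats = {}
    for runner, excluded in exclusions.items():
        supported = all_categories - set(excluded)
        stats[runner] = {
            "supported": sorted(supported),
            "coverage": len(supported) / len(all_categories),
            "gaps": dict(excluded),
        }
    return stats


if __name__ == "__main__":
    for runner, s in sorted(support_stats(EXCLUSIONS, ALL_CATEGORIES).items()):
        print(f"{runner}: {s['coverage']:.0%} of categories validated; "
              f"gaps: {s['gaps'] or 'none'}")
```

A generated table like this could sit alongside the hand-written matrix: the checkmarks come from what actually runs in CI, and the annotations supply the human-readable "why" for each restriction.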