I agree with you, Aljoscha. A data-driven approach, summarizing test results to show which features work and using benchmarks to show which ones scale, seems like a great way to differentiate runners' strengths.
On Mon, Aug 28, 2017 at 8:39 AM, Aljoscha Krettek <aljos...@apache.org> wrote:

> I like where this is going!
>
> Regarding benchmarking, I think we could do this if we had common
> benchmarking infrastructure and pipelines that regularly run on different
> Runners so that we have up-to-date data.
>
> I think we can also have a more technical section where we show stats on
> the level of support via the excluded ValidatesRunner tests. This is hard
> data that we have on every Runner and we can annotate it to explain why a
> certain Runner has a given restriction. This is a bit different from what
> Kenn initially suggested but I think we should have both. Plus, this very
> clearly specifies what feature is (somewhat) validated to work in a given
> Runner.
>
> Regarding PCollectionView support in Flink, I think this actually works
> and the ValidatesRunner tests pass for this. Not sure what is going on in
> that test case yet. For reference, this is the issue:
> https://issues.apache.org/jira/browse/BEAM-2806
>
> Best,
> Aljoscha
>
> > On 23. Aug 2017, at 21:24, Mingmin Xu <mingm...@gmail.com> wrote:
> >
> > I would like to have API compatibility testing. AFAIK there's still a
> > gap to achieving our goal (one job for any runner), which means
> > developers should be aware of the limitations when writing a job. For
> > example, PCollectionView is not well supported in FlinkRunner (not
> > quite sure of the current status, as my test job is broken) or in
> > SparkRunner streaming.
> >
> >> 5. Reorganize the windowing section to be just support for merging /
> >> non-merging windowing.
> > sliding/fixed_window/session is more straightforward to me;
> > merging/non-merging is more about the backend implementation.
> >
> > On Tue, Aug 22, 2017 at 7:28 PM, Kenneth Knowles <k...@google.com.invalid>
> > wrote:
> >
> >> Oh, I missed
> >>
> >> 11. Quantitative properties. This seems like an interesting and
> >> important project all on its own. Since Beam is so generic, we need
> >> pretty diverse measurements for a user to have a hope of extrapolating
> >> to their use case.
> >>
> >> Kenn
> >>
> >> On Tue, Aug 22, 2017 at 7:22 PM, Kenneth Knowles <k...@google.com> wrote:
> >>
> >>> OK, so adding these good ideas to the list:
> >>>
> >>> 8. Plain-English summary that comes before the nitty-gritty.
> >>> 9. Comment on production readiness from maintainers. Maybe
> >>> testimonials are helpful if they can be obtained?
> >>> 10. Versioning of all of the above
> >>>
> >>> Any more thoughts? I'll summarize in a JIRA in a bit.
> >>>
> >>> Kenn
> >>>
> >>> On Tue, Aug 22, 2017 at 10:45 AM, Griselda Cuevas <g...@google.com.invalid>
> >>> wrote:
> >>>
> >>>> Hi, I'd also like to ask if versioning as proposed in BEAM-166
> >>>> <https://issues.apache.org/jira/browse/BEAM-166> is still relevant?
> >>>> If it is, would this be something we want to add to this proposal?
> >>>>
> >>>> G
> >>>>
> >>>> On 21 August 2017 at 08:31, Tyler Akidau <taki...@google.com.invalid>
> >>>> wrote:
> >>>>
> >>>>> Is there any way we could add quantitative runner metrics to this
> >>>>> as well? Like by having some benchmarks that process X amount of
> >>>>> data, and then detailing in the matrix the latency, throughput, and
> >>>>> (where possible) cost numbers for each of the given runners?
> >>>>> Semantic support is one thing, but there are other differences
> >>>>> between runners that aren't captured by just checking feature
> >>>>> boxes. I'd be curious if anyone has other ideas in this vein as
> >>>>> well. The benchmark idea might not be the best way to go about it.
> >>>>>
> >>>>> -Tyler
> >>>>>
> >>>>> On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson <je...@bigdatainstitute.io>
> >>>>> wrote:
> >>>>>
> >>>>>> It'd be awesome to see these updated. I'd add two more:
> >>>>>>
> >>>>>> 1. A plain-English summary of the runner's support in Beam. People
> >>>>>> who are new to Beam won't understand the in-depth coverage and
> >>>>>> need a general idea of how well it is supported.
> >>>>>> 2. The production readiness of the runner. Does the maintainer
> >>>>>> think this runner is production ready?
> >>>>>>
> >>>>>> On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <k...@google.com.invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I want to revamp
> >>>>>>> https://beam.apache.org/documentation/runners/capability-matrix/
> >>>>>>>
> >>>>>>> When Beam first started, we didn't work on feature branches for
> >>>>>>> the core runners, and they had a lot more gaps compared to what
> >>>>>>> goes on `master` today, so this tracked our progress in a way
> >>>>>>> that was easy for users to read. Now it is still our best/only
> >>>>>>> comparison page for users, but I think we could improve its
> >>>>>>> usefulness.
> >>>>>>>
> >>>>>>> For the benefit of the thread, let me inline all the capabilities
> >>>>>>> fully here:
> >>>>>>>
> >>>>>>> ========================
> >>>>>>>
> >>>>>>> "What is being computed?"
> >>>>>>> - ParDo
> >>>>>>> - GroupByKey
> >>>>>>> - Flatten
> >>>>>>> - Combine
> >>>>>>> - Composite Transforms
> >>>>>>> - Side Inputs
> >>>>>>> - Source API
> >>>>>>> - Splittable DoFn
> >>>>>>> - Metrics
> >>>>>>> - Stateful Processing
> >>>>>>>
> >>>>>>> "Where in event time?"
> >>>>>>> - Global windows
> >>>>>>> - Fixed windows
> >>>>>>> - Sliding windows
> >>>>>>> - Session windows
> >>>>>>> - Custom windows
> >>>>>>> - Custom merging windows
> >>>>>>> - Timestamp control
> >>>>>>>
> >>>>>>> "When in processing time?"
> >>>>>>> - Configurable triggering
> >>>>>>> - Event-time triggers
> >>>>>>> - Processing-time triggers
> >>>>>>> - Count triggers
> >>>>>>> - [Meta]data driven triggers
> >>>>>>> - Composite triggers
> >>>>>>> - Allowed lateness
> >>>>>>> - Timers
> >>>>>>>
> >>>>>>> "How do refinements relate?"
> >>>>>>> - Discarding
> >>>>>>> - Accumulating
> >>>>>>> - Accumulating & Retracting
> >>>>>>>
> >>>>>>> ========================
> >>>>>>>
> >>>>>>> Here are some issues I'd like to improve:
> >>>>>>>
> >>>>>>> - Rows that are impossible to not support (ParDo)
> >>>>>>> - Rows where "support" doesn't really make sense (Composite
> >>>>>>>   transforms)
> >>>>>>> - Rows that are actually the same model feature (non-merging
> >>>>>>>   windowfns)
> >>>>>>> - Rows that represent optimizations (Combine)
> >>>>>>> - Rows in the wrong place (Timers)
> >>>>>>> - Rows that have not been designed ([Meta]data driven triggers)
> >>>>>>> - Rows with names that appear nowhere else (Timestamp control)
> >>>>>>> - No place to compare non-model differences between runners
> >>>>>>>
> >>>>>>> I'm still pondering how to improve this, but I thought I'd send
> >>>>>>> the notion out for discussion. Some imperfect ideas I've had:
> >>>>>>>
> >>>>>>> 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window)
> >>>>>>> into one row
> >>>>>>> 2. Make sections as users see them, like "ParDo" / "Side Inputs",
> >>>>>>> not "What?" / "side inputs"
> >>>>>>> 3. Add rows for non-model things, like portability framework
> >>>>>>> support, metrics backends, etc.
> >>>>>>> 4. Drop rows that are not informative, like Composite transforms,
> >>>>>>> or not designed
> >>>>>>> 5. Reorganize the windowing section to be just support for
> >>>>>>> merging / non-merging windowing.
> >>>>>>> 6. Switch to a more distinct color scheme than the solid vs.
> >>>>>>> faded colors currently used.
> >>>>>>> 7. Find a web design to get short descriptions into the
> >>>>>>> foreground to make it easier to grok.
> >>>>>>>
> >>>>>>> These are just a few thoughts, and not necessarily compatible
> >>>>>>> with each other. What do you think?
> >>>>>>>
> >>>>>>> Kenn
> >>>>>>
> >>>>>> --
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Jesse
> >
> > --
> > ----
> > Mingmin
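As a rough illustration of Aljoscha's proposal to derive "hard data" support stats from the excluded ValidatesRunner tests: the sketch below (in Python, for brevity) shows one possible data shape. The category names, exclusion reasons, and data layout here are all hypothetical, invented for illustration; the real exclusions live in each runner's build configuration and use Beam's own test-category annotations.

```python
# Sketch: compute per-runner support stats from an annotated list of
# excluded ValidatesRunner test categories. All names and reasons below
# are hypothetical placeholders, not Beam's actual exclusion lists.

# Per runner: {excluded category -> annotation explaining the restriction}
EXCLUSIONS = {
    "FlinkRunner": {
        "UsesSetState": "state backend integration not finished",
    },
    "SparkRunner": {
        "UsesSplittableParDo": "SDF not implemented",
        "UsesTestStream": "no TestStream support in streaming",
    },
    "DataflowRunner": {},  # no exclusions in this made-up example
}

# The full universe of test categories the matrix would report on.
ALL_CATEGORIES = {
    "UsesSetState",
    "UsesSplittableParDo",
    "UsesTestStream",
    "UsesTimersInParDo",
    "UsesSideInputs",
}


def support_stats(exclusions, all_categories):
    """For each runner, report validated categories, a coverage fraction,
    and the annotated reason for every gap."""
    stats = {}
    for runner, excluded in exclusions.items():
        supported = all_categories - set(excluded)
        stats[runner] = {
            "supported": sorted(supported),
            "coverage": len(supported) / len(all_categories),
            "gaps": dict(excluded),
        }
    return stats


if __name__ == "__main__":
    for runner, s in sorted(support_stats(EXCLUSIONS, ALL_CATEGORIES).items()):
        print(f"{runner}: {s['coverage']:.0%} of categories validated; "
              f"gaps: {s['gaps'] or 'none'}")
```

A generated table like this could sit alongside the hand-written matrix: the checkmarks come from what actually runs in CI, and the annotations supply the human-readable "why" for each restriction.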