Oh, I missed 11. Quantitative properties. This seems like an interesting and important project all on its own. Since Beam is so generic, we need pretty diverse measurements for a user to have a hope of extrapolating to their use case.
Kenn

On Tue, Aug 22, 2017 at 7:22 PM, Kenneth Knowles <[email protected]> wrote:

> OK, so adding these good ideas to the list:
>
> 8. Plain-English summary that comes before the nitty-gritty.
> 9. Comment on production readiness from maintainers. Maybe testimonials
> are helpful if they can be obtained?
> 10. Versioning of all of the above
>
> Any more thoughts? I'll summarize in a JIRA in a bit.
>
> Kenn
>
> On Tue, Aug 22, 2017 at 10:45 AM, Griselda Cuevas <[email protected]> wrote:
>
>> Hi, I'd also like to ask if versioning as proposed in BEAM-166
>> <https://issues.apache.org/jira/browse/BEAM-166> is still relevant? If it
>> is, would this be something we want to add to this proposal?
>>
>> G
>>
>> On 21 August 2017 at 08:31, Tyler Akidau <[email protected]> wrote:
>>
>>> Is there any way we could add quantitative runner metrics to this as well?
>>> Like by having some benchmarks that process X amount of data, and then
>>> detailing in the matrix latency, throughput, and (where possible) cost,
>>> etc, numbers for each of the given runners? Semantic support is one thing,
>>> but there are other differences between runners that aren't captured by
>>> just checking feature boxes. I'd be curious if anyone has other ideas in
>>> this vein as well. The benchmark idea might not be the best way to go
>>> about it.
>>>
>>> -Tyler
>>>
>>> On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson <[email protected]> wrote:
>>>
>>>> It'd be awesome to see these updated. I'd add two more:
>>>>
>>>> 1. A plain English summary of the runner's support in Beam. People who
>>>> are new to Beam won't understand the in-depth coverage and need a
>>>> general idea of how it is supported.
>>>> 2. The production readiness of the runner. Does the maintainer think
>>>> this runner is production ready?
>>>>
>>>> On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I want to revamp
>>>>> https://beam.apache.org/documentation/runners/capability-matrix/
>>>>>
>>>>> When Beam first started, we didn't work on feature branches for the core
>>>>> runners, and they had a lot more gaps compared to what goes on `master`
>>>>> today, so this tracked our progress in a way that was easy for users to
>>>>> read. Now it is still our best/only comparison page for users, but I
>>>>> think we could improve its usefulness.
>>>>>
>>>>> For the benefit of the thread, let me inline all the capabilities fully
>>>>> here:
>>>>>
>>>>> ========================
>>>>>
>>>>> "What is being computed?"
>>>>> - ParDo
>>>>> - GroupByKey
>>>>> - Flatten
>>>>> - Combine
>>>>> - Composite Transforms
>>>>> - Side Inputs
>>>>> - Source API
>>>>> - Splittable DoFn
>>>>> - Metrics
>>>>> - Stateful Processing
>>>>>
>>>>> "Where in event time?"
>>>>> - Global windows
>>>>> - Fixed windows
>>>>> - Sliding windows
>>>>> - Session windows
>>>>> - Custom windows
>>>>> - Custom merging windows
>>>>> - Timestamp control
>>>>>
>>>>> "When in processing time?"
>>>>> - Configurable triggering
>>>>> - Event-time triggers
>>>>> - Processing-time triggers
>>>>> - Count triggers
>>>>> - [Meta]data driven triggers
>>>>> - Composite triggers
>>>>> - Allowed lateness
>>>>> - Timers
>>>>>
>>>>> "How do refinements relate?"
>>>>> - Discarding
>>>>> - Accumulating
>>>>> - Accumulating & Retracting
>>>>>
>>>>> ========================
>>>>>
>>>>> Here are some issues I'd like to improve:
>>>>>
>>>>> - Rows that are impossible to not support (ParDo)
>>>>> - Rows where "support" doesn't really make sense (Composite transforms)
>>>>> - Rows that are actually the same model feature (non-merging windowfns)
>>>>> - Rows that represent optimizations (Combine)
>>>>> - Rows in the wrong place (Timers)
>>>>> - Rows that have not been designed ([Meta]data driven triggers)
>>>>> - Rows with names that appear nowhere else (Timestamp control)
>>>>> - No place to compare non-model differences between runners
>>>>>
>>>>> I'm still pondering how to improve this, but I thought I'd send the
>>>>> notion out for discussion. Some imperfect ideas I've had:
>>>>>
>>>>> 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one
>>>>> row
>>>>> 2. Make sections as users see them, like "ParDo" / "Side Inputs", not
>>>>> "What?" / "side inputs"
>>>>> 3. Add rows for non-model things, like portability framework support,
>>>>> metrics backends, etc.
>>>>> 4. Drop rows that are not informative, like Composite transforms, or
>>>>> not designed
>>>>> 5. Reorganize the windowing section to be just support for merging /
>>>>> non-merging windowing.
>>>>> 6. Switch to a more distinct color scheme than the solid vs faded
>>>>> colors currently used.
>>>>> 7. Find a web design to get short descriptions into the foreground to
>>>>> make it easier to grok.
>>>>>
>>>>> These are just a few thoughts, and not necessarily compatible with each
>>>>> other. What do you think?
>>>>>
>>>>> Kenn
>>>>>
>>>> --
>>>> Thanks,
>>>>
>>>> Jesse
