It'd be awesome to see these updated. I'd add two more:

1. A plain English summary of the runner's support in Beam. People who are new to Beam won't understand the in-depth coverage and need a general idea of how it is supported.
2. The production readiness of the runner. Does the maintainer think this runner is production ready?

On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <k...@google.com.invalid> wrote:
> Hi all,
>
> I want to revamp
> https://beam.apache.org/documentation/runners/capability-matrix/
>
> When Beam first started, we didn't work on feature branches for the core
> runners, and they had a lot more gaps compared to what goes on `master`
> today, so this tracked our progress in a way that was easy for users to
> read. Now it is still our best/only comparison page for users, but I think
> we could improve its usefulness.
>
> For the benefit of the thread, let me inline all the capabilities fully
> here:
>
> ========================
>
> "What is being computed?"
>  - ParDo
>  - GroupByKey
>  - Flatten
>  - Combine
>  - Composite Transforms
>  - Side Inputs
>  - Source API
>  - Splittable DoFn
>  - Metrics
>  - Stateful Processing
>
> "Where in event time?"
>  - Global windows
>  - Fixed windows
>  - Sliding windows
>  - Session windows
>  - Custom windows
>  - Custom merging windows
>  - Timestamp control
>
> "When in processing time?"
>  - Configurable triggering
>  - Event-time triggers
>  - Processing-time triggers
>  - Count triggers
>  - [Meta]data driven triggers
>  - Composite triggers
>  - Allowed lateness
>  - Timers
>
> "How do refinements relate?"
>  - Discarding
>  - Accumulating
>  - Accumulating & Retracting
>
> ========================
>
> Here are some issues I'd like to improve:
>
>  - Rows that are impossible to not support (ParDo)
>  - Rows where "support" doesn't really make sense (Composite transforms)
>  - Rows that are actually the same model feature (non-merging windowfns)
>  - Rows that represent optimizations (Combine)
>  - Rows in the wrong place (Timers)
>  - Rows that have not been designed ([Meta]data driven triggers)
>  - Rows with names that appear nowhere else (Timestamp control)
>  - No place to compare non-model differences between runners
>
> I'm still pondering how to improve this, but I thought I'd send the notion
> out for discussion. Some imperfect ideas I've had:
>
> 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one row
> 2. Make sections as users see them, like "ParDo" / "Side Inputs" not
>    "What?" / "Side Inputs"
> 3. Add rows for non-model things, like portability framework support,
>    metrics backends, etc.
> 4. Drop rows that are not informative, like Composite transforms, or not
>    designed
> 5. Reorganize the windowing section to be just support for merging /
>    non-merging windowing.
> 6. Switch to a more distinct color scheme than the solid vs faded colors
>    currently used.
> 7. Find a web design to get short descriptions into the foreground to make
>    it easier to grok.
>
> These are just a few thoughts, and not necessarily compatible with each
> other. What do you think?
>
> Kenn
>

--
Thanks,
Jesse