Oh, I missed 11. Quantitative properties. This seems like an interesting and important project all on its own. Since Beam is so generic, we need pretty diverse measurements for a user to have a hope of extrapolating to their use case.
Kenn

On Tue, Aug 22, 2017 at 7:22 PM, Kenneth Knowles <[email protected]> wrote:

> OK, so adding these good ideas to the list:
>
> 8. Plain-English summary that comes before the nitty-gritty.
> 9. Comment on production readiness from maintainers. Maybe testimonials
> are helpful if they can be obtained?
> 10. Versioning of all of the above
>
> Any more thoughts? I'll summarize in a JIRA in a bit.
>
> Kenn
>
> On Tue, Aug 22, 2017 at 10:45 AM, Griselda Cuevas <[email protected]> wrote:
>
>> Hi, I'd also like to ask if versioning as proposed in BEAM-166
>> <https://issues.apache.org/jira/browse/BEAM-166> is still relevant? If it
>> is, would this be something we want to add to this proposal?
>>
>> G
>>
>> On 21 August 2017 at 08:31, Tyler Akidau <[email protected]> wrote:
>>
>>> Is there any way we could add quantitative runner metrics to this as well?
>>> Like by having some benchmarks that process X amount of data, and then
>>> detailing in the matrix latency, throughput, and (where possible) cost,
>>> etc, numbers for each of the given runners? Semantic support is one thing,
>>> but there are other differences between runners that aren't captured by
>>> just checking feature boxes. I'd be curious if anyone has other ideas in
>>> this vein as well. The benchmark idea might not be the best way to go
>>> about it.
>>>
>>> -Tyler
>>>
>>> On Sun, Aug 20, 2017 at 9:43 AM Jesse Anderson <[email protected]> wrote:
>>>
>>>> It'd be awesome to see these updated. I'd add two more:
>>>>
>>>> 1. A plain English summary of the runner's support in Beam. People who
>>>> are new to Beam won't understand the in-depth coverage and need a
>>>> general idea of how it is supported.
>>>> 2. The production readiness of the runner. Does the maintainer think
>>>> this runner is production ready?
>>>>
>>>> On Sun, Aug 20, 2017 at 8:03 AM Kenneth Knowles <[email protected]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I want to revamp
>>>>> https://beam.apache.org/documentation/runners/capability-matrix/
>>>>>
>>>>> When Beam first started, we didn't work on feature branches for the core
>>>>> runners, and they had a lot more gaps compared to what goes on `master`
>>>>> today, so this tracked our progress in a way that was easy for users to
>>>>> read. Now it is still our best/only comparison page for users, but I
>>>>> think we could improve its usefulness.
>>>>>
>>>>> For the benefit of the thread, let me inline all the capabilities fully
>>>>> here:
>>>>>
>>>>> ========================
>>>>>
>>>>> "What is being computed?"
>>>>> - ParDo
>>>>> - GroupByKey
>>>>> - Flatten
>>>>> - Combine
>>>>> - Composite Transforms
>>>>> - Side Inputs
>>>>> - Source API
>>>>> - Splittable DoFn
>>>>> - Metrics
>>>>> - Stateful Processing
>>>>>
>>>>> "Where in event time?"
>>>>> - Global windows
>>>>> - Fixed windows
>>>>> - Sliding windows
>>>>> - Session windows
>>>>> - Custom windows
>>>>> - Custom merging windows
>>>>> - Timestamp control
>>>>>
>>>>> "When in processing time?"
>>>>> - Configurable triggering
>>>>> - Event-time triggers
>>>>> - Processing-time triggers
>>>>> - Count triggers
>>>>> - [Meta]data driven triggers
>>>>> - Composite triggers
>>>>> - Allowed lateness
>>>>> - Timers
>>>>>
>>>>> "How do refinements relate?"
>>>>> - Discarding
>>>>> - Accumulating
>>>>> - Accumulating & Retracting
>>>>>
>>>>> ========================
>>>>>
>>>>> Here are some issues I'd like to improve:
>>>>>
>>>>> - Rows that are impossible to not support (ParDo)
>>>>> - Rows where "support" doesn't really make sense (Composite transforms)
>>>>> - Rows that are actually the same model feature (non-merging windowfns)
>>>>> - Rows that represent optimizations (Combine)
>>>>> - Rows in the wrong place (Timers)
>>>>> - Rows that have not been designed ([Meta]data driven triggers)
>>>>> - Rows with names that appear nowhere else (Timestamp control)
>>>>> - No place to compare non-model differences between runners
>>>>>
>>>>> I'm still pondering how to improve this, but I thought I'd send the
>>>>> notion out for discussion. Some imperfect ideas I've had:
>>>>>
>>>>> 1. Lump all the basic stuff (ParDo, GroupByKey, Read, Window) into one
>>>>> row
>>>>> 2. Make sections as users see them, like "ParDo" / "Side Inputs", not
>>>>> "What?" / "side inputs"
>>>>> 3. Add rows for non-model things, like portability framework support,
>>>>> metrics backends, etc.
>>>>> 4. Drop rows that are not informative, like Composite transforms, or
>>>>> not designed
>>>>> 5. Reorganize the windowing section to be just support for merging /
>>>>> non-merging windowing.
>>>>> 6. Switch to a more distinct color scheme than the solid vs faded
>>>>> colors currently used.
>>>>> 7. Find a web design to get short descriptions into the foreground to
>>>>> make it easier to grok.
>>>>>
>>>>> These are just a few thoughts, and not necessarily compatible with each
>>>>> other. What do you think?
>>>>>
>>>>> Kenn
>>>>>
>>>> --
>>>> Thanks,
>>>>
>>>> Jesse
