I think a focus on the runners is what's key to Beam's adoption. The runners are the foundation on which Beam sits. If the runners don't work properly, Beam won't work.
A focus on improved unit tests is a good start, but isn't what's needed. Compatibility matrices will help see how your runner of choice stacks up, but that requires too much knowledge of Beam's internals to be interpretable. Imagine you're an (enterprise) architect looking at adopting Beam. What do you look at or what do you look for before going deeper? What would make you stick your neck out to adopt Beam? For my experience, there are several/pass fails along the way. Here are a few of the common ones I've seen: - Will this dramatically improve the problems I'm trying to solve? (not writing APIs/better programming model/Beam's better handling of windowing) - Can I get commercial support for Beam? (This is changing soon) - Are other people using Beam with the configuration and use case as me? (e.g. I'm using Spark with Beam to process imagery. Are others doing this in production?) - Is there good documentation and books on the subject? (Tyler's and others' book will improve this) - Can I get my team trained on this new technology? (I have Beam training and Google has some cursory training) I think the one the community can improve on the most is the social proof of Beam. I've tried to do this ( http://www.jesse-anderson.com/2017/06/beam-2-0-q-and-a/ and http://www.jesse-anderson.com/2016/07/question-and-answers-with-the-apache-beam-team/). We need to get the message out more about people using Beam in production, which configuration they have, and what their results were. I think we have the social proof on Dataflow, but not as much on Spark/Flink/Apex. I think it's important to note that these checks don't look at the hardcore language or API semantics that we're working on. These are much later stage issues, if they're ever used at all. In my experience with other open source adoption at enterprises, it starts with architects and works its way around the organization from there. Thanks, Jesse On Mon, Jan 15, 2018 at 8:14 AM Ted Yu <[email protected]> wrote: > bq. are hard to detect in our unit-test framework > > Looks like more integration tests would help discover bug / regression > more quickly. If committer reviewing the PR has concern in this regard, the > concern should be stated on the PR so that the contributor (and reviewer) > can spend more time in solidifying the solution. > > bq. I've gone and fixed these issues myself when merging > > We can make stricter checkstyle rules so that the code wouldn't pass build > without addressing commonly known issues. > > Cheers > > On Sun, Jan 14, 2018 at 12:37 PM, Reuven Lax <[email protected]> wrote: > >> I agree with the sentiment, but I don't completely agree with the >> criteria. >> >> I think we need to be much better about reviewing PRs. Some PRs languish >> for too long before the reviewer gets to it (and I've been guilty of this >> too), which does not send a good message. Also new PRs sometimes languish >> because there is no reviewer assigned; maybe we could write a gitbot to >> automatically assign a reviewer to every new PR? >> >> Also, I think that the bar for merging a PR from a contributor should not >> be "the PR is perfect." It's perfectly fine to merge a PR that still has >> some issues (especially if the issues are stylistic). In the past when I've >> done this, I've gone and fixed these issues myself when merging. It was a >> bit more work for me to fix these things myself, but it was a small price >> to pay in order to portray Beam as a welcoming place for contributions. >> >> On the other hand, "the build does not break" is - in my opinion - too >> weak of a criterion for merging. A few reasons for this: >> >> * Beam is a data-processing framework, and data integrity is paramount. >> If a reviewer sees an issue that could lead to data loss (or duplication, >> or corruption), I don't think that PR should be merged. Historically many >> such issues only actually manifest at scale, and are hard to detect in our >> unit-test framework. (we also need to invest in more at-scale tests to >> catch such issues). >> >> * Beam guarantees backwards compatibility for users (except across >> major versions). If a bad API gets merged and released (and the chances of >> "forgetting" about it before the release is cut is unfortunately high), we >> are stuck with it. This is less of an issue for many other open-source >> projects that do not make such a compatibility guarantee, as they are able >> to simply remove or fix the API in the next version. >> >> I think we still need honest review of PRs, with the criteria being >> stronger than "the build doesn't break." However reviewers also need to be >> reasonable about what they ask for. >> >> Reuven >> >> On Sun, Jan 14, 2018 at 11:19 AM, Ted Yu <[email protected]> wrote: >> >>> bq. if a PR is basically right (it does what it should) without >>> breaking the build, then it has to be merged fast >>> >>> +1 on above. >>> This would give contributors positive feedback. >>> >>> On Sun, Jan 14, 2018 at 8:13 AM, Jean-Baptiste Onofré <[email protected]> >>> wrote: >>> >>>> Hi Davor, >>>> >>>> Thanks a lot for this e-mail. >>>> >>>> I would like to emphasize two areas where we have to improve: >>>> >>>> 1. Apache way and community. We still have to focus and being dedicated >>>> on our communities (both user & dev). Helping, encouraging, growing our >>>> communities is key for the project. Building bridges between communities is >>>> also very important. We have to be more "accessible": sometime simplifying >>>> our discussions, showing more interest and open minded in the proposals >>>> would help as well. I think we do a good job already: we just have to >>>> improve. >>>> >>>> 2. Execution: a successful project is a project with a regular activity >>>> in term of releases, fixes, improvements. >>>> Regarding the PR, I think today we have a PR opened for long. And I >>>> think for three reasons: >>>> - some are not ready, not good enough, no question on these ones >>>> - some needs reviewer and speed up: we have to be careful on the open >>>> PRs and review asap >>>> - some are under review but we have a lot of "ping pong" and long >>>> discussion, not always justified. I already said that on the mailing list >>>> but, as for other Apache projects, if a PR is basically right (it does what >>>> it should) without breaking the build, then it has to be merged fast. If it >>>> requires additional changes (tests, polishing, improvements, ...), then it >>>> can be addressed in new PRs. >>>> As already mentioned in the Beam 2.3.0 thread, we have to adopt a >>>> regular schedule for releases. It's a best effort to have a release every 2 >>>> months, whatever the release will contain. That's essential to maintain a >>>> good activity in the project and for the third party projects using Beam. >>>> >>>> Again, don't get me wrong: we already do a good job ! It's just area >>>> where I think we have to improve. >>>> >>>> Anyway, thanks for all the hard work we are doing all together ! >>>> >>>> Regards >>>> JB >>>> >>>> >>>> On 13/01/2018 05:12, Davor Bonaci wrote: >>>> >>>>> Hi everyone -- >>>>> Apache Beam was established as a top-level project a year ago (on >>>>> December 21, to be exact). This first anniversary is a great opportunity >>>>> for us to look back at the past year, celebrate its successes, learn from >>>>> any mistakes we have made, and plan for the next 1+ years. >>>>> >>>>> I’d like to invite everyone in the community, particularly users and >>>>> observers on this mailing list, to participate in this discussion. Apache >>>>> Beam is your project and I, for one, would much appreciate your candid >>>>> thoughts and comments. Just as some other projects do, I’d like to make >>>>> this “state of the project” discussion an annual tradition in this >>>>> community. >>>>> >>>>> In terms of successes, the availability of the first stable release, >>>>> version 2.0.0, was the biggest and most important milestone last year. >>>>> Additionally, we have expanded the project’s breadth with new components, >>>>> including several new runners, SDKs, and DSLs, and interconnected a large >>>>> number of storage/messaging systems with new Beam IOs. In terms of >>>>> community growth, crossing 200 lifetime individual contributors and >>>>> achieving 76 contributors to a single release were other highlights. We >>>>> have doubled the number of committers, and invited a handful of new PMC >>>>> members. Thanks to each and every one of you for making all of this >>>>> possible in our first year. >>>>> >>>>> On the other hand, in such a young project as Beam, there are >>>>> naturally many areas for improvement. This is the principal purpose of >>>>> this >>>>> thread (and any of its forks). To organize the separate discussions, I’d >>>>> suggest to fork separate threads for different discussion areas: >>>>> * Culture and governance (anything related to people and their >>>>> behavior) >>>>> * Community growth (what can we do to further grow a diverse and >>>>> vibrant community) >>>>> * Technical execution (anything related to releases, their frequency, >>>>> website, infrastructure) >>>>> * Feature roadmap for 2018 (what can we do to make the project more >>>>> attractive to users, Beam 3.0, etc.). >>>>> >>>>> I know many passionate folks who particularly care about each of these >>>>> areas, but let me call on some folks from the community to get things >>>>> started: Ismael for culture, Gris for community, JB for technical >>>>> execution, and Ben for feature roadmap. >>>>> >>>>> Perhaps we can use this thread to discuss project-wide vision. To seed >>>>> that discussion, I’d start somewhat provocatively -- we aren’t doing so >>>>> well on the diversity of users across runners, which is very important to >>>>> the realization of the project’s vision. Would you agree, and would you be >>>>> willing to make it the project’s #1 priority for the next 1-2 years? >>>>> >>>>> Thanks -- and please join us in what would hopefully be a productive >>>>> and informative discussion that shapes the future of this project! >>>>> >>>>> Davor >>>>> >>>> >>> >> >
