Seems reasonable, and the timeline Davor suggests makes a lot of sense.

On Tue, Jan 17, 2017 at 3:59 PM, Lukasz Cwik <lc...@google.com.invalid>
wrote:

> I'm also for merging to master.
>
> On Tue, Jan 17, 2017 at 3:39 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > It makes sense to merge after the 0.5.0 release.
> >
> > Good point Davor: +1
> >
> > Regards
> > JB
> >
> >
> > On 01/17/2017 03:34 PM, Davor Bonaci wrote:
> >
> >> +1. I think merging to master would be an awesome next step for the
> >> Python SDK.
> >>
> >> And, thanks for a great summary of the current state, roadmap, and
> >> impact to the project as a whole -- awesome!
> >>
> >> Process-wise, I'd suggest starting a formal vote once this discussion
> >> seems to be trending towards a conclusion, and completing the merge as
> >> soon as the next release (0.5.0) is cut. This would allow additional
> >> time before 0.6.0 to figure out compliance, release process impact, etc.
> >>
> >> Great work everyone!
> >>
> >> On Tue, Jan 17, 2017 at 8:26 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> >> wrote:
> >>
> >>> Hi
> >>>
> >>> I haven't tried the Python SDK recently, but you provided a clear "state
> >>> of the art". Anyway, I'm in favor of merging things as quickly as
> >>> possible (assuming it's in good shape in terms of build, tests, ...): it
> >>> could help grow "external" contributions.
> >>>
> >>> So +1 from my side.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On Jan 17, 2017, at 08:22, Ahmet Altay <al...@google.com.INVALID>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> tl;dr: I would like to start a discussion about merging the python-sdk
> >>>> branch into the master branch. The Python SDK is mature enough, and
> >>>> merging it to master will accelerate its development and adoption.
> >>>>
> >>>> With a great effort from a lot of contributors (*), the Python SDK [1]
> >>>> is now a mostly complete, tested, performant Python implementation of
> >>>> the Beam model. Since June, when we first started with the Python SDK in
> >>>> Apache Beam, we have been continuously improving it.
> >>>>
> >>>> ** Python SDK currently supports:
> >>>>
> >>>> * Model: All main concepts are present (ParDo, GroupByKey, Windowing,
> >>>> etc.).
> >>>> * IO: There are extensible APIs for writing new bounded sources and
> >>>> sinks. Implementations are provided for Text, Avro, BigQuery, and
> >>>> Datastore.
> >>>> * Runners: The Python SDK has an extensible base runner module that
> >>>> allows building specific runners on top of it. The SDK comes with two
> >>>> pipeline runners, DirectRunner and DataflowRunner, and it is possible to
> >>>> add more. The existing runners are currently limited to bounded
> >>>> execution and are otherwise equivalent to their Java SDK counterparts in
> >>>> functionality.
> >>>> * Testing: The Python SDK implements the ValidatesRunner test framework
> >>>> for writing integration tests for current and future runners. There is
> >>>> unit test coverage for all modules, and a number of integration tests
> >>>> for validating existing runners.
> >>>> * Documentation and examples: Documentation work has started on the
> >>>> Python SDK. The Beam Programming Guide page has been updated to include
> >>>> Python [2]. The code comes with many ready-to-use examples, and we are
> >>>> in a good place to start documenting those on the website.
> >>>>
> >>>> ** We are not done yet, next on the roadmap we have:
> >>>>
> >>>> * Streaming: Both of the existing runners lack support for streaming
> >>>> execution, and currently there is work going on for adding streaming
> >>>> support to DirectRunner [3].
> >>>> * Documentation: Filling in the rest of the Beam documentation with
> >>>> Python SDK specific information and examples.
> >>>> * SDK consistency: Making the Python SDK consistent with the Java SDK.
> >>>> We have come a long way on this and have only a few items left [4].
> >>>> * Beamifying: We have been working on removing Dataflow-specific
> >>>> references both from the documentation and from the code. There is some
> >>>> work left, and we are currently working on that as well [5].
> >>>>
> >>>> ** Steps and implications of merging to master:
> >>>>
> >>>> * The master branch is merged into the python-sdk branch at regular
> >>>> intervals, and the last merge was on 12/22. All the past merges were
> >>>> uneventful because there is minimal overlap in modified files between
> >>>> the branches. Integrating python-sdk into master will similarly touch a
> >>>> small number of existing files.
> >>>>
> >>>> * The Python SDK uses the same tools for building and testing. It is
> >>>> already integrated with Maven, Jenkins, and Travis. Specifically, the
> >>>> impact on the testing infrastructure would be:
> >>>> - There will be two additional test configurations in Travis. Since
> >>>> Travis runs all configurations in parallel, there should not be a
> >>>> noticeable change in the Travis run time.
> >>>> - The Jenkins pre-commit test will start running the Python SDK tests.
> >>>> This will add an additional 5 minutes to the completion time of the
> >>>> pre-commit test. Historically, Python SDK tests have not been flaky and
> >>>> have not caused any random failures.
> >>>> - The Jenkins Python post-commit test is already separated from the
> >>>> other post-commit tests and will continue to exist. It would not change
> >>>> the testing time for any other test.
> >>>>
> >>>> * The release process needs to be updated to accommodate releasing
> >>>> Python artifacts. The Python SDK would fit in the existing release
> >>>> schedule and could be released along with the Java SDK. The additional
> >>>> steps would include:
> >>>> - Generating Python artifacts. This could be done with a single command
> >>>> using Maven today.
> >>>> - Publishing the artifacts to a central repository such as PyPI.
> >>>> - Updating the release guide to reflect the changes above.
> >>>>
> >>>> * Users: There are existing users of the Python SDK. To give a rough
> >>>> estimate, a distribution of the Beam Python SDK had a total of 23K
> >>>> downloads in the past 6 months [6]. Some of those users are already
> >>>> engaged with the community (e.g. [7]). There might be an increased
> >>>> amount of engagement from the rest of them after the merge.
> >>>>
> >>>> Looking forward to hearing your thoughts and comments on “graduating”
> >>>> python-sdk to the master.
> >>>>
> >>>> Thank you,
> >>>> Ahmet
> >>>>
> >>>> (*) The Python SDK branch currently has a diverse group of contributors.
> >>>> Regular contributors include Charles Chen, Chamikara Jayalath, María
> >>>> García Herrero, Mark Liu, Pablo Estrada, Robert Bradshaw (Apache Beam
> >>>> PMC), Sourabh Bajaj, and Vikas Kedigehalli. We have also had
> >>>> contributions from Abdullah Bashir, Marco Buccini, Sergio Fernández,
> >>>> Seunghyun Lee, and Younghee Kwon.
> >>>>
> >>>> [1] https://github.com/apache/beam/tree/python-sdk/sdks/python
> >>>> [2] https://beam.apache.org/documentation/programming-guide/
> >>>> [3] https://issues.apache.org/jira/browse/BEAM-1265
> >>>> [4] https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20labels%20%3D%20sdk-consistency
> >>>> [5] https://issues.apache.org/jira/browse/BEAM-1218
> >>>> [6] https://pypi.python.org/pypi/google-cloud-dataflow/json
> >>>> [7] https://issues.apache.org/jira/browse/BEAM-1251
> >>>>
> >>>
> >>>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
