I'm also for merging to master.

On Tue, Jan 17, 2017 at 3:39 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> It makes sense to merge after 0.5.0 release.
>
> Good point Davor: +1
>
> Regards
> JB
>
>
> On 01/17/2017 03:34 PM, Davor Bonaci wrote:
>
>> +1. I think merging to master would be an awesome next step for the Python
>> SDK.
>>
>> And, thanks for a great summary of the current state, roadmap, and impact
>> to the project as a whole -- awesome!
>>
>> Process-wise, I'd suggest starting a formal vote once this discussion seems
>> to be trending towards a conclusion, and completing the merge as soon as the
>> next release (0.5.0) is cut. This would leave additional time before 0.6.0
>> to figure out compliance, release process impact, etc.
>>
>> Great work everyone!
>>
>> On Tue, Jan 17, 2017 at 8:26 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> Hi
>>>
>>> I didn't try the Python SDK recently but you provided a clear "state of
>>> the art". Anyway I'm in favor of merging things as quickly as possible
>>> (assuming it's in good shape in terms of build, test, ...): it would
>>> potentially increase the "external" contributions.
>>>
>>> So +1 from my side.
>>>
>>> Regards
>>> JB
>>>
>>> On Jan 17, 2017, at 08:22, Ahmet Altay <al...@google.com.INVALID>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> tl;dr: I would like to start a discussion about merging python-sdk
>>>> branch
>>>> to master branch. Python SDK is mature enough and merging it to master
>>>> will
>>>> accelerate its development and adoption.
>>>>
>>>> With a great effort from a lot of contributors(*), Python SDK [1] is
>>>> now a
>>>> mostly complete, tested, performant Python implementation of the Beam
>>>> model. Since June, when we first started with Python SDK in Apache Beam,
>>>> we have been continuously improving it.
>>>>
>>>> ** Python SDK currently supports:
>>>>
>>>> * Model: All main concepts are present (ParDo, GroupByKey, Windowing
>>>> etc.).
>>>> * IO: There are extensible APIs for writing new bounded sources and
>>>> sinks.
>>>> Implementations are provided for Text, Avro, BigQuery, and Datastore.
>>>> * Runners: Python SDK has an extensible base runner module that allows
>>>> building specific runners on top of it. The SDK comes with two pipeline
>>>> runners: DirectRunner and DataflowRunner; and it is possible to add
>>>> more.
>>>> The existing runners are currently limited to bounded execution and
>>>> otherwise equivalent to their Java SDK counterparts in functionality.
>>>> * Testing: Python SDK implements ValidatesRunner test framework for
>>>> implementing integration tests for current and future runners. There is
>>>> unit test coverage for all modules, and a number of integration tests for
>>>> validating existing runners.
>>>> * Documentation and examples: Documentation work has started on Python
>>>> SDK.
>>>> Beam Programming Guide page has been updated to include Python [2]. The
>>>> code comes with many ready-to-use examples and we are in a good place to
>>>> start documenting those on the website.
>>>>
>>>> ** We are not done yet, next on the roadmap we have:
>>>>
>>>> * Streaming: Both of the existing runners lack support for streaming
>>>> execution, and currently there is work going on for adding streaming
>>>> support to DirectRunner [3].
>>>> * Documentation: Filling the rest of the Beam documentation with Python
>>>> SDK specific information and examples.
>>>> * SDK consistency: Making Python SDK consistent with the Java SDK. We
>>>> have
>>>> come a long way on this and have only a few items left [4].
>>>> * Beamifying: We have been working on removing Dataflow-specific
>>>> references
>>>> both from the documentation and from the code. There is some work left,
>>>> and
>>>> we are currently working on those as well [5].
>>>>
>>>> ** Steps and implications of merging to master:
>>>>
>>>> * Master branch is merged to python-sdk branch at regular intervals and
>>>> the
>>>> last merge was on 12/22. All the past merges were uneventful because
>>>> there
>>>> is a minimal overlap in modified files between branches. Integrating
>>>> python-sdk to master will similarly touch a small number of existing
>>>> files.
>>>>
>>>> * Python SDK is using the same tools for building and testing. It is
>>>> already integrated with Maven, Jenkins and Travis. Specifically the
>>>> impact
>>>> to the testing infrastructure would be:
>>>> - There will be two additional test configurations in Travis. Since
>>>> Travis
>>>> runs all configurations in parallel there should not be a noticeable
>>>> change
>>>> in the Travis run time.
>>>> - Jenkins pre-commit test will start running the Python SDK tests. It
>>>> will
>>>> add an additional 5 minutes to the completion time of pre-commit test.
>>>> Historically Python SDK tests were not flaky and did not cause any
>>>> random
>>>> failures.
>>>> - Jenkins Python post-commit test is already separated from the other
>>>> post-commit tests and will continue to exist. It would not change the
>>>> testing time for any other test.
>>>>
>>>> * The release process needs to be updated to accommodate releasing
>>>> Python
>>>> artifacts. Python SDK would fit in the existing release schedule and
>>>> could
>>>> be released along with the Java SDK. The additional steps would
>>>> include:
>>>> - Generating Python artifacts. This could be done with a single command
>>>> using Maven today.
>>>> - Publishing the artifacts to a central repository such as PyPI.
>>>> - Updating the release guide to reflect the changes above.
>>>>
>>>> * Users: There are existing users using the Python SDK. To give a rough
>>>> estimate, a distribution of the Beam Python SDK had a total of 23K
>>>> downloads in the past 6 months [6]. Some of those users are already
>>>> engaged
>>>> with the community (e.g. [7]). There might be an increased amount of
>>>> engagement from the rest of them after the merge.
>>>>
>>>> Looking forward to hearing your thoughts and comments on “graduating”
>>>> python-sdk to the master.
>>>>
>>>> Thank you,
>>>> Ahmet
>>>>
>>>> (*) Python SDK branch currently has a diverse group of contributors.
>>>> Regular contributors include Charles Chen, Chamikara Jayalath, María
>>>> García
>>>> Herrero, Mark Liu, Pablo Estrada, Robert Bradshaw (Apache Beam PMC),
>>>> Sourabh Bajaj, and Vikas Kedigehalli. We have also had contributions
>>>> from
>>>> Abdullah Bashir, Marco Buccini, Sergio Fernández, Seunghyun Lee, and
>>>> Younghee Kwon.
>>>>
>>>> [1] https://github.com/apache/beam/tree/python-sdk/sdks/python
>>>> [2] https://beam.apache.org/documentation/programming-guide/
>>>> [3] https://issues.apache.org/jira/browse/BEAM-1265
>>>> [4] https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20labels%20%3D%20sdk-consistency
>>>> [5] https://issues.apache.org/jira/browse/BEAM-1218
>>>> [6] https://pypi.python.org/pypi/google-cloud-dataflow/json
>>>> [7] https://issues.apache.org/jira/browse/BEAM-1251
>>>>
>>>
>>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
