Thank you Prabeesh and Sergio for fixing those!

On Tue, Jan 31, 2017 at 4:51 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> Awesome, thanks Sergio ! Much appreciated ;)
>
> Regards
> JB
>
> On 01/31/2017 01:42 PM, Sergio Fernández wrote:
>> PR #1879 provides the basics: https://github.com/apache/beam/pull/1879
>>
>> On Tue, Jan 31, 2017 at 1:33 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>> No, that's fine as soon as we clearly document the prerequisite for the
>>> build. IMHO, we should provide quick BUILDING instructions in the
>>> README.md.
>>>
>>> Regards
>>> JB
>>>
>>> On 01/31/2017 01:24 PM, Sergio Fernández wrote:
>>>> Originally we integrate the build in Maven with the default profile.
>>>> Do you feel like it'd be better to have it under a separated profile or
>>>> so?
>>>>
>>>> On Tue, Jan 31, 2017 at 11:07 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>>> Just to be clear, the prerequisite to be able to build the Python SDK
>>>>> are:
>>>>>
>>>>> apt-get install python-setuptools
>>>>> apt-get install python-pip
>>>>>
>>>>> It's also required by the default "regular" build.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>> On 01/31/2017 11:02 AM, Jean-Baptiste Onofré wrote:
>>>>>> Just one thing I noticed (and can be helpful for others): to build Beam
>>>>>> we now need python setuptools installed.
>>>>>>
>>>>>> For instance, on Ubuntu, you have to do:
>>>>>>
>>>>>> apt-get install python-setuptools
>>>>>>
>>>>>> Same for the pip distribution.
>>>>>>
>>>>>> I guess (if not already done), we have to update README/Building
>>>>>> instructions.
>>>>>>
>>>>>> Correct ?
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 01/31/2017 08:10 AM, Ahmet Altay wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> This merge is completed. Python SDK is now officially part of the
>>>>>>> master branch! Thank you all for the support. Please open an issue, if
>>>>>>> you notice a reference to the now obsolete python-sdk branch in the
>>>>>>> documentation.
>>>>>>>
>>>>>>> There will not be any more merges to the python-sdk branch. Going
>>>>>>> forward please use the master branch for Python SDK development. There
>>>>>>> are a few existing open PRs to the python-sdk [1]. If you are the
>>>>>>> author of one of those PRs, please rebase them on top of master.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Ahmet
>>>>>>>
>>>>>>> [1] https://github.com/pulls?utf8=%E2%9C%93&q=is%3Aopen+is%3Apr+base%3Apython-sdk+repo%3Aapache%2Fbeam+
>>>>>>>
>>>>>>> On Fri, Jan 20, 2017 at 10:06 AM, Kenneth Knowles <k...@google.com.invalid> wrote:
>>>>>>>> To clarify the implied criteria of that last exchange, it is "An SDK
>>>>>>>> should have at least one runner that can execute the complete model
>>>>>>>> (may be a direct runner)"
>>>>>>>>
>>>>>>>> I want to highlight this, because whether an _SDK_ supports unbounded
>>>>>>>> data is not particularly well-defined, and will evolve:
>>>>>>>>
>>>>>>>> - With the Runner API, an SDK will need to support building a graph
>>>>>>>> with unbounded constructs, as today with probably minimal changes.
>>>>>>>>
>>>>>>>> - With the Fn API, if any part of the Fn API is specific to unbounded
>>>>>>>> data, the SDK will need to implement it. I think right now there is
>>>>>>>> no such thing, and we don't want such a thing, so SDKs implementing
>>>>>>>> the Fn API automatically support unbounded data.
>>>>>>>>
>>>>>>>> - There will also likely be an SDK-specific shim just as there is
>>>>>>>> today, to leverage idiomatic deserialized representations. The
>>>>>>>> richness of this shim will decrease so that it will need to "support"
>>>>>>>> unbounded data but that will be a ~one liner.
>>>>>>>>
>>>>>>>> Getting the Python SDK on master will accelerate our progress towards
>>>>>>>> the Fn API - partly technical, partly community - which is the best
>>>>>>>> path towards support for unbounded data across multiple runners. I
>>>>>>>> think the criteria are written with the completed portability
>>>>>>>> framework in mind. So this exchange makes me actually more convinced
>>>>>>>> we should merge python-sdk to master.
>>>>>>>>
>>>>>>>> On Fri, Jan 20, 2017 at 9:53 AM, Robert Bradshaw <rober...@google.com.invalid> wrote:
>>>>>>>>> On Thu, Jan 19, 2017 at 11:56 PM, Dan Halperin <dhalp...@google.com.invalid> wrote:
>>>>>>>>>> I do not think that Python SDK yet meets the bar [1] for implementing
>>>>>>>>>> the Beam model -- supporting Unbounded data is very important. That
>>>>>>>>>> said, given the committed and sustained set of contributors, it
>>>>>>>>>> generally makes sense to me to make an exception in anticipation of
>>>>>>>>>> these features being fleshed out soon; including potentially new
>>>>>>>>>> users/contributors that would arrive once in master.
>>>>>>>>>>
>>>>>>>>>> [1] https://lists.apache.org/thread.html/CAAzyFAxcmexUQnbF=Yk0plmm3f5e5bqwjz4+c5doruclnxo...@mail.gmail.com
>>>>>>>>>
>>>>>>>>> That is a valid point. The Python SDK supports all the unbounded
>>>>>>>>> parts of the model except for unbounded sources, which was deferred
>>>>>>>>> while seeing how https://s.apache.org/splittable-do-fn played out.
>>>>>>>>> I've been working with the team and merging/reviewing most of their
>>>>>>>>> code, and have full confidence this will be coming (and on that note
>>>>>>>>> can vouch for a healthy community and support which are much harder
>>>>>>>>> to add later).
>>>>>>>>>
>>>>>>>>> In short, I think it has the required maturity, and I'm in favor of
>>>>>>>>> merging soonish.
>>>>>>>>>
>>>>>>>>>> On Wed, Jan 18, 2017 at 12:24 AM, Ahmet Altay <al...@google.com.invalid> wrote:
>>>>>>>>>>> Thank you all for the comments so far. I would follow the process as
>>>>>>>>>>> suggested by Davor and others in this thread.
>>>>>>>>>>>
>>>>>>>>>>> Ahmet
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 17, 2017 at 11:47 PM, Sergio Fernández <wik...@apache.org> wrote:
>>>>>>>>>>>> Hi
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 17, 2017 at 5:22 PM, Ahmet Altay <al...@google.com.invalid> wrote:
>>>>>>>>>>>>> tl;dr: I would like to start a discussion about merging python-sdk
>>>>>>>>>>>>> branch to master branch.
>>>>>>>>>>>>> Python SDK is mature enough and merging it to master will
>>>>>>>>>>>>> accelerate its development and adoption.
>>>>>>>>>>>>
>>>>>>>>>>>> Good point, Ahmet!
>>>>>>>>>>>>
>>>>>>>>>>>> I've been following the development closely since it was imported in
>>>>>>>>>>>> June. For the prototypes I've implemented so far it works quite well;
>>>>>>>>>>>> I guess we'd just need to focus the next months on bringing more
>>>>>>>>>>>> runners support.
>>>>>>>>>>>>
>>>>>>>>>>>>> With a great effort from a lot of contributors(*), Python SDK [1] is
>>>>>>>>>>>>> now a mostly complete, tested, performant Python implementation of
>>>>>>>>>>>>> the Beam model. Since June, when we first started with Python SDK in
>>>>>>>>>>>>> Apache Beam we have been continuously improving it.
>>>>>>>>>>>>
>>>>>>>>>>>> I wouldn't merge during the preparation of 0.5.0 release, but after
>>>>>>>>>>>> that could be a good time to merge back into master.
>>>>>>>>>>>>
>>>>>>>>>>>>> ** Python SDK currently supports:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Model: All main concepts are present (ParDo, GroupByKey, Windowing
>>>>>>>>>>>>> etc.).
>>>>>>>>>>>>> * IO: There are extensible APIs for writing new bounded sources and
>>>>>>>>>>>>> sinks. Implementations are provided for Text, Avro, BigQuery, and
>>>>>>>>>>>>> Datastore.
>>>>>>>>>>>>> * Runners: Python SDK has an extensible base runner module that
>>>>>>>>>>>>> allows building specific runners on top of it. The SDK comes with
>>>>>>>>>>>>> two pipeline runners: DirectRunner and DataflowRunner; and it is
>>>>>>>>>>>>> possible to add more. The existing runners are currently limited to
>>>>>>>>>>>>> bounded execution and otherwise equivalent to their Java SDK
>>>>>>>>>>>>> counterparts in functionality.
>>>>>>>>>>>>
>>>>>>>>>>>> What would be the effort of porting, and maintaining, parallel
>>>>>>>>>>>> versions of the Java runners?
>>>>>>>>>>>> I guess I'd need to dig deeper in the model, but this may represent
>>>>>>>>>>>> a major effort for the project, right?
>>>>>>>>>>>
>>>>>>>>>>> It is somewhat higher for DirectRunner because DirectRunner also
>>>>>>>>>>> implements the code for execution. It is not that high for
>>>>>>>>>>> DataflowRunner because the base runner module has a lot of helpers
>>>>>>>>>>> with the right hooks for implementing a generic runner. I would
>>>>>>>>>>> _expect_ the experience in general would be similar to the latter.
>>>>>>>>>>>
>>>>>>>>>>>>> * Testing: Python SDK implements ValidatesRunner test framework for
>>>>>>>>>>>>> implementing integration test for current and future runners. There
>>>>>>>>>>>>> is unit test coverage for all modules, and a number of integrations
>>>>>>>>>>>>> test for validating existing runners.
>>>>>>>>>>>>> * Documentation and examples: Documentation work has started on
>>>>>>>>>>>>> Python SDK. Beam Programming Guide page has been updated to include
>>>>>>>>>>>>> Python [2].
>>>>>>>>>>>>> The code comes with many ready to use examples and we are in a good
>>>>>>>>>>>>> place to start documenting those on the website.
>>>>>>>>>>>>>
>>>>>>>>>>>>> ** We are not done yet, next on the roadmap we have:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Streaming: Both of the existing runners lack support for streaming
>>>>>>>>>>>>> execution, and currently there is work going on for adding streaming
>>>>>>>>>>>>> support to DirectRunner [3].
>>>>>>>>>>>>> * Documentation: Filling the rest of the Beam documentations with
>>>>>>>>>>>>> Python SDK specific information and examples.
>>>>>>>>>>>>> * SDK consistency: Making Python SDK consistent with the Java SDK.
>>>>>>>>>>>>> We have come a long way on this and have only a few items left [4].
>>>>>>>>>>>>> * Beamifying: We have been working on removing Dataflow-specific
>>>>>>>>>>>>> references both from the documentation and from the code. There is
>>>>>>>>>>>>> some work left, and we are currently working on those as well [5].
>>>>>>>>>>>>>
>>>>>>>>>>>>> ** Steps and implications of merging to master:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Master branch is merged to python-sdk branch at regular intervals
>>>>>>>>>>>>> and the last merge was on 12/22. All the past merges were uneventful
>>>>>>>>>>>>> because there is a minimal overlap in modified files between
>>>>>>>>>>>>> branches. Integrating python-sdk to master will similarly touch a
>>>>>>>>>>>>> small number of existing files.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Python SDK is using the same tools for building and testing. It is
>>>>>>>>>>>>> already integrated with Maven, Jenkins and Travis. Specifically the
>>>>>>>>>>>>> impact to the testing infrastructure would be:
>>>>>>>>>>>>> - There will be two additional test configurations in Travis. Since
>>>>>>>>>>>>> Travis runs all configurations in parallel there should not be a
>>>>>>>>>>>>> noticeable change in the Travis run time.
>>>>>>>>>>>>> - Jenkins pre-commit test will start running the Python SDK tests.
>>>>>>>>>>>>> It will add an additional 5 minutes to the completion time of
>>>>>>>>>>>>> pre-commit test. Historically Python SDK tests were not flaky and
>>>>>>>>>>>>> did not cause any random failures.
>>>>>>>>>>>>> - Jenkins Python post-commit test is already separated from the
>>>>>>>>>>>>> other post-commit tests and will continue to exist. It would not
>>>>>>>>>>>>> change the testing time for any other test.
>>>>>>>>>>>>> * The release process needs to be updated to accommodate releasing
>>>>>>>>>>>>> Python artifacts. Python SDK would fit in the existing release
>>>>>>>>>>>>> schedule and could be released along with the Java SDK. The
>>>>>>>>>>>>> additional steps would include:
>>>>>>>>>>>>> - Generating Python artifacts. This could be done with a single
>>>>>>>>>>>>> command using Maven today.
>>>>>>>>>>>>> - Publishing the artifacts to a central repository such as PyPI.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm more than happy to help on this.
>>>>>>>>>>>> We left on purpose some things open when we added Maven support to
>>>>>>>>>>>> the Python build.
>>>>>>>>>>>
>>>>>>>>>>> That would be awesome. We can coordinate on that post-merge.
>>>>>>>>>>>
>>>>>>>>>>>>> - Updating the release guide to reflect the changes above.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Users: There are existing users using the Python SDK. To give a
>>>>>>>>>>>>> rough estimate, a distribution of the Beam Python SDK had a total of
>>>>>>>>>>>>> 23K downloads in the past 6 months [6]. Some of those users are
>>>>>>>>>>>>> already engaged with the community (e.g. [7]). There might be an
>>>>>>>>>>>>> increased amount of engagement from the rest of them after the
>>>>>>>>>>>>> merge.
>>>>>>>>>>>>
>>>>>>>>>>>> Python 3 support is something we definitely need to look ahead to.
>>>>>>>>>>>> I'd try to make the codebase compatible with both 2.7.x and 3.6.x,
>>>>>>>>>>>> rather than using other solutions like 2to3.
>>>>>>>>>>>
>>>>>>>>>>> I agree with you. I think it makes more sense to make codebase
>>>>>>>>>>> compatible with both. As you mentioned Python 3 support is not a
>>>>>>>>>>> short-term goal in the roadmap, and we can discuss it more as we
>>>>>>>>>>> approach that.
>>>>>>>>>>>
>>>>>>>>>>>>> Looking forward to hearing your thoughts and comments on
>>>>>>>>>>>>> “graduating” python-sdk to the master.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>> Ahmet
>>>>>>>>>>>>>
>>>>>>>>>>>>> (*) Python SDK branch currently has a diverse group of contributors.
>>>>>>>>>>>>> Regular contributors include Charles Chen, Chamikara Jayalath, María
>>>>>>>>>>>>> García Herrero, Mark Liu, Pablo Estrada, Robert Bradshaw (Apache
>>>>>>>>>>>>> Beam PMC), Sourabh Bajaj, and Vikas Kedigehalli. We have also had
>>>>>>>>>>>>> contributions from Abdullah Bashir, Marco Buccini, Sergio Fernández,
>>>>>>>>>>>>> Seunghyun Lee, and Younghee Kwon.
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://github.com/apache/beam/tree/python-sdk/sdks/python
>>>>>>>>>>>>> [2] https://beam.apache.org/documentation/programming-guide/
>>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/BEAM-1265
>>>>>>>>>>>>> [4] https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20labels%20%3D%20sdk-consistency
>>>>>>>>>>>>> [5] https://issues.apache.org/jira/browse/BEAM-1218
>>>>>>>>>>>>> [6] https://pypi.python.org/pypi/google-cloud-dataflow/json
>>>>>>>>>>>>> [7] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>>>>>>>
>>>>>>>>>>>> Great summary, Ahmet. Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Sergio Fernández
>>>>>>>>>>>> Partner Technology Manager
>>>>>>>>>>>> Redlink GmbH
>>>>>>>>>>>> m: +43 6602747925
>>>>>>>>>>>> e: sergio.fernan...@redlink.co
>>>>>>>>>>>> w: http://redlink.co
>>>>>
>>>>> --
>>>>> Jean-Baptiste Onofré
>>>>> jbono...@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>
>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com