Awesome!

On Tue, Jan 31, 2017 at 9:38 AM, Ahmet Altay <al...@google.com.invalid> wrote:
> Thank you Prabeesh and Sergio for fixing those!
>
> On Tue, Jan 31, 2017 at 4:51 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> > Awesome, thanks Sergio ! Much appreciated ;)
> >
> > Regards
> > JB
> >
> > On 01/31/2017 01:42 PM, Sergio Fernández wrote:
> >> PR #1879 provides the basics: https://github.com/apache/beam/pull/1879
> >>
> >> On Tue, Jan 31, 2017 at 1:33 PM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>> No, that's fine as soon as we clearly document the prerequisite for the
> >>> build. IMHO, we should provide quick BUILDING instructions in the
> >>> README.md.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 01/31/2017 01:24 PM, Sergio Fernández wrote:
> >>>> Originally we integrate the build in Maven with the default profile.
> >>>> Do you feel like it'd be better to have it under a separated profile
> >>>> or so?
> >>>>
> >>>> On Tue, Jan 31, 2017 at 11:07 AM, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>>>> Just to be clear, the prerequisite to be able to build the Python SDK
> >>>>> are:
> >>>>>
> >>>>> apt-get install python-setuptools
> >>>>> apt-get install python-pip
> >>>>>
> >>>>> It's also required by the default "regular" build.
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>> On 01/31/2017 11:02 AM, Jean-Baptiste Onofré wrote:
> >>>>>> Just one thing I noticed (and can be helpful for others): to build
> >>>>>> Beam we now need python setuptools installed.
> >>>>>>
> >>>>>> For instance, on Ubuntu, you have to do:
> >>>>>>
> >>>>>> apt-get install python-setuptools
> >>>>>>
> >>>>>> Same for the pip distribution.
> >>>>>>
> >>>>>> I guess (if not already done), we have to update README/Building
> >>>>>> instructions.
> >>>>>>
> >>>>>> Correct ?
> >>>>>>
> >>>>>> Regards
> >>>>>> JB
> >>>>>>
> >>>>>> On 01/31/2017 08:10 AM, Ahmet Altay wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> This merge is completed.
> >>>>>>> Python SDK is now officially part of the master
> >>>>>>> branch! Thank you all for the support. Please open an issue, if you
> >>>>>>> notice a reference to the now obsolete python-sdk branch in the
> >>>>>>> documentation.
> >>>>>>>
> >>>>>>> There will not be any more merges to the python-sdk branch. Going
> >>>>>>> forward please use the master branch for Python SDK development.
> >>>>>>> There are a few existing open PRs to the python-sdk [1]. If you are
> >>>>>>> the author of one of those PRs, please rebase them on top of master.
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>> Ahmet
> >>>>>>>
> >>>>>>> [1] https://github.com/pulls?utf8=✓&q=is%3Aopen+is%3Apr+base%3Apython-sdk+repo%3Aapache%2Fbeam+
> >>>>>>>
> >>>>>>> On Fri, Jan 20, 2017 at 10:06 AM, Kenneth Knowles <k...@google.com.invalid> wrote:
> >>>>>>>> To clarify the implied criteria of that last exchange, it is "An SDK
> >>>>>>>> should have at least one runner that can execute the complete model
> >>>>>>>> (may be a direct runner)"
> >>>>>>>>
> >>>>>>>> I want to highlight this, because whether an _SDK_ supports
> >>>>>>>> unbounded data is not particularly well-defined, and will evolve:
> >>>>>>>>
> >>>>>>>> - With the Runner API, an SDK will need to support building a graph
> >>>>>>>> with unbounded constructs, as today with probably minimal changes.
> >>>>>>>>
> >>>>>>>> - With the Fn API, if any part of the Fn API is specific to
> >>>>>>>> unbounded data, the SDK will need to implement it. I think right now
> >>>>>>>> there is no such thing, and we don't want such a thing, so SDKs
> >>>>>>>> implementing the Fn API automatically support unbounded data.
> >>>>>>>>
> >>>>>>>> - There will also likely be an SDK-specific shim just as there is
> >>>>>>>> today, to leverage idiomatic deserialized representations. The
> >>>>>>>> richness of this shim will decrease so that it will need to
> >>>>>>>> "support" unbounded data but that will be a ~one liner.
> >>>>>>>>
> >>>>>>>> Getting the Python SDK on master will accelerate our progress
> >>>>>>>> towards the Fn API - partly technical, partly community - which is
> >>>>>>>> the best path towards support for unbounded data across multiple
> >>>>>>>> runners. I think the criteria are written with the completed
> >>>>>>>> portability framework in mind. So this exchange makes me actually
> >>>>>>>> more convinced we should merge python-sdk to master.
> >>>>>>>>
> >>>>>>>> On Fri, Jan 20, 2017 at 9:53 AM, Robert Bradshaw <rober...@google.com.invalid> wrote:
> >>>>>>>>> On Thu, Jan 19, 2017 at 11:56 PM, Dan Halperin <dhalp...@google.com.invalid> wrote:
> >>>>>>>>>> I do not think that Python SDK yet meets the bar [1] for
> >>>>>>>>>> implementing the Beam model -- supporting Unbounded data is very
> >>>>>>>>>> important.
> >>>>>>>>>> That said, given the committed and sustained set of contributors,
> >>>>>>>>>> it generally makes sense to me to make an exception in
> >>>>>>>>>> anticipation of these features being fleshed out soon; including
> >>>>>>>>>> potentially new users/contributors that would arrive once in
> >>>>>>>>>> master.
> >>>>>>>>>>
> >>>>>>>>>> [1] https://lists.apache.org/thread.html/CAAzyFAxcmexUQnbF=Yk0plmm3f5e5bqwjz4+c5doruclnxo...@mail.gmail.com
> >>>>>>>>>
> >>>>>>>>> That is a valid point. The Python SDK supports all the unbounded
> >>>>>>>>> parts of the model except for unbounded sources, which was deferred
> >>>>>>>>> while seeing how https://s.apache.org/splittable-do-fn played out.
> >>>>>>>>> I've been working with the team and merging/reviewing most of their
> >>>>>>>>> code, and have full confidence this will be coming (and on that
> >>>>>>>>> note can vouch for a healthy community and support which are much
> >>>>>>>>> harder to add later).
> >>>>>>>>>
> >>>>>>>>> In short, I think it has the required maturity, and I'm in favor of
> >>>>>>>>> merging soonish.
> >>>>>>>>>
> >>>>>>>>>> On Wed, Jan 18, 2017 at 12:24 AM, Ahmet Altay <al...@google.com.invalid> wrote:
> >>>>>>>>>>> Thank you all for the comments so far. I would follow the process
> >>>>>>>>>>> as suggested by Davor and others in this thread.
> >>>>>>>>>>>
> >>>>>>>>>>> Ahmet
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Jan 17, 2017 at 11:47 PM, Sergio Fernández <wik...@apache.org> wrote:
> >>>>>>>>>>>> Hi
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jan 17, 2017 at 5:22 PM, Ahmet Altay <al...@google.com.invalid> wrote:
> >>>>>>>>>>>>> tl;dr: I would like to start a discussion about merging
> >>>>>>>>>>>>> python-sdk branch to master branch. Python SDK is mature enough
> >>>>>>>>>>>>> and merging it to master will accelerate its development and
> >>>>>>>>>>>>> adoption.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Good point, Ahmet!
> >>>>>>>>>>>>
> >>>>>>>>>>>> I've following closed the development since it was imported in
> >>>>>>>>>>>> June. For the prototypes I've implemented so far it works quite
> >>>>>>>>>>>> well; I guess we'd just need to focus the next months in bringing
> >>>>>>>>>>>> more runners support.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> With a great effort from a lot of contributors(*), Python SDK
> >>>>>>>>>>>>> [1] is now a mostly complete, tested, performant Python
> >>>>>>>>>>>>> implementation of the Beam model. Since June, when we first
> >>>>>>>>>>>>> started with Python SDK in Apache Beam we have been continuously
> >>>>>>>>>>>>> improving it.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I wouldn't merge during the preparation of 0.5.0 release, but
> >>>>>>>>>>>> after that could be a good time to merge back into master.
> >>>>>>>>>>>>
> >>>>>>>>>>>>> ** Python SDK currently supports:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Model: All main concepts are present (ParDo, GroupByKey,
> >>>>>>>>>>>>> Windowing etc.).
> >>>>>>>>>>>>> * IO: There are extensible APIs for writing new bounded sources
> >>>>>>>>>>>>> and sinks. Implementations are provided for Text, Avro,
> >>>>>>>>>>>>> BigQuery, and Datastore.
> >>>>>>>>>>>>> * Runners: Python SDK has an extensible base runner module that
> >>>>>>>>>>>>> allows building specific runners on top of it. The SDK comes
> >>>>>>>>>>>>> with two pipeline runners: DirectRunner and DataflowRunner; and
> >>>>>>>>>>>>> it is possible to add more. The existing runners are currently
> >>>>>>>>>>>>> limited to bounded execution and otherwise equivalent to their
> >>>>>>>>>>>>> Java SDK counterparts in functionality.
> >>>>>>>>>>>>
> >>>>>>>>>>>> What would the effort of porting, and maintaining, parallel
> >>>>>>>>>>>> versions of the Java runners? I guess I'd need to dig deeper in
> >>>>>>>>>>>> the model, but this may represent a major effort for the project,
> >>>>>>>>>>>> right?
> >>>>>>>>>>>
> >>>>>>>>>>> It is somewhat higher for DirectRunner because DirectRunner also
> >>>>>>>>>>> implements the code for execution.
> >>>>>>>>>>> It is not that high for DataflowRunner because the base runner
> >>>>>>>>>>> module has a lot of helpers with the right hooks for implementing
> >>>>>>>>>>> a generic runner. I would _expect_ the experience in general would
> >>>>>>>>>>> be similar to the latter.
> >>>>>>>>>>>
> >>>>>>>>>>>>> * Testing: Python SDK implements ValidatesRunner test framework
> >>>>>>>>>>>>> for implementing integration test for current and future
> >>>>>>>>>>>>> runners. There is unit test coverage for all modules, and a
> >>>>>>>>>>>>> number of integrations test for validating existing runners.
> >>>>>>>>>>>>> * Documentation and examples: Documentation work has started on
> >>>>>>>>>>>>> Python SDK. Beam Programming Guide page has been updated to
> >>>>>>>>>>>>> include Python [2]. The code comes with many ready to use
> >>>>>>>>>>>>> examples and we are in a good place to start documenting those
> >>>>>>>>>>>>> on the website.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ** We are not done yet, next on the roadmap we have:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Streaming: Both of the existing runners lack support for
> >>>>>>>>>>>>> streaming execution, and currently there is work going on for
> >>>>>>>>>>>>> adding streaming support to DirectRunner [3].
> >>>>>>>>>>>>> * Documentation: Filling the rest of the Beam documentations
> >>>>>>>>>>>>> with Python SDK specific information and examples.
> >>>>>>>>>>>>> * SDK consistency: Making Python SDK consistent with the Java
> >>>>>>>>>>>>> SDK. We have come a long way on this and have only a few items
> >>>>>>>>>>>>> left [4].
> >>>>>>>>>>>>> * Beamifying: We have been working on removing Dataflow-specific
> >>>>>>>>>>>>> references both from the documentation and from the code. There
> >>>>>>>>>>>>> is some work left, and we are currently working on those as
> >>>>>>>>>>>>> well [5].
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ** Steps and implications of merging to master:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Master branch is merged to python-sdk branch at regular
> >>>>>>>>>>>>> intervals and the last merge was on 12/22. All the past merges
> >>>>>>>>>>>>> were uneventful because there is a minimal overlap in modified
> >>>>>>>>>>>>> files between branches. Integrating python-sdk to master will
> >>>>>>>>>>>>> similarly touch a small number of existing files.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Python SDK is using the same tools for building and testing.
> >>>>>>>>>>>>> It is already integrated with Maven, Jenkins and Travis.
> >>>>>>>>>>>>> Specifically the impact to the testing infrastructure would be:
> >>>>>>>>>>>>> - There will be two additional test configurations in Travis.
> >>>>>>>>>>>>> Since Travis runs all configurations in parallel there should
> >>>>>>>>>>>>> not be a noticeable change in the Travis run time.
> >>>>>>>>>>>>> - Jenkins pre-commit test will start running the Python SDK
> >>>>>>>>>>>>> tests. It will add an additional 5 minutes to the completion
> >>>>>>>>>>>>> time of pre-commit test. Historically Python SDK tests were not
> >>>>>>>>>>>>> flaky and did not cause any random failures.
> >>>>>>>>>>>>> - Jenkins Python post-commit test is already separated from the
> >>>>>>>>>>>>> other post-commit tests and will continue to exist. It would
> >>>>>>>>>>>>> not change the testing time for any other test.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * The release process needs to be updated to accommodate
> >>>>>>>>>>>>> releasing Python artifacts. Python SDK would fit in the existing
> >>>>>>>>>>>>> release schedule and could be released along with the Java SDK.
> >>>>>>>>>>>>> The additional steps would include:
> >>>>>>>>>>>>> - Generating Python artifacts. This could be done with a single
> >>>>>>>>>>>>> command using Maven today.
> >>>>>>>>>>>>> - Publishing the artifacts to a central repository such as PyPI.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm more than happy to help on this. We left on purpose some
> >>>>>>>>>>>> things open when we added Maven support to the Python build.
> >>>>>>>>>>>
> >>>>>>>>>>> That would be awesome. We can coordinate on that post-merge.
> >>>>>>>>>>>
> >>>>>>>>>>>>> - Updating the release guide to reflect the changes above.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Users: There are existing users using the Python SDK. To give
> >>>>>>>>>>>>> a rough estimate, a distribution of the Beam Python SDK had a
> >>>>>>>>>>>>> total of 23K downloads in the past 6 months [6]. Some of those
> >>>>>>>>>>>>> users are already engaged with the community (e.g. [7]). There
> >>>>>>>>>>>>> might be an increased amount engagement from the rest of them
> >>>>>>>>>>>>> after the merge.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Python 3 support is something we definitively need to look ahead.
> >>>>>>>>>>>> I'd try to make the codebase compatible with both 2.7.x and
> >>>>>>>>>>>> 3.6.x, rather than using other solutions like 2to3.
> >>>>>>>>>>>
> >>>>>>>>>>> I agree with you.
> >>>>>>>>>>> I think it makes more sense to make codebase compatible with both.
> >>>>>>>>>>> As you mentioned Python 3 support is not a short-term goal in the
> >>>>>>>>>>> roadmap, and we can discuss it more as we approach that.
> >>>>>>>>>>>
> >>>>>>>>>>>>> Looking forward to hearing your thoughts and comments on
> >>>>>>>>>>>>> “graduating” python-sdk to the master.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you,
> >>>>>>>>>>>>> Ahmet
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (*) Python SDK branch currently has a diverse group of
> >>>>>>>>>>>>> contributors. Regular contributors include Charles Chen,
> >>>>>>>>>>>>> Chamikara Jayalath, María García Herrero, Mark Liu, Pablo
> >>>>>>>>>>>>> Estrada, Robert Bradshaw (Apache Beam PMC), Sourabh Bajaj, and
> >>>>>>>>>>>>> Vikas Kedigehalli. We have also had contributions from Abdullah
> >>>>>>>>>>>>> Bashir, Marco Buccini, Sergio Fernández, Seunghyun Lee, and
> >>>>>>>>>>>>> Younghee Kwon.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1] https://github.com/apache/beam/tree/python-sdk/sdks/python
> >>>>>>>>>>>>> [2] https://beam.apache.org/documentation/programming-guide/
> >>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/BEAM-1265
> >>>>>>>>>>>>> [4] https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20labels%20%3D%20sdk-consistency
> >>>>>>>>>>>>> [5] https://issues.apache.org/jira/browse/BEAM-1218
> >>>>>>>>>>>>> [6] https://pypi.python.org/pypi/google-cloud-dataflow/json
> >>>>>>>>>>>>> [7] https://issues.apache.org/jira/browse/BEAM-1251
> >>>>>>>>>>>>
> >>>>>>>>>>>> Great summary, Ahmet. Thanks.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Sergio Fernández
> >>>>>>>>>>>> Partner Technology Manager
> >>>>>>>>>>>> Redlink GmbH
> >>>>>>>>>>>> m: +43 6602747925
> >>>>>>>>>>>> e: sergio.fernan...@redlink.co
> >>>>>>>>>>>> w: http://redlink.co
> >>>>>
> >>>>> --
> >>>>> Jean-Baptiste Onofré
> >>>>> jbono...@apache.org
> >>>>> http://blog.nanthrax.net
> >>>>> Talend - http://www.talend.com
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com