Thank you Prabeesh and Sergio for fixing those!

On Tue, Jan 31, 2017 at 4:51 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Awesome, thanks Sergio ! Much appreciated ;)
>
> Regards
> JB
>
>
> On 01/31/2017 01:42 PM, Sergio Fernández wrote:
>
>> PR #1879 provides the basics: https://github.com/apache/beam/pull/1879
>>
>> On Tue, Jan 31, 2017 at 1:33 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
>> wrote:
>>
>>> No, that's fine as long as we clearly document the prerequisites for the
>>> build. IMHO, we should provide quick BUILDING instructions in the
>>> README.md.
>>>
>>> Regards
>>> JB
>>>
>>>
>>> On 01/31/2017 01:24 PM, Sergio Fernández wrote:
>>>
>>>> Originally we integrated the build in Maven with the default profile.
>>>> Do you feel like it'd be better to have it under a separate profile or
>>>> so?
>>>>
>>>> On Tue, Jan 31, 2017 at 11:07 AM, Jean-Baptiste Onofré <j...@nanthrax.net
>>>> >
>>>> wrote:
>>>>
>>>>> Just to be clear, the prerequisites to be able to build the Python SDK
>>>>> are:
>>>>>
>>>>> apt-get install python-setuptools
>>>>> apt-get install python-pip
>>>>>
>>>>> They are also required by the default "regular" build.
>>>>>
>>>>> Regards
>>>>> JB
>>>>>
>>>>>
>>>>> On 01/31/2017 11:02 AM, Jean-Baptiste Onofré wrote:
>>>>>
>>>>>> Just one thing I noticed (and it can be helpful for others): to build Beam
>>>>>> we now need python setuptools installed.
>>>>>>
>>>>>> For instance, on Ubuntu, you have to do:
>>>>>>
>>>>>> apt-get install python-setuptools
>>>>>>
>>>>>> Same for the pip distribution.
>>>>>>
>>>>>> I guess (if not already done), we have to update README/Building
>>>>>> instructions.
>>>>>>
>>>>>> Correct ?
>>>>>>
>>>>>> Regards
>>>>>> JB
>>>>>>
>>>>>> On 01/31/2017 08:10 AM, Ahmet Altay wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> This merge is completed. Python SDK is now officially part of the master
>>>>>>> branch! Thank you all for the support. Please open an issue if you notice
>>>>>>> a reference to the now obsolete python-sdk branch in the documentation.
>>>>>>>
>>>>>>> There will not be any more merges to the python-sdk branch. Going forward,
>>>>>>> please use the master branch for Python SDK development. There are a few
>>>>>>> existing open PRs to the python-sdk [1]. If you are the author of one of
>>>>>>> those PRs, please rebase them on top of master.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Ahmet
>>>>>>>
>>>>>>> [1] https://github.com/pulls?utf8=%E2%9C%93&q=is%3Aopen+is%3Apr+base%3Apython-sdk+repo%3Aapache%2Fbeam+
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jan 20, 2017 at 10:06 AM, Kenneth Knowles
>>>>>>> <k...@google.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> To clarify the implied criteria of that last exchange, it is "An SDK
>>>>>>>> should have at least one runner that can execute the complete model (may
>>>>>>>> be a direct runner)"
>>>>>>>>
>>>>>>>> I want to highlight this, because whether an _SDK_ supports unbounded
>>>>>>>> data is not particularly well-defined, and will evolve:
>>>>>>>>
>>>>>>>>  - With the Runner API, an SDK will need to support building a graph with
>>>>>>>> unbounded constructs, as today with probably minimal changes.
>>>>>>>>
>>>>>>>>  - With the Fn API, if any part of the Fn API is specific to unbounded
>>>>>>>> data, the SDK will need to implement it. I think right now there is no
>>>>>>>> such thing, and we don't want such a thing, so SDKs implementing the Fn
>>>>>>>> API automatically support unbounded data.
>>>>>>>>
>>>>>>>>  - There will also likely be an SDK-specific shim just as there is today,
>>>>>>>> to leverage idiomatic deserialized representations. The richness of this
>>>>>>>> shim will decrease so that it will need to "support" unbounded data but
>>>>>>>> that will be a ~one liner.
>>>>>>>>
>>>>>>>> Getting the Python SDK on master will accelerate our progress towards the
>>>>>>>> Fn API - partly technical, partly community - which is the best path
>>>>>>>> towards support for unbounded data across multiple runners. I think the
>>>>>>>> criteria are written with the completed portability framework in mind. So
>>>>>>>> this exchange makes me actually more convinced we should merge python-sdk
>>>>>>>> to master.
>>>>>>>>
>>>>>>>> On Fri, Jan 20, 2017 at 9:53 AM, Robert Bradshaw <
>>>>>>>> rober...@google.com.invalid> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Jan 19, 2017 at 11:56 PM, Dan Halperin
>>>>>>>>> <dhalp...@google.com.invalid> wrote:
>>>>>>>>>
>>>>>>>>>> I do not think that Python SDK yet meets the bar [1] for implementing
>>>>>>>>>> the Beam model -- supporting Unbounded data is very important. That
>>>>>>>>>> said, given the committed and sustained set of contributors, it
>>>>>>>>>> generally makes sense to me to make an exception in anticipation of
>>>>>>>>>> these features being fleshed out soon; including potentially new
>>>>>>>>>> users/contributors that would arrive once in master.
>>>>>>>>>>
>>>>>>>>>> [1] https://lists.apache.org/thread.html/CAAzyFAxcmexUQnbF=Yk0plmm3f5e5bqwjz4+c5doruclnxo...@mail.gmail.com
>>>>>>>>>>
>>>>>>>>> That is a valid point. The Python SDK supports all the unbounded parts
>>>>>>>>> of the model except for unbounded sources, which was deferred while
>>>>>>>>> seeing how https://s.apache.org/splittable-do-fn played out. I've been
>>>>>>>>> working with the team and merging/reviewing most of their code, and
>>>>>>>>> have full confidence this will be coming (and on that note can vouch
>>>>>>>>> for a healthy community and support which are much harder to add
>>>>>>>>> later).
>>>>>>>>>
>>>>>>>>> In short, I think it has the required maturity, and I'm in favor of
>>>>>>>>> merging soonish.
>>>>>>>>>
>>>>>>>>>> On Wed, Jan 18, 2017 at 12:24 AM, Ahmet Altay <al...@google.com.invalid>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thank you all for the comments so far. I would follow the process as
>>>>>>>>>>> suggested by Davor and others in this thread.
>>>>>>>>>>>
>>>>>>>>>>> Ahmet
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 17, 2017 at 11:47 PM, Sergio Fernández <wik...@apache.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 17, 2017 at 5:22 PM, Ahmet Altay <al...@google.com.invalid>
>>>>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> tl;dr: I would like to start a discussion about merging python-sdk
>>>>>>>>>>>>> branch to master branch. Python SDK is mature enough and merging it to
>>>>>>>>>>>>> master will accelerate its development and adoption.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Good point, Ahmet!
>>>>>>>>>>>>
>>>>>>>>>>>> I've been closely following the development since it was imported in
>>>>>>>>>>>> June. For the prototypes I've implemented so far it works quite well; I
>>>>>>>>>>>> guess we'd just need to focus the next months on bringing more runner
>>>>>>>>>>>> support.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> With a great effort from a lot of contributors(*), Python SDK [1] is
>>>>>>>>>>>>> now a mostly complete, tested, performant Python implementation of the
>>>>>>>>>>>>> Beam model. Since June, when we first started with Python SDK in Apache
>>>>>>>>>>>>> Beam, we have been continuously improving it.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> I wouldn't merge during the preparation of 0.5.0 release, but after
>>>>>>>>>>>> that could be a good time to merge back into master.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> ** Python SDK currently supports:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Model: All main concepts are present (ParDo, GroupByKey, Windowing
>>>>>>>>>>>>> etc.).
>>>>>>>>>>>>> * IO: There are extensible APIs for writing new bounded sources and
>>>>>>>>>>>>> sinks. Implementations are provided for Text, Avro, BigQuery, and
>>>>>>>>>>>>> Datastore.
>>>>>>>>>>>>> * Runners: Python SDK has an extensible base runner module that allows
>>>>>>>>>>>>> building specific runners on top of it. The SDK comes with two pipeline
>>>>>>>>>>>>> runners: DirectRunner and DataflowRunner; and it is possible to add
>>>>>>>>>>>>> more. The existing runners are currently limited to bounded execution
>>>>>>>>>>>>> and otherwise equivalent to their Java SDK counterparts in
>>>>>>>>>>>>> functionality.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>> What would be the effort of porting, and maintaining, parallel versions
>>>>>>>>>>>> of the Java runners? I guess I'd need to dig deeper in the model, but
>>>>>>>>>>>> this may represent a major effort for the project, right?
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> It is somewhat higher for DirectRunner because DirectRunner also
>>>>>>>>>>> implements the code for execution. It is not that high for
>>>>>>>>>>> DataflowRunner because the base runner module has a lot of helpers with
>>>>>>>>>>> the right hooks for implementing a generic runner. I would _expect_ the
>>>>>>>>>>> experience in general would be similar to the latter.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> * Testing: Python SDK implements ValidatesRunner test framework for
>>>>>>>>>>>>> implementing integration tests for current and future runners. There is
>>>>>>>>>>>>> unit test coverage for all modules, and a number of integration tests
>>>>>>>>>>>>> for validating existing runners.
>>>>>>>>>>>>> * Documentation and examples: Documentation work has started on Python
>>>>>>>>>>>>> SDK. Beam Programming Guide page has been updated to include Python
>>>>>>>>>>>>> [2]. The code comes with many ready to use examples and we are in a
>>>>>>>>>>>>> good place to start documenting those on the website.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> ** We are not done yet, next on the roadmap we have:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Streaming: Both of the existing runners lack support for streaming
>>>>>>>>>>>>> execution, and currently there is work going on for adding streaming
>>>>>>>>>>>>> support to DirectRunner [3].
>>>>>>>>>>>>> * Documentation: Filling the rest of the Beam documentation with Python
>>>>>>>>>>>>> SDK specific information and examples.
>>>>>>>>>>>>> * SDK consistency: Making Python SDK consistent with the Java SDK. We
>>>>>>>>>>>>> have come a long way on this and have only a few items left [4].
>>>>>>>>>>>>> * Beamifying: We have been working on removing Dataflow-specific
>>>>>>>>>>>>> references both from the documentation and from the code. There is some
>>>>>>>>>>>>> work left, and we are currently working on those as well [5].
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> ** Steps and implications of merging to master:
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Master branch is merged to python-sdk branch at regular intervals and
>>>>>>>>>>>>> the last merge was on 12/22. All the past merges were uneventful because
>>>>>>>>>>>>> there is a minimal overlap in modified files between branches.
>>>>>>>>>>>>> Integrating python-sdk to master will similarly touch a small number of
>>>>>>>>>>>>> existing files.
>>>>>>>>>>>>> * Python SDK is using the same tools for building and testing. It is
>>>>>>>>>>>>> already integrated with Maven, Jenkins and Travis. Specifically the
>>>>>>>>>>>>> impact to the testing infrastructure would be:
>>>>>>>>>>>>> - There will be two additional test configurations in Travis. Since
>>>>>>>>>>>>> Travis runs all configurations in parallel there should not be a
>>>>>>>>>>>>> noticeable change in the Travis run time.
>>>>>>>>>>>>> - Jenkins pre-commit test will start running the Python SDK tests. It
>>>>>>>>>>>>> will add an additional 5 minutes to the completion time of pre-commit
>>>>>>>>>>>>> test. Historically Python SDK tests were not flaky and did not cause any
>>>>>>>>>>>>> random failures.
>>>>>>>>>>>>> - Jenkins Python post-commit test is already separated from the other
>>>>>>>>>>>>> post-commit tests and will continue to exist. It would not change the
>>>>>>>>>>>>> testing time for any other test.
>>>>>>>>>>>>> * The release process needs to be updated to accommodate releasing
>>>>>>>>>>>>> Python artifacts. Python SDK would fit in the existing release schedule
>>>>>>>>>>>>> and could be released along with the Java SDK. The additional steps
>>>>>>>>>>>>> would include:
>>>>>>>>>>>>> - Generating Python artifacts. This could be done with a single command
>>>>>>>>>>>>> using Maven today.
>>>>>>>>>>>>> - Publishing the artifacts to a central repository such as PyPI.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> I'm more than happy to help on this. We purposely left some things
>>>>>>>>>>>> open when we added Maven support to the Python build.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> That would be awesome. We can coordinate on that post-merge.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> - Updating the release guide to reflect the changes above.
>>>>>>>>>>>>>
>>>>>>>>>>>>> * Users: There are existing users using the Python SDK. To give a rough
>>>>>>>>>>>>> estimate, a distribution of the Beam Python SDK had a total of 23K
>>>>>>>>>>>>> downloads in the past 6 months [6]. Some of those users are already
>>>>>>>>>>>>> engaged with the community (e.g. [7]). There might be an increased
>>>>>>>>>>>>> amount of engagement from the rest of them after the merge.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Python 3 support is something we definitely need to look ahead to. I'd
>>>>>>>>>>>> try to make the codebase compatible with both 2.7.x and 3.6.x, rather
>>>>>>>>>>>> than using other solutions like 2to3.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> I agree with you. I think it makes more sense to make the codebase
>>>>>>>>>>> compatible with both. As you mentioned, Python 3 support is not a
>>>>>>>>>>> short-term goal in the roadmap, and we can discuss it more as we
>>>>>>>>>>> approach that.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> Looking forward to hearing your thoughts and comments on “graduating”
>>>>>>>>>>>>> python-sdk to the master.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you,
>>>>>>>>>>>>> Ahmet
>>>>>>>>>>>>>
>>>>>>>>>>>>> (*) Python SDK branch currently has a diverse group of contributors.
>>>>>>>>>>>>> Regular contributors include Charles Chen, Chamikara Jayalath, María
>>>>>>>>>>>>> García Herrero, Mark Liu, Pablo Estrada, Robert Bradshaw (Apache Beam
>>>>>>>>>>>>> PMC), Sourabh Bajaj, and Vikas Kedigehalli. We have also had
>>>>>>>>>>>>> contributions from Abdullah Bashir, Marco Buccini, Sergio Fernández,
>>>>>>>>>>>>> Seunghyun Lee, and Younghee Kwon.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>>>> [1] https://github.com/apache/beam/tree/python-sdk/sdks/python
>>>>>>>>>>>>> [2] https://beam.apache.org/documentation/programming-guide/
>>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/BEAM-1265
>>>>>>>>>>>>> [4] https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20labels%20%3D%20sdk-consistency
>>>>>>>>>>>>> [5] https://issues.apache.org/jira/browse/BEAM-1218
>>>>>>>>>>>>> [6] https://pypi.python.org/pypi/google-cloud-dataflow/json
>>>>>>>>>>>>> [7] https://issues.apache.org/jira/browse/BEAM-1251
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>> Great summary, Ahmet. Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Sergio Fernández
>>>>>>>>>>>> Partner Technology Manager
>>>>>>>>>>>> Redlink GmbH
>>>>>>>>>>>> m: +43 6602747925
>>>>>>>>>>>> e: sergio.fernan...@redlink.co
>>>>>>>>>>>> w: http://redlink.co
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>> --
>>>>>>
>>>>> Jean-Baptiste Onofré
>>>>> jbono...@apache.org
>>>>> http://blog.nanthrax.net
>>>>> Talend - http://www.talend.com
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>> Jean-Baptiste Onofré
>>> jbono...@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>>
>>>
>>
>>
>>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
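
For anyone hitting the build prerequisite discussed in this thread (setuptools and pip), here is a minimal sketch of a pre-build check. It is not part of the Beam build. The function name missing_modules and the constant REQUIRED_MODULES are made up for illustration, and the sketch assumes a Python 3 interpreter and that an importable module is a good enough proxy for the python-setuptools and python-pip packages named above.

```python
import importlib.util

# Modules named in the thread as build prerequisites for the Python SDK.
# Assumption: checking importability approximates checking the OS packages.
REQUIRED_MODULES = ("setuptools", "pip")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing build prerequisites: " + ", ".join(missing))
    else:
        print("All Python build prerequisites found.")
```

Running the script before the Maven build would report which of the two modules still needs installing, for instance via the apt-get commands quoted above.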
