Awesome!

On Tue, Jan 31, 2017 at 9:38 AM, Ahmet Altay <al...@google.com.invalid>
wrote:

> Thank you Prabeesh and Sergio for fixing those!
>
> On Tue, Jan 31, 2017 at 4:51 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
> > Awesome, thanks Sergio ! Much appreciated ;)
> >
> > Regards
> > JB
> >
> >
> > On 01/31/2017 01:42 PM, Sergio Fernández wrote:
> >
> >> PR #1879 provides the basics: https://github.com/apache/beam/pull/1879
> >>
> >> On Tue, Jan 31, 2017 at 1:33 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> >> wrote:
> >>
> >>> No, that's fine as long as we clearly document the prerequisites for the
> >>> build. IMHO, we should provide quick BUILDING instructions in the
> >>> README.md.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>>
> >>> On 01/31/2017 01:24 PM, Sergio Fernández wrote:
> >>>
> >>>> Originally we integrated the build in Maven with the default profile.
> >>>> Do you feel like it'd be better to have it under a separate profile
> >>>> or so?
> >>>>
> >>>> On Tue, Jan 31, 2017 at 11:07 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
> >>>> wrote:
> >>>>
> >>>>> Just to be clear, the prerequisites to be able to build the Python SDK
> >>>>> are:
> >>>>>
> >>>>> apt-get install python-setuptools
> >>>>> apt-get install python-pip
> >>>>>
> >>>>> These are also required by the default "regular" build.
> >>>>>
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>>
> >>>>> On 01/31/2017 11:02 AM, Jean-Baptiste Onofré wrote:
> >>>>>
> >>>>>> Just one thing I noticed (and can be helpful for others): to build
> >>>>>> Beam we now need python setuptools installed.
> >>>>>>
> >>>>>> For instance, on Ubuntu, you have to do:
> >>>>>>
> >>>>>> apt-get install python-setuptools
> >>>>>>
> >>>>>> Same for the pip distribution.
> >>>>>>
> >>>>>> I guess (if not already done), we have to update README/Building
> >>>>>> instructions.
> >>>>>>
> >>>>>> Correct ?
> >>>>>>
> >>>>>> Regards
> >>>>>> JB
> >>>>>>
> >>>>>> On 01/31/2017 08:10 AM, Ahmet Altay wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> This merge is completed. Python SDK is now officially part of the
> >>>>>>> master branch! Thank you all for the support. Please open an issue if
> >>>>>>> you notice a reference to the now obsolete python-sdk branch in the
> >>>>>>> documentation.
> >>>>>>>
> >>>>>>> There will not be any more merges to the python-sdk branch. Going
> >>>>>>> forward, please use the master branch for Python SDK development.
> >>>>>>> There are a few existing open PRs to the python-sdk branch [1]. If you
> >>>>>>> are the author of one of those PRs, please rebase it on top of master.
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>> Ahmet
> >>>>>>>
> >>>>>>> [1] https://github.com/pulls?utf8=✓&q=is%3Aopen+is%3Apr+base%3Apython-sdk+repo%3Aapache%2Fbeam+
> >>>>>>>
> >>>>>>>
> >>>>>>> On Fri, Jan 20, 2017 at 10:06 AM, Kenneth Knowles
> >>>>>>> <k...@google.com.invalid>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> To clarify the implied criteria of that last exchange, it is "An SDK
> >>>>>>>> should have at least one runner that can execute the complete model
> >>>>>>>> (may be a direct runner)"
> >>>>>>>>
> >>>>>>>> I want to highlight this, because whether an _SDK_ supports unbounded
> >>>>>>>> data is not particularly well-defined, and will evolve:
> >>>>>>>>
> >>>>>>>>  - With the Runner API, an SDK will need to support building a graph
> >>>>>>>> with unbounded constructs, as today with probably minimal changes.
> >>>>>>>>
> >>>>>>>>  - With the Fn API, if any part of the Fn API is specific to unbounded
> >>>>>>>> data, the SDK will need to implement it. I think right now there is
> >>>>>>>> no such thing, and we don't want such a thing, so SDKs implementing
> >>>>>>>> the Fn API automatically support unbounded data.
> >>>>>>>>
> >>>>>>>>  - There will also likely be an SDK-specific shim just as there is
> >>>>>>>> today, to leverage idiomatic deserialized representations. The
> >>>>>>>> richness of this shim will decrease so that it will need to "support"
> >>>>>>>> unbounded data but that will be a ~one liner.
> >>>>>>>>
> >>>>>>>> Getting the Python SDK on master will accelerate our progress towards
> >>>>>>>> the Fn API - partly technical, partly community - which is the best
> >>>>>>>> path towards support for unbounded data across multiple runners. I
> >>>>>>>> think the criteria are written with the completed portability framework
> >>>>>>>> in mind. So this exchange makes me actually more convinced we should
> >>>>>>>> merge python-sdk to master.
> >>>>>>>>
> >>>>>>>> On Fri, Jan 20, 2017 at 9:53 AM, Robert Bradshaw <
> >>>>>>>> rober...@google.com.invalid> wrote:
> >>>>>>>>
> >>>>>>>>> On Thu, Jan 19, 2017 at 11:56 PM, Dan Halperin
> >>>>>>>>> <dhalp...@google.com.invalid> wrote:
> >>>>>>>>>
> >>>>>>>>>> I do not think that Python SDK yet meets the bar [1] for implementing
> >>>>>>>>>> the Beam model -- supporting Unbounded data is very important. That
> >>>>>>>>>> said, given the committed and sustained set of contributors, it
> >>>>>>>>>> generally makes sense to me to make an exception in anticipation of
> >>>>>>>>>> these features being fleshed out soon; including potentially new
> >>>>>>>>>> users/contributors that would arrive once in master.
> >>>>>>>>>>
> >>>>>>>>>> [1] https://lists.apache.org/thread.html/CAAzyFAxcmexUQnbF=Yk0plmm3f5e5bqwjz4+c5doruclnxo...@mail.gmail.com
> >>>>>>>>>>
> >>>>>>>>> That is a valid point. The Python SDK supports all the unbounded
> >>>>>>>>> parts of the model except for unbounded sources, which was deferred
> >>>>>>>>> while seeing how https://s.apache.org/splittable-do-fn played out.
> >>>>>>>>> I've been working with the team and merging/reviewing most of their
> >>>>>>>>> code, and have full confidence this will be coming (and on that note
> >>>>>>>>> can vouch for a healthy community and support which are much harder
> >>>>>>>>> to add later).
> >>>>>>>>>
> >>>>>>>>> In short, I think it has the required maturity, and I'm in favor of
> >>>>>>>>> merging soonish.
> >>>>>>>>>
> >>>>>>>>>> On Wed, Jan 18, 2017 at 12:24 AM, Ahmet Altay
> >>>>>>>>>> <al...@google.com.invalid> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Thank you all for the comments so far. I would follow the process as
> >>>>>>>>>>> suggested by Davor and others in this thread.
> >>>>>>>>>>>
> >>>>>>>>>>> Ahmet
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Jan 17, 2017 at 11:47 PM, Sergio Fernández <wik...@apache.org>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Jan 17, 2017 at 5:22 PM, Ahmet Altay <al...@google.com.invalid>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> tl;dr: I would like to start a discussion about merging python-sdk
> >>>>>>>>>>>>> branch to master branch. Python SDK is mature enough and merging it to
> >>>>>>>>>>>>> master will accelerate its development and adoption.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Good point, Ahmet!
> >>>>>>>>>>>>
> >>>>>>>>>>>> I've been following the development closely since it was imported in
> >>>>>>>>>>>> June. For the prototypes I've implemented so far it works quite well; I
> >>>>>>>>>>>> guess we'd just need to focus the next months on bringing more runner
> >>>>>>>>>>>> support.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>>> With a great effort from a lot of contributors(*), Python SDK [1] is now
> >>>>>>>>>>>>> a mostly complete, tested, performant Python implementation of the Beam
> >>>>>>>>>>>>> model. Since June, when we first started with Python SDK in Apache Beam,
> >>>>>>>>>>>>> we have been continuously improving it.
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>> I wouldn't merge during the preparation of 0.5.0 release, but after
> >>>>>>>>>>>> that could be a good time to merge back into master.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>> ** Python SDK currently supports:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Model: All main concepts are present (ParDo, GroupByKey, Windowing
> >>>>>>>>>>>>> etc.).
> >>>>>>>>>>>>> * IO: There are extensible APIs for writing new bounded sources and
> >>>>>>>>>>>>> sinks. Implementations are provided for Text, Avro, BigQuery, and
> >>>>>>>>>>>>> Datastore.
> >>>>>>>>>>>>> * Runners: Python SDK has an extensible base runner module that allows
> >>>>>>>>>>>>> building specific runners on top of it. The SDK comes with two pipeline
> >>>>>>>>>>>>> runners: DirectRunner and DataflowRunner; and it is possible to add
> >>>>>>>>>>>>> more. The existing runners are currently limited to bounded execution
> >>>>>>>>>>>>> and otherwise equivalent to their Java SDK counterparts in
> >>>>>>>>>>>>> functionality.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>> What would be the effort of porting, and maintaining, parallel versions
> >>>>>>>>>>>> of the Java runners? I guess I'd need to dig deeper in the model, but
> >>>>>>>>>>>> this may represent a major effort for the project, right?
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> It is somewhat higher for DirectRunner because DirectRunner also
> >>>>>>>>>>> implements the code for execution. It is not that high for
> >>>>>>>>>>> DataflowRunner because the base runner module has a lot of helpers with
> >>>>>>>>>>> the right hooks for implementing a generic runner. I would _expect_ the
> >>>>>>>>>>> experience in general would be similar to the latter.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>> * Testing: Python SDK implements the ValidatesRunner test framework for
> >>>>>>>>>>>>> implementing integration tests for current and future runners. There is
> >>>>>>>>>>>>> unit test coverage for all modules, and a number of integration tests
> >>>>>>>>>>>>> for validating existing runners.
> >>>>>>>>>>>>> * Documentation and examples: Documentation work has started on Python
> >>>>>>>>>>>>> SDK. Beam Programming Guide page has been updated to include Python
> >>>>>>>>>>>>> [2]. The code comes with many ready to use examples and we are in a
> >>>>>>>>>>>>> good place to start documenting those on the website.
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> ** We are not done yet, next on the roadmap we have:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Streaming: Both of the existing runners lack support for streaming
> >>>>>>>>>>>>> execution, and currently there is work going on for adding streaming
> >>>>>>>>>>>>> support to DirectRunner [3].
> >>>>>>>>>>>>> * Documentation: Filling the rest of the Beam documentation with Python
> >>>>>>>>>>>>> SDK specific information and examples.
> >>>>>>>>>>>>> * SDK consistency: Making Python SDK consistent with the Java SDK. We
> >>>>>>>>>>>>> have come a long way on this and have only a few items left [4].
> >>>>>>>>>>>>> * Beamifying: We have been working on removing Dataflow-specific
> >>>>>>>>>>>>> references both from the documentation and from the code. There is some
> >>>>>>>>>>>>> work left, and we are currently working on those as well [5].
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ** Steps and implications of merging to master:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Master branch is merged to python-sdk branch at regular intervals and
> >>>>>>>>>>>>> the last merge was on 12/22. All the past merges were uneventful
> >>>>>>>>>>>>> because there is a minimal overlap in modified files between branches.
> >>>>>>>>>>>>> Integrating python-sdk to master will similarly touch a small number of
> >>>>>>>>>>>>> existing files.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Python SDK is using the same tools for building and testing. It is
> >>>>>>>>>>>>> already integrated with Maven, Jenkins and Travis. Specifically the
> >>>>>>>>>>>>> impact to the testing infrastructure would be:
> >>>>>>>>>>>>> - There will be two additional test configurations in Travis. Since
> >>>>>>>>>>>>> Travis runs all configurations in parallel there should not be a
> >>>>>>>>>>>>> noticeable change in the Travis run time.
> >>>>>>>>>>>>> - Jenkins pre-commit test will start running the Python SDK tests. It
> >>>>>>>>>>>>> will add an additional 5 minutes to the completion time of pre-commit
> >>>>>>>>>>>>> test. Historically Python SDK tests were not flaky and did not cause
> >>>>>>>>>>>>> any random failures.
> >>>>>>>>>>>>> - Jenkins Python post-commit test is already separated from the other
> >>>>>>>>>>>>> post-commit tests and will continue to exist. It would not change the
> >>>>>>>>>>>>> testing time for any other test.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * The release process needs to be updated to accommodate releasing
> >>>>>>>>>>>>> Python artifacts. Python SDK would fit in the existing release schedule
> >>>>>>>>>>>>> and could be released along with the Java SDK. The additional steps
> >>>>>>>>>>>>> would include:
> >>>>>>>>>>>>> - Generating Python artifacts. This could be done with a single command
> >>>>>>>>>>>>> using Maven today.
> >>>>>>>>>>>>> - Publishing the artifacts to a central repository such as PyPI.
> >>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>> I'm more than happy to help on this. We left on purpose some things
> >>>>>>>>>>>> open when we added Maven support to the Python build.
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> That would be awesome. We can coordinate on that post-merge.
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>> - Updating the release guide to reflect the changes above.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> * Users: There are existing users using the Python SDK. To give a rough
> >>>>>>>>>>>>> estimate, a distribution of the Beam Python SDK had a total of 23K
> >>>>>>>>>>>>> downloads in the past 6 months [6]. Some of those users are already
> >>>>>>>>>>>>> engaged with the community (e.g. [7]). There might be an increased
> >>>>>>>>>>>>> amount of engagement from the rest of them after the merge.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Python 3 support is something we definitely need to look ahead to. I'd
> >>>>>>>>>>>> try to make the codebase compatible with both 2.7.x and 3.6.x, rather
> >>>>>>>>>>>> than using other solutions like 2to3.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> I agree with you. I think it makes more sense to make the codebase
> >>>>>>>>>>> compatible with both. As you mentioned, Python 3 support is not a
> >>>>>>>>>>> short-term goal in the roadmap, and we can discuss it more as we
> >>>>>>>>>>> approach that.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>> Looking forward to hearing your thoughts and comments on “graduating”
> >>>>>>>>>>>>> python-sdk to the master.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thank you,
> >>>>>>>>>>>>> Ahmet
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> (*) Python SDK branch currently has a diverse group of contributors.
> >>>>>>>>>>>>> Regular contributors include Charles Chen, Chamikara Jayalath, María
> >>>>>>>>>>>>> García Herrero, Mark Liu, Pablo Estrada, Robert Bradshaw (Apache Beam
> >>>>>>>>>>>>> PMC), Sourabh Bajaj, and Vikas Kedigehalli. We have also had
> >>>>>>>>>>>>> contributions from Abdullah Bashir, Marco Buccini, Sergio Fernández,
> >>>>>>>>>>>>> Seunghyun Lee, and Younghee Kwon.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>>> [1] https://github.com/apache/beam/tree/python-sdk/sdks/python
> >>>>>>>>>>>>> [2] https://beam.apache.org/documentation/programming-guide/
> >>>>>>>>>>>>> [3] https://issues.apache.org/jira/browse/BEAM-1265
> >>>>>>>>>>>>> [4] https://issues.apache.org/jira/issues/?jql=status%20%3D%20Open%20AND%20labels%20%3D%20sdk-consistency
> >>>>>>>>>>>>> [5] https://issues.apache.org/jira/browse/BEAM-1218
> >>>>>>>>>>>>> [6] https://pypi.python.org/pypi/google-cloud-dataflow/json
> >>>>>>>>>>>>> [7] https://issues.apache.org/jira/browse/BEAM-1251
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>> Great summary, Ahmet. Thanks.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Sergio Fernández
> >>>>>>>>>>>> Partner Technology Manager
> >>>>>>>>>>>> Redlink GmbH
> >>>>>>>>>>>> m: +43 6602747925
> >>>>>>>>>>>> e: sergio.fernan...@redlink.co
> >>>>>>>>>>>> w: http://redlink.co
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>> --
> >>>>>>
> >>>>> Jean-Baptiste Onofré
> >>>>> jbono...@apache.org
> >>>>> http://blog.nanthrax.net
> >>>>> Talend - http://www.talend.com
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
> >>>
> >>
> >>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>
