Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

Yoshiki Obata Tue, 06 Oct 2020 06:53:10 -0700

I've written a mini doc[1] about how to update the Python tests to reduce
consumption of test resources.
It would be helpful if you could check it and comment if there are better solutions.


[1] 
https://docs.google.com/document/d/1tfCWtMxfqjgsokjRkOGh2I4UAvX8B98ZOys0crzCMiw/edit?usp=sharing

On Fri, Jul 31, 2020 at 9:44, Valentyn Tymofieiev <valen...@google.com> wrote:
>
> We have added Python 3.8 support in Apache Beam 2.23.0 release[1] and 
> established the plan to remove Python 2.7 support in 2.25.0 release[2].
>
> I think it is in the interest of the community to reduce the overhead 
> associated with adding and removing support of Python minor versions in Beam 
> in the future. To do so, I opened a ticket [3] to document the process of 
> adding/removing a Python version on the Beam website, and would like to recap 
> the discussion on this thread.
>
> It seems that the consensus is to align support of Python versions in Beam 
> with Python annual release cycle[4]. This means:
>
> 1. We will aim to add support for a new Python 3.x version in Beam as soon as 
> it is released.
> 2. After a Python 3.x version reaches the end of support[5], we will remove 
> support for this version in Beam, starting from the first Beam release that 
> is cut after the end-of-support date.
> 3. The rules above are our default course of action, but can be adjusted on a 
> case-by-case basis via a discussion on dev@.
>
> Please let me know if you think this needs further discussion.
>
> A corollary of 1-3 is that:
> - we should plan to remove support for Python 3.5 starting from 2.25.0 
> release, since Python 3.5 reaches[5] end-of-support on 2020-09-13, and we 
> plan to cut 2.25.0 on 2020-09-23 according to our release calendar [6],
> - we can start working on adding Python 3.9 support shortly after.
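A minimal Python sketch of rule 2 and its corollary. The function name `supported_in_release` and the earlier cut date are hypothetical and used only for illustration; the Python 3.5 end-of-support date and the 2.25.0 cut date are the ones quoted above.

```python
from datetime import date


def supported_in_release(end_of_support: date, release_cut: date) -> bool:
    """Return True if a Python version is still supported in a Beam release.

    Rule 2: support is removed starting from the first Beam release that is
    cut after the version's end-of-support date.
    """
    return release_cut <= end_of_support


PY35_END_OF_SUPPORT = date(2020, 9, 13)  # date quoted in the thread
BEAM_2_25_0_CUT = date(2020, 9, 23)      # date quoted in the thread
EARLIER_CUT = date(2020, 8, 1)           # hypothetical cut date before the EOL

print(supported_in_release(PY35_END_OF_SUPPORT, BEAM_2_25_0_CUT))  # False
print(supported_in_release(PY35_END_OF_SUPPORT, EARLIER_CUT))      # True
```

A release cut exactly on the end-of-support date is treated as still supported here, since the rule only covers releases cut after that date.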
>
> Thanks,
> Valentyn
>
> [1] https://beam.apache.org/blog/beam-2.23.0/
> [2] 
> https://lists.apache.org/thread.html/r4be18d50ccfc5543a34e083f3e6711f9f3711110896f109f21f4677c%40%3Cdev.beam.apache.org%3E
> [3] https://issues.apache.org/jira/browse/BEAM-10605
> [4] https://www.python.org/dev/peps/pep-0602/
> [5] https://www.python.org/downloads/
> [6] 
> https://calendar.google.com/calendar/embed?src=0p73sl034k80oob7seouanigd0%40group.calendar.google.com
>
> On Thu, May 14, 2020 at 9:56 AM Yoshiki Obata <yoshiki.ob...@gmail.com> wrote:
>>
>> Thank you, Kyle and Valentyn.
>>
>> I'll update the test code to treat Python 3.5 and 3.7 as high-priority
>> versions at this point.
>>
>> On Tue, May 12, 2020 at 2:10, Valentyn Tymofieiev <valen...@google.com> wrote:
>> >
>> > I agree with the point echoed earlier that the lowest and the highest of 
>> > supported versions will probably give the most useful test signal for 
>> > possible breakages. So 3.5 and 3.7 as high-priority versions SGTM.
>> >
>> > This can change later once Beam drops 3.5 support.
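A small Python sketch of the "lowest and highest supported version" heuristic. The function name `split_priorities` is hypothetical. Versions are compared numerically rather than as strings, so a future "3.10" would sort after "3.9".

```python
def split_priorities(supported):
    """Split version strings into (high_priority, low_priority) lists.

    High priority = lowest and highest supported version; everything in
    between is low priority.
    """
    def key(version):
        # "3.10" -> (3, 10), so it compares greater than (3, 9).
        return tuple(int(part) for part in version.split("."))

    ordered = sorted(set(supported), key=key)
    if len(ordered) <= 2:
        return ordered, []
    return [ordered[0], ordered[-1]], ordered[1:-1]


high, low = split_priorities(["3.6", "3.5", "3.7"])
print(high)  # ['3.5', '3.7']
print(low)   # ['3.6']
```

Once 3.5 is dropped, the same call on the remaining versions would yield the new lowest version as high-priority without any other change.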
>> >
>> > On Mon, May 11, 2020 at 10:05 AM Yoshiki Obata <yoshiki.ob...@gmail.com> 
>> > wrote:
>> >>
>> >> Hello again,
>> >>
>> >> The test infrastructure update is ongoing, so we should now determine
>> >> which Python versions are high-priority.
>> >>
>> >> According to the PyPI download stats[1], the download share of Python
>> >> 3.5 is almost always greater than that of 3.6 or 3.7.
>> >> This situation has not changed since Robert told us that Python 3.x
>> >> accounts for nearly 40% of downloads[2].
>> >>
>> >> On the other hand, according to Docker Hub[3], the most downloaded
>> >> apachebeam/python3.x_sdk image is the Python 3.7 one, as Kyle
>> >> pointed out[4].
>> >>
>> >> Considering these stats, I think high-priority versions are 3.5 and 3.7.
>> >>
>> >> Is this assumption appropriate?
>> >> I would like to hear your thoughts about this.
>> >>
>> >> [1] https://pypistats.org/packages/apache-beam
>> >> [2] 
>> >> https://lists.apache.org/thread.html/r208c0d11639e790453a17249e511dbfe00a09f91bef8fcd361b4b74a%40%3Cdev.beam.apache.org%3E
>> >> [3] https://hub.docker.com/search?q=apachebeam%2Fpython&type=image
>> >> [4] 
>> >> https://lists.apache.org/thread.html/r9ca9ad316dae3d60a3bf298eedbe4aeecab2b2664454cc352648abc9%40%3Cdev.beam.apache.org%3E
>> >>
>> >> On Wed, May 6, 2020 at 12:48, Yoshiki Obata <yoshiki.ob...@gmail.com> wrote:
>> >> >
>> >> > > Not sure how run_pylint.sh is related here - we should run linter on 
>> >> > > the entire codebase.
>> >> > ah, I mistyped... I meant run_pytest.sh
>> >> >
>> >> > > I am familiar with beam_PostCommit_PythonXX suites. Is there 
>> >> > > something specific about these suites that you wanted to know?
>> >> > > Test suite runtime will depend on the number of tests in the suite,
>> >> > > how many tests we run in parallel, how long they take to run. To
>> >> > > understand the load on test infrastructure we can monitor Beam test
>> >> > > health metrics [1]. In particular, if time in queue[2] is high, it is
>> >> > > a sign that there are not enough Jenkins slots available to start the
>> >> > > test suite earlier.
>> >> > Sorry for the ambiguous question. I wanted to know how to see the load on
>> >> > the test infrastructure.
>> >> > The Grafana links you shared serve my purpose. Thank you.
>> >> >
>> >> > On Wed, May 6, 2020 at 2:35, Valentyn Tymofieiev <valen...@google.com> wrote:
>> >> > >
>> >> > > On Mon, May 4, 2020 at 7:06 PM Yoshiki Obata 
>> >> > > <yoshiki.ob...@gmail.com> wrote:
>> >> > >>
>> >> > >> Thank you for comment, Valentyn.
>> >> > >>
>> >> > >> > 1) We can seed the smoke test suite with typehints tests, and add 
>> >> > >> > more tests later if there is a need. We can identify them by the 
>> >> > >> > file path or by special attributes in test files. Identifying them 
>> >> > >> > using filepath seems simpler and independent of test runner.
>> >> > >>
>> >> > >> Yes, making run_pylint.sh accept target test file paths as arguments
>> >> > >> would be a good approach, if possible.
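A minimal Python sketch of identifying smoke tests by file path, independent of the test runner. The pattern list, the function name `select_smoke_tests`, and the sample file list are assumptions for illustration; only the idea of seeding the suite with the typehints tests comes from the thread.

```python
from fnmatch import fnmatchcase

# Hypothetical pattern for the smoke suite, based on the typehints tests
# mentioned in the thread.
SMOKE_TEST_PATTERNS = ["apache_beam/typehints/*_test.py"]


def select_smoke_tests(test_files, patterns=SMOKE_TEST_PATTERNS):
    """Return the files whose path matches any smoke-suite pattern.

    Note: in fnmatch patterns, "*" also matches "/", so tests in nested
    subdirectories of typehints/ would be selected as well.
    """
    return [
        path for path in test_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


# Illustrative file list, not a real directory listing.
files = [
    "apache_beam/typehints/typehints_test.py",
    "apache_beam/typehints/decorators.py",
    "apache_beam/io/textio_test.py",
]
print(select_smoke_tests(files))
# ['apache_beam/typehints/typehints_test.py']
```

The selected paths could then be passed as arguments to whichever script runs the tests, so adding a test to the smoke suite only means extending the pattern list.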
>> >> > >
>> >> > >
>> >> > > Not sure how run_pylint.sh is related here - we should run linter on 
>> >> > > the entire codebase.
>> >> > >
>> >> > >>
>> >> > >> > 3)  We should reduce the code duplication across  
>> >> > >> > beam/sdks/python/test-suites/$runner/py3*. I think we could move 
>> >> > >> > the suite definition into a common file like 
>> >> > >> > beam/sdks/python/test-suites/$runner/build.gradle perhaps, and 
>> >> > >> > populate individual suites 
>> >> > >> > (beam/sdks/python/test-suites/$runner/py38/build.gradle) including 
>> >> > >> > the common file and/or logic from PythonNature [1].
>> >> > >>
>> >> > >> Exactly. I'll check it out.
>> >> > >>
>> >> > >> > 4) We have some tests that we run only under specific Python 3 
>> >> > >> > versions, for example: FlinkValidatesRunner test runs using Python 
>> >> > >> > 3.5: [2]
>> >> > >> > HDFS Python 3 tests are running only with Python 3.7 [3]. 
>> >> > >> > Cross-language Py3 tests for Spark are running under Python 
>> >> > >> > 3.5 [4]. There may be more test suites that selectively use 
>> >> > >> > particular versions.
>> >> > >> > We need to correct such suites, so that we do not tie them  to a 
>> >> > >> > specific Python version. I see several options here: such tests 
>> >> > >> > should run either for all high-priority versions, or run only 
>> >> > >> > under the lowest version among the high-priority versions.  We 
>> >> > >> > don't have to fix them all at the same time. In general, we should 
>> >> > >> > try to make it as easy as possible to configure, whether a suite 
>> >> > >> > runs across all  versions, all high-priority versions, or just one 
>> >> > >> > version.
>> >> > >>
>> >> > >> A high-priority/low-priority configuration mechanism would be useful 
>> >> > >> for this.
>> >> > >> Which versions should be tested may also be related to 5).
>> >> > >>
>> >> > >> > 5) If postcommit suites (that need to run against all versions) 
>> >> > >> > still constitute too much load on the infrastructure, we may need 
>> >> > >> > to investigate how to run these suites less frequently.
>> >> > >>
>> >> > >> That's certainly true, beam_PostCommit_PythonXX and
>> >> > >> beam_PostCommit_Python_Chicago_Taxi_(Dataflow|Flink) take about 1
>> >> > >> hour.
>> >> > >> Does anyone have knowledge about this?
>> >> > >
>> >> > >
>> >> > > I am familiar with beam_PostCommit_PythonXX suites. Is there 
>> >> > > something specific about these suites that you wanted to know?
>> >> > > Test suite runtime will depend on the number of  tests in the suite, 
>> >> > > how many tests we run in parallel, how long they take to run. To 
>> >> > > understand the load on test infrastructure we can monitor Beam test 
>> >> > > health metrics [1]. In particular, if time in queue[2] is high, it is 
>> >> > > a sign that there are not enough Jenkins slots available to start the 
>> >> > > test suite earlier.
>> >> > >
>> >> > > [1] http://104.154.241.245/d/D81lW0pmk/post-commit-test-reliability
>> >> > > [2] 
>> >> > > http://104.154.241.245/d/_TNndF2iz/pre-commit-test-latency?orgId=1&from=1588094891600&to=1588699691600&panelId=6&fullscreen
>> >> > >
>> >> > >
>> >> > >>
>> >> > >> On Sat, May 2, 2020 at 5:18, Valentyn Tymofieiev <valen...@google.com> wrote:
>> >> > >> >
>> >> > >> > Hi Yoshiki,
>> >> > >> >
>> >> > >> > Thanks a lot for your help with Python 3 support so far and most 
>> >> > >> > recently, with your work on Python 3.8.
>> >> > >> >
>> >> > >> > Overall the proposal sounds good to me. I see several aspects here 
>> >> > >> > that we need to address:
>> >> > >> >
>> >> > >> > 1) We can seed the smoke test suite with typehints tests, and add 
>> >> > >> > more tests later if there is a need. We can identify them by the 
>> >> > >> > file path or by special attributes in test files. Identifying them 
>> >> > >> > using filepath seems simpler and independent of test runner.
>> >> > >> >
>> >> > >> > 2) Defining high priority/low priority versions in 
>> >> > >> > gradle.properties sounds good to me.
>> >> > >> >
>> >> > >> > 3)  We should reduce the code duplication across  
>> >> > >> > beam/sdks/python/test-suites/$runner/py3*. I think we could move 
>> >> > >> > the suite definition into a common file like 
>> >> > >> > beam/sdks/python/test-suites/$runner/build.gradle perhaps, and 
>> >> > >> > populate individual suites 
>> >> > >> > (beam/sdks/python/test-suites/$runner/py38/build.gradle) including 
>> >> > >> > the common file and/or logic from PythonNature [1].
>> >> > >> >
>> >> > >> > 4) We have some tests that we run only under specific Python 3 
>> >> > >> > versions, for example: FlinkValidatesRunner test runs using Python 
>> >> > >> > 3.5: [2]
>> >> > >> > HDFS Python 3 tests are running only with Python 3.7 [3]. 
>> >> > >> > Cross-language Py3 tests for Spark are running under Python 
>> >> > >> > 3.5 [4]. There may be more test suites that selectively use 
>> >> > >> > particular versions.
>> >> > >> >
>> >> > >> > We need to correct such suites, so that we do not tie them  to a 
>> >> > >> > specific Python version. I see several options here: such tests 
>> >> > >> > should run either for all high-priority versions, or run only 
>> >> > >> > under the lowest version among the high-priority versions.  We 
>> >> > >> > don't have to fix them all at the same time. In general, we should 
>> >> > >> > try to make it as easy as possible to configure, whether a suite 
>> >> > >> > runs across all  versions, all high-priority versions, or just one 
>> >> > >> > version.
>> >> > >> >
>> >> > >> > 5) If postcommit suites (that need to run against all versions) 
>> >> > >> > still constitute too much load on the infrastructure, we may need 
>> >> > >> > to investigate how to run these suites less frequently.
>> >> > >> >
>> >> > >> > [1] 
>> >> > >> > https://github.com/apache/beam/blob/b78c7ed4836e44177a149155581cfa8188e8f748/sdks/python/test-suites/portable/py37/build.gradle#L19-L20
>> >> > >> > [2] 
>> >> > >> > https://github.com/apache/beam/blob/93181e792f648122d3b4a5080d683f21c6338132/.test-infra/jenkins/job_PostCommit_Python35_ValidatesRunner_Flink.groovy#L34
>> >> > >> > [3] 
>> >> > >> > https://github.com/apache/beam/blob/93181e792f648122d3b4a5080d683f21c6338132/sdks/python/test-suites/direct/py37/build.gradle#L58
>> >> > >> > [4] 
>> >> > >> > https://github.com/apache/beam/blob/93181e792f648122d3b4a5080d683f21c6338132/.test-infra/jenkins/job_PostCommit_CrossLanguageValidatesRunner_Spark.groovy#L44
>> >> > >> >
>> >> > >> > On Fri, May 1, 2020 at 8:42 AM Yoshiki Obata 
>> >> > >> > <yoshiki.ob...@gmail.com> wrote:
>> >> > >> >>
>> >> > >> >> Hello everyone.
>> >> > >> >>
>> >> > >> >> I'm working on Python 3.8 support[1], and now is the time to prepare
>> >> > >> >> the test infrastructure.
>> >> > >> >> Following the discussion, I've considered how to prioritize tests.
>> >> > >> >> My plan is as below. I'd like to get your thoughts on this.
>> >> > >> >>
>> >> > >> >> - For all low-pri Python versions, apache_beam.typehints.*_test runs in
>> >> > >> >> the PreCommit test.
>> >> > >> >>   A new Gradle task should be defined, named like "preCommitPy3*-minimum".
>> >> > >> >>   If there are tests other than typehints that are essential for all
>> >> > >> >> versions, please point them out.
>> >> > >> >>
>> >> > >> >> - For high-pri Python versions, the same tests as in the current
>> >> > >> >> PreCommit test run, for extensive testing: "tox:py3*:preCommitPy3*",
>> >> > >> >> "dataflow:py3*:preCommitIT" and "dataflow:py3*:preCommitIT_V2".
>> >> > >> >>
>> >> > >> >> - The full PreCommit tests of low-pri versions are moved to the
>> >> > >> >> corresponding PostCommit tests.
>> >> > >> >>
>> >> > >> >> - High-pri and low-pri versions are defined in gradle.properties, and
>> >> > >> >> PreCommit/PostCommit task dependencies are built dynamically from
>> >> > >> >> them.
>> >> > >> >>   This would make it easy to switch the priorities of Python versions.
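The plan above can be sketched in Python for illustration; the real logic would live in the Gradle build. The task-name patterns are the ones quoted in the plan with `3*` expanded per version. The priority lists, the function names, and the returned dictionary layout are assumptions. The PostCommit list contains only the tasks moved from PreCommit, not the existing PostCommit suites.

```python
# Hypothetical priority settings; in the proposal these would be read from
# gradle.properties rather than hard-coded.
HIGH_PRIORITY = ["3.5", "3.7"]
LOW_PRIORITY = ["3.6", "3.8"]


def full_precommit(version):
    """Task names for the full PreCommit set of one version, e.g. "3.7"."""
    v = version.replace(".", "")  # "3.7" -> "37"
    return [
        f"tox:py{v}:preCommitPy{v}",
        f"dataflow:py{v}:preCommitIT",
        f"dataflow:py{v}:preCommitIT_V2",
    ]


def minimum_precommit(version):
    """Task name for the reduced (typehints-only) PreCommit set."""
    return [f"preCommitPy{version.replace('.', '')}-minimum"]


def build_suites(high, low):
    """Return {"PreCommit": [...], "PostCommit": [...]} task lists."""
    precommit, postcommit = [], []
    for version in high:
        precommit += full_precommit(version)
    for version in low:
        precommit += minimum_precommit(version)
        # The full PreCommit set of a low-pri version moves to PostCommit.
        postcommit += full_precommit(version)
    return {"PreCommit": precommit, "PostCommit": postcommit}


suites = build_suites(HIGH_PRIORITY, LOW_PRIORITY)
for name, tasks in suites.items():
    print(name, tasks)
```

Switching a version between priorities then only requires editing the two lists; the task dependencies are derived from them.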
>> >> > >> >>
>> >> > >> >> [1] https://issues.apache.org/jira/browse/BEAM-8494
>> >> > >> >>
>> >> > >> >> On Sat, Apr 4, 2020 at 7:51, Robert Bradshaw <rober...@google.com> wrote:
>> >> > >> >> >
>> >> > >> >> > https://pypistats.org/packages/apache-beam is an interesting 
>> >> > >> >> > data point.
>> >> > >> >> >
>> >> > >> >> > The good news: Python 3.x more than doubled to nearly 40% of 
>> >> > >> >> > downloads last month. Interestingly, it looks like a good chunk 
>> >> > >> >> > of this increase was 3.5 (which is now the most popular 3.x 
>> >> > >> >> > version by this metric...)
>> >> > >> >> >
>> >> > >> >> > I agree with using Python EOL dates as a baseline, with the 
>> >> > >> >> > possibility of case-by-case adjustments. Refactoring our tests 
>> >> > >> >> > to support 3.8 without increasing the load should be our focus 
>> >> > >> >> > now.
>> >> > >> >> >
>> >> > >> >> >
>> >> > >> >> > On Fri, Apr 3, 2020 at 3:41 PM Valentyn Tymofieiev 
>> >> > >> >> > <valen...@google.com> wrote:
>> >> > >> >> >>
>> >> > >> >> >> Some good news on  Python 3.x support: thanks to +David Song 
>> >> > >> >> >> and +Yifan Zou we now have Python 3.8 on Jenkins, and can 
>> >> > >> >> >> start working on adding Python 3.8 support to Beam (BEAM-8494).
>> >> > >> >> >>
>> >> > >> >> >>> One interesting variable that has not been mentioned is what 
>> >> > >> >> >>> versions of python 3
>> >> > >> >> >>> are available to users via their distribution channels (the 
>> >> > >> >> >>> linux
>> >> > >> >> >>> distributions they use to develop/run the pipelines).
>> >> > >> >> >>
>> >> > >> >> >>
>> >> > >> >> >> Good point. Looking at Ubuntu 16.04, which comes with Python 
>> >> > >> >> >> 3.5.2, we can see that  the end-of-life for 16.04 is in 2024, 
>> >> > >> >> >> end-of-support is April 2021 [1]. Both of these dates are 
>> >> > >> >> >> beyond the announced Python 3.5 EOL in September 2020 [2]. I 
>> >> > >> >> >> think it would be difficult for Beam to keep Py3.5 support 
>> >> > >> >> >> until these EOL dates, and users of systems that stock old 
>> >> > >> >> >> versions of Python have viable workarounds:
>> >> > >> >> >> - install a newer version of Python interpreter via pyenv[3], 
>> >> > >> >> >> from sources, or from alternative repositories.
>> >> > >> >> >> - use a docker container that comes with a newer version of 
>> >> > >> >> >> interpreter.
>> >> > >> >> >> - use older versions of Beam.
>> >> > >> >> >>
>> >> > >> >> >> We didn't receive feedback from user@ on how long 3.x versions 
>> >> > >> >> >> on the lower/higher end of the range should stay supported.  I 
>> >> > >> >> >> would suggest for now that we plan to support all Python 3.x 
>> >> > >> >> >> versions that were released and did not reach EOL. We can 
>> >> > >> >> >> discuss exceptions to this rule on a case-by-case basis, 
>> >> > >> >> >> evaluating any maintenance burden to continue support, or stop 
>> >> > >> >> >> early.
>> >> > >> >> >>
>> >> > >> >> >> We should now focus on adjusting our Python test 
>> >> > >> >> >> infrastructure to make it easy to split 3.5, 3.6, 3.7, 3.8  
>> >> > >> >> >> suites into high-priority and low-priority suites according to 
>> >> > >> >> >> the Python version. Ideally, we should make it easy to change 
>> >> > >> >> >> which versions are high/low priority without having to change 
>> >> > >> >> >> all the individual test suites, and without losing test 
>> >> > >> >> >> coverage signal.
>> >> > >> >> >>
>> >> > >> >> >> [1] https://wiki.ubuntu.com/Releases
>> >> > >> >> >> [2] https://devguide.python.org/#status-of-python-branches
>> >> > >> >> >> [3] https://github.com/pyenv/pyenv/blob/master/README.md
>> >> > >> >> >>
>> >> > >> >> >> On Fri, Feb 28, 2020 at 1:25 AM Ismaël Mejía 
>> >> > >> >> >> <ieme...@gmail.com> wrote:
>> >> > >> >> >>>
>> >> > >> >> >>> One interesting variable that has not been mentioned is what 
>> >> > >> >> >>> versions of python
>> >> > >> >> >>> 3 are available to users via their distribution channels (the 
>> >> > >> >> >>> linux
>> >> > >> >> >>> distributions they use to develop/run the pipelines).
>> >> > >> >> >>>
>> >> > >> >> >>> - RHEL 8 users have python 3.6 available
>> >> > >> >> >>> - RHEL 7 users have python 3.6 available
>> >> > >> >> >>> - Debian 10/Ubuntu 18.04 users have python 3.7/3.6 available
>> >> > >> >> >>> - Debian 9/Ubuntu 16.04 users have python 3.5 available
>> >> > >> >> >>>
>> >> > >> >> >>>
>> >> > >> >> >>> We should consider this when we evaluate future support 
>> >> > >> >> >>> removals.
>> >> > >> >> >>>
>> >> > >> >> >>> Given that the distros that support Python 3.5 are ~4y old, 
>> >> > >> >> >>> and since Python 3.5 is also losing LTS support soon, it is 
>> >> > >> >> >>> probably OK to no longer support it in Beam, as Robert 
>> >> > >> >> >>> suggests.
>> >> > >> >> >>>
>> >> > >> >> >>>
>> >> > >> >> >>> On Thu, Feb 27, 2020 at 3:57 AM Valentyn Tymofieiev 
>> >> > >> >> >>> <valen...@google.com> wrote:
>> >> > >> >> >>>>
>> >> > >> >> >>>> Thanks everyone for sharing your perspectives so far. It 
>> >> > >> >> >>>> sounds like we can mitigate the cost of test infrastructure 
>> >> > >> >> >>>> by having:
>> >> > >> >> >>>> - a selection of (fast) tests that we will want to run 
>> >> > >> >> >>>> against all Python versions we support.
>> >> > >> >> >>>> - high priority Python versions, which we will test 
>> >> > >> >> >>>> extensively.
>> >> > >> >> >>>> - infrequent postcommit tests that exercise low-priority 
>> >> > >> >> >>>> versions.
>> >> > >> >> >>>> We will need test infrastructure improvements to have the 
>> >> > >> >> >>>> flexibility of designating versions as high-pri/low-pri and 
>> >> > >> >> >>>> minimizing the effort required to adopt a new version.
>> >> > >> >> >>>>
>> >> > >> >> >>>> There is still a question of how long we want to support old 
>> >> > >> >> >>>> Py3.x versions. As mentioned above, I think we should not 
>> >> > >> >> >>>> support them beyond EOL (5 years after a release). I wonder 
>> >> > >> >> >>>> if that is still too long. The cost of supporting a version 
>> >> > >> >> >>>> may include:
>> >> > >> >> >>>>  - Developing against older Python version
>> >> > >> >> >>>>  - Release overhead (building & storing containers, wheels, 
>> >> > >> >> >>>> doing release validation)
>> >> > >> >> >>>>  - Complexity / development cost to support the quirks of 
>> >> > >> >> >>>> the minor versions.
>> >> > >> >> >>>>
>> >> > >> >> >>>> We can decide to drop support, after, say, 4 years, or after 
>> >> > >> >> >>>> usage drops below a threshold, or decide on a case-by-case 
>> >> > >> >> >>>> basis. Thoughts? Also asked for feedback on user@ [1]
>> >> > >> >> >>>>
>> >> > >> >> >>>> [1] 
>> >> > >> >> >>>> https://lists.apache.org/thread.html/r630a3b55aa8e75c68c8252ea6f824c3ab231ad56e18d916dfb84d9e8%40%3Cuser.beam.apache.org%3E
>> >> > >> >> >>>>
>> >> > >> >> >>>> On Wed, Feb 26, 2020 at 5:27 PM Robert Bradshaw 
>> >> > >> >> >>>> <rober...@google.com> wrote:
>> >> > >> >> >>>>>
>> >> > >> >> >>>>> On Wed, Feb 26, 2020 at 5:21 PM Valentyn Tymofieiev 
>> >> > >> >> >>>>> <valen...@google.com> wrote:
>> >> > >> >> >>>>> >
>> >> > >> >> >>>>> > > +1 to consulting users.
>> >> > >> >> >>>>> > I will message user@ as well and point to this thread.
>> >> > >> >> >>>>> >
>> >> > >> >> >>>>> > > I would propose getting in warnings about 3.5 EoL well 
>> >> > >> >> >>>>> > > ahead of time.
>> >> > >> >> >>>>> > I think we should document on our website, and  in the 
>> >> > >> >> >>>>> > code (warnings) that users should not expect SDKs to be 
>> >> > >> >> >>>>> > supported in Beam beyond the EOL. If we want to have 
>> >> > >> >> >>>>> > flexibility to drop support earlier than EOL, we need to 
>> >> > >> >> >>>>> > be more careful with messaging because users might 
>> >> > >> >> >>>>> > otherwise expect that support will last until EOL, if we 
>> >> > >> >> >>>>> > mention EOL date.
>> >> > >> >> >>>>>
>> >> > >> >> >>>>> +1
>> >> > >> >> >>>>>
>> >> > >> >> >>>>> > I am hoping that we can establish a consensus for when we 
>> >> > >> >> >>>>> > will be dropping support for a version, so that we don't 
>> >> > >> >> >>>>> > have to discuss it on a case by case basis in the future.
>> >> > >> >> >>>>> >
>> >> > >> >> >>>>> > > I think it would make sense to add support for 3.8 
>> >> > >> >> >>>>> > > right away (or at least get a good sense of what work 
>> >> > >> >> >>>>> > > needs to be done and what our dependency situation is 
>> >> > >> >> >>>>> > > like)
>> >> > >> >> >>>>> > https://issues.apache.org/jira/browse/BEAM-8494 is a 
>> >> > >> >> >>>>> > starting point. When I tried 3.8 a while ago, some dependencies 
>> >> > >> >> >>>>> > could not be installed; I checked again just now, and the SDK is 
>> >> > >> >> >>>>> > "installable" after minor changes. Some tests don't pass. 
>> >> > >> >> >>>>> > BEAM-8494 does not have an owner atm, and if anyone is 
>> >> > >> >> >>>>> > interested I'm happy to give further pointers and help 
>> >> > >> >> >>>>> > get started.
>> >> > >> >> >>>>> >
>> >> > >> >> >>>>> > > For the 3.x series, I think we will get the most signal 
>> >> > >> >> >>>>> > > out of the lowest and highest version, and can get by 
>> >> > >> >> >>>>> > > with smoke tests +
>> >> > >> >> >>>>> > infrequent post-commits for the ones between.
>> >> > >> >> >>>>> >
>> >> > >> >> >>>>> > > I agree with having low-frequency tests for 
>> >> > >> >> >>>>> > > low-priority versions. Low-priority versions could be 
>> >> > >> >> >>>>> > > determined according to least usage.
>> >> > >> >> >>>>> >
>> >> > >> >> >>>>> > These are good ideas. Do you think we will want to have 
>> >> > >> >> >>>>> > the ability to run some (inexpensive) tests for all 
>> >> > >> >> >>>>> > versions frequently (on presubmits), or is this extra 
>> >> > >> >> >>>>> > complexity that can be avoided? I am thinking about type 
>> >> > >> >> >>>>> > inference, for example. Afaik the inference logic is very 
>> >> > >> >> >>>>> > sensitive to the version. Would it be acceptable to catch 
>> >> > >> >> >>>>> > errors there in infrequent postcommits, or would an early 
>> >> > >> >> >>>>> > signal be preferred?
>> >> > >> >> >>>>>
>> >> > >> >> >>>>> This is a good example--the type inference tests are 
>> >> > >> >> >>>>> sensitive to
>> >> > >> >> >>>>> version (due to using internal details and relying on the
>> >> > >> >> >>>>> still-evolving typing module) but also run in ~15 seconds. 
>> >> > >> >> >>>>> I think
>> >> > >> >> >>>>> these should be in precommits. We just don't need to run 
>> >> > >> >> >>>>> every test
>> >> > >> >> >>>>> for every version.
>> >> > >> >> >>>>>
>> >> > >> >> >>>>> > On Wed, Feb 26, 2020 at 5:17 PM Kyle Weaver 
>> >> > >> >> >>>>> > <kcwea...@google.com> wrote:
>> >> > >> >> >>>>> >>
>> >> > >> >> >>>>> >> Oh, I didn't see Robert's earlier email:
>> >> > >> >> >>>>> >>
>> >> > >> >> >>>>> >> > Currently 3.5 downloads sit at 3.7%, or about
>> >> > >> >> >>>>> >> > 20% of all Python 3 downloads.
>> >> > >> >> >>>>> >>
>> >> > >> >> >>>>> >> Where did these numbers come from?
>> >> > >> >> >>>>> >>
>> >> > >> >> >>>>> >> On Wed, Feb 26, 2020 at 5:15 PM Kyle Weaver 
>> >> > >> >> >>>>> >> <kcwea...@google.com> wrote:
>> >> > >> >> >>>>> >>>
>> >> > >> >> >>>>> >>> > I agree with having low-frequency tests for 
>> >> > >> >> >>>>> >>> > low-priority versions.
>> >> > >> >> >>>>> >>> > Low-priority versions could be determined according 
>> >> > >> >> >>>>> >>> > to least usage.
>> >> > >> >> >>>>> >>>
>> >> > >> >> >>>>> >>> +1. While the difference may not be as great between, 
>> >> > >> >> >>>>> >>> say, 3.6 and 3.7, I think that if we had to choose, it 
>> >> > >> >> >>>>> >>> would be more useful to test the versions folks are 
>> >> > >> >> >>>>> >>> actually using the most. 3.5 only has about a third of 
>> >> > >> >> >>>>> >>> the Docker pulls of 3.6 or 3.7 [1]. Does anyone have 
>> >> > >> >> >>>>> >>> other usage statistics we can consult?
>> >> > >> >> >>>>> >>>
>> >> > >> >> >>>>> >>> [1] 
>> >> > >> >> >>>>> >>> https://hub.docker.com/search?q=apachebeam%2Fpython&type=image
>> >> > >> >> >>>>> >>>
>> >> > >> >> >>>>> >>> On Wed, Feb 26, 2020 at 5:00 PM Ruoyun Huang 
>> >> > >> >> >>>>> >>> <ruo...@google.com> wrote:
>> >> > >> >> >>>>> >>>>
>> >> > >> >> >>>>> >>>> I feel 4+ versions take too long to run anything.
>> >> > >> >> >>>>> >>>>
>> >> > >> >> >>>>> >>>> I would vote for lowest + highest, i.e. 2 versions.
>> >> > >> >> >>>>> >>>>
>> >> > >> >> >>>>> >>>> On Wed, Feb 26, 2020 at 4:52 PM Udi Meiri 
>> >> > >> >> >>>>> >>>> <eh...@google.com> wrote:
>> >> > >> >> >>>>> >>>>>
>> >> > >> >> >>>>> >>>>> I agree with having low-frequency tests for 
>> >> > >> >> >>>>> >>>>> low-priority versions.
>> >> > >> >> >>>>> >>>>> Low-priority versions could be determined according 
>> >> > >> >> >>>>> >>>>> to least usage.
>> >> > >> >> >>>>> >>>>>
>> >> > >> >> >>>>> >>>>>
>> >> > >> >> >>>>> >>>>>
>> >> > >> >> >>>>> >>>>> On Wed, Feb 26, 2020 at 4:06 PM Robert Bradshaw 
>> >> > >> >> >>>>> >>>>> <rober...@google.com> wrote:
>> >> > >> >> >>>>> >>>>>>
>> >> > >> >> >>>>> >>>>>> On Wed, Feb 26, 2020 at 3:29 PM Kenneth Knowles 
>> >> > >> >> >>>>> >>>>>> <k...@apache.org> wrote:
>> >> > >> >> >>>>> >>>>>> >
>> >> > >> >> >>>>> >>>>>> > Are these divergent enough that they all need to 
>> >> > >> >> >>>>> >>>>>> > consume testing resources? For example can lower 
>> >> > >> >> >>>>> >>>>>> > priority versions be daily runs or some such?
>> >> > >> >> >>>>> >>>>>>
>> >> > >> >> >>>>> >>>>>> For the 3.x series, I think we will get the most 
>> >> > >> >> >>>>> >>>>>> signal out of the
>> >> > >> >> >>>>> >>>>>> lowest and highest version, and can get by with 
>> >> > >> >> >>>>> >>>>>> smoke tests +
>> >> > >> >> >>>>> >>>>>> infrequent post-commits for the ones between.
>> >> > >> >> >>>>> >>>>>>
>> >> > >> >> >>>>> >>>>>> > Kenn
>> >> > >> >> >>>>> >>>>>> >
>> >> > >> >> >>>>> >>>>>> > On Wed, Feb 26, 2020 at 3:25 PM Robert Bradshaw 
>> >> > >> >> >>>>> >>>>>> > <rober...@google.com> wrote:
>> >> > >> >> >>>>> >>>>>> >>
>> >> > >> >> >>>>> >>>>>> >> +1 to consulting users. Currently 3.5 downloads 
>> >> > >> >> >>>>> >>>>>> >> sit at 3.7%, or about
>> >> > >> >> >>>>> >>>>>> >> 20% of all Python 3 downloads.
>> >> > >> >> >>>>> >>>>>> >>
>> >> > >> >> >>>>> >>>>>> >> I would propose getting in warnings about 3.5 EoL 
>> >> > >> >> >>>>> >>>>>> >> well ahead of time,
>> >> > >> >> >>>>> >>>>>> >> at the very least as part of the 2.7 warning.
>> >> > >> >> >>>>> >>>>>> >>
>> >> > >> >> >>>>> >>>>>> >> Fortunately, supporting multiple 3.x versions is 
>> >> > >> >> >>>>> >>>>>> >> significantly easier
>> >> > >> >> >>>>> >>>>>> >> than spanning 2.7 and 3.x. I would rather not 
>> >> > >> >> >>>>> >>>>>> >> impose an ordering on
>> >> > >> >> >>>>> >>>>>> >> dropping 3.5 and adding 3.8 but consider their 
>> >> > >> >> >>>>> >>>>>> >> merits independently.
>> >> > >> >> >>>>> >>>>>> >>
>> >> > >> >> >>>>> >>>>>> >>
>> >> > >> >> >>>>> >>>>>> >> On Wed, Feb 26, 2020 at 3:16 PM Kyle Weaver 
>> >> > >> >> >>>>> >>>>>> >> <kcwea...@google.com> wrote:
>> >> > >> >> >>>>> >>>>>> >> >
>> >> > >> >> >>>>> >>>>>> >> > 5 versions is too many IMO. We've had issues 
>> >> > >> >> >>>>> >>>>>> >> > with Python precommit resource usage in the 
>> >> > >> >> >>>>> >>>>>> >> > past, and adding another version would surely 
>> >> > >> >> >>>>> >>>>>> >> > exacerbate those issues. And we have also 
>> >> > >> >> >>>>> >>>>>> >> > already had to leave out certain features on 
>> >> > >> >> >>>>> >>>>>> >> > 3.5 [1]. Therefore, I am in favor of dropping 
>> >> > >> >> >>>>> >>>>>> >> > 3.5 before adding 3.8. After dropping Python 2 
>> >> > >> >> >>>>> >>>>>> >> > and adding 3.8, that will leave us with the 
>> >> > >> >> >>>>> >>>>>> >> > latest three minor versions (3.6, 3.7, 3.8), 
>> >> > >> >> >>>>> >>>>>> >> > which I think is closer to the "sweet spot." 
>> >> > >> >> >>>>> >>>>>> >> > Though I would be interested in hearing if 
>> >> > >> >> >>>>> >>>>>> >> > there are any users who would prefer we 
>> >> > >> >> >>>>> >>>>>> >> > continue supporting 3.5.
>> >> > >> >> >>>>> >>>>>> >> >
>> >> > >> >> >>>>> >>>>>> >> > [1] 
>> >> > >> >> >>>>> >>>>>> >> > https://github.com/apache/beam/blob/8658b95545352e51f35959f38334f3c7df8b48eb/sdks/python/apache_beam/runners/portability/flink_runner.py#L55
>> >> > >> >> >>>>> >>>>>> >> >
>> >> > >> >> >>>>> >>>>>> >> > On Wed, Feb 26, 2020 at 3:00 PM Valentyn 
>> >> > >> >> >>>>> >>>>>> >> > Tymofieiev <valen...@google.com> wrote:
>> >> > >> >> >>>>> >>>>>> >> >>
>> >> > >> >> >>>>> >>>>>> >> >> I would like to start a discussion about 
>> >> > >> >> >>>>> >>>>>> >> >> identifying a guideline for answering 
>> >> > >> >> >>>>> >>>>>> >> >> questions like:
>> >> > >> >> >>>>> >>>>>> >> >>
>> >> > >> >> >>>>> >>>>>> >> >> 1. When will Beam support a new Python version 
>> >> > >> >> >>>>> >>>>>> >> >> (say, Python 3.8)?
>> >> > >> >> >>>>> >>>>>> >> >> 2. When will Beam drop support for an old 
>> >> > >> >> >>>>> >>>>>> >> >> Python version (say, Python 3.5)?
>> >> > >> >> >>>>> >>>>>> >> >> 3. How many Python versions should we aim to 
>> >> > >> >> >>>>> >>>>>> >> >> support concurrently (investigate issues, have 
>> >> > >> >> >>>>> >>>>>> >> >> continuous integration tests)?
>> >> > >> >> >>>>> >>>>>> >> >> 4. What comes first: adding support for a new 
>> >> > >> >> >>>>> >>>>>> >> >> version (3.8) or deprecating older one (3.5)? 
>> >> > >> >> >>>>> >>>>>> >> >> This may affect the max load our test 
>> >> > >> >> >>>>> >>>>>> >> >> infrastructure needs to sustain.
>> >> > >> >> >>>>> >>>>>> >> >>
>> >> > >> >> >>>>> >>>>>> >> >> We are already getting requests for supporting 
>> >> > >> >> >>>>> >>>>>> >> >> Python 3.8 and there were some good reasons[1] 
>> >> > >> >> >>>>> >>>>>> >> >> to drop support for Python 3.5 (at least, 
>> >> > >> >> >>>>> >>>>>> >> >> early versions of 3.5). Answering these 
>> >> > >> >> >>>>> >>>>>> >> >> questions would help set expectations in Beam 
>> >> > >> >> >>>>> >>>>>> >> >> user community, Beam dev community, and  may 
>> >> > >> >> >>>>> >>>>>> >> >> help us establish resource requirements for 
>> >> > >> >> >>>>> >>>>>> >> >> test infrastructure and plan efforts.
>> >> > >> >> >>>>> >>>>>> >> >>
>> >> > >> >> >>>>> >>>>>> >> >> PEP-0602 [2] establishes a yearly release 
>> >> > >> >> >>>>> >>>>>> >> >> cycle for Python versions starting from 3.9. 
>> >> > >> >> >>>>> >>>>>> >> >> Each release is a long-term support release 
>> >> > >> >> >>>>> >>>>>> >> >> and is supported for 5 years: first 1.5 years 
>> >> > >> >> >>>>> >>>>>> >> >> allow for general bug fix support, remaining 
>> >> > >> >> >>>>> >>>>>> >> >> 3.5 years have security fix support.
>> >> > >> >> >>>>> >>>>>> >> >>
>> >> > >> >> >>>>> >>>>>> >> >> At every point, there may be up to 5 Python 
>> >> > >> >> >>>>> >>>>>> >> >> minor versions that did not yet reach EOL, see 
>> >> > >> >> >>>>> >>>>>> >> >> "Release overlap with 12 month diagram" [3]. 
>> >> > >> >> >>>>> >>>>>> >> >> We can try to support all of them, but that 
>> >> > >> >> >>>>> >>>>>> >> >> may come at a cost of velocity: we will have 
>> >> > >> >> >>>>> >>>>>> >> >> more tests to maintain, and we will have to 
>> >> > >> >> >>>>> >>>>>> >> >> develop Beam against a lower version for a 
>> >> > >> >> >>>>> >>>>>> >> >> longer period. Supporting fewer versions will 
>> >> > >> >> >>>>> >>>>>> >> >> have implications for user experience. It also 
>> >> > >> >> >>>>> >>>>>> >> >> may be difficult to ensure support of the most 
>> >> > >> >> >>>>> >>>>>> >> >> recent version early, since our  dependencies 
>> >> > >> >> >>>>> >>>>>> >> >> (e.g. picklers) may not be supporting them yet.
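A worked check of the "up to 5 versions" figure, as an idealized Python model: it assumes a release exactly every 12 months and exactly 60 months of support per release, whereas real release and EOL dates vary. The function name `versions_not_yet_eol` and the month-based timeline are hypothetical.

```python
RELEASE_INTERVAL_MONTHS = 12  # yearly release cycle
SUPPORT_MONTHS = 60           # 5 years of support per release


def versions_not_yet_eol(month, first_release_month=0):
    """Count releases that are out and have not reached EOL at `month`.

    Release k ships at first_release_month + k * RELEASE_INTERVAL_MONTHS and
    counts as supported for the SUPPORT_MONTHS months after that; in its EOL
    month it is already counted as unsupported.
    """
    count = 0
    k = 0
    while first_release_month + k * RELEASE_INTERVAL_MONTHS <= month:
        released = first_release_month + k * RELEASE_INTERVAL_MONTHS
        if month < released + SUPPORT_MONTHS:
            count += 1
        k += 1
    return count


# Once the cycle has been running for at least 5 years, the count is steady.
print(max(versions_not_yet_eol(m) for m in range(60, 240)))  # 5
print(min(versions_not_yet_eol(m) for m in range(60, 240)))  # 5
```

Under these assumptions the number of concurrently supported versions is simply the support period divided by the release interval, 60 / 12 = 5.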
>> >> > >> >> >>>>> >>>>>> >> >>
>> >> > >> >> >>>>> >>>>>> >> >> Currently we support 4 Python versions (2.7, 
>> >> > >> >> >>>>> >>>>>> >> >> 3.5, 3.6, 3.7).
>> >> > >> >> >>>>> >>>>>> >> >>
>> >> > >> >> >>>>> >>>>>> >> >> Is 4 versions a sweet spot? Too many? Too 
>> >> > >> >> >>>>> >>>>>> >> >> few? What do you think?
>> >> > >> >> >>>>> >>>>>> >> >>
>> >> > >> >> >>>>> >>>>>> >> >> [1] 
>> >> > >> >> >>>>> >>>>>> >> >> https://github.com/apache/beam/pull/10821#issuecomment-590167711
>> >> > >> >> >>>>> >>>>>> >> >> [2] https://www.python.org/dev/peps/pep-0602/
>> >> > >> >> >>>>> >>>>>> >> >> [3] 
>> >> > >> >> >>>>> >>>>>> >> >> https://www.python.org/dev/peps/pep-0602/#id17
