Re: [DISCUSS] How many Python 3.x minor versions should Beam Python SDK aim to support concurrently?

Robert Bradshaw Fri, 03 Apr 2020 15:52:27 -0700

https://pypistats.org/packages/apache-beam is an interesting data point.


The good news: Python 3.x more than doubled to nearly 40% of downloads last
month. Interestingly, it looks like a good chunk of this increase was 3.5
(which is now the most popular 3.x version by this metric...)

I agree with using Python EOL dates as a baseline, with the possibility of
case-by-case adjustments. Refactoring our tests to support 3.8 without
increasing the load should be our focus now.
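
As an illustration only (not Beam's actual test configuration), the
lowest/highest split proposed earlier in this thread could be sketched
roughly as follows. The suite names and helper functions are hypothetical;
the only inputs taken from the thread are the version list and the idea
that the versions in between get smoke tests plus infrequent post-commits.

```python
# Hypothetical sketch: names below are illustrative, not part of Beam.
SUPPORTED_VERSIONS = ["3.5", "3.6", "3.7", "3.8"]


def _version_key(version):
    """Sort key that compares "3.10" after "3.9" (numeric, not lexical)."""
    major, minor = version.split(".")
    return (int(major), int(minor))


def split_by_priority(versions):
    """Return (high_priority, low_priority) lists of versions.

    The lowest and highest versions are high priority; everything in
    between is low priority.
    """
    ordered = sorted(versions, key=_version_key)
    if len(ordered) <= 2:
        return ordered, []
    return [ordered[0], ordered[-1]], ordered[1:-1]


def suites_for(version, versions=SUPPORTED_VERSIONS):
    """Pick the (hypothetical) suite names to run for one version."""
    high, _ = split_by_priority(versions)
    if version in high:
        return ["precommit_full", "postcommit_full"]
    return ["precommit_smoke", "postcommit_infrequent"]
```

With a single list like this, changing which versions are high or low
priority would mean editing one place rather than every individual suite.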


On Fri, Apr 3, 2020 at 3:41 PM Valentyn Tymofieiev <[email protected]>
wrote:

> Some good news on  Python 3.x support: thanks to +David Song
> <[email protected]> and +Yifan Zou <[email protected]> we now
> have Python 3.8 on Jenkins, and can start working on adding Python 3.8
> support to Beam (BEAM-8494).
>
>> One interesting variable that has not been mentioned is what versions of
>> Python 3 are available to users via their distribution channels (the Linux
>> distributions they use to develop/run the pipelines).
>
>
> Good point. Looking at Ubuntu 16.04, which comes with Python 3.5.2, we can
> see that  the end-of-life for 16.04 is in 2024, end-of-support is April
> 2021 [1]. Both of these dates are beyond the announced Python 3.5 EOL in
> September 2020 [2]. I think it would be difficult for Beam to keep Py3.5
> support until these EOL dates, and users of systems that stock old versions
> of Python have viable workarounds:
> - install a newer version of Python interpreter via pyenv[3], from
> sources, or from alternative repositories.
> - use a docker container that comes with a newer version of interpreter.
> - use older versions of Beam.
>
> We didn't receive feedback from user@ on how long 3.x versions on the
> lower/higher end of the range should stay supported. I would suggest for
> now that we plan to support all Python 3.x versions that have been released
> and have not reached EOL. We can discuss exceptions to this rule on a
> case-by-case basis, weighing the maintenance burden of continued support
> against dropping a version early.
>
> We should now focus on adjusting our Python test infrastructure to make it
> easy to split 3.5, 3.6, 3.7, 3.8  suites into high-priority and
> low-priority suites according to the Python version. Ideally, we
> should make it easy to change which versions are high/low priority without
> having to change all the individual test suites, and without losing test
> coverage signal.
>
> [1] https://wiki.ubuntu.com/Releases
> [2] https://devguide.python.org/#status-of-python-branches
> [3] https://github.com/pyenv/pyenv/blob/master/README.md
>
> On Fri, Feb 28, 2020 at 1:25 AM Ismaël Mejía <[email protected]> wrote:
>
>> One interesting variable that has not been mentioned is what versions of
>> Python 3 are available to users via their distribution channels (the Linux
>> distributions they use to develop/run the pipelines).
>>
>> - RHEL 8 users have python 3.6 available
>> - RHEL 7 users have python 3.6 available
>> - Debian 10/Ubuntu 18.04 users have python 3.7/3.6 available
>> - Debian 9/Ubuntu 16.04 users have python 3.5 available
>>
>> We should consider this when we evaluate future support removals.
>>
>> Given that the distros that ship Python 3.5 are ~4 years old, and since
>> Python 3.5 is also losing LTS support soon, it is probably OK to stop
>> supporting it in Beam, as Robert suggests.
>>
>>
>> On Thu, Feb 27, 2020 at 3:57 AM Valentyn Tymofieiev <[email protected]>
>> wrote:
>>
>>> Thanks everyone for sharing your perspectives so far. It sounds like we
>>> can mitigate the cost of test infrastructure by having:
>>> - a selection of (fast) tests that we will want to run against all
>>> Python versions we support.
>>> - high priority Python versions, which we will test extensively.
>>> - infrequent postcommit tests that exercise low-priority versions.
>>> We will need test infrastructure improvements to have the flexibility of
>>> designating versions as high-pri/low-pri and to minimize the effort
>>> required to adopt a new version.
>>>
>>> There is still a question of how long we want to support old Py3.x
>>> versions. As mentioned above, I think we should not support them beyond EOL
>>> (5 years after a release). I wonder if that is still too long. The cost of
>>> supporting a version may include:
>>>  - Developing against older Python version
>>>  - Release overhead (building & storing containers, wheels, doing
>>> release validation)
>>>  - Complexity / development cost to support the quirks of the minor
>>> versions.
>>>
>>> We can decide to drop support, after, say, 4 years, or after usage drops
>>> below a threshold, or decide on a case-by-case basis. Thoughts? Also asked
>>> for feedback on user@ [1]
>>>
>>> [1]
>>> https://lists.apache.org/thread.html/r630a3b55aa8e75c68c8252ea6f824c3ab231ad56e18d916dfb84d9e8%40%3Cuser.beam.apache.org%3E
>>>
>>> On Wed, Feb 26, 2020 at 5:27 PM Robert Bradshaw <[email protected]>
>>> wrote:
>>>
>>>> On Wed, Feb 26, 2020 at 5:21 PM Valentyn Tymofieiev <
>>>> [email protected]> wrote:
>>>> >
>>>> > > +1 to consulting users.
>>>> > I will message user@ as well and point to this thread.
>>>> >
>>>> > > I would propose getting in warnings about 3.5 EoL well ahead of
>>>> time.
>>>> > I think we should document on our website, and  in the code
>>>> (warnings) that users should not expect SDKs to be supported in Beam beyond
>>>> the EOL. If we want to have flexibility to drop support earlier than EOL,
>>>> we need to be more careful with messaging because users might otherwise
>>>> expect that support will last until EOL, if we mention EOL date.
>>>>
>>>> +1
>>>>
>>>> > I am hoping that we can establish a consensus for when we will be
>>>> dropping support for a version, so that we don't have to discuss it on a
>>>> case by case basis in the future.
>>>> >
>>>> > > I think it would make sense to add support for 3.8 right away (or
>>>> > > at least get a good sense of what work needs to be done and what our
>>>> > > dependency situation is like)
>>>> > https://issues.apache.org/jira/browse/BEAM-8494 is a starting point.
>>>> > I tried 3.8 a while ago and some dependencies could not be installed;
>>>> > I checked again just now, and the SDK is "installable" after minor
>>>> > changes. Some tests don't pass. BEAM-8494 does not have an owner atm,
>>>> > and if anyone is interested I'm happy to give further pointers and
>>>> > help get started.
>>>> >
>>>> > > For the 3.x series, I think we will get the most signal out of the
>>>> lowest and highest version, and can get by with smoke tests +
>>>> > infrequent post-commits for the ones between.
>>>> >
>>>> > > I agree with having low-frequency tests for low-priority versions.
>>>> Low-priority versions could be determined according to least usage.
>>>> >
>>>> > These are good ideas. Do you think we will want the ability to run
>>>> > some (inexpensive) tests for all versions frequently (on presubmits),
>>>> > or is this extra complexity that can be avoided? I am thinking about
>>>> > type inference, for example. Afaik inference logic is very sensitive
>>>> > to the version. Would it be acceptable to catch errors there in
>>>> > infrequent postcommits, or would an early signal be preferred?
>>>>
>>>> This is a good example--the type inference tests are sensitive to
>>>> version (due to using internal details and relying on the
>>>> still-evolving typing module) but also run in ~15 seconds. I think
>>>> these should be in precommits. We just don't need to run every test
>>>> for every version.
>>>>
>>>> > On Wed, Feb 26, 2020 at 5:17 PM Kyle Weaver <[email protected]>
>>>> wrote:
>>>> >>
>>>> >> Oh, I didn't see Robert's earlier email:
>>>> >>
>>>> >> > Currently 3.5 downloads sit at 3.7%, or about
>>>> >> > 20% of all Python 3 downloads.
>>>> >>
>>>> >> Where did these numbers come from?
>>>> >>
>>>> >> On Wed, Feb 26, 2020 at 5:15 PM Kyle Weaver <[email protected]>
>>>> wrote:
>>>> >>>
>>>> >>> > I agree with having low-frequency tests for low-priority versions.
>>>> >>> > Low-priority versions could be determined according to least
>>>> usage.
>>>> >>>
>>>> >>> +1. While the difference may not be as great between, say, 3.6 and
>>>> 3.7, I think that if we had to choose, it would be more useful to test the
>>>> versions folks are actually using the most. 3.5 only has about a third of
>>>> the Docker pulls of 3.6 or 3.7 [1]. Does anyone have other usage statistics
>>>> we can consult?
>>>> >>>
>>>> >>> [1] https://hub.docker.com/search?q=apachebeam%2Fpython&type=image
>>>> >>>
>>>> >>> On Wed, Feb 26, 2020 at 5:00 PM Ruoyun Huang <[email protected]>
>>>> wrote:
>>>> >>>>
>>>> >>>> I feel 4+ versions take too long to run anything.
>>>> >>>>
>>>> >>>> would vote for lowest + highest,  2 versions.
>>>> >>>>
>>>> >>>> On Wed, Feb 26, 2020 at 4:52 PM Udi Meiri <[email protected]>
>>>> wrote:
>>>> >>>>>
>>>> >>>>> I agree with having low-frequency tests for low-priority versions.
>>>> >>>>> Low-priority versions could be determined according to least
>>>> usage.
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Wed, Feb 26, 2020 at 4:06 PM Robert Bradshaw <
>>>> [email protected]> wrote:
>>>> >>>>>>
>>>> >>>>>> On Wed, Feb 26, 2020 at 3:29 PM Kenneth Knowles <[email protected]>
>>>> wrote:
>>>> >>>>>> >
>>>> >>>>>> > Are these divergent enough that they all need to consume
>>>> testing resources? For example can lower priority versions be daily runs or
>>>> some such?
>>>> >>>>>>
>>>> >>>>>> For the 3.x series, I think we will get the most signal out of
>>>> the
>>>> >>>>>> lowest and highest version, and can get by with smoke tests +
>>>> >>>>>> infrequent post-commits for the ones between.
>>>> >>>>>>
>>>> >>>>>> > Kenn
>>>> >>>>>> >
>>>> >>>>>> > On Wed, Feb 26, 2020 at 3:25 PM Robert Bradshaw <
>>>> [email protected]> wrote:
>>>> >>>>>> >>
>>>> >>>>>> >> +1 to consulting users. Currently 3.5 downloads sit at 3.7%,
>>>> or about
>>>> >>>>>> >> 20% of all Python 3 downloads.
>>>> >>>>>> >>
>>>> >>>>>> >> I would propose getting in warnings about 3.5 EoL well ahead
>>>> of time,
>>>> >>>>>> >> at the very least as part of the 2.7 warning.
>>>> >>>>>> >>
>>>> >>>>>> >> Fortunately, supporting multiple 3.x versions is
>>>> significantly easier
>>>> >>>>>> >> than spanning 2.7 and 3.x. I would rather not impose an
>>>> ordering on
>>>> >>>>>> >> dropping 3.5 and adding 3.8 but consider their merits
>>>> independently.
>>>> >>>>>> >>
>>>> >>>>>> >>
>>>> >>>>>> >> On Wed, Feb 26, 2020 at 3:16 PM Kyle Weaver <
>>>> [email protected]> wrote:
>>>> >>>>>> >> >
>>>> >>>>>> >> > 5 versions is too many IMO. We've had issues with Python
>>>> precommit resource usage in the past, and adding another version would
>>>> surely exacerbate those issues. And we have also already had to leave out
>>>> certain features on 3.5 [1]. Therefore, I am in favor of dropping 3.5
>>>> before adding 3.8. After dropping Python 2 and adding 3.8, that will leave
>>>> us with the latest three minor versions (3.6, 3.7, 3.8), which I think is
>>>> closer to the "sweet spot." Though I would be interested in hearing if
>>>> there are any users who would prefer we continue supporting 3.5.
>>>> >>>>>> >> >
>>>> >>>>>> >> > [1]
>>>> https://github.com/apache/beam/blob/8658b95545352e51f35959f38334f3c7df8b48eb/sdks/python/apache_beam/runners/portability/flink_runner.py#L55
>>>> >>>>>> >> >
>>>> >>>>>> >> > On Wed, Feb 26, 2020 at 3:00 PM Valentyn Tymofieiev <
>>>> [email protected]> wrote:
>>>> >>>>>> >> >>
>>>> >>>>>> >> >> I would like to start a discussion about identifying a
>>>> guideline for answering questions like:
>>>> >>>>>> >> >>
>>>> >>>>>> >> >> 1. When will Beam support a new Python version (say,
>>>> Python 3.8)?
>>>> >>>>>> >> >> 2. When will Beam drop support for an old Python version
>>>> (say, Python 3.5)?
>>>> >>>>>> >> >> 3. How many Python versions should we aim to support
>>>> concurrently (investigate issues, have continuous integration tests)?
>>>> >>>>>> >> >> 4. What comes first: adding support for a new version
>>>> (3.8) or deprecating older one (3.5)? This may affect the max load our test
>>>> infrastructure needs to sustain.
>>>> >>>>>> >> >>
>>>> >>>>>> >> >> We are already getting requests for supporting Python 3.8
>>>> and there were some good reasons[1] to drop support for Python 3.5 (at
>>>> least, early versions of 3.5). Answering these questions would help set
>>>> expectations in Beam user community, Beam dev community, and  may help us
>>>> establish resource requirements for test infrastructure and plan efforts.
>>>> >>>>>> >> >>
>>>> >>>>>> >> >> PEP-0602 [2] establishes a yearly release cycle for Python
>>>> versions starting from 3.9. Each release is a long-term support release and
>>>> is supported for 5 years: first 1.5 years allow for general bug fix
>>>> support, remaining 3.5 years have security fix support.
>>>> >>>>>> >> >>
>>>> >>>>>> >> >> At every point, there may be up to 5 Python minor versions
>>>> >>>>>> >> >> that have not yet reached EOL, see "Release overlap with 12
>>>> >>>>>> >> >> month diagram" [3]. We can try to support all of them, but
>>>> >>>>>> >> >> that may come at a cost of velocity: we will have more tests
>>>> >>>>>> >> >> to maintain, and we will have to develop Beam against a lower
>>>> >>>>>> >> >> version for a longer period. Supporting fewer versions will
>>>> >>>>>> >> >> have implications for user experience. It also may be
>>>> >>>>>> >> >> difficult to ensure support of the most recent version early,
>>>> >>>>>> >> >> since our dependencies (e.g. picklers) may not support it yet.
>>>> >>>>>> >> >>
>>>> >>>>>> >> >> Currently we support 4 Python versions (2.7, 3.5, 3.6,
>>>> 3.7).
>>>> >>>>>> >> >>
>>>> >>>>>> >> >> Is 4 versions a sweet spot? Too much? Too little? What do
>>>> you think?
>>>> >>>>>> >> >>
>>>> >>>>>> >> >> [1]
>>>> https://github.com/apache/beam/pull/10821#issuecomment-590167711
>>>> >>>>>> >> >> [2] https://www.python.org/dev/peps/pep-0602/
>>>> >>>>>> >> >> [3] https://www.python.org/dev/peps/pep-0602/#id17
>>>>
>>>
