Re: [PROPOSAL] Using Bazel and Docker for Python SDK development and tests

Ahmet Altay Wed, 17 Oct 2018 18:22:23 -0700

On Wed, Oct 17, 2018 at 1:57 PM, Lukasz Cwik <lc...@google.com> wrote:


> Gradle works pretty well at executing separate projects in parallel. There
> are a few Java projects which contain only tests with different flags which
> allow us to use the Gradle project based parallelization effectively.
> See https://github.com/apache/beam/blob/master/runners/
> google-cloud-dataflow-java/examples/build.gradle and
> https://github.com/apache/beam/blob/master/runners/
> google-cloud-dataflow-java/examples-streaming/build.gradle since it runs
> the same set of tests, one with --streaming and the other without. This
> should be able to work for Python as well.
>
> The Worker API had some updates in the latest Gradle release but still
> seems rough to use.
>
> On Wed, Oct 17, 2018 at 10:17 AM Udi Meiri <eh...@google.com> wrote:
>
>> On Wed, Oct 17, 2018 at 1:38 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Tue, Oct 16, 2018 at 12:48 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> In light of increasing Python pre-commit times due to the added Python
>>>> 3 tests,
>>>> I thought it might be time to re-evaluate the tools used for Python
>>>> tests and development, and propose an alternative.
>>>>
>>>> Currently, we use nosetests, tox, and virtualenv for testing.
>>>> The proposal is to use Bazel, which I believe can replace the above
>>>> tools while adding:
>>>> - parallel testing: each target has its own build directory,
>>>>
>>>
>>> We could look at detox and/or retox again to get parallel testing
>>> (though not, possibly, at such a low level). We tried detox for a while,
>>> but there were issues debugging timeouts (specifically, it buffered the
>>> stdout while testing to avoid multiplexing it, but that meant little info
>>> when a test actually timed out on Jenkins).
>>>
>>> We could alternatively look into leveraging Gradle's within-project
>>> parallelization to speed this up. It is a pain that right now every Python
>>> test is run sequentially.
>>>
>> I don't believe that Gradle has an easy solution. The only within-project
>> parallelization I can find requires using the Worker API
>> <https://guides.gradle.org/using-the-worker-api/?_ga=2.143780085.1705314017.1539791984-819557858.1539791984>
>> .
>>
>> I've tried pytest-xdist with limited success (pickling the session with
>> pytest-xdist's execnet dependency fails).
>>
>>
>>>
>>>
>>>> providing isolation from build artifacts such as from Cython
>>>>
>>>
>>> Each tox environment already has (I think) its own build directory. Or
>>> is this not what we're seeing?
>>>
>> Cython-based unit test runs leave behind artifacts that must be cleaned
>> up, which is why we can't run all tox environments in parallel.
>> We use this script to clean up:
>> https://github.com/apache/beam/blob/master/sdks/python/
>> scripts/run_tox_cleanup.sh
>>
>> I am 90% certain that this would not be an issue with Bazel, since it
>> stages all build dependencies in a separate build directory, so any
>> generated files would be placed there.
>>
>>
>>>
>>>> - incremental testing: it is possible to precisely define each test's
>>>> dependencies
>>>>
>>>
>>> This is a big plus. It would allow us to enforce non-dependence on
>>> non-dependencies as well.
>>>
>>>
>>>> There's also a requirement to test against specific Python versions,
>>>> such as 2.7 and 3.4.
>>>> This could be done using docker containers having the precise version
>>>> of interpreter and Bazel.
>>>>
>>>
>>> I'm generally -1 on requiring docker to run our unittests.
>>>
>> You would still run unit tests using Bazel (in terminal or with IDE
>> integration, or even directly).
>> Docker would be used to verify they pass on specific Python versions.
>> (2.7, 3.4, 3.5, 3.6)
>> I don't know how to maintain multiple Python versions on my workstation,
>> let alone on Jenkins.
>>
>
I believe pyenv can do this without using docker.


>
>>
>>>
>>>
>>>> In summary:
>>>> Bazel could replace the need for virtualenv, tox, and nosetests.
>>>> The addition of Docker images would allow testing against specific
>>>> Python versions.
>>>>
>>>
>>>
>>> To be clear, I really like Bazel, and would have liked to see it for
>>> our top-level build, but there were some problems that were never
>>> adequately addressed.
>>>
>>> (1) There were difficulties managing upstream dependencies correctly.
>>> Perhaps there has been some improvement upstream since we last looked at
>>> this (it was fairly new), and perhaps it's not as big a deal in Python, but
>>> this was the blocker for using it for Beam as a whole.
>>> (2) Bazel still has poor support for C (including Cython) extensions.
>>> (3) It's unclear how this would interact with setup.py. Would we keep
>>> both, using one for testing and the other for releases (sdist, wheels)?
>>>
>>> There's also the downside of introducing yet another build tool that's
>>> not familiar to the Python community, rather than sticking with the
>>> "standard" ones.
>>>
>>
This is also my biggest worry.

Aside from the top-level build tool, I would rather keep the most
Python-native way of building things (Go is in a very similar state). At
the same time, Udi is addressing a real problem with increasing build and
test times.


>
>>> I would, however, be interested in hearing others' thoughts on this
>>> proposal.
>>>
>>>
How about these alternatives, some on the more extreme side:
- Drop non-cython builds and tests
- Run non-cython builds in parallel
- Move most combinations to post-commit tests
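
To make the "run builds in parallel" option concrete, here is a rough
sketch of one way it might work: each run gets its own throwaway copy of
the source tree, so generated files (for example Cython artifacts) from one
run cannot leak into another. The helper names run_isolated and run_all are
made up for illustration, and copying the whole tree is only an assumption
about what isolation would be sufficient; it is not how the current build
works.

```python
import shutil
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_isolated(name, command, source_dir):
    """Run `command` inside a private copy of `source_dir`.

    Returns a (name, exit_code) pair. The copy is deleted afterwards,
    so nothing the command generates survives the run.
    """
    with tempfile.TemporaryDirectory(prefix=f"{name}-") as scratch:
        workdir = Path(scratch) / "src"
        # Private copy: generated files never touch the shared tree.
        shutil.copytree(source_dir, workdir)
        result = subprocess.run(command, cwd=workdir)
        return name, result.returncode


def run_all(commands, source_dir, max_workers=4):
    """Run every named command concurrently; return {name: exit_code}."""
    # Threads are enough here: the real work happens in subprocesses.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_isolated, name, command, source_dir)
            for name, command in commands.items()
        ]
        return dict(future.result() for future in futures)
```

Usage could look like
run_all({env: ["tox", "-e", env] for env in envs}, "sdks/python"),
where envs stands in for whatever tox environments we decide to keep in
pre-commit. The cost is one source copy per environment, and output from the
parallel runs is interleaved on stdout, so per-run log files would probably
be needed to keep timeouts debuggable on Jenkins.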
