Re: [PROPOSAL] Using Bazel and Docker for Python SDK development and tests

Udi Meiri Thu, 18 Oct 2018 12:19:33 -0700

Unless I'm missing something, we can't run cython builds in parallel with
other builds without first making cython build artifacts appear in a
separate build directory.


AFAICT --parallel--safe-build is deprecated
<https://github.com/tox-dev/tox/pull/1032/files> / on by default.

I vote to move the python 2 test suite to postcommit, since python
2 + cython seems to already test the non-gcp use case with regards to hard
dependencies on gcp.


On Thu, Oct 18, 2018 at 7:00 AM Robert Bradshaw <rober...@google.com> wrote:

> On Wed, Oct 17, 2018 at 10:57 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> Gradle works pretty well at executing separate projects in parallel.
>> There are a few Java projects which contain only tests with different flags
>> which allow us to use the Gradle project based parallelization effectively.
>> See
>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/examples/build.gradle
>> and
>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/examples-streaming/build.gradle
>> since it runs the same set of tests, one with --streaming and the other
>> without. This should be able to work for Python as well.
>>
>
> +1, that's a great idea. Together with --parallel--safe-build should be
> sufficient.
>
> We could separately look into whether it's worth adding annotations
> (positive or negative) to mark tests which have low value to be run in all
> the different environments.
>
> On Wed, Oct 17, 2018 at 10:17 AM Udi Meiri <eh...@google.com> wrote:
>>
>>> On Wed, Oct 17, 2018 at 1:38 AM Robert Bradshaw <rober...@google.com>
>>> wrote:
>>>
>>>> On Tue, Oct 16, 2018 at 12:48 AM Udi Meiri <eh...@google.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> In light of increasing Python pre-commit times due to the added Python
>>>>> 3 tests,
>>>>> I thought it might be time to re-evaluate the tools used for Python
>>>>> tests and development, and propose an alternative.
>>>>>
>>>>> Currently, we use nosetests, tox, and virtualenv for testing.
>>>>> The proposal is to use Bazel, which I believe can replace the above
>>>>> tools while adding:
>>>>> - parallel testing: each target has its own build directory,
>>>>>
>>>>
>>>> We could look at detox and/or retox again to get parallel testing
>>>> (though not, possibly, at such a low level). We tried detox for a while,
>>>> but there were issues debugging timeouts (specifically, it buffered the
>>>> stdout while testing to avoid multiplexing it, but that meant little info
>>>> when a test actually timed out on jenkins).
>>>>
>>>> We could alternatively look into leveraging gradle's within-project
>>>> paralleliziaton to speed this up. It is a pain that right now every Python
>>>> test is run sequentially.
>>>>
>>> I don't believe that Gradle has an easy solution. The only
>>> within-project parallilization I can find requires using the Worker API
>>> <https://guides.gradle.org/using-the-worker-api/?_ga=2.143780085.1705314017.1539791984-819557858.1539791984>
>>> .
>>>
>>> I've tried pytest-xdist with limited success (pickling the session with
>>> pytest-xdist's execnet dependency fails).
>>>
>>>
>>>>
>>>>
>>>>> providing isolation from build artifacts such as from Cython
>>>>>
>>>>
>>>> Each tox environment already has (I think) its own build directory. Or
>>>> is this not what we're seeing?
>>>>
>>> Cython-based unit test runs leave behind artifacts that must be cleaned
>>> up, which is why we can't run all tox environments in parallel.
>>> We use this script to clean up:
>>>
>>> https://github.com/apache/beam/blob/master/sdks/python/scripts/run_tox_cleanup.sh
>>>
>>>
>>> I am 90% certain that this would not be an issue with bazel, since it
>>> stages all build dependencies in a separate build directory, so any
>>> generated files would be placed there.
>>>
>>>
>>>>
>>>>> - incremental testing: it is possible to precisely define each test's
>>>>> dependencies
>>>>>
>>>>
>>>> This is a big plus. It would allow us to enforce non-dependence on
>>>> non-dependencies as well.
>>>>
>>>>
>>>>> There's also a requirement to test against specific Python versions,
>>>>> such as 2.7 and 3.4.
>>>>> This could be done using docker containers having the precise version
>>>>> of interpreter and Bazel.
>>>>>
>>>>
>>>> I'm generally -1 on requiring docker to run our unittests.
>>>>
>>> You would still run unit tests using Bazel (in terminal or with IDE
>>> integration, or even directly).
>>> Docker would be used to verify they pass on specific Python versions.
>>> (2.7, 3.4, 3.5, 3.6)
>>> I don't know how to maintain multiple Python versions on my workstation,
>>> let alone on Jenkins.
>>>
>>>
>>>>
>>>>
>>>>> In summary:
>>>>> Bazel could replace the need for virtualenv, tox, and nosetests.
>>>>> The addition of Docker images would allow testing against specific
>>>>> Python versions.
>>>>>
>>>>
>>>>
>>>>  To be clear, I really like Bazel, and would have liked to see it for
>>>> our top-level build, but there were some problems that were never
>>>> adequately addressed.
>>>>
>>>> (1) There were difficulties managing upstream dependencies correctly.
>>>> Perhaps there has been some improvement upstream since we last looked at
>>>> this (it was fairly new), and perhaps it's not as big a deal in Python, but
>>>> this was the blocker for using it for Beam as a whole.
>>>> (2) Bazel still has poor support for C (including Cython) extensions.
>>>> (3) It's unclear how this would interact with setup.py. Would we keep
>>>> both, using one for testing and the other for releases (sdist, wheels)?
>>>>
>>>> There's also the downside of introducing yet another build tool that's
>>>> not familiar to the Python community, rather than sticking with the
>>>> "standard" ones.
>>>>
>>>> I would, however, be interested in hearing others' thoughts on this
>>>> proposal.
>>>>
>>>>

smime.p7s
Description: S/MIME Cryptographic Signature

Re: [PROPOSAL] Using Bazel and Docker for Python SDK development and tests

Reply via email to