On Wed, Oct 17, 2018 at 10:57 PM Lukasz Cwik <lc...@google.com> wrote:

> Gradle works pretty well at executing separate projects in parallel. There
> are a few Java projects which contain only tests with different flags which
> allow us to use the Gradle project based parallelization effectively.
> See
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/examples/build.gradle
> and
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/examples-streaming/build.gradle
> since it runs the same set of tests, one with --streaming and the other
> without. This should be able to work for Python as well.
>

+1, that's a great idea. Together with --parallel--safe-build, that should
be sufficient.
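
As a rough illustration of what I have in mind (a sketch only, not what our
build does today): each tox environment is launched as its own process, all
of them at the same time, with --parallel--safe-build passed to each one.
The environment names, the helper names (tox_command, run_env, run_all) and
the worker count are made up for the example, and the Gradle wiring is left
out entirely.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def tox_command(env):
    """Build the command line for one tox environment (env name is illustrative)."""
    return ["tox", "-e", env, "--parallel--safe-build"]


def run_env(env, runner=subprocess.call):
    """Run a single environment and return (env, exit code)."""
    return env, runner(tox_command(env))


def run_all(envs, runner=subprocess.call, max_workers=4):
    """Run every environment concurrently; returns {env: exit code}.

    `runner` is injectable so the scheduling can be tried without tox installed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda env: run_env(env, runner), envs))
```

Calling run_all(["py27", "py3"]) would start both environments at once and
report an exit code for each; a non-zero value means that environment failed.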

We could separately look into whether it's worth adding annotations
(positive or negative) to mark tests which have low value to be run in all
the different environments.
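
One possible shape for a "positive" annotation, using only the standard
library: a decorator that lists the environments a test is worth running in
and skips it everywhere else. This is only a sketch under assumptions. The
decorator name only_in_envs and the TEST_ENV_NAME variable are hypothetical;
the test runner (or tox) would have to export that variable for it to work.

```python
import functools
import os
import unittest

# Hypothetical variable naming the current test environment (e.g. "py27").
ENV_VAR = "TEST_ENV_NAME"


def only_in_envs(*envs):
    """Skip the decorated test unless the current environment is listed.

    If ENV_VAR is unset (e.g. a plain local run), the test runs as usual.
    """
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            current = os.environ.get(ENV_VAR)
            if current is not None and current not in envs:
                raise unittest.SkipTest(
                    "low value in environment %r" % current)
            return test_func(*args, **kwargs)
        return wrapper
    return decorator


class ExampleTest(unittest.TestCase):

    @only_in_envs("py27")
    def test_costly(self):
        self.assertEqual(1 + 1, 2)
```

A "negative" variant would simply invert the membership check.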

On Wed, Oct 17, 2018 at 10:17 AM Udi Meiri <eh...@google.com> wrote:
>
>> On Wed, Oct 17, 2018 at 1:38 AM Robert Bradshaw <rober...@google.com>
>> wrote:
>>
>>> On Tue, Oct 16, 2018 at 12:48 AM Udi Meiri <eh...@google.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> In light of increasing Python pre-commit times due to the added Python
>>>> 3 tests,
>>>> I thought it might be time to re-evaluate the tools used for Python
>>>> tests and development, and propose an alternative.
>>>>
>>>> Currently, we use nosetests, tox, and virtualenv for testing.
>>>> The proposal is to use Bazel, which I believe can replace the above
>>>> tools while adding:
>>>> - parallel testing: each target has its own build directory,
>>>>
>>>
>>> We could look at detox and/or retox again to get parallel testing
>>> (though not, possibly, at such a low level). We tried detox for a while,
>>> but there were issues debugging timeouts (specifically, it buffered the
>>> stdout while testing to avoid multiplexing it, but that meant little info
>>> when a test actually timed out on Jenkins).
>>>
>>> We could alternatively look into leveraging Gradle's within-project
>>> parallelization to speed this up. It is a pain that right now every Python
>>> test is run sequentially.
>>>
>> I don't believe that Gradle has an easy solution. The only within-project
>> parallelization I can find requires using the Worker API
>> <https://guides.gradle.org/using-the-worker-api/?_ga=2.143780085.1705314017.1539791984-819557858.1539791984>
>> .
>>
>> I've tried pytest-xdist with limited success (pickling the session with
>> pytest-xdist's execnet dependency fails).
>>
>>
>>>
>>>
>>>> providing isolation from build artifacts such as from Cython
>>>>
>>>
>>> Each tox environment already has (I think) its own build directory. Or
>>> is this not what we're seeing?
>>>
>> Cython-based unit test runs leave behind artifacts that must be cleaned
>> up, which is why we can't run all tox environments in parallel.
>> We use this script to clean up:
>>
>> https://github.com/apache/beam/blob/master/sdks/python/scripts/run_tox_cleanup.sh
>>
>>
>> I am 90% certain that this would not be an issue with Bazel, since it
>> stages all build dependencies in a separate build directory, so any
>> generated files would be placed there.
>>
>>
>>>
>>>> - incremental testing: it is possible to precisely define each test's
>>>> dependencies
>>>>
>>>
>>> This is a big plus. It would allow us to enforce non-dependence on
>>> non-dependencies as well.
>>>
>>>
>>>> There's also a requirement to test against specific Python versions,
>>>> such as 2.7 and 3.4.
>>>> This could be done using docker containers having the precise version
>>>> of interpreter and Bazel.
>>>>
>>>
>>> I'm generally -1 on requiring docker to run our unittests.
>>>
>> You would still run unit tests using Bazel (in terminal or with IDE
>> integration, or even directly).
>> Docker would be used to verify they pass on specific Python versions.
>> (2.7, 3.4, 3.5, 3.6)
>> I don't know how to maintain multiple Python versions on my workstation,
>> let alone on Jenkins.
>>
>>
>>>
>>>
>>>> In summary:
>>>> Bazel could replace the need for virtualenv, tox, and nosetests.
>>>> The addition of Docker images would allow testing against specific
>>>> Python versions.
>>>>
>>>
>>>
>>> To be clear, I really like Bazel, and would have liked to see it for
>>> our top-level build, but there were some problems that were never
>>> adequately addressed.
>>>
>>> (1) There were difficulties managing upstream dependencies correctly.
>>> Perhaps there has been some improvement upstream since we last looked at
>>> this (it was fairly new), and perhaps it's not as big a deal in Python, but
>>> this was the blocker for using it for Beam as a whole.
>>> (2) Bazel still has poor support for C (including Cython) extensions.
>>> (3) It's unclear how this would interact with setup.py. Would we keep
>>> both, using one for testing and the other for releases (sdist, wheels)?
>>>
>>> There's also the downside of introducing yet another build tool that's
>>> not familiar to the Python community, rather than sticking with the
>>> "standard" ones.
>>>
>>> I would, however, be interested in hearing others' thoughts on this
>>> proposal.
>>>
>>>
