On Wed, Oct 17, 2018 at 1:57 PM, Lukasz Cwik <lc...@google.com> wrote:
> Gradle works pretty well at executing separate projects in parallel. There > are a few Java projects which contain only tests with different flags which > allow us to use the Gradle project based parallelization effectively. > See https://github.com/apache/beam/blob/master/runners/ > google-cloud-dataflow-java/examples/build.gradle and > https://github.com/apache/beam/blob/master/runners/ > google-cloud-dataflow-java/examples-streaming/build.gradle since it runs > the same set of tests, one with --streaming and the other without. This > should be able to work for Python as well. > > The Worker API had some updates in the latest Gradle release but still > seems rough to use. > > On Wed, Oct 17, 2018 at 10:17 AM Udi Meiri <eh...@google.com> wrote: > >> On Wed, Oct 17, 2018 at 1:38 AM Robert Bradshaw <rober...@google.com> >> wrote: >> >>> On Tue, Oct 16, 2018 at 12:48 AM Udi Meiri <eh...@google.com> wrote: >>> >>>> Hi, >>>> >>>> In light of increasing Python pre-commit times due to the added Python >>>> 3 tests, >>>> I thought it might be time to re-evaluate the tools used for Python >>>> tests and development, and propose an alternative. >>>> >>>> Currently, we use nosetests, tox, and virtualenv for testing. >>>> The proposal is to use Bazel, which I believe can replace the above >>>> tools while adding: >>>> - parallel testing: each target has its own build directory, >>>> >>> >>> We could look at detox and/or retox again to get parallel testing >>> (though not, possibly, at such a low level). We tried detox for a while, >>> but there were issues debugging timeouts (specifically, it buffered the >>> stdout while testing to avoid multiplexing it, but that meant little info >>> when a test actually timed out on jenkins). >>> >>> We could alternatively look into leveraging gradle's within-project >>> paralleliziaton to speed this up. It is a pain that right now every Python >>> test is run sequentially. >>> >> I don't believe that Gradle has an easy solution. The only within-project >> parallilization I can find requires using the Worker API >> <https://guides.gradle.org/using-the-worker-api/?_ga=2.143780085.1705314017.1539791984-819557858.1539791984> >> . >> >> I've tried pytest-xdist with limited success (pickling the session with >> pytest-xdist's execnet dependency fails). >> >> >>> >>> >>>> providing isolation from build artifacts such as from Cython >>>> >>> >>> Each tox environment already has (I think) its own build directory. Or >>> is this not what we're seeing? >>> >> Cython-based unit test runs leave behind artifacts that must be cleaned >> up, which is why we can't run all tox environments in parallel. >> We use this script to clean up: >> https://github.com/apache/beam/blob/master/sdks/python/ >> scripts/run_tox_cleanup.sh >> >> I am 90% certain that this would not be an issue with bazel, since it >> stages all build dependencies in a separate build directory, so any >> generated files would be placed there. >> >> >>> >>>> - incremental testing: it is possible to precisely define each test's >>>> dependencies >>>> >>> >>> This is a big plus. It would allow us to enforce non-dependence on >>> non-dependencies as well. >>> >>> >>>> There's also a requirement to test against specific Python versions, >>>> such as 2.7 and 3.4. >>>> This could be done using docker containers having the precise version >>>> of interpreter and Bazel. >>>> >>> >>> I'm generally -1 on requiring docker to run our unittests. >>> >> You would still run unit tests using Bazel (in terminal or with IDE >> integration, or even directly). >> Docker would be used to verify they pass on specific Python versions. >> (2.7, 3.4, 3.5, 3.6) >> I don't know how to maintain multiple Python versions on my workstation, >> let alone on Jenkins. >> > I believe pyenv can do this without using docker. > >> >>> >>> >>>> In summary: >>>> Bazel could replace the need for virtualenv, tox, and nosetests. >>>> The addition of Docker images would allow testing against specific >>>> Python versions. >>>> >>> >>> >>> To be clear, I really like Bazel, and would have liked to see it for >>> our top-level build, but there were some problems that were never >>> adequately addressed. >>> >>> (1) There were difficulties managing upstream dependencies correctly. >>> Perhaps there has been some improvement upstream since we last looked at >>> this (it was fairly new), and perhaps it's not as big a deal in Python, but >>> this was the blocker for using it for Beam as a whole. >>> (2) Bazel still has poor support for C (including Cython) extensions. >>> (3) It's unclear how this would interact with setup.py. Would we keep >>> both, using one for testing and the other for releases (sdist, wheels)? >>> >>> There's also the downside of introducing yet another build tool that's >>> not familiar to the Python community, rather than sticking with the >>> "standard" ones. >>> >> This is also my biggest worry. Aside from the top level build tool, I would rather keep the most python-native way of building things. (Go is in a very similar state). At the same time Udi is addressing a real problem with increasing build and test times. > >>> I would, however, be interested in hearing others' thoughts on this >>> proposal. >>> >>> How about these alternative, some on the more extreme side: - Drop non-cython builds and tests - Run non-cython builds in parallel - Move most combinations to post-commit tests