On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com> wrote:
> hi Krisztian,
>
> Before talking about any code donations or where to run builds, I
> think we first need to discuss the worrisome situation where we have
> in some cases 3 (or more) CI configurations for different components
> in the project.
>
> Just taking into account our C++ build, we have:
>
> * A config for Travis CI
> * Multiple configurations in Dockerfiles under cpp/
> * A brand new (?) configuration in this third party ursa-labs/ursabot
> repository
>
> I note for example that the "AMD64 Conda C++" Buildbot build is
> failing while Travis CI is succeeding
>
> https://ci.ursalabs.org/#builders/66/builds/3196
>
> Starting from first principles, at least for Linux-based builds, what
> I would like to see is:
>
> * A single build configuration (which can be driven by yaml-based
> configuration files and environment variables), rather than 3 like we
> have now. This build configuration should be decoupled from any CI
> platform, including Travis CI and Buildbot
>
Yeah, this would be the ideal setup, but I'm afraid the situation is a
bit more complicated.

TravisCI
--------

Constructed from a bunch of scripts optimized for Travis, this setup is
slow and hardly compatible with any of the remaining setups. I think we
should ditch it.

The "docker-compose setup"
--------------------------

Most of the Dockerfiles are part of the docker-compose setup we've
developed. This might be a good candidate to centralize our future setup
around, mostly because docker-compose is widely used, and we could set
up buildbot builders (or any other CI's) to execute the sequence of
`docker-compose build` and `docker-compose run` commands.

However, docker-compose is not suitable for building and running
hierarchical images. This is why we have added a Makefile [1] to execute
a "build" with a single `make` command instead of manually executing
multiple commands involving multiple images (which is error prone).
docker-compose can also leave a lot of garbage behind, both containers
and images.
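To make the hierarchy problem concrete, here is a minimal sketch of the
kind of parent-first build ordering that such a Makefile encodes and
that docker-compose cannot express on its own. The image names and the
dependency map below are invented for illustration; the real ones live
in Arrow's docker-compose setup:

```python
# Hypothetical sketch: parent-first build ordering for hierarchical
# docker images. Image names and dependencies are made up.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# child image -> parent image it builds FROM (illustrative only)
DEPENDS_ON = {
    "cpp": "base",
    "python": "cpp",
    "conda-python": "python",
}

def build_order(target):
    """Return the images that must be built, parents first."""
    ts = TopologicalSorter()
    node = target
    while node in DEPENDS_ON:
        ts.add(node, DEPENDS_ON[node])
        node = DEPENDS_ON[node]
    return list(ts.static_order())

# A single `make conda-python` style target would build, in order:
print(build_order("conda-python"))
# -> ['base', 'cpp', 'python', 'conda-python']
```

With plain docker-compose each of those `build` invocations has to be
typed (or scripted) by hand in the right order, which is exactly the
error-prone part the Makefile hides.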
Docker-compose shines when one needs to orchestrate multiple containers
and their networks / volumes on the same machine. We made it work (with
a couple of hacky workarounds) for Arrow though. Despite that, I still
consider the docker-compose setup a good solution, mostly because of its
biggest advantage: local reproducibility.

Ursabot
-------

Ursabot uses low-level docker commands to spin the containers up and
down, and it also has a utility to nicely build the hierarchical images
(with much less code to maintain). The builders are reliable and fast
(thanks to docker), and it has worked great so far. Where it falls short
compared to docker-compose is local reproducibility: currently the
docker worker cleans up everything after itself except the volumes
mounted for caching. `docker-compose run` is a pretty nice way to shell
into the container.

Use docker-compose from ursabot?
--------------------------------

So assume that we use docker-compose commands in the buildbot builders.
Then:

- there would be a single build step for all builders [2] (which means a
  single chunk of unreadable log)
- it complicates working with esoteric builders like the on-demand
  crossbow trigger and the benchmark runner
- there would be no possibility to customize the buildsteps (like
  aggregating the count of warnings)
- there would be no time statistics for the steps, which would make it
  harder to optimize the build times
- properly cleaning up the containers would require some custom solution
- if we needed to introduce additional parametrizations to the
  docker-compose.yaml (for example to add other architectures), it might
  require duplicating the whole yaml file
- exchanging data between the docker-compose container and buildbot
  would be more complicated: for example, the benchmark comment reporter
  reads its result from a file; to do the same, mounted volumes would be
  required (reading structured output from the stdout and stderr of
  scripts is more error prone), which brings the usual permission
  problems on Linux
- local reproducibility would still require manual intervention, because
  the scripts within the docker containers are not pausable: they exit,
  and the steps up to the failed one must be re-executed* after ssh-ing
  into the running container

Honestly, I see more issues than advantages here. Let's look at it the
other way around.

Local reproducibility with ursabot?
-----------------------------------

The most wanted feature that docker-compose has but ursabot doesn't is
local reproducibility. First of all, ursabot can be run locally,
including all of its builders, so local reproducibility is partially
solved. The missing piece is the interactive shell into the running
container, because buildbot instantly stops the container and
aggressively cleans up everything afterwards. I have three solutions /
workarounds in mind:

1. We have all the power of docker and docker-compose from ursabot
   through docker-py, and we can easily keep the container running by
   simply not stopping it [3]. Configuring the locally running buildbot
   to keep the containers running after a failure seems quite easy.
   *This has the advantage that all of the buildsteps preceding the
   failed one are already executed, so it requires less manual
   intervention. It could be done from the web UI or even from the CLI,
   like `ursabot reproduce <builder-name>`.
2. Generate the docker-compose.yaml and the required scripts from the
   ursabot builder configurations, including the shell scripts.
3. Generate a set of commands to reproduce the failure (one could even
   ask the comment bot "how to reproduce the failing one"). The
   response would look similar to:

```bash
$ docker pull <image>
$ docker run -it <image> bash
# cmd1
# cmd2
# <- error occurs here ->
```

TL;DR
-----

In the first iteration I'd remove the travis configurations. In the
second iteration I'd develop a feature for ursabot to make local
reproducibility possible.
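As a rough illustration of option 3 above, the reproduction
instructions could be rendered from the recorded buildsteps. Everything
below (the builder name, image tag, commands, and the `FAILED_BUILDS`
structure) is invented for the sketch; a real implementation would read
these from the ursabot builder configuration and build results:

```python
# Hypothetical sketch of option 3: turn a builder's recorded steps
# into copy-pasteable reproduction commands. All names are invented.

FAILED_BUILDS = {
    "AMD64 Conda C++": {
        "image": "example/amd64-conda-cpp:latest",   # invented tag
        "steps": ["cmake ..", "ninja", "ctest"],     # invented steps
        "failed_step": 2,  # index of the step that failed
    },
}

def reproduce(builder_name):
    """Render the commands a user would run to reproduce a failure."""
    build = FAILED_BUILDS[builder_name]
    lines = [
        f"$ docker pull {build['image']}",
        f"$ docker run -it {build['image']} bash",
    ]
    # replay the steps up to and including the failing one
    for cmd in build["steps"][: build["failed_step"] + 1]:
        lines.append(f"# {cmd}")
    lines.append("# <- error occurs here ->")
    return "\n".join(lines)

print(reproduce("AMD64 Conda C++"))
```

The output follows the same `docker pull` / `docker run` / commands
template shown earlier, so the comment bot could post it verbatim as a
reply.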
[1]: https://github.com/apache/arrow/blob/master/Makefile.docker
[2]: https://ci.ursalabs.org/#/builders/87/builds/929
[3]: https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343

> * A local tool to run any Linux-based builds locally using Docker at
> the command line, so that CI behavior can be exactly reproduced
> locally
>
> Does that seem achievable?
>
> Thanks,
> Wes
>
> On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> <szucs.kriszt...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Ursabot works pretty well so far, and the CI feedback times have become
> > even better* after enabling the docker volume caches, but the development
> > and maintenance of it is still not available for the whole Arrow
> > community.
> >
> > While it wasn't straightforward, I've managed to separate the source code
> > required to configure the Arrow builders into a separate directory, which
> > eventually can be donated to Arrow.
> > The README is under construction, but the code is available here [1].
> >
> > Until this codebase is governed by the Arrow community,
> > decommissioning slow travis builds is not possible, so the overall CI
> > times required to merge a PR will remain high.
> >
> > Regards, Krisztian
> >
> > * C++ builder times have dropped from ~6-7 minutes to ~3-4 minutes
> > * Python builder times have dropped from ~7-8 minutes to ~3-5 minutes
> > * ARM C++ builder times have dropped from ~19-20 minutes to ~9-12 minutes
> >
> > [1]:
> > https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow