On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com> wrote:

> hi Krisztian,
>
> Before talking about any code donations or where to run builds, I
> think we first need to discuss the worrisome situation where we have
> in some cases 3 (or more) CI configurations for different components
> in the project.
>
> Just taking into account our C++ build, we have:
>
> * A config for Travis CI
> * Multiple configurations in Dockerfiles under cpp/
> * A brand new (?) configuration in this third party ursa-labs/ursabot
> repository
>
> I note for example that the "AMD64 Conda C++" Buildbot build is
> failing while Travis CI is succeeding
>
> https://ci.ursalabs.org/#builders/66/builds/3196
>
> Starting from first principles, at least for Linux-based builds, what
> I would like to see is:
>
> * A single build configuration (which can be driven by yaml-based
> configuration files and environment variables), rather than 3 like we
> have now. This build configuration should be decoupled from any CI
> platform, including Travis CI and Buildbot
>
Yeah, this would be the ideal setup, but I'm afraid the situation is a bit
more complicated.

Travis CI
---------

The Travis CI setup is constructed from a bunch of scripts optimized for
Travis; it is slow and hardly compatible with any of the remaining setups.
I think we should ditch it.

The "docker-compose setup"
--------------------------

Most of the Dockerfiles are part of the docker-compose setup we've
developed. This might be a good candidate to centralize our future setup
around, mostly because docker-compose is widely used, and we could set up
buildbot builders (or any other CI's) to execute the sequence of
docker-compose build and docker-compose run commands.
However, docker-compose is not suitable for building and running
hierarchical images. This is why we added a Makefile [1] to execute a
"build" with a single make command instead of manually executing multiple
commands involving multiple images (which is error prone). Docker-compose
can also leave a lot of garbage behind, both containers and images.
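
To illustrate the hierarchy problem, here is a minimal sketch (the service
names and the make target are made up, the real ones live in the Makefile [1]):

```bash
# Suppose the python image is built FROM the cpp image. docker-compose
# doesn't resolve that dependency, so the images must be built by hand,
# in the right order:
docker-compose build cpp
docker-compose build python      # this image is based on the cpp image
docker-compose run --rm python   # finally run the build and the tests

# The Makefile collapses this error-prone sequence into a single target:
make python
```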
Docker-compose shines when one needs to orchestrate multiple containers and
their networks / volumes on the same machine. We made it work for Arrow
though (with a couple of hacky workarounds).
Despite that, I still consider the docker-compose setup a good solution,
mostly because of its biggest advantage: local reproducibility.

Ursabot
-------

Ursabot uses low-level docker commands to spin the containers up and down,
and it also has a utility to nicely build the hierarchical images (with
much less code to maintain). The builders are reliable and fast (thanks to
docker), and it has worked great so far.
Where it falls short compared to docker-compose is local reproducibility:
currently the docker worker cleans up everything after itself except the
volumes mounted for caching, whereas `docker-compose run` is a pretty nice
way to shell into a container.
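
For comparison, that interactive workflow with docker-compose looks roughly
like this (the service name is illustrative):

```bash
# Rebuild the image, then open an interactive shell in a fresh container;
# --rm removes the container on exit, so nothing is left behind
docker-compose build cpp
docker-compose run --rm cpp bash
```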

Use docker-compose from ursabot?
--------------------------------

Suppose we decide to use docker-compose commands in the buildbot builders.
Then:
- there would be a single build step for all builders [2] (which means a
  single chunk of unreadable log; see the sketch after this list); it also
  complicates working with esoteric builders like the on-demand crossbow
  trigger and the benchmark runner
- there would be no possibility to customize the buildsteps (like
  aggregating the count of warnings)
- there would be no per-step time statistics, which would make it harder to
  optimize the build times
- properly cleaning up the containers would require some custom solution
- if we needed to introduce additional parametrizations to the
  docker-compose.yaml (for example to add other architectures), it might
  require duplicating the whole yaml file
- exchanging data between the docker-compose container and buildbot would
  be more complicated; for example, the benchmark comment reporter reads
  the result from a file, and to do the same, mounted volumes would be
  required (reading structured output from the scripts' stdout and stderr
  is more error prone), which brings the usual permission problems on Linux
- local reproducibility would still require manual intervention, because
  the scripts within the docker containers are not pausable: they exit, and
  the steps up to the failed one must be re-executed after ssh-ing into the
  running container
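
To make the first point concrete, from buildbot's perspective the whole
build would collapse into something like this (a sketch with made-up
service names):

```bash
# A single opaque buildstep: one exit code and one huge undifferentiated
# log covering image build, compilation and tests
docker-compose build cpp
docker-compose run --rm cpp
```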

Honestly, I see more issues than advantages here. Let's look at it the
other way around.

Local reproducibility with ursabot?
-----------------------------------

The most wanted feature that docker-compose has but ursabot doesn't is
local reproducibility. First of all, ursabot can be run locally, including
all of its builders, so local reproducibility is partially solved. The
missing piece is an interactive shell into the running container, because
buildbot instantly stops the container and aggressively cleans up
everything after it.

I have three solutions / workarounds in mind:

1. We have all the power of docker and docker-compose from ursabot through
   docker-py, and we can easily keep the container running by simply not
   stopping it [3]. Configuring a locally running buildbot to keep the
   containers running after a failure seems quite easy. This has the
   advantage that all of the buildsteps preceding the failed one have
   already been executed, so it requires less manual intervention.
   It could be done from the web UI or even from the CLI, like
   `ursabot reproduce <builder-name>`
2. Generate the docker-compose.yaml and the required shell scripts from the
   ursabot builder configurations.
3. Generate a set of commands to reproduce the failure locally (one could
   even ask the comment bot "how to reproduce the failing build"). The
   response would look similar to:
   ```bash
   $ docker pull <image>
   $ docker run -it <image> bash
   # cmd1
   # cmd2
   # <- error occurs here ->
   ```

TL;DR
-----
In the first iteration I'd remove the Travis configurations.
In the second iteration I'd develop a feature for ursabot to make local
reproducibility possible.

[1]: https://github.com/apache/arrow/blob/master/Makefile.docker
[2]: https://ci.ursalabs.org/#/builders/87/builds/929
[3]: https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343

> * A local tool to run any Linux-based builds locally using Docker at
> the command line, so that CI behavior can be exactly reproduced
> locally
>
> Does that seem achievable?
>
> Thanks,
> Wes
>
> On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> <szucs.kriszt...@gmail.com> wrote:
> >
> > Hi All,
> >
> > Ursabot works pretty well so far, and the CI feedback times have become
> > even better* after enabling the docker volume caches, but the
> > development and maintenance of it is still not available to the whole
> > Arrow community.
> >
> > While it wasn't straightforward, I've managed to separate the source
> > code required to configure the Arrow builders into a separate
> > directory, which eventually can be donated to Arrow.
> > The README is under construction, but the code is available here [1].
> >
> > Until this codebase is governed by the Arrow community,
> > decommissioning the slow Travis builds is not possible, so the overall
> > CI times required to merge a PR will remain high.
> >
> > Regards, Krisztian
> >
> > * C++ builder times have dropped from ~6-7 minutes to ~3-4 minutes
> > * Python builder times have dropped from ~7-8 minutes to ~3-5 minutes
> > * ARM C++ builder times have dropped from ~19-20 minutes to ~9-12 minutes
> >
> > [1]: https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow
>
