Fixed it and restarted a bunch of builds.

On Tue, Jul 30, 2019 at 5:13 AM Wes McKinney <wesmck...@gmail.com> wrote:

> By the way, can you please disable the Buildbot builders that are
> causing builds on master to fail? We haven't had a passing build in
> over a week. Until we reconcile the build configurations, we shouldn't
> be failing contributors' builds.
>
> On Mon, Jul 29, 2019 at 8:23 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs
> > <szucs.kriszt...@gmail.com> wrote:
> > >
> > > On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > > hi Krisztian,
> > > >
> > > > Before talking about any code donations or where to run builds, I
> > > > think we first need to discuss the worrisome situation where we have
> > > > in some cases 3 (or more) CI configurations for different components
> > > > in the project.
> > > >
> > > > Just taking into account our C++ build, we have:
> > > >
> > > > * A config for Travis CI
> > > > * Multiple configurations in Dockerfiles under cpp/
> > > > * A brand new (?) configuration in this third-party ursa-labs/ursabot
> > > > repository
> > > >
> > > > I note for example that the "AMD64 Conda C++" Buildbot build is
> > > > failing while Travis CI is succeeding
> > > >
> > > > https://ci.ursalabs.org/#builders/66/builds/3196
> > > >
> > > > Starting from first principles, at least for Linux-based builds, what
> > > > I would like to see is:
> > > >
> > > > * A single build configuration (which can be driven by yaml-based
> > > > configuration files and environment variables), rather than 3 like we
> > > > have now. This build configuration should be decoupled from any CI
> > > > platform, including Travis CI and Buildbot
> > > >
> > > Yeah, this would be the ideal setup, but I'm afraid the situation is
> > > a bit more complicated.
> > >
> > > Travis CI
> > > ---------
> > >
> > > It is constructed from a bunch of scripts optimized for Travis; this
> > > setup is slow and hardly compatible with any of the remaining setups.
> > > I think we should ditch it.
> > >
> > > The "docker-compose setup"
> > > --------------------------
> > >
> > > Most of the Dockerfiles are part of the  docker-compose setup we've
> > > developed.
> > > This might be a good candidate as the tool to centralize around our
> future
> > > setup, mostly because docker-compose is widely used, and we could setup
> > > buildbot builders (or any other CI's) to execute the sequence of
> > > docker-compose
> > > build and docker-compose run commands.
> > > However docker-compose is not suitable for building and running
> > > hierarchical
> > > images. This is why we have added Makefile [1] to execute a "build"
> with a
> > > single make command instead of manually executing multiple commands
> > > involving
> > > multiple images (which is error prone). It can also leave a lot of
> garbage
> > > after both containers and images.
> > > Docker-compose shines when one needs to orchestrate multiple
> containers and
> > > their networks / volumes on the same machine. We made it work (with a
> > > couple of
> > > hacky workarounds) for arrow though.
> > > Despite that, I still consider the docker-compose setup a good
> solution,
> > > mostly because its biggest advantage, the local reproducibility.
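> > >
> > > For illustration, the kind of sequence the Makefile wraps might look
> > > like the following (service names are illustrative, not necessarily
> > > the ones in our compose file):
> > >
> > >    ```bash
> > >    # build the base image first, then the image that derives from it
> > >    $ docker-compose build cpp
> > >    $ docker-compose build python
> > >    # finally run the actual build and tests inside the leaf image
> > >    $ docker-compose run --rm python
> > >    ```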
> > >
> >
> > I think what is missing here is an orchestration tool (for example, a
> > Python program) to invoke Docker-based development workflows involving
> > multiple steps.
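> >
> > To make that concrete, here is a minimal sketch of such a tool (all
> > names and step definitions are hypothetical, not an existing script in
> > the repo):
> >
> > ```python
> > #!/usr/bin/env python
> > # Hypothetical orchestrator: runs an ordered sequence of docker-compose
> > # steps so the same workflow can be driven locally and from any CI.
> > import subprocess
> > import sys
> >
> > STEPS = {
> >     "cpp": [
> >         ["docker-compose", "build", "cpp"],
> >         ["docker-compose", "run", "--rm", "cpp"],
> >     ],
> >     "python": [
> >         ["docker-compose", "build", "cpp"],      # parent image first
> >         ["docker-compose", "build", "python"],   # then the child image
> >         ["docker-compose", "run", "--rm", "python"],
> >     ],
> > }
> >
> > def main(target):
> >     for cmd in STEPS[target]:
> >         print("running:", " ".join(cmd))
> >         subprocess.check_call(cmd)
> >
> > if __name__ == "__main__":
> >     main(sys.argv[1])
> > ```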
> >
> > > Ursabot
> > > -------
> > >
> > > Ursabot uses low-level docker commands to spin the containers up and
> > > down, and it also has a utility to nicely build the hierarchical
> > > images (with much less code to maintain). The builders are reliable
> > > and fast (thanks to docker), and it has worked great so far.
> > > Where it falls short compared to docker-compose is local
> > > reproducibility: currently the docker worker cleans up everything
> > > after itself except the volumes mounted for caching. `docker-compose
> > > run` is a pretty nice way to shell into the container.
> > >
> > > Use docker-compose from ursabot?
> > > --------------------------------
> > >
> > > So assume that we used docker-compose commands in the buildbot
> > > builders. Then:
> > > - there would be a single build step for all builders [2] (which
> > >   means a single chunk of unreadable log); it also complicates
> > >   working with esoteric
> >
> > I think this is too much of a black-and-white way of looking at
> > things. What I would like to see is a build orchestration tool, usable
> > via a command line interface, not unlike the current crossbow.py and
> > archery command line scripts, that can invoke a build locally or in a
> > CI setting.
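> >
> > For example, hypothetical invocations (the command name is a
> > placeholder, just to illustrate the shape of such a CLI):
> >
> > ```bash
> > # `arrow-build` is a made-up name for such a tool
> > $ arrow-build build conda-cpp        # build the image locally
> > $ arrow-build run conda-cpp          # run the same steps CI runs
> > $ arrow-build run --shell conda-cpp  # shell in to debug a failure
> > ```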
> >
> > >   builders like the on-demand crossbow trigger and the benchmark
> > >   runner
> > > - no possibility to customize the buildsteps (like aggregating the
> > >   count of warnings)
> > > - no time statistics for the steps, which would make it harder to
> > >   optimize the build times
> > > - properly cleaning up the containers would require some custom
> > >   solution
> > > - if we needed to introduce additional parametrizations to the
> > >   docker-compose.yaml (for example to add other architectures), it
> > >   might require duplicating the whole yaml file
> >
> > I think the tool would need to be higher level than docker-compose.
> >
> > In general I'm not very comfortable introducing a hard dependency on
> > Buildbot (or any CI platform, for that matter) into the project. So we
> > have to figure out a way to move forward without such a hard
> > dependency or go back to the drawing board.
> >
> > > - exchanging data between the docker-compose container and buildbot
> > >   would be more complicated; for example, the benchmark comment
> > >   reporter reads the result from a file, and in order to do the same
> > >   (reading structured output from scripts' stdout and stderr is more
> > >   error prone) mounted volumes are required, which brings the usual
> > >   permission problems on linux.
> > > - local reproducibility would still require manual intervention,
> > >   because the scripts within the docker containers are not pausable:
> > >   they exit, and the steps up to the failed one must be re-executed*
> > >   after ssh-ing into the running container.
> > >
> > > Honestly I see more issues than advantages here. Let's look at it
> > > the other way around.
> > >
> > > Local reproducibility with ursabot?
> > > -----------------------------------
> > >
> > > The most wanted feature that docker-compose has but ursabot doesn't
> > > is local reproducibility. First of all, ursabot can be run locally,
> > > including all of its builders, so local reproducibility is partially
> > > resolved. The missing piece is an interactive shell into the running
> > > container, because buildbot instantly stops the container and
> > > aggressively cleans up everything after it.
> > >
> > > I have three solutions / workarounds in mind:
> > >
> > > 1. We have all the power of docker and docker-compose from ursabot
> > >    through docker-py, and we can easily keep the container running by
> > >    simply not stopping it [3] (see the docker-py sketch after this
> > >    list). Configuring the locally running buildbot to keep the
> > >    containers running after a failure seems quite easy. *It has the
> > >    advantage that all of the buildsteps preceding the failed one have
> > >    already been executed, so it requires less manual intervention.
> > >    This could be done from the web UI or even from the CLI, like
> > >    `ursabot reproduce <builder-name>`.
> > > 2. Generate the docker-compose.yaml and the required shell scripts
> > >    from the Ursabot builder configurations.
> > > 3. Generate a set of commands to reproduce the failure (one could
> > >    even ask the comment bot "how to reproduce the failing build").
> > >    The response would look similar to:
> > >    ```bash
> > >    $ docker pull <image>
> > >    $ docker run -it <image> bash
> > >    # cmd1
> > >    # cmd2
> > >    # <- error occurs here ->
> > >    ```
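> > >
> > > A minimal docker-py sketch of the idea in option 1 (the image name
> > > and build script path are just examples, not our actual ones):
> > >
> > >    ```python
> > >    # Keep the build container alive after a failure so it can be
> > >    # inspected interactively instead of being torn down right away.
> > >    import docker
> > >
> > >    client = docker.from_env()
> > >    container = client.containers.run(
> > >        "example/amd64-conda-cpp:latest",  # example image name
> > >        command="sleep infinity",          # keeps the container alive
> > >        detach=True,
> > >    )
> > >    # example build script path
> > >    exit_code, output = container.exec_run("ci/build_cpp.sh")
> > >    if exit_code != 0:
> > >        # leave it running so one can `docker exec -it <id> bash`
> > >        print("build failed, left %s running for debugging"
> > >              % container.short_id)
> > >    else:
> > >        container.stop()
> > >        container.remove()
> > >    ```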
> > >
> > > TL;DR
> > > -----
> > > In the first iteration I'd remove the Travis configurations.
> > > In the second iteration I'd develop a feature for ursabot to make local
> > > reproducibility possible.
> > >
> > > [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
> > > [2]: https://ci.ursalabs.org/#/builders/87/builds/929
> > > [3]: https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343
> > >
> > > > * A local tool to run any Linux-based builds locally using Docker at
> > > >   the command line, so that CI behavior can be exactly reproduced
> > > >   locally
> > > >
> > > > Does that seem achievable?
> > > >
> > > > Thanks,
> > > > Wes
> > > >
> > > > On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> > > > <szucs.kriszt...@gmail.com> wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > Ursabot works pretty well so far, and the CI feedback times have
> > > > > become even better* after enabling the docker volume caches, but
> > > > > its development and maintenance are still not available to the
> > > > > whole Arrow community.
> > > > >
> > > > > While it wasn't straightforward, I've managed to separate the
> > > > > source code required to configure the Arrow builders into a
> > > > > separate directory, which eventually can be donated to Arrow.
> > > > > The README is under construction, but the code is available
> > > > > here [1].
> > > > >
> > > > > Until this codebase is governed by the Arrow community,
> > > > > decommissioning the slow Travis builds is not possible, so the
> > > > > overall CI time required to merge a PR will remain high.
> > > > >
> > > > > Regards, Krisztian
> > > > >
> > > > > * C++ builder times have dropped from ~6-7 minutes to ~3-4 minutes
> > > > > * Python builder times have dropped from ~7-8 minutes to ~3-5 minutes
> > > > > * ARM C++ builder times have dropped from ~19-20 minutes to ~9-12 minutes
> > > > >
> > > > > [1]: https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow
> > > >
>
