I will defer to others to investigate this matter further but I would
really like to see a concrete and practical path to local
reproducibility before moving forward on any changes to our current
CI.

On Tue, Jul 30, 2019 at 7:38 AM Krisztián Szűcs
<szucs.kriszt...@gmail.com> wrote:
>
> Fixed it and restarted a bunch of builds.
>
> On Tue, Jul 30, 2019 at 5:13 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > By the way, can you please disable the Buildbot builders that are
> > causing builds on master to fail? We haven't had a passing build in
> > over a week. Until we reconcile the build configurations we shouldn't
> > be failing contributors' builds.
> >
> > On Mon, Jul 29, 2019 at 8:23 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > >
> > > On Mon, Jul 29, 2019 at 7:58 PM Krisztián Szűcs
> > > <szucs.kriszt...@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 30, 2019 at 1:38 AM Wes McKinney <wesmck...@gmail.com>
> > wrote:
> > > >
> > > > > hi Krisztian,
> > > > >
> > > > > Before talking about any code donations or where to run builds, I
> > > > > think we first need to discuss the worrisome situation where we have
> > > > > in some cases 3 (or more) CI configurations for different components
> > > > > in the project.
> > > > >
> > > > > Just taking into account our C++ build, we have:
> > > > >
> > > > > * A config for Travis CI
> > > > > * Multiple configurations in Dockerfiles under cpp/
> > > > > * A brand new (?) configuration in this third party ursa-labs/ursabot
> > > > > repository
> > > > >
> > > > > I note for example that the "AMD64 Conda C++" Buildbot build is
> > > > > failing while Travis CI is succeeding
> > > > >
> > > > > https://ci.ursalabs.org/#builders/66/builds/3196
> > > > >
> > > > > Starting from first principles, at least for Linux-based builds, what
> > > > > I would like to see is:
> > > > >
> > > > > * A single build configuration (which can be driven by yaml-based
> > > > > configuration files and environment variables), rather than 3 like we
> > > > > have now. This build configuration should be decoupled from any CI
> > > > > platform, including Travis CI and Buildbot
> > > > >
> > > > Yeah, this would be the ideal setup, but I'm afraid the situation is
> > > > a bit more complicated.
> > > >
> > > > TravisCI
> > > > --------
> > > >
> > > > This setup is constructed from a bunch of scripts optimized for
> > > > Travis; it is slow and hardly compatible with any of the remaining
> > > > setups. I think we should ditch it.
> > > >
> > > > The "docker-compose setup"
> > > > --------------------------
> > > >
> > > > Most of the Dockerfiles are part of the docker-compose setup we've
> > > > developed. This might be a good candidate as the tool to centralize
> > > > our future setup around, mostly because docker-compose is widely
> > > > used, and we could set up buildbot builders (or any other CI's) to
> > > > execute the sequence of docker-compose build and docker-compose run
> > > > commands.
> > > > However docker-compose is not suitable for building and running
> > > > hierarchical images. This is why we have added a Makefile [1] to
> > > > execute a "build" with a single make command instead of manually
> > > > executing multiple commands involving multiple images (which is
> > > > error prone). Docker-compose can also leave behind a lot of garbage,
> > > > both containers and images.
> > > > Docker-compose shines when one needs to orchestrate multiple
> > > > containers and their networks / volumes on the same machine. We made
> > > > it work for arrow though (with a couple of hacky workarounds).
> > > > Despite that, I still consider the docker-compose setup a good
> > > > solution, mostly because of its biggest advantage: local
> > > > reproducibility.
> > > >
> > >
> > > I think what is missing here is an orchestration tool (for example, a
> > > Python program) to invoke Docker-based development workflows involving
> > > multiple steps.
> > >
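[Editor's note: such an orchestration tool might look roughly like the sketch below. Everything here is hypothetical: the service names, dependency images, and CLI shape are illustrative, not an actual Arrow or ursabot interface. The sketch only assembles and prints the docker-compose commands in dry-run mode, which is also how the same workflow could be driven either locally or from a CI step.]

```python
# Hypothetical sketch of a CI-agnostic build orchestrator: it chains
# docker-compose build/run invocations so the same multi-step workflow
# can run locally or inside any CI system. Names are made up.
import subprocess
from typing import List


def compose_commands(service: str, deps: List[str]) -> List[List[str]]:
    """Build the docker-compose invocations for `service`, building its
    dependency images first (plain docker-compose does not order
    hierarchical image builds for you)."""
    cmds = [["docker-compose", "build", dep] for dep in deps]
    cmds.append(["docker-compose", "build", service])
    cmds.append(["docker-compose", "run", "--rm", service])
    return cmds


def run(service: str, deps: List[str], dry_run: bool = True) -> List[str]:
    """Render (and optionally execute) the command sequence."""
    rendered = []
    for cmd in compose_commands(service, deps):
        rendered.append(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd)  # fail fast, like a CI build step
    return rendered


if __name__ == "__main__":
    # e.g. a conda C++ build that depends on a base image (names made up)
    for line in run("conda-cpp", deps=["conda-base"]):
        print(line)
```

With `dry_run=False` the same object would drive the real build; the dry-run mode doubles as the "how do I reproduce this locally" answer.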
> > > > Ursabot
> > > > -------
> > > >
> > > > Ursabot uses low-level docker commands to spin the containers up and
> > > > down, and it also has a utility to nicely build the hierarchical
> > > > images (with much less code to maintain). The builders are reliable
> > > > and fast (thanks to docker), and it's great so far.
> > > > Where it falls short compared to docker-compose is the lack of local
> > > > reproducibility: currently the docker worker cleans up everything
> > > > after itself except the mounted volumes used for caching.
> > > > `docker-compose run` is a pretty nice way to shell into the
> > > > container.
> > > >
> > > > Use docker-compose from ursabot?
> > > > --------------------------------
> > > >
> > > > So assume that we use docker-compose commands in the buildbot
> > > > builders. Then:
> > > > - there would be a single build step for all builders [2] (which
> > > >   means a single chunk of unreadable log); it also complicates
> > > >   working with esoteric
> > >
> > > I think this is too much of a black-and-white way of looking at
> > > things. What I would like to see is a build orchestration tool, which
> > > can be used via command line interface, not unlike the current
> > > crossbow.py and archery command line scripts, that can invoke a build
> > > locally or in a CI setting.
> > >
> > > >   builders like the on-demand crossbow trigger and the benchmark
> > > >   runner
> > > > - no possibility to customize the buildsteps (like aggregating the
> > > >   count of warnings)
> > > > - no time statistics for the steps, which would make it harder to
> > > >   optimize the build times
> > > > - to properly clean up the container some custom solution would be
> > > >   required
> > > > - if we'd need to introduce additional parametrizations to the
> > > >   docker-compose.yaml (for example to add other architectures) then
> > > >   it might require full yaml duplication
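[Editor's note: one way around full yaml duplication would be to generate the compose service definitions programmatically from a single template, which is roughly what a higher-level orchestration tool could do. The sketch below is hypothetical; the service names, image tags, and build arguments are made up, not the real Arrow ones.]

```python
# Hypothetical sketch: instead of duplicating docker-compose.yaml per
# architecture, derive each service definition from one parametrized
# template. All names here are illustrative.
from typing import Dict, List


def make_services(archs: List[str]) -> Dict[str, dict]:
    """Generate a compose 'services' mapping, one service per arch."""
    services = {}
    for arch in archs:
        services[f"{arch}-conda-cpp"] = {
            "image": f"arrow:{arch}-conda-cpp",      # made-up image tag
            "build": {
                "context": ".",
                "dockerfile": "cpp/Dockerfile",      # made-up path
                "args": {"arch": arch},
            },
        }
    return services


if __name__ == "__main__":
    import json
    # The result could be dumped as yaml/json and fed to docker-compose.
    print(json.dumps({"services": make_services(["amd64", "arm64v8"])},
                     indent=2))
```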
> > >
> > > I think the tool would need to be higher level than docker-compose
> > >
> > > In general I'm not very comfortable introducing a hard dependency on
> > > Buildbot (or any CI platform, for that matter) into the project. So we
> > > have to figure out a way to move forward without such hard dependency
> > > or go back to the drawing board.
> > >
> > > > - exchanging data between the docker-compose container and buildbot
> > > >   would be more complicated; for example the benchmark comment
> > > >   reporter reads the result from a file, and in order to do the same
> > > >   (reading structured output on stdout and stderr from scripts is
> > > >   more error prone) mounted volumes are required, which brings the
> > > >   usual permission problems on linux.
> > > > - local reproducibility still requires manual intervention, because
> > > >   the scripts within the docker containers are not pausable: they
> > > >   exit, and the steps up to the failed one must be re-executed* after
> > > >   ssh-ing into the running container.
> > > >
> > > > Honestly I see more issues than advantages here. Let's look at it
> > > > the other way around.
> > > >
> > > > Local reproducibility with ursabot?
> > > > -----------------------------------
> > > >
> > > > The most wanted feature that docker-compose has but ursabot doesn't
> > > > is local reproducibility. First of all, ursabot can be run locally,
> > > > including all of its builders, so local reproducibility is partially
> > > > resolved. The missing piece is an interactive shell into the running
> > > > container, because buildbot instantly stops the container and
> > > > aggressively cleans up everything afterwards.
> > > >
> > > > I have three solutions / workarounds in mind:
> > > >
> > > > 1. We have all the power of docker and docker-compose from ursabot
> > > >    through docker-py, and we can easily keep the container running
> > > >    by simply not stopping it [3]. Configuring the locally running
> > > >    buildbot to keep the containers running after a failure seems
> > > >    quite easy. *It has the advantage that all of the buildsteps
> > > >    preceding the failed one are already executed, so it requires
> > > >    less manual intervention.
> > > >    This could be done on the web UI or even from the CLI, like
> > > >    `ursabot reproduce <builder-name>`
> > > > 2. Generate the docker-compose.yaml and required scripts from the
> > > >    Ursabot builder configurations, including the shell scripts.
> > > > 3. Generate a set of commands to reproduce the failure (one could
> > > >    even ask the comment bot "how to reproduce the failing build").
> > > >    The response would look similar to:
> > > >    ```bash
> > > >    $ docker pull <image>
> > > >    $ docker run -it <image> bash
> > > >    # cmd1
> > > >    # cmd2
> > > >    # <- error occurs here ->
> > > >    ```
> > > >
> > > > TL;DR
> > > > -----
> > > > In the first iteration I'd remove the travis configurations.
> > > > In the second iteration I'd develop a feature for ursabot to make local
> > > > reproducibility possible.
> > > >
> > > > [1]: https://github.com/apache/arrow/blob/master/Makefile.docker
> > > > [2]: https://ci.ursalabs.org/#/builders/87/builds/929
> > > > [3]:
> > > > https://github.com/buildbot/buildbot/blob/e7ff2a3b959cff96c77c07891fa07a35a98e81cb/master/buildbot/worker/docker.py#L343
> > > >
> > > > > * A local tool to run any Linux-based builds locally using Docker
> > > > >   at the command line, so that CI behavior can be exactly
> > > > >   reproduced locally
> > > > >
> > > > > Does that seem achievable?
> > > > >
> > > > > Thanks,
> > > > > Wes
> > > > >
> > > > > On Mon, Jul 29, 2019 at 6:22 PM Krisztián Szűcs
> > > > > <szucs.kriszt...@gmail.com> wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > Ursabot works pretty well so far, and the CI feedback times have
> > > > > > become even better* after enabling the docker volume caches, but
> > > > > > its development and maintenance are still not open to the whole
> > > > > > Arrow community.
> > > > > >
> > > > > > While it wasn't straightforward, I've managed to separate the
> > > > > > source code required to configure the Arrow builders into a
> > > > > > separate directory, which eventually can be donated to Arrow.
> > > > > > The README is under construction, but the code is available
> > > > > > here [1].
> > > > > >
> > > > > > Until this codebase is governed by the Arrow community,
> > > > > > decommissioning the slow travis builds is not possible, so the
> > > > > > overall CI time required to merge a PR will remain high.
> > > > > >
> > > > > > Regards, Krisztian
> > > > > >
> > > > > > * C++ builder times have dropped from ~6-7 minutes to ~3-4
> > > > > >   minutes
> > > > > > * Python builder times have dropped from ~7-8 minutes to ~3-5
> > > > > >   minutes
> > > > > > * ARM C++ builder times have dropped from ~19-20 minutes to
> > > > > >   ~9-12 minutes
> > > > > >
> > > > > > [1]:
> > > > > > https://github.com/ursa-labs/ursabot/tree/a46c6aa7b714346b3e4bb7921decb4d4d2f5ed70/projects/arrow