Hello Krisztián, 

I like this proposal. CI coverage and response time are crucial for the 
health of the project. In general, I like the consolidation and local 
reproducibility of the builds. I have some questions to make sure I 
understand your proposal correctly (hopefully they can all be answered with a 
simple yes):

* Windows builds will stay in Appveyor for now?
* macOS builds will stay in Travis?
* All other builds will be removed from Travis?
* Machines are currently run and funded by UrsaLabs but others could also 
sponsor an instance that could be added to the setup?
* The build configuration is automatically updated on a merge to master?

And then a not so simple one: What will happen to our current docker-compose 
setup? From the PR it seems like we do similar things with ursabot but not 
using the central docker-compose.yml?


Cheers
Uwe

> On 29.08.2019 at 14:19, Krisztián Szűcs <[email protected]> wrote:
> 
> Hi,
> 
> Arrow's current continuous integration setup utilizes multiple CI providers,
> tools, and scripts:
> 
> - Unit tests are running on Travis and Appveyor
> - Binary packaging builds are running on crossbow, an abstraction over
>   multiple CI providers driven through a GitHub repository
> - For local tests and tasks, there is a docker-compose setup, or of course
>   you can maintain your own environment
> 
> This setup has run into some limitations:
> - It’s slow: the CI parallelism of Travis has degraded over the last couple
>   of months. Testing a PR takes more than an hour, which is a long time for
>   both the maintainers and the contributors, and it has a negative effect on
>   the development throughput.
> - Build configurations are not portable, they are tied to specific services.
>   You can’t just take a Travis script and run it somewhere else.
> - Because they’re not portable, build configurations are duplicated in
>   several places.
> - The Travis, Appveyor and crossbow builds are not reproducible locally, so
>   developing them requires the slow git push cycles.
> - Public CI has limited platform support: for example, ARM machines are not
>   available.
> - Public CI also has limited hardware support: no GPUs are available.
> Resolving all of the issues above is complicated, but is a must for the
> long-term sustainability of Arrow.
> 
> For some time, we’ve been working on a tool called Ursabot[1], a library on
> top of the CI framework Buildbot[2]. Buildbot is well maintained and widely
> used for complex projects, including CPython, Webkit, LLVM, MariaDB, etc.
> Buildbot is not another hosted CI service like Travis or Appveyor: it is an
> extensible framework to implement various automations like continuous
> integration tasks.
> 
> You’ve probably noticed additional “Ursabot” builds appearing on pull
> requests, in addition to the Travis and Appveyor builds. We’ve been testing
> the framework with a fully featured CI server at ci.ursalabs.org. This
> service runs build configurations we can’t run on Travis, does it faster
> than Travis, and has the GitHub comment bot integration for ad hoc build
> triggering.
> 
> While we’re not prepared to propose moving all CI to a self-hosted setup,
> our work has demonstrated the potential of using buildbot to resolve Arrow’s
> continuous integration challenges:
> - The docker-based builders are reusing the docker images, which eliminates
>   slow dependency installation steps. Some builds on this setup, run on
>   Ursa Labs’s infrastructure, run 20 minutes faster than the comparable
>   Travis-CI jobs.
> - It’s scalable. We can deploy buildbot wherever and add more masters and
>   workers, which we can’t do with public CI.
> - It’s platform and CI-provider independent. Builds can be run on arbitrary
>   architectures, operating systems, and hardware: Python is the only
>   requirement. Additionally, builds specified in buildbot/ursabot can be run
>   anywhere: not only on custom buildbot infrastructure but also on Travis,
>   or even on your own machine.
> - It improves reproducibility and encourages consolidation of
>   configuration. You can run the exact job locally that ran on Travis, and
>   you can even get an interactive shell in the build so you can debug a test
>   failure. And because you can run the same job anywhere, we wouldn’t need
>   to have duplicated, Travis-specific or the docker-compose build
>   configuration stored separately.
> - It’s extensible. More exotic features like a comment bot, benchmark
>   database, benchmark dashboard, artifact store, integrating other systems
>   are easily implementable within the same system.
> 
> I’m proposing to donate the build configuration we’ve been iterating on in
> Ursabot to the Arrow codebase. Here [3] is a patch that adds the
> configuration. This will enable us to explore consolidating build
> configuration using the buildbot framework. A next step after that would be
> to port a Travis build to Ursabot, and in the Travis configuration, execute
> the build by the shell command `$ ursabot project build <builder-name>`.
> This is the same way we would be able to execute the build
> locally--something we can’t currently do with the Travis builds.
> 
> I am not proposing here that we stop using Travis-CI and Appveyor to run CI
> for apache/arrow, though that may well be a direction we choose to go in the
> future. Moving build configuration into something like buildbot would be a
> necessary first step to do that; that said, there are other immediate
> benefits to be had by porting build configuration into buildbot: local
> reproducibility, consolidation of build logic, independence from a
> particular CI provider, and ease of using and maintaining faster,
> Docker-based jobs. Self-hosting CI brings a number of other challenges,
> which we will concurrently continue to explore, but we believe that there
> are benefits to adopting buildbot build configuration regardless.
> 
> Regards, Krisztian
> 
> [1]: https://github.com/ursa-labs/ursabot
> [2]: https://buildbot.net
>     https://docs.buildbot.net
>     https://github.com/buildbot/buildbot
> [3]: https://github.com/apache/arrow/pull/5210
