Hi all,

For the TL;DR, please scroll all the way down.
A couple of years ago, Tristan and Sander took a stab at buildstream and bazel integration [1]. The end result was a proof of concept with RECC rather than bazel, but I think it should be possible to do the same with bazel if one figures out the correct incantation. I'm currently trying to integrate it into buildstream proper, and would like some input on how to do it. There have been some discussions on the issue tracker, and I wanted to make sure everybody is aware of them and to propose something we can discuss.

## What is this about?

We'd like to open a hole into the build sandbox for access to CAS (and potentially other related APIs), so that a compatible build system running inside the sandbox can take advantage of it for more granular caching (and potentially remote execution in the future). With RECC [2] as my main use case, I'm considering the current PoC [3] and the security considerations mentioned at [4].

Please take a look and give your feedback. I'll try to start implementing it in the next few days/weeks.

## Proposal

As with the initial PoC, we expose a UNIX socket inside the sandbox. We don't want to expose it for all sandboxes by default, so there should be a sandbox configuration option to enable it. I think the configuration option in [4] is a good start (though we can bikeshed on the name of the option).

The next question is: what should the socket allow?

### ContentAddressableStorage (CAS)

Obviously, this is the point of allowing this in the first place. It is also safe, as it only allows uploading and downloading files with a known "digest" (hash and file size).

### ActionCache

The action cache allows associating an arbitrary set of files (plus some metadata) with an arbitrary key. This is a powerful tool, but one that software running in the sandbox can misuse to hurt repeatability. For instance, some software might cache some output files but (inadvertently) base the key on only a subset of the input files used.
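Here is a minimal Python sketch of that repeatability problem. It is not buildstream or RECC code. `ToyActionCache` and `cached_step` are hypothetical stand-ins, the "build step" just concatenates its inputs, and the key format is invented for illustration. CAS-style digests are modelled as (SHA-256 hex hash, size in bytes) pairs, which is the shape of a remote APIs digest when SHA-256 is the digest function.

```python
from __future__ import annotations

import hashlib


def digest(data: bytes) -> tuple[str, int]:
    """CAS-style digest: (SHA-256 hex hash, size in bytes)."""
    return hashlib.sha256(data).hexdigest(), len(data)


class ToyActionCache:
    """Hypothetical in-memory stand-in for the ActionCache service:
    it maps an arbitrary, client-chosen key to a stored result."""

    def __init__(self) -> None:
        self._results: dict[tuple[str, int], bytes] = {}

    def get(self, key: tuple[str, int]) -> bytes | None:
        return self._results.get(key)

    def update(self, key: tuple[str, int], result: bytes) -> None:
        self._results[key] = result


def cached_step(cache: ToyActionCache, inputs: dict[str, bytes], keyed: list[str]) -> bytes:
    """Hypothetical build step whose output depends on *all* inputs,
    but whose cache key only covers the files named in `keyed`."""
    key_material = "\n".join(
        f"{name}:{digest(inputs[name])[0]}" for name in sorted(keyed)
    ).encode()
    key = digest(key_material)
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: the real work is skipped
    output = b"|".join(inputs[name] for name in sorted(inputs))
    cache.update(key, output)
    return output


# Demo: the key ignores config.h, so changing it goes unnoticed.
cache = ToyActionCache()
inputs = {"main.c": b"int main(void) { return VERSION; }",
          "config.h": b"#define VERSION 1"}
first = cached_step(cache, inputs, keyed=["main.c"])
inputs["config.h"] = b"#define VERSION 2"  # not part of the key
second = cached_step(cache, inputs, keyed=["main.c"])
assert second == first  # stale hit: still built from VERSION 1
third = cached_step(cache, inputs, keyed=["config.h", "main.c"])
assert b"VERSION 2" in third  # a key covering every input misses and rebuilds
```

The CAS is not exposed to this problem, because a blob is addressed by the digest of its own content. An ActionCache key, by contrast, is whatever the client chooses, so correctness depends entirely on the client hashing every input that matters.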
We could try to intercept the requests and add a "salt" value based on the cache key of the element. However, that would reduce the usefulness of the feature quite a lot, since entries could then no longer be shared between elements, or between successive versions of the same element. I think the best course of action is to allow unlimited access to the ActionCache and document this caveat.

### Execution

Remote execution is another service that is part of this set of APIs. Currently, buildstream uses remote execution by sending the whole build to be executed remotely. In that case, we have no control over which runner it will run in, and we can't expose a socket like this unless we manage to convince the remote-apis folks to add an option for it.

When running locally, recent versions of buildbox-casd implement a remote execution server that runs things locally, but this isn't enabled by default (and buildstream doesn't enable it either). So in neither case do we currently have a way to serve remote execution requests from inside the sandbox.

For this reason, I'd say let's not include remote execution in this proposal. We might want to keep it in mind while designing this, and make a separate proposal for remote execution.

### Capabilities

This is a separate service, but its only purpose is to let clients query which features the server supports for the services above. So we should include it.

### Remote Asset

Remote Asset was not part of the initial remote execution APIs. It is a more recent addition to the remote APIs that we are using, and it is supported by buildbox. The remote asset API has two uses: as a cache, and as a downloader (that can optionally cache its results). The former isn't very useful to allow, because we already have the action cache. The latter isn't implemented by buildbox-casd and potentially allows downloading things from the internet.

While the second mode might be useful in the future for building bazel projects if we know how to restrict it properly (e.g.
only allow things that are declared in a special source in the element, have buildstream download and cache them in advance, and not allow access to the internet), I don't think we should have it in this first pass.

### Local CAS

Local CAS [5] isn't part of the remote APIs, but is an extension that is supported by buildbox-casd, and it might be worth giving it some consideration. Local CAS calls fall into four categories:

* Proxy CAS operations: these are mainly for when buildbox-casd acts as a proxy to another CAS. They can be used to optimise download and upload operations. They can also be used without a proxy setup to check more efficiently that things are available in the CAS.
* Stage and capture operations: these can be used to efficiently store things in the CAS and stage them for use. While this could be useful to have, it's also a bit dangerous, and we would need to restrict it to the sandbox where the build is running.
* GetInstanceNameFor*: these allow setting up buildbox-casd as a proxy to a remote server, as well as setting it up for use in a sandbox. Both of these are very dangerous.
* GetDiskUsage: this isn't something a build should worry about.

All in all, the Local CAS API is nonstandard and potentially dangerous. We should not allow things in the sandbox to access it.

## Conclusion (TL;DR)

We add a new sandbox option that mounts a UNIX socket into the sandbox, so that processes in the sandbox can access (a subset of) the remote APIs. Access should be restricted to ContentAddressableStorage and ActionCache for now, as well as the Capabilities service (which can be used to query the capabilities of the CAS and AC). Execution should not be supported for now, but can be enabled later with an additional option.

For restricting access, we can probably reuse the casserver module we already have [6] and add support for the ActionCache.
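Here is a minimal allowlist sketch in Python that illustrates this restriction. It only decides whether a full gRPC method name (of the form `/<package>.<Service>/<Method>`) should be served. It is not taken from casserver.py, and `ALLOWED_SERVICES`, `is_allowed` and the `allow_execution` flag are hypothetical names. The service names are the REAPI v2 ones. Treating `google.bytestream.ByteStream` as part of CAS access is an assumption here, based on ByteStream being how the remote APIs transfer blobs that are too large for the batch calls.

```python
REAPI = "build.bazel.remote.execution.v2"

# Services a sandboxed client may reach.
ALLOWED_SERVICES = frozenset({
    f"{REAPI}.ContentAddressableStorage",
    f"{REAPI}.ActionCache",
    f"{REAPI}.Capabilities",
    # Assumption: large CAS blobs go over ByteStream, so it counts as CAS access.
    "google.bytestream.ByteStream",
})

# Hypothetical opt-in for a later proposal; denied by default.
EXECUTION_SERVICE = f"{REAPI}.Execution"


def service_of(method: str) -> str:
    """Return the service part of a full gRPC method name such as
    "/build.bazel.remote.execution.v2.ActionCache/GetActionResult"."""
    parts = method.split("/")
    if len(parts) != 3 or parts[0] or not parts[1] or not parts[2]:
        raise ValueError(f"not a full gRPC method name: {method!r}")
    return parts[1]


def is_allowed(method: str, *, allow_execution: bool = False) -> bool:
    """Allowlist check: anything not explicitly listed (Remote Asset,
    Local CAS, Execution unless opted in, ...) is rejected."""
    try:
        service = service_of(method)
    except ValueError:
        return False
    if service == EXECUTION_SERVICE:
        return allow_execution
    return service in ALLOWED_SERVICES
```

Using an allowlist rather than a denylist means that new or nonstandard services, such as Local CAS, stay blocked unless someone deliberately exposes them. With grpcio, a check like this could sit in a `grpc.ServerInterceptor`, whose `intercept_service(continuation, handler_call_details)` hook sees the method name as `handler_call_details.method`. How this would fit into the existing casserver module is left open here.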
[1] https://blogs.gnome.org/tvb/2022/10/14/buildstream-at-apachecon-2022-new-orleans/
[2] https://buildgrid.gitlab.io/buildbox/buildbox/recc.html
[3] https://github.com/apache/buildstream/pull/1772
[4] https://github.com/apache/buildstream/pull/1945#issuecomment-2275110097
[5] https://gitlab.com/BuildGrid/buildbox/buildbox/-/blob/master/protos/build/buildgrid/local_cas.proto
[6] https://github.com/apache/buildstream/blob/master/src/buildstream/_cas/casserver.py
