Hi all,

For the TL;DR, please scroll all the way down.

A couple of years ago, Tristan and Sander took a stab at buildstream
and bazel integration [1]. While the end result was a proof of concept
with RECC rather than bazel, I think it should be possible to do the
same with bazel if one figures out the correct incantation.

I'm currently trying to integrate it in buildstream proper, and would
like some input on how to do it. There have been some discussions on
the issue tracker and I wanted to make sure everybody is aware, and
propose something we can discuss.

## What is this about?

We'd like to open a hole into the build sandbox for access to CAS (and
potentially other related APIs), so that a compatible build system
running inside the sandbox can take advantage of it for more granular
caching (and potentially remote execution in the future).

With RECC [2] as my main use case, I'm considering the current PoC [3]
and the security considerations mentioned at [4]. Please take a look,
and give your feedback. I'll try to start implementing it in the next
few days/weeks.

## Proposal

As with the initial PoC, we expose a UNIX socket inside the sandbox.
We don't want to expose it for all sandboxes by default, so there
should be a sandbox configuration option to enable it. I think the
configuration option in [4] is a good start (though we can bikeshed on
the name of the option).

The next question is what the socket should allow.

### ContentAddressableStorage (CAS)

Obviously, this is the point of allowing this in the first place. It
is also safe as it only allows uploading and downloading files with a
known "digest" (hash and file size).
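
To make the "known digest" point concrete, here is a minimal sketch. It assumes SHA-256 as the digest function, which is the common default in the remote execution API but not the only possible one. The names `Digest`, `digest_of` and `verify` are made up for illustration and are not buildstream or buildbox code.

```python
import hashlib
from typing import NamedTuple


class Digest(NamedTuple):
    hash: str        # hex-encoded SHA-256 of the blob (assumed digest function)
    size_bytes: int  # length of the blob in bytes


def digest_of(blob: bytes) -> Digest:
    """Compute the content address of a blob."""
    return Digest(hashlib.sha256(blob).hexdigest(), len(blob))


def verify(blob: bytes, claimed: Digest) -> bool:
    """Return True only if the blob really matches the digest it is stored under."""
    return digest_of(blob) == claimed


d = digest_of(b"hello\n")
```

Because a blob can only be stored under the digest of its own content, a process in the sandbox cannot replace an existing blob with different data under the same address.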

### ActionCache

Action cache allows associating an arbitrary set of files (plus some
metadata) with some arbitrary key. This is a powerful tool but one
that can be misused by software running in the sandbox to hurt
repeatability. For instance, some software might cache some output
files but (inadvertently) base the key on a subset of the input files
used.
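
The failure mode described above can be sketched with a toy cache. Everything here is hypothetical: the dict stands in for the ActionCache, and the file names and the "build" step are invented for the example.

```python
import hashlib

action_cache: dict[str, bytes] = {}  # toy stand-in for the ActionCache


def key_for(inputs: dict[str, bytes], names: list[str]) -> str:
    """Hash only the named inputs into a cache key."""
    h = hashlib.sha256()
    for name in sorted(names):
        h.update(name.encode() + b"\0" + inputs[name] + b"\0")
    return h.hexdigest()


def cached_build(inputs: dict[str, bytes], keyed_on: list[str]) -> bytes:
    key = key_for(inputs, keyed_on)
    if key not in action_cache:
        # The "build" output depends on BOTH files.
        action_cache[key] = inputs["main.c"] + inputs["config.h"]
    return action_cache[key]


# The key (inadvertently) ignores config.h, so the second call gets a stale hit.
first = cached_build({"main.c": b"A", "config.h": b"1"}, keyed_on=["main.c"])
stale = cached_build({"main.c": b"A", "config.h": b"2"}, keyed_on=["main.c"])

# Keying on every input that influences the output avoids the problem.
fresh = cached_build({"main.c": b"A", "config.h": b"2"},
                     keyed_on=["main.c", "config.h"])
```

Here `stale` still contains the output built from the old `config.h`, even though the input changed.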

We could try to intercept the requests and add a "salt" value based on
the cache key of the element, but that would reduce the usefulness of
the feature quite a lot.
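
As a rough illustration of why salting costs so much, here is a sketch of one way a salted key could be derived. The `salted_key` function and the key strings are hypothetical; how interception would actually be implemented is not settled here.

```python
import hashlib


def salted_key(element_cache_key: str, action_digest: str) -> str:
    """Mix the element's cache key into the key requested by the build tool."""
    data = f"{element_cache_key}:{action_digest}".encode()
    return hashlib.sha256(data).hexdigest()


# The very same action (for example compiling the same file with the same
# flags) requested from two different elements, or from one element whose
# cache key changed.
action = hashlib.sha256(b"cc -c foo.c").hexdigest()

key_in_a = salted_key("element-a-cache-key", action)
key_in_b = salted_key("element-b-cache-key", action)
```

The two keys differ, so results are never shared across elements, and every change to an element's cache key invalidates all of its fine-grained entries at once.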

I think the best course of action is to allow unlimited access to the
ActionCache, and document this caveat.

### Execution

Remote execution is another service that is part of this set of APIs.
The way buildstream currently works with remote execution is that it
sends the whole build to be executed remotely. In that case, we don't
have control over what runner it is going to run in and can't expose a
socket like this unless we manage to convince the remote-apis folks to
add an option to do this. When running locally, recent versions of
buildbox-casd implement a remote execution server that runs things
locally, but this isn't enabled by default (and buildstream doesn't
enable it either). This leaves us with no way to run things using
remote execution.

For this reason, I'd say let's not include remote execution in this
proposal. We might want to keep it in mind while designing this, and
make a separate proposal for remote execution.

### Capabilities

This is a separate service, but it only exists to report which
functionality the server supports for the services above. So we should
include it.

### Remote Asset

Remote Asset was not part of the initial remote execution APIs, but is
a more recent addition to the remote APIs that we are using and is
supported by buildbox.

The remote asset API offers two uses: as a cache and as a downloader
(that can optionally cache its results). The former isn't very useful
to allow because we already have the action cache. The latter isn't
implemented by buildbox-casd and potentially allows downloading things
from the internet.

While the second mode might be useful in the future for building bazel
projects if we know how to restrict it properly (e.g. only allow
things that are declared in a special source in the element, have
buildstream download and cache them in advance, and not allow access
to the internet), I don't think we should have it in this first pass.

### Local CAS

Local CAS [5] isn't part of the remote APIs, but is an extension that
is supported by buildbox-casd, and it might be worth giving it some
consideration.

Local CAS calls fall into four categories:
* Proxy CAS operations: these are mainly for when buildbox-casd acts
as a proxy to another CAS. They can be used to optimise download and
upload operations. They can also be used without a proxy setup to check
more efficiently that things are available in the CAS.
* Stage and capture operations: Can be used to efficiently store
things in the CAS and stage them to use. While this could be useful to
have, it's also a bit dangerous and we need to restrict it to the
sandbox where the build is running.
* GetInstanceNameFor*: allow setting up buildbox-casd as a proxy to a
remote server, as well as setting it up for use in a sandbox. Both of
these are very dangerous.
* GetDiskUsage: isn't something a build should worry about.

All in all, the Local CAS API is nonstandard and potentially
dangerous. We should not allow things in the sandbox to access it.

## Conclusion (TL;DR)

We add a new sandbox option that mounts a UNIX socket into the
sandbox, giving processes in the sandbox access to (a subset of) the
remote APIs.

Access should be restricted to ContentAddressableStorage and
ActionCache for now, as well as the Capabilities service (which can be
used to query the capabilities of the CAS and AC).

Execution should not be supported for now, but can be enabled later
with an additional option.

For restricting access, we can probably reuse the casserver module we
already have [6] and add support for the ActionCache.
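
The resulting policy can be sketched as an allowlist over fully qualified gRPC service names. This is only an illustration of the policy, not a description of how the casserver module works. The service names are the ones from the published remote execution protos; including `google.bytestream.ByteStream` is an assumption, on the grounds that the CAS protocol uses it to transfer large blobs. `ALLOWED_SERVICES` and `is_allowed` are hypothetical names.

```python
ALLOWED_SERVICES = frozenset({
    "build.bazel.remote.execution.v2.ContentAddressableStorage",
    "build.bazel.remote.execution.v2.ActionCache",
    "build.bazel.remote.execution.v2.Capabilities",
    # Assumption: needed alongside CAS for streaming large blobs.
    "google.bytestream.ByteStream",
})


def is_allowed(full_method: str) -> bool:
    """Check a gRPC method path of the form "/<package>.<Service>/<Method>"."""
    parts = full_method.split("/")
    if len(parts) != 3 or parts[0] != "":
        return False  # malformed path: reject
    return parts[1] in ALLOWED_SERVICES
```

With this policy, CAS, ActionCache and Capabilities calls pass, while Execution, Remote Asset and Local CAS calls are rejected because their services are not listed.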



[1] https://blogs.gnome.org/tvb/2022/10/14/buildstream-at-apachecon-2022-new-orleans/
[2] https://buildgrid.gitlab.io/buildbox/buildbox/recc.html
[3] https://github.com/apache/buildstream/pull/1772
[4] https://github.com/apache/buildstream/pull/1945#issuecomment-2275110097
[5] https://gitlab.com/BuildGrid/buildbox/buildbox/-/blob/master/protos/build/buildgrid/local_cas.proto
[6] https://github.com/apache/buildstream/blob/master/src/buildstream/_cas/casserver.py
