Hi all,
This is a follow-up to, and a more elaborate extension of, the
proposal[0] to add SourceMirror plugins raised some months ago.
The original proposal[0] is now implemented to my satisfaction and
posted at [1]. However, I'm holding off on merging it pending agreement
on, and implementation of, some other details which are required to
satisfy the originating motivation, which I'll elaborate on here.
In brief, this extension adds to the SourceMirror proposal the ability
for SourceMirrors to provide additional context that Source plugins can
use to perform custom authentication.
# Problem statement
Here I elaborate on the problem statement from the original
proposal[2], which I subsequently ditched.
This is complex and risks being long winded, so I will try my best to
organize this and be as brief as possible.
## Source alias expansion is inflexible
As outlined in the original problem statement[2], I cannot easily
transform the following URL:
https://ftp.gnu.org/gnu/coreutils/coreutils-9.1.tar.xz
Into a URL like this:
https://gitlab.flying-ponies.com/api/v4/projects/1400/repository/files/gnu%2Fcoreutils%2Fcoreutils-9.1.tar.xz/raw?ref=master&lfs=true
Using the current source alias expansion, since it only allows for
substitution of the alias itself (i.e. you may have a `gnu` alias which
expands to `https://ftp.gnu.org/gnu/`, and you are expected to mirror
that tarball _in the same way_ on your own infrastructure).
This part is solved by the current SourceMirror implementation[1],
which allows you to do URL conversions by implementing a Python method.
Great.
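As a rough illustration of the kind of conversion involved, here is a
sketch in plain Python. This is not the SourceMirror API from [1]: the
function name is hypothetical, and the host, project ID and ref are
simply lifted from the example URL above.

```python
from urllib.parse import quote, urlparse

# Hypothetical values, taken from the example URL above
GITLAB_API = "https://gitlab.flying-ponies.com/api/v4"
PROJECT_ID = 1400


def to_gitlab_raw_url(upstream_url, ref="master"):
    # "/gnu/coreutils/coreutils-9.1.tar.xz" -> "gnu/coreutils/coreutils-9.1.tar.xz"
    file_path = urlparse(upstream_url).path.lstrip("/")
    # The whole file path is a single URL component here, so "/" must
    # be percent-encoded as well, hence safe=""
    encoded = quote(file_path, safe="")
    return (
        f"{GITLAB_API}/projects/{PROJECT_ID}/repository/files/"
        f"{encoded}/raw?ref={ref}&lfs=true"
    )
```

No alias prefix substitution can express this, since the path itself
has to be re-encoded and a query string appended.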
## Custom authentication methods
One trend that we've been seeing, for instance in the freedesktop-sdk
project[3], is to simply mirror tarballs that are required by the
BuildStream project directly in gitlab.
Any opinions about whether this is a good idea or not aside, this has
been working well for publicly visible projects on gitlab, but is now
falling on its face for some more secure projects hosted on private
company gitlab instances, which require authentication to work.
The gitlab authentication methods[4] are various, and include, for
instance, the insertion of an OAuth bearer token header.
This kind of free form data, which might provide Source plugins the
context needed to do such things, cannot be declared even with the
advent of the SourceMirror plugin.
Boo.
# Proposal
While the example of gitlab tokens in OAuth headers is a motivating
factor to all of this (and we can see Abderrahim "spilling the beans"
about this motivation in his previous reply[5]) ... for the feature to
be suitable in BuildStream core APIs we want to have something flexible
and abstract, such that it would cover most use cases one might throw
at us in the future without needing to expand on core APIs.
Here is my try at this.
## Extend Source.translate_url()
The Source plugin facing Source.translate_url() method[6] gains an
extra, optional keyword argument which, if provided, is used to pass
back extra information about the URL aside from the translated string.
Calling `Source.translate_url()` could then look like this:
```
url_extra = {}
self.url = self.translate_url(self.original_url, extra=url_extra)
```
Here, a Source, like a DownloadableFileSource for instance, can then
extract some data:
```
self.auth_header_template = url_extra["auth-header-template"]
```
In the gitlab token example, we might have a string like:
"Authorization: Bearer <token>"
## Extend SourceMirror.translate_url()
Similarly, we can extend the yet-to-be-landed
SourceMirror.translate_url() API (which backs Source.translate_url())
to include this same additional keyword argument.
This allows the SourceMirror object to provide the context which is
needed by a supporting plugin to perform some authentication.
In my estimation, this approach puts the data in the right places, so
that a parenting project has the ability to provide context to Source
plugins in subprojects.
Further, the approach of using a keyword argument to
SourceMirror.translate_url() allows the SourceMirror to raise an error
in the case that the backing Source plugin does not support the
expected data.
E.g.:
```
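def translate_url(self, ..., *, extra_data=None):
    # No dictionary was passed in, so the calling Source plugin
    # cannot consume the extra data this mirror needs to provide
    if extra_data is None:
        raise SourceMirrorError("Bad source plugin !")
    extra_data["auth-header-template"] = \
        "Authorization: Bearer <token>"
    return frobnicate_url(url)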
```
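To make the flow of data between the two methods concrete, here is a
self-contained toy sketch. `ToyMirror` and `ToySource` are hypothetical
stand-ins, not BuildStream classes, the mirror host is made up, and
`SourceMirrorError` is defined locally only so that the example runs.
The keyword names follow the snippets above.

```python
class SourceMirrorError(Exception):
    """Local stand-in for the error type used in the example above."""


class ToyMirror:
    # Stand-in for a SourceMirror plugin
    def translate_url(self, url, *, extra_data=None):
        if extra_data is None:
            raise SourceMirrorError("Bad source plugin !")
        extra_data["auth-header-template"] = "Authorization: Bearer <token>"
        # Made-up mirror location, for illustration only
        return url.replace("https://ftp.gnu.org/", "https://mirror.example.com/")


class ToySource:
    # Stand-in for a Source plugin
    def __init__(self, mirror):
        self.mirror = mirror

    def translate_url(self, url, *, extra=None):
        # Forward the caller's dictionary so the mirror can fill it in
        return self.mirror.translate_url(url, extra_data=extra)


source = ToySource(ToyMirror())
url_extra = {}
url = source.translate_url("https://ftp.gnu.org/gnu/foo.tar.xz", extra=url_extra)
```

After the call, `url_extra` holds the template provided by the mirror,
while a Source which passes no dictionary at all triggers the error.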
## DownloadableFileSource supports the "auth-header-template"
So far, I haven't got down to how the token itself should be delivered
to the Source implementation.
I had some thoughts about custom variable substitutions and BuildStream
"secrets.yaml" files or suchlike, but Abderrahim proposed a more
convenient and straightforward approach, inspired by what Bazel is
doing[7].
So the proposal would be to leave complicated secrets data out of
BuildStream and to capitalize on the `~/.netrc` file.
The way this would work is:
* In Source.fetch() / Source.track() implementations, the Source has
already called `Source.translate_url()`.
So we already have the expanded mirror URL.
* We use urllib to then extract the domain from the URL.
* We use the domain from the URL to extract the corresponding
`password` value set in the `~/.netrc`.
* And in the case of DownloadableFileSource, we can now support
a specific "auth-header-template" and use the password to
substitute the "<token>" in the template to compose a header.
And we of course simply add that header to the download request.
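
The steps above could be sketched roughly as follows, using only the
Python standard library. The function name and its `netrc_file`
parameter are hypothetical, and nothing here is existing
DownloadableFileSource code.

```python
import netrc
from urllib.parse import urlparse


def compose_auth_header(url, template, netrc_file=None):
    # Extract the domain from the (already translated) mirror URL
    host = urlparse(url).hostname
    # netrc_file=None makes the netrc module read ~/.netrc, which
    # raises FileNotFoundError if that file does not exist
    auth = netrc.netrc(netrc_file).authenticators(host)
    if auth is None:
        return None  # no entry for this host
    _login, _account, password = auth
    # Substitute the token, then split "Name: value" into its parts
    header = template.replace("<token>", password)
    name, _, value = header.partition(":")
    return name.strip(), value.strip()
```

The resulting pair can then be handed to something like
`urllib.request.Request.add_header(name, value)` before downloading.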
# Conclusion
It's a big hairy problem that's been looming over me for quite some
time, and I'd really like to put it to rest.
I invite you all to help me fill in the blanks and point out serious
flaws in the plan.
Best Regards,
-Tristan
[0]: https://lists.apache.org/thread/oxp2tmvd66wdwo3hzbgpc8x4hvrfyl5w
[1]: https://github.com/apache/buildstream/pull/1890
[2]: https://lists.apache.org/thread/yochpvdpg28bcml85yj72pf0h58vfy2o
[3]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/
[4]: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/
[5]: https://lists.apache.org/thread/6rzvd3fd162p8653wyhwdkcy4k5h7k0h
[6]: https://docs.buildstream.build/master/buildstream.source.html#buildstream.source.Source.translate_url
[7]: https://bazel.build/rules/lib/repo/http#http_archive-auth_patterns