Hi all,
This is a proposal for the enhancement of source alias value
resolution, including the default values and the alternative values,
such that we can more adequately handle any alternative URI for a given
source URI.
I've got two concepts to present in order to address this, with
different strengths and weaknesses.
I would appreciate it if people could poke holes in this proposal, and
let me know which approach you think would be better.
At this point I believe my preference is the variable expansion
approach detailed below, but I'm slightly on the fence and could be
swayed.
Any feedback would be greatly appreciated.
Cheers,
-Tristan
Problem statement
=================
Source alias and accompanying mirror substitutions are too rigid. Since
the inception of source aliases, they have only ever been useful for
substituting a leading portion of the URI.
This means that if I have a tarball such as:
  https://ftp.gnu.org/gnu/coreutils/coreutils-9.1.tar.xz
Then I cannot easily substitute this URI with:
  https://pink-zebra.com/c/coreutils-9.1.tar.xz
In this case, depending on my mirroring solution, I may have added the
tarballs to a different path. A poor man's solution to this would be to
declare an alias specifically for `coreutils` in project.conf and
expand it differently for each mirror, which is workable but quite
inconvenient.
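To make the rigidity concrete, here is a minimal sketch of alias resolution modelled as pure prefix substitution. This is a simplified model for illustration, not BuildStream's actual code; the function name `resolve_alias` and the prefix values are assumptions.

```python
def resolve_alias(url, aliases):
    """Replace a leading 'alias:' with the alias value, if the alias is known."""
    alias, sep, rest = url.partition(":")
    if sep and alias in aliases:
        return aliases[alias] + rest
    # Not an aliased URL (or unknown alias): leave it untouched.
    return url


# Hypothetical alias values for the origin and for one mirror.
origin = {"ftp_gnu_org": "https://ftp.gnu.org/gnu/"}
mirror = {"ftp_gnu_org": "https://pink-zebra.com/"}

url = "ftp_gnu_org:coreutils/coreutils-9.1.tar.xz"
print(resolve_alias(url, origin))
print(resolve_alias(url, mirror))
```

Whatever prefix the mirror chooses, the trailing `coreutils/coreutils-9.1.tar.xz` is carried over unchanged, so a layout like `c/coreutils-9.1.tar.xz` is out of reach without a dedicated alias.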
Worse still is the case where we want more complex substitutions. For
example, if we want to mirror our tarballs in GitLab using LFS, then
depending on the GitLab instance configuration, we may need to use a
URI that looks like this:
  https://gitlab.flying-ponies.com/api/v4/projects/1400/repository/files/gnu%2Fcoreutils%2Fcoreutils-9.1.tar.xz/raw?ref=master&lfs=true
The goal of this proposal is to have a flexible solution to more
adequately accommodate mirroring solutions.
To make the challenge interesting, we should consider the case where
there is absolutely no commonality between the origin URL and the
mirror. For instance, let's consider a mirror which behaves similarly
to a CAS, where the mirror URI for coreutils looks like:
  https://potatoes.org/blobs/8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981
Proposed solution(s)
====================
Variable expansion
------------------
In BuildStream 2, variable expansion is already supported in sources,
but it is not supported in alias value substitutions.
This solution would simply support variable expansion in alias values.
Example:

  # Alias declarations
  aliases:
    ftp_gnu_org: https://ftp.gnu.org/gnu/%{source_basename}/%{source_fullname}

  # Mirror declarations
  mirrors:
  - name: pink_zebra
    aliases:
      ftp_gnu_org:
      - https://pink-zebra.com/%{source_bucket}/%{source_fullname}
  - name: flying_ponies
    aliases:
      ftp_gnu_org:
      - https://gitlab.codethink.co.uk/api/v4/projects/%{source_project_id}/repository/files/gnu%2F%{source_basename}%2F%{source_fullname}/raw?ref=master&lfs=true
  - name: hashed_potatoes
    aliases:
      ftp_gnu_org:
      - https://potatoes.org/blobs/%{source_blob_id}

  # Element usage
  variables:
    source_project_id: 1400
    source_bucket: c
    source_blob_id: 8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981
    source_basename: coreutils
    source_fullname: coreutils-9.1.tar.xz

  sources:
  - kind: tar
    url: ftp_gnu_org
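As a rough illustration of what expanding variables in alias values could look like, here is a small sketch. It assumes the `%{name}` syntax from the example and treats an undefined variable as an error; the helper name `expand` and the regular expression are assumptions, not BuildStream's implementation.

```python
import re

# Matches %{name}; the set of characters allowed in a name is an assumption.
VARIABLE = re.compile(r"%\{([A-Za-z0-9_-]+)\}")


def expand(template, variables):
    """Substitute every %{name} in template with the element's variable value."""
    def lookup(match):
        name = match.group(1)
        if name not in variables:
            raise ValueError(f"undefined variable: {name}")
        return str(variables[name])

    return VARIABLE.sub(lookup, template)


# Element variables, taken from the example.
variables = {
    "source_bucket": "c",
    "source_basename": "coreutils",
    "source_fullname": "coreutils-9.1.tar.xz",
}

origin = "https://ftp.gnu.org/gnu/%{source_basename}/%{source_fullname}"
pink_zebra = "https://pink-zebra.com/%{source_bucket}/%{source_fullname}"

print(expand(origin, variables))
print(expand(pink_zebra, variables))
```

With these values, both the origin URI and the pink zebra URI from the problem statement come out of the same alias and the same element variables.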
Advantages
~~~~~~~~~~
o Consistent with BuildStream APIs: we simply extend the scope of
  variable expansion to also cover alias values.
o Allows usage of the same alias for many sources
Caveats
~~~~~~~
o Requires additional variables to handle URIs which require
different values.
For example, the pink zebra mirror categorizes mirrored tarballs into
buckets, spreading out mirrored tarballs into directories named
after the first letter of the tarball name, and the flying ponies
mirror requires knowledge of the gitlab project ID in order to
resolve the URI properly.
While this is not horrible, this approach adds some cognitive
complexity to project authors inasmuch as the knowledge required to
evaluate a URI is spread out across more locations (alias values,
source URI strings, variables).
o Variable name collisions
In the above example, it would be prudent to prefix the expected
variable names with the alias name "ftp_gnu_org", like
"ftp_gnu_org_basename", "ftp_gnu_org_fullname", etc.
Since variables are resolved at the element level, it is
conceivable that variables intended for alias value expansion may
conflict in the case that a single element uses multiple sources
and multiple aliases.
While it is rare, it is also possible for a single source to use
multiple URIs with different aliases too, e.g. git submodules.
o A bit tricky to implement. The implementation should probably
  include some load time validation to ensure that required variables
  are declared for all possible mirrors.
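The load time validation mentioned in the last caveat could be sketched roughly as follows: collect the variables referenced by every alias value (origin and mirrors alike) and report the ones an element does not declare. The function name `missing_variables` and the data layout are hypothetical, chosen only for illustration.

```python
import re

# Matches %{name}; the set of characters allowed in a name is an assumption.
VARIABLE = re.compile(r"%\{([A-Za-z0-9_-]+)\}")


def missing_variables(alias_values, variables):
    """Map each alias value to the sorted variables it references but that are undeclared."""
    problems = {}
    for template in alias_values:
        missing = sorted(set(VARIABLE.findall(template)) - set(variables))
        if missing:
            problems[template] = missing
    return problems


# All values the alias may take across mirrors (from the example).
alias_values = [
    "https://pink-zebra.com/%{source_bucket}/%{source_fullname}",
    "https://gitlab.codethink.co.uk/api/v4/projects/%{source_project_id}"
    "/repository/files/gnu%2F%{source_basename}%2F%{source_fullname}"
    "/raw?ref=master&lfs=true",
]

# An element that forgot to declare source_project_id.
variables = {
    "source_bucket": "c",
    "source_basename": "coreutils",
    "source_fullname": "coreutils-9.1.tar.xz",
}

for template, names in missing_variables(alias_values, variables).items():
    print(f"{template}: missing {', '.join(names)}")
```

Running the check against every configured mirror at load time would surface a missing variable early, rather than only when a fetch falls back to that particular mirror.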
Overall, I think this approach is powerful and only becomes slightly
difficult to work with and confusing in the edge cases.
Mirror overrides
----------------
The brute force option here would be to add some configuration to
sources, such that a source could explicitly override the URI for a
given mirror name.
The rationale here would be to cover the cases where regular alias
substitution is insufficient.
Example:

  sources:
  - kind: tar
    url: ftp_gnu_org:coreutils/coreutils-9.1.tar.xz
    mirror-uris:
      pink_zebra:
        ftp_gnu_org: https://pink-zebra.com/c/coreutils-9.1.tar.xz
      flying_ponies:
        ftp_gnu_org: https://gitlab.codethink.co.uk/api/v4/projects/1400/repository/files/gnu%2Fcoreutils%2Fcoreutils-9.1.tar.xz/raw?ref=master&lfs=true
      hashed_potatoes:
        ftp_gnu_org: https://potatoes.org/blobs/8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981
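A minimal sketch of how resolution might work under this approach: an explicit per-source override wins, and otherwise we fall back to regular prefix substitution for that mirror. The `mirror-uris` key comes from the example; the function name, the simplified one-prefix-per-alias mirror table and the `plain_mirror` entry are assumptions for illustration.

```python
def resolve_for_mirror(source, mirror_name, mirror_aliases):
    """Return the URI to try for the given mirror, or None if it cannot serve this source."""
    alias, _, rest = source["url"].partition(":")

    # 1. An explicit override on the source takes precedence.
    override = source.get("mirror-uris", {}).get(mirror_name, {}).get(alias)
    if override is not None:
        return override

    # 2. Otherwise fall back to plain prefix substitution for this mirror.
    prefixes = mirror_aliases.get(mirror_name, {})
    if alias in prefixes:
        return prefixes[alias] + rest
    return None


source = {
    "kind": "tar",
    "url": "ftp_gnu_org:coreutils/coreutils-9.1.tar.xz",
    "mirror-uris": {
        "pink_zebra": {
            "ftp_gnu_org": "https://pink-zebra.com/c/coreutils-9.1.tar.xz",
        },
    },
}

# Hypothetical mirror that works fine with plain prefix substitution.
mirror_aliases = {
    "plain_mirror": {"ftp_gnu_org": "https://mirror.example.com/gnu/"},
}

print(resolve_for_mirror(source, "pink_zebra", mirror_aliases))
print(resolve_for_mirror(source, "plain_mirror", mirror_aliases))
```

Mirrors that fit the prefix model need no per-source configuration here; only the awkward ones require an entry under `mirror-uris`.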
Advantages
~~~~~~~~~~
o Highly human readable. By keeping the full URI in one place we
reduce the cognitive complexity required to achieve our goals.
o Very simple to implement
Caveats
~~~~~~~
o Highly repetitive and redundant.
In the case that a project needs to support weird URIs, like the
flying ponies or hashed potatoes showcased in the examples, it is
likely that the project uses this mirror for a large number of
source URIs.
While this approach is more readable, it would require a lot more
redundant information spread across many elements.