Hi again,
On Tue, 2023-09-19 at 15:04 +0900, Tristan van Berkom wrote:
> Hi all,
>
> This is a proposal for the enhancement of source alias value
> resolution, including the default values and the alternative values,
> such that we can more adequately handle any alternative URI for a given
> source URI.
[...]
> Any feedback would be greatly appreciated.
So, while discussing briefly on IRC with Jürg, we already discovered
some holes in this concept.
Specifically, both of my approaches require element-level input in
order to provide context when rendering a URI. IMO that is fine if you
are just working on your own project, but it causes issues when:
* You are working with a third party project for which you want to
mirror all of their sources reliably
* You are working with subprojects
This means that at the very least, one should be able to manipulate the
mirror URLs for elements which are not under your control.
The current implementation already allows this, albeit with limited
semantics, in the user configuration:
https://docs.buildstream.build/2.0/using_config.html#mirrors
I will try to come up with something better...
Cheers,
-Tristan
> Cheers,
> -Tristan
>
>
> Problem statement
> =================
> Source alias and accompanying mirror substitutions are too rigid.
> Since the inception of source aliases, they have only ever been
> useful for substituting a leading portion of the URI.
>
> This means that if I have a tarball such as:
>
> https://ftp.gnu.org/gnu/coreutils/coreutils-9.1.tar.xz
>
> Then I cannot easily substitute this URI with:
>
> https://pink-zebra.com/c/coreutils-9.1.tar.xz
>
> In this case, depending on my mirroring solution, I may have added
> the tarballs to a different path. A poor man's solution to this would
> be to have an alias specifically for `coreutils` declared in
> project.conf and expand it in different ways, which is workable but
> quite inconvenient.
>
> Worse still is the case where we want more complex substitutions.
> For example, if we want to mirror our tarballs in GitLab using LFS,
> then depending on the GitLab instance configuration, we may need to
> use a URI that looks like this:
>
>
> https://gitlab.flying-ponies.com/api/v4/projects/1400/repository/files/gnu%2Fcoreutils%2Fcoreutils-9.1.tar.xz/raw?ref=master&lfs=true
>
> The goal of this proposal is to have a flexible solution to more
> adequately accommodate mirroring solutions.
>
> To make the challenge interesting, we should consider the case that
> there is absolutely no commonality between the origin URL and the
> mirror. For instance, let's consider a mirror which behaves similarly
> to a CAS, where the mirror URI for coreutils looks like:
>
>
> https://potatoes.org/blobs/8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981
>
>
> Proposed solution(s)
> ====================
>
> Variable expansion
> ------------------
> In BuildStream 2 variable expansion is supported in sources already,
> but it is not supported in alias value substitutions.
>
> This solution would simply support variable expansion in alias
> values.
>
> Example:
>
>   # Alias declarations
>   aliases:
>     ftp_gnu_org: https://ftp.gnu.org/gnu/%{source_basename}/%{source_fullname}
>
>
>   # Mirror declarations
>   mirrors:
>   - name: pink_zebra
>     aliases:
>       ftp_gnu_org:
>       - https://pink-zebra.com/%{source_bucket}/%{source_fullname}
>   - name: flying_ponies
>     aliases:
>       ftp_gnu_org:
>       - https://gitlab.codethink.co.uk/api/v4/projects/%{source_project_id}/repository/files/gnu%2F%{source_basename}%2F%{source_fullname}/raw?ref=master&lfs=true
>   - name: hashed_potatoes
>     aliases:
>       ftp_gnu_org:
>       - https://potatoes.org/blobs/%{source_blob_id}
>
>   # Element usage
>   variables:
>     source_project_id: 1400
>     source_bucket: c
>     source_blob_id: 8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981
>     source_basename: coreutils
>     source_fullname: coreutils-9.1.tar.xz
>   sources:
>   - kind: tar
>     url: ftp_gnu_org
>
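
To make the expansion step concrete, here is a minimal Python sketch of
substituting %{name} references in an alias value with element-level
variables. This is only an illustration, not BuildStream code: the
function name `expand_alias_value` and the regular expression are
assumptions of mine, and the real variable resolver handles more than
this (nested variables, for one).

```python
import re

# Matches variable references of the form %{name}. The exact set of
# characters allowed in a name is an assumption made for this sketch.
_VARIABLE = re.compile(r"%\{([a-zA-Z][a-zA-Z0-9_-]*)\}")


def expand_alias_value(template: str, variables: dict) -> str:
    """Replace every %{name} in an alias value with the element's variable."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable in alias value: {name}")
        return str(variables[name])

    return _VARIABLE.sub(substitute, template)


# Element-level variables, taken from the example in the proposal.
variables = {
    "source_project_id": 1400,
    "source_bucket": "c",
    "source_basename": "coreutils",
    "source_fullname": "coreutils-9.1.tar.xz",
}

origin = expand_alias_value(
    "https://ftp.gnu.org/gnu/%{source_basename}/%{source_fullname}", variables
)
mirror = expand_alias_value(
    "https://pink-zebra.com/%{source_bucket}/%{source_fullname}", variables
)
```

Note that percent-encoded sequences such as %2F are left alone, since
only a "%" followed by "{" is treated as a variable reference.
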
>
> Advantages
> ~~~~~~~~~~
>
> o Consistent with BuildStream APIs, we simply extend the scope of
> variable expansion to also cover alias values.
>
> o Allows usage of the same alias for many sources
>
>
> Caveats
> ~~~~~~~
>
> o Requires additional variables to handle URIs which require
> different values.
>
> I.e. the pink zebra mirror categorizes mirrored tarballs into
> buckets, spreading out mirrored tarballs into directories named
> after the first letter of the tarball name, and the flying ponies
> mirror requires knowledge of the gitlab project ID in order to
> resolve the URI properly.
>
> While this is not horrible, this approach adds some cognitive
> complexity for project authors inasmuch as the knowledge required
> to evaluate a URI is spread out across more locations (alias values,
> source URI strings, variables).
>
> o Variable name collisions
>
> In the above example, it would be prudent to prefix the expected
> variable names with the alias name "ftp_gnu_org", like
> "ftp_gnu_org_basename", "ftp_gnu_org_fullname", etc.
>
> Since variables are resolved at the element level, it is
> conceivable that variables intended for alias value expansion may
> conflict in the case that a single element uses multiple sources
> and multiple aliases.
>
> While it is rare, it is also possible for a single source to use
> multiple URIs with different aliases too, e.g. git submodules.
>
> o A bit tricky to implement. The implementation should probably
> include some load time validation to ensure that required
> variables are declared for all possible mirrors.
>
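
The load time validation mentioned in the last caveat could look
roughly like the following Python sketch. Again, this is an assumption
about how one might do it, not an existing BuildStream function: the
name `missing_variables` and the shape of its arguments (mirror name
mapped to a list of alias value templates) are made up for
illustration.

```python
import re

# Same assumed %{name} syntax as used for alias values in the proposal.
_VARIABLE = re.compile(r"%\{([a-zA-Z][a-zA-Z0-9_-]*)\}")


def missing_variables(alias_values: dict, variables: dict) -> dict:
    """Map each mirror name to the variables its alias values reference
    but which the element does not declare."""
    missing = {}
    for mirror_name, templates in alias_values.items():
        required = set()
        for template in templates:
            required.update(_VARIABLE.findall(template))
        undeclared = required - set(variables)
        if undeclared:
            missing[mirror_name] = undeclared
    return missing


# Hypothetical alias values for one alias, keyed by mirror name.
alias_values = {
    "pink_zebra": ["https://pink-zebra.com/%{source_bucket}/%{source_fullname}"],
    "hashed_potatoes": ["https://potatoes.org/blobs/%{source_blob_id}"],
}

# An element which forgot to declare the blob ID variable.
variables = {
    "source_bucket": "c",
    "source_fullname": "coreutils-9.1.tar.xz",
}

missing = missing_variables(alias_values, variables)
```

An element loader could raise a load error whenever the returned
mapping is non-empty, naming the mirror and the undeclared variables.
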
> Overall, I think this approach is powerful and only becomes slightly
> difficult to work with and confusing in the edge cases.
>
>
> Mirror overrides
> ----------------
> The brute force option here would be to add some configuration to
> sources, such that a source could explicitly override the URI for a
> given mirror name.
>
> The rationale here would be that this covers the cases where regular
> alias substitution is insufficient.
>
> Example:
>
>   sources:
>   - kind: tar
>     url: ftp_gnu_org:coreutils/coreutils-9.1.tar.xz
>
>     mirror-uris:
>       pink_zebra:
>         ftp_gnu_org: https://pink-zebra.com/c/coreutils-9.1.tar.xz
>       flying_ponies:
>         ftp_gnu_org: https://gitlab.codethink.co.uk/api/v4/projects/1400/repository/files/gnu%2Fcoreutils%2Fcoreutils-9.1.tar.xz/raw?ref=master&lfs=true
>       hashed_potatoes:
>         ftp_gnu_org: https://potatoes.org/blobs/8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981
>
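
For illustration, the lookup order this implies could be sketched in
Python as below: a per-source override for the active mirror wins,
then the mirror's regular alias prefix, then the project's own alias.
Everything here is hypothetical, including the `resolve_url` name, the
argument shapes, the origin prefix "https://ftp.gnu.org/gnu/" and the
"plain_mirror" entry; it only shows the intended precedence.

```python
from typing import Optional


def resolve_url(
    url: str,
    aliases: dict,
    mirror_aliases: dict,
    mirror_uris: dict,
    mirror: Optional[str] = None,
) -> str:
    """Resolve an "alias:path" URL, optionally through a named mirror."""
    alias, _, path = url.partition(":")
    if mirror is not None:
        # 1. An explicit per-source override replaces the whole URI.
        override = mirror_uris.get(mirror, {}).get(alias)
        if override is not None:
            return override
        # 2. Otherwise fall back to regular prefix substitution for the mirror.
        prefixes = mirror_aliases.get(mirror, {}).get(alias)
        if prefixes:
            return prefixes[0] + path
    # 3. No mirror, or the mirror knows nothing about this alias.
    return aliases[alias] + path


url = "ftp_gnu_org:coreutils/coreutils-9.1.tar.xz"
aliases = {"ftp_gnu_org": "https://ftp.gnu.org/gnu/"}  # assumed origin prefix
mirror_aliases = {
    "plain_mirror": {"ftp_gnu_org": ["https://mirror.example.com/gnu/"]},
}
mirror_uris = {
    "pink_zebra": {"ftp_gnu_org": "https://pink-zebra.com/c/coreutils-9.1.tar.xz"},
}

origin = resolve_url(url, aliases, mirror_aliases, mirror_uris)
overridden = resolve_url(url, aliases, mirror_aliases, mirror_uris, "pink_zebra")
prefixed = resolve_url(url, aliases, mirror_aliases, mirror_uris, "plain_mirror")
```
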
>
> Advantages
> ~~~~~~~~~~
>
> o Highly human readable. By keeping the full URI in one place we
> reduce the cognitive complexity required to achieve our goals.
>
> o Very simple to implement
>
> Caveats
> ~~~~~~~
>
> o Highly repetitive and redundant.
>
> In the case that a project needs to support weird URIs, like the
> flying ponies or hashed potatoes showcased in the examples, it is
> likely that the project uses this mirror for a large number of
> source URIs.
>
> While this approach is more readable, it would require a lot more
> redundant information spread across many elements.
>
>
>
>