Hi all,

This is a proposal for the enhancement of source alias value
resolution, including the default values and the alternative values,
such that we can more adequately handle any alternative URI for a given
source URI.

I've got two concepts to present in order to address this, with
different strengths and weaknesses.

I would appreciate it if people could poke holes in this proposal, and
let me know which approach you think would be better.

At this point I believe my preference is the variable expansion
approach detailed below, but I'm slightly on the fence and could be
swayed.

Any feedback would be greatly appreciated.

Cheers,
    -Tristan


Problem statement
=================
Source alias and accompanying mirror substitutions are too rigid. Since
the inception of source aliases, they have only ever been useful for
substituting a leading portion of the URI.

This means that if I have a tarball such as:

    https://ftp.gnu.org/gnu/coreutils/coreutils-9.1.tar.xz

Then I cannot easily substitute this URI with:

    https://pink-zebra.com/c/coreutils-9.1.tar.xz

In this case, depending on my mirroring solution, I may have added the
tarballs to a different path. A poor man's solution to this would be to
have an alias specifically for `coreutils` declared in project.conf and
expand it in different ways, which is workable but quite inconvenient.

Worse still is the case where we want more complex substitutions. For
example, if we want to mirror our tarballs in gitlab using LFS, then
depending on the gitlab instance configuration, we may need to use a
URI that looks like this:

    https://gitlab.flying-ponies.com/api/v4/projects/1400/repository/files/gnu%2Fcoreutils%2Fcoreutils-9.1.tar.xz/raw?ref=master&lfs=true

The goal of this proposal is to have a flexible solution to more
adequately accommodate mirroring solutions.

To make the challenge interesting, we should consider the case that
there is absolutely no commonality between the origin URL and the
mirror. For instance, let's consider a mirror which behaves similarly to
a CAS, where the mirror URI for coreutils looks like:

    https://potatoes.org/blobs/8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981


Proposed solution(s)
====================

Variable expansion
------------------
In BuildStream 2, variable expansion is already supported in sources,
but it is not supported in alias value substitutions.

This solution would simply support variable expansion in alias values.

Example:

  # Alias declarations
  aliases:
    ftp_gnu_org: https://ftp.gnu.org/gnu/%{source_basename}/%{source_fullname}


  # Mirror declarations
  mirrors:
  - name: pink_zebra
    aliases:
      ftp_gnu_org:
      - https://pink-zebra.com/%{source_bucket}/%{source_fullname}
  - name: flying_ponies
    aliases:
      ftp_gnu_org:
      - https://gitlab.codethink.co.uk/api/v4/projects/%{source_project_id}/repository/files/gnu%2F%{source_basename}%2F%{source_fullname}/raw?ref=master&lfs=true
  - name: hashed_potatoes
    aliases:
      ftp_gnu_org:
      - https://potatoes.org/blobs/%{source_blob_id}

  # Element usage
  variables:
    source_project_id: 1400
    source_bucket: c
    source_blob_id: 8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981
    source_basename: coreutils
    source_fullname: coreutils-9.1.tar.xz
  sources:
  - kind: tar
    url: ftp_gnu_org
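
To make the intended behaviour concrete, here is a rough Python sketch of
how alias values might be expanded. It is only an illustration and not
BuildStream's implementation: the function name `expand_alias_value` and
the regular expression are assumptions, and the variable values are copied
from the example above (with `%{...}` syntax used for every alias value).

```python
import re

# Matches %{name} style variable references (pattern is an assumption).
_VARIABLE = re.compile(r"%\{([a-zA-Z][a-zA-Z0-9_-]*)\}")


def expand_alias_value(value, variables):
    """Substitute every %{name} in an alias value, failing on unknown names."""

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable '{name}' in alias value '{value}'")
        return str(variables[name])

    return _VARIABLE.sub(substitute, value)


# Element level variables, as declared in the example.
variables = {
    "source_project_id": 1400,
    "source_bucket": "c",
    "source_blob_id": "8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981",
    "source_basename": "coreutils",
    "source_fullname": "coreutils-9.1.tar.xz",
}

# Default alias value followed by the mirror alias values.
alias_values = [
    "https://ftp.gnu.org/gnu/%{source_basename}/%{source_fullname}",
    "https://pink-zebra.com/%{source_bucket}/%{source_fullname}",
    "https://gitlab.codethink.co.uk/api/v4/projects/%{source_project_id}"
    "/repository/files/gnu%2F%{source_basename}%2F%{source_fullname}"
    "/raw?ref=master&lfs=true",
    "https://potatoes.org/blobs/%{source_blob_id}",
]

if __name__ == "__main__":
    for value in alias_values:
        print(expand_alias_value(value, variables))
```

Note that percent-encoded sequences such as `%2F` are left untouched,
since only a `%` immediately followed by `{` starts a variable reference.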


Advantages
~~~~~~~~~~

  o Consistent with BuildStream APIs, we simply extend the scope of
    variable expansion to also cover alias values.

  o Allows usage of the same alias for many sources


Caveats
~~~~~~~

  o Requires additional variables to handle URIs which require
    different values.

    For example, the pink zebra mirror categorizes mirrored tarballs
    into buckets, spreading out mirrored tarballs into directories
    named after the first letter of the tarball name, and the flying
    ponies mirror requires knowledge of the gitlab project ID in order
    to resolve the URI properly.

    While this is not horrible, this approach adds some cognitive
    complexity for project authors inasmuch as the knowledge required to
    evaluate a URI is spread out across more locations (alias values,
    source URI strings, variables).

  o Variable name collisions

    In the above example, it would be prudent to prefix the expected
    variable names with the alias name "ftp_gnu_org", like
    "ftp_gnu_org_basename", "ftp_gnu_org_fullname", etc.

    Since variables are resolved at the element level, it is
    conceivable that variables intended for alias value expansion may
    conflict in the case that a single element uses multiple sources
    and multiple aliases.

    While it is rare, it is also possible for a single source to use
    multiple URIs with different aliases too, e.g. git submodules.

  o A bit tricky to implement. The implementation should probably
    include some load time validation to ensure that required variables
    are declared for all possible mirrors.
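
The load time validation mentioned in the last caveat could look roughly
like the following Python sketch. This is an assumption about how such a
check might be structured, not an existing BuildStream API: the function
name `missing_variables` and the dictionary layout (mirroring the YAML
example) are hypothetical.

```python
import re

# Matches %{name} style variable references (pattern is an assumption).
_VARIABLE = re.compile(r"%\{([a-zA-Z][a-zA-Z0-9_-]*)\}")


def missing_variables(alias, default_value, mirrors, variables):
    """Map each location (default or mirror name) to the variables it lacks."""
    candidates = {"(default)": [default_value]}
    for mirror in mirrors:
        values = mirror.get("aliases", {}).get(alias, [])
        if values:
            candidates[mirror["name"]] = values

    problems = {}
    for location, values in candidates.items():
        missing = set()
        for value in values:
            # Collect every referenced name that the element does not declare.
            missing |= {n for n in _VARIABLE.findall(value) if n not in variables}
        if missing:
            problems[location] = sorted(missing)
    return problems


default_value = "https://ftp.gnu.org/gnu/%{source_basename}/%{source_fullname}"
mirrors = [
    {
        "name": "pink_zebra",
        "aliases": {
            "ftp_gnu_org": ["https://pink-zebra.com/%{source_bucket}/%{source_fullname}"]
        },
    },
    {
        "name": "hashed_potatoes",
        "aliases": {"ftp_gnu_org": ["https://potatoes.org/blobs/%{source_blob_id}"]},
    },
]

# An element that forgot to declare source_bucket and source_blob_id.
incomplete = {"source_basename": "coreutils", "source_fullname": "coreutils-9.1.tar.xz"}

if __name__ == "__main__":
    print(missing_variables("ftp_gnu_org", default_value, mirrors, incomplete))
```

Reporting all missing variables per mirror at load time, rather than
failing on the first one at fetch time, would let project authors see
every gap before a mirror is actually needed.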

Overall, I think this approach is powerful and only becomes slightly
difficult to work with and confusing in the edge cases.


Mirror overrides
----------------
The brute force option here would be to add some configuration to
sources, such that a source could explicitly override the URI for a
given mirror name.

The rationale here would be that a source can fall back to an explicit
URI for a mirror in the case that regular alias substitution is
insufficient.

Example:

  sources:
  - kind: tar
    url: ftp_gnu_org:coreutils/coreutils-9.1.tar.xz

    mirror-uris:
      pink_zebra:
        ftp_gnu_org: https://pink-zebra.com/c/coreutils-9.1.tar.xz
      flying_ponies:
        ftp_gnu_org: https://gitlab.codethink.co.uk/api/v4/projects/1400/repository/files/gnu%2Fcoreutils%2Fcoreutils-9.1.tar.xz/raw?ref=master&lfs=true
      hashed_potatoes:
        ftp_gnu_org: https://potatoes.org/blobs/8a/9bcb733e2c1ea4d773c9d5061b24b7a8e29009c071a34cf5fe041f6533d981
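
As a rough sketch of how resolution might work with such overrides, the
Python below checks the source's `mirror-uris` first and only then falls
back to regular alias prefix substitution. The precedence order, the
function name `resolve_url` and the dictionary layout are assumptions made
for illustration. The `plain_prefix` mirror is hypothetical and exists
only to show the fallback path.

```python
def resolve_url(source, mirror):
    """Return the URL to try for this source on the given mirror, or None."""
    # Split "alias:remainder"; this simple split assumes the URL uses an alias.
    alias, _, remainder = source["url"].partition(":")

    # 1. An explicit per-source override for this mirror wins.
    override = source.get("mirror-uris", {}).get(mirror["name"], {}).get(alias)
    if override is not None:
        return override

    # 2. Otherwise use regular alias substitution with the mirror's prefix.
    prefixes = mirror.get("aliases", {}).get(alias, [])
    if prefixes:
        return prefixes[0] + remainder

    # 3. This mirror knows nothing about the alias.
    return None


source = {
    "kind": "tar",
    "url": "ftp_gnu_org:coreutils/coreutils-9.1.tar.xz",
    "mirror-uris": {
        "pink_zebra": {
            "ftp_gnu_org": "https://pink-zebra.com/c/coreutils-9.1.tar.xz"
        },
    },
}

pink_zebra = {"name": "pink_zebra", "aliases": {}}
# Hypothetical mirror that works with ordinary prefix substitution.
plain_prefix = {
    "name": "plain_prefix",
    "aliases": {"ftp_gnu_org": ["https://mirror.example.invalid/gnu/"]},
}

if __name__ == "__main__":
    print(resolve_url(source, pink_zebra))
    print(resolve_url(source, plain_prefix))
```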


Advantages
~~~~~~~~~~

  o Highly human readable. By keeping the full URI in one place we
    reduce the cognitive complexity required to achieve our goals.

  o Very simple to implement

Caveats
~~~~~~~

  o Highly repetitive and redundant.

    In the case that a project needs to support weird URIs, like the
    flying ponies or hashed potatoes showcased in the examples, it is
    likely that the project uses this mirror for a large number of
    source URIs.

    While this approach is more readable, it would require a lot more
    redundant information spread across many elements.


