Hi,
I have been working on a revamped[1] bazel_source plugin to get BuildStream
to download the resources Bazel needs to build (Bazel normally fetches
resources from the internet but cannot do that in a sandbox).
Bazel can be made to fetch sources from a repository cache[2]. It picks
resources from the repository cache based on both the checksum and the
"URL that Bazel thinks the resource lives at" (typically the upstream URL).
So when the revamped bazel_source plugin creates a repository cache for
Bazel, it needs to know the URL Bazel thinks the resource lives at.
Currently we require the element's author to explicitly add the "URL Bazel
thinks the resource lives at", but if a `Source.get_urls(self) -> List[str]`
API existed then (in the 90% of cases where "the URL Bazel thinks the
resource lives at" is the upstream URL) the bazel_source plugin could
determine the URL automatically (we may not want to set it implicitly,
instead adding it during `bst track`).
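A rough sketch of what I mean, in Python. Everything here is an assumption
for illustration: `FakeSource` stands in for a real BuildStream Source,
`bazel_url_for` and its `override` parameter are hypothetical names, and
taking the first reported URL is just one possible policy.

```python
from typing import List, Optional


class FakeSource:
    """Stand-in for a BuildStream Source; NOT the real class."""

    def __init__(self, urls: List[str]):
        self._urls = urls

    def get_urls(self) -> List[str]:
        # The proposed API: full upstream URLs, with aliases already resolved.
        return list(self._urls)


def bazel_url_for(source: FakeSource, override: Optional[str] = None) -> str:
    """Pick the URL to record in the repository cache for one source.

    An explicit override (what element authors must write today) wins;
    otherwise fall back to the first URL the source reports.
    """
    if override is not None:
        return override
    urls = source.get_urls()
    if not urls:
        raise ValueError("source reports no URLs and no override was given")
    return urls[0]
```

The override stays available for the remaining cases where the URL Bazel
expects differs from the upstream URL.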
TL;DR: the `Source.get_urls(self) -> List[str]` API sounds very useful to me!
Harry
[1]: the existing bazel_source plugin only supports `http_archive`, is hard
to configure, and doesn't use BuildStream's aliased URLs, instead using the
"URL that Bazel thinks the resource lives at" directly.
[2]: https://sluongng.hashnode.dev/bazel-caching-explained-pt-3-repository-cache
On 21/11/2024 10:42, Martín Abente Lahaye wrote:
Hello everyone,
Currently, some of our community plugins like collect_manifest [1] and
tools like bst-to-lorry [2], rely on a combination of assumptions
based on the reported “kind” of the Source and private Python APIs.
This can be problematic as both collect_manifest and bst-to-lorry
query sources for their "kind" and assume that certain attributes and
methods will be present (e.g., source.url). In fact, this has been
discussed before at least once [3].
Although this seems to work, it’s unreliable because even if the
“kind” string matches, there’s no guarantee that it is the expected
Source, as it can be a different Source with the same “kind” string.
Plus, even if it really is the expected Source, a future refactor
could break these assumptions as these aren’t public APIs.
Ultimately, what both of these are trying to determine is:
* The full upstream URL to the source, e.g.,
https://sourceware.org/git/glibc.git, to include it in the output
manifest or the lorry configuration file.
* The version of each source, e.g., 2.40, to include it in the output
manifest.
* Other information such as what the source is configured to track, to
include it in the lorry configuration file.
Therefore, instead of letting plugins and tools do all that unreliable
guessing, we could provide what these ultimately need by adding new
abstract methods that each Source can implement. For example,
something like:
* Source.get_urls(self) -> List[str]: Which would provide a list of
full upstream URLs, so the caller does not have to guess or rely on
accessing private attributes.
* Source.get_versions(self) -> List[str]: Similarly, for the versions,
but the tricky piece with this would be the need for a regexp for each
source, e.g., in case the version needs to be extracted from the
Source URL.
* Source.get_trackings(self) -> List[Optional[str]]: Similarly, for
the tracking strings. Of course this would only make sense for sources
that can actually be tracked.
Or perhaps something that groups these better, e.g.,
Source.get_actual_sources(self) -> List[Tuple[str, str,
Optional[str]]], providing (URL, version, tracking string) tuples or an
equivalent object.
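To make the proposal concrete, here is a minimal Python sketch of the
grouped variant. It is not BuildStream code: `SourceInfo`, `SourceSketch`,
`GitSourceSketch`, the `ref` and `version_regex` parameters, and the default
regexp are all hypothetical, and the version is extracted from a ref string
only as one example of where it might come from.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceInfo:
    """One (URL, version, tracking string) record; hypothetical type."""
    url: str
    version: Optional[str] = None
    tracking: Optional[str] = None


class SourceSketch(ABC):
    """Stand-in for the Source base class, showing only the proposed API."""

    @abstractmethod
    def get_actual_sources(self) -> List[SourceInfo]:
        """Each source plugin reports its own records."""

    # The flat accessors can be derived from the grouped one.
    def get_urls(self) -> List[str]:
        return [info.url for info in self.get_actual_sources()]

    def get_versions(self) -> List[Optional[str]]:
        return [info.version for info in self.get_actual_sources()]

    def get_trackings(self) -> List[Optional[str]]:
        return [info.tracking for info in self.get_actual_sources()]


class GitSourceSketch(SourceSketch):
    """Example implementation for a git-like source (illustrative only)."""

    def __init__(self, url: str, ref: str, track: Optional[str] = None,
                 version_regex: str = r"(\d+(?:\.\d+)+)"):
        self._url = url
        self._ref = ref
        self._track = track
        self._version_regex = version_regex

    def get_actual_sources(self) -> List[SourceInfo]:
        # Extract the version from the ref with the per-source regexp.
        match = re.search(self._version_regex, self._ref)
        version = match.group(1) if match else None
        return [SourceInfo(self._url, version, self._track)]
```

A tool such as collect_manifest could then iterate over
`source.get_actual_sources()` without checking the "kind" string at all.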
A key question here is whether something like the above would still be
too rigid or over-specified for that plugin and tool, and perhaps we
should be thinking of a more free-form API to query for these.
An idea that was mentioned when discussing this topic with Abderrahim
was to give sources something similar to the element's public data;
that way we could, e.g., add that version-matching regexp.
What do you all think?
Regards,
Martín.
Refs:
[1] https://gitlab.com/BuildStream/buildstream-plugins-community/-/blob/22023ad60e91ff3f635c556ed4c32ce4dfd7c2b5/src/buildstream_plugins_community/elements/collect_manifest.py
[2] https://gitlab.com/CodethinkLabs/lorry/bst-to-lorry/-/blob/d6d3782071502c56611ceffa574d2f81e2a1eedd/bst_to_lorry.py
[3] https://gitlab.com/BuildStream/buildstream-plugins-community/-/issues/2