Hi,

I have been working on a revamped[1] bazel_source plugin that gets buildstream to
download the resources bazel needs for a build (bazel normally fetches resources
from the internet, but cannot do that inside a sandbox).

Bazel can be made to fetch sources from a repository cache[2], and bazel picks
resources from the repository cache based on both the checksum and the "URL that
bazel thinks the resource lives at" (this is typically the upstream URL). So
when the revamped bazel_source plugin creates a repository cache for bazel, it
needs to know the URL bazel thinks the resource lives at.
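To illustrate why the URL matters, here is a toy model of the lookup rule described above. It is only a sketch of "a hit needs both the checksum and the URL to match", not bazel's real on-disk layout; `ToyRepositoryCache`, the placeholder checksum and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ToyRepositoryCache:
    """Toy model of the lookup rule only, not bazel's on-disk layout.

    Entries are keyed on the sha256 checksum AND the URL bazel thinks
    the resource lives at.
    """

    entries: Dict[Tuple[str, str], bytes] = field(default_factory=dict)

    def add(self, sha256: str, bazel_url: str, content: bytes) -> None:
        self.entries[(sha256, bazel_url)] = content

    def lookup(self, sha256: str, bazel_url: str) -> Optional[bytes]:
        # Both parts of the key must match: the right bytes stored under a
        # different URL (e.g. a buildstream alias or mirror) are a miss.
        return self.entries.get((sha256, bazel_url))


cache = ToyRepositoryCache()
cache.add("deadbeef", "https://upstream.example/foo.tar.gz", b"...")  # placeholder checksum
print(cache.lookup("deadbeef", "https://upstream.example/foo.tar.gz"))  # b'...'
print(cache.lookup("deadbeef", "https://mirror.example/foo.tar.gz"))    # None
```

In this model, a plugin that populates the cache under the aliased URL it actually downloaded from would never get a hit.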

Currently we require the element's author to explicitly add the "URL bazel
thinks the resource lives at". If a `Source.get_urls(self) -> List[str]` API
existed, then in the roughly 90% of cases where that URL is the upstream URL, the
bazel_source plugin could determine it automatically (though we may not want to
set it implicitly, and could instead add it during `bst track`).
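A hypothetical sketch of how the plugin could use such an API at track time. `get_urls()` is only the proposed method, so `ProposedSource` below is a stand-in for `buildstream.Source`; `TarLikeSource` and `resolve_bazel_url` are illustrative names, not existing code.

```python
from typing import List, Optional


class ProposedSource:
    """Stand-in for buildstream.Source with the proposed get_urls() API.

    BuildStream's Source does not have get_urls() today; this is hypothetical.
    """

    def get_urls(self) -> List[str]:
        raise NotImplementedError


class TarLikeSource(ProposedSource):
    """Hypothetical source holding one already alias-expanded upstream URL."""

    def __init__(self, url: str) -> None:
        self._url = url

    def get_urls(self) -> List[str]:
        return [self._url]


def resolve_bazel_url(source: ProposedSource, explicit: Optional[str]) -> Optional[str]:
    """Choose the URL to record as "the URL bazel thinks the resource lives at".

    An explicit value from the element author always wins; otherwise fall
    back to the first upstream URL (the common case). The intent is to call
    this during `bst track` and write the result out, rather than apply it
    implicitly at build time.
    """
    if explicit is not None:
        return explicit
    urls = source.get_urls()
    return urls[0] if urls else None


print(resolve_bazel_url(TarLikeSource("https://upstream.example/foo.tar.gz"), None))
```

Keeping the explicit value as an override leaves the remaining ~10% of cases (where bazel expects a different URL) configurable by hand.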

TL;DR: the `Source.get_urls(self) -> List[str]` API sounds very useful to me!

Harry

[1]: the existing bazel_source plugin only supports `http_archive`, is hard to
     configure, and doesn't use buildstream's aliased URLs, instead using the "URL
     that bazel thinks the resource lives at" directly.
[2]: https://sluongng.hashnode.dev/bazel-caching-explained-pt-3-repository-cache

On 21/11/2024 10:42, Martín Abente Lahaye wrote:
Hello everyone,

Currently, some of our community plugins like collect_manifest [1] and tools like bst-to-lorry [2] rely on a combination of assumptions based on the reported “kind” of the Source and private Python APIs.

This can be problematic as both collect_manifest and bst-to-lorry query sources for their "kind" and assume that certain attributes and methods will be present (e.g., source.url). In fact, this has been discussed before at least once [3].

Although this seems to work, it’s unreliable: even if the “kind” string matches, there’s no guarantee that it is the expected Source, since a different Source could use the same “kind” string. Plus, even if it really is the expected Source, a future refactor could break these assumptions, since these aren’t public APIs.
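A small sketch of the failure mode, using two hypothetical plugin classes (`SourceA`, `SourceB`) that both report kind "git" but store their URL differently; `fragile_upstream_url` mimics the kind-then-private-attribute pattern, not the actual code of either tool.

```python
class SourceA:
    """Hypothetical plugin registered under kind "git"; keeps its URL in .url."""

    def __init__(self) -> None:
        self.url = "https://sourceware.org/git/glibc.git"

    def get_kind(self) -> str:
        return "git"


class SourceB:
    """A different hypothetical plugin that also uses kind "git",
    but keeps its URL in a private attribute with another name."""

    def __init__(self) -> None:
        self._repo = "https://sourceware.org/git/glibc.git"

    def get_kind(self) -> str:
        return "git"


def fragile_upstream_url(source) -> str:
    # Trust the kind string, then reach for an attribute that is not part
    # of any public API.
    if source.get_kind() == "git":
        return source.url
    raise ValueError("unknown kind")


print(fragile_upstream_url(SourceA()))  # https://sourceware.org/git/glibc.git
try:
    fragile_upstream_url(SourceB())
except AttributeError as exc:
    print("broken:", exc)
```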

Ultimately, what both of these are trying to determine is:

* The full upstream URL to the source, e.g., https://sourceware.org/git/glibc.git, to include it in the output manifest or the lorry configuration file.
* The version of each source, e.g., 2.40, to include it in the output manifest.
* Other information, such as what the source is configured to track, to include it in the lorry configuration file.

Therefore, instead of letting plugins and tools do all that unreliable guessing, we could provide what these ultimately need by adding new abstract methods that each Source can implement. For example, something like:

* Source.get_urls(self) -> List[str]: provides the caller with a list of full upstream URLs, without any guessing or access to private attributes.
* Source.get_versions(self) -> List[str]: similarly, for the versions; the tricky part here is that each source would need a regexp, e.g., when the version has to be extracted from the Source URL.
* Source.get_trackings(self) -> List[Optional[str]]: similarly, for the tracking strings. Of course, this would only make sense for sources that can actually be tracked.
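A minimal sketch of what these methods could look like on a base class. `ProposedSource` stands in for `buildstream.Source` (none of these methods exist there today), and `ExampleGitSource` with its constructor arguments is purely illustrative. The default `get_trackings()` returning one `None` per URL is an assumption about how non-trackable sources might behave.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class ProposedSource(ABC):
    """Stand-in for buildstream.Source with the three proposed methods.

    None of these exist on buildstream.Source today.
    """

    @abstractmethod
    def get_urls(self) -> List[str]:
        """Full upstream URLs, with aliases already resolved."""

    @abstractmethod
    def get_versions(self) -> List[str]:
        """One version string per URL."""

    def get_trackings(self) -> List[Optional[str]]:
        # Default for sources that cannot be tracked: one None per URL.
        return [None] * len(self.get_urls())


class ExampleGitSource(ProposedSource):
    """Hypothetical git-like source; the constructor arguments are illustrative."""

    def __init__(self, url: str, version: str, track: Optional[str]) -> None:
        self._url = url
        self._version = version
        self._track = track

    def get_urls(self) -> List[str]:
        return [self._url]

    def get_versions(self) -> List[str]:
        return [self._version]

    def get_trackings(self) -> List[Optional[str]]:
        return [self._track]
```

With something like this, collect_manifest or bst-to-lorry could call the same methods on any Source without first checking its “kind”.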

Or perhaps something that groups these together, e.g., Source.get_actual_sources(self) -> List[Tuple[str, str, Optional[str]]], returning (URL, version, tracking) tuples or an equivalent object.
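One possible shape for the "equivalent object" is a `NamedTuple`, sketched below under the assumption that `get_actual_sources()` would return a list of them. `ActualSource` and `manifest_entries` are hypothetical names, and the output keys do not reflect collect_manifest's real format.

```python
from typing import Dict, List, NamedTuple, Optional


class ActualSource(NamedTuple):
    """One upstream origin of a Source: (URL, version, tracking)."""

    url: str
    version: str
    tracking: Optional[str]


def manifest_entries(actual_sources: List[ActualSource]) -> List[Dict[str, str]]:
    # How a collect_manifest-style consumer might use the result; the output
    # keys are illustrative only.
    return [{"url": s.url, "version": s.version} for s in actual_sources]


glibc = ActualSource("https://sourceware.org/git/glibc.git", "2.40", None)
print(manifest_entries([glibc]))
# [{'url': 'https://sourceware.org/git/glibc.git', 'version': '2.40'}]
```

Because a `NamedTuple` is still a tuple, this would also satisfy the `List[Tuple[str, str, Optional[str]]]` signature while giving callers named fields.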

A key question here is whether something like the above would still be too rigid or over-specified for that plugin and tool, in which case perhaps we should be thinking of a more free-form API to query this information.

An idea that came up when discussing this topic with Abderrahim was to introduce something for sources similar to element public data; this way we could, e.g., add that version-matching regexp.
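A rough sketch of how a version-matching regexp stored as source "public data" might be applied. Sources have no public data today, so the `public_data` dict, the `version-regex` key, the named group `version` and the example URL are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional


def version_from_url(url: str, public_data: Dict[str, str]) -> Optional[str]:
    """Extract a version using a regexp supplied as hypothetical source public data.

    The pattern is expected to define a named group called `version`.
    """
    pattern = public_data.get("version-regex")
    if pattern is None:
        return None  # no regexp configured for this source
    match = re.search(pattern, url)
    return match.group("version") if match else None


public_data = {"version-regex": r"glibc-(?P<version>[0-9.]+)\.tar"}
print(version_from_url("https://mirror.example/glibc/glibc-2.40.tar.xz", public_data))  # 2.40
```

Keeping the regexp in per-source data rather than in each plugin would let element authors handle unusual URL schemes without a plugin change.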

What do you all think?

Regards,
Martín.

Refs:
[1] https://gitlab.com/BuildStream/buildstream-plugins-community/-/blob/22023ad60e91ff3f635c556ed4c32ce4dfd7c2b5/src/buildstream_plugins_community/elements/collect_manifest.py
[2] https://gitlab.com/CodethinkLabs/lorry/bst-to-lorry/-/blob/d6d3782071502c56611ceffa574d2f81e2a1eedd/bst_to_lorry.py
[3] https://gitlab.com/BuildStream/buildstream-plugins-community/-/issues/2
