Hi Martín,
On Thu, 2024-11-21 at 07:42 -0300, Martín Abente Lahaye wrote:
> Hello everyone,
>
> Currently, some of our community plugins like collect_manifest [1] and
> tools like bst-to-lorry [2], rely on a combination of assumptions based
> on the reported “kind” of the Source and private Python APIs.
>
> This can be problematic as both collect_manifest and bst-to-lorry query
> sources for their "kind" and assume that certain attributes and methods
> will be present (e.g., source.url). In fact, this has been discussed
> before at least once [3].
>
> Although this seems to work, it’s unreliable because even if the “kind”
> string matches, there’s no guarantee that it is the expected Source, as
> it can be a different Source with the same “kind” string. Plus, even if
> it really is the expected Source, a future refactor could break these
> assumptions as these aren’t public APIs.
Yes, this is a well-known issue, thanks for bringing it back to light :)
[...]
>
> Therefore, instead of letting plugins and tools do all that unreliable
> guessing, we could provide what these ultimately need by adding new
> abstract methods that each Source can implement. For example, something
> like:
>
> * Source.get_urls(self) -> List[str]: Which would provide a list of full
> upstream URL without any guessing or relying on accessing private
> attributes, for the caller.
> * Source.get_versions(self) -> List[str]: Similarly, for the versions,
> but the tricky piece with this would be the need for a regexp for each
> source, e.g., in case the version needs to be extracted from the Source
> URL.
> * Source.get_trackings(self) -> List[Optional[str]]: Similarly, for the
> tracking strings. Of course this would only make sense for sources that
> can actually be tracked.
I'm very much in favor of adding Source APIs that let Source
implementations report common information in a standard way.
This has the advantage that each Source implementation only needs to
implement it once, after which it can be leveraged by a multitude of
projects with little or no effort (aside from perhaps having those
projects *use* the new plugin versions which support these new APIs).
On to some specifics:
* Given that a Source may have multiple URIs, refs, and pieces of
tracking information, I think a more natural API would be to define
something like a SourceInfo object, and ask Source implementations
to return a list of them (e.g. list_source_info()).
This is mostly for the sake of a clean API: it is easier this way to
know which pieces of information belong together.
In 99% of cases, this will return a single-entry list.
* Source.get_versions() is ambiguous to me.
From the BuildStream perspective, what is a "version" of a Source?
Is it necessary to know the "version" of a Source without having the
Source data cached locally to compute it?
It seems to me that we probably want to fetch the source first, so
that the Source implementation is free to interrogate the data in
order to guess what the "version" is, perhaps by invoking things
like `git describe`.
I'm not convinced that the URL alone, or even the URL and the ref,
gives the Source enough to make a qualified "guess" at the version.
In any case, this will likely be a "guess" no matter how the
plugin computes this "version".
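To make the SourceInfo idea a bit more concrete, here is a rough
sketch of the shape I have in mind. None of these names exist in
BuildStream today: SourceInfo, its fields, list_source_info() and the
stand-in Source base class below are all hypothetical, and the real
thing would need more thought.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceInfo:
    """Hypothetical record grouping the facts about one upstream source."""

    url: str
    ref: Optional[str] = None
    tracking: Optional[str] = None
    # Best-effort guess only; may be unknown until the source is fetched.
    version_guess: Optional[str] = None


class Source:
    """Stand-in base class for this sketch, NOT the real buildstream.Source."""

    def list_source_info(self) -> List[SourceInfo]:
        raise NotImplementedError


class ExampleGitSource(Source):
    """Hypothetical plugin that tracks a single repository."""

    def __init__(self, url: str, ref: str, tracking: Optional[str] = None):
        self._url = url
        self._ref = ref
        self._tracking = tracking

    def list_source_info(self) -> List[SourceInfo]:
        # The common case: a single-entry list.
        return [SourceInfo(url=self._url, ref=self._ref, tracking=self._tracking)]
```

A caller such as collect_manifest would then iterate over
list_source_info() instead of checking the "kind" and reaching for
private attributes.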
>
> Or perhaps, something that better groups these tuples, e.g.,
> Source.get_actual_sources(self) -> List[Tuple[str, str, Optional[str]]],
> providing URL, version and tracking strings tuples or equivalent object.
Ah yes, as I mentioned above. However, I would prefer BuildStream-defined
objects, which can be extended in the future under some versioning
strategy, rather than rigidly defining a tuple.
>
> A key question here is whether something like the above would still be
> too rigid or over-specified for that plugin and tool, and perhaps we
> should be thinking of a more free-form API to query for these.
>
> An idea that was mentioned when discussing this topic with Abderrahim
> was to introduce to sources something similar to the elements public
> data, e.g., this way we could add that version matching regexp.
I think we *also* want public data for this.
That is, the Source cannot make a fully qualified "guess" at the
"version", and there may be other attributes we want to associate
with a "source".
As such, the project author should have the authority to override the
Source implementation's "guess" by explicitly stating it in the .bst
file.
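As a sketch of the precedence I mean (the function name and the
"version" key are made up for illustration, this is not an existing
BuildStream API): whatever the project author declares wins, and the
plugin's guess is only a fallback.

```python
from typing import Mapping, Optional


def resolve_version(
    plugin_guess: Optional[str], declared: Mapping[str, str]
) -> Optional[str]:
    """Return the author-declared version if present, else the plugin's guess.

    `declared` stands in for whatever source-level public data the project
    author wrote in the element file; the "version" key is hypothetical.
    """
    if "version" in declared:
        return declared["version"]
    return plugin_guess
```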
Further, I would consider this proposal incomplete without CLI
support for querying this information.
For the collect_manifest problem, I hope that we can address this
without needing to use a plugin at all. We should be able to get away
with only:
* bst source fetch --deps all <ELEMENT>
* bst source show --deps all --format <FORMAT> <ELEMENT>
Cheers,
-Tristan