Hi Martín,

On Thu, 2024-11-21 at 07:42 -0300, Martín Abente Lahaye wrote:
> Hello everyone,
> 
> Currently, some of our community plugins like collect_manifest [1] and 
> tools like bst-to-lorry [2], rely on a combination of assumptions based 
> on the reported “kind” of the Source and private Python APIs.
> 
> This can be problematic as both collect_manifest and bst-to-lorry query 
> sources for their "kind" and assume that certain attributes and methods 
> will be present (e.g., source.url). In fact, this has been discussed 
> before at least once [3].
> 
> Although this seems to work, it’s unreliable because even if the “kind” 
> string matches, there’s no guarantee that it is the expected Source, as 
> it can be a different Source with the same “kind” string. Plus, even if 
> it really is the expected Source, a future refactor could break these 
> assumptions as these aren’t public APIs.

Yes, well known issue, thanks for bringing this back to light :)

[...] 
> 
> Therefore, instead of letting plugins and tools do all that unreliable 
> guessing, we could provide what these ultimately need by adding new 
> abstract methods that each Source can implement. For example, something 
> like:
> 
> * Source.get_urls(self) -> List[str]: Which would provide a list of full 
> upstream URL without any guessing or relying on accessing private 
> attributes, for the caller.
> * Source.get_versions(self) -> List[str]: Similarly, for the versions, 
> but the tricky piece with this would be the need for a regexp for each 
> source, e.g., in case the version needs to be extracted from the Source 
> URL.
> * Source.get_trackings(self) -> List[Optional[str]]: Similarly, for the 
> tracking strings. Of course this would only make sense for sources that 
> can actually be tracked.

I'm very much in favor of adding Source APIs which let Source
implementations report common information in a standard way.

This has the advantage of being easy enough to implement once in each
Source implementation, and can then be leveraged by a multitude of
projects with little or no effort (aside from perhaps having those
projects *use* the new plugin versions which support these new APIs).

Into some specifics:

* Given that a Source may have multiple URIs, refs, and pieces of
  tracking information, I think a more natural API would be to define
  something like a SourceInfo object, and ask Source implementations
  to return a list of them (e.g. list_source_info()).

  This is mostly just for a pretty API: it's easier this way to know
  which information belongs together.

  In 99% of cases, this will return a single-entry list.

* Source.get_versions() is ambiguous to me.

  From the BuildStream perspective, what is a "version" of a Source ? 

  Is it necessary to know the "version" of a Source without having the
  Source data cached locally to compute it ? 

  It seems to me that we probably want to fetch the source first, so
  that the Source implementation has the liberty to interrogate the
  data in order to guess what a "version" is, perhaps by invoking
  things like `git describe`.

  I'm not convinced that deriving this information from the URL alone,
  or even the URL and the ref, is sufficient for the Source to make 
  a qualified "guess" at the version. 

  In any case, this will likely be a "guess" no matter how the
  plugin computes this "version".
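
To illustrate the shape of the first point, here is a minimal sketch.
Everything in it is hypothetical: the SourceInfo field names, the
version_guess attribute and the ExampleSource stand-in class are
assumptions made for illustration, not existing BuildStream API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceInfo:
    """Hypothetical record grouping the facts which belong together."""

    url: str                             # full upstream URL
    ref: Optional[str] = None            # exact ref, if one is known
    tracking: Optional[str] = None       # branch/tag being tracked, if any
    version_guess: Optional[str] = None  # best-effort guess, may be absent


class ExampleSource:
    """Stand-in for a Source plugin; NOT the real buildstream.Source."""

    def __init__(self, url: str, ref: Optional[str], track: Optional[str]):
        self._url = url
        self._ref = ref
        self._track = track

    def list_source_info(self) -> List[SourceInfo]:
        # Most sources would report exactly one entry here.
        return [SourceInfo(url=self._url, ref=self._ref, tracking=self._track)]


source = ExampleSource("https://example.com/repo.git", "abc123", "main")
infos = source.list_source_info()
for info in infos:
    print(info.url, info.ref, info.tracking)
```

Because the fields are named and optional ones carry defaults, a new
attribute could be added later without breaking existing callers,
which is harder to do with a fixed-size tuple.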

 
> 
> Or perhaps, something that better groups these tuples, e.g., 
> Source.get_actual_sources(self) -> List[Tuple[str, str, Optional[str]]], 
> providing URL, version and tracking strings tuples or equivalent object.

Ah yes, as I mentioned above, however I would prefer BuildStream
qualified objects, which are extensible in the future, with some
versioning strategy, rather than rigidly defining a tuple.

> 
> A key question here is whether something like the above would still be 
> too rigid or over-specified for that plugin and tool, and perhaps we 
> should be thinking of a more free-form API to query for these.
> 
> An idea that was mentioned when discussing this topic with Abderrahim 
> was to introduce to sources something similar to the elements public 
> data, e.g., this way we could add that version matching regexp.

I think we *also* want public data for this.

I.e. the Source cannot make a fully qualified "guess" at the "version",
and there may be other attributes we want to associate with a "source".

As such, the project author should have the authority to override the
Source implementation's "guess" by explicitly mentioning it in the bst
file.
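
The precedence described here could be sketched roughly as follows.
This assumes a hypothetical "version" key in some per-source public
data mapping; no such mapping or key exists for sources today, and
resolve_version() is an illustrative helper, not BuildStream API.

```python
from typing import Mapping, Optional


def resolve_version(guess: Optional[str],
                    declared: Mapping[str, str]) -> Optional[str]:
    """Prefer what the project author declared over the plugin's guess."""
    # "version" is a hypothetical key name used only for this sketch.
    return declared.get("version", guess)


# No declaration in the bst file: fall back to the plugin's guess.
resolved_default = resolve_version("1.2.3-4-gabc123", {})

# Explicit declaration in the bst file: the author's value wins.
resolved_override = resolve_version("1.2.3-4-gabc123", {"version": "1.2.3"})

print(resolved_default, resolved_override)
```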

Further, I would consider this proposal incomplete without adding CLI
support for interrogating this information.

For the collect_manifest problem, I hope that we can address this
without needing to use a plugin at all; we should be able to get away
with only:

  * bst source fetch --deps all <ELEMENT>
  * bst source show --deps all --format <FORMAT> <ELEMENT>

Cheers,
    -Tristan



