Hello everyone,
Currently, some of our community plugins, like collect_manifest [1], and
tools, like bst-to-lorry [2], rely on a combination of private Python
APIs and assumptions based on the reported “kind” of the Source.
This is problematic because both collect_manifest and bst-to-lorry query
sources for their “kind” and then assume that certain attributes and
methods will be present (e.g., source.url). This has been discussed
before at least once [3].
Although this seems to work, it’s unreliable because even if the “kind”
string matches, there’s no guarantee that it is the expected Source, as
it can be a different Source with the same “kind” string. Plus, even if
it really is the expected Source, a future refactor could break these
assumptions as these aren’t public APIs.
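To make the failure mode concrete, here is a minimal sketch. The two
classes are stand-ins written only for this example (they are not
BuildStream classes), and guess_url() is a hypothetical helper that only
mimics the pattern described above, not the actual plugin code:

```python
# Stand-in classes for illustration only; NOT the real BuildStream API.
class GitSource:
    def __init__(self, url):
        self.url = url  # implementation detail, not a documented API

    def get_kind(self):
        return "git"


class OtherGitSource:
    """A different plugin that happens to report the same kind string."""

    def __init__(self, repo):
        self.repo = repo  # same information, different attribute name

    def get_kind(self):
        return "git"


def guess_url(source):
    # Hypothetical helper: trusts the kind string, then reaches for an
    # attribute that nothing guarantees to exist.
    if source.get_kind() == "git":
        return source.url
    return None


# Works only as long as the attribute happens to be there.
guess_url(GitSource("https://sourceware.org/git/glibc.git"))

# Same kind string, different class: this raises AttributeError.
try:
    guess_url(OtherGitSource("https://sourceware.org/git/glibc.git"))
except AttributeError as error:
    print("guess failed:", error)
```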
Ultimately, what both of these are trying to determine is:
* The full upstream URL to the source, e.g.,
https://sourceware.org/git/glibc.git, to include it in the output
manifest or the lorry configuration file.
* The version of each source, e.g., 2.40, to include it in the output
manifest.
* Other information such as what the source is configured to track, to
include it in the lorry configuration file.
Therefore, instead of letting plugins and tools do all that unreliable
guessing, we could provide what these ultimately need by adding new
abstract methods that each Source can implement. For example, something
like:
* Source.get_urls(self) -> List[str]: Which would provide the caller
with a list of full upstream URLs, without any guessing or reliance on
private attributes.
* Source.get_versions(self) -> List[str]: Similarly, for the versions.
The tricky piece here is that each source may need its own regexp,
e.g., in case the version needs to be extracted from the Source URL.
* Source.get_trackings(self) -> List[Optional[str]]: Similarly, for the
tracking strings. Of course this would only make sense for sources that
can actually be tracked.
Or perhaps, something that groups these values together, e.g.,
Source.get_actual_sources(self) -> List[Tuple[str, str, Optional[str]]],
returning tuples (or an equivalent object) of URL, version and tracking
string.
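As a rough sketch of the grouped variant, something like the following
could work. Everything here is hypothetical: SourceInfo, the stand-in
Source base class, ExampleGitSource and manifest_entries() are names made
up for this example, and the "master" tracking value is only illustrative:

```python
from typing import List, NamedTuple, Optional


class SourceInfo(NamedTuple):
    """Hypothetical 'equivalent object' for the (URL, version, tracking) tuple."""

    url: str
    version: str
    tracking: Optional[str]  # None for sources that cannot be tracked


class Source:
    """Stand-in for the real Source base class; illustration only."""

    def get_actual_sources(self) -> List[SourceInfo]:
        raise NotImplementedError("source does not report its upstream info")


class ExampleGitSource(Source):
    """Hypothetical source plugin that opts in to the new method."""

    def __init__(self, url: str, version: str, track: Optional[str] = None):
        self._url = url
        self._version = version
        self._track = track

    def get_actual_sources(self) -> List[SourceInfo]:
        return [SourceInfo(self._url, self._version, self._track)]


def manifest_entries(sources: List[Source]) -> List[dict]:
    # What a consumer such as a manifest plugin might do: no kind checks
    # and no private attributes, only the public method.
    return [
        {"url": info.url, "version": info.version}
        for source in sources
        for info in source.get_actual_sources()
    ]


example = ExampleGitSource("https://sourceware.org/git/glibc.git", "2.40", "master")
print(manifest_entries([example]))
```

A named tuple keeps the tuple shape suggested above while letting callers
use field names, which may make it easier to add fields later.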
A key question here is whether something like the above would still be
too rigid or over-specified for that plugin and tool, and perhaps we
should be thinking of a more free-form API to query for these.
An idea that was mentioned when discussing this topic with Abderrahim
was to introduce something for sources similar to the element public
data; this way we could, e.g., add that version-matching regexp.
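To illustrate the idea, a version regexp carried as free-form data next
to a source could be consumed roughly like this. The dictionary layout,
the "version-regexp" key, the tarball URL and extract_version() are all
assumptions made for this sketch, not an existing format or API:

```python
import re
from typing import Optional

# Hypothetical free-form data attached to a source; the key names are
# made up for this example.
source_public_data = {
    "url": "https://ftp.gnu.org/gnu/glibc/glibc-2.40.tar.xz",
    "version-regexp": r"glibc-(\d+(?:\.\d+)+)\.tar",
}


def extract_version(public_data: dict) -> Optional[str]:
    """Return the first capture group of the regexp applied to the URL."""
    pattern = public_data.get("version-regexp")
    if pattern is None:
        return None  # the source did not declare how to find its version
    match = re.search(pattern, public_data["url"])
    return match.group(1) if match else None


print(extract_version(source_public_data))
```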
What do you all think?
Regards,
Martín.
Refs:
[1]
https://gitlab.com/BuildStream/buildstream-plugins-community/-/blob/22023ad60e91ff3f635c556ed4c32ce4dfd7c2b5/src/buildstream_plugins_community/elements/collect_manifest.py
[2]
https://gitlab.com/CodethinkLabs/lorry/bst-to-lorry/-/blob/d6d3782071502c56611ceffa574d2f81e2a1eedd/bst_to_lorry.py
[3]
https://gitlab.com/BuildStream/buildstream-plugins-community/-/issues/2