Hello Martín,

Thanks for the write-up. See my comments inline below.

On Thu, 21 Nov 2024 at 11:42, Martín Abente Lahaye <[email protected]> wrote:

> Ultimately, what both of these are trying to determine is:
>
> * The full upstream URL to the source, e.g.,
>   https://sourceware.org/git/glibc.git, to include it in the output
>   manifest or the lorry configuration file.
> * The version of each source, e.g., 2.40, to include it in the output
>   manifest.
> * Other information such as what the source is configured to track, to
>   include it in the lorry configuration file.

I think this is pretty accurate. I would just add that it tries to guess
the version based on the tarball name, or on the git describe output for
a git repository (which it assumes is the format used by the ref).

> Therefore, instead of letting plugins and tools do all that unreliable
> guessing, we could provide what these ultimately need by adding new
> abstract methods that each Source can implement. For example, something
> like:
>
> * Source.get_urls(self) -> List[str]: Which would provide a list of full
>   upstream URL without any guessing or relying on accessing private
>   attributes, for the caller.
> * Source.get_versions(self) -> List[str]: Similarly, for the versions,
>   but the tricky piece with this would be the need for a regexp for each
>   source, e.g., in case the version needs to be extracted from the Source
>   URL.
> * Source.get_trackings(self) -> List[Optional[str]]: Similarly, for the
>   tracking strings. Of course this would only make sense for sources that
>   can actually be tracked.

As prior art, I'd add what was done in buildstream-plugins-community (for
the collect_manifest plugin, and later extended for bst-to-lorry).
See [1]:

    Source.export_manifest(self) -> Dict[str, str]

This returns a more or less free-form dictionary with keys defined in
the collect_manifest plugin for a few things:

* "type" for the type of the source (archive/git/patch)
* "url" or "path" for the provenance
* "commit" or "sha256" for the ref

As it's freeform, it can be extended to support other things, such as
the tracked branch which is read by bst-to-lorry.

> Or perhaps, something that better groups these tuples, e.g.,
> Source.get_actual_sources(self) -> List[Tuple[str, str, Optional[str]]],
> providing URL, version and tracking strings tuples or equivalent object.

This reminds me of the proposal to use the Remote Asset API [2] for
fetching the sources. If we end up going with an idea similar to this,
it should at least take into consideration a future use of Remote Asset.

> An idea that was mentioned when discussing this topic with Abderrahim
> was to introduce to sources something similar to the elements public
> data, e.g., this way we could add that version matching regexp.

This would essentially be another way to have sources export free-form
data to other plugins. It's interesting in that we would reproduce the
same concept that we already have for elements, rather than inventing
something new.

Cheers,
Abderrahim

[1] https://buildstream.gitlab.io/buildstream-plugins-community/elements/collect_manifest.html
[2] https://github.com/apache/buildstream/issues/1274
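
P.S. To make the export_manifest idea above a bit more concrete, here is
a rough Python sketch of the shape of the data. It is only an
illustration: the two classes are stand-ins I made up for this mail and
do not derive from the real BuildStream Source class, the "track" key
name is my assumption (only "type", "url"/"path" and "commit"/"sha256"
are the keys mentioned above), and the commit and checksum values are
placeholders.

```python
from typing import Dict, List, Optional


class SketchGitSource:
    """Hypothetical stand-in for a git source plugin (not BuildStream's Source)."""

    def __init__(self, url: str, commit: str, track: Optional[str] = None) -> None:
        self.url = url
        self.commit = commit
        self.track = track

    def export_manifest(self) -> Dict[str, str]:
        # Keys described above: "type", "url" and "commit".
        manifest = {"type": "git", "url": self.url, "commit": self.commit}
        if self.track is not None:
            # Assumed key name for the tracked branch; the dictionary is
            # free-form, so a consumer would need to agree on this name.
            manifest["track"] = self.track
        return manifest


class SketchArchiveSource:
    """Hypothetical stand-in for a tarball source plugin."""

    def __init__(self, url: str, sha256: str) -> None:
        self.url = url
        self.sha256 = sha256

    def export_manifest(self) -> Dict[str, str]:
        # For archives the ref is a checksum rather than a commit.
        return {"type": "archive", "url": self.url, "sha256": self.sha256}


def collect(sources) -> List[Dict[str, str]]:
    """Gather one manifest entry per source, without guessing at internals."""
    return [source.export_manifest() for source in sources]


if __name__ == "__main__":
    entries = collect([
        SketchGitSource("https://sourceware.org/git/glibc.git", "0" * 40, track="master"),
        SketchArchiveSource("https://example.com/foo-1.0.tar.xz", "0" * 64),
    ])
    for entry in entries:
        print(entry)
```

The point of the sketch is that the consumer only calls one method per
source and never touches private attributes; anything extra (such as the
tracked branch) travels as an additional key.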
