abderrahim commented on code in PR #1997:
URL: https://github.com/apache/buildstream/pull/1997#discussion_r2037693646


##########
src/buildstream/plugins/sources/local.py:
##########
@@ -110,6 +110,9 @@ def init_workspace_directory(self, directory):
         #
         self.__do_stage(directory)
 
+    def collect_source_info(self):
+        return [SourceInfo(self.path, SourceInfoMedium.LOCAL, 
SourceVersionType.DIGEST, self.get_unique_key())]

Review Comment:
   `get_unique_key()` only uses the hash of the digest, which isn't enough to 
get the content from CAS.
   
   Buildbox uses the format `{hash}/{size}`, which is what we're planning to 
use in #1994 and #1991. We should use it here too. This would allow downloading 
the source from CAS using e.g. `casdownload` from buildbox



##########
src/buildstream/downloadablefilesource.py:
##########
@@ -25,6 +25,46 @@
 implementation.
 
 
+Built-in functionality
+----------------------
+The DownloadableFileSource class provides built in keys which can be set when
+intantiating any Source which derives from DownloadableFileSource
+
+* Guess pattern
+
+  This ``version-guess-pattern`` sets the regular expression which will be 
used to attempt
+  to guess the version of a source when parsing the source's URI.
+
+  The DownloadableFileSource provides a default implementation of
+  :func:`Source.collect_source_info() 
<buildstream.source.Source.collect_source_info>`,
+  which will use the ``version-guess-pattern`` to attempt to extract a human 
readable
+  version string from the specified URI, in order to fill out the reported
+  :attr:`~buildstream.source.SourceInfo.version_guess`.
+
+  The URI will be *searched* using this regular expression, and is allowed to
+  yield a number of *groups*. For example the value ``(\\d+)_(\\d+)_(\\d+)`` 
would
+  report 3 *groups* if 3 numerical values separated by underscores were found 
in
+  the URI.
+
+  The default value for ``version-guess-pattern`` is 
``\\d+\\.\\d+(?:\\.\\d+)?``.
+
+  **Since: 2.5**.
+
+Built-in functionality

Review Comment:
   I think this section doesn't belong here (or at least it should be merged 
with the previous, similarly named one)



##########
src/buildstream/plugins/sources/workspace.py:
##########
@@ -108,6 +108,9 @@ def stage_directory(self, directory):
         with self._cache_directory(digest=self.__digest) as cached_directory:
             directory._import_files_internal(cached_directory, 
collect_result=False)
 
+    def collect_source_info(self):
+        return [SourceInfo(self.path, SourceInfoMedium.WORKSPACE, 
SourceVersionType.DIGEST, self.get_unique_key())]

Review Comment:
   same comment as local



##########
src/buildstream/downloadablefilesource.py:
##########
@@ -270,6 +318,21 @@ def fetch(self):  # pylint: disable=arguments-differ
                 "File downloaded from {} has sha256sum '{}', not 
'{}'!".format(self.url, sha256, self.ref)
             )
 
+    def collect_source_info(self):
+        version_match = self._guess_pattern.search(self.original_url)
+        if not version_match:
+            version_guess = None
+        elif self._guess_pattern.groups == 0:
+            version_guess = version_match.group(0)
+        else:
+            version_guess = ".".join(version_match.groups())

Review Comment:
   Maybe we could have this version guessing logic in a function in utils?



##########
src/buildstream/source.py:
##########
@@ -262,10 +261,147 @@ def __init__(
 
 @dataclass
 class AliasSubstitution:
+    """AliasSubstitution()
+    An opaque data structure which may be passed through
+    :func:`SourceFetcher.fetch() <buildstream.source.SourceFetcher.fetch>` and 
in such cases
+    must be provided to :func:`Source.translate_url() 
<buildstream.source.Source.translate_url>`.
+    """
+
     _effective_alias: str
     _mirror: Union[SourceMirror, str]
 
 
+class SourceInfoMedium(FastEnum):
+    """
+    Indicates the medium in which the source is obtained
+
+    *Since: 2.5*
+    """
+
+    WORKSPACE = "workspace"
+    """
+    Files in an open workspace
+    """
+
+    LOCAL = "local"
+    """
+    Files stored locally in the project
+    """
+
+    REMOTE_FILE = "remote-file"
+    """
+    A remote file
+    """
+
+    GIT = "git"
+    """
+    A git repository
+    """
+
+
+class SourceVersionType(FastEnum):
+    """
+    Indicates the type of the version string
+
+    *Since: 2.5*
+    """
+
+    COMMIT = "commit"
+    """
+    A commit string which accurately represents a version in a source
+    code repository or VCS
+    """
+
+    SHA256 = "sha256"
+    """
+    An sha256 checksum
+    """
+
+    DIGEST = "digest"
+    """
+    A CAS digest representing the unique version of this source input

Review Comment:
   We could be more explicit about the `{hash}/{size}` format?



##########
src/buildstream/source.py:
##########
@@ -684,6 +821,19 @@ def is_cached(self) -> bool:
         """
         raise ImplError("Source plugin '{}' does not implement 
is_cached()".format(self.get_kind()))
 
+    def collect_source_info(self) -> Iterable[SourceInfo]:
+        """Get the :class:`.SourceInfo` objects describing this source
+
+        This method is guaranteed to only be called whenever
+        :func:`Source.is_resolved() <buildstream.source.Source.is_resolved>`
+        returns `True`.
+
+        Returns: the :class:`.SourceInfo` objects describing this source
+
+        *Since: 2.5*
+        """
+        raise ImplError("Source plugin '{}' does not implement 
collect_source_info()".format(self.get_kind()))

Review Comment:
   Could we have this collect source info from source fetchers by default? Any 
plugin that uses source fetchers will definitely want to implement source info 
in source fetchers.
   
   Of course, this is more for standardisation than anything, one can always 
implement it like you did in 
https://github.com/apache/buildstream-plugins/pull/87



##########
src/buildstream/_frontend/widget.py:
##########
@@ -437,6 +437,23 @@ def show_pipeline(self, dependencies, format_):
                 runtime_deps = [e._get_full_name() for e in 
element._dependencies(_Scope.RUN, recurse=False)]
                 line = p.fmt_subst(line, "runtime-deps", 
_yaml.roundtrip_dump_string(runtime_deps).rstrip("\n"))
 
+            # Source Information
+            if "%{source-info" in format_:
+
+                # Get all the SourceInfo objects
+                #
+                all_source_infos = []
+                for source in element.sources():

Review Comment:
   A was playing with this a little, and came up with this:
   ```suggestion
                   source_element = element._get_source_element()
                   for source in source_element.sources():
   ```
   
   given `Element._get_source_element()` gives the element that has the sources 
for this element, it really made sense. It gives a nice solution to the issue 
of actual sources of filter elements not being accessible.
   
   I'm thinking we can push it further and move this whole loop into an 
`Element.collect_source_info()` method. This way, something like the 
`collect_manifest` plugin wouldn't need to call `Element.sources()` at all (and 
we can deprecate it). It would be able to call `_get_source_element()` as 
needed, and users won't have to worry about it.
   
   Moving forward with this idea, we could even have the `compose` plugin 
collect sources from all build dependencies that contribute files to the final 
image. Though this might need some more thinking. 



##########
src/buildstream/source.py:
##########
@@ -262,10 +261,147 @@ def __init__(
 
 @dataclass
 class AliasSubstitution:
+    """AliasSubstitution()
+    An opaque data structure which may be passed through
+    :func:`SourceFetcher.fetch() <buildstream.source.SourceFetcher.fetch>` and 
in such cases
+    must be provided to :func:`Source.translate_url() 
<buildstream.source.Source.translate_url>`.
+    """
+
     _effective_alias: str
     _mirror: Union[SourceMirror, str]
 
 
+class SourceInfoMedium(FastEnum):
+    """
+    Indicates the medium in which the source is obtained
+
+    *Since: 2.5*
+    """
+
+    WORKSPACE = "workspace"
+    """
+    Files in an open workspace
+    """
+
+    LOCAL = "local"
+    """
+    Files stored locally in the project
+    """
+
+    REMOTE_FILE = "remote-file"
+    """
+    A remote file
+    """
+
+    GIT = "git"
+    """
+    A git repository
+    """
+
+
+class SourceVersionType(FastEnum):
+    """
+    Indicates the type of the version string
+
+    *Since: 2.5*
+    """
+
+    COMMIT = "commit"
+    """
+    A commit string which accurately represents a version in a source
+    code repository or VCS

Review Comment:
   Should we comment something about git specifically? In particular, our 
plugins can use either the commit sha1, or a the long version of the git 
describe.
   
   I think the current description fits both, which I think is a good thing. 
But maybe it's better to give an example.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to