On Wed, Mar 4, 2026 at 10:05 AM Stefano Tondo <[email protected]> wrote:
>
> Extract version information for Git-based source components in SPDX 3.0
> SBOMs to improve SBOM completeness and enable better supply chain tracking.
>
> Problem:
> Git repositories fetched as SRC_URI entries currently appear in SBOMs
> without version information (software_packageVersion is null). This makes
> it difficult to track which specific revision of a dependency was used,
> reducing SBOM usefulness for security and compliance tracking.
>
> Solution:
> - Extract SRCREV for Git sources and use it as packageVersion
> - Use fd.revision attribute (the resolved Git commit)
> - Fallback to SRCREV variable if fd.revision not available
> - Use first 12 characters as version (standard Git short hash)
> - Generate pkg:github PURLs for GitHub repositories (official PURL type)
> - Add comprehensive debug logging for troubleshooting
>
> Impact:
> - Git source components now have version information
> - GitHub repositories get proper PURLs (pkg:github/owner/repo@commit)
> - Enables tracking specific commit dependencies in SBOMs
>
> Signed-off-by: Stefano Tondo <[email protected]>
> ---
>  meta/lib/oe/spdx30_tasks.py | 80 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 80 insertions(+)
>
> diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
> index 11945a622d..78d1dfd250 100644
> --- a/meta/lib/oe/spdx30_tasks.py
> +++ b/meta/lib/oe/spdx30_tasks.py
> @@ -569,6 +569,86 @@ def add_download_files(d, objset):
>                  )
>              )
>
> +            # Extract version and PURL for source packages
> +            dep_version = None
> +            dep_purl = None
> +
> +            # For Git repositories, extract version from SRCREV
> +            if fd.type == "git":
> +                srcrev = None
> +
> +                # Try to get SRCREV for this specific source URL
> +                # Note: fd.revision (not fd.revisions) contains the resolved 
> revision
> +                if hasattr(fd, 'revision') and fd.revision:
> +                    srcrev = fd.revision
> +                    bb.debug(1, f"SPDX: Found fd.revision for {file_name}: 
> {srcrev}")
> +
> +                # Note: We intentionally do NOT fall back to 
> d.getVar('SRCREV')
> +                # because referencing SRCREV in BBIMPORTS-registered module 
> code
> +                # causes bitbake's signature generator to trace the SRCREV ->
> +                # AUTOREV dependency chain during recipe finalization, 
> triggering
> +                # "AUTOREV/SRCPV set too late" errors for non-git temp 
> recipes
> +                # used by recipetool/devtool with HTTP sources.
> +                # fd.revision is always available for git sources after 
> fetch.

I'm fine with using fd.revision if it's correct.... but is this a bug
in devtool and recipetool?

> +                if srcrev and srcrev not in ['${AUTOREV}', 'AUTOINC', 
> 'INVALID']:

Minor: A set would be more efficient:

  srcrev not in {"${AUTOREV}", "AUTOINC", "INVALID"}:

> +                    # Use first 12 characters of Git commit as version 
> (standard Git short hash)
> +                    dep_version = srcrev[:12] if len(srcrev) >= 12 else 
> srcrev

Is it always 12, or is it "12 or however many are required to be
unambiguous" (which would require asking git)? I'd prefer to use the
full SHA-1 so the version can never be ambiguous.
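
Roughly what I have in mind, as an untested sketch. The helper name
version_from_srcrev and the two hashes are made up for illustration;
the placeholder values are the ones from your patch:

```python
# Placeholder values that do not name a real commit (taken from the patch).
PLACEHOLDER_REVS = frozenset({"${AUTOREV}", "AUTOINC", "INVALID"})


def version_from_srcrev(srcrev):
    """Return the full commit hash as the version, or None for placeholders."""
    if not srcrev or srcrev in PLACEHOLDER_REVS:
        return None
    return srcrev


# Two made-up hashes that share their first 12 characters.
rev_a = "0123456789ab" + "c" * 28
rev_b = "0123456789ab" + "d" * 28
```

With srcrev[:12] both made-up revisions would get the same version
string; with the full hash they stay distinct.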

> +                    bb.debug(1, f"SPDX: Extracted Git version for 
> {file_name}: {dep_version}")
> +
> +                    # Generate PURL for Git hosting services
> +                    # Reference: 
> https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst
> +                    download_location = oe.spdx_common.fetch_data_to_uri(fd, 
> fd.name)
> +                    if download_location and 
> download_location.startswith('git+'):
> +                        git_url = download_location[4:]  # Remove 'git+' 
> prefix
> +
> +                        # Build Git PURL handlers from default + custom 
> mappings
> +                        # Format: 'domain': ('purl_type', lambda to extract 
> path)
> +                        # Can be extended in meta-siemens or other layers 
> via SPDX_GIT_PURL_MAPPINGS
> +                        git_purl_handlers = {
> +                            'github.com': ('pkg:github', lambda parts: 
> f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2 else None),

Your lambda is the same for every entry in this table, so I don't see
why it needs to be stored in the table at all.
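
For example (untested sketch; GIT_PURL_TYPES and owner_repo are names
I made up, not anything that exists in oe-core today), the table could
map the host to the PURL type only, with one shared function for the
path:

```python
# Host -> PURL type. The path handling is identical for every host, so it
# lives in one function instead of a per-entry lambda.
GIT_PURL_TYPES = {
    "github.com": "pkg:github",
}


def owner_repo(path_parts):
    """Return 'owner/repo' from URL path segments, or None if too short."""
    if len(path_parts) < 2:
        return None
    owner, repo = path_parts[0], path_parts[1]
    # Strip only a trailing ".git" suffix.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return f"{owner}/{repo}"
```

Note this strips only a trailing ".git", whereas the replace() in the
patch removes ".git" wherever it appears in the repository name.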

> +                            # Note: pkg:gitlab is NOT in official PURL spec, 
> so we omit it by default
> +                            # Other Git hosts can be added via 
> SPDX_GIT_PURL_MAPPINGS
> +                        }
> +
> +                        # Allow layers to extend PURL mappings via 
> SPDX_GIT_PURL_MAPPINGS variable
> +                        # Format: "domain1:purl_type1 domain2:purl_type2"
> +                        # Example: SPDX_GIT_PURL_MAPPINGS = 
> "gitlab.com:pkg:gitlab git.example.com:pkg:generic"
> +                        custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
> +                        if custom_mappings:
> +                            for mapping in custom_mappings.split():
> +                                try:
> +                                    domain, purl_type = mapping.split(':')

This would fail with your example of "gitlab.com:pkg:gitlab" because
it would split into 3 parts and you are only capturing 2. You probably
want `mapping.split(":", 1)`, and some tests.
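
A standalone sketch of the parsing, so it can be unit tested without a
datastore. The function name parse_purl_mappings and the
(mappings, invalid) return shape are my own invention:

```python
def parse_purl_mappings(value):
    """Parse space-separated 'domain:purl_type' entries.

    Splits on the first ':' only, so a purl_type such as 'pkg:gitlab'
    survives intact. Returns (mappings, invalid_entries).
    """
    mappings, invalid = {}, []
    for entry in (value or "").split():
        try:
            domain, purl_type = entry.split(":", 1)
        except ValueError:
            # No ':' in the entry at all.
            invalid.append(entry)
            continue
        if not domain or not purl_type:
            invalid.append(entry)
            continue
        mappings[domain] = purl_type
    return mappings, invalid
```

The caller would then emit the bb.warn() for anything in the invalid
list, which keeps the parsing itself free of bitbake dependencies.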

> +                                    # Use simple path handler for custom 
> domains
> +                                    git_purl_handlers[domain] = (purl_type, 
> lambda parts: f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2 
> else None)
> +                                    bb.debug(2, f"SPDX: Added custom Git 
> PURL mapping: {domain} -> {purl_type}")
> +                                except ValueError:
> +                                    bb.warn(f"SPDX: Invalid 
> SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
> +
> +                        for domain, (purl_type, path_handler) in 
> git_purl_handlers.items():
> +                            if f'://{domain}/' in git_url or f'//{domain}/' 
> in git_url:
> +                                # Extract path after domain
> +                                path_start = git_url.find(f'{domain}/') + 
> len(f'{domain}/')
> +                                path = git_url[path_start:].split('/')
> +                                purl_path = path_handler(path)

I think using urllib can simplify this code.
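
Something like this, as a rough sketch (host_and_path is a made-up
helper name, and I'm assuming git_url is an ordinary scheme://host/path
URL once the "git+" prefix has been removed):

```python
from urllib.parse import urlsplit


def host_and_path(git_url):
    """Split a URL into (hostname, list of non-empty path segments)."""
    parts = urlsplit(git_url)
    segments = [s for s in parts.path.split("/") if s]
    return parts.hostname, segments
```

The hostname can then be looked up in the mapping dict with .get()
instead of looping over every domain and searching for substrings.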

> +                                if purl_path:
> +                                    dep_purl = 
> f"{purl_type}/{purl_path}@{srcrev}"
> +                                    bb.debug(1, f"SPDX: Generated 
> {purl_type} PURL: {dep_purl}")
> +                                break
> +
> +            # Fallback: use parent package version if no other version found
> +            if not dep_version:
> +                pv = d.getVar('PV')
> +                if pv and pv not in ['git', 'AUTOINC', 'INVALID', '${PV}']:

Minor: Use a set

> +                    dep_version = pv
> +                    bb.debug(1, f"SPDX: Using parent PV for {file_name}: 
> {dep_version}")
> +
> +            # Set version and PURL if extracted
> +            if dep_version:
> +                dl.software_packageVersion = dep_version
> +
> +            if dep_purl:
> +                dl.software_packageUrl = dep_purl
> +
>              if fd.method.supports_checksum(fd):
>                  # TODO Need something better than hard coding this
>                  for checksum_id in ["sha256", "sha1"]:
> --
> 2.53.0
>