On Wed, Mar 4, 2026 at 10:05 AM Stefano Tondo <[email protected]> wrote:
>
> Extract version information for Git-based source components in SPDX 3.0
> SBOMs to improve SBOM completeness and enable better supply chain tracking.
>
> Problem:
> Git repositories fetched as SRC_URI entries currently appear in SBOMs
> without version information (software_packageVersion is null). This makes
> it difficult to track which specific revision of a dependency was used,
> reducing SBOM usefulness for security and compliance tracking.
>
> Solution:
> - Extract SRCREV for Git sources and use it as packageVersion
> - Use fd.revision attribute (the resolved Git commit)
> - Fallback to SRCREV variable if fd.revision not available
> - Use first 12 characters as version (standard Git short hash)
> - Generate pkg:github PURLs for GitHub repositories (official PURL type)
> - Add comprehensive debug logging for troubleshooting
>
> Impact:
> - Git source components now have version information
> - GitHub repositories get proper PURLs (pkg:github/owner/repo@commit)
> - Enables tracking specific commit dependencies in SBOMs
>
> Signed-off-by: Stefano Tondo <[email protected]>
> ---
> meta/lib/oe/spdx30_tasks.py | 80 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 80 insertions(+)
>
> diff --git a/meta/lib/oe/spdx30_tasks.py b/meta/lib/oe/spdx30_tasks.py
> index 11945a622d..78d1dfd250 100644
> --- a/meta/lib/oe/spdx30_tasks.py
> +++ b/meta/lib/oe/spdx30_tasks.py
> @@ -569,6 +569,86 @@ def add_download_files(d, objset):
> )
> )
>
> + # Extract version and PURL for source packages
> + dep_version = None
> + dep_purl = None
> +
> + # For Git repositories, extract version from SRCREV
> + if fd.type == "git":
> + srcrev = None
> +
> + # Try to get SRCREV for this specific source URL
> + # Note: fd.revision (not fd.revisions) contains the resolved
> revision
> + if hasattr(fd, 'revision') and fd.revision:
> + srcrev = fd.revision
> + bb.debug(1, f"SPDX: Found fd.revision for {file_name}:
> {srcrev}")
> +
> + # Note: We intentionally do NOT fall back to
> d.getVar('SRCREV')
> + # because referencing SRCREV in BBIMPORTS-registered module
> code
> + # causes bitbake's signature generator to trace the SRCREV ->
> + # AUTOREV dependency chain during recipe finalization,
> triggering
> + # "AUTOREV/SRCPV set too late" errors for non-git temp
> recipes
> + # used by recipetool/devtool with HTTP sources.
> + # fd.revision is always available for git sources after
> fetch.
I'm fine with using fd.revision if it's correct.... but is this a bug
in devtool and recipetool?
> + if srcrev and srcrev not in ['${AUTOREV}', 'AUTOINC',
> 'INVALID']:
Minor: A set would be more efficient:
srcrev not in {"${AUTOREV}", "AUTOINC", "INVALID}:
> + # Use first 12 characters of Git commit as version
> (standard Git short hash)
> + dep_version = srcrev[:12] if len(srcrev) >= 12 else
> srcrev
Is it always 12, or is it "12 or however many are required to be
disambiguous" (which would require asking git)? I'd prefer to use the
full SHA-1 to prevent that.
> + bb.debug(1, f"SPDX: Extracted Git version for
> {file_name}: {dep_version}")
> +
> + # Generate PURL for Git hosting services
> + # Reference:
> https://github.com/package-url/purl-spec/blob/master/PURL-TYPES.rst
> + download_location = oe.spdx_common.fetch_data_to_uri(fd,
> fd.name)
> + if download_location and
> download_location.startswith('git+'):
> + git_url = download_location[4:] # Remove 'git+'
> prefix
> +
> + # Build Git PURL handlers from default + custom
> mappings
> + # Format: 'domain': ('purl_type', lambda to extract
> path)
> + # Can be extended in meta-siemens or other layers
> via SPDX_GIT_PURL_MAPPINGS
> + git_purl_handlers = {
> + 'github.com': ('pkg:github', lambda parts:
> f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2 else None),
You lambda is always the same for all of the entries in this hash
table; given that I don't see why we need it in the table.
> + # Note: pkg:gitlab is NOT in official PURL spec,
> so we omit it by default
> + # Other Git hosts can be added via
> SPDX_GIT_PURL_MAPPINGS
> + }
> +
> + # Allow layers to extend PURL mappings via
> SPDX_GIT_PURL_MAPPINGS variable
> + # Format: "domain1:purl_type1 domain2:purl_type2"
> + # Example: SPDX_GIT_PURL_MAPPINGS =
> "gitlab.com:pkg:gitlab git.example.com:pkg:generic"
> + custom_mappings = d.getVar('SPDX_GIT_PURL_MAPPINGS')
> + if custom_mappings:
> + for mapping in custom_mappings.split():
> + try:
> + domain, purl_type = mapping.split(':')
This would fail with your example of "gitlab.com:pkg:gitlab" because
it would split into 3 parts and you are only capturing 2. You probably
want `mappings.split(":", 1), and some tests
> + # Use simple path handler for custom
> domains
> + git_purl_handlers[domain] = (purl_type,
> lambda parts: f"{parts[0]}/{parts[1].replace('.git', '')}" if len(parts) >= 2
> else None)
> + bb.debug(2, f"SPDX: Added custom Git
> PURL mapping: {domain} -> {purl_type}")
> + except ValueError:
> + bb.warn(f"SPDX: Invalid
> SPDX_GIT_PURL_MAPPINGS entry: {mapping} (expected format: domain:purl_type)")
> +
> + for domain, (purl_type, path_handler) in
> git_purl_handlers.items():
> + if f'://{domain}/' in git_url or f'//{domain}/'
> in git_url:
> + # Extract path after domain
> + path_start = git_url.find(f'{domain}/') +
> len(f'{domain}/')
> + path = git_url[path_start:].split('/')
> + purl_path = path_handler(path)
I think using urllib can simplify this code.
> + if purl_path:
> + dep_purl =
> f"{purl_type}/{purl_path}@{srcrev}"
> + bb.debug(1, f"SPDX: Generated
> {purl_type} PURL: {dep_purl}")
> + break
> +
> + # Fallback: use parent package version if no other version found
> + if not dep_version:
> + pv = d.getVar('PV')
> + if pv and pv not in ['git', 'AUTOINC', 'INVALID', '${PV}']:
Minor: Use a set
> + dep_version = pv
> + bb.debug(1, f"SPDX: Using parent PV for {file_name}:
> {dep_version}")
> +
> + # Set version and PURL if extracted
> + if dep_version:
> + dl.software_packageVersion = dep_version
> +
> + if dep_purl:
> + dl.software_packageUrl = dep_purl
> +
> if fd.method.supports_checksum(fd):
> # TODO Need something better than hard coding this
> for checksum_id in ["sha256", "sha1"]:
> --
> 2.53.0
>
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#232621):
https://lists.openembedded.org/g/openembedded-core/message/232621
Mute This Topic: https://lists.openembedded.org/mt/118136154/21656
Group Owner: [email protected]
Unsubscribe: https://lists.openembedded.org/g/openembedded-core/unsub
[[email protected]]
-=-=-=-=-=-=-=-=-=-=-=-