I think what you are getting at here is that the same file is present
in multiple different "packages", so all references to it are
interchangeable?

I'm fine if we can figure out a reasonable way to do that, but I don't
think this is the correct approach. A better option would be to simply
reference the SPDX ID of the previously described file instead of
making a new one each time. I don't really like the "magic" in
jsonld_hash_path(), which hides what we are actually after: creating
only a single file element and referencing it multiple times.
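
To illustrate what I mean by "magic", here is a standalone copy of the
patched function from the diff below. The example IDs are made up; the
point is only that IDs differing in the trailing counter silently
collapse to the same hash path:

```python
import hashlib
import os
import re
from pathlib import Path


def jsonld_hash_path(_id):
    # Patched behaviour from the quoted diff: drop the trailing counter
    # for IDs ending in /sourcefile/N, /sysroot/N or /file/N.
    if re.match(r".*/(sourcefile|sysroot|file)/\w+$", _id):
        _id = os.path.dirname(_id)

    h = hashlib.sha256(_id.encode("utf-8")).hexdigest()
    return Path("by-spdxid-hash") / h[:2], h


# Hypothetical spdxId prefix, for illustration only.
base = "http://spdx.org/spdxdocs/example-recipe/abc123/sourcefile"
first = jsonld_hash_path(base + "/1")
second = jsonld_hash_path(base + "/2")
other = "http://spdx.org/spdxdocs/example-recipe/abc123/build"

print(first == second)  # two distinct elements, one hash path
```

Nothing at the call site tells you that two different elements now map
to one link.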

This would also conveniently solve the license problem since only one
file element would be created per hash.
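
Roughly what I have in mind, as a minimal sketch only: FileElement and
FileRegistry are hypothetical names, not anything in sbom30.py, and the
ID scheme is an assumption made up for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileElement:
    # Hypothetical stand-in for a software_File element.
    spdx_id: str
    sha256: str


class FileRegistry:
    """Create one file element per content hash; reuse its SPDX ID after."""

    def __init__(self, namespace):
        self.namespace = namespace
        self.by_hash = {}

    def spdx_id_for(self, content):
        digest = hashlib.sha256(content).hexdigest()
        elem = self.by_hash.get(digest)
        if elem is None:
            # First time this content is seen: create the element.
            elem = FileElement(f"{self.namespace}/file/{digest}", digest)
            self.by_hash[digest] = elem
        # Later callers only get a reference to the existing element.
        return elem.spdx_id


# Hypothetical namespace, for illustration only.
registry = FileRegistry("http://spdx.org/spdxdocs/example")
a = registry.spdx_id_for(b"same bytes")
b = registry.spdx_id_for(b"same bytes")
c = registry.spdx_id_for(b"other bytes")
```

Here a and b are the same ID, so any license data attached to that
element exists once.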

However, I think the reason it's done the way it is now is that each
instance of the file is at a different path, so you'd lose that
information by combining them all into the same file element. You
might still be able to deduplicate the license information, though.
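
A rough sketch of that middle ground, assuming license information
depends only on file content: keep one element per path, but look the
license up once per hash. LicenseCache, the dict-based elements, and
the "MIT" result are placeholders invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LicenseCache:
    """Hypothetical cache holding one license result per content hash."""

    by_hash: dict = field(default_factory=dict)
    scans: int = 0

    def license_for(self, sha256, scan):
        if sha256 not in self.by_hash:
            # Only scan the first time a given hash is seen.
            self.scans += 1
            self.by_hash[sha256] = scan()
        return self.by_hash[sha256]


cache = LicenseCache()

# (path, content hash) pairs; the first two are the same file content
# installed at different paths.
files = [("usr/bin/a", "h1"), ("usr/share/a", "h1"), ("usr/bin/b", "h2")]

# One element per path, so the path information is preserved.
elements = [
    {
        "path": path,
        "sha256": digest,
        "license": cache.license_for(digest, lambda: "MIT"),
    }
    for path, digest in files
]
```

Three elements come out, but only two license lookups happen.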

On Sat, Nov 9, 2024 at 8:07 PM Hongxu Jia <[email protected]> wrote:
>
> In order to support all in-scope SPDX data within a single
> JSON-LD file for SPDX 3.0.1, Yocto's SBOM:
> - In native/target/nativesdk recipes, creates an spdxid-hash symlink
>   for each element that points to the JSON-LD file containing the
>   element details;
> - In image recipes, uses the spdxid-hash symlinks to collect element
>   details from the various JSON-LD files
>
> When SPDX_INCLUDE_SOURCES = "1", it adds sources to the JSON-LD file
> and creates 2N+ spdxid-hash symlinks for N source files
> (N for software_File, N for the hasDeclaredLicense Relationship).
>
> For large numbers of source files, each extra symlink -> real file
> occupies one more inode (per file), which needs a slot in the OS's
> inode cache. In this situation, disk performance is slow and inodes
> are used up quickly.
>
> When add_package_files is used to add source files to the JSON-LD
> file, the spdxid-hash symlinks for those source files all point to
> the same JSON-LD file. Given the format of the spdxId:
>
> - spdxId of source file:
> http://spdx.org/spdxdocs/shadow-10e66933-65cf-5a2d-9a1d-99b12a405441/0838759b8d71923d250a0813dda7356ffd309576115bbf8ed7e266cf4aed86a5/sourcefile/1
>
> Remove the count number ('/1') from the spdxId suffix, so that all
> source files in one recipe share one spdxid-hash symlink.
>
> The same applies to sysroot and package files:
>
> - spdxId of sysroot file:
> http://spdx.org/spdxdocs/shadow-10e66933-65cf-5a2d-9a1d-99b12a405441/0838759b8d71923d250a0813dda7356ffd309576115bbf8ed7e266cf4aed86a5/sysroot/1
>
> - spdxId of package file:
> http://spdx.org/spdxdocs/shadow-10e66933-65cf-5a2d-9a1d-99b12a405441/0838759b8d71923d250a0813dda7356ffd309576115bbf8ed7e266cf4aed86a5/package/shadow-src/file/1
>
> Building core-image-minimal with and without this commit, the number
> of spdxid-hash symlinks drops from 7,281,824 to 70,508.
>
> echo 'SPDX_INCLUDE_SOURCES = "1"' >> local.conf
>
> With this commit:
> $ time bitbake core-image-minimal
> real    95m6.960s
> user    0m22.832s
> sys     0m4.087s
>
> $ find tmp/deploy/spdx/3.0.1/*/by-spdxid-hash/ -name "*.spdx.json" |wc -l
> 70508
>
> Without this commit:
> $ time bitbake core-image-minimal
> real    100m17.769s
> user    0m24.516s
> sys     0m4.334s
>
> $ find tmp/deploy/spdx/3.0.1/*/by-spdxid-hash -name "*.json" |wc -l
> 7281824
>
> Signed-off-by: Hongxu Jia <[email protected]>
> ---
>  meta/lib/oe/sbom30.py | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/meta/lib/oe/sbom30.py b/meta/lib/oe/sbom30.py
> index e3a9428668..4efeaae3a0 100644
> --- a/meta/lib/oe/sbom30.py
> +++ b/meta/lib/oe/sbom30.py
> @@ -911,6 +911,10 @@ def jsonld_arch_path(d, arch, subdir, name, deploydir=None):
>
>
>  def jsonld_hash_path(_id):
> +    # For the spdId added by add_package_files, remove suffix count number
> +    if re.match(r".*/(sourcefile|sysroot|file)/\w+$", _id):
> +        _id = os.path.dirname(_id)
> +
>      h = hashlib.sha256(_id.encode("utf-8")).hexdigest()
>
>      return Path("by-spdxid-hash") / h[:2], h
> @@ -992,6 +996,11 @@ def write_recipe_jsonld_doc(
>              *hash_path,
>              deploydir=deploydir,
>          )
> +
> +        # Return if expected symlink exists
> +        if link_name.is_symlink() and link_name.resolve() == dest:
> +            return hash_path[-1]
> +
>          try:
>              link_name.parent.mkdir(exist_ok=True, parents=True)
>              link_name.symlink_to(os.path.relpath(dest, link_name.parent))
> --
> 2.25.1
>