On 13.02.2025 at 21:32, Richard Purdie wrote:
On Thu, 2025-02-13 at 17:33 +0100, Stefan Herbrechtsmeier wrote:
On 13.02.2025 at 11:43, Stefan Herbrechtsmeier wrote:
Most of the concerns I've seen are about how easy it is to understand
what is going on behind the scenes. The move of code to OE and
splitting everything into multiple tasks/stages does do that to some
extent but it does it in a way which I think is going to create a new
and different set of problems.
Okay, but please keep in mind that some of my OE patches are
reasonably independent of the native bitbake fetcher and it is
possible to integrate the steps from the early class into the fetch
task.
I appreciate that and I appreciate the desire to push things into OE as
it appears easier. It can lead to much looser APIs and less structured
code and I'm wary of it here as we create a two-layered system which I
think will be harder to understand (and hence harder to debug and use).
I'm therefore wondering if there is a different way. The changes I'm
wondering about would be to:
a) embrace the single SRC_URI entry
b) require a checksum of the internal "URL list" that is included in
SRC_URI, much in the same way that we have checksums of tarballs.
The list isn't fixed because it depends on the configured package
manager proxy or registry. We have to remove this feature. But the
user could use a PREMIRROR to redirect the upstream proxy to their
private proxy.
If that is true we have a huge problem.
By list I mean a list of something like (component, version) pairs
where component uniquely identifies the component and version is a
specific verifiable version of that component. If we can create that
list, we can checksum it and use it as above. If we can't create that
list, we have no idea what is in our builds and we may as well give up
as it isn't reproducible.
What does reproducible mean? The lock file ensures that you always use
the same dependencies.
go.sum:

cloud.google.com/go v0.110.0 h1:Zc8gqp3+a9/Eyph2KDmcGaPtbKRIoqq4YTlL4NMD0Ys=

Cargo.lock:

[[package]]
name = "addr2line"
version = "0.24.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dfbe277e56a376000877090da837660b4427aad530e3028d44e0bffe4f89a1c1"
dependencies = [
    "gimli",
]

package-lock.json:

"node_modules/@adobe/css-tools": {
    "version": "4.3.3",
    "resolved": "https://registry.npmjs.org/@adobe/css-tools/-/css-tools-4.3.3.tgz",
    "integrity": "sha512-rE0Pygv0sEZ4vBWHlAgJLGDU7Pm8xoO6p3wsEceb7GYAjScrOHpEo8KK/eVkAcnSM+slAEtXjA2JpdjLp4fJQQ==",
    "dev": true
},
The URL could be extracted from the lock file or could be generated
based on the name and version or git revision. The integrity of the
dependencies is ensured via a checksum or git revision. This means the
same lock file always generates the same downloads and is fully
reproducible. We only need to ensure the integrity of the lock file.
This is already ensured by the existing fetchers.
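To make the generation step concrete, here is an illustrative sketch of deriving deterministic download URLs from lock file entries. The URL templates are the public registry defaults for crates.io and npm (a configured proxy or mirror would replace them); the names and versions come from the lock file examples above, and the helper names are made up for illustration:

```python
# Sketch only: derive deterministic download URLs from lock file entries.
# The URL templates are the public registry defaults; a configured
# proxy/mirror/PREMIRROR would substitute a different base URL.

def crate_url(name, version):
    # crates.io serves every crate at a fixed, version-addressed path
    return f"https://crates.io/api/v1/crates/{name}/{version}/download"

def npm_url(name, version):
    # npm keeps the scope in the directory but strips it from the tarball name
    base = name.split("/")[-1]
    return f"https://registry.npmjs.org/{name}/-/{base}-{version}.tgz"

# npm_url("@adobe/css-tools", "4.3.3") reproduces the "resolved" field
# in the package-lock.json example above.
```

Given the same lock file, this mapping always yields the same URLs, which is what makes the download set reproducible.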
At the moment the download URL could be manipulated by a variable
because it is common to change the registry / proxy / server. The
"https://registry.npmjs.org" or
"registry+https://github.com/rust-lang/crates.io-index" are placeholders
and could be replaced with a local registry / proxy / server. Additionally,
the downloads are unpacked into a subfolder of the project. Therefore
the SRC_URI parameters depend on S.
I already expand the SRC_URIs with a name and version (pn and pv) to
enrich the SBOM with the name and version of the dependencies. Because
the name parameter is already in use and the recipe could use a
dependency in different versions, we need to append the version to the
name. Either we remove the version from the name to recover the real
name, use another parameter for the name, or use the name and version as
a variable flag instead of the name alone.
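As an illustration, such an expanded entry could look like the following (the exact parameter names and syntax are an assumption for illustration, not an existing API; only the pn/pv idea is from the description above):

```
SRC_URI += "https://crates.io/api/v1/crates/addr2line/0.24.2/download;name=addr2line-0.24.2;pn=addr2line;pv=0.24.2"
```

Here the version is appended to the name parameter to keep it unique, while pn and pv carry the clean component name and version for the SBOM.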
It is already possible to create a list of the dependencies.
For better or worse, we have low trust in the underlying tools to
get this right (they are getting better).
We don't need to trust the tools. We parse the lock file and enrich
it with fixed values. The resolve is deterministic. The output only
depends on the resolve function, variable values and lock file
content.
This assumes the "resolve" always does the same thing. I'm afraid
experience shows these can have issues. I'd much rather we have some
kind of backup in the system which tells whether we did get the same
resolution which is what this checksum represents.
This sounds like a problem with the test coverage. But this should be
solved by high test coverage of the resolve function.
c) if the checksum doesn't match, we know something went wrong and
error out
Can you please elaborate on this point. We already check the integrity
of the lock file and we deterministically resolve the SRC_URIs.
See above. I'd like to know that the list of components and versions we
resolve everything to matches what we expect it to look like.
We can extract the dependency name and version from the SRC_URIs to hash
it, but in the case of npm the destination folder could also influence the
build.
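The checksum idea in point b)/c) can be sketched as follows: hash the resolved (name, version) list so a recipe can pin it the same way it pins a tarball checksum. This is only an illustration of the concept; the function and any variable holding the result are assumptions, not an existing bitbake API:

```python
import hashlib

# Sketch only: checksum the resolved component list so a recipe can pin
# it like a tarball checksum. If a later resolve produces a different
# list, the hash changes and the build can error out.

def component_list_checksum(components):
    # Sort so the hash is independent of resolve order
    canonical = "\n".join(f"{name} {version}"
                          for name, version in sorted(components))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

deps = [("addr2line", "0.24.2"), ("@adobe/css-tools", "4.3.3")]
print(component_list_checksum(deps))
```

Because the input is canonicalised (sorted, fixed separator), any two runs that resolve to the same components produce the same hash regardless of resolution order.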
d) require the new modules to write the URL list into a known location
as part of unpack
Why is this needed? The generated SRC_URIs could be resolved via
fetcher.expanded_urldata().
If someone is trying to debug what the code did or resolved things to,
suggesting they run python functions to work it out will be a poor user
experience. If on the other hand they know the result is always stored
in WORKDIR/xyz/ABC, they know where and what to look at.
The user experience of using this code will make or break its
adoption.
Can we instead extend the lock file in the download folder with the
SRC_URIs? This would allow us to use the file as a cache and bypass the
resolve.
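Such a cache could work roughly like this sketch: store the resolved SRC_URI list next to the downloads, keyed by the lock file checksum, so later builds skip the resolve when the lock file is unchanged. The file name, layout, and function names are all assumptions for illustration:

```python
import json
import os

# Sketch only: cache the resolved SRC_URI list in the download directory,
# keyed by the lock file checksum. A stale cache is impossible because a
# changed lock file has a different checksum and therefore a different key.

def save_resolved(dl_dir, lockfile_checksum, src_uris):
    path = os.path.join(dl_dir, f"{lockfile_checksum}.resolved.json")
    with open(path, "w") as f:
        json.dump({"lockfile": lockfile_checksum, "src_uris": src_uris}, f)
    return path

def load_resolved(dl_dir, lockfile_checksum):
    # Return None on a cache miss so the caller falls back to the resolve
    path = os.path.join(dl_dir, f"{lockfile_checksum}.resolved.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)["src_uris"]
```

This would also give the user a fixed, inspectable location for the resolved list, addressing the debuggability concern in point d).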
e) add the ability to add custom hooks in the fetch process to handle
the cases of needing to alter the flow for patching the components
list
I'm afraid this will be complicated since patches are applied in S and
not in UNPACKDIR.
Then we should work out how to handle that. We could allow the recipes
to specify the top level dir to apply patches from for example?
Is it okay to use variables like S inside the fetcher or should we pass
everything via SRC_URI parameters to the fetcher?
The fetcher only knows its SRC_URI. This means we have to add the
patches to the SRC_URI parameters or pass them via an additional function
parameter to the download function.
f) create new tools that allow the fetcher to be stepped through and
for example partially run, or run with clear debug output showing what
was happening at each stage (show the list of components?). This may be
standalone tools, maybe a devtool module, I don't know. We may want to
make the fetch/unpack logs more useful in general as right now you
don't get much useful data about what it is doing.
If we do those things, where does that get us? How much buy-in do our
different stakeholders have?
FWIW I am leaning towards having this code in the bitbake fetcher as a
first class citizen as to do otherwise is going to create layers of
abstraction and we probably have enough of those already.
The advantage is that the SRC_URI still contains the dependencies if
you expand the urldata. On the other hand, the integration of the
patches in the fetcher sounds complicated.
Patches would stay where they are in the system in do_patch and use the
code in OE-Core. I'm just thinking we could add some hooks in the fetch
process to allow adjustment of things like the resolved component list.
It doesn't have to be a patch, it could be a function passed data.
Please take a look at the following patch to fix a security issue in
librsvg:
https://gitlab.gnome.org/GNOME/librsvg/-/commit/aaaa6b68b024b2adbfdf5f8493dfce1f60e5e331
How should the integration into the recipe look, and how long would it
take to integrate the changes into the recipe? In the case of my OE
series you could simply apply the patch and mark it as early:
SRC_URI += "file://0001-update-url-crate-to-get-an-updated-idna-rustsec-2024.patch;early=1"
We need support for plain patches. Otherwise it is impossible to
backport patches to fix a security issue or to reuse common tools.
The advantage of the OE based implementation is the possibility to patch
the lock file. The bitbake based implementation only supports a complete
local lock file to manipulate the dependencies.
Any OE-specific solution (.inc or hook) is useless because of the
complexity of the dependency update. The lock file is a temporary flat
view onto a dependency tree with interdependencies, replacements,
constraints, compatibilities and other metadata. A change without this
information could lead to anything. It's like the list of git
repositories of recursive git submodules. It isn't useful to change the
revision of an arbitrary git repository in the list because the change
could recursively influence other git submodules.