Hi,

On 2/4/26 12:10 AM, Dmitry E. Oboukhov wrote:

>> One simple approach would be to package vendored dependencies as
>> separate .orig archives, ideally (if they come from git submodules)
>> with the git-archive commit ID annotation inside the archive. The
>> security tracker could import these annotations from all known
>> archives, map them to their origin projects, and then check if the
>> packaged commit ID is a descendant of a commit that introduces a
>> particular fix.
>
> How would this approach work for, say, Python packages listed in
> requirements.txt? Would we download them and package them as
> separate .orig archives?

Yes. Whatever we do, we need all dependencies to be available after unpacking the source package and installing all listed dependencies, so either we merge all the archives into one big blob, or we keep them separate and attach them all to the same dsc.
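
For what it's worth, source format "3.0 (quilt)" already supports the latter: additional upstream tarballs carry an .orig-<component> suffix and are unpacked into a subdirectory of that name. A hypothetical file list (all names made up) could look like:

    foo_1.2.orig.tar.xz              # the main upstream source
    foo_1.2.orig-requests.tar.xz     # unpacked into ./requests/
    foo_1.2.orig-urllib3.tar.xz      # unpacked into ./urllib3/
    foo_1.2-1.debian.tar.xz          # the Debian packaging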

In the specific case of Python, where there is a strong culture of building somewhat stable APIs, I expect that vendoring dependencies will be the exception rather than the norm anyway.

The most problematic ecosystems, in my opinion, are cargo, npm and golang, but we mostly lack insight here -- we have no statistics on how much duplication exists inside the archive, how many of these duplicates could be avoided by folding them together, and how many binary packages we could save in the other direction by vendoring libraries that only have a single user and are statically linked anyway. That's why my proposal goes towards making these dependencies transparent and trackable first.
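
To illustrate what gathering such statistics could look like, here is a rough sketch in Python that counts how often the same crate version recurs in Cargo.lock files across a tree of unpacked sources (the directory layout is an assumption on my part; npm and golang equivalents would parse package-lock.json and go.sum instead):

    # Rough sketch: count duplicated vendored crates across a tree of
    # unpacked source packages by reading their Cargo.lock files.
    # Requires Python 3.11+ for tomllib.
    import collections
    import pathlib
    import sys
    import tomllib

    counts = collections.Counter()
    for lock in pathlib.Path(sys.argv[1]).rglob("Cargo.lock"):
        with open(lock, "rb") as f:
            data = tomllib.load(f)
        # Cargo.lock records one [[package]] entry per resolved crate.
        for pkg in data.get("package", []):
            counts[(pkg["name"], pkg["version"])] += 1

    for (name, version), n in counts.most_common():
        if n > 1:
            print(f"{name} {version}: vendored {n} times")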

The rule against vendoring dependencies was informed by the effort it took to clean up the hundreds of embedded copies of zlib and make all packages use the common implementation. If we are to relax the rules again, we need to make sure that we can at least find any embedded copies and quickly determine their version and security status.
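
As a trivial illustration of what "find and quickly identify" could mean: zlib.h defines ZLIB_VERSION as a string literal, so even a naive scanner can report embedded copies together with their version (a sketch, not a real tool):

    # Sketch: locate embedded copies of zlib in an unpacked source
    # tree and report the version string from each zlib.h found.
    import pathlib
    import re
    import sys

    version_re = re.compile(r'#\s*define\s+ZLIB_VERSION\s+"([^"]+)"')
    for header in pathlib.Path(sys.argv[1]).rglob("zlib.h"):
        match = version_re.search(header.read_text(errors="replace"))
        if match:
            print(f"{header}: embedded zlib {match.group(1)}")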

This will likely require an external archive scanner, but we already have a few of those, such as the one that scans for undeclared file conflicts.

Thinking in lintian unpack levels: the less a package needs to be processed by this scanner, the better -- so an "XS-Ecosystem" tag would find its way into the Sources file, and a Python-specific scanner could disregard any packages without a requirements.txt file before even downloading them.
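
(The XS- prefix makes dpkg copy the field into the dsc, where it is stripped down to just "Ecosystem" and then propagated into the Sources index, same as XS-Go-Import-Path is today. A sketch of the pre-filtering step using python-debian, with the field name being my proposal rather than anything that exists:)

    # Sketch: pre-filter a Sources index by a hypothetical "Ecosystem"
    # field (what an XS-Ecosystem tag in debian/control would become),
    # so an ecosystem-specific scanner only downloads real candidates.
    import sys

    from debian import deb822  # python3-debian

    with open(sys.argv[1]) as f:
        for src in deb822.Sources.iter_paragraphs(f):
            if src.get("Ecosystem") == "python":
                print(src["Package"], src["Version"])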

Likewise, the tags I proposed for scanning git ancestry (which are ecosystem-agnostic) would work either from the Sources file, if we forward these tags there, or we could generate a generic "Has-Git-Info" tag and make the scanner download the dsc files, which is still somewhat lightweight.
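
The annotation itself comes for free when upstream uses git archive together with an export-subst attribute (a tracked file containing $Format:%H$ gets the commit ID substituted on archive creation). Given that recorded commit and a commit known to contain a fix, the tracker-side check boils down to a single git invocation, roughly like this (repo path and commit IDs are placeholders):

    # Sketch: decide whether a vendored dependency already contains a
    # fix, by checking whether the fixing commit is an ancestor of the
    # commit ID recorded in the archive. Assumes a local clone of the
    # origin project.
    import subprocess

    def contains_fix(repo_dir, packaged_commit, fix_commit):
        # "git merge-base --is-ancestor A B" exits 0 iff A is an
        # ancestor of B.
        result = subprocess.run(
            ["git", "-C", repo_dir, "merge-base", "--is-ancestor",
             fix_commit, packaged_commit])
        return result.returncode == 0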

IMO, we should also include API/ABI stability as a factor in deciding whether we actually want to release a package as part of a stable distribution.

We have several upstreams who are bluntly telling us that they are unwilling to support users running Debian stable, so at this point the decision to release such a package as part of a stable release is a commitment by the Debian maintainer to provide this user support.

For a package with a lot of version-pinned dependencies, that commitment can be massive, and should not be taken on lightly; on the other hand, for a lot of these packages vendoring is the only approach that makes this feasible in the first place.

   Simon
