Eddie Chapman wrote:
> Michał Górny wrote:
>
>> On Sat, 2024-03-30 at 14:57 +0000, Eddie Chapman wrote:
>>
>>
>>> Note, I'm not advocating ripping xz-utils out of tree, all I'm saying
>>>  is wouldn't it be nice if there were at least 2 alternatives to
>>> choose from? That doesn't have to be disruptive in any way, people who
>>> wish to continue using and trusting xz-utils should be able to
>>> continue to do so without any friction whatsoever.
>>
>> So, you're basically saying we should go out of our way, recompress all
>>  distfiles using two alternative compression formats, increase mirror
>> load four times and add a lot of complexity to ebuilds, right?
>>
>> --
>> Best regards,
>> Michał Górny
>>
> Yes that's a very good point, that was something I was wondering in
> weighing up both sides, what the costs would be practically, as I don't
> know the realities of running Gentoo infrastructure. And maybe the cost
> is just too high a price to pay.
>
> I wonder if increased use of git repos rather than distributed tarballs
> could be part of a solution to those issues, although that could put quite
>  a storage burden on every user. Unless they were all shallow git pulls
> and the user could optionally choose to tar up the git directory after
> clone with compression.  But yes granted then there is even more ebuild
> complexity.
>

I've been thinking a little about how Gentoo without
compression/decompression of distfiles could work, as a feature, without
any impact on the existing world order, and no increased stress on Gentoo
infra. I was wondering how palatable the following idea might be to others
...

The basis of the idea is to add a feature to Portage which would let a
person optionally indicate in make.conf that, whenever a path in SRC_URI
resolves to a file with a compression extension (.gz, .bz2, .xz, etc.),
Portage should attempt to fetch it without the compression extension.

So as an example, let's take sys-apps/pciutils, which currently has:
SRC_URI="https://mj.ucw.cz/download/linux/pci/${P}.tar.gz"

The feature would tell Portage to simply translate this to:
SRC_URI="https://mj.ucw.cz/download/linux/pci/${P}.tar"

So perhaps it could be a flag in FEATURES= called something like
"strip_dist_comp", or maybe someone has a better idea for the name.
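
To make the translation concrete, here is a rough sketch of what it might
look like. This is not actual Portage code: the function name, the way
FEATURES is passed in, and the exact list of extensions are all my own
assumptions for illustration, and "pciutils-1.0" is just a placeholder
file name.

```python
from urllib.parse import urlsplit, urlunsplit

# Extensions named in the proposal. The list is an assumption and could
# be extended to cover whatever else counts as a compression extension.
COMPRESSION_SUFFIXES = (".gz", ".bz2", ".xz")


def strip_dist_comp(uri, features):
    """Return uri without its trailing compression extension.

    The URI is left untouched unless the hypothetical "strip_dist_comp"
    flag is present in features, or if it has no known extension.
    """
    if "strip_dist_comp" not in features:
        return uri
    parts = urlsplit(uri)
    for suffix in COMPRESSION_SUFFIXES:
        if parts.path.endswith(suffix):
            stripped = parts.path[: -len(suffix)]
            return urlunsplit(parts._replace(path=stripped))
    return uri


if __name__ == "__main__":
    example = "https://mj.ucw.cz/download/linux/pci/pciutils-1.0.tar.gz"
    print(strip_dist_comp(example, {"strip_dist_comp"}))
```

Only the path component is rewritten, so a query string or fragment in
the URI would be preserved.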

Now, of course, I'm not proposing that Gentoo infra keeps uncompressed
versions of distfiles. So by default Portage would encounter a 404 error
when it tries to fetch the uncompressed file from Gentoo mirrors.

However, this feature would pave the way for a person to configure
Portage to fetch distfiles from their own server as well as from Gentoo
mirrors. That person could then keep their own uncompressed versions of
whichever distfiles they wish on their server; for anything not there,
the compressed version would be fetched from a Gentoo mirror as usual.
Such a person would have to obtain or create their own uncompressed
distfiles independently.
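
The fallback order described above could be sketched roughly as follows.
Again this is only an illustration under assumptions: the function names
are made up, the mirror base URLs are assumed to point directly at a
directory of distfiles, and the fetcher is passed in as a callable that
returns None for a missing file (e.g. on a 404) so that the sketch does
not depend on any real network code.

```python
def candidate_uris(distfile, own_server, gentoo_mirrors,
                   suffixes=(".gz", ".bz2", ".xz")):
    """List URIs to try, in order: own uncompressed copy, then mirrors."""
    candidates = []
    for suffix in suffixes:
        if distfile.endswith(suffix):
            # Uncompressed name on the user's own server comes first.
            candidates.append(f"{own_server}/{distfile[:-len(suffix)]}")
            break
    # The unmodified compressed distfile on each mirror is the fallback.
    candidates.extend(f"{mirror}/{distfile}" for mirror in gentoo_mirrors)
    return candidates


def fetch_first(candidates, fetch):
    """Return (uri, data) for the first candidate that fetch() can get.

    fetch is a hypothetical callable returning bytes, or None when the
    file is not available at that URI.
    """
    for uri in candidates:
        data = fetch(uri)
        if data is not None:
            return uri, data
    raise FileNotFoundError("no candidate could be fetched")
```

With this ordering, a distfile that is missing from the user's own
server simply falls through to the normal compressed download.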

A caveat of this solution is that one would have to disable checksum
verification (and gpg checks?) for this to work, as of course there would
be no checksum for the uncompressed version in the Manifest, and Gentoo
infra certainly should not be expected to decompress each distfile just
to generate an extra checksum for the Manifest. In fact I'd consider that
undesirable, as anyone paranoid enough to want to do this would not trust
such a checksum anyway, since it would be a checksum of a file that has
been compressed at source and then decompressed on Gentoo infra,
potentially introducing vulnerabilities. However, the lack of a checksum
is not a problem for someone who wants to keep distfiles on their own
server, as such a person can take responsibility for first verifying
whatever they put there, and for keeping said server secured from
tampering.
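
For what it's worth, someone keeping their own uncompressed distfiles
could still record and check their own digests outside of the Manifest.
A minimal sketch, assuming the person keeps a SHA-512 hex digest that
they generated themselves after verifying the file (the function names
here are hypothetical, and nothing like this exists in Portage as far as
this proposal is concerned):

```python
import hashlib
import hmac


def sha512_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_own_record(path, expected_hex):
    """Check a local distfile against a digest the user recorded earlier."""
    return hmac.compare_digest(sha512_of_file(path), expected_hex.lower())
```

This only detects changes since the digest was recorded; it says nothing
about whether the file was trustworthy in the first place.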

This seems to me to be something that would probably be relatively
straightforward to implement within Portage, maybe with just a few lines
around the Python code that fetches the SRC_URI, and with zero extra work
or resources required from Gentoo infra.

I'd consider it a feature for anyone who wants to eliminate a whole
class of potential vulnerabilities that may or may not be present, now
or in future, in compression tools. Surely that would be a nice feature
to have for some folk?

