On Mon, Apr 11, 2022 at 7:14 PM Joshua Kinard <ku...@gentoo.org> wrote:
>
> On 4/5/2022 17:49, Jason A. Donenfeld wrote:
> > Hi Matt,
> >
> > On Tue, Apr 5, 2022 at 10:38 PM Matt Turner <matts...@gentoo.org> wrote:
> >>
> >> On Tue, Apr 5, 2022 at 12:30 PM Jason A. Donenfeld <zx...@gentoo.org> wrote:
> >>> By the way, we're not currently _checking_ two hash functions during
> >>> src_prepare(), are we?
> >>
> >> I don't know, but the hash checking is definitely done before
> >> src_prepare().
> >
> > Er, during the builtin fetch phase. Anyway, you know what I meant. :)
> >
> > Anyway, looking at the portage source code, to answer my own question,
> > it looks like the file is actually being read twice and both hashes
> > computed. I would have at least expected an optimization like:
> >
> > hash1_init(&hash1);
> > hash2_init(&hash2);
> > for chunk in file:
> >     hash1_update(&hash1, chunk);
> >     hash2_update(&hash2, chunk);
> > hash1_final(&hash1, out1);
> > hash2_final(&hash2, out2);
> >
> > But actually what's happening is the even less efficient:
> >
> > hash1_init(&hash1);
> > for chunk in file:
> >     hash1_update(&hash1, chunk);
> > hash1_final(&hash1, out1);
> > hash2_init(&hash2);
> > for chunk in file:
> >     hash2_update(&hash2, chunk);
> > hash2_final(&hash2, out2);
> >
> > So the file winds up being open and read twice. For huge tarballs like
> > chromium or libreoffice...
> >
> > But either way you do it - the missed optimization above or the
> > unoptimized reality below - there's still twice as much work being
> > done. This is all unless I've misread the source code, which is
> > possible, so if somebody knows this code well and I'm wrong here,
> > please do speak up.
>
> Not to go off-topic, but where in Portage's source does this logic live?  It
> seems like an easy fix for a slightly more efficient Portage.

I believe it's the portage.checksum.verify_all() function.

https://gitweb.gentoo.org/proj/portage.git/tree/lib/portage/checksum.py?h=portage-3.0.30#n471
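
For illustration, the single-pass approach Jason describes could be sketched in
Python roughly as follows. This is not Portage's code: multi_hash(), its
parameters, and the 1 MiB chunk size are hypothetical, and the algorithm names
are just examples passed to the standard hashlib module.

```python
import hashlib


def multi_hash(path, algorithms=("blake2b", "sha512"), chunk_size=1024 * 1024):
    """Hypothetical helper: compute several digests in one pass over a file."""
    # One hash object per requested algorithm (names as accepted by hashlib.new).
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        # Read the file once, feeding each chunk to every hash object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    # Map algorithm name -> hex digest.
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling multi_hash("/path/to/distfile.tar.xz") would return a dict of hex
digests keyed by algorithm name. The hashing work is unchanged, but the file is
opened and read only once.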
