On Thu, Mar 21, 2024 at 09:25:47PM +0100, Julian Andres Klode wrote:
> On Thu, Mar 21, 2024 at 06:01:12PM +0200, Laurențiu Nicola wrote:
> > Package: apt
> > Version: 2.7.12
> > 
> > I noticed that searching for packages is very slow if the package lists are 
> > compressed. To reproduce, remove `/var/lib/apt/lists`, enable
> > 
> >     Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";
> > 
> > , run `apt update`. This enables LZ4 compression on my systems, but I don't 
> > think the exact method matters. You can then run `apt search librust`, 
> > which takes about 19 seconds in a Debian 12 container (docker.io/debian:12 
> > has compression already set up), compared to 0.4 seconds without 
> > compression.
> > 
> > Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't 
> > matter too much.
> > 
> > I tried to look into it, and `strace -e trace=openat apt-cache search 
> > librust` shows it reopen and re-read one of the package lists:
> > 
> > openat(AT_FDCWD, 
> > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
> >  O_RDONLY) = 16
> > librust-addr2line+default-dev - Cross-platform symbolication library - 
> > feature "default"
> > openat(AT_FDCWD, 
> > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
> >  O_RDONLY) = 16
> > librust-addr2line+object-dev - Cross-platform symbolication library - 
> > feature "object"
> > openat(AT_FDCWD, 
> > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
> >  O_RDONLY) = 16
> > librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library 
> > - feature "rustc-demangle"
> > openat(AT_FDCWD, 
> > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4",
> >  O_RDONLY) = 16
> > librust-addr2line+std-dev - Cross-platform symbolication library - feature 
> > "std"
> > 
> > (you can use -e trace=openat,read to confirm that it's actually reading the 
> > file)
> > 
> > I believe it's quadratic in the number of search results, and this is 
> > related to the pseudo-indexing mechanism used by APT (see 
> > `pkgRecords::Lookup` in apt-pkg). Each lookup call will have to decompress 
> > the file in order to seek to the destination.
> > 
> > Unfortunately, I suspect this isn't exactly an easy fix, given the current 
> > design.
> > 
> 
> Going to respond to this but also including responses to your followup email
> which has a broken Subject:
> 
> 
> Searching works by ordering the packages based on file, offset
> and then iterating over them and looking them up. Seeking forward
> to a higher offset does not involve a reopen, we just skip content
> in betwene.
> 
> Full-text search is inside the description in the section parsed
> for each package.
> 
> It's not clear why this fails on bookworm - I can reproduce that -
> t certainly is fine in git main on my Ubuntu 24.04 system.

This should be fixed in

https://salsa.debian.org/apt-team/apt/-/merge_requests/336

What happened was that we got called for the same offset twice, e.g.
let's say 42.

Now our section size is let's say 7.

So what happened is that we did:

- Jump to 42  =>  offset = 42
- Parse       =>  offset = 49
- Jump to 42  => ugh gotta go back

Luckily we still have the currently used section in the Buffer; i.e.
we only looked at bytes 49...<something> but our buffer still starts
at 42, so we can just look back to find our section again.

Sadly we can't avoid the reparse of the section here, because the
pkgTagSection passed to pkgTagFile::Jump() could be a different one
than the last one.

I have a patch to avoid the reparse by storing offset a level higher
where the pkgTagSection is fixed but I'm in favour of not shipping that
- those would only be marginal gains and we want to exercise this code
path.
-- 
debian developer - deb.li/jak | jak-linux.org - free software dev
ubuntu core developer                              i speak de, en

Reply via email to