Bug#1067440: Compression makes searching packages very slow
Thanks for the quick fix, I can confirm it's much faster now: # apt 2.7.13, trixie $ time apt search librust- real0m30.185s user0m28.286s sys 0m1.729s # apt 2.7.14, trixie $ time apt search librust- real0m0.640s user0m0.490s sys 0m0.035s And sorry for the empty subject, it was my first time using bugs.debian.org. It told me to add comments by sending an email to the bug address and I didn't know whether to copy-paste the original subject. It's not like I can manually set In-Reply-To and References from my email client, so it would have been broken anyway.
Bug#1067440: Compression makes searching packages very slow
On Thu, Mar 21, 2024 at 09:25:47PM +0100, Julian Andres Klode wrote: > On Thu, Mar 21, 2024 at 06:01:12PM +0200, Laurențiu Nicola wrote: > > Package: apt > > Version: 2.7.12 > > > > I noticed that searching for packages is very slow if the package lists are > > compressed. To reproduce, remove `/var/lib/apt/lists`, enable > > > > Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz"; > > > > , run `apt update`. This enables LZ4 compression on my systems, but I don't > > think the exact method matters. You can then run `apt search librust`, > > which takes about 19 seconds in a Debian 12 container (docker.io/debian:12 > > has compression already set up), compared to 0.4 seconds without > > compression. > > > > Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't > > matter too much. > > > > I tried to look into it, and `strace -e trace=openat apt-cache search > > librust` shows it reopen and re-read one of the package lists: > > > > openat(AT_FDCWD, > > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", > > O_RDONLY) = 16 > > librust-addr2line+default-dev - Cross-platform symbolication library - > > feature "default" > > openat(AT_FDCWD, > > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", > > O_RDONLY) = 16 > > librust-addr2line+object-dev - Cross-platform symbolication library - > > feature "object" > > openat(AT_FDCWD, > > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", > > O_RDONLY) = 16 > > librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library > > - feature "rustc-demangle" > > openat(AT_FDCWD, > > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", > > O_RDONLY) = 16 > > librust-addr2line+std-dev - Cross-platform symbolication library - feature > > "std" > > > > (you can use -e trace=openat,read to confirm that it's actually reading the > > file) > > > > I believe it's quadratic in the number of search results, and this is > > related to the pseudo-indexing mechanism used by APT (see > > `pkgRecords::Lookup` in apt-pkg). Each lookup call will have to decompress > > the file in order to seek to the destination. > > > > Unfortunately, I suspect this isn't exactly an easy fix, given the current > > design. > > > > Going to respond to this but also including responses to your followup email > which has a broken Subject: > > > Searching works by ordering the packages based on file, offset > and then iterating over them and looking them up. Seeking forward > to a higher offset does not involve a reopen, we just skip content > in betwene. > > Full-text search is inside the description in the section parsed > for each package. > > It's not clear why this fails on bookworm - I can reproduce that - > t certainly is fine in git main on my Ubuntu 24.04 system. This should be fixed in https://salsa.debian.org/apt-team/apt/-/merge_requests/336 What happened was that we got called for the same offset twice, e.g. let's say 42. Now our section size is let's say 7. So what happened is that we did: - Jump to 42 => offset = 42 - Parse => offset = 49 - Jump to 42 => ugh gotta go back Luckily we still have the currently used section in the Buffer; i.e. we only looked at bytes 49... but our buffer still starts at 42, so we can just look back to find our section again. Sadly we can't avoid the reparse of the section here, because the pkgTagSection passed to pkgTagFile::Jump() could be a different one than the last one. I have a patch to avoid the reparse by storing offset a level higher where the pkgTagSection is fixed but I'm in favour of not shipping that - those would only be marginal gains and we want to exercise this code path. -- debian developer - deb.li/jak | jak-linux.org - free software dev ubuntu core developer i speak de, en
Bug#1067440: Compression makes searching packages very slow
On Thu, Mar 21, 2024 at 06:01:12PM +0200, Laurențiu Nicola wrote: > Package: apt > Version: 2.7.12 > > I noticed that searching for packages is very slow if the package lists are > compressed. To reproduce, remove `/var/lib/apt/lists`, enable > > Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz"; > > , run `apt update`. This enables LZ4 compression on my systems, but I don't > think the exact method matters. You can then run `apt search librust`, which > takes about 19 seconds in a Debian 12 container (docker.io/debian:12 has > compression already set up), compared to 0.4 seconds without compression. > > Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't > matter too much. > > I tried to look into it, and `strace -e trace=openat apt-cache search > librust` shows it reopen and re-read one of the package lists: > > openat(AT_FDCWD, > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", > O_RDONLY) = 16 > librust-addr2line+default-dev - Cross-platform symbolication library - > feature "default" > openat(AT_FDCWD, > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", > O_RDONLY) = 16 > librust-addr2line+object-dev - Cross-platform symbolication library - feature > "object" > openat(AT_FDCWD, > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", > O_RDONLY) = 16 > librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library - > feature "rustc-demangle" > openat(AT_FDCWD, > "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", > O_RDONLY) = 16 > librust-addr2line+std-dev - Cross-platform symbolication library - feature > "std" > > (you can use -e trace=openat,read to confirm that it's actually reading the > file) > > I believe it's quadratic in the number of search results, and this is related > to the pseudo-indexing mechanism used by APT (see `pkgRecords::Lookup` in > apt-pkg). Each lookup call will have to decompress the file in order to seek > to the destination. > > Unfortunately, I suspect this isn't exactly an easy fix, given the current > design. > Going to respond to this but also including responses to your followup email which has a broken Subject: Searching works by ordering the packages based on file, offset and then iterating over them and looking them up. Seeking forward to a higher offset does not involve a reopen, we just skip content in betwene. Full-text search is inside the description in the section parsed for each package. It's not clear why this fails on bookworm - I can reproduce that - t certainly is fine in git main on my Ubuntu 24.04 system. -- debian developer - deb.li/jak | jak-linux.org - free software dev ubuntu core developer i speak de, en
Bug#1067440:
Correction: because of full-text search, it might actually be quadratic in the number of packages (I didn't check). And it might be possible to fix it, by going through the compressed stream just once, instead of restarting (assuming the results are returned in the file order, which seems reasonable).
Bug#1067440: Compression makes searching packages very slow
Package: apt Version: 2.7.12 I noticed that searching for packages is very slow if the package lists are compressed. To reproduce, remove `/var/lib/apt/lists`, enable Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz"; , run `apt update`. This enables LZ4 compression on my systems, but I don't think the exact method matters. You can then run `apt search librust`, which takes about 19 seconds in a Debian 12 container (docker.io/debian:12 has compression already set up), compared to 0.4 seconds without compression. Also tested on Ubuntu 22.04 and 24.04, so the exact APT version shouldn't matter too much. I tried to look into it, and `strace -e trace=openat apt-cache search librust` shows it reopen and re-read one of the package lists: openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16 librust-addr2line+default-dev - Cross-platform symbolication library - feature "default" openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16 librust-addr2line+object-dev - Cross-platform symbolication library - feature "object" openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16 librust-addr2line+rustc-demangle-dev - Cross-platform symbolication library - feature "rustc-demangle" openat(AT_FDCWD, "/var/lib/apt/lists/archive.ubuntu.com_ubuntu_dists_jammy_universe_binary-amd64_Packages.lz4", O_RDONLY) = 16 librust-addr2line+std-dev - Cross-platform symbolication library - feature "std" (you can use -e trace=openat,read to confirm that it's actually reading the file) I believe it's quadratic in the number of search results, and this is related to the pseudo-indexing mechanism used by APT (see `pkgRecords::Lookup` in apt-pkg). Each lookup call will have to decompress the file in order to seek to the destination. Unfortunately, I suspect this isn't exactly an easy fix, given the current design.