On Fri, 21 Jan 2022 10:03:43 +0100 Ludovic Courtès <l...@gnu.org> wrote:
> Hello Guix!
>
> Lately I found myself going several times to
> <https://packages.debian.org> to look for packages providing a given
> file, and I thought it was time to do something about it.
>
> The script below creates an SQLite database for the current set of
> packages, but only for those already in the store:
>
>   guix repl file-database.scm populate
>
> That creates /tmp/db; it took about 25 minutes on berlin, for 18K
> packages.  Then you can run, say:
>
>   guix repl file-database.scm search boot-9.scm
>
> to find which packages provide a file named ‘boot-9.scm’.  That part
> is instantaneous.
>
> The database for 18K packages is quite big:
>
> --8<---------------cut here---------------start------------->8---
> $ du -h /tmp/db*
> 389M    /tmp/db
> 82M     /tmp/db.gz
> 61M     /tmp/db.zst
> --8<---------------cut here---------------end--------------->8---
>
> How do we expose that information?  There are several criteria I can
> think of: accuracy, freshness, privacy, responsiveness, and off-line
> operation.
>
> I think accuracy (making sure you get results that correspond
> precisely to, say, your current channel revisions and your current
> system) is not a high priority: some result is better than no result.
> Likewise for freshness: results for an older version of a given
> package may still be valid now.
>
> In terms of privacy, I think it’s better if we can avoid making one
> request per file searched for.  Off-line operation would be sweet,
> and it comes with responsiveness; fast off-line search is necessary
> for things like ‘command-not-found’ (where the shell tells you which
> package to install when a command is not found).
>
> Based on that, it is tempting to just distribute a full database from
> ci.guix, say, which the client command would regularly fetch.  The
> downside is that this is quite a lot of data to download; if you use
> the file search command infrequently, you might find yourself
> spending more time downloading the database than actually searching
> it.
> We could have a hybrid solution: distribute a database that contains
> only files in /bin and /sbin (it should be much smaller), and for
> everything else, resort to a web service (the Data Service could be
> extended to include file lists).  That way, we’d have fast,
> privacy-respecting search for command names, and on-line search for
> everything else.
>
> Thoughts?
>
> Ludo’.

One use case that I hope can be addressed is TeXlive packages.  Trying
to figure out which package corresponded to which missing file was a
nightmare the last time I had to use LaTeX.
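For readers following along, here is a minimal sketch of the flow the quoted message describes: populating a file database, searching it by base name, and extracting the small command-only subset for the hybrid approach.  This is illustrative only — the single files(basename, path, package) schema and the helper names are assumptions, not the actual layout used by file-database.scm, and Python’s sqlite3 stands in for the Guile code:

```python
import sqlite3

# Hypothetical schema; the real file-database.scm may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (basename TEXT, path TEXT, package TEXT);
CREATE INDEX IF NOT EXISTS files_basename ON files(basename);
"""

def populate(db_path, packages):
    """Record every file of every package.
    `packages` is an iterable of (package-name, [file-paths]) pairs."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    for package, paths in packages:
        conn.executemany(
            "INSERT INTO files VALUES (?, ?, ?)",
            ((p.rsplit("/", 1)[-1], p, package) for p in paths))
    conn.commit()
    return conn

def search(conn, name):
    """Which packages provide a file with this base name?
    Fast because the lookup hits the basename index."""
    return [row[0] for row in conn.execute(
        "SELECT DISTINCT package FROM files WHERE basename = ?", (name,))]

def extract_commands(conn, cmd_db_path):
    """The 'hybrid' idea: copy only /bin and /sbin entries into a much
    smaller database suited to off-line command-not-found lookups."""
    dst = sqlite3.connect(cmd_db_path)
    dst.execute("CREATE TABLE IF NOT EXISTS commands (name TEXT, package TEXT)")
    dst.executemany(
        "INSERT INTO commands VALUES (?, ?)",
        conn.execute("SELECT basename, package FROM files "
                     "WHERE path LIKE '%/bin/%' OR path LIKE '%/sbin/%'"))
    dst.commit()
    return dst
```

With a layout like this, the commands table covering only /bin and /sbin should be far smaller than the full 389M database, which is what makes distributing it for ‘command-not-found’-style lookups plausible.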