On 07/08/2018 09:14 PM, Luke Shumaker wrote:
> From: Luke Shumaker <luke...@parabola.nu>
>
> In a patchset that I recently submitted, Eli was concerned that I was
> parsing .db files with bsdtar+awk, when the format of .db files isn't
> "public"; the only guarantees made about it are that libalpm can parse it.
>
> https://lists.archlinux.org/pipermail/arch-projects/2018-June/004932.html
>
> I wasn't too concerned, because `ftpdir-cleanup` and `sourceballs` already
> parse the .db files in the same way. Nonetheless, I think Eli is right: we
> shouldn't be parsing these files ourselves.
>
> So, add a `dbquery` function that uses pyalpm to parse the .db files:

What's wrong with expac?

    expac --config ${dbscripts_root}/pacman-community.conf -S '%f'

expac is not only super elegant, there are pending patches to provide it
in pacman 6 as part of the core project. This is what I'm waiting for,
actually.

I see no reason to add an external dependency on both python and pyalpm,
in order to run a small python program which evals its arguments in
order to inject database queries, when a tool with a simple API can do
the same and will eventually be guaranteed to be everywhere pacman
itself is.

(Let's ignore for a moment the defunct integrity checks service, which
is written in python, but not pyalpm. pyalpm is not currently installed
on the dbscripts server.)

>  - It takes as arguments Python 3 expressions;
>    1. one that returns a bool deciding whether we want to print
>       information on a package, and
>    2. another that returns the string to print for a package.
>
>    Currently, all callers use "True" for the decider expression, as
>    ftpdir-cleanup and sourceballs operate on *every* package. However, I'm
>    including a way to filter packages because, I'm coming at this from the
>    context that I want to parse .db files in other places too.
>
>  - libalpm doesn't offer an easy way to say "parse this DB file for me";
>    instead, we must construct a configuration that has a syncdb pointing to
>    that file, which we then have it sync in to a temporary directory.
>
> As a final note, when re-writing the bit of sourceballs to use dbquery
> instead of AWK, I realized that it does not correctly handle licenses that
> have a space in them (as of 2018-07-07 there are 67 packages in the Arch
> repos that have license containing a space). I did not fix this bug; I
> merely translated it from AWK to Python, as the program would also need to
> be adjusted elsewhere.

Keeping in mind the ones we're looking for are a whitelist of
strictly-defined license types...
I think those are all ad-hoc custom licenses, none of which we're
interested in in the primary sourceballs deployment.

-- 
Eli Schwartz
Bug Wrangler and Trusted User
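
For illustration, the license-splitting issue discussed above can be sketched as follows. This is a minimal sketch, not the actual sourceballs code: it assumes the bug comes from joining license names with spaces and splitting them again on whitespace, and the whitelist contents, the sample license names, and the helper names `split_naive` and `wants_sourceball` are all hypothetical.

```python
# Hypothetical whitelist of strictly-defined license names (illustrative
# values only, not copied from the dbscripts configuration).
ALLOWED_LICENSES = {"GPL", "GPL2", "GPL3", "LGPL"}


def split_naive(joined):
    """Split a space-joined license string on whitespace.

    This is lossy: a license whose name contains a space is broken
    into several bogus entries.
    """
    return joined.split()


def wants_sourceball(licenses):
    """Return True if any license is on the whitelist."""
    return any(lic in ALLOWED_LICENSES for lic in licenses)


# A hypothetical package with one whitelisted license and one custom
# license whose name contains a space.
licenses = ["GPL2", "custom:Public Domain"]
naive = split_naive(" ".join(licenses))

print(naive)
print(wants_sourceball(naive), wants_sourceball(licenses))
```

Under these assumptions the custom license is split into two meaningless pieces, but the whitelist decision is the same either way, because none of the whitelisted names contains a space.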