hi folks, sorry for breaking the thread, but i'm not subscribed and i wasn't cc'd, and my current MUA sucks.
> from ian:
> Well, I don't know if I still count as one of the `dpkg team' but I
> think this is a terrible idea for lots of reasons. dpkg needs to be
> very reliable; its databases must not get corrupted even under
> situations of stress. It is also very useful that the databases are
> tractable with normal tools. dpkg is very close to the bottom of the
> application stack; making it depend on a big and complex library like
> a SQL engine is a bad idea.

a couple of points i'd like to clarify:

- i'm not advocating sqlite necessarily, but some db library. and not even
  some db library as much as a smarter way of storing the information to
  begin with, though some dbs do provide advantages (query syntaxes, etc).

- i'm also suggesting that any such db could be used as a cache, so that
  if there are ever any problems it could be regenerated from the flat
  files (see the rough sketch at the end of this mail).

that's not to say there aren't still problems with the suggestion (or that
there aren't possible workarounds to those problems), but i think it takes
some of the bite out of the objection.

there are two main points to the motivation for doing something like this,
in my eyes.

first, there's the speed/efficiency factor, as already mentioned. i
strongly suspect there's an order of magnitude or two of runtime to be
saved, which i would argue is worth some consideration.

secondly, there's the code complexity (wrt the dpkg source code, anyway).
my time spent looking at dpkg's internals is relatively short compared to
most folks here, i'm sure, but i can't help noticing how much of them is
dedicated to representing, constructing, and manipulating data structures
to make up for the lack of a better storage/query format. something like
sqlite would not only give a smarter way of storing the data, it would
also lower the complexity of the data structures needed to work with it,
since much of that overhead could be shifted into carefully crafted
queries.

sure, it means you're basically outsourcing a chunk of work to a
third-party library which you now have to trust not to totally break
things. it also means the code complexity in the grand scheme of things
isn't actually less (more, even), but it ends up where it ought to be.
and really, an absolutist position on this line would mean not statically
linking against zlib/bz2 either.

> In practice I think the problem isn't that the *.list files are too
> inefficient an on-disk representation. I have a number of machines
> with small CPUs and little memory and they don't have a difficulty
> there.

i don't have the numbers to back it up, but i do think it is a rather
inefficient format. fwiw, i've tried compiling dpkg with profiling enabled
(-pg), but this makes it exit with SIGPROF while scanning the list files
(the profiling timer expires)... does anyone have a better way to profile
dpkg runs than /usr/bin/time?

also, wrt the status/available parsing: the same arguments outlined above
apply here. a step better than parsing those files every time they're
needed would be to read the data pre-parsed from a db/cache.

sean
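
p.s. to make the cache idea a bit more concrete, here's a very rough
sketch, using sqlite3 purely as an example backend. the table layout, the
cache path, and the lookup are all invented for illustration, not a
proposal for an actual schema. the point is just that a lookup which today
means parsing the status file and walking in-core structures could become
a single query, and if the cache file is ever missing or corrupt it can
simply be rebuilt from the flat files:

/* sketch: status data cached in sqlite, regenerable from the flat
 * files.  everything here (schema, cache path, query) is invented
 * for illustration only. */

#include <stdio.h>
#include <sqlite3.h>

/* hypothetical: would re-parse /var/lib/dpkg/status and repopulate the
 * table; if the cache ever looks wrong, drop the file and run it again. */
static int rebuild_cache_from_flat_files(sqlite3 *db)
{
    (void)db;   /* left as a stub in this sketch */
    return 0;
}

int main(void)
{
    sqlite3 *db;
    sqlite3_stmt *stmt;

    if (sqlite3_open("/var/cache/dpkg/status.db", &db) != SQLITE_OK)
        return 1;

    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS status ("
        "  package TEXT PRIMARY KEY,"
        "  version TEXT,"
        "  state   TEXT)",
        NULL, NULL, NULL);

    rebuild_cache_from_flat_files(db);

    /* the sort of lookup that currently means a full parse of the
     * status file plus a walk over in-core structures: */
    if (sqlite3_prepare_v2(db,
            "SELECT version FROM status"
            " WHERE package = ?1 AND state = 'installed'",
            -1, &stmt, NULL) == SQLITE_OK) {
        sqlite3_bind_text(stmt, 1, "dpkg", -1, SQLITE_STATIC);
        if (sqlite3_step(stmt) == SQLITE_ROW)
            printf("installed version: %s\n",
                   (const char *)sqlite3_column_text(stmt, 0));
        sqlite3_finalize(stmt);
    }

    sqlite3_close(db);
    return 0;
}

builds with something like gcc sketch.c -lsqlite3. again, this is only
meant to show the shape of the thing, not to be a patch.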