On Apr 20, 2012, at 8:12 AM, R P Herrold wrote: > On Fri, 20 Apr 2012, Alexey Tourbin wrote: > >> I have just learnt that rpm5 project has borrowed set-string >> implementation recently from ALT Linux. At the very same time, I was >> working on on a new and improved encoding scheme which can make > > ... This is exciting news -- This seems like it would be a useful library. > What package in ALT are you doing this work in, or alternatiely, is there a > version control repository that I could check out to read your ongoing work? >
The methods in the existing encoding/decoding are in rpmio/set.c @rpm5.org: the algorithm is unchanged from Alt. A change to the existing scheme over the next few months doesn't bother me at all. But "legacy compatibility" has instantly appeared as an issue for an @rpm5.org implementation that has been "working" for less than 24 hours, and where the need is to attempt to install Alt packages into a chroot, sounds like @rpm5.org is going to be forced to both "old" and "new" encodings merely to continue trying to do "continuous integration" with Alt packages. But "legacy compatibility" is an insoluble problem which need not be discussed. If there's a better encoding scheme available soon, then switching is better done earlier than later. (aside) BTW, Alt is likely the "leanest-and-meanest" (i.e. smallest) distribution that can be installed. E.g. the package set that installs rpm and all its prerequisites is ~35MB, and I've also seen a 10MB set (bash and all its prerequisites iirc). Very nicely done. ATM, rpm-5.4.9 does only doing decoding (and comparison) of set:versions. The need was to be able to install Alt sisyphus packages (with set:versions dependencies) into a chroot. Generating set:versions (Alt uses a helper script, "multilib" packaging needs to use the gelf* API) will be harder, particularly if interoperability is desired. The actual existing encoding algorithm is likely pretty well described here: http://en.wikipedia.org/wiki/Golomb_coding In short: Golomb-Rice compression is an approximation to Huffman encoding that doesn't need dictionaries (at least that's the key issue I recognize). The other part of the implementation uses base62 because of maximal bit density in symbols. But set:versions looks quite useful, and far more effective at reducing the number of dependencies than attempting a "pin-hole" optimizations with boolean expressions, discarding inequalities which are implied by other dependencies, as Per Oyvind has been attempting in Mandriva. Nicely done! 73 de Jeff ______________________________________________________________________ RPM Package Manager http://rpm5.org Developer Communication List rpm-devel@rpm5.org