On Apr 20, 2012, at 8:12 AM, R P Herrold wrote:

> On Fri, 20 Apr 2012, Alexey Tourbin wrote:
> 
>> I have just learnt that rpm5 project has borrowed set-string
>> implementation recently from ALT Linux. At the very same time, I was
>> working on on a new and improved encoding scheme which can make
> 
> ... This is exciting news -- This seems like it would be a useful library.  
> What package in ALT are you doing this work in, or alternatiely, is there a 
> version control repository that I could check out to read your ongoing work?
> 

The methods in the existing encoding/decoding are in rpmio/set.c @rpm5.org: the 
algorithm
is unchanged from Alt.

A change to the existing scheme over the next few months doesn't bother
me at all. But "legacy compatibility" has instantly appeared as an issue
for an @rpm5.org implementation that has been "working" for less than
24 hours, and where the need is to attempt to install Alt packages
into a chroot, sounds like @rpm5.org is going to be forced to both
"old" and "new" encodings merely to continue trying to do "continuous 
integration"
with Alt packages.

But "legacy compatibility" is an insoluble problem which need not be discussed.
If there's a better encoding scheme available soon, then switching is
better done earlier than later.

(aside)
BTW, Alt is likely the "leanest-and-meanest" (i.e. smallest) distribution that
can be installed. E.g. the package set that installs rpm and all its 
prerequisites
is ~35MB, and I've also seen a 10MB set (bash and all its prerequisites iirc).
Very nicely done.

ATM, rpm-5.4.9 does only doing decoding (and comparison) of set:versions.
The need was to be able to install Alt sisyphus packages (with set:versions 
dependencies)
into a chroot. Generating set:versions (Alt uses a helper script, "multilib" 
packaging needs
to use the gelf* API) will be harder, particularly if interoperability is 
desired.

The actual existing encoding algorithm is likely pretty well described here:
        http://en.wikipedia.org/wiki/Golomb_coding
In short: Golomb-Rice compression is an approximation to Huffman encoding
that doesn't need dictionaries (at least that's the key issue I recognize).

The other part of the implementation uses base62 because of maximal bit density
in symbols.

But set:versions looks quite useful, and far more effective at reducing the 
number
of dependencies than attempting a "pin-hole" optimizations with boolean
expressions, discarding inequalities which are implied by other dependencies,
as Per Oyvind has been attempting in Mandriva.

Nicely done!

73 de Jeff


______________________________________________________________________
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org

Reply via email to