On Jun 23, 2012, at 1:49 PM, Alexey Tourbin <alexey.tour...@gmail.com> wrote:

> On Sat, Jun 23, 2012 at 1:55 AM, Jeffrey Johnson <n3...@me.com> wrote:
>> There are lots of usage cases for efficient sub-set computations
>> in package management, not just as a de facto API/ABI check
>> using ELF symbols. Most of the other usage cases for efficient
>> sub-set computations are not subject to an ultimately optimal
>> string encoding.
> 
> The possibility of optimal string encoding should not be
> underestimated. If you can't "write it down with a pencil", which by
> the way refers to Alan Turing's style of reasoning, it becomes very
> problematic anyway. Your intuition is probably that raw bytes are
> cheap because you can bitwise-AND them in terms of direct CPU
> instructions, and any "encoding" is expensive because you have to
> crunch bits a lot. This is not really true in practice, though. You
> should arm yourself with valgrind(1) and spend a few days with it
> to understand how you waste your CPU power (and bandwidth, for
> that matter). Set-string encoding has a reasonable, and affordable,
> cost. It can all be put together and presented in a rather less
> pessimistic manner. (For example, there is a cache_decode_set routine
> that boosts things by a factor of about 5. This is simply due to the
> fact that you do not always have to decode the same Provides version
> all over again. This is part of that "fancy-schmancy" stuff which you
> must ignore on the first reading.)
> 
> So I suggest that set-strings must cover all usages of set-subset
> computations, where static data structures are appropriate (so that
> you compute it, write into RPM header, and do not expect to modify
> it). It is still "efficient", despite the fact that we try hard to
> achieve a better string representation. The fact that it can be
> packaged nicely does not make it any worse in other respects, except
> that obviously we can stuff somewhat fewer bits into alnum characters.
> But I also wanted it to look "nice" (this is why base62 was devised
> over the much simpler base64 encoding). So it looks nice, and it is
> efficient. Why are you unhappy? Do you necessarily require that you
> have to do bitmask tests yourself?
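
The caching point in the quoted text (not decoding the same Provides
version all over again) can be sketched roughly as below. This is only
an illustration: decode_set, provides_satisfies and the one-character-
per-element "format" are hypothetical stand-ins, not the real set-string
encoding or the actual cache_decode_set routine.

```python
from functools import lru_cache

# Assumed 62-character alphabet, for illustration only.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


@lru_cache(maxsize=256)
def decode_set(s: str) -> frozenset:
    """Toy decoder: every character is one set element (its alphabet index).

    The real decoder is far more involved; the point here is only that
    the result for a given string is computed once and then reused.
    """
    return frozenset(ALPHABET.index(c) for c in s)


def provides_satisfies(provides: str, requires: str) -> bool:
    """True if the decoded Requires set is a subset of the Provides set."""
    return decode_set(requires) <= decode_set(provides)


if __name__ == "__main__":
    provides = "0123abcXYZ"
    for requires in ("01", "aX", "9"):
        print(requires, provides_satisfies(provides, requires))
    # The Provides string is decoded once; later checks hit the cache.
    print(decode_set.cache_info())
```

One Provides string checked against many Requires strings is decoded
only on the first check, which is where the saving comes from.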

I'm not "unhappy". Rather, I'm pointing out that, if your primary
goal is bandwidth reduction in *.rpm package metadata, there are
equivalent savings that don't come from "pretty" encodings.

The other reason is this: missing values in the {E,V,R} triple
(and attaching some reasonable DWIM semantic) lead to huge
amounts of pointless discussion. Adding all of
        twiddle-in-version
        set:versions
        dpkg-like
is all underway atm. Running digest/encoding strings through
"legacy compatible" (i.e. what most rpm tool chains will
continue to be using for years) rpmvercmp(3) is an accident
waiting to be reported again and again.
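
To illustrate the kind of accident I mean, here is a toy segment-wise
comparison. It is only loosely modeled on how version comparison splits
strings into digit and letter runs; it is NOT rpmvercmp(3) itself, and
the "set:" strings below are made up for the example.

```python
import re


def segments(version: str) -> list:
    """Split into runs of digits or runs of letters; anything else separates."""
    return re.findall(r"[0-9]+|[A-Za-z]+", version)


def toy_vercmp(a: str, b: str) -> int:
    """Return -1, 0 or 1, comparing segment by segment (simplified sketch)."""
    sa, sb = segments(a), segments(b)
    for x, y in zip(sa, sb):
        if x.isdigit() and y.isdigit():
            x, y = int(x), int(y)          # numeric segments compare as numbers
        elif x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # assumption: numeric beats alpha
        if x != y:
            return 1 if x > y else -1
    # All shared segments equal: the string with more segments wins.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))


if __name__ == "__main__":
    print(toy_vercmp("1.2", "1.10"))              # ordinary versions
    print(toy_vercmp("set:bMx2", "set:bMx10Q"))   # hypothetical set-strings
```

The second call happily reports an ordering between two encoded sets,
although "older" and "newer" mean nothing for them. A legacy tool has
no way to know it should have done a subset test instead.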

It's just as easy to attach a tag with binary data as any other
approach to "pretty" encodings, all of which involve some
aesthetic choice of permitted/excluded ASCII characters, and so
the encodings proliferate to meet all possible opinions.

Perhaps making base61 encoding MANDATORY in rpm would displease
everyone equally: choosing a prime is as "pretty" as all other
encoding criteria. 
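
As a rough measure of what the choice of alphabet costs, the bits
carried per character can be estimated as below. This assumes an
idealized positional encoding of an n-bit value in the given base,
which is not necessarily how any actual set-string encoder packs its
bits; chars_needed and bits_per_char are names made up for the sketch.

```python
import math


def bits_per_char(base: int) -> float:
    """Information carried by one character of a `base`-symbol alphabet."""
    return math.log2(base)


def chars_needed(nbits: int, base: int) -> int:
    """Characters needed for an nbits-wide value, idealized (assumption)."""
    return math.ceil(nbits / math.log2(base))


if __name__ == "__main__":
    for base in (61, 62, 64, 256):
        print(base, round(bits_per_char(base), 3), chars_needed(1024, base))
```

The differences between base61, base62 and base64 are a character or
two per 1024 bits; the gap to raw bytes (base 256) is the one that
matters for bandwidth.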

73 de Jeff

______________________________________________________________________
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org
