On Sat, Jun 23, 2012 at 10:10 PM, Jeffrey Johnson <n3...@me.com> wrote:
>> So I suggest that set-strings must cover all usages of set-subset
>> computations, where static data structures are appropriate (so that
>> you compute it, write into RPM header, and do not expect to modify
>> it). It is still "efficient", despite the fact that we try hard to
>> achieve a better string representation. The fact that it can be
>> packaged nicely does not make it any worse in other respects, except
>> that obviously we can stuff somewhat less bits into alnum characters.
>> But I also wanted it to look "nice" (this is why base62 was devised
>> over much simpler base64 encoding). But so, it looks nice, and it is
>> efficient. Why are you unhappy? Do you necessarily require that you
>> have to do bitmask tests yourself?
>
> I'm not "unhappy". Rather I'm pointing out that -- if your primary
> goal is bandwidth reduction in *.rpm package metadata -- that there are
> equivalent savings that aren't from "pretty" encodings.

Okay, the primary goal was not so much to minimize bandwidth, but just
to make things "nice" overall. But again, realize that set-versions
are O(N) in size, where N is the number of symbols. For example, here
are the Provides of the perl-base package in ALT Linux:

$ rpm -q --provides perl-base
perl = 1:5.14.2
perl-version = 0.88
perl-PerlIO = 1:5.14.2
perl-Storable = 1:5.14.2
perl-Digest-MD5
perl-Time-HiRes
perl-MIME-Base64
libperl-5.14.so()(64bit) =
set:odiA4ZlKzevK5y39eQsWZckcCoxsZqg4k4MIETgeNn0RxqjzEscOAdkenBZ7a373IGkMzlRZcFtq8vCvMeJMgzVyfiFR61GhMIUZ81a3aDJtilWwCksMTw3vQxUgc4x1QMFnA2h8lmdZpJ4jHU4wlFn4W2wHpFabuLlqZiIyviaTYTPl1PjrcIm4iZdvp5fOXR9PKnYQyfo6w1AOS7V5qO5ND9AGGby7LRkpBYZuOEZhvKfx9ZEGiSTLG9vx8ka0dtcUm5WXZtFUvjG92kZwnUjHVWc9NZapw9ocaec37K7rO7N4rl3mGGxh21z3bE4FdZ6RI6k52BrCvogEqZqeVZLcBZ9tcAbHq6ObepE2vXR9TQODoNkoeyaOQGbrDchblKzS8wBPTGtnv5Ze8K9FAsQN2dsQxAJ2hD2NOIL1ROhhFydp27xpqCYssiowuiYfJtCdofpKW9qohptrbV1pmkjVb4kI6lhKHZb5ae7p4u65NodjOSHmOw81L7fjwZjoK2uwH5rACj7NxXW9kk10uOsM6EJdRY4nrZqYbkKN2G1gQvUtZzqxPMjWgkh3P9LmcluzYzgCfZoexmZxQP4nRLYuemEWzHXkX1Ys5wqbfMZDxBbi7PtkohqPCqcB4f6zurrib4y7aUHs6aqCiVRHGDdR4thBgZFjpXDxIDN4X2QjAOjRQZhNzilkpBXmCgy75JDndDdw1DCMEZkmfgAnLU4rUEDIZ2pUvC5q3S2lMtq2Zj17f2xDL80cyg11BFQ8thh9iR2SIXCB69Jrda3E7ArKzyXYci6KlHo1jfvWP6Q2lre8VQCGQZoH8qDo7vaq2MGlNFgtgJGQgTGVx0CKZAAhlijag1ZKUGmgqFOdprCVtuZc8IoZBsuiY1DKZyS6e1HPHGYrZcOcd4KNcminIr9spwghADXzujggTmVsCutsEh6LLbjD2boZan6ycEnBMoFsbyIoz5ggnxFKIjuJyZvM6voYIs2qurMGOCoeB2csGGtTV6b82ItcZ2kj7mbamrZsQscXQ8EUT7lflT6IamZAZLZx7GTKUaZjwPyUltZ6gHWKIvw5wbNQMZGvOU6F43xdP7M2lJ0R9g3cVd4oiQig2F8K0AUdnwXz7LVPbjJMk8iBytXGntBKoPI6G3ahWHgSvmhGVVdNJeNXZl0AkcxhvhFqRujtasuOxCEz8WJCLZaO642OhFjtOVZliv4b4Y34Uh2vftwCo1vXxEJxtVVek42wG9La5uiLoz9o77UfQz4pu0yvZmTR8rs1X05ykr61bK73IOzGuMFjfKxQX2uUUL7zkl661NxWK8VR04bP7p1GVLn7SWPfXyfm2MbTZG8ytPzT2vDJYjmKTi6hRJ9RFPpZegntTk7uuL7ahsSej3aiBbEWVKYXWhkntIZ3YfDPOzRxosdgSKMNRY1llmBvZk5JlN3ETltMlAI7v7bq24bCvG9iGyCPyoUoKVLYpL8cycfepxYcgm4ioyz8zt3ZkAMQi4iZfs2zbd34VnocYTB6cyEZxR67x0i7wQRP7VrwDilZfniZgmdH5pAsWxXX0auKdd1nSN93XGZECEQP4nKKWCLwjYVjjAB1iRBohqEbRBstuQLCBMEL8OLapxqEYozeGlFMoSbo6eV6lCFuOKbYwvN6mjofKRtjavsU4hYm2ArDYgZ8alF8tZCw1BFvmhI0zL4w2un5m3S3G2tcZpWdVODLSi3IB4pQyR755EB7zUaPCF8LafHzDKHBSKEZohdNEooVN3lY5DnGpz25rW38dU8TXRG2pckDnGb5LVBVYpxK7VsybdZGsy5JP8Alk1Mf5O4zFmGcpZLs172YECGj3f8JzUcpknwF2PWxqRZgnZapssvh3CIOl8fA1bfZiUGMFg1eu62GXYlWiqWaA9g627YjeQkGdz9AhZCVECGOqNEZFX2Jy6O23lHvgJ2ylDSwi2RJGZdl1ckGMVwZ5z3uKJ207KVuQWyeCv7Ne4wxyWj9JgP0
Z2PEmyXsan5QBFEZq323vZmlRyhhmPoPDle2zyxxT7AZ3vL42k3MLOzNPtGpZcTgIJfhy2H38fUU0YIFNpi9ygZGg5NBZdu1XL0NWesCbk2PapUsltirZ7qo6Jwbg2UkwsgEaVsclZ2EPrD5brplmPl9DHlrgLcAZCaNIG9HIFSfZErD7U0tJob9JvoHKZogXZgtaH0UkZjrpma2lCxPpj9qdZdh4gli6C9mqCKElC2NjZKh9T0Uw1ddLDYNmvPB971Xyzx9qSNQ99acW1uP36D7qgjyPsf9qO1ZDqA9fZuJ5oEkNIpIoatW0f157Sp3bTZ7o58XhKir4N8PnJIfj3P4NQjtrjW85vAEHU9B2dZACzNfNACZn7TO2AyRBmnPn6XWYKoAEW9UiH8Z12Z5OBMa2ovftCZJ3Zpqoc0GG9kpGjXEZCN0ypGqk4lnYTCL4X8i1B2AabsfEG100r2HQWbjoIrM1LtGBAqSuPAPwIEVnaSqAvQFfCi9xCHS3EKRSZa9EsDZktcKHply2cMb5ldXl06eDg3i1xHM9uPTvm7laFyWorkUMcCZiV5d0tlXzFuiDjA3pOsOjUFmgAzPNJK17LfQSHA4avpaLbZ4lGSPRdQWOOKBxGDgOYMnK2Z11iom0XUBCvybJ2POnO81yDKHCEtVHJE4ZKTMiruLDjAe876HiwstanPxJ0m6ZeYmWgraCbEF7iaQZoZkgkwHS8ONi1OXSBF7crddfeYDwWLZk9w4lRr5lj5b3bZx2DFkDwk1IwczPHksFPHGgM8hSnBhfBXpcRLalykEkN9p0LGi6u5Zzs7q7dLLOPRrRLNtaz83XPT1ytZAbLKy7dKfRzTBQ314Zbe7tf8HZwplZuD7ZJZi2Juyxq8JdDPBRCJYgD8ZoDjoNbfrnxv7oJ60T8OyKLvkuLjFnWnHZd1M3OwsOM4uDRX3hk8OsAzsQSehIcdLYzRjYAloGknyZGmDCQLMlfpzzz18aIP9BMhLr364JPk6sSFK4qDU8EMcHGgYUWRUaJE9HnLTs9TzLxTveQnBch7PRnniDwgh2hUF0sDUbQmR9wRcDHtZD5p4QZBHaJYKWpdRtiL1nSsNEzvsMZ9PI05mLLNmZIfluzk3mopgeSQWtQGoiHzmRlNFnrhy5AxC9hHegoVllHAHc8VZrOZyNnBZD9b3Vaqrybs1uveBSfZG3mohKqO2WbTUjg3FXD1kkQPIWA0Z2Z3d6Q77wriiou1ZldJgnkru5WtLsY5YAvwTMdFzoa82XjjDRQwCXnzyRRuXqLF6TGDcrEZIAlAtgs8qgZr10JIaHcp7ZiG4oZpCP8nBCCWGxkBOrue2foa1QoJ30LXN7tMFc5U3wuIoXZmSIyBApsgPkgk5
perl(AutoLoader.pm) = 5.710
perl(B.pm) = 1.290
perl(Benchmark.pm) = 1.120
perl(Carp.pm) = 1.200
perl(Carp/Heavy.pm)
perl(Class/Struct.pm) = 0.630
perl(Config.pm)
perl(Config/Extensions.pm) = 0.010
[... more modules ...]

Did you see that long string? Was it nice? Well, at least it did not
contain funny characters which allude to swear words. :-) And you can
easily skim over it. But it's still rather too long, which is not
exactly nice. The point is that I tried hard to make it as short as it
is, and there are very few remaining ways to make it any shorter.
Please also realize that with C++ libraries things get much worse,
since symbol export in C++ is basically uncontrollable. Here are the
longest Provides set-versions on my machine:

$ rpm -qa --provides |grep ' = set:' |awk '{print length($NF)"\t"$1}' |
    sort -n |tail -20
12460   libbHYPRE.so.0()(64bit)
12838   libosp.so.5()(64bit)
14526   libostyle.so.0()(64bit)
14855   libkdeui.so.5()(64bit)
17444   libdesignercore.so.1()(64bit)
18094   libkdecore.so.4()(64bit)
18505   libgoto2.so.0()(64bit)
20061   libkdeui.so.4()(64bit)
20869   libwx_gtk2u_core-2.8.so.0()(64bit)
21227   libkio.so.4()(64bit)
23717   libxerces-c.so.28()(64bit)
25133   libgtkmm-2.4.so.1()(64bit)
25315   libcryptopp.so.6()(64bit)
25446   libLLVM-2.9.so()(64bit)
27339   libQtGui.so.4()(64bit)
27737   libgdal.so.1()(64bit)
29798   libkhtml.so.4()(64bit)
45492   libqt-mt.so.3()(64bit)
53000   libsidlstub_f90-1.4.0.so()(64bit)
139476  libgcj.so.11()(64bit)

That's a strlen of about 20K per big C++ library, peaking at 139K for
libgcj, which probably has a particularly cluttered and uncontrollable
export. When you constantly have to deal with strings which do not fit
on the screen, the desire to make them shorter quickly becomes quite
excusable. :-)

(On the other hand, Requires versions usually look much more
reasonable, since they include only the subset of symbols which the
package actually uses, albeit at a slightly higher per-symbol cost.
Let's check it out. What are the longest Requires versions? Library
names repeat in this list, since each requiring package contributes
its own entry.)

$ rpm -qa -R |grep '>= set:' |awk '{print length($NF)"\t"$1}' |
    sort -n |tail -20
2117    libQtGui.so.4()(64bit)
2187    libgio-2.0.so.0()(64bit)
2280    libqt-mt.so.3()(64bit)
2337    libgtk-1.2.so.0()(64bit)
2421    libQtGui.so.4()(64bit)
2494    libqt-mt.so.3()(64bit)
2723    libwx_gtk2u_core-2.8.so.0()(64bit)
2794    libQtGui.so.4()(64bit)
2817    libQtGui.so.4()(64bit)
3139    libQtCore.so.4()(64bit)
3205    libQtGui.so.4()(64bit)
4047    libqt-mt.so.3()(64bit)
4210    libQtGui.so.4()(64bit)
4781    libwx_gtk2u_core-2.8.so.0()(64bit)
5152    libqt-mt.so.3()(64bit)
5696    libgtk-x11-2.0.so.0()(64bit)
6216    libgtk-x11-2.0.so.0()(64bit)
6461    libgtk-x11-2.0.so.0()(64bit)
8196    libQtGui.so.4()(64bit)
9639    libqt-mt.so.3()(64bit)
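
Since the same library shows up several times in this listing, it can
be easier to read once collapsed to the largest entry per library. A
minimal sketch, assuming the tab-separated "length, name" lines that
the awk command above emits; max_per_name is a made-up helper name, and
the sample rows are copied from the listing:

```shell
# Hypothetical helper: reads "length<TAB>name" lines on stdin and keeps
# only the largest length seen for each name, sorted ascending.
max_per_name() {
    awk -F'\t' '
        !($2 in max) || ($1 + 0) > max[$2] { max[$2] = $1 + 0 }
        END { for (name in max) print max[name] "\t" name }
    ' | sort -n
}

# Sample rows taken from the Requires listing (not a full rpm query).
printf '%s\t%s\n' \
    2117 'libQtGui.so.4()(64bit)' \
    8196 'libQtGui.so.4()(64bit)' \
    5152 'libqt-mt.so.3()(64bit)' \
    9639 'libqt-mt.so.3()(64bit)' \
    6461 'libgtk-x11-2.0.so.0()(64bit)' |
max_per_name
```

On a real system the function would sit between the awk step and
"tail" in the pipeline, in place of the plain "sort -n".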

> The other reason is this: missing values in the {E,V,R} triple
> (and attaching some reasonable DWIM semantic) lead to huge
> amounts of pointless discussion. Adding all of
>         twiddle-in-version
>         set:versions
>         dpkg-like
> is all underway atm. Running digest/encoding strings through
> "legacy compatible" (i.e. what most rpm tool chains will
> continue to be using for years) rpmvercmp(3) is an accident
> waiting to be reported again again again.

1) Abusing Serial to indicate a special kind of version is arguably the
best we can do. The alternative is to use a dash: "set-version", where
"set" is technically a Version and the rest is the Release, which
then has special meaning. But realize that "set" can be a valid
Version not only technically, but for real. If Roman numerals can
be a version, or if versions can start with "v", why can't "set" be a
real Version? Then there is the problem of how to tell them apart.
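
To illustrate the ambiguity: a regular Serial is numeric, so a literal
"set:" prefix cannot collide with an ordinary version, while "set-1.0"
is a perfectly ordinary Version "set" with Release "1.0". A rough
sketch of the check, not the actual rpm code; is_set_version is a
hypothetical name and the first sample string is truncated for brevity:

```shell
# Hypothetical helper: treat a version string as a set-version only when
# it carries the literal "set:" prefix in the Serial position.
is_set_version() {
    case "$1" in
        set:?*) return 0 ;;
        *)      return 1 ;;
    esac
}

for v in 'set:odiA4ZlKzevK5y39' '1:5.14.2' 'set-1.0' 'set'; do
    if is_set_version "$v"; then
        echo "$v: set-version"
    else
        echo "$v: ordinary version"
    fi
done
```

Only the first string is reported as a set-version; with the dash
convention there would be no way to make that call from the string
alone.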

2) Regular versions aka EVR triples are anyway 15-year-old stuff of
limited use. When you use them, you rely on the concept of backward
compatibility, and, as for library versioning, they cannot be
generated/deduced automatically. To exaggerate a bit, they are only
good to decide whether a package has to be upgraded or not. On the
other hand, set-versions can exactly catch the problems when backward
compatibility has been broken, and they can be generated completely
automatically (in a nowhere-near easy but doable manner). So I can't
see why set-versions cannot be endowed with first-class citizenship
and must always be discussed in terms of EVR triples.

3) There are also "rpmlib(SetVersions)" dependencies which we attach
to the packages. They must take effect at least at install time.

> It's just as easy to attach a tag with binary data as any other
> approach to "pretty" encodings, all of which involve some
> aesthetic choice of permitted/excluded ASCII characters, and so
> the encodings proliferate to meet all possible opinions.

Set-versions still make sense as Versions: they can be meaningfully
compared for the purpose of dependencies. This is why I chose to treat
them as REQUIREVERSION and PROVIDEVERSION, not as some special binary
data. They fit in there almost naturally.
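
The comparison in question is a subset test: a Requires set-version is
satisfied when every symbol it names is also in the Provides
set-version. A minimal sketch of that idea on plain symbol lists, one
name per line; real set-versions work on the encoded strings rather
than on text files, and requires_satisfied and the symbol names below
are only illustrative:

```shell
# Hypothetical helper: succeeds when every symbol in file $1 (required)
# also appears in file $2 (provided). Prints any missing symbols.
requires_satisfied() {
    req_sorted=$(mktemp)
    prov_sorted=$(mktemp)
    LC_ALL=C sort -u "$1" > "$req_sorted"
    LC_ALL=C sort -u "$2" > "$prov_sorted"
    # comm -23 keeps lines found only in the first file,
    # i.e. symbols that are required but not provided.
    missing=$(LC_ALL=C comm -23 "$req_sorted" "$prov_sorted")
    rm -f "$req_sorted" "$prov_sorted"
    if [ -z "$missing" ]; then
        return 0
    fi
    printf '%s\n' "$missing"
    return 1
}

workdir=$(mktemp -d)
printf '%s\n' PerlIO_printf Perl_newSVpv Perl_sv_free > "$workdir/provided"
printf '%s\n' Perl_newSVpv Perl_sv_free > "$workdir/required"
requires_satisfied "$workdir/required" "$workdir/provided" && echo "satisfied"

# Simulate a library update that drops a symbol which is still in use.
printf '%s\n' PerlIO_printf Perl_newSVpv > "$workdir/provided"
requires_satisfied "$workdir/required" "$workdir/provided" || echo "unsatisfied"
rm -rf "$workdir"
```

The second call prints the dropped symbol before reporting the
dependency as unsatisfied, which is the kind of breakage an EVR
comparison alone would not notice.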

> Perhaps making base61 encoding MANDATORY in rpm would displease
> everyone equally: choosing a prime is as "pretty" as all other
> encoding criteria.

Now that you know much more about set-versions, it is interesting to
consider again what their weak points are. I see no really weak
points, no outright blunders. They also work very well in practice
(you cannot delete/rename a symbol in a library without violating
dependencies, if the symbol happens to be used). On second thought,
um, some decisions might seem arbitrary. Some of them I can perhaps
defend as less arbitrary.
______________________________________________________________________
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org
