On Sat, Jun 23, 2012 at 10:10 PM, Jeffrey Johnson <n3...@me.com> wrote:
>> So I suggest that set-strings must cover all usages of set-subset
>> computations where static data structures are appropriate (that is,
>> you compute it, write it into the RPM header, and do not expect to
>> modify it). It is still "efficient", despite the fact that we try hard
>> to achieve a better string representation. The fact that it can be
>> packaged nicely does not make it any worse in other respects, except
>> that obviously we can stuff somewhat fewer bits into alnum characters.
>> But I also wanted it to look "nice" (this is why base62 was devised
>> instead of the much simpler base64 encoding). So it looks nice, and it
>> is efficient. Why are you unhappy? Do you necessarily require that you
>> do the bitmask tests yourself?
>
> I'm not "unhappy". Rather, I'm pointing out that, if your primary
> goal is bandwidth reduction in *.rpm package metadata, there are
> equivalent savings that aren't from "pretty" encodings.
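The density cost of restricting the alphabet to alphanumerics is small: each base62 character carries log2(62) ≈ 5.954 bits versus exactly 6 bits for base64, so the same payload comes out only about 0.8% longer. The Python sketch below is a toy model of a "static set, compute once, test for subset" string, not ALT's actual set-version format (which is more elaborate). It assumes a truncated SHA-1 per symbol and a 24-bit hash width, both arbitrary choices, and the names `encode_set`, `decode_set` and `satisfies` are hypothetical. It packs the sorted symbol hashes into one integer, writes it in plain radix-62, and answers a Requires-versus-Provides question by decoding both strings and testing set inclusion.

```python
import hashlib
import math

# Toy alphabet: digits, lowercase, uppercase (62 alphanumeric characters).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BITS = 24  # hypothetical per-symbol hash width


def symbol_hash(name: str, bits: int = BITS) -> int:
    """Map a symbol name to a `bits`-wide integer (toy choice: truncated SHA-1)."""
    digest = int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")
    return digest >> (160 - bits)


def encode_set(symbols, bits: int = BITS) -> str:
    """Pack sorted, de-duplicated symbol hashes into one base62 string."""
    value = 1  # sentinel bit, so leading zero hashes survive the radix conversion
    for h in sorted({symbol_hash(s, bits) for s in symbols}):
        value = (value << bits) | h
    digits = []
    while value:
        value, rem = divmod(value, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_set(text: str, bits: int = BITS) -> set:
    """Recover the set of hashes from a string produced by encode_set."""
    value = 0
    for ch in text:
        value = value * 62 + ALPHABET.index(ch)
    mask = (1 << bits) - 1
    hashes = set()
    while value > 1:  # stop when only the sentinel bit is left
        hashes.add(value & mask)
        value >>= bits
    return hashes


def satisfies(provides: str, requires: str, bits: int = BITS) -> bool:
    """A Requires set is met if its hashes are a subset of the Provides hashes."""
    return decode_set(requires, bits) <= decode_set(provides, bits)


if __name__ == "__main__":
    print(f"bits per base62 char: {math.log2(62):.3f}")  # 5.954
    provides = encode_set(["foo", "bar", "baz", "qux"])
    requires = encode_set(["foo", "baz"])
    print(satisfies(provides, requires))  # True: every required hash is provided
    # Dropping "baz" from the library normally breaks the dependency
    # (unless a hash collision happens to mask the removal).
    print(satisfies(encode_set(["foo", "bar", "qux"]), requires))
```

Because every symbol costs a fixed number of bits in this toy, the string grows linearly with the number of symbols (roughly N·24/5.954 characters). Hash collisions can make the inclusion test pass spuriously, so a wider hash trades string length for fewer false positives.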
Okay, the primary goal was not so much to minimize bandwidth as to make things "nice" overall. But again, realize that set-versions are O(N) in size, where N is the number of symbols. For example, here are the Provides of the perl-base package in ALT Linux:

$ rpm -q --provides perl-base
perl = 1:5.14.2
perl-version = 0.88
perl-PerlIO = 1:5.14.2
perl-Storable = 1:5.14.2
perl-Digest-MD5
perl-Time-HiRes
perl-MIME-Base64
libperl-5.14.so()(64bit) = set:odiA4ZlKzevK5y39eQsWZckcCoxsZqg4k4MIETgeNn0RxqjzEscOAdkenBZ7a373IGkMzlRZcFtq8vCvMeJMgzVyfiFR61GhMIUZ81a3aDJtilWwCksMTw3vQxUgc4x1QMFnA2h8lmdZpJ4jHU4wlFn4W2wHpFabuLlqZiIyviaTYTPl1PjrcIm4iZdvp5fOXR9PKnYQyfo6w1AOS7V5qO5ND9AGGby7LRkpBYZuOEZhvKfx9ZEGiSTLG9vx8ka0dtcUm5WXZtFUvjG92kZwnUjHVWc9NZapw9ocaec37K7rO7N4rl3mGGxh21z3bE4FdZ6RI6k52BrCvogEqZqeVZLcBZ9tcAbHq6ObepE2vXR9TQODoNkoeyaOQGbrDchblKzS8wBPTGtnv5Ze8K9FAsQN2dsQxAJ2hD2NOIL1ROhhFydp27xpqCYssiowuiYfJtCdofpKW9qohptrbV1pmkjVb4kI6lhKHZb5ae7p4u65NodjOSHmOw81L7fjwZjoK2uwH5rACj7NxXW9kk10uOsM6EJdRY4nrZqYbkKN2G1gQvUtZzqxPMjWgkh3P9LmcluzYzgCfZoexmZxQP4nRLYuemEWzHXkX1Ys5wqbfMZDxBbi7PtkohqPCqcB4f6zurrib4y7aUHs6aqCiVRHGDdR4thBgZFjpXDxIDN4X2QjAOjRQZhNzilkpBXmCgy75JDndDdw1DCMEZkmfgAnLU4rUEDIZ2pUvC5q3S2lMtq2Zj17f2xDL80cyg11BFQ8thh9iR2SIXCB69Jrda3E7ArKzyXYci6KlHo1jfvWP6Q2lre8VQCGQZoH8qDo7vaq2MGlNFgtgJGQgTGVx0CKZAAhlijag1ZKUGmgqFOdprCVtuZc8IoZBsuiY1DKZyS6e1HPHGYrZcOcd4KNcminIr9spwghADXzujggTmVsCutsEh6LLbjD2boZan6ycEnBMoFsbyIoz5ggnxFKIjuJyZvM6voYIs2qurMGOCoeB2csGGtTV6b82ItcZ2kj7mbamrZsQscXQ8EUT7lflT6IamZAZLZx7GTKUaZjwPyUltZ6gHWKIvw5wbNQMZGvOU6F43xdP7M2lJ0R9g3cVd4oiQig2F8K0AUdnwXz7LVPbjJMk8iBytXGntBKoPI6G3ahWHgSvmhGVVdNJeNXZl0AkcxhvhFqRujtasuOxCEz8WJCLZaO642OhFjtOVZliv4b4Y34Uh2vftwCo1vXxEJxtVVek42wG9La5uiLoz9o77UfQz4pu0yvZmTR8rs1X05ykr61bK73IOzGuMFjfKxQX2uUUL7zkl661NxWK8VR04bP7p1GVLn7SWPfXyfm2MbTZG8ytPzT2vDJYjmKTi6hRJ9RFPpZegntTk7uuL7ahsSej3aiBbEWVKYXWhkntIZ3YfDPOzRxosdgSKMNRY1llmBvZk5JlN3ETltMlAI7v7bq24bCvG9iGyCPyoUoKVLYpL8cycfepxYcgm4ioyz8zt3ZkAMQi4iZfs2zbd34VnocYTB6cyEZxR67x0i7wQRP7VrwDilZfniZgmdH5pAsWxXX0auKdd1nSN93XGZECEQP4nKKWCLwjYVjjAB1iRBohqEbRBstuQLCBMEL8OLapxqEYozeGlFMoSbo6eV6lCFuOKbYwvN6mjofKRtjavsU4hYm2ArDYgZ8alF8tZCw1BFvmhI0zL4w2un5m3S3G2tcZpWdVODLSi3IB4pQyR755EB7zUaPCF8LafHzDKHBSKEZohdNEooVN3lY5DnGpz25rW38dU8TXRG2pckDnGb5LVBVYpxK7VsybdZGsy5JP8Alk1Mf5O4zFmGcpZLs172YECGj3f8JzUcpknwF2PWxqRZgnZapssvh3CIOl8fA1bfZiUGMFg1eu62GXYlWiqWaA9g627YjeQkGdz9AhZCVECGOqNEZFX2Jy6O23lHvgJ2ylDSwi2RJGZdl1ckGMVwZ5z3uKJ207KVuQWyeCv7Ne4wxyWj9JgP0Z2PEmyXsan5QBFEZq323vZmlRyhhmPoPDle2zyxxT7AZ3vL42k3MLOzNPtGpZcTgIJfhy2H38fUU0YIFNpi9ygZGg5NBZdu1XL0NWesCbk2PapUsltirZ7qo6Jwbg2UkwsgEaVsclZ2EPrD5brplmPl9DHlrgLcAZCaNIG9HIFSfZErD7U0tJob9JvoHKZogXZgtaH0UkZjrpma2lCxPpj9qdZdh4gli6C9mqCKElC2NjZKh9T0Uw1ddLDYNmvPB971Xyzx9qSNQ99acW1uP36D7qgjyPsf9qO1ZDqA9fZuJ5oEkNIpIoatW0f157Sp3bTZ7o58XhKir4N8PnJIfj3P4NQjtrjW85vAEHU9B2dZACzNfNACZn7TO2AyRBmnPn6XWYKoAEW9UiH8Z12Z5OBMa2ovftCZJ3Zpqoc0GG9kpGjXEZCN0ypGqk4lnYTCL4X8i1B2AabsfEG100r2HQWbjoIrM1LtGBAqSuPAPwIEVnaSqAvQFfCi9xCHS3EKRSZa9EsDZktcKHply2cMb5ldXl06eDg3i1xHM9uPTvm7laFyWorkUMcCZiV5d0tlXzFuiDjA3pOsOjUFmgAzPNJK17LfQSHA4avpaLbZ4lGSPRdQWOOKBxGDgOYMnK2Z11iom0XUBCvybJ2POnO81yDKHCEtVHJE4ZKTMiruLDjAe876HiwstanPxJ0m6ZeYmWgraCbEF7iaQZoZkgkwHS8ONi1OXSBF7crddfeYDwWLZk9w4lRr5lj5b3bZx2DFkDwk1IwczPHksFPHGgM8hSnBhfBXpcRLalykEkN9p0LGi6u5Zzs7q7dLLOPRrRLNtaz83XPT1ytZAbLKy7dKfRzTBQ314Zbe7tf8HZwplZuD7ZJZi2Juyxq8JdDPBRCJYgD8ZoDjoNbfrnxv7oJ60T8OyKLvkuLjFnWnHZd1M3OwsOM4uDRX3hk8OsAzsQSehIcdLYzRjYAloGknyZGmDCQLMlfpzzz18aIP9BMhLr364JPk6sSFK4qDU8EMcHGgYUWRUaJE9HnLTs9TzLxTveQnBch7PRnniDwgh2hUF0sDUbQmR9wRcDHtZD5p4QZBHaJYKWpdRtiL1nSsNEzvsMZ9PI05mLLNmZIfluzk3mopgeSQWtQGoiHzmRlNFnrhy5AxC9hHegoVllHAHc8VZrOZyNnBZD9b3Vaqrybs1uveBSfZG3mohKqO2WbTUjg3FXD1kkQPIWA0Z2Z3d6Q77wriiou1ZldJgnkru5WtLsY5YAvwTMdFzoa82XjjDRQwCXnzyRRuXqLF6TGDcrEZIAlAtgs8qgZr10JIaHcp7ZiG4oZpCP8nBCCWGxkBOrue2foa1QoJ30LXN7tMFc5U3wuIoXZmSIyBApsgPkgk5
perl(AutoLoader.pm) = 5.710
perl(B.pm) = 1.290
perl(Benchmark.pm) = 1.120
perl(Carp.pm) = 1.200
perl(Carp/Heavy.pm)
perl(Class/Struct.pm) = 0.630
perl(Config.pm)
perl(Config/Extensions.pm) = 0.010
[... more modules ...]

Did you see that long string? Was it nice? Well, at least it did not contain funny characters that allude to swear words. :-) And you can easily skim over it. But it's still rather too long, which is not exactly nice. The point is that I tried hard to make it as short as it is, and there are very few remaining possibilities to make it even a little shorter.

Please also realize that with C++ libraries things get much worse, since symbol export in C++ is basically uncontrollable. Here are the longest Provides set-versions on my machine:

$ rpm -qa --provides |grep ' = set:' |awk '{print length($NF)"\t"$1}' |sort -n |tail -20
12460	libbHYPRE.so.0()(64bit)
12838	libosp.so.5()(64bit)
14526	libostyle.so.0()(64bit)
14855	libkdeui.so.5()(64bit)
17444	libdesignercore.so.1()(64bit)
18094	libkdecore.so.4()(64bit)
18505	libgoto2.so.0()(64bit)
20061	libkdeui.so.4()(64bit)
20869	libwx_gtk2u_core-2.8.so.0()(64bit)
21227	libkio.so.4()(64bit)
23717	libxerces-c.so.28()(64bit)
25133	libgtkmm-2.4.so.1()(64bit)
25315	libcryptopp.so.6()(64bit)
25446	libLLVM-2.9.so()(64bit)
27339	libQtGui.so.4()(64bit)
27737	libgdal.so.1()(64bit)
29798	libkhtml.so.4()(64bit)
45492	libqt-mt.so.3()(64bit)
53000	libsidlstub_f90-1.4.0.so()(64bit)
139476	libgcj.so.11()(64bit)

That's a strlen of about 20K per big C++ library, peaking at 139K for libgcj, which probably has a particularly cluttered and uncontrollable export. When you constantly have to deal with strings that do not fit on the screen, the desire to make them shorter quickly becomes very excusable. :-)

(On the other hand, Requires versions usually look much more reasonable, since they include only the subset of symbols that the requiring package actually uses, albeit at a slightly higher per-symbol cost. Let's check it out: what are the longest Requires versions? The library names in this list are not unique, since many packages require the same library.)
$ rpm -qa -R |grep '>= set:' |awk '{print length($NF)"\t"$1}' |sort -n |tail -20
2117	libQtGui.so.4()(64bit)
2187	libgio-2.0.so.0()(64bit)
2280	libqt-mt.so.3()(64bit)
2337	libgtk-1.2.so.0()(64bit)
2421	libQtGui.so.4()(64bit)
2494	libqt-mt.so.3()(64bit)
2723	libwx_gtk2u_core-2.8.so.0()(64bit)
2794	libQtGui.so.4()(64bit)
2817	libQtGui.so.4()(64bit)
3139	libQtCore.so.4()(64bit)
3205	libQtGui.so.4()(64bit)
4047	libqt-mt.so.3()(64bit)
4210	libQtGui.so.4()(64bit)
4781	libwx_gtk2u_core-2.8.so.0()(64bit)
5152	libqt-mt.so.3()(64bit)
5696	libgtk-x11-2.0.so.0()(64bit)
6216	libgtk-x11-2.0.so.0()(64bit)
6461	libgtk-x11-2.0.so.0()(64bit)
8196	libQtGui.so.4()(64bit)
9639	libqt-mt.so.3()(64bit)

> The other reason is this: missing values in the {E,V,R} triple
> (and attaching some reasonable DWIM semantic) lead to huge
> amounts of pointless discussion. Adding all of
>	twiddle-in-version
>	set:versions
>	dpkg-like
> is all underway atm. Running digest/encoding strings through
> "legacy compatible" (i.e. what most rpm tool chains will
> continue to be using for years) rpmvercmp(3) is an accident
> waiting to be reported again again again.

1) Abusing Serial to indicate a special kind of version is arguably the best we can do. The alternative is to use a dash: "set-version", where "set" is technically a Version and the rest is the Release, which then has a special meaning. But realize that "set" can be a valid Version not only technically, but for real. If Roman numerals can be a version, or if versions can start with "v", why can't "set" be a real Version? Then there is the problem of telling them apart.

2) Regular versions, aka EVR triples, are 15-year-old stuff of limited use anyway. When you use them, you rely on the concept of backward compatibility, and, as for library versioning, they cannot be generated or deduced automatically. To exaggerate a bit, they are only good for deciding whether a package has to be upgraded or not.
On the other hand, set-versions can catch exactly the problems where backward compatibility has been broken, and they can be generated completely automatically (in a nowhere-near-easy but doable manner). So I can't see why set-versions cannot be granted first-class citizenship instead of always being discussed in terms of EVR triples.

3) There are also "rpmlib(SetVersions)" dependencies which we attach to the packages. They must take effect at least at install time.

> It's just as easy to attach a tag with binary data as any other
> approach to "pretty" encodings, all of which involve some
> aesthetic choice of permitted/excluded ASCII characters, and so
> the encodings proliferate to meet all possible opinions.

Set-versions still make sense as Versions: they can be meaningfully compared for the purpose of dependencies. This is why I chose to treat them as REQUIREVERSION and PROVIDEVERSION, not as some special binary data. They just fit in there almost naturally.

> Perhaps making base61 encoding MANDATORY in rpm would displease
> everyone equally: choosing a prime is as "pretty" as all other
> encoding criteria.

Now that you know much more about set-versions, it is interesting to consider again what their weak points are. I see no really weak points, you know, like blunders. They also work very well in practice (you cannot delete or rename a symbol in a library without violating dependencies, if the symbol happens to be used). On second thought, um, there might seem to be some arbitrary decisions. Some of them I can perhaps defend as less arbitrary than they seem.

______________________________________________________________________
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org