Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Mon, Jul 2, 2012 at 9:17 AM, Jeffrey Johnson wrote:

>> All RPMv4.4+ packages, that is, but not RPMv4.0. I find this "file coloring" business very annoying, by the way, and it took me some time to realize that "fc" actually stands for "file coloring". :-)
>
> RPMv4.0 was a l-o-n-g time ago.

It was a long time ago, but it was not that bad. In ALT Linux we still use the rpm-4.0 code base (with many important backports etc.). Since I've recently broken with ALT Linux, I cannot make more claims, you see...

> I find multilib quite annoying: I was asked for an implementation, and did so. 'Twas a job mon: already 7y since leaving @redhat, and file colors for multilib were several years before that.

I don't like how "multilib" works either. You can no longer identify packages by their names, and then there are special rules to resolve file conflicts, which basically say that you are allowed to swap files as much as you want, provided that you got the first hand in that strange "x86-64" relationship. Then there is that recent "x32" stuff where you can run in 64-bit mode using only 32-bit pointers. How can you address THAT? To me, the world is declining, like the Roman Empire. ;-)

>> I see no reason why rpm(1) should ever consider any preferences. To me, rpm(1) is exactly a black-and-white thing, a watchdog which checks logical assertions. Things are either consistent enough to proceed (and e.g. to upgrade the packages), or not - and then you get e.g. a non-zero exit code and you are forced to bail out. Higher-level logic of finding an upgrade plan anyway belongs to something like apt(1) or yum(1), although it is executed in some basic librpm terms which we must make efficient enough. I see no reason to discuss closest metrics or largest overlaps - this is as interesting as it is irrelevant to our basic tasks.
>
> Everyone has an opinion: yours is particularly brittle but entirely logical. Now go persuade everyone else that your opinion is The One True Opinion and RPM will surely change too.

>>> There's a similar application with dual/triple/... licensed software and computing per-file, not per-package, license affinity precisely where set:versions (or Bloom filters) will represent keywords (like "LGPLv2" or "BSD") easily. Licenses unlike file(1) magic keywords will require name space administration. Surely LSB and LFF are seeking something useful to do for RPM packaging these days, and might be convinced to make some set of license tokens "standard" so that license affinity can be precisely computed in distributed software.
>>
>> You then discuss more applications which are largely irrelevant to our basic tasks. (I realize that I'm revisiting an older discussion, which might not be completely fair because our understanding might have evolved since.)
>
> Um … what are "… our basic tasks."?

Our basic task is that when we feed packages to rpm(1), it must quickly decide whether the upgrade is feasible or not. Of course this involves hard and sometimes speculative considerations of whether things are going to work. But if things are definitely not going to work, rpm should never upgrade! Not without a special flag which reads '--upgrade-as-i-wish'. What is not part of "our basic tasks" is to find an upgrade plan. How apt(1) or yum(1) do that is another business, and completely another story. We only check whether the upgrade is consistent or not.

> I cannot determine what is "fair" without knowing what is being compared …

>> Anyway, set-versions are not the "next big thing" with plenty of applications. It's rather boring stuff which nevertheless answers the question "how can we possibly enhance ABI compatibility control beyond sonames". The answer is that we must get into set/subset testing - that's the model, that it is very expensive, and that the only reasonable and possibly the best way to go is to replace symbols with numbers, and to treat sets of numbers as a special kind of versions. Now why is that? But that's a much better perspective for discussion.
>
> We differ in usage cases. I see a "container", you see a "version".

I realize that the "version" is somehow a contrived concept behind what is really a container. Like: - What's that? - It's a version. - Oh my gosh! But then again, if you try to organize your thinking beyond sets, bytes, and characters, the concept of "version" pops up pretty naturally. If we are going to satisfy some requirement, that's good for "version".

> There are many usage cases for subset operations in "package management" no matter what we think or how the operations are implemented and represented.

Not all subset operations are equally important, or complex. As I said, sometimes you should make easy things easy, and simply use an rb-tree or something like this. Sometimes things go off, though. Like this: - How many symbols do you want to encode? - 10M. - That'
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Sat, Jun 23, 2012 at 10:29 PM, Jeffrey Johnson wrote:

> In the interest of getting off negative nerdy obscure discussions, let's try a positive alternative application for Golomb-Rice subset operations.
>
> All RPMv4 packages attach (a lightly filtered) file(1) magic string to every file.

All RPMv4.4+ packages, that is, but not RPMv4.0. I find this "file coloring" business very annoying, by the way, and it took me some time to realize that "fc" actually stands for "file coloring". :-)

> The file(1) data is mostly usable as a "keyword" namespace exactly as is. Yes there are flaws: however a magic string from file(1) is about as good as any other de facto keyword tagging of file content.
>
> Keywords are strings just like ELF symbols are, and set:versions (or Bloom filters) are a compact representation from which it is rather easy to do subset computations.
>
> One extension that would be needed is a "closest" metric in order to "prefer" the largest subset overlap: with set:versions any contained subset will satisfy the logical assertions, and there's no easy way to prefer the larger sub-set.

I see no reason why rpm(1) should ever consider any preferences. To me, rpm(1) is exactly a black-and-white thing, a watchdog which checks logical assertions. Things are either consistent enough to proceed (and e.g. to upgrade the packages), or not - and then you get e.g. a non-zero exit code and you are forced to bail out. Higher-level logic of finding an upgrade plan anyway belongs to something like apt(1) or yum(1), although it is executed in some basic librpm terms which we must make efficient enough. I see no reason to discuss closest metrics or largest overlaps - this is as interesting as it is irrelevant to our basic tasks.

> There's a similar application with dual/triple/... licensed software and computing per-file, not per-package, license affinity precisely where set:versions (or Bloom filters) will represent keywords (like "LGPLv2" or "BSD") easily. Licenses unlike file(1) magic keywords will require name space administration. Surely LSB and LFF are seeking something useful to do for RPM packaging these days, and might be convinced to make some set of license tokens "standard" so that license affinity can be precisely computed in distributed software.

You then discuss more applications which are largely irrelevant to our basic tasks. (I realize that I'm revisiting an older discussion, which might not be completely fair because our understanding might have evolved since.)

Anyway, set-versions are not the "next big thing" with plenty of applications. It's rather boring stuff which nevertheless answers the question "how can we possibly enhance ABI compatibility control beyond sonames". The answer is that we must get into set/subset testing - that's the model, that it is very expensive, and that the only reasonable and possibly the best way to go is to replace symbols with numbers, and to treat sets of numbers as a special kind of versions. Now why is that? But that's a much better perspective for discussion.

__
RPM Package Manager            http://rpm5.org
Developer Communication List   rpm-devel@rpm5.org
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Sat, Jun 23, 2012 at 10:10 PM, Jeffrey Johnson wrote:

>> So I suggest that set-strings must cover all usages of set-subset computations where static data structures are appropriate (so that you compute it, write it into the RPM header, and do not expect to modify it). It is still "efficient", despite the fact that we try hard to achieve a better string representation. The fact that it can be packaged nicely does not make it any worse in other respects, except that obviously we can stuff somewhat fewer bits into alnum characters. But I also wanted it to look "nice" (this is why base62 was devised over the much simpler base64 encoding). But so, it looks nice, and it is efficient. Why are you unhappy? Do you necessarily require that you have to do bitmask tests yourself?
>
> I'm not "unhappy". Rather I'm pointing out that -- if your primary goal is bandwidth reduction in *.rpm package metadata -- there are equivalent savings that aren't from "pretty" encodings.

Okay, the primary goal was not so much to minimize bandwidth, but just to make things "nice" overall. But again, realize that set-versions are O(N) in size, where N is the number of symbols.
For example, here is the Provides of perl-base package in ALT Linux: $ rpm -q --provides perl-base perl = 1:5.14.2 perl-version = 0.88 perl-PerlIO = 1:5.14.2 perl-Storable = 1:5.14.2 perl-Digest-MD5 perl-Time-HiRes perl-MIME-Base64 libperl-5.14.so()(64bit) = set:odiA4ZlKzevK5y39eQsWZckcCoxsZqg4k4MIETgeNn0RxqjzEscOAdkenBZ7a373IGkMzlRZcFtq8vCvMeJMgzVyfiFR61GhMIUZ81a3aDJtilWwCksMTw3vQxUgc4x1QMFnA2h8lmdZpJ4jHU4wlFn4W2wHpFabuLlqZiIyviaTYTPl1PjrcIm4iZdvp5fOXR9PKnYQyfo6w1AOS7V5qO5ND9AGGby7LRkpBYZuOEZhvKfx9ZEGiSTLG9vx8ka0dtcUm5WXZtFUvjG92kZwnUjHVWc9NZapw9ocaec37K7rO7N4rl3mGGxh21z3bE4FdZ6RI6k52BrCvogEqZqeVZLcBZ9tcAbHq6ObepE2vXR9TQODoNkoeyaOQGbrDchblKzS8wBPTGtnv5Ze8K9FAsQN2dsQxAJ2hD2NOIL1ROhhFydp27xpqCYssiowuiYfJtCdofpKW9qohptrbV1pmkjVb4kI6lhKHZb5ae7p4u65NodjOSHmOw81L7fjwZjoK2uwH5rACj7NxXW9kk10uOsM6EJdRY4nrZqYbkKN2G1gQvUtZzqxPMjWgkh3P9LmcluzYzgCfZoexmZxQP4nRLYuemEWzHXkX1Ys5wqbfMZDxBbi7PtkohqPCqcB4f6zurrib4y7aUHs6aqCiVRHGDdR4thBgZFjpXDxIDN4X2QjAOjRQZhNzilkpBXmCgy75JDndDdw1DCMEZkmfgAnLU4rUEDIZ2pUvC5q3S2lMtq2Zj17f2xDL80cyg11BFQ8thh9iR2SIXCB69Jrda3E7ArKzyXYci6KlHo1jfvWP6Q2lre8VQCGQZoH8qDo7vaq2MGlNFgtgJGQgTGVx0CKZAAhlijag1ZKUGmgqFOdprCVtuZc8IoZBsuiY1DKZyS6e1HPHGYrZcOcd4KNcminIr9spwghADXzujggTmVsCutsEh6LLbjD2boZan6ycEnBMoFsbyIoz5ggnxFKIjuJyZvM6voYIs2qurMGOCoeB2csGGtTV6b82ItcZ2kj7mbamrZsQscXQ8EUT7lflT6IamZAZLZx7GTKUaZjwPyUltZ6gHWKIvw5wbNQMZGvOU6F43xdP7M2lJ0R9g3cVd4oiQig2F8K0AUdnwXz7LVPbjJMk8iBytXGntBKoPI6G3ahWHgSvmhGVVdNJeNXZl0AkcxhvhFqRujtasuOxCEz8WJCLZaO642OhFjtOVZliv4b4Y34Uh2vftwCo1vXxEJxtVVek42wG9La5uiLoz9o77UfQz4pu0yvZmTR8rs1X05ykr61bK73IOzGuMFjfKxQX2uUUL7zkl661NxWK8VR04bP7p1GVLn7SWPfXyfm2MbTZG8ytPzT2vDJYjmKTi6hRJ9RFPpZegntTk7uuL7ahsSej3aiBbEWVKYXWhkntIZ3YfDPOzRxosdgSKMNRY1llmBvZk5JlN3ETltMlAI7v7bq24bCvG9iGyCPyoUoKVLYpL8cycfepxYcgm4ioyz8zt3ZkAMQi4iZfs2zbd34VnocYTB6cyEZxR67x0i7wQRP7VrwDilZfniZgmdH5pAsWxXX0auKdd1nSN93XGZECEQP4nKKWCLwjYVjjAB1iRBohqEbRBstuQLCBMEL8OLapxqEYozeGlFMoSbo6eV6lCFuOKbYwvN6mjofKRtjavsU4hYm2ArDYgZ8alF8tZCw1BFvmhI0zL4w2un5m3S3G2tcZpWdVODLSi3IB4pQyR755EB7zUaPC
F8LafHzDKHBSKEZohdNEooVN3lY5DnGpz25rW38dU8TXRG2pckDnGb5LVBVYpxK7VsybdZGsy5JP8Alk1Mf5O4zFmGcpZLs172YECGj3f8JzUcpknwF2PWxqRZgnZapssvh3CIOl8fA1bfZiUGMFg1eu62GXYlWiqWaA9g627YjeQkGdz9AhZCVECGOqNEZFX2Jy6O23lHvgJ2ylDSwi2RJGZdl1ckGMVwZ5z3uKJ207KVuQWyeCv7Ne4wxyWj9JgP0Z2PEmyXsan5QBFEZq323vZmlRyhhmPoPDle2zyxxT7AZ3vL42k3MLOzNPtGpZcTgIJfhy2H38fUU0YIFNpi9ygZGg5NBZdu1XL0NWesCbk2PapUsltirZ7qo6Jwbg2UkwsgEaVsclZ2EPrD5brplmPl9DHlrgLcAZCaNIG9HIFSfZErD7U0tJob9JvoHKZogXZgtaH0UkZjrpma2lCxPpj9qdZdh4gli6C9mqCKElC2NjZKh9T0Uw1ddLDYNmvPB971Xyzx9qSNQ99acW1uP36D7qgjyPsf9qO1ZDqA9fZuJ5oEkNIpIoatW0f157Sp3bTZ7o58XhKir4N8PnJIfj3P4NQjtrjW85vAEHU9B2dZACzNfNACZn7TO2AyRBmnPn6XWYKoAEW9UiH8Z12Z5OBMa2ovftCZJ3Zpqoc0GG9kpGjXEZCN0ypGqk4lnYTCL4X8i1B2AabsfEG100r2HQWbjoIrM1LtGBAqSuPAPwIEVnaSqAvQFfCi9xCHS3EKRSZa9EsDZktcKHply2cMb5ldXl06eDg3i1xHM9uPTvm7laFyWorkUMcCZiV5d0tlXzFuiDjA3pOsOjUFmgAzPNJK17LfQSHA4avpaLbZ4lGSPRdQWOOKBxGDgOYMnK2Z11iom0XUBCvybJ2POnO81yDKHCEtVHJE4ZKTMiruLDjAe876HiwstanPxJ0m6ZeYmWgraCbEF7iaQZoZkgkwHS8ONi1OXSBF7crddfeYDwWLZk9w4lRr5lj5b3bZx2DFkDwk1IwczPHksFPHGgM8hSnBhfBXpcRLalykEkN9p0LGi6u5Zzs7q7dLLOPRrRLNtaz83XPT1ytZAbLKy7dKfRzTBQ314Zbe7tf8HZwplZuD7ZJZi2Juyxq8JdDPBRCJYgD8ZoDjoNbfrnxv7oJ60T8OyKLvkuLjFnWnHZd1M3OwsOM4uDRX3hk8OsAzsQSehIcdLYzRjYAloGknyZGmDCQLMlfpzzz18aIP9BMhLr364JPk6sSFK4qDU8EMcHGgYUWRUaJE9HnLTs9TzLxTveQnBch7PRnniDwgh2hUF0sDUbQmR9wRcDHtZD5p4QZBHaJYKWpdRtiL1nSsNEzvsMZ9PI05mLLNmZIfluzk3mopgeSQWtQGoiHzmRlNFnrhy5AxC9hHegoVllHAHc8VZrOZyNnBZD9b3Vaqrybs1uveBSfZG3mohKqO2WbTUjg3FXD1kkQPIWA0Z2Z3d6Q77wriiou1ZldJgnkru5WtLsY5YAvwTMdFzoa82XjjDRQwCXnzyRRuXqLF6TGDcrEZIAlAtgs8qgZr10JIaHcp7ZiG4oZpCP8nBCCWGxkBOrue2foa1QoJ30LXN7tMFc5U3wuIoXZmSIyBApsgPkgk5 perl(AutoLoader.pm) = 5.710 perl(B.pm) = 1.290 perl(Benchmark.pm) = 1.120 perl(Carp.pm) = 1.200 perl(Carp/Heavy.pm) perl(Class/Struct.pm) = 0.630 perl(Config
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Thu, Jun 28, 2012 at 8:18 PM, Alexey Tourbin wrote:

> There is also a philosophical consideration which somehow accompanies this practical consideration. There is a short story, I believe by Borges, where a clever scientist devises a 1-1 map of reality. A 1-1 map of reality turns out to be a very true picture of reality. What's wrong with this approach? Well, a 1-1 map of the world turns out to be an exact copy of the world, which is of no use in terms of being "a map". Somehow, the reality must be "construed" and "reduced" to a simpler (and somewhat coarser) description to become a useful model. This is also why we don't plug ELF sections into RPM headers: we believe that much simpler (or at least much shorter) dependencies must be used to represent ELF binaries in terms of their interconnectedness, and must also omit other less important details.

This philosophical argument applies to set-versions in a (not-so-)obvious manner, which I will now clarify. It goes like this: although the ultimate goal is to check that the R-set of symbols is a subset of the P-set of symbols, you do not necessarily have to store the full names of the symbols in order to perform a somewhat stripped-down version of the check. When all that matters is whether R is a subset of P, the names themselves become largely irrelevant, provided that you can devise a very clever substitution/encoding scheme. You can make "much simpler (or at least shorter) dependencies" by getting rid of the names in a manner which does not destroy the check. The downside is, of course, that when a dependency R subset P is broken, it is not easy to find out which P symbols were deleted or renamed (or which R symbols are missing). But this is largely a developer's, or should I say a hacker's, problem. On the other hand, from the user's point of view, and also from the rpm(1) perspective, this approach simply promotes synchronous or rather "transactional" upgrades. It says: guys, I will not apply half-baked updates until you fix it all - so that apps and libraries match. Which totally makes sense!
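The name-free subset check described above can be sketched in Python. This is only a sketch: the real implementation uses a Jenkins one-at-a-time hash with a custom initial constant plus Golomb-Rice/base62 coding, none of which is reproduced here; `hash_symbol` below is a plain one-at-a-time hash truncated to a limited range.

```python
# Sketch: replace symbol names with numbers in a limited range, then the
# R-subset-of-P check works on the numbers alone (with a small false
# positive probability due to hash collisions).
def hash_symbol(sym: str, bpp: int) -> int:
    """Map a symbol name to a number in [0, 2**bpp) via one-at-a-time hashing."""
    h = 0
    for c in sym.encode():
        h = (h + c) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h & ((1 << bpp) - 1)

def make_set(symbols, bpp=20):
    """Hash each symbol, dedupe, and sort - the 'set of numbers' version."""
    return sorted({hash_symbol(s, bpp) for s in symbols})

provides = make_set(["sym1", "sym2", "sym3", "sym4"])
requires = make_set(["sym1", "sym3"])
ok = set(requires) <= set(provides)   # R subset of P => upgrade feasible
```

Note that the symbol names are gone after `make_set`: if the check fails, you can no longer tell which symbol is missing, exactly the developer-side downside mentioned above.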
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Sat, Jun 23, 2012 at 7:10 AM, Jeffrey Johnson wrote:

>> Why is it wrong to minimize bandwidth, or, in other words, why is it bad to spend less money? Your answer is like, because the meaning of life is not to spend less money, which is a wrong perspective. Okay, but what's a better perspective? Spending more money for less good is not at all a better perspective.
>
> Nothing I said implies that bandwidth reduction is "wrong".
>
> If you want to minimize bandwidth downloading packages, compress the metadata and deal with the "legacy compatible" issues however you want/need.
>
> My rule of thumb is that metadata is ~15% of the size of a typical *.rpm package: assuming 50% compression one might save 7.5% of the package download cost (0.50 * 0.15 = 7.5%).

An interesting practical matter behind set-versions is that we can simply represent the exact set of symbols, which can be encoded and pictured like this:

    Provides: libfoo.so.0 = set:sym1,sym2,sym3,sym4
    Requires: libfoo.so.0 = set:sym1,sym3

(where symN stands for a direct ELF symbol name - e.g. strcmp). You simply name the symbols which you require or provide! In fact, this is exactly how early alpha set-versions were implemented - before I had some time to ponder over the approximate set/subset encoding problem (it was only then that the sym1,sym2... sets were converted into numbers, per symbol, and compressed). The point here is that the price of the exact set representation may, or may not, be prohibitive. If the price is not prohibitive, it's a no-brainer: you don't have to get into the approximate-subset business at all. Sometimes, you simply should not use Bloom filters, despite the fact that they might seem appealing. However, if the price is prohibitive, which it was, the reason for going into the approximate-subset business is also a no-brainer: you should cut down heavily and optimize for size first.
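The exact "early alpha" representation above can be spelled out directly; this toy parser just splits the symbol list and does the subset test on the names themselves:

```python
# Exact ("early alpha") set representation: symbol names spelled out,
# subset check done on the names - no hashing, no encoding, no false
# positives, but the dependency grows linearly with symbol name lengths.
def parse_set(dep: str) -> set:
    assert dep.startswith("set:")
    return set(dep[len("set:"):].split(","))

provides = parse_set("set:sym1,sym2,sym3,sym4")
requires = parse_set("set:sym1,sym3")
feasible = requires <= provides   # every required symbol is provided
```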
If you simply introduced probabilities without making things less prohibitive, did you do anything useful at all? You only spoiled things a bit!

There is also a philosophical consideration which somehow accompanies this practical consideration. There is a short story, I believe by Borges, where a clever scientist devises a 1-1 map of reality. A 1-1 map of reality turns out to be a very true picture of reality. What's wrong with this approach? Well, a 1-1 map of the world turns out to be an exact copy of the world, which is of no use in terms of being "a map". Somehow, the reality must be "construed" and "reduced" to a simpler (and somewhat coarser) description to become a useful model. This is also why we don't plug ELF sections into RPM headers: we believe that much simpler (or at least much shorter) dependencies must be used to represent ELF binaries in terms of their interconnectedness, and must also omit other less important details.

Back to the story of set-versions: with the "original" implementation (which introduced full Golomb coding), it was estimated that the size of the architecture-dependent pkglist.classic.bz2 metadata was going to go up from about 3M to about 12M, four-fold! Soaring up like this was still almost prohibitive. It was considered non-prohibitive only by virtue of information-theoretical considerations: since we are going to encode that many symbols, we must not fool ourselves into thinking that we could somehow pay a smaller price - that is, without violating fundamental laws.

By the way, what's the information-theoretical minimum? Say, we want to encode 1024 20-bit hash values (which yields a false positive rate of about 0.1%). Well, the first mathematical intuition is that we need to cut the 20-bit range into 1024 smaller stripes, which gives 10 bits per stripe, on average. It is a little bit more complicated than that, though, exactly because of this "on average" business: we must also take some bits to encode stripe boundaries.
But this is only a mathematical intuition. The exact formula, in R, is:

    > lchoose(2**20,2**10)/log(2)/2**10
    [1] 11.43581

(So, on the other hand, and somewhat unexpectedly, it is that old good "n choose k" business.) The current implementation lines up at about 11.6 bits per symbol. This is why I sometimes say that set-versions currently take about 2 alnum characters per Provides symbol - this is because each alnum character can stuff about log2(62)=5.9+ bits.

But compare this to the "early alpha" set-versions which represented exact sets in the form of "set:sym1,sym2,sym3,sym4", that is, in terms of enumerating symbols. With exact sets, can you think of getting anywhere near 2 characters per symbol, especially since you also need a separator? This leaves you 1 character per symbol! :-)

There's a saying that Perl makes easy things easy and hard things possible. Set-versions were designed to make hard things possible.
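The R one-liner translates directly to Python; this sketch builds `lchoose` (the natural log of the binomial coefficient, as in R) from `lgamma`:

```python
from math import lgamma, log

def lchoose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), like R's lchoose."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

# Information-theoretical minimum: bits per symbol needed to encode a set
# of 2**10 values drawn from a 2**20 range (divide by log(2) to convert
# nats to bits, then by 2**10 to get the per-symbol cost).
bits_per_symbol = lchoose(2**20, 2**10) / log(2) / 2**10
# about 11.44, versus roughly 11.6 achieved by the current implementation
```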
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Sat, Jun 23, 2012 at 1:55 AM, Jeffrey Johnson wrote:

> There are lots of usage cases for efficient sub-set computations in package management, not just as a de facto API/ABI check using ELF symbols. Most of the other usage cases for efficient sub-set computations are not subject to an ultimately optimal string encoding.

The possibility of optimal string encoding should not be underestimated. If you can't "write it down with a pencil", which by the way refers to Alan Turing's style of reasoning, it becomes very problematic anyway. Your intuition is probably that raw bytes are cheap because you can bitwise-AND them in terms of direct CPU instructions, and any "encoding" is expensive because you have to crunch bits a lot. This is not very true in practice, though. But you should arm yourself with valgrind(1) and spend a few days with it before you understand how you waste your CPU power (and bandwidth, for that matter). Set-string encoding has a reasonable, and affordable, cost. It can all be put together and presented in a rather less pessimistic manner. (For example, there is a cache_decode_set routine that boosts things by a factor of about 5. This is simply due to the fact that you do not always have to decode the same Provides version all over again. This is part of that "fancy-schmancy" stuff which you must ignore on the first reading.)

So I suggest that set-strings must cover all usages of set-subset computations where static data structures are appropriate (so that you compute it, write it into the RPM header, and do not expect to modify it). It is still "efficient", despite the fact that we try hard to achieve a better string representation. The fact that it can be packaged nicely does not make it any worse in other respects, except that obviously we can stuff somewhat fewer bits into alnum characters. But I also wanted it to look "nice" (this is why base62 was devised over the much simpler base64 encoding). But so, it looks nice, and it is efficient. Why are you unhappy? Do you necessarily require that you have to do bitmask tests yourself?
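The caching idea behind cache_decode_set can be illustrated with a memoized decoder. Everything here is a hypothetical stand-in: `decode_set` below parses a plain comma list, not the real base62/Golomb-Rice set-string; only the memoization pattern is the point.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def decode_set(set_string: str) -> frozenset:
    """Stand-in decoder: 'set:1,5,9' -> frozenset({1, 5, 9}).
    Memoized, so the same Provides version is never decoded twice."""
    assert set_string.startswith("set:")
    return frozenset(int(x) for x in set_string[len("set:"):].split(","))

def subset(r: str, p: str) -> bool:
    """The rpmsetcmp-style question: is R contained in P?"""
    return decode_set(r) <= decode_set(p)
```

Since one library's Provides version is compared against many packages' Requires, decoding P once and reusing it is where the real speedup comes from.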
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Sat, Jun 23, 2012 at 1:55 AM, Jeffrey Johnson wrote:

> I would state that compression (of any sort) to minimize bandwidth is entirely the wrong problem to solve.

So what kind of a problem are we trying to solve? Why are you making all these small puns? Are you ready to go out and pronounce: "I, the Jeffrey Johnson, am entirely confident that whichever problem we face we must address it, but to minimize bandwidth is entirely wrong and shall never be addressed!" :-)

Why is it wrong to minimize bandwidth, or, in other words, why is it bad to spend less money? Your answer is like, because the meaning of life is not to spend less money, which is a wrong perspective. Okay, but what's a better perspective? Spending more money for less good is not at all a better perspective. Okay, but what do we actually try to do? Um, we try to bring some binary compatibility by using a limited amount of information with some contrived data structures which test the set-subset relation. If space were not a problem, we could simply plug ELF sections into RPM headers, couldn't we? Why not? And why is there a distinction between the header and the payload? :-)

> My contrarian POV should not be taken as opposed to compression or elegance or anything else. Just that minimal size in the representation of bit fields (or bit positions as numbers) overly limits the applicability of set:versions.
>
> There are lots of usage cases for efficient sub-set computations in package management, not just as a de facto API/ABI check using ELF symbols. Most of the other usage cases for efficient sub-set computations are not subject to an ultimately optimal string encoding.

My POV is that I ask "how best can I do what I need to do, is it doable, what is the price, can it be reduced, etc." Of course, these "best to do" and "need to do" terms are not exactly mathematical, and I already hear some laughs in the audience. Nevertheless, the only thing which you can oppose to this is simply a better implementation. Anything else does not count. Why? That's because! Go tell people they should spend more money. ;-)

> E.g. ripping off the base62 encoding and distributing binary assertion fields saves at least as much bandwidth as your guesstimate that a Golomb-Rice code is ~20% more efficient than a Bloom filter.

I see no point in criticizing the base62 encoding for being suboptimal as compared to raw bytes, because it does just that: squeezes bits into alnum characters. It is pretty clear, and pointless to note, that raw bytes stuff more bits. The goal was just that - to "dump" bits into a "readable" and "usable" form.

> Just in case: yes acidic sarcasm was fully intended.

There is no point in your sarcasm, except probably that you are very clever (which I totally agree with).

> You already made a valid point asking what probability means wrto false positives. In fact your 2**10/2**20 is a very different estimate of false positive probability than the approx. currently in use in rpmbf.c (which is based on a statistical model for Bloom filter parameters).

The 2**10/2**20 is the best estimate and you probably cannot further improve it. The whole business of set-versions, which I have been pondering recently, boils down to the questions: can you improve the ratio just a little bit? Or can you pay just a little bit less for a little bit more? Is there a better data structure? The answer basically turns out to be "NO". The "set of numbers" is good to go, and the Galois fields have some discrepancies which you do not want to know about, and the benefit is very marginal.

> Below is what is in use by RPM which chooses {m, k} given an estimated population n and a desired probability of false positives e (I forget where I found the actual computation, can likely find it again if/when necessary).
>
> What I would like to be able to do with set:versions in RPM is to be able to use either of these 2 "container" representations interchangeably.
>
> I'd also like to be able to pre-compute either form in package tags for whatever purpose without the dreadful tyranny of being "legacy compatible" with older versions of RPM. Adding a new tag that isn't used by older RPM is exactly as compatible as following existing string representations for NEVR dependencies in existing tags, arguably more compatible because older versions of RPM are incapable of interpreting unknown tags.
>
> And again: the best solution for the download bandwidth problem is
>     Update the metadata less often.
> if you think a bit.

The best solution for "don't be stupid" is just: don't be stupid. Of course, downloading metadata is not the only problem, and not exactly the one which we try to solve. It all combines into a complex system. The question then is, to me, can we do any better? How much does this whole business cost? If it's still expensive, which it is, how can we cut down on the costs? These kinds of questions I am willing to
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Fri, Jun 22, 2012 at 5:30 AM, Jeffrey Johnson wrote:

> Sure numbers "make sense".
>
> But God invented 0 and 1 and who needs steenkin carries to do arithmetic in Galois fields?!?

Jeffrey, I understand that you are being ironic and sarcastic, but I can't see the reason why, as per our discussion. If you ask honest questions, like "what does this mean", I try to do my best to answer the question, possibly involving considerations like "numbers come from God", which are questionable but not irrelevant. They may help to understand, or may not.

>> P: 01010010010101
>> R: 010001
>
> Yes … but this is "premature optimization" …

There is no premature optimization here, and it is fair to ask what we may possibly want to optimize. My answer is: first, it must take less space (given the probability) because we have to download the repo metadata every time we run the "update"; second, set-versions must compare quickly (we must be able to compare them all within a second). What's your suggestion? Do you want them to take more space and compare slowly, so as to avoid premature optimizations? :-) Or do you dislike them exactly because they are not Bloom filters? Go on then, make shorter strings, given the same probability, which compare faster! But this is largely impossible, exactly because of these laughable topics like "what is a number" or "what is a probability" which I'm trying to present.

> How big the repo is determines how important the distro is and nothing else. Just look at how impo'tent Debian is …

Your considerations are probably true but are largely irrelevant to the discussion. Besides that, I can't understand them. Are you being sarcastic? I am sarcastic too, that's not at all a problem. Shall we talk? :-)

>> $ print -l sym1 sym2 |/usr/lib/rpm/mkset 10
>> set:dbUnz4
>> $ print -l sym1 sym2 |/usr/lib/rpm/mkset 20
>> set:nl2ZEALdS
>>
>> In the first run, the precision is modest ("print -l" is zsh syntax to print an arg per line). In the second run, the precision is much higher (and it takes more space). There is also a function in scripts to sustain probabilities at about 0.1%.
>
> Hmmm … I've been defaulting Bloom filters ultra conservatively at 10**-4 and moving to 10**-6 at the first hint of a possible problem.

The number does not matter here; the only consideration was that 2^{-10} was kind of a cute number, and it works well in practice - that is, we catch all or most of the bugs where a library symbol has been deleted. But it could also be 2^{-16} or even 2^{-20}. It is a trade-off, and I see no ground for criticizing it just for that. Or again, it boils down to the question of how many megabytes you expect to download when you run the "update".
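The trade-off can be written down as a back-of-the-envelope estimate (assuming, as in the discussion above, one hash value per symbol drawn from a range of 2**bpp values):

```python
# For an absent symbol to be falsely "found", its hash must collide with
# one of the n values already in the set: probability roughly n / 2**bpp.
def false_positive_rate(n_symbols: int, bpp: int) -> float:
    return n_symbols / 2.0 ** bpp

rate_10 = false_positive_rate(1024, 10)   # range exhausted - useless
rate_20 = false_positive_rate(1024, 20)   # the 2**10/2**20 ~ 0.1% figure
```

This also shows why bpp must grow with the symbol count: at 1024 symbols and 10 bits the estimate saturates at 1, which is the "most of the numbers taken" situation described elsewhere in the thread.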
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Thu, Jun 21, 2012 at 7:51 PM, Jeffrey Johnson wrote: > On Jun 18, 2012, at 2:32 PM, Jeffrey Johnson wrote: >> >> The "contained in" or subset semantic that applies to the operations "<" and >> ">=" >> is rather easy to do as well. E.g. if (assuming on;y existence, not >> versioned inequality ranges) >> P == Bloom filter of Provides: tokens >> R == Bloom filter of a (possible) subset of tokens >> then the subset computation is nothing more than >> (P & R) == R That's right. Set-versions do the same thing, only reframed in terms of set of numbers (as opposed to explicit bit masks). The rpmsetcmp routine basically does just that: (P & R) == R check, written in a slightly different manner, with a bunch of fancy stuff. :-) > I should point out the implicit (and tricky) assumption regarding > using Bloom filters to easily compute set and union intersection: > Both P and R MUST have exactly the same parameterization. > Choosing an a priori parametrization is hard because it MUST > deal with the worst possible case of the largest estimated population > which increases the sparseness of all other Bloom filters. There are a few kinds of "parametrization" which we must ponder. First, you should use the same hash function for R and P sets, so that it makes the same number per symbol (or sets the same bits). With different hash functions, there is no chance to compare sets meaningfully. Second, you should also ponder what a "number" actually is (or how high a bit you may want to be able to set). The thing is, the numbers are not unlimited (or otherwise the whole world can be represented with just a big number). It helps to think of a number as a tiny bit of information within a limited range. The range is another parameter. > This also applies to "tuning" to reduce the probability of false positives. > > There are no obvious ways to rescale a Bloom filter meaningfully either. There IS a (not-so-)obvious way to rescale Bloom filters, and numbers, for that matter. 
Big surprise, big surprise! It boils down to the question of what a number is. If your number is just a few bits in a range modulo a power of 2, you can "rescale" the number down to a smaller range by simply stripping its higher bits. The equivalent operation can be devised to "downsample" a Bloom filter into lesser precision, provided that some (not-so-)obvious conditions are met. Basically you need to split the filter into two halves and bitwise-OR them. > Do set: versions have the same difficulty (even if the parameters are hidden > as implementation/design constraints somehow)? There is some difficulty in that we always need to use the same hash function, which is hardwired, and cannot be easily changed (it is the Jenkins one-at-a-time hash with a fancy initial constant). There is no difficulty in comparing set-versions per se, though, because a set-version is just a set of numbers within a limited range, modulo a power of two. If the ranges are different, you first need to "downsample" the higher-ranged set by stripping its higher bits and sorting the numbers again, but then you proceed normally. Note that the range, or "bpp", has to be encoded explicitly, and is part of a set-version. Again, it helps to think in terms of what a number is. A number is a thing within the range, modulo a power of two. If you don't know the range, you don't know what the number is. > Or does the Golomb-Rice encoding "scale" more naturally than Bloom filter > parameters? > If the set:versions implementation "scales" better than Bloom filters (which > seems to be the case), > then there are lots of usage cases (like packages in a repository, or files > in a package) where > an efficient means to compute set membership is quite useful. > > (aside) > Apologies for "scales" imprecision. I'm merely trying to ask is > To what extent does set:versions depend on a priori assumptions? > > BTW, what is the current false positive failure probability for set:versions? What is a probability? 
(That's the second stunning question, after what a number is.) If numbers were unlimited, they could have represented symbols exactly. The only reason the numbers "clash" is because they "sit" within a limited range. What you have to do to control probabilities is to select the appropriate range, which is a trade-off between how many clashes are possible and how many bits per number you are allowed to take. The basic idea is that, to encode a Provides version, you first need to know how many symbols you are going to encode, say 1024. You then ask yourself, how many bits per symbol should I take. If you take e.g. only 10 bits, that's plain stupid, because 10-bit numbers can address only 1024 values (so you end up with most of the bits set, or most of the numbers taken, that is to say). If you take 20 bits per symbol, though, things get more interesting. 20-bit numbers can address 2^{20} different symbols. Since the probability is due to range limitations, you get a 2^{10}/2^{20} = 0.1% chance of a clash.
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Thu, Jun 21, 2012 at 7:28 PM, Jeffrey Johnson wrote: >> More precisely, a set-version can be (in principle) converted to a >> Bloom filter which uses only one hash function. The idea is that such >> a filter will set bits in a highly sparse set of bits, one by one. >> Instead, a set-version remembers the bits simply by their indices. >> Setting the bit becomes taking the number, and there is a >> straightforward correspondence. It also turns out that a set of >> numbers can be easier to deal with, as opposed to a sparse set of >> bits. > > You've made an unsupported claim here re "numbers can be easier". > Presumably you are talking about means of encoding, where clever > mathematics derived from numerical transforms can be used to > remove redundancy. With a pile of bits in a Bloom filter all one > has to work with is a pile of bits. Numbers are easier to deal with because you can use an 'unsigned v[]' array to represent them, which can be seen throughout the code, accompanied by an 'int c' (which is argc/argv[] style). Bits are somewhat more complicated: you need to use something like macros, and you cannot use pointer arithmetic to implement traversal, etc. Also, to me, a set of numbers just "makes sense", and is good to go. If you wonder how complicated things can get, there have been some recent papers out there which use matrix solving for approximate set representation. They are basically unreadable (given only undergraduate skills), and you can be completely lost because there is apparently no connection between the Galois fields and what you actually try to do. After you try to read these papers, the "set of numbers" becomes a wonderful salvation which comes directly from God and the Holy Spirit and brings peace to your soul. You should use it when you have some. :-) > The counter argument is this: testing bits (in a Bloom filter) is > rather KISS. 
Other compression/redundancy removal schemes > end up trading storage for implementation complexity cost. > In a running installer like RPM, there will always be a need > for memory dealing with payload contents. Since large amounts > of memory are eventually going to be needed/used, savings > by using Golomb-Rice codes are mostly irrelevant for the > 100K -> 1M set sizes used by RPM no matter whether bits or > numbers are used to represent. I do not agree that testing bits is more KISS than a set of numbers. If you think that testing bits is the only and natural thing to do, I disagree. The reason is: Requires versions are much more sparse than Provides versions. If the Provides version is optimal in terms of bit usage (50% set), the Requires version must necessarily be suboptimal (too few bits set). It can be pictured like this: P: 01010010010101 R: 010001 The conclusion is, if you want to stick to bitmasks and reduce subset comparison to bitmask tests, there will be a great deal of inefficiency because of sparse R-sets. On the other hand, sets of numbers can be pictured like this: P: 1,3,7,13,15 R: 1,13 Note that the R-set does not take extra space, and there is a simple merge-like algorithm which advances either R or P (or both - which is how the rpmsetcmp routine is implemented). > But don't take my argument as being opposed to set:versions whatsoever. > Just (for RPM and even for *.rpm) that compression size isn't as important > to me as it is to you as the "primary goal" of set: versions. The size of *.rpm is very important when you want to update the information about a repo, like in "apt-get update" - you have to download it all. The question is then how big the repo is, and how many set-versions you can afford to download. :-) The "primary goal" was not to make things much worse. In other words, the price must not be prohibitive even for a repo 10k or 15k packages strong. 
(Please also realize that the Internet was not a given until very recent times - some shoddy ISPs in Kamchatka still want to charge something like $0.1 per 1Mb.) > AFAICT set:versions also has a finite probability of failure due to > (relatively rare) > hash collisions converting strings to numbers. > > Is there any hope that set:versions (or the Golomb-Rice code) can supply a > parameterization > to "tune" the probability of false positives? > > Or am I mis-understanding what is implemented? You can tune the probability by selecting the appropriate "bpp" parameter, and passing it to "mkset" (which is how the scripts work). The "bpp" indicates how many bits must be used per hash value (the range is 10..32). For example, if you want to encode a Provides version which has 1024 symbols, and the error rate has to be fixed at about 0.1%, you need to use bpp=20 - that is, after each symbol is hashed, its 20 lower bits will be taken into account and encoded. The probability then is simply the number of symbols over the capacity of the universe, which is 2^{10}/2^{20}=0.1%. On the other hand,
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Thu, Jun 21, 2012 at 12:15 AM, Alexey Tourbin wrote: > On Mon, Jun 18, 2012 at 10:32 PM, Jeffrey Johnson wrote: >> Good: the above confirmation of the characteristics allows a set:versions >> implementation to proceed. > > Hello, there's been some speculation about Bloom filters below, which > I cannot address right now, offhand. Nevertheless, I can say that, in > some highly mathematical sense, set-versions are exactly equivalent to > Bloom filters. They do just the same thing, if you will. The only > difference is that set-versions are more compact: they take somewhat > less space, which was, if you remember, the number one goal of the > original implementation. More precisely, a set-version can be (in principle) converted to a Bloom filter which uses only one hash function. The idea is that such a filter will set bits in a highly sparse set of bits, one by one. Instead, a set-version remembers the bits simply by their indices. Setting the bit becomes taking the number, and there is a straightforward correspondence. It also turns out that a set of numbers can be easier to deal with, as opposed to a sparse set of bits. If you want to know more about why things have to work like this, and e.g. where the constants pop up, there is a good starting point: the "Cache-, Hash-, and Space-efficient Bloom filters" paper by Felix Putze, Peter Sanders, and Johannes Singler. Actually this paper helped me a lot to put things together and to produce the original implementation. Reading this paper requires some working mathematical knowledge, though. This requirement must not be underestimated, but also should not be overestimated. The paper is very readable: it tells you what you may want to do and what you have to do. http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf __ RPM Package Manager http://rpm5.org Developer Communication List rpm-devel@rpm5.org
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Mon, Jun 18, 2012 at 10:32 PM, Jeffrey Johnson wrote: > Good: the above confirmation of the characteristics allows a set:versions > implementation to proceed. Hello, there's been some speculation about Bloom filters below, which I cannot address right now, offhand. Nevertheless, I can say that, in some highly mathematical sense, set-versions are exactly equivalent to Bloom filters. They do just the same thing, if you will. The only difference is that set-versions are more compact: they take somewhat less space, which was, if you remember, the number one goal of the original implementation. Very informally: if you want to encode 1K symbols out of 1M symbols using a Bloom filter, you need at least 1024 * 1.44 * 10 bits of information (which is a factor of 1.44 per symbol), while with a set-version you need at least 1024 * (10 + 1.44) bits of information (which is an additive constant of 1.44 per symbol). It is basically fair to say that set-versions are 20% shorter than the equivalent Bloom filters, which is not unimportant. By the way, the information-theoretical minimum is 1024 * 10 bits of information - you cannot go below that without violating fundamental laws. 
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Fri, Jun 1, 2012 at 6:07 AM, Jeffrey Johnson wrote: > I asked 2 very specific questions … the rest is quite important also, > but I need to understand precisely what properties set:versions have in order > to implement correctly (and I don't fully understand your reply). > > Specifically: > > 1) Is the set:versions VERSION independent of the order of the > calls to rpmsetAdd()? (you know the routine as set_add()) Completely independent - you can add symbols in any order. The symbols are then hashed and sorted by their numeric values. The underlying idea is that a set-version is just a (sorted) set of numbers. You can add whatever symbol to it, possibly twice; the symbol will be hashed to a number, in a unique manner, and finally you can get the string representation of the set of numbers. This involves much fuss under the hood, but basically, you should think of the set of symbols, which is just the set of numbers, after each symbol has been hashed individually. > 2) Can the set:versions encoding be compared for more than equality? > What set/arithmetic property is the basis for the comparison? What > circumstances/constraints are there related to > … You cannot always compare > set-versions in terms of "greater or equal" (but when you can, > it's > important). Set-versions compare as sets. There are Euler diagrams to visualise set comparison, which is undergraduate material. The idea is that real numbers form a linear order: you can always tell whether V1 < V2, V1 = V2, or V1 > V2. Sets are quite another matter: you cannot always invoke "tertium non datur" (either lt or ge), which is to say that sets can be quite different and do not compare easily. An order can be imposed on sets, though, by requiring the "greater" set to contain at least the elements of the one it is compared against (perhaps I'm starting to retell the undergraduate material, so I won't go on). To sum up, there IS a mathematical basis behind "Requires: foo >= set:asdf" dependencies. 
> I can of course answer my own questions with try-and-see test cases. 
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Thu, May 31, 2012 at 9:03 AM, Jeffrey Johnson wrote: >>> The mixed code case is interesting: what happens >>> if a set:version encoding contains the literal string "0:V-R" >> >> I can't understand your question. A version is either a set-version, or >> not a set-version. If a version is a set-version, it has to be >> prefixed by "set:". Regular RPM versions cannot be prefixed by "set:", >> because "set" cannot be decoded as a valid serial number, which has to >> be an integer. There might be some implementation sloppiness out >> there, but in principle, I believe the encoding scheme is sane, and >> makes sense. >> >> Now the question is, what if a set-version cannot be decoded? But that >> can be countered with another question: what if a regular rpm version >> cannot be decoded? Or can you decode any junk as a valid RPM version? > > I'm trying to understand rpmsetcmp() as a "black box" independent > of all the gory implementation details of ELF symbols, base62 encoding, > and RPM dependencies. > > I believe that set:versions are much like Bloom filters: > 1) strings can be added to a "set" in any order > 2) the comparison operation implied by > Requires: foo >= set:…. > is identical to "contained in" or "is subset of" > Is that the case? To me, a set-version is just a VERSION. I can't stress that enough. When you need a library, it is a legitimate question to ask which version you need. If you answer that you need at least version 1.0, a conventional version, you must be kidding. Because there is no connection between what you actually need and a god-damn number. So a plausible answer must be "Well, we need at least a version which provides the symbols which we need to use". Let's take this approach to the extreme, and say that the VERSION which we need is simply the one which provides at least those symbols. To me, this is a much better approach to library versioning, and possibly the only viable approach. How else can you express your expectations about a library? 
Suppose someone is talking to you at a conference, and says "Mr. Johnson, we are very proud, blah-blah-blah, because blah-blah-blah". What you want to tell them is basically "Go out, folks, it works". Now, with set-versions, things really work. :-) Back to implementation, 1) set-strings should be considered opaque, static, and unmodifiable. Once they are formed, there is no useful way to alter them. They express an idea of a VERSION in a manner which cannot be easily comprehended, but that's not a problem (or otherwise there are perplexing questions of what regular rpm versions must mean, and whether any piece of junk can be decoded as a regular rpm version). 2) They must compare as sets, in terms of elements unique to the first set, common elements, and elements unique to the second set. The implementation does not do quite that much yet, because so far it only tries to mimic regular versions. Regular versions are a linear order. Set-versions are a partial order. You cannot always compare set-versions in terms of "greater or equal" (but when you can, it's important). 
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Thu, May 31, 2012 at 6:51 AM, Jeffrey Johnson wrote: > We are in violent agreement here over a minor issue > of implementation/representation. By the way, actual problems that will arise are rarely what you expect them to be. In 2010, I was naive and I thought that "char bitv[]" was a pretty good representation of a bit sequence (which can still be seen in set.c). It then took many days to devise a sophisticated decoding routine which avoids bitv[] altogether and makes things smooth. So, in a violent agreement, don't take things for granted. :-) > The mixed code case is interesting: what happens > if a set:version encoding contains the literal string "0:V-R" I can't understand your question. A version is either a set-version, or not a set-version. If a version is a set-version, it has to be prefixed by "set:". Regular RPM versions cannot be prefixed by "set:", because "set" cannot be decoded as a valid serial number, which has to be an integer. There might be some implementation sloppiness out there, but in principle, I believe the encoding scheme is sane, and makes sense. Now the question is, what if a set-version cannot be decoded? But that can be countered with another question: what if a regular rpm version cannot be decoded? Or can you decode any junk as a valid RPM version? > and a match is attempted against a traditional dependency like > Provides: foo = 0:V-R > If the literal string in the Provides: is encoded on the > fly, will setcmp(…) match or not? 
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Thu, May 31, 2012 at 6:21 AM, Jeffrey Johnson wrote: >> On Mon, Apr 23, 2012 at 6:32 PM, Jeff Johnson wrote: >>> I should point out that writing the attached >>> message (and sending from the wrong e-mail address) has instantly >>> led to a different -- and perhaps more natural -- syntax like >>> >>> Requires: set(libfoo.so.1) >= whatever >> >> Hello, >> Set-versions are just that - versions. One must arguably think of them >> in terms of VERSIONS. If you need a library, it is a legitimate >> question to ask which version you need. If you think you need at least >> version 1.0, there's a good question: why the heck do you think you >> need at least version 1.0 (and whether 2.0 would still fit). >> With set-versions, things get straightforward: you need at least a >> version which provides API symbols - that's a much >> better description of a library than a god-damn arbitrary number. > > Yes VERSIONS. > The issue for RPM is how to represent/attach a different VERSION comparison. If those are only VERSIONS, they must apply to the same NAME. Look, there is a two-fold way of dependency resolution in rpm. Set-versions were designed to fit into that scheme. First, you look up the name in rpmdb/Providename, and fetch the headers. Second, you decide whether the versions match. The real question is what to do if we are forced to match a set-version against a non-set/another kind of version. Well, the answer is that you should return as if the NAMEs were different. And unless you're in a deeply theoretical mood, I believe the approach is perfectly valid. If two kinds of versions cannot be matched, pretend they apply to different names. 
Re: EVR issues: set:versions, epoch-as-string, now twiddle-in-version
On Mon, Apr 23, 2012 at 6:32 PM, Jeff Johnson wrote: > I should point out that writing the attached > message (and sending from the wrong e-mail address) has instantly > led to a different -- and perhaps more natural -- syntax like > > Requires: set(libfoo.so.1) >= whatever Hello, Set-versions are just that - versions. One must arguably think of them in terms of VERSIONS. If you need a library, it is a legitimate question to ask which version you need. If you think you need at least version 1.0, there's a good question: why the heck do you think you need at least version 1.0 (and whether 2.0 would still fit). With set-versions, things get straightforward: you need at least a version which provides API symbols - that's a much better description of a library than a god-damn arbitrary number. There are some philosophical implications of introducing set-versions, in particular, whether the idea can be extended to describe prototypes and calling conventions (e.g. the number of arguments which must be passed to a function). This is why I might seem reluctant to participate in discussions. I'm thinking! (And perhaps I'm arrogant.) > for set:versions, and for the generalization (for writing strict regression > tests, > its mostly useless in packaging because there is no mapping that specifies > how the mixed DEB <-> RPM version comparison might be done "naturally") > > Requires: deb(foo) >= E:V-R > Requires: rpm(foo) >= E:V-R > > The precedent for foo(bar) name spacing in RPM dependencies with > the above syntax is already widely deployed although entirely > de facto. > > Sure would be nice _NOT_ to have to consider "Have it your own way!" > competing syntaxes like > Requires: libfoo.so.1 >= set:whatever > and > Requires: set(libfoo.so.1) >= whatever 
Re: refined implementation of set-versions
On Fri, Apr 20, 2012 at 5:16 PM, Jeffrey Johnson wrote: > The methods in the existing encoding/decoding are in rpmio/set.c @rpm5.org: > the algorithm > is unchanged from Alt. > > A change to the existing scheme over the next few months doesn't bother > me at all. But "legacy compatibility" has instantly appeared as an issue > for an @rpm5.org implementation that has been "working" for less than > 24 hours, and where the need is to attempt to install Alt packages > into a chroot, sounds like @rpm5.org is going to be forced to support both > "old" and "new" encodings merely to continue trying to do "continuous > integration" > with Alt packages. > > But "legacy compatibility" is an insoluble problem which need not be > discussed. > If there's a better encoding scheme available soon, then switching is > better done earlier than later. The amount of compatibility with the existing Alt format is negotiable. If you think that rpm5 must be able to install Alt packages into a chroot, then there is little choice but to 1) design the new format with a clear distinction, so that older set-strings don't get confused with the newer ones; and 2) provide an additional decoding routine. However, no additional support is required in e.g. the comparison routine, since the decoding routine simply restores the array of hash values. So this is doable. There is another option, however. For reasons which shall remain nameless, I find it tempting to produce the new and incompatible format without any clear signs of distinction. :-) > ATM, rpm-5.4.9 only does decoding (and comparison) of set:versions. > The need was to be able to install Alt sisyphus packages (with set:versions > dependencies) > into a chroot. Generating set:versions (Alt uses a helper script, "multilib" > packaging needs > to use the gelf* API) will be harder, particularly if interoperability is > desired. 
Using the gelf* API probably won't do, since it is best to use an ldd(1)-based tool which will basically invoke ld.so(8) to resolve symbols and dump the associations between the symbols and the libraries. It can't be easily done with the gelf* API, unless you actually try to reimplement a substantial portion of the dynamic linker, which is a bad idea anyway. So using the script is the only realistic option. If the script cannot be used at all (i.e. due to issues with multilib attributes), this probably indicates a problem within rpmbuild itself. > But set:versions looks quite useful, and far more effective at reducing the > number > of dependencies than attempting "pin-hole" optimizations with boolean > expressions, discarding inequalities which are implied by other dependencies, > as Per Oyvind has been attempting in Mandriva. Where can I find more information about this work? 
Re: refined implementation of set-versions
On Fri, Apr 20, 2012 at 4:12 PM, R P Herrold wrote: > On Fri, 20 Apr 2012, Alexey Tourbin wrote: > >> I have just learnt that the rpm5 project has borrowed the set-string >> implementation recently from ALT Linux. At the very same time, I was >> working on a new and improved encoding scheme which can make > > ... This is exciting news -- This seems like it would be a useful library. > What package in ALT are you doing this work in, or alternatively, is there a > version control repository that I could check out to read your ongoing work? The goal is not only to improve the implementation, but also to refine basic concepts and designs. Actually, set.c is already usable as a library on its own. Why, it provides an API for creating set-versions, and it also implements the rpmsetcmp() comparison routine. That's just enough to get the job done, and everything else then is implementation details, which it hides in a particularly perfect manner. On the other hand, the details are concealed perhaps a bit too much - up to the point where it is not clear what set-versions are and how exactly they are supposed to work. It is desirable then to expose a lower-level API which clarifies the concept of set-versions while still hiding less important implementation details. So what's a set-version? Intuitively, a set-version represents a set of symbols. This actually suggests a new "de facto" approach to library versioning: the required version of a library, in addition to its soname, is simply the set of library symbols (i.e. functions and global variables) which we need to use. Likewise, the version provided by a library is simply the set of all symbols exported by the library. The key point is that Requires versions and Provides versions can be produced in a relatively independent manner, and meaningfully compared at later stages; that is, it is possible to check if R \subset P. The check is probabilistic, which indicates a possibility of error. 
The error rate is reasonably small, though, and can be further controlled by a parameter. What's more important is that only the "false positive" kind of error is possible - that is, in the worst case, the check simply does not work, but at least we lose nothing. Another kind of error, a "false alarm", is not possible. In this respect, set-versions are similar to Bloom filters. Set-versions have other nice properties which you might suspect, and one which is not so nice: the length of an n-element set-version is O(n). This is a fundamental limit which cannot be overcome. However, a practical and feasible implementation is still possible. This outlines two implementation priorities: 1) set-versions should be as short in size as possible - actually their size should be close to the information-theoretical minimum; 2) however, this must not compromise fast decoding and comparison. With the current implementation, when the error rate is fixed at about 0.1%, Provides versions take about 2 alphanumeric characters per symbol, and Requires versions, since they are much more sparse, can take up to 3 alphanumeric characters per symbol. For a repo of about 10,000 packages, Requires and Provides set-versions can take only about 10M total (but this assumes that some dependency optimizations are performed and also that superfluous plugin-like Provides are excluded). The check of all set-versioned dependencies, such as performed by "apt-cache unmet", can be finished within a second. To sum up, this is a compromise; but it is a favorable compromise. So what is a set-version, really? What if we say that a set-version is just a set of numbers, such as the hash values obtained after hashing each symbol individually? Simple as it is, this approach can be used to express everything else. More precisely, we need a scheme to encode n m-bit numbers. For a reason which will become apparent later, we need to supply m, the actual number of bits per hash value, aka bpp, explicitly. 
So I think we can define a lower-level "set-string" encoding API as follows:

/** \ingroup setstring
 * Estimate the size of a string buffer for encoding.
 * @param v		the values, sorted and unique
 * @param n		number of values
 * @param bpp		actual bits per value, 8..32
 * @return		buffer size for encoding, < 0 on error
 */
int setstringEncodeSize(const unsigned *v, int n, int bpp);

/** \ingroup setstring
 * Encode a set of numeric values into alnum string.
 * @param v		the values, sorted and unique
 * @param n		number of values
 * @param bpp		actual bits per value, 8..32
 * @param s		alnum output, null-terminated on success
 * @return		alnum string length, < 0 on error
 */
int setstringEncode(const unsigned *v, int n, int bpp, char *s);

The decoding API mirrors the encoding routines:

/** \ingroup setstring
 *
refined implementation of set-versions
Hello, I have just learnt that the rpm5 project has borrowed the set-string implementation recently from ALT Linux. At the very same time, I was working on a new and improved encoding scheme which can make set-versions about 1% shorter in size, and which also permits more efficient decoding. There are also other improvements, such as integer overflow checking and a revised API. Assuming that set-strings are not widely used yet (except for ALT Linux), and compatibility is not an issue, I'm willing to provide a refined implementation. 
ANN: RPM-Payload-0.10 on CPAN
It's better late than never that I released the RPM::Payload perl module on CPAN. The code is almost trivial, and I would doubt if it's worth releasing at all, except that it does a good job. (I am going to release more code that depends on RPM::Payload.)

http://search.cpan.org/dist/RPM-Payload/
http://git.altlinux.org/people/at/packages/perl-RPM-Payload.git

Also note that there is the Archive::Cpio module on CPAN, written by Pixel, which may or may not suit one's needs better. http://search.cpan.org/dist/Archive-Cpio/

RPM::Payload(3)	User Contributed Perl Documentation	RPM::Payload(3)

NAME
    RPM::Payload - simple in-memory access to RPM cpio archive

SYNOPSIS
    use RPM::Payload;
    my $cpio = RPM::Payload->new("rpm-3.0.4-0.48.i386.rpm");
    while (my $entry = $cpio->next) {
        print $entry->filename, "\n";
    }

DESCRIPTION
    "RPM::Payload" provides in-memory access to RPM cpio archive. Cpio headers and file data can be read in a simple loop. "RPM::Payload" uses the "rpm2cpio" program which comes with RPM.

EXAMPLE
    Piece of Bourne shell code:

    rpmfile() {
        tmpdir=`mktemp -dt rpmfile.`
        rpm2cpio "$1" |(cd "$tmpdir"
            cpio -idmu --quiet --no-absolute-filenames
            chmod -Rf u+rwX .
            find -type f -print0 |xargs -r0 file)
        rm -rf "$tmpdir"
    }

    Sample output:

    $ rpmfile rss2mail2-2.25-alt1.noarch.rpm
    ./usr/share/man/man1/rss2mail2.1.gz: gzip compressed data, from Unix, max compression
    ./usr/bin/rss2mail2: perl script text executable
    ./etc/rss2mail2rc: ASCII text
    $

    Perl implementation:

    use RPM::Payload;
    use Fcntl qw(S_ISREG);
    use File::LibMagic qw(MagicBuffer);
    sub rpmfile {
        my $f = shift;
        my $cpio = RPM::Payload->new($f);
        while (my $entry = $cpio->next) {
            next unless S_ISREG($entry->mode);
            next unless $entry->size > 0;
            $entry->read(my $buf, 8192) > 0 or die "read error";
            print $entry->filename, "\t", MagicBuffer($buf), "\n";
        }
    }

CAVEATS
    The "rpm2cpio" program (which comes with RPM) must be installed. It will die on error, so you may need an enclosing eval block. However, they say "when you must fail, fail noisily and as soon as possible". Entries obtained with "$cpio->next" are coupled with the current position in the $cpio stream. Thus, the "$entry->read" and "$entry->readlink" methods may only be invoked before the next "$cpio->next" call. Hardlinks must be handled manually. Alternatively, you may want to skip entries with "$entry->size == 0" altogether.

AUTHOR
    Written by Alexey Tourbin .

COPYING
    Copyright (c) 2006, 2009 Alexey Tourbin, ALT Linux Team. This is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

SEE ALSO
    rpm2cpio(8). Edward C. Bailey. Maximum RPM. <http://www.rpm.org/max-rpm/index.html> (RPM File Format). Eric S. Raymond. The Art of Unix Programming. <http://www.faqs.org/docs/artu/index.html> (Rule of Repair).

perl v5.8.9	2009-04-03	RPM::Payload(3)
Re: Remapping lib/rpmal.c to use a backing store.
On Mon, Nov 10, 2008 at 12:38:32PM -0500, Jeff Johnson wrote:
> Here's details of the history of rpmal.c and what I think needs to
> be done instead.
>
> RPM started life as the engine for the Red Hat installer. The
> malloc() in libc.so.5 was buggy, so the entire index was designed
> as a huge array w/o ptrs, to simplify debugging and avoid memory
> fragmentation.

Frankly I don't quite understand what the "al" thing is, except that "al" is used to reorder packages in rpmtsOrder(). To me, al is just a list of headers with TR_ADDED/TR_REMOVED flags attached to them.
Re: RPM: rpm/lib/ rpmal.c
On Mon, Nov 10, 2008 at 12:01:36PM -0500, Jeff Johnson wrote:
> BTW, where & how are you seeing a flaw? What are the symptoms
> or usage case? This code has survived on 64bit platforms

So, I stumbled upon various bugs in rpmtsOrder: certain ordering relations ultimately were *NOT* added (T3) to tsi structures.

--- lib/depends.c-	2008-11-09 13:46:19 +
+++ lib/depends.c	2008-11-10 17:16:00 +
@@ -2142,12 +2143,14 @@ static inline int addRelation(rpmts ts,
     pkgKey = (alKey)(((long)pkgKey) + ts->numAddedPackages);
     for (qi = rpmtsiInit(ts), i = 0; (q = rpmtsiNext(qi, 0)) != NULL; i++) {
+	// pkgKey had garbage in its high bits
 	if (pkgKey == rpmteAddedKey(q))
 	    break;
     }
     qi = rpmtsiFree(qi);
-    if (q == NULL || i >= ts->orderCount)
-	return 0;
+    if (q == NULL || i >= ts->orderCount) {
+	fprintf(stderr, "RET2 q=%p %s <- %s\n", q, rpmdsN(requires), rpmteNEVRA(p));
+	return 0;
+    }

     /* Avoid certain dependency relations. */
     if (ignoreDep(ts, p, q))

End of diff

So, with this debugging output, I've seen a lot of early "RET"s. The code "survived" only because it silently returned for "can't happen" conditions. I'd rather use assert() to catch "can't happen" conditions (assuming that rpm should not be compiled with -DNDEBUG).

> I'd like to understand why not reported. Presumably you
> are using alNum2Key() in some new context where the garbage
> bits matter.
Re: RPM: rpm/lib/ rpmal.c
On Sun, Nov 09, 2008 at 05:01:29PM -0500, Jeff Johnson wrote:
> Hehe, been there, done that. I feel your pain ...

This fixes various issues on 64-bit platforms. I think the fix should be backported to the relevant 5_x branches.

> Do we agree that lib/rpmal.c code needs to DIE! DIE! DIE!?
>
> Seriously, it's insane to have an __IN MEMORY__ 2 level
> dir/file store based on __BSEARCH__ for portability in the year 2008.

I think that the code is complicated, and the underlying data model is hard to understand. The complication is partly due to opaque interfaces. A possible solution is to export fewer public interfaces, while using plain data structures internally.

> > --- rpm/lib/rpmal.c	2 Aug 2008 00:38:04 -	2.71
> > +++ rpm/lib/rpmal.c	9 Nov 2008 21:38:03 -	2.72
> > @@ -154,6 +154,7 @@
> >  {
> >  /[EMAIL PROTECTED] -temptrans -retalias @*/
> >  union { alKey key; alNum num; } u;
> > +u.num = 0;
> >  u.key = pkgKey;
> >  return u.num;
> >  /[EMAIL PROTECTED] =temptrans =retalias @*/
> > @@ -165,6 +166,7 @@
> >  {
> >  /[EMAIL PROTECTED] -temptrans -retalias @*/
> >  union { alKey key; alNum num; } u;
> > +u.key = 0;
> >  u.num = pkgNum;
> >  return u.key;

(Without this change, u.key had garbage in its high bits.)
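The failure mode can be sketched in isolation. This is a minimal stand-alone reduction under assumed typedefs (the real alKey/alNum definitions live in rpmal.h and differ): on a 64-bit platform the union members have different sizes, so writing only the 32-bit member leaves the high bits of the pointer-sized member uninitialized; zeroing the wide member first, as the 2.72 patch does, makes the conversion deterministic.

```c
#include <stddef.h>

/* Stand-in types, assumed for illustration (not the rpmal.h definitions):
 * alKey is pointer-sized and opaque, alNum is a 32-bit package index. */
typedef void *alKey;
typedef int alNum;

/* Convert an index to a key: zero the wide member first, mirroring the
 * "+u.key = 0;" line of the patch, so no high bits carry stack garbage. */
static alKey num2key(alNum num)
{
    union { alKey key; alNum num; } u;
    u.key = 0;      /* clears all bytes of the union */
    u.num = num;    /* writes only sizeof(alNum) bytes */
    return u.key;
}

/* Convert a key back to an index, mirroring the "+u.num = 0;" line. */
static alNum key2num(alKey key)
{
    union { alKey key; alNum num; } u;
    u.num = 0;
    u.key = key;    /* overwrites the whole union */
    return u.num;
}
```

Without the zeroing lines, the comparison `pkgKey == rpmteAddedKey(q)` in addRelation() compares full 64-bit values of which 32 bits are garbage, which is exactly why the relations were silently dropped.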
Re: %post-script prerequisites
On Wed, Sep 24, 2008 at 07:08:54PM +, Alexey Tourbin wrote:
> In package foo, program /usr/bin/foo is both packaged *and* called
> in its %post script. The program /usr/bin/foo runs /usr/bin/bar,
> for which we have the dependency "Requires: /usr/bin/bar".

Here is a similar example that does not require the --noorder option to demonstrate the problem. The difference is that packages A and B have circular dependencies, so, unless we have "Requires(post)", rpm chooses to install A first, and its %post script fails.

Name: A
Version: 1.0
Release: 1
Summary: A
License: GPL
Group: Development/Other
Requires: /usr/bin/B
#Requires(post): /usr/bin/B
BuildArch: noarch
AutoReqProv: no

%package -n B
Summary: B
Group: Development/Other
Requires: A
AutoReqProv: no

%description
%description -n B

%install
mkdir -p %buildroot/usr/bin
cat >%buildroot/usr/bin/A <%buildroot/usr/bin/B <
Re: %post-script prerequisites
On Wed, Sep 24, 2008 at 01:56:35PM -0400, Jeff Johnson wrote:
> > Anyway, perhaps I should do some rewording in my initial description
> > of the problem. In ALT Linux mailing list (in Russian), there seems
> > to be some misunderstanding (or maybe a lack thereof), too.
>
> Lots of misunderstandings wrto Requires(post): and PreReq:.
>
> But I still question whether "(post)" is fixing anything at all. Try
> and see, push a reproducer to me if you'd like comments.

Sample specfile:

Name: foo
Version: 1.0
Release: 1
Summary: foo
License: GPL
Group: Development/Other
Requires: /usr/bin/bar
BuildArch: noarch
AutoReqProv: no

%package -n bar
Summary: bar
Group: Development/Other
AutoReqProv: no

%description
%description -n bar

%install
mkdir -p %buildroot/usr/bin
cat >%buildroot/usr/bin/foo <%buildroot/usr/bin/bar <
Re: %post-script prerequisites
> >Think about this again: package foo has program /usr/bin/update-foo, > >which is invoked in %post-script of the package. The program is > >linked > >with e.g. libglib-2.0.so.0(GLIB_2.18), and we have the dependency > >"Requires: libglib-2.0.so.0(GLIB_2.18)". However, this is merely > >"Requires". To run /usr/bin/update-foo in the %post script reliably, > >that must be "Requires(post)". Or otherwise there's a possibility > >that, > >despite topological reordering, glib2 gets installed or upgraded after > >foo, and so the %post scriptlet fails miserably. > > > >So, there's a general problem: if you both package a program and run > >it in the %post-script (in the very same package), then bare Requires > >are not enough: some of them (namely, which are used by the program) > >should also become Requires(post). > > Mebbe. > > IMHO, there are several flaws in the above. > > The fundamental flaw (imho) is trying to add secondary > dependencies to a package node in the dependency graph. > > If the chain A -> B -> C is necessary for running /usr/bin/update-foo > in a script, then A should have > Requires: B > and B should have > Requires: C > and the dependency graph should be assembled dynamically, > not added statically to A. +# The solution is: 1) to detect all packaged programs (and files), recursively, +# which are used in %post-script; and 2) to find prerequisites for such files +# and programs (which are not provided by the package itself), and add them to +# Requires(post) dependencies. Also, we want to ensure that 3) the list of +# Requires(post) additional dependencies is only a subset of original Requires. Note that there are three steps, and the last step is explicitly about not adding secondary dependencies. So, for the package foo, it goes like this: 1) We see that the packaged program /usr/bin/update-foo is invoked in %post script (in the very same package!). 
2) We run find-requires for /usr/bin/update-foo, and get some dependencies, which are candidates for Requires(post); they are libglib-2.0.so(GLIB_2.18) and others (but no secondary dependencies anyway).

3) Requires(post) candidates are intersected with the "Requires" of the package. Since /usr/bin/update-foo is packaged, there must be "Requires: libglib-2.0.so(GLIB_2.18)", too. This step is basically required only because of some glitches in dependency generators (i.e. some generators work best when they process the whole list of specific files, to do some folding).

> Also your claim
>
> > To run /usr/bin/update-foo in the %post script reliably,
> > that must be "Requires(post)".
>
> is false. The "(post)" marker limits the context where a
> dependency applies, and does not improve reliability at all.

To run /usr/bin/foo in the %post-script reliably, there must be a "Requires(post): libglib-2.0.so(GLIB_2.18)" dependency. "Reliably" means that, unless we have this "Requires(post)" dependency on a recent glib2 version, there's a possibility that glib2 gets installed/upgraded *after* foo, which is too late for its %post-script. What's false in this claim?

Anyway, perhaps I should do some rewording in my initial description of the problem. In the ALT Linux mailing list (in Russian), there seems to be some misunderstanding (or maybe a lack thereof), too.
Re: %post-script prerequisites
On Wed, Sep 24, 2008 at 12:26:10PM -0400, Jeff Johnson wrote:
> You do know that bash --rpm-requires will extract
> dependencies for all scriptlets, not just %post and %preun,
> automagically for several years now?

Think about this again: package foo has program /usr/bin/update-foo, which is invoked in the %post-script of the package. The program is linked with e.g. libglib-2.0.so.0(GLIB_2.18), and we have the dependency "Requires: libglib-2.0.so.0(GLIB_2.18)". However, this is merely "Requires". To run /usr/bin/update-foo in the %post script reliably, that must be "Requires(post)". Otherwise there's a possibility that, despite topological reordering, glib2 gets installed or upgraded after foo, and so the %post scriptlet fails miserably.

So, there's a general problem: if you both package a program and run it in the %post-script (in the very same package), then bare Requires are not enough: some of them (namely, those used by the program) should also become Requires(post).
Re: Two limitations of triggers in rpm
On Mon, Sep 22, 2008 at 07:38:03AM -0400, Jeff Johnson wrote:
> > Now, some paths are "virtual", which is e.g. executable paths under
> > update-alternatives(1) control. Those paths are not packaged (and
> > hence cannot be accessed via the Basenames index), but rather
> > created in the %post script. We have a find-provides hook which
> > automatically provides virtual paths (e.g. "Provides: /usr/bin/xvt"
> > for xterm, rxvt-unicode etc.)
>
> So use a probe dependency like
> Requires: executable(/path/to/alternative)
> The probe will be evaluated at run-time, and even permits rpm to
> interoperate with dpkg alternatives.

I don't need a runtime probe at all, I need to resolve/install the dependency.

# apt-get install /usr/bin/xvt
Reading Package Lists... Done
Building Dependency Tree... Done
Package /usr/bin/xvt is a virtual package provided by:
  xterm 237-alt1 [Installed]
  termit 1.3.5-alt1
  rxvt-unicode 9.02-alt1 [Installed]
  kdebase-wm 3.5.10-alt4
  kde4base-konsole 4.1.1-alt1
  gnome-terminal 2.22.3-alt1
  aterm 1.0.1-alt3 [Installed]
You should explicitly select one to install.
E: Package /usr/bin/xvt is a virtual package with multiple good providers.
#
Re: Two limitations of triggers in rpm
On Sat, Sep 20, 2008 at 01:37:59PM -0400, Jeff Johnson wrote:
> There's the additional wrinkle of handling
> Provides: /path/to/file
> which muddles the implementation further, because 2 indices need to
> be searched.
>
> Personally, I think it's way past time to prohibit file paths in the
> Providename index. A second source for paths adds a huge amount of
> complexity to all application accesses, not just rpm, of an rpmdb for
> very little benefit other than that packagers and vendors get to
> pretend that file paths in the Providename index is some sort of cool
> and useful feature.

In ALT Linux, "file-level" dependencies are essential. I.e. when a file path is known in advance, we use a dependency on that path (e.g. "Requires: /bin/sh"). And we have some complicated logic to translate file-level dependencies with intermediate symlinks in path components, e.g. /etc/init.d/functions -> /etc/rc.d/init.d/functions.

Now, some paths are "virtual", which is e.g. executable paths under update-alternatives(1) control. Those paths are not packaged (and hence cannot be accessed via the Basenames index), but rather created in the %post script. We have a find-provides hook which automatically provides virtual paths (e.g. "Provides: /usr/bin/xvt" for xterm, rxvt-unicode etc.) Other packages might want to require /usr/bin/xvt, which they actually do require. So, due to "virtual paths", file-like Provides cannot be prohibited.

> From an engineering POV, paths in the Providename index just doubles
> the amount of work needed to ensure whether a path is present (or not).

Most of the time, a Basenames index lookup will do (since most paths are non-virtual). The amount of work gets doubled only when the Providename fallback is invoked.
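The two-step lookup argued for above can be sketched as follows. This is an illustrative reduction, not the librpm API: the string arrays stand in for the rpmdb Basenames and Providename indices, and the function name is made up. Most paths resolve in the first step; the Providename fallback only runs for "virtual" paths like /usr/bin/xvt.

```c
#include <string.h>

enum { NOT_FOUND, FOUND_FILE, FOUND_VIRTUAL };

/* Resolve a file dependency: try the Basenames stand-in first (packaged
 * files), then fall back to the Provides stand-in (virtual paths created
 * in %post and advertised by the find-provides hook). */
static int resolve_path_dep(const char *path,
                            const char **basenames, int nb,
                            const char **provides, int np)
{
    for (int i = 0; i < nb; i++)
        if (strcmp(path, basenames[i]) == 0)
            return FOUND_FILE;      /* common case: real packaged file */
    for (int i = 0; i < np; i++)
        if (strcmp(path, provides[i]) == 0)
            return FOUND_VIRTUAL;   /* fallback: file-like Provides */
    return NOT_FOUND;
}
```

This mirrors the cost argument: the second loop (the "doubled" work) is only reached when the first lookup misses.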
Re: Conflicts on files not symmetric
On Fri, Sep 19, 2008 at 04:01:13PM -0400, Jeff Johnson wrote:
> > 1)
> > %triggerin --posttrans -- /usr/share/icons/hicolor/*/*/
> > gtk-update-icon-cache /usr/share/icons/hicolor
> >
> > This trigger can be triggered/folded/called either by dirname or by
> > glob pattern itself. Since there is no way to pass the matching
> > dirname, which is a limitation by itself, the only sane possibility
> > is that DIRNAMES triggers are triggered/folded/called by glob
> > patterns.
>
> You're worried about a package that has __LITERALLY__ a path
> that includes glob characters?!?

No-no-no. I mean something else. Dirname triggers are called *per what*? They should be called *per matching dirname*. On the other hand, using glob patterns, there is no way to pass the dirname to the trigger, so we fold matching dirnames *by dirname patterns*. E.g. for

%triggerin --posttrans -- /usr/share/icons/hicolor/*/*/
...

Possibility #1) when a dirname matches, call the trigger; so the trigger gets called multiple times for e.g.

/usr/share/icons/hicolor/a/b/
/usr/share/icons/hicolor/a/c/
/usr/share/icons/hicolor/a/d/

etc.

Possibility #2) getting sober: no way to know which dirnames really matched; call once for all matching dirnames.

My point was that, with glob-dirname triggers (which is what you propose), I still cannot do what I need (actually what *they* need). Triggers simply cannot have any special "arguments" "that matched", at least not now.

[... I need some time to study other points ...]
Re: Conflicts on files not symmetric
On Fri, Sep 19, 2008 at 12:40:55PM -0400, Jeff Johnson wrote:
> > On Fri, Sep 19, 2008 at 04:21:50PM +, Alexey Tourbin wrote:
> > > Technically there's no piping, only a file duplicated on stdin. And
> > > "filetriggers" are run only once, at the end of transaction (they're
> > > actually "posttrans filetriggers"), which saves consecutive ldconfig,
> > > gtk-update-icon-cache, or whatever calls.
> >
> > Uh, but can that work? A Prereq to another package basically says
> > that the package must be fully configured before installation,
> > so all triggers must be run. Post-transaction is a bit late...
>
> There's need for a IMMEDIATE as well as a ONETIME (as in delayed)
> trigger attribute.
>
> The ONETIME mechanism can be handled by appending to existing
> %posttrans, the IMMEDIATE attribute is essentially the existing trigger
> mechanism(s).

Okay, with DIRNAMES patterns and a "posttrans" trigger flag, you can implement something like "posttrans filetriggers" on behalf of the specfile/rpmdb. There are still issues.

1)
%triggerin --posttrans -- /usr/share/icons/hicolor/*/*/
gtk-update-icon-cache /usr/share/icons/hicolor

This trigger can be triggered/folded/called either by dirname or by the glob pattern itself. Since there is no way to pass the matching dirname, which is a limitation by itself, the only sane possibility is that DIRNAMES triggers are triggered/folded/called by glob patterns.

2)
%triggerin --posttrans -- /usr/share/icons/hicolor/*/*/
gtk-update-icon-cache /usr/share/icons/hicolor
%triggerun --posttrans -- /usr/share/icons/hicolor/*/*/
gtk-update-icon-cache /usr/share/icons/hicolor

How do you pass the "$2" argument to these triggers? What is "$2"? If you pass different "$2" for in/un, you can no longer fold basically identical in/un triggers (and they run twice). Or you do not pass "$2" at all. Anyway, doing just something about "$2" is weird. And this is still not enough.

3) There are a dozen icon themes, and their gtk2 icon cache is specific to gtk2.
The above triggers imply that I process the "hicolor" theme specially. However, I do not. I want gtk2 to update caches for all its themes as needed. Here is gtk-icon-cache.filetrigger for the gtk2 package as (presumably) implemented for ALT Linux:

#!/bin/sh
egrep -o '^/usr/share/icons/[^/]+/' |sort -u |
# doing /usr/share/icons/*/ directories
while read -r dir; do
    if [ -f "$dir"/index.theme ]; then
        # something changed for this theme
        gtk-update-icon-cache "$dir"
    elif [ -f "$dir"/icon-theme.cache ]; then
        # theme was removed, nuke stale cache
        rm -f "$dir"/icon-theme.cache
        rmdir --ignore-fail-on-non-empty "$dir"
    fi
done

Now you cannot implement this with glob-dirname triggers, because you need to know the name of the icon theme dir.

gtk2.spec:
%triggerin -- /usr/share/icons/*/*/*/
# cannot deduce /usr/share/icons/hicolor/ prefix
Re: Conflicts on files not symmetric
On Fri, Sep 19, 2008 at 12:08:14PM -0400, Jeff Johnson wrote:
> > > So? Use a glob pattern against RPMTAG_DIRNAMES
> > > elements to detect condition pkg-contains-directory.
> >
> > Do you mean something like -- ?
> > %triggerin -- /usr/share/icons/hicolor/*/*/
> > gtk-update-icon-cache /usr/share/icons/hicolor
>
> Yes.
>
> > Possible implementation is: retrieve all Triggername index keys
> > with leading "/", and treat them as patterns. Then do O(N^2) nested
> > loop: for each DIRNAME in a package, for each Triggername pattern,
> > check for fnmatch(pattern, dirname).
>
> No. I haven't said anything at all about loops or implementation.

But it has to be implemented somehow...

> And how is piping every file to an external script any better or faster?

Technically there's no piping, only a file duplicated on stdin. And "filetriggers" are run only once, at the end of the transaction (they're actually "posttrans filetriggers"), which saves consecutive ldconfig, gtk-update-icon-cache, or whatever calls.

> For starters, there are many fewer directories, already uniqified, than
> there are file paths in packages ...
Re: Conflicts on files not symmetric
On Fri, Sep 19, 2008 at 11:51:32AM -0400, Jeff Johnson wrote:
> > $ find /usr/share/icons/hicolor -mindepth 2 -type d |sort -u |head
> > /usr/share/icons/hicolor/128x128/actions
> > /usr/share/icons/hicolor/128x128/animations
> > /usr/share/icons/hicolor/128x128/apps
> > /usr/share/icons/hicolor/128x128/categories
> > /usr/share/icons/hicolor/128x128/devices
> > /usr/share/icons/hicolor/128x128/emblems
> > /usr/share/icons/hicolor/128x128/emotes
> > /usr/share/icons/hicolor/128x128/filesystems
> > /usr/share/icons/hicolor/128x128/intl
> > /usr/share/icons/hicolor/128x128/mimetypes
> > $ find /usr/share/icons/hicolor -mindepth 2 -type d |wc -l
> > 156
> > $
>
> So? Use a glob pattern against RPMTAG_DIRNAMES
> elements to detect condition pkg-contains-directory.

Do you mean something like -- ?

%triggerin -- /usr/share/icons/hicolor/*/*/
gtk-update-icon-cache /usr/share/icons/hicolor

A possible implementation is: retrieve all Triggername index keys with a leading "/", and treat them as patterns. Then do an O(N^2) nested loop: for each DIRNAME in a package, for each Triggername pattern, check fnmatch(pattern, dirname).
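That nested loop might look roughly like this. This is an illustrative sketch, not actual librpm code: the function name, signature, and the "fire once per pattern" policy are assumptions matching the folding behavior discussed in this thread.

```c
#include <fnmatch.h>

/* Treat every Triggername key with a leading "/" as a glob pattern and
 * test it against each DIRNAME of the package. A matching pattern fires
 * at most once, per the "fold by pattern" possibility (#2). Returns the
 * number of triggers that fired. */
static int matchDirnameTriggers(const char **patterns, int npat,
                                const char **dirnames, int ndir)
{
    int fired = 0;
    for (int j = 0; j < npat; j++) {
        if (patterns[j][0] != '/')
            continue;                   /* not a dirname pattern */
        for (int i = 0; i < ndir; i++) {
            if (fnmatch(patterns[j], dirnames[i], 0) == 0) {
                fired++;                /* would run the trigger here */
                break;                  /* fold: once per pattern */
            }
        }
    }
    return fired;
}
```

Note the inner `break` is exactly the folding question from the thread: the trigger cannot learn *which* dirnames matched, only that at least one did.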
Re: Conflicts on files not symmetric
On Fri, Sep 19, 2008 at 11:36:52AM -0400, Jeff Johnson wrote:
> > On Fri, Sep 19, 2008 at 11:26:23AM -0400, Jeff Johnson wrote:
> > > Likely the 1st thing to get into place is the ability to trigger
> > > from adding a file to a directory, i.e. trigger if RPMTAG_DIRNAMES
> > > matches a trigger pattern, add trailing / to pattern to
> > > differentiate a dirname trigger.
> >
> > Triggers based on DIRNAMES are not enough. Here is an example:
> > when e.g. /usr/share/icons/hicolor/32x32/apps/kpdf.png is installed,
> > upgraded, or removed, "gtk-update-icon-cache /usr/share/icons/hicolor"
> > must be triggered.
>
> And that is exactly the same condition as RPMTAG_DIRNAMES contains
> /usr/share/icons/hicolor/32x32/apps/ for a package that is installed/
> erased.

$ find /usr/share/icons/hicolor -mindepth 2 -type d |sort -u |head
/usr/share/icons/hicolor/128x128/actions
/usr/share/icons/hicolor/128x128/animations
/usr/share/icons/hicolor/128x128/apps
/usr/share/icons/hicolor/128x128/categories
/usr/share/icons/hicolor/128x128/devices
/usr/share/icons/hicolor/128x128/emblems
/usr/share/icons/hicolor/128x128/emotes
/usr/share/icons/hicolor/128x128/filesystems
/usr/share/icons/hicolor/128x128/intl
/usr/share/icons/hicolor/128x128/mimetypes
$ find /usr/share/icons/hicolor -mindepth 2 -type d |wc -l
156
$
Re: Conflicts on files not symmetric
On Fri, Sep 19, 2008 at 11:26:23AM -0400, Jeff Johnson wrote:
> Likely the 1st thing to get into place is the ability to trigger from
> adding a file to a directory, i.e. trigger if RPMTAG_DIRNAMES matches
> a trigger pattern, add trailing / to pattern to differentiate a
> dirname trigger.

Triggers based on DIRNAMES are not enough. Here is an example: when e.g. /usr/share/icons/hicolor/32x32/apps/kpdf.png is installed, upgraded, or removed, "gtk-update-icon-cache /usr/share/icons/hicolor" must be triggered.
Re: Conflicts on files not symmetric
On Fri, Sep 19, 2008 at 04:59:35PM +0200, Michael Schroeder wrote:
> On Fri, Sep 19, 2008 at 02:52:21PM +0000, Alexey Tourbin wrote:
> > On Fri, Sep 19, 2008 at 11:07:04AM +0200, Michael Schroeder wrote:
> > > while implementing virtual triggers
> > [...]
> > May I perhaps take a look at where you are?
> > I'm implementing some sort of triggers, too.
>
> Oh, I'm not implementing new triggers, I'm just changing the current
> implementation so that they also trigger on package provides (and
> maybe the file list) and not just package NEVR.
>
> What are you working on?

File triggers for ALT Linux rpm, based on the Mandriva patch, but with a few design decisions different. The main differences are:

1) The list of files is not prefixed with "+" or "-". When some package is upgraded, the same files are both "added" and "removed" (in terms of rpm), so the distinction is not reliable (especially if the triggers are postponed/resumed). We can still use a simple file test like [ -f file ] to see whether the files were actually added/upgraded or removed. This also means we can make the list of files unique with "sort -u".

2) No separate files with regular expressions (and no internal grep in librpm) -- file triggers are black boxes. All triggers are run with the full file list attached to stdin. Here is for example how the ldconfig trigger can be implemented.

#!/bin/sh -e
while read -r f; do
    case "$f" in
    /lib/lib*/* |\
    /lib64/lib*/* |\
    /usr/lib/lib*/* |\
    /usr/lib64/lib*/* )
        # false positives
        continue ;;
    /lib/lib*.so |\
    /lib64/lib*.so |\
    /usr/lib/lib*.so |\
    /usr/lib64/lib*.so )
        # maybe soname
        if set "$f".* && [ -f "$1" ]; then
            continue
        fi ;;
    /lib/lib*.so.* |\
    /lib64/lib*.so.* |\
    /usr/lib/lib*.so.* |\
    /usr/lib64/lib*.so.* )
        # soname
        ;;
    /etc/ld.so.conf.d/*.conf)
        ;;
    *)
        continue ;;
    esac
    exec /sbin/ldconfig
done

In the "maybe soname" case, I check for something like *-devel packages installed or removed (they have symbolic links for which ldconfig should not be invoked).
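Point 1, the "sort -u" folding of the combined added+removed file list, can be sketched in C. The helper names here are illustrative, not librpm code: on upgrade the same path appears twice (once as "added", once as "removed"), so instead of tagging entries with "+"/"-" the combined list is sorted and de-duplicated before being fed to the trigger's stdin.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int cmpstr(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort paths[] in place and drop adjacent duplicates ("sort -u").
 * Returns the number of unique entries. */
static int uniqueFileList(const char **paths, int n)
{
    if (n == 0)
        return 0;
    qsort(paths, (size_t)n, sizeof *paths, cmpstr);
    int out = 1;
    for (int i = 1; i < n; i++)
        if (strcmp(paths[i], paths[out - 1]) != 0)
            paths[out++] = paths[i];    /* keep first of each run */
    return out;
}
```

The trigger script then decides for itself, with a plain `[ -f file ]` test, whether each surviving path was added/upgraded or removed.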
Re: Conflicts on files not symmetric
On Fri, Sep 19, 2008 at 11:07:04AM +0200, Michael Schroeder wrote:
> while implementing virtual triggers
[...]

May I perhaps take a look at where you are? I'm implementing some sort of triggers, too.
Fwd: Re: v4.999.5alpha LZMA_STREAM_INIT_VAR
- Forwarded message from Lasse Collin <[EMAIL PROTECTED]> -

Date: Sat, 13 Sep 2008 19:08:21 +0300
From: Lasse Collin <[EMAIL PROTECTED]>
To: Alexey Tourbin <[EMAIL PROTECTED]>
Subject: Re: v4.999.5alpha LZMA_STREAM_INIT_VAR

Alexey Tourbin wrote:
> On Sat, Sep 13, 2008 at 01:17:42PM +0300, Lasse Collin wrote:
> > Alexey Tourbin wrote:
> > > rpm5.org/rpmio/lzdio.c:
> > > 81	lzfile = calloc(1, sizeof(*lzfile));
> > > 82	if (!lzfile) {
> > > 83	    (void) fclose(fp);
> > > 84	    return NULL;
> > > 85	}
> > > 86	lzfile->fp = fp;
> > > 87	lzfile->encoding = encoding;
> > > 88	lzfile->eof = 0;
> > > 89	lzfile->strm = LZMA_STREAM_INIT_VAR;
> >
> > You can use for example memset(&lzfile->strm, 0, sizeof(lzfile->strm)).
> > See the comment of LZMA_STREAM_INIT in src/liblzma/api/lzma/base.h
> > for details.
>
> There's a calloc() call which is already there, which means there's
> no need to explicitly invoke memset. But LZMA_STREAM_INIT_VAR has
> been there for a while.

Oh, I didn't read the code carefully, sorry.

If the previous liblzma version you built against was 4.999.3alpha, you will have a bunch of other problems too. For example, the prototypes of lzma_alone_encoder() and lzma_alone_decoder() have changed.

> May I perhaps forward your message to the rpm5.org development list?
> (So that we can fix the code in some way.)

Sure. I can go and idle on [EMAIL PROTECTED] for a while too.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode

- End forwarded message -
Fwd: Re: v4.999.5alpha LZMA_STREAM_INIT_VAR
- Forwarded message from Lasse Collin <[EMAIL PROTECTED]> -

Date: Sat, 13 Sep 2008 13:17:42 +0300
From: Lasse Collin <[EMAIL PROTECTED]>
To: Alexey Tourbin <[EMAIL PROTECTED]>
Subject: Re: v4.999.5alpha LZMA_STREAM_INIT_VAR

Alexey Tourbin wrote:
> Upgrading to v4.999.5alpha breaks existing software builds, since
> LZMA_STREAM_INIT_VAR apparently has been removed from the API (without
> a deprecation note or something).

There are lots of other API changes in addition to LZMA_STREAM_INIT_VAR in 4.999.5alpha, and I've already made a few more in the git repository. So you need to be careful when upgrading until the first stable release, because it is possible that some changes don't get detected by the compiler, e.g. if a new member is added to a structure.

I know how much people hate API and ABI breakages. Once the first stable release is out, I won't break the API or ABI easily. But before that, I won't promise anything, because it would complicate development far too much. A stable release should be out before the end of this year.

> rpm5.org/rpmio/lzdio.c:
> 81	lzfile = calloc(1, sizeof(*lzfile));
> 82	if (!lzfile) {
> 83	    (void) fclose(fp);
> 84	    return NULL;
> 85	}
> 86	lzfile->fp = fp;
> 87	lzfile->encoding = encoding;
> 88	lzfile->eof = 0;
> 89	lzfile->strm = LZMA_STREAM_INIT_VAR;

You can use for example memset(&lzfile->strm, 0, sizeof(lzfile->strm)). See the comment of LZMA_STREAM_INIT in src/liblzma/api/lzma/base.h for details.

If you think the initialization should be done in some other way, for example by having a separate function or macro to do the initialization, let me know. I'm going to remove all exported variables from the API, so LZMA_STREAM_INIT_VAR won't be added back as is.

-- Lasse Collin | IRC: Larhzu @ IRCnet & Freenode

- End forwarded message -
Re: rpm infinite recursions using manifests
On Sun, Sep 07, 2008 at 02:32:57PM -0400, Jeff Johnson wrote:
> On Sep 7, 2008, at 2:28 PM, Alexey Tourbin wrote:
> > On Sun, Sep 07, 2008 at 02:22:26PM -0400, Jeff Johnson wrote:
> > > > Forbid manifest files from within manifests.
> > > Forbid manifests entirely is a similarly Draconian solution.
> >
> > I expect manifests to have semantics, not #include semantics.
>
> Sure manifests have list semantics.

List semantics means that we have a basic type rpm, and manifests are of type list. Include semantics has a somewhat vaguer notion of the basic type: [ | ]+, where the latter is recursive and should be reduced to terminal code_snippets. cpp(1) provides a device to prevent infinite recursion, which is

#ifndef FOO_H
#define FOO_H
[ | ]+
#endif

and otherwise has no special way to handle recursion, except for a nesting limit.

$ cat foo.h
#include "foo.h"
$ cpp -I. -E foo.h 2>&1 >/dev/null |tail
                 from foo.h:1,
                 from foo.h:1,
                 from foo.h:1,
                 from foo.h:1,
                 from foo.h:1,
                 from foo.h:1,
                 from foo.h:1,
                 from foo.h:1,
                 from foo.h:1:
foo.h:1:17: error: #include nested too deeply
$

So, right, the best thing you can do is set up a nesting limit. However, to me, the fact that manifests can include manifests is not the least surprising thing.
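The nesting-limit approach can be sketched with an in-memory stand-in for manifest files. Everything here is hypothetical (struct layout, names, the "@" convention for nested manifests, the limit value): like cpp(1), a manifest may include other manifests, but recursion past a fixed depth becomes an error instead of a hang.

```c
#include <string.h>

#define MAX_MANIFEST_DEPTH 16

/* In-memory stand-in for a manifest file: entries starting with '@'
 * name another manifest; anything else counts as a package path. */
struct manifest {
    const char *name;
    const char *entries[4];     /* NULL-terminated */
};

/* Expand a manifest recursively, counting package entries into *npkgs.
 * Returns 0 on success, -1 when the nesting limit is exceeded
 * (the analogue of cpp's "#include nested too deeply"). */
static int expandManifest(const struct manifest *all, int nall,
                          const char *name, int depth, int *npkgs)
{
    if (depth > MAX_MANIFEST_DEPTH)
        return -1;
    for (int m = 0; m < nall; m++) {
        if (strcmp(all[m].name, name) != 0)
            continue;
        for (int i = 0; all[m].entries[i] != NULL; i++) {
            if (all[m].entries[i][0] == '@') {
                if (expandManifest(all, nall, all[m].entries[i] + 1,
                                   depth + 1, npkgs) != 0)
                    return -1;  /* propagate the error */
            } else {
                (*npkgs)++;     /* a real package path */
            }
        }
    }
    return 0;
}
```

A self-referencing manifest thus fails noisily after MAX_MANIFEST_DEPTH levels rather than recursing forever.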
Re: rpm infinite recursions using manifests
On Sun, Sep 07, 2008 at 02:22:26PM -0400, Jeff Johnson wrote:
> > Forbid manifest files from within manifests.
> Forbid manifests entirely is a similarly Draconian solution.

I expect manifests to have semantics, not #include semantics.

> I want an acceptable general solution, forbidding solves no
> engineering problem.
Re: rpm infinite recursions using manifests
On Sun, Sep 07, 2008 at 12:10:18PM -0400, Jeff Johnson wrote:
> There's a class of infinite recursion problems with manifests used
> on the rpm CLI that I don't know how to fix.
>
> A manifest is a file containing a list of paths to packages (or other
> manifests)

Forbid manifest files from within manifests.
lua bindings for rpmdb and header
On Thu, Sep 04, 2008 at 05:08:21PM -0400, Jeff Johnson wrote:
> > Also, it is necessary to provide lua bindings for 1) retrieving
> > headers by instance, perhaps something like "h = getAddedHeader(num)"
> > and "h = getRemovedHeader(num)"; and 2) retrieving header entries,
> > e.g. "h.filenames".

BTW, back in 2005 I wrote a file luarpm.c, which is lua bindings to rpm, akin to the old Perl-RPM. I don't remember many details, but perhaps it was for lua-5.0, and perhaps it even worked.

#include
#include
#include
#include
#include
#include
#include

static const char RPM_Database[] = "RPM_Database";
static const char RPM_Header[] = "RPM_Header";

static int luaRPM_dbopen(lua_State *L)
{
    char *prefix = NULL;
    int mode = O_RDONLY;
    int perms = 0;
    rpmdb db;
    if (rpmdbOpen(prefix, &db, mode, perms) != 0) {
        lua_pushnil(L);
        lua_pushstring(L, "rpmdbOpen failed");
        return 2;
    }
    void *ptr = lua_newuserdata(L, sizeof db);
    memmove(ptr, &db, sizeof db);
    luaL_getmetatable(L, RPM_Database);
    lua_setmetatable(L, -2);
    return 1;
}

static int luaRPM_fopen(lua_State *L)
{
    const char *fname = luaL_checkstring(L, 1);
    FD_t fd = Fopen(fname, "r");
    if (fd == 0) {
        lua_pushnil(L);
        lua_pushstring(L, strerror(errno));
        lua_pushstring(L, "open");
        return 3;
    }
    Header hdr;
    rpmRC rc = rpmReadPackageHeader(fd, &hdr, 0, 0, 0);
    Fclose(fd);
    if (rc != RPMRC_OK) {
        lua_pushnil(L);
        lua_pushstring(L, "rpmReadPackageHeader failed");
        lua_pushstring(L, "init");
        return 3;
    }
    void *ptr = lua_newuserdata(L, sizeof hdr);
    memmove(ptr, &hdr, sizeof hdr);
    luaL_getmetatable(L, RPM_Header);
    lua_setmetatable(L, -2);
    return 1;
}

static int db_find(lua_State *L, int tag)
{
    rpmdb db = *(rpmdb *)luaL_checkudata(L, 1, RPM_Database);
    luaL_argcheck(L, db != NULL, 1, "RPM_Database expected");
    const char *name = luaL_checkstring(L, 2);
    rpmdbMatchIterator mi = rpmdbInitIterator(db, tag, name, 0);
    Header hdr;
    int n = 0;
    lua_newtable(L);
    while ((hdr = rpmdbNextIterator(mi)) != NULL) {
        headerLink(hdr);
        lua_pushnumber(L, ++n);
        void *ptr = lua_newuserdata(L, sizeof hdr);
        memmove(ptr, &hdr, sizeof hdr);
        luaL_getmetatable(L, RPM_Header);
        lua_setmetatable(L, -2);
        lua_settable(L, -3);
    }
    rpmdbFreeIterator(mi);
    return 1;
}

static int luaRPM_db_pkg(lua_State *L)
{
    db_find(L, RPMTAG_NAME);
    lua_pushnumber(L, 1);
    lua_gettable(L, -2);
    lua_remove(L, -2);
    return 1;
}

static int luaRPM_db_whatrequires(lua_State *L)
{
    return db_find(L, RPMTAG_REQUIRENAME);
}

static int luaRPM_db_whatprovides(lua_State *L)
{
    return db_find(L, RPMTAG_PROVIDENAME);
}

static int tag2num(const char *tag)
{
    int i;
    for (i = 0; i < rpmTagTableSize; i++)
        if (strcasecmp(tag, rpmTagTable[i].name + 7) == 0)
            return rpmTagTable[i].val;
    return -1;
}

static int luaRPM_hdr_tag(lua_State *L)
{
    Header hdr = *(Header *)luaL_checkudata(L, 1, RPM_Header);
    const char *key = luaL_checkstring(L, 2);
    int tag = tag2num(key);
    if (tag < 0) {
        lua_pushnil(L);
        lua_pushfstring(L, "unknown tag: %s", key);
        return 2;
    }
    int type, size;
    char *data;
    if (headerGetEntry(hdr, tag, &type, (void**)&data, &size) == 0) {
        lua_pushnil(L);
        lua_pushfstring(L, "no tag in header: %s", key);
        return 2;
    }
    switch (type) {
    case RPM_NULL_TYPE:
        lua_pushnil(L);
        return 1;
    case RPM_BIN_TYPE:
        lua_pushlstring(L, data, size);
        return 1;
    }
    lua_newtable(L);
    int i;
    switch (type) {
    case RPM_CHAR_TYPE: {
        char *ptr;
        for (ptr = (char *)data, i = 0; i < size; i++, ptr++) {
            lua_pushnumber(L, i + 1);
            lua_pushlstring(L, ptr, 1);
            lua_settable(L, -3);
        }
        break;
    }
    case RPM_INT8_TYPE: {
        int_8 *ptr;
        for (ptr = (int_8 *)data, i = 0; i < size; i++, ptr++) {
            lua_pushnumber(L, i + 1);
            lua_pushnumber(L, *ptr & 0xff);
            lua_settable(L, -3);
        }
        break;
    }
    case RPM_INT16_TYPE: {
        int_16 *ptr;
Re: modular %posttrans-like scripts
On Wed, Aug 27, 2008 at 09:13:18AM -0400, Jeff Johnson wrote:
> There's the Mandriva solution, called "file triggers", to the cache
> update problem in lib/filetriggers.c. I dislike several things with the
> specific Mandriva implementation, but the idea is closest to being
> generally useful IMHO.

lib/transaction.c:

 1880     if ((rpmtsFlags(ts) & _noTransTriggers) != _noTransTriggers)
 1881         rpmRunFileTriggers(rpmtsRootDir(ts));

Perhaps I need a more general *mechanism* which can implement file
triggers as a site/vendor *policy*, and which is not limited to file
triggers.

-	rpmRunFileTriggers(rpmtsRootDir(ts));
+	rpmRunSitePosttrans(ts);

In rpmRunSitePosttrans, what possibly can be done is to provide the
ability for a lua script to access installed and removed headers. That
is, rpmRunSitePosttrans can call a lua script /usr/lib/rpm/posttrans.lua
with basically two arguments: the list of removed package instance
numbers, and the list of added instance numbers (in the transaction ts).

Also, it is necessary to provide lua bindings for 1) retrieving headers
by instance, perhaps something like "h = getAddedHeader(num)" and
"h = getRemovedHeader(num)"; and 2) retrieving header entries, e.g.
"h.filenames".

Then the whole notion of "posttrans file triggers", or whatever
posttrans triggers, can be implemented in that posttrans.lua script.

Now, it is easy to retrieve headers of added packages, but it is a
tricky question whether I can access headers of removed packages, i.e.
headers that have been removed from rpmdb while the transaction ts is
not finished yet. There seems to be an RPMDBI_REMOVED temporary
database, but I am not sure how it is supposed to work.
tagNum and fpNum
rpmdb/rpmdb.h:

 63 struct _dbiIndexItem {
 64     rpmuint32_t hdrNum;     /*!< header instance in db */
 65     rpmuint32_t tagNum;     /*!< tag index in header */
 66     rpmuint32_t fpNum;      /*!< finger print index */
 67 };

Please explain what tagNum is and how it is used. (I just need to
understand the code better.)

Here is some reverse engineering (against an older rpmdb, actually
created with rpm-4.0.4+).

$ ./rpm -qa --qf '%{NAME}\t%{DBINSTANCE}\n' |grep -w perl-base
perl-base	1068
$

Package perl-base is instance #1068.

$ rpm -q --qf '[%{BASENAMES}\t%{FILENAMES}\n]' perl-base |cat -n |awk '{$1--}($2=="perl5")'
0 perl5 /etc/perl5
2 perl5 /usr/bin/perl5
9 perl5 /usr/lib/perl5
$

Package perl-base has 3 entries for the "perl5" basename.

$ perl -MDB_File -le 'tie %db, "DB_File", "/var/lib/rpm/Basenames", 0 and print join ",", unpack "I*", $db{perl5}'
707,165,775,0,800,14,1068,0,1068,2,1068,9
$

We can see (1068,0), (1068,2), and (1068,9) pairs. Okay, perhaps I can
understand what tagNum is. What is fpNum then?
Re: damaged headers are due to FILESTATES RPM_CHAR_TYPE
On Wed, Aug 27, 2008 at 12:55:06PM -0400, Jeff Johnson wrote:
> >$ ./rpm -q -vvv --whatprovides /a
> >D: opening db index /var/lib/rpm/Packages rdonly mode=0x0
> >D: locked db index /var/lib/rpm/Packages
> >D: opening db index /var/lib/rpm/Basenames rdonly mode=0x0
> >error: rpmdb: damaged header #625 retrieved -- skipping.
> >error: rpmdb: damaged header #625 retrieved -- skipping.
> >D: opening db index /var/lib/rpm/Providename rdonly mode=0x0
> >file /a: No such file or directory
> >D: closed db index /var/lib/rpm/Providename
> >D: closed db index /var/lib/rpm/Basenames
> >D: closed db index /var/lib/rpm/Packages
> >$ ./rpm -q -vvv --whatprovides /bin/cat
> >D: opening db index /var/lib/rpm/Packages rdonly mode=0x0
> >D: locked db index /var/lib/rpm/Packages
> >D: opening db index /var/lib/rpm/Basenames rdonly mode=0x0
> >error: rpmdb: damaged header #90 retrieved -- skipping.
> >error: rpmdb: damaged header #90 retrieved -- skipping.
> >error: rpmdb: damaged header #531 retrieved -- skipping.
> >error: rpmdb: damaged header #585 retrieved -- skipping.
> >error: rpmdb: damaged header #1101 retrieved -- skipping.
> >error: rpmdb: damaged header #1173 retrieved -- skipping.
> >D: opening db index /var/lib/rpm/Providename rdonly mode=0x0
> >file /bin/cat is not owned by any package
> >D: closed db index /var/lib/rpm/Providename
> >D: closed db index /var/lib/rpm/Basenames
> >D: closed db index /var/lib/rpm/Packages
> >$
> >
> >(I don't know why --whatprovides is special; simple -q queries
> >without join-key lookup work fine.)
>
> Hmmm, there is nothing special abt --whatprovides that should wander
> into code that depends on RPM_MIN_TYPE (unless I'm missing something).
>
> RPMTAG_FILESTATES is the only tag that was ever type'd as RPM_CHAR_TYPE
> iirc. And here are the uses, none of the code is directly on a query
> code path (this is HEAD):

There's code that handles this case.
rpmdb/header_internal.c:

 36 int headerVerifyInfo(rpmuint32_t il, rpmuint32_t dl, const void * pev, void * iv, int negate)
 37 {
 38 /[EMAIL PROTECTED]@*/
 39     entryInfo pe = (entryInfo) pev;
 40 /[EMAIL PROTECTED]@*/
 41     entryInfo info = iv;
 42     rpmuint32_t i;
 43
(The following line was added by me.)
 44     fprintf(stderr, "headerVerifyInfo\n");
 45
 46     for (i = 0; i < il; i++) {
 47         info->tag = (rpmuint32_t) ntohl(pe[i].tag);
 48         info->type = (rpmuint32_t) ntohl(pe[i].type);
 49         /* XXX Convert RPMTAG_FILESTATE to RPM_UINT8_TYPE. */
 50         if (info->tag == 1029 && info->type == 1) {
 51             info->type = RPM_UINT8_TYPE;
 52             pe[i].type = (rpmuint32_t) htonl(info->type);
 53         }

Now, for some reason, certain rpmquery calls do trigger the
headerVerifyInfo call, and others do not.

$ ./rpm -q --qf '%{NAME}\n' coreutils
headerVerifyInfo
headerVerifyInfo
headerVerifyInfo
headerVerifyInfo
headerVerifyInfo
headerVerifyInfo
headerVerifyInfo
headerVerifyInfo
headerVerifyInfo
coreutils
$ ./rpm -q --qf '%{NAME}\n' --whatprovides /bin/cat
error: rpmdb: damaged header #90 retrieved -- skipping.
error: rpmdb: damaged header #90 retrieved -- skipping.
error: rpmdb: damaged header #531 retrieved -- skipping.
error: rpmdb: damaged header #585 retrieved -- skipping.
error: rpmdb: damaged header #1101 retrieved -- skipping.
error: rpmdb: damaged header #1173 retrieved -- skipping.
file /bin/cat is not owned by any package
$

9 headerVerifyInfo() calls with '-q', and not a single headerVerifyInfo()
call with '-q --whatprovides'.
Re: damaged headers are due to FILESTATES RPM_CHAR_TYPE
On Wed, Aug 27, 2008 at 03:36:26PM -0400, Jeff Johnson wrote:
> >Is there any benefit for them to join rpm5.org? There must be a good
> >reason. There's nothing more than that. (They might know that there's
> >a good reason, or they might know that there is not. This is rather
> >a non-political argument.)
>
> Asking me what their reason(s) are is pointless. All I know is I asked,
> and saw several SuSE developers at both FOSDEM in February and OLS
> last month.
>
> There's lots that is dead-on with the OpenSUSE build system. OTOH,
> some of the compatibility issues integrating "weak dependencies" and
> the other RFE's at
>     http://wiki.rpm.org/Problems_of_Building
> will be tricky, not all of the mentioned problems are properly
> formulated yet imho.

To me, weak dependencies are bullshit (that's straight, yeah). Either
there is a dependency, or there is not. BTW, the SuSE paper on their
smart SAT solver says the SAT solver can't handle weak dependencies
anyway. (This does not mean I object to weak dependencies. Let them be.
I just see them as a sort of special comment.)

(Also, to me, rpm is not a "deus ex machina", by which I mean that rpm
is not supposed to resolve complicated dependency stuff. There must be
external solvers associated with external rpm repositories, such as
apt-get or yum. In other words, I see librpm as a simple yes-or-no
thing.)

> >Well, basically I like rpm5.org because things are straight and
> >there's no kind of political endorsement or whatever. I can hack as
> >long as I like and as long as I don't break too much. This also means
> >that I can contribute back and forth (even if I don't like some of
> >rpm5's new pieces.) Perhaps this is a good reason for developers to
> >join.
>
> ;-) Basically why I joined @rpm5.org too.

Let's gather more developers then. I would rather say hackers,
unencumbered with political stuff or anything. People who want to hack
should join.
Perhaps the worst thing that can happen is that their code is ifdeffed.
Honestly, there's another approach: don't touch that "rpm" thing unless
it breaks; that's probably okay, too.

> Seriously, rpm "make check" is starting to congeal sufficiently
> that I'm way less worried about undetected breakage like
> RPM_CHAR_TYPE. Yes better "legacy" checks need to be added.
Re: damaged headers are due to FILESTATES RPM_CHAR_TYPE
On Wed, Aug 27, 2008 at 01:44:15PM -0400, Jeff Johnson wrote:
> >Anyway, perhaps you should try to persuade the SuSE guys that rpm5 is
> >beneficial for them. That could be a big win.
>
> All the SuSE guys were invited @rpm5.org last October. None bothered to
> even acknowledge the invitation. They are certainly still welcome
> @rpm5.org.

Is there any benefit for them to join rpm5.org? There must be a good
reason. There's nothing more than that. (They might know that there's a
good reason, or they might know that there is not. This is rather a
non-political argument.)

Well, basically I like rpm5.org because things are straight and there's
no kind of political endorsement or whatever. I can hack as long as I
like and as long as I don't break too much. This also means that I can
contribute back and forth (even if I don't like some of rpm5's new
pieces.) Perhaps this is a good reason for developers to join.
Re: damaged headers are due to FILESTATES RPM_CHAR_TYPE
On Wed, Aug 27, 2008 at 12:55:06PM -0400, Jeff Johnson wrote:
> >This apparently means that rpm5 is not that widely used.
> >Perhaps you should call for yet more major distributions.
>
> I'd agree that rpm5 is not widely used on "legacy" distributions.

Face it, people use what they are given, and what they are given is
something like rpm-4.4.2+.

> I'm more interested in best possible engineering than with
> widest possible usage atm for rpm-5.x. Better engineering
> will eventually be adopted is my guess, and I most definitely
> do not want the burden of support for rpm everywhere, I'm
> just one guy who wants his life back.

(aside) You seem to use colloquial English which is not always easy to
grasp. What is "one guy who wants his life back" supposed to mean with
respect to rpm5.org?

Anyway, perhaps you should try to persuade the SuSE guys that rpm5 is
beneficial for them. That could be a big win.
Re: damaged headers are due to FILESTATES RPM_CHAR_TYPE
On Wed, Aug 27, 2008 at 09:16:50AM -0400, Jeff Johnson wrote:
> >Damaged headers are due to FILESTATES from older rpmdb.
> >
> >rpmdb/header.c (regionSwab):
> > 522     for (; il > 0; il--, pe++) {
> > 523         struct indexEntry_s ie;
> > 524         rpmTagType type;
> > 525
> > 526         ie.info.tag = (rpmuint32_t) ntohl(pe->tag);
> > 527         ie.info.type = (rpmuint32_t) ntohl(pe->type);
> > 528         ie.info.count = (rpmuint32_t) ntohl(pe->count);
> > 529         ie.info.offset = (rpmint32_t) ntohl(pe->offset);
> > 530         assert(ie.info.offset >= 0);    /* XXX insurance */
> > 531
> >Bails out right here:
> > 532         if (hdrchkType(ie.info.type))
> > 533             return 0;
> > 534         if (hdrchkData(ie.info.count))
> > 535             return 0;
> > 536         if (hdrchkData(ie.info.offset))
> > 537             return 0;
> > 538         if (hdrchkAlign(ie.info.type, ie.info.offset))
> > 539             return 0;
> >
> >Older FILESTATES have type RPM_CHAR_TYPE (= 1), and the new value
> >for RPM_MIN_TYPE is 2, which is RPM_UINT8_TYPE.
>
> Nice catch! Changing RPM_MIN_TYPE back to 1 is the obvious fix.
>
> However, I do wonder why this has not been reported before. AFAICT
> the issue should have been very very loud and obvious.

This apparently means that rpm5 is not that widely used.
Perhaps you should call for yet more major distributions.

> What was the full calling context where the problem was seen?

It goes like this:

$ ./rpm -q -vvv --whatprovides /a
D: opening db index /var/lib/rpm/Packages rdonly mode=0x0
D: locked db index /var/lib/rpm/Packages
D: opening db index /var/lib/rpm/Basenames rdonly mode=0x0
error: rpmdb: damaged header #625 retrieved -- skipping.
error: rpmdb: damaged header #625 retrieved -- skipping.
D: opening db index /var/lib/rpm/Providename rdonly mode=0x0
file /a: No such file or directory
D: closed db index /var/lib/rpm/Providename
D: closed db index /var/lib/rpm/Basenames
D: closed db index /var/lib/rpm/Packages
$ ./rpm -q -vvv --whatprovides /bin/cat
D: opening db index /var/lib/rpm/Packages rdonly mode=0x0
D: locked db index /var/lib/rpm/Packages
D: opening db index /var/lib/rpm/Basenames rdonly mode=0x0
error: rpmdb: damaged header #90 retrieved -- skipping.
error: rpmdb: damaged header #90 retrieved -- skipping.
error: rpmdb: damaged header #531 retrieved -- skipping.
error: rpmdb: damaged header #585 retrieved -- skipping.
error: rpmdb: damaged header #1101 retrieved -- skipping.
error: rpmdb: damaged header #1173 retrieved -- skipping.
D: opening db index /var/lib/rpm/Providename rdonly mode=0x0
file /bin/cat is not owned by any package
D: closed db index /var/lib/rpm/Providename
D: closed db index /var/lib/rpm/Basenames
D: closed db index /var/lib/rpm/Packages
$

(I don't know why --whatprovides is special; simple -q queries without
join-key lookup work fine.)

Actually it was an infinite loop with ever-increasing mi->mi_setx and
all signals blocked, so I thought that first I had to fix that infinite
loop. And then it was some printf-style debugging; the surprising thing
is that printf debugging sometimes goes faster than gdb breakpoints etc.
damaged headers are due to FILESTATES RPM_CHAR_TYPE
Damaged headers are due to FILESTATES from older rpmdb.

rpmdb/header.c (regionSwab):

 522     for (; il > 0; il--, pe++) {
 523         struct indexEntry_s ie;
 524         rpmTagType type;
 525
 526         ie.info.tag = (rpmuint32_t) ntohl(pe->tag);
 527         ie.info.type = (rpmuint32_t) ntohl(pe->type);
 528         ie.info.count = (rpmuint32_t) ntohl(pe->count);
 529         ie.info.offset = (rpmint32_t) ntohl(pe->offset);
 530         assert(ie.info.offset >= 0);    /* XXX insurance */
 531

Bails out right here:

 532         if (hdrchkType(ie.info.type))
 533             return 0;
 534         if (hdrchkData(ie.info.count))
 535             return 0;
 536         if (hdrchkData(ie.info.offset))
 537             return 0;
 538         if (hdrchkAlign(ie.info.type, ie.info.offset))
 539             return 0;

Older FILESTATES have type RPM_CHAR_TYPE (= 1), and the new value for
RPM_MIN_TYPE is 2, which is RPM_UINT8_TYPE.
modular %posttrans-like scripts
I need modular and configurable %posttrans-like scripts. Examples:

1) automatic run of update-menus, if the package has /usr/lib/menu/*
2) update-desktop-database -- /usr/share/applications/*.desktop
3) gtk-update-icon-cache -- /usr/share/icons/hicolor/*

I mean that, for common tasks, I do not want packagers to write their
%post-like scripts at all. Simple cache updates should be handled by
rpm itself. Now, is there something like this in rpm5.org, to start
looking at?
mi->mi_offset == mi->mi_prevoffset
rpmdb/rpmdb.c (rpmdbNextIterator):

 2418     /* If next header is identical, return it now. */
 2419 /[EMAIL PROTECTED] -refcounttrans -retalias -retexpose -usereleased @*/
 2420     if (mi->mi_prevoffset && mi->mi_offset == mi->mi_prevoffset)
 2421         return mi->mi_h;
 2422 /[EMAIL PROTECTED] =refcounttrans =retalias =retexpose =usereleased @*/

It looks like the condition never holds, because:
1) when doing mi_set, each set element should be different;
2) when doing DB_NEXT, each header instance should be different as well.
rpmdbNextIterator infinite loop
rpmdb/rpmdb.c (rpmdbNextIterator):

 2381 top:
 2382     uh = NULL;
 2383     uhlen = 0;
 2384
 2385     do {
 2386         union _dbswap mi_offset;
 2387
 2388         if (mi->mi_set) {
 2389             if (!(mi->mi_setx < mi->mi_set->count))
 2390                 return NULL;
 2391             mi->mi_offset = dbiIndexRecordOffset(mi->mi_set, mi->mi_setx);
 2392             mi->mi_filenum = dbiIndexRecordFileNumber(mi->mi_set, mi->mi_setx);
 2393             mi_offset.ui = mi->mi_offset;
 2394             if (dbiByteSwapped(dbi) == 1)
 2395                 _DBSWAP(mi_offset);
 2396             keyp = &mi_offset;
 2397             keylen = sizeof(mi_offset.ui);
 2398         } else {
*2399             key->data = (void *)mi->mi_keyp;
 2400             key->size = (UINT32_T) mi->mi_keylen;
 2401             data->data = uh;
 2402             data->size = (UINT32_T) uhlen;
 2403 #if !defined(_USE_COPY_LOAD)
 2404             data->flags |= DB_DBT_MALLOC;
 2405 #endif
 2406             rc = dbiGet(dbi, mi->mi_dbc, key, data,
 2407                         (key->data == NULL ? DB_NEXT : DB_SET));
 2408             data->flags = 0;
 2409             keyp = key->data;
 2410             keylen = key->size;
 2411             uh = data->data;
 2412             uhlen = data->size;
 2413
 2414             /*
 2415              * If we got the next key, save the header instance number.
 2416              *
 2417              * For db3 Packages, instance 0 (i.e. mi->mi_setx == 0) is the
 2418              * largest header instance in the database, and should be
 2419              * skipped.
 2420              */
 2421             if (keyp && mi->mi_setx && rc == 0) {
 2422                 memcpy(&mi_offset, keyp, sizeof(mi_offset.ui));
 2423                 if (dbiByteSwapped(dbi) == 1)
 2424                     _DBSWAP(mi_offset);
 2425                 mi->mi_offset = (unsigned) mi_offset.ui;
 2426             }
 2427
 2428             /* Terminate on error or end of keys */
 2429 /[EMAIL PROTECTED]@*/
 2430             if (rc || (mi->mi_setx && mi->mi_offset == 0))
 2431                 return NULL;
 2432 /[EMAIL PROTECTED]@*/
 2433 #ifdef REFERENCE
 2434             if (mi->mi_offset & 0x) {
 2435                 fprintf(stderr, "*** damaged key 0x%x reset to 0\n", mi->mi_offset);
 2436                 mi->mi_offset = 0;
 2437             }
 2438 #endif
 2439         }
 2440         mi->mi_setx++;
 2441     } while (mi->mi_offset == 0);
 ...
 2532     if (mi->mi_h == NULL || !headerIsEntry(mi->mi_h, RPMTAG_NAME)) {
 2533         rpmlog(RPMLOG_ERR,
 2534             _("rpmdb: damaged header #%u retrieved -- skipping.\n"),
 2535             mi->mi_offset);
*2536         goto top;
 2537     }

This code is subject to an infinite loop. Consider how it is called:

    mi = rpmdbInitIterator(db, RPMDBI_PACKAGES, hdrNum, sizeof(hdrNum));
    h = rpmdbNextIterator(mi);

and consider that the header identified by hdrNum is damaged. What
happens then? In line 2536, you "goto top". And at top, mi->mi_set is
empty (because it is not an index-to-join-key lookup), and key->data is
hdrNum. This means that dbiGet gets called with DB_SET, and you get the
same damaged header again, etc.

(I'll try to fix this but I'm not sure what's the best way to fix this
yet.)
Re: RPM: rpm/rpmdb/ db3.c dbconfig.c rpmdb.c rpmdb.h
On Mon, Aug 18, 2008 at 12:20:36PM -0400, Jeff Johnson wrote:
> > @@ -2878,6 +2878,7 @@
> >
> >     if (db->db_tags != NULL)
> >     for (dbix = 0; dbix < db->db_ndbi; dbix++) {
> > +       dbiIndex dbi;

(Actually one of my previous changes moved these variables from the
outer scope to the inner loop scope.)

Note that dbi corresponds to dbix here, and this is the right scope for
dbi. There's a temptation to reuse this dbi in another loop, but that
would be a different "dbi" for another purpose. So, sometimes I ask,
like, "what is this variable for?" One approach is to use more
distinctive names, e.g. s/dbi/dbi_for_dbix_index_update/ (and another
would be dbi_to_fetch_max_header_instance_number). Possibly a better
approach is still to use short names, but within the inner scope. The
very scope can then tell what the variable is supposed to mean.

> >     DBC * dbcursor = NULL;
> >     DBT k = DBT_INIT;
> >     DBT v = DBT_INIT;

Now note that k and v are also connected to the "dbi" index (k has the
tag value for the dbi index and v is a set of header instances). As soon
as "dbi" goes away, k and v have garbage. They'd better go away
together.

> > @@ -2887,7 +2888,6 @@
> >     rpmTag rpmtag = dbiTag->tag;
> >     const char * dbiBN = (dbiTag->str != NULL
> >             ? dbiTag->str : tagName(rpmtag));
> > -   dbiIndex dbi;
> >     rpmuint8_t * bin = NULL;
> >     int i;
> >
> > @@ -2939,7 +2939,6 @@
> >         if (!xx)
> >             continue;
> > /[EMAIL PROTECTED]@*/ break;
> > -
> >     }
> >
> >     dbi = dbiOpen(db, he->tag, 0);

Note that dbiOpen happens to be at the very same scope/indentation where
dbi was first declared. So is dbiClose, right before the bracket that
closes the scope of the loop. The benefit is at least that you don't
have to set dbi to NULL so that it does not dangle throughout the rest
of the code.

> Be very careful re-scoping dbiIndex here as well. Or at least watch
> for subtle flaws, valgrind should catch any flaws iirc.

I hope I am careful (well, I did not run valgrind, I just asked myself
questions, like, "what is this variable for").
Re: rpmdb.c blockSignals()
On Mon, Aug 18, 2008 at 09:13:14AM -0400, Jeff Johnson wrote:
> >Consider that you open two rpmdb databases simultaneously.
> >Signal handling is screwed, and after you close them both,
> >the rpmsq handler is still installed, and "oact" is lost.
>
> Screwed? signal handling reverts to the original state is/was the
> intent.

When you call rpmdbOpen twice (without calling rpmdbClose before the
second time), rpmsqEnable will be called twice, and the original state
gets lost. Okay, I know how to fix this.

> >Actually, if we consider rpmdb a *library*, rpmsq is a bad idea.
> >A library must not intervene in signal handling; it may only block
> >signals for a short period of time to perform its critical sections.
> >
> >However, from the "library" point of view, there is no chance
> >to shut down rpmdb gracefully, and rpmdbRock is then useless.
>
> Yes having a signal handler in a library leads to many difficulties.
>
> I can go through the design choices again again again, but having
> a means to handle state associated with using a Berkeley DB is most
> definitely MUSTHAVE.

Ideally, only blocking signals for critical sections (which is
dbiPut+dbiSync) should be enough. If Berkeley DB was not shut down
gracefully, it might require only minor repair (e.g. removing stale
locks), which can be performed automatically.
Re: rpmdb.c blockSignals()
On Sat, Aug 16, 2008 at 12:12:13PM -0400, Jeff Johnson wrote:
> >I think this is wrong -- with this change, blockSignals() now does
> >NOT block signals. Note that e.g. db->put is neither atomic nor
> >reentrant (possibly you cannot even close the db if db->put is in
> >progress).
>
> Correct, signals are not blocked, they are caught by the rpmsq
> handler, with the patch applied. Assuming the code is still correct.
>
> No exit is undertaken until the caught signal mask is tested. db->put
> is still atomic, and no re-entry is undertaken.
>
> >However, I do not quite understand what rpmsq does.
>
> SQ == Signal Queue.
> rpmsq registers, catches and delivers signals.

Oh, I see now. I don't like rpmsq then.

Consider that you open two rpmdb databases simultaneously. Signal
handling is screwed, and after you close them both, the rpmsq handler is
still installed, and "oact" is lost.

Actually, if we consider rpmdb a *library*, rpmsq is a bad idea. A
library must not intervene in signal handling; it may only block signals
for a short period of time to perform its critical sections.

However, from the "library" point of view, there is no chance to shut
down rpmdb gracefully, and rpmdbRock is then useless.
Re: rpmdb.c blockSignals()
On Sat, Aug 16, 2008 at 05:57:30AM -0400, Jeff Johnson wrote:
> > On Aug 16, 2008, at 4:12 AM, Jeff Johnson wrote:
> >
> > >Please do not change anything with blockSignals.
>
> Hmmm, I've neglected to supply sufficient details, and so it sounds
> like I'm forbidding better engineering. That was not my intent.
> In fact, I'd much rather have better engineering than risk aversion in
> RPM.
>
> So let's say this patch is attempted:
>
> @@ -822,7 +822,7 @@ static int blockSignals(/[EMAIL PROTECTED]@*/ rpm
>      (void) sigdelset(&newMask, SIGHUP);
>      (void) sigdelset(&newMask, SIGTERM);
>      (void) sigdelset(&newMask, SIGPIPE);
> -    return sigprocmask(SIG_BLOCK, &newMask, NULL);
> +    return sigprocmask(SIG_SETMASK, &newMask, NULL);
>  }
>
> The net result is that rpm would run without blocking any
> of those explicitly mentioned signals, and ^C et al would be handled
> sooner. There are certain long running loops like rpm -qa whose
> responsiveness might be improved. Note that how often the
> received signal mask is polled, not how long ^C is blocked,
> ultimately determines "responsiveness".

I think this is wrong -- with this change, blockSignals() now does NOT
block signals. Note that e.g. db->put is neither atomic nor reentrant
(possibly you cannot even close the db if db->put is in progress).

However, I do not quite understand what rpmsq does.
damaged rpmdb headers
On Sat, Aug 16, 2008 at 04:12:49AM -0400, Jeff Johnson wrote:
> (aside) btw, I'm seeing issues with damaged (truncated is my guess,
> not looked) headers retrieved from rpmdb on HEAD, rpm-5_1_4 is fine.

How do I reproduce? At least simple *read-only* rpmquery works fine.

$ ./rpm -qa --qf '%{NAME}\n' >/dev/null
$ echo $?
0
$
mire
I wonder if mires are of any use. Suppose I want to query provides by
the glob pattern 'perl(*)'. The desired output is 4 columns:

perl(...)	perl(...)-version	%{NAME}	%{VERSION}

E.g. something like

$ rpm -qa --qf '%{PROVIDENAME}\t%{PROVIDEVERSION}\t%{NAME}\t%{VERSION}\n' |grep '^perl(' |head
perl(Unicode/UCD.pm)	0.250	perl-unicore	5.8.8
perl(Text/Reform.pm)	1.012	perl-Text-Reform	1.12.2
perl(Locale/Maketext/Simple.pm)	0.160	perl-Locale-Maketext-Simple	0.16
perl(XML/Parser.pm)	2.360	perl-XML-Parser	2.36
perl(FreezeThaw.pm)	0.430	perl-FreezeThaw	0.43
perl(List/MoreUtils.pm)	0.220	perl-List-MoreUtils	0.22
perl(CGI.pm)	3.390	perl-CGI	3.39
perl(Thread.pm)	2.010	perl-threads	5.8.8
perl(DateTime/Format/Mail.pm)	0.300	perl-DateTime-Format-Mail	0.30
perl(Exception/Class.pm)	1.240	perl-Exception-Class	1.24
$

done with a mire glob, not with external grep.
rpmdb.c blockSignals()
rpmdb/rpmdb.c:

 808 /**
 809  * Block all signals, returning previous signal mask.
 810  * @param db        rpm database
 811  * @retval *oldMask previous sigset
 812  * @return          0 on success
 813  */
 814 static int blockSignals(/[EMAIL PROTECTED]@*/ rpmdb db, /[EMAIL PROTECTED]@*/ sigset_t * oldMask)
 815     /[EMAIL PROTECTED] fileSystem @*/
 816     /[EMAIL PROTECTED] *oldMask, fileSystem @*/
 817 {
 818     sigset_t newMask;
 819
 820     (void) sigfillset(&newMask);        /* block all signals */
 821     (void) sigprocmask(SIG_BLOCK, &newMask, oldMask);
 822     (void) sigdelset(&newMask, SIGINT);
 823     (void) sigdelset(&newMask, SIGQUIT);
 824     (void) sigdelset(&newMask, SIGHUP);
 825     (void) sigdelset(&newMask, SIGTERM);
 826     (void) sigdelset(&newMask, SIGPIPE);
 827     return sigprocmask(SIG_BLOCK, &newMask, NULL);
 828 }

Why are the sigdelset() calls and the second sigprocmask() call
necessary here? As far as I can see, they do nothing.
Re: RPM: rpm/rpmdb/ rpmdb.c
On Fri, Aug 08, 2008 at 01:37:42AM -0400, Jeff Johnson wrote:
> Careful with this change. It's quite easy to end up truncating
> unloaded header blobs accidentally by 8b (which is what your
> deletion does afaict, not checked).
>
> headerGetMagic() is just a complicated way of setting nb = 8.
>
> > --- rpm/rpmdb/rpmdb.c 7 Aug 2008 19:17:20 - 1.260
> > +++ rpm/rpmdb/rpmdb.c 8 Aug 2008 02:07:11 - 1.261
> > @@ -3221,13 +3221,6 @@
> >     dbi = dbiOpen(db, RPMDBI_PACKAGES, 0);
> >     if (dbi != NULL) {
> >
> > -       nb = 0;
> > -       (void) headerGetMagic(h, NULL, &nb);
> > -       /* XXX db0: hack to pass sizeof header to fadAlloc */
> > -       datap = h;
> > -       datalen = headerSizeof(h);
> > -       datalen -= nb;      /* XXX HEADER_MAGIC_NO */
> > -
> >     xx = dbiCopen(dbi, dbi->dbi_txnid, &dbcursor, DB_WRITECURSOR);
> >
> >     /* Retrieve join key for next header instance. */

The code was like this:

 3214 {
 3215     unsigned int firstkey = 0;
 3216     void * keyp = &firstkey;
 3217     size_t keylen = sizeof(firstkey);
 3218     void * datap = NULL;
 3219     size_t datalen = 0;
 3220
 3221     dbi = dbiOpen(db, RPMDBI_PACKAGES, 0);
 3222     if (dbi != NULL) {
 3223
 3224         nb = 0;
 3225         (void) headerGetMagic(h, NULL, &nb);
 3226         /* XXX db0: hack to pass sizeof header to fadAlloc */
 3227         datap = h;
 3228         datalen = headerSizeof(h);
 3229         datalen -= nb;      /* XXX HEADER_MAGIC_NO */

So here was the hack that affects datap and datalen.

 3230
 3231         xx = dbiCopen(dbi, dbi->dbi_txnid, &dbcursor, DB_WRITECURSOR);
 3232
 3233         /* Retrieve join key for next header instance. */
 3234
 3235 /[EMAIL PROTECTED]@*/
 3236         key->data = keyp;
 3237         key->size = (UINT32_T) keylen;
 3238 /[EMAIL PROTECTED]@*/ data->data = datap;
 3239         data->size = (UINT32_T) datalen;
 3240         ret = dbiGet(dbi, dbcursor, key, data, DB_SET);

Here goes the dbiGet call with datap. I ASSUME that, with the DB_SET
flag, "key" is used only as an input parameter, and "data" is used only
as an output parameter (so that the key is left unchanged, and the
initial "data" is not checked at all).

There is simply no point in setting the "data" fields before the
dbiGet(..., key, data, DB_SET) call, is there? (Except for some old
hack.)

 3241         keyp = key->data;
 3242         keylen = key->size;
 3243         datap = data->data;
 3244         datalen = data->size;
 3245 /[EMAIL PROTECTED]@*/
Re: Provideversion empty string index
On Thu, Aug 07, 2008 at 06:27:37PM -0400, Jeff Johnson wrote:
> On Thursday, August 07, 2008, at 06:21PM, "Alexey Tourbin" <[EMAIL PROTECTED]> wrote:
> >If you try something like
> >perl -MDB_File -MData::Dumper -le 'tie %db,"DB_File","/var/lib/rpm/Provideversion",0,0,$DB_BTREE or die; print Dumper(\%db)' |less
> >you can see that the biggest Provideversion entry has the "\0" key.
> >Actually it takes about one half of the whole index.
> >
> >Was it really intended?
>
> Yes intended.
>
> The underlying issue with a data store (like Headers) is
>     What to do with missing/optional values?
>
> I chose to make NULL == "" == missing.

However, there is a special case for empty file digests.

rpmdb.c (rpmdbAdd):

 3402             case RPMTAG_FILEDIGESTS:
 3403                 /* Filter out empty MD5 strings. */
 3404                 if (!(he->p.argv[i] && *he->p.argv[i] != '\0'))
 3405                     /[EMAIL PROTECTED]@*/ continue;
Re: rpmdbAdd
On Thu, Aug 07, 2008 at 09:08:01PM -0400, Jeff Johnson wrote:
> Header indices are monotonically increasing integer instances starting
> with 1. And header instance #0 is where the monotonically increasing
> integer is stored.

Thanks, that's the key point that I missed.

> I can lay out what I think needs to happen with rpmdbAdd() early next
> week so hold off on any radical rpmdb surgery please.

Actually I'm trying to synchronize our (ALT Linux's) rpmdb code (based
on 4.0.4 with a number of backports) with the rpm5 rpmdb code. So I need
something like a "current stable" rpmdb code base. As I'm trying to
understand the code while copying and pasting (which is a code review),
I can also exercise minor changes to make the code a bit simpler and
prettier. No radical surgery intended.
rpmdbAdd
Can you help me understand this code, please? What is the "firstkey"? What's going on?

rpmdb.c (rpmdbAdd):

 3214 	{
 3215 	    unsigned int firstkey = 0;
 3216 	    void * keyp = &firstkey;
 3217 	    size_t keylen = sizeof(firstkey);
 3218 	    void * datap = NULL;
 3219 	    size_t datalen = 0;
 3220
 3221 	dbi = dbiOpen(db, RPMDBI_PACKAGES, 0);
 3222 	if (dbi != NULL) {
 3223
 3224 	    nb = 0;
 3225 	    (void) headerGetMagic(h, NULL, &nb);
 3226 	    /* XXX db0: hack to pass sizeof header to fadAlloc */
 3227 	    datap = h;
 3228 	    datalen = headerSizeof(h);
 3229 	    datalen -= nb;	/* XXX HEADER_MAGIC_NO */
 3230
 3231 	    xx = dbiCopen(dbi, dbi->dbi_txnid, &dbcursor, DB_WRITECURSOR);
 3232
 3233 	    /* Retrieve join key for next header instance. */
 3234
 3235 	    /[EMAIL PROTECTED]@*/
 3236 	    key->data = keyp;
 3237 	    key->size = (UINT32_T) keylen;
 3238 	    /[EMAIL PROTECTED]@*/ data->data = datap;
 3239 	    data->size = (UINT32_T) datalen;
 3240 	    ret = dbiGet(dbi, dbcursor, key, data, DB_SET);
 3241 	    keyp = key->data;
 3242 	    keylen = key->size;
 3243 	    datap = data->data;
 3244 	    datalen = data->size;
 3245 	    /[EMAIL PROTECTED]@*/
 3246
 3247 	    hdrNum = 0;
 3248 	    if (ret == 0 && datap) {
 3249 		memcpy(&mi_offset, datap, sizeof(mi_offset.ui));
 3250 		if (dbiByteSwapped(dbi) == 1)
 3251 		    _DBSWAP(mi_offset);
 3252 		hdrNum = (unsigned) mi_offset.ui;
 3253 	    }
 3254 	    ++hdrNum;
 3255 	    mi_offset.ui = hdrNum;
 3256 	    if (dbiByteSwapped(dbi) == 1)
 3257 		_DBSWAP(mi_offset);
 3258 	    if (ret == 0 && datap) {
 3259 		memcpy(datap, &mi_offset, sizeof(mi_offset.ui));
 3260 	    } else {
 3261 		datap = &mi_offset;
 3262 		datalen = sizeof(mi_offset.ui);
 3263 	    }
 3264
 3265 	    key->data = keyp;
 3266 	    key->size = (UINT32_T) keylen;
 3267 	    /[EMAIL PROTECTED]@*/
 3268 	    data->data = datap;
 3269 	    /[EMAIL PROTECTED]@*/
 3270 	    data->size = (UINT32_T) datalen;
 3271
 3272 	    /[EMAIL PROTECTED]@*/
 3273 	    ret = dbiPut(dbi, dbcursor, key, data, DB_KEYLAST);
 3274 	    /[EMAIL PROTECTED]@*/
 3275 	    xx = dbiSync(dbi, 0);
 3276
 3277 	    xx = dbiCclose(dbi, dbcursor, DB_WRITECURSOR);
 3278 	    dbcursor = NULL;
 3279 	}
Provideversion empty string index
If you try something like

perl -MDB_File -MData::Dumper -le 'tie %db,"DB_File","/var/lib/rpm/Provideversion",0,0,$DB_BTREE or die; print Dumper(\%db)' |less

you can see that the biggest Provideversion entry has the "\0" key. Actually it takes about one half of the whole index.

Was it really intended?
Re: - jbj: replace with private typedefs.
On Thu, Aug 07, 2008 at 08:40:26PM +0200, Ralf S. Engelschall wrote:
> On Thu, Aug 07, 2008, Alexey Tourbin wrote:
> > > [...]
> > > - jbj: replace with private typedefs.
> >
> > Why are private typedefs any better?
>
> For instance because the private ones are available everywhere while
> the standard ones require at least a C99 environment which in turn
> unnecessarily increases the entry barrier for a bootstrapping tool like
> RPM when it comes to non-Linux platforms.

Arguably a better approach is to define uint32_t etc. on systems that lack them instead of introducing rpmuint32_t. My $EDITOR can highlight uint32_t but knows nothing about rpmuint32_t. As a developer, I'm disappointed that reading the code has been made harder.
- jbj: replace with private typedefs.
On Thu, Jul 31, 2008 at 04:40:11AM +0200, Jeff Johnson wrote:
>   Server: rpm5.org			Name:	Jeff Johnson
>   Root:   /v/rpm/cvs			Email:	[EMAIL PROTECTED]
>   Module: rpm				Date:	31-Jul-2008 04:40:11
>   Branch: HEAD				Handle:	2008073102400505
>
>   Modified files:
>     rpm			configure.ac system.h
>     rpm/build		buildio.h files.c misc.c names.c pack.c
>			parseChangelog.c parsePreamble.c parsePrep.c
>			parseReqs.c parseScript.c reqprov.c rpmbuild.h
>			rpmspec.h spec.c
>     rpm/lib		depends.c formats.c fs.c fs.h fsm.c poptI.c psm.c
>			query.c rpmal.c rpmal.h rpmchecksig.c rpmcli.h
>			rpmds.c rpmds.h rpmfc.c rpmfc.h rpmfi.c rpmfi.h
>			rpminstall.c rpmps.c rpmps.h rpmrc.c rpmrollback.c
>			rpmte.c rpmte.h rpmts.c rpmts.h rpmversion.c
>			rpmversion.h.in tevr.c transaction.c verify.c
>     rpm/rpmdb		db3.c db_emu.h dbconfig.c fprint.c fprint.h
>			hdrNVR.c hdrfmt.c header.c header_internal.c
>			header_internal.h pkgio.c rpmdb.c rpmdb.h rpmevr.c
>			rpmns.c rpmtag.h rpmtd.c rpmtd.h rpmwf.c
>			signature.c sqlite.c tjfn.c
>     rpm/rpmio		digest.c gzdio.c iosm.c iosm.h lookup3.c lzdio.c
>			poptIO.h rpmbc.c rpmdav.c rpmdav.h rpmdigest.c
>			rpmgc.c rpmhash.c rpmhash.h rpmio.c
>			rpmio_internal.h rpmiotypes.h rpmkeyring.c
>			rpmkeyring.h rpmlua.c rpmnss.c rpmpgp.c rpmpgp.h
>			rpmssl.c rpmsw.c rpmwget.c rpmz.c thkp.c
>
>   Log:
>     - jbj: replace with private typedefs.

Why are private typedefs any better?
Re: RPM: rpm/ CHANGES rpm/rpmio/ gzdio.c
On Wed, Aug 06, 2008 at 05:09:30AM -0400, Jeff Johnson wrote:
> So that's what I screwed. Tired old blind eyes here, sigh.

void pointers are evil. gzFile is typedef'd as "void *", which means there is no chance left for the compiler to complain about a pointer type mismatch. They'd better typedef gzFile as "struct gzFile_s *" or something.

> Thanks for the fix! Next tasks are to unwire internal zlib, and hint
> the need for compressor flushing slightly differently (from iosm.c) ...

BTW, it is not possible to make lzma rsyncable, at least for now. In the lzma_alone format, there is no block partitioning at all. It's just one huge block, as far as I can see. There are other reasons why lzma-compressed data cannot be effectively rsyncable. Certain conditions must be met, including a small dictionary size (gzip has a 32K dictionary, while lzma works best with a ~2M dictionary). Basically, it's either ultimate compression with lzma or rsyncability with good old gzip.
Re: Fwd: Re: rsyncable gzdio
On Wed, Jul 09, 2008 at 05:35:46AM -0700, Jeff Johnson wrote:
> 2) From a fundamental coding design POV, adding --rsyncable to rpmio
> code is just plain wrong. That's as true for the patched internal zlib
> as it is for your gzdio patches. The issue of when gzflush() is called
> is fundamentally a compression and rsync transport, not a *.rpm
> payload packaging, issue.
>
> So isn't a pure API/ABI drop-in replacement for gzio(3), with a
> "gzwrite" rather than a "rsyncable_gzwrite" symbol a better approach?
> There are many applications, not just rpm, that _MUST_ participate in
> an efficient *.rpm distribution framework, so patching gzdio in rpmio
> is just a tiny piece of a much bigger puzzle, where "gzwrite" rather
> than "rsyncable_gzwrite" is likelier to be successful. JMHO, YMMV, as
> always.

Hmm. Let me argue here just a bit. You're saying that indirect gzflush() calls are somewhat alien to the plain gzdio abstraction. However, from the gzFile higher-level (logical) perspective, calling gzflush() is simply a no-op. The worst thing that can ever happen is calling gzflush() too often, which can degrade compression. Now, rsyncable_gzwrite is simply about calling real gzwrite+gzflush in a loop. Provided that the pointer arithmetic in rsyncable_gzwrite() is valid, gzflush is still a no-op, the gzdio abstraction is preserved, and nothing bad can ever happen. The rest of the code is sync_hint(), which is about finding the right sync points, and also, basically, about not calling gzflush() too often.

So, the code may look weird, but the idea is simple. I introduce rsyncable_gzwrite, which is equivalent to gzwrite from the higher-level (logical) point of view. There is no possibility for rsyncable_gzwrite to produce logically invalid output. The rest of the code is just a hint for the rsyncable_gzwrite() loop.

BTW, we can have something like a %_rsyncable_gzwrite macro, to switch between plain gzwrite and rsyncable_gzwrite.
Re: rsyncable gzdio
On Wed, Jul 09, 2008 at 05:35:46AM -0700, Jeff Johnson wrote:
> 0) No one understands why --rsyncable is important, or why gzip != zlib,
> or why the "fuzzy" name patch in rsync would be a tremendous bandwidth
> saving for *.rpm packages. I've been tracking the issue for like 6+ years,
> and what is fundamentally needed is a very clear demonstration, including
> publicized benchmarks and likely a drop-in "production" ready transport
> implementation, for any --rsyncable code to be worth the effort. JMHO
> based on 6+ years of explaining ...

I have done some comprehensive testing of rsyncable gzdio with respect to rpm packaging. I posted the results (in Russian) to our ALT Linux Team development list. The upshot is that 1) it is known to work well, which means at least no segfaults or corrupted data; 2) it does not degrade the compression rate, due to cpio hints (avg 0.09%, compared to avg 1% for the patched zlib); 3) the rsyncability effect can be worthwhile, which is about 1/3 bandwidth saving on real data transfer.

Some details. I've tested rsyncability of our two directories:

/ALT/archive/Sisyphus/2008/03/01/files/x86_64/RPMS
/ALT/archive/Sisyphus/2008/04/01/files/x86_64/RPMS

(they are just what you may think.)

1) From these two directories, I select package tuples which have the same %{NAME} (but the file names %name-%version-%release.x86_64.rpm differ). This means I test whether rsyncability is worthwhile for packages that have been updated within one month. This includes %version upgrades as well as minor %release updates (something like representative data).
2) For each package in a tuple, I repackage its cpio archive with rsyncable gzdio.
3) Small packages are excluded: each repackaged cpio must be at least 32K.
4) rsync is run (with a small trick) to diagnose if there is any speedup.
The resulting table is

rpm-1	size-1	rpm-2	size-2	rsync-sent	rsync-recv	speedup
-----	------	-----	------	----------	----------	-------

The table (the attachment) and some more details are available here:
http://lists.altlinux.org/pipermail/devel/2008-May/074937.html

$ wc -l 2' rsyncability.txt |wc -l
211
$ -- 211 packages have high rsyncability rate (one has to download less than 1/2 of new package size).
$ sum() { perl -MList::Util=sum -ln0 -e 'print sum split'; }
$ cut -f4 rsyncability.txt |sum
2433627
$ -- New packages are 2.32G total.
$ cut -f5 rsyncability.txt |sum
14017
$ -- rsync downloaded 1.57G.

(End of details.)

This means that rsyncable gzdio *can* be worthwhile -- one can expect to save about 1/3 of the bandwidth. However, this also has some requirements: 1) you must have the older rpms (or you are going to save nothing anyway); 2) you must synchronize two directories, and you must use 'rsync --fuzzy' to catch up with file renames; 3) both old and new files must be compressed with rsyncable gzdio. I hope this gives some idea of what rsyncable gzdio can do.
rsyncable gzdio
On Mon, Jul 07, 2008 at 11:44:07PM +0200, Jeff Johnson wrote:
> - make gzdio.c standalone.

BTW, I have a rsyncable gzdio implementation (this does not require a patched zlib; one only has to call gzflush() at certain sync points). It is known to work well. Please review the patches and tell me whether you want it or not.

http://git.altlinux.org/people/at/packages/rpm.git?a=commitdiff;h=c761902b
http://git.altlinux.org/people/at/packages/rpm.git?a=commitdiff;h=f7b5ee1e
http://git.altlinux.org/people/at/packages/rpm.git?a=commitdiff;h=8d5e355e
http://git.altlinux.org/people/at/packages/rpm.git?a=commitdiff;h=52b2499a
Re: RPM: rpm/ CHANGES rpmpopt.in
On Mon, Jul 07, 2008 at 09:08:05PM -0400, Jeff Johnson wrote:
> > From another side, it is not obvious how recursive --needswhat should
> > traverse virtual packages where more than one alternative is
> > available.
>
> Except for multilib (which I personally don't use), what categories
> for multiple provides exist?

E.g. executable(foo), or /usr/bin/foo (which can be under update-alternatives(8) control), or MTA.

> All that is needed is criteria for preferring a Provides:. Even for
> multilib, there is now %_prefer_color which
> can be added to display the "preferred" answer if necessary. Should I
> implement?
>
> Note also that the examples I've given for --needswhat/--whatneeds
> are slyly/implicitly dependent
> on whatever packages are already installed, which is likely to be
> whatever was "preferred".
>
> A general answer for "preferred" is more complex however ...
Re: RPM: rpm/build/ files.c
On Mon, Jul 07, 2008 at 01:31:55AM -0400, Jeff Johnson wrote:
> Here's the flaw (from the edos-test*.src.rpm build):
>
> ...
> re --nodeps edos-test-*.src.rpm
> (cd edos-test && /X/src/wdj/rpm --macros /X/src/wdj/macros:/X/src/wdj/tests/macros -q --specfile edos-test.spec && /X/src/wdj/rpm --macros /X/src/wdj/macros:/X/src/wdj/tests/macros -q --specsrpm edos-test.spec && /X/src/wdj/rpm --macros /X/src/wdj/macros:/X/src/wdj/tests/macros -q --specfile --specedit edos-test.spec && /X/src/wdj/rpm --macros /X/src/wdj/macros:/X/src/wdj/tests/macros -bb --nodeps edos-test.spec) > /dev/null
> /usr/lib/rpm/check-files: line 11: cd: /X/src/wdj/tests/tmp//edos-test-root: No such file or directory
>
> ^
> /X/src/wdj/rpm --macros /X/src/wdj/macros:/X/src/wdj/tests/macros -i --nosignature --nodeps probes-test-*.src.rpm

Hmm, in tests/edos-test.spec, there is no %install section, and %buildroot is not created at all. Also, all %files sections there are empty. And so, there is simply no buildroot.

Here is how check-files changed:

-RPM_BUILD_ROOT=`echo $1 | sed 's://*:/:g'`
-
-if [ ! -d "$RPM_BUILD_ROOT" ] ; then
-	cat > /dev/null
+RPM_BUILD_ROOT=$1
+if ! cd "${RPM_BUILD_ROOT:?}"; then
+	cat >/dev/null
 	exit 1

Old behaviour was: test for the buildroot, or exit 1. New behaviour: cd to the buildroot, or fail noisily and exit 1. Let me think!
rpmfiFMaxLen
On Sun, Jul 06, 2008 at 06:24:43AM +0200, Alexey Tourbin wrote:
>   Server: rpm5.org			Name:	Alexey Tourbin
>   Root:   /v/rpm/cvs			Email:	[EMAIL PROTECTED]
>   Module: rpm				Date:	06-Jul-2008 06:24:43
>   Branch: HEAD				Handle:	2008070604244200
>
>   Modified files:
>     rpm			CHANGES
>     rpm/build		files.c
>     rpm/lib		librpm.vers rpmfi.c rpmfi.h
>
>   Log:
>     rpmfi: changed fi->fnlen meaning, added rpmfiFMaxLen() function
>
>     - fi->fnlen: now indicates max file name length, without '\0' (like strlen)
>     - rpmfiNew: find the exact max file name length, not dnlmax + bnlmax
>     - rpmfiFN: allocate (fi->fnlen + 1) bytes
>     - rpmfiFMaxLen: new function

rpmfiFNMaxLen is possibly a better name - s/F/FN/. However, there is also rpmfiFNlink.
Re: RPM: rpm/ CHANGES rpm/build/ files.c
On Fri, Jul 04, 2008 at 09:48:19PM -0400, Jeff Johnson wrote:
> > +	if (!N1) headerNEVRA(h1, &N1, NULL, NULL, NULL, NULL);
> > +	if (!N2) headerNEVRA(h2, &N2, NULL, NULL, NULL, NULL);
> > +	rpmlog(RPMLOG_WARNING,
> > +		_("file %s is packaged into both %s and %s\n"),
> > +		fn1, N1, N2);
>
> This paradigm of displaying N-V-R.A (or whatever is deemed informative)
> is an obviously (duh!) widely repeated paradigm throughout rpm.

As for me, I think that displaying N-V-R.A is (sometimes) nothing more than displaying redundant information. Here is why. Suppose that we store rpmbuild build logs in some SCM system (well, we do), to study e.g. new compiler warnings or something. And then, simply changing the release is going to introduce quite a few changes in the build log, and hence this will yield new diff hunks. Now guess what. When studying build logs, the last thing we need is those funky new diff hunks.
Re: RPM: rpm/ CHANGES rpm/build/ files.c
On Mon, Jun 16, 2008 at 10:21:56AM -0400, Jeff Johnson wrote:
> Hmm, at some point, I start to question whether permitting duplicates like
>	/foo/1
>	/foo/1
> in %files, or worrying about %ghost (and %attr and %verify and %exclude and ...,
> there's *at least* one other place that the test for %ghost/%exclude is needed)
> scoping over pathologies as in your example above, is worth the effort.

It is worthwhile to have a correct algorithm for the RPMTAG_SIZE counter. The algorithm is correct iff the cpio file data bytes match the RPMTAG_SIZE value (this is why we move the code to genCpioListAndHeader()).

rpm2cpio foo.rpm |catenate_cpio_file_data |wc -c
rpmquery --qf '%{SIZE}\n' -p foo.rpm

My point is that as far as we can build a valid cpio archive, we can also mimic some cpio logic a bit and get a valid RPMTAG_SIZE value. It has nothing to do with specfile pathologies (or we just couldn't build the cpio archive otherwise).

> With %ghost, the issue runs to a fundamental spec file design flaw, there
> is plain and simply no way _BY DEFINITION_ to know the file type associated
> with the path that has a %ghost attribute.
>
> The RPMTAG_FILEMODES associated with %ghost files has ugo rwx, but
> not the file type.

I can't quite understand what you mean. It looks like you're trying to say that, in genCpioListAndHeader(), if (flp->flags & RPMFILE_GHOST) holds, then we cannot reliably check for S_ISREG(flp->fl_mode). I can't see why yet.

> Why shouldn't all of the above be treated as syntax errors instead
> of quietly assuming that indeed, there is some real world need to
> have sloppy goosey-loosey spec file syntax permitting duplicates (and
> file marker attribute scoping across hard links, or primitive
> filtering directives like %exclude) anyways.

rpm seems to permit file dups by design, and, while issuing a warning, it also has some code to fold dups and merge their flags correctly. There is a good reason -- glob(3) is not quite flexible at times.
Sometimes you do:

%files
/foo/prefix.*
/foo/*.suffix

Overlaps are okay here.

> But as always, since there's no grammar for spec files, anything goes.
> Even with a grammar, the issues of scoping through implicit hard link aliasing,
> are semantic, not syntactical (but duplicates are syntax).
>
> Guess what happens in your example when the install runs with
>	--excludepath=/foo/1
>
> What link count should be checked with --verify, particularly if
> another package also contains
>	%ghost /foo/2
> and /foo/1 explicitly had --excludepath when installed?
>
> And to return to the original RPMTAG_SIZE issue, what value should
>	--qf '%{size}
> report for a given package with excluded hard linked paths that span
> multiple packages as above?

I think that %{SIZE} should report the cpio file data bytes (i.e. the cpio archive size excluding 110 bytes per cpio entry, filenames, and alignment bytes).
Re: RPM: rpm/ CHANGES rpm/build/ files.c
On Mon, Jun 16, 2008 at 04:38:03AM -0400, Jeff Johnson wrote:
> Alexey:
>	This is your patch reworked slightly. See what you think.

> > +	    if (S_ISREG(flp->fl_mode)) {
> > +		int bingo = 1;
> > +		/* Hard links need be tallied only once. */
> > +		if (flp->fl_nlink > 1) {
> > +		    FileListRec jlp = flp + 1;
> > +		    int j = i + 1;
> > +		    for (; (unsigned)j < fi->fc; j++, jlp++) {

The loop post-increment "j++, jlp++" is not enough here. Remember that "jlp" can go ahead of "j" when folding dups. You've got to rewind dups here the same way it is done in the outer loop.

	while (...)
	    jlp++;

Here is a counterexample:

%install
mkdir -p %buildroot/foo
head -c 1024 /dev/zero >%buildroot/foo/1
head -c 1024 /dev/zero >%buildroot/foo/2
ln %buildroot/foo/1 %buildroot/foo/3

%files
/foo/1
/foo/2
/foo/2
/foo/3

We expect RPMTAG_SIZE == 2048 (1024 for the /foo/1+3 hardlink set plus 1024 for /foo/2). However, we get 3072. It is easy to see why: when doing /foo/1, /foo/2 gets checked twice, and /foo/3 is not checked at all (and /foo/1 gets bingo, but it shouldn't).

> > +			if (!S_ISREG(jlp->fl_mode))
> > +			    continue;
> > +			if (flp->fl_nlink != jlp->fl_nlink)
> > +			    continue;
> > +			if (flp->fl_ino != jlp->fl_ino)
> > +			    continue;
> > +			if (flp->fl_dev != jlp->fl_dev)
> > +			    continue;

Now assume that the rewind went well, and we are at the last dup entry (which has the valid file flags that have been merged). You still have to check:

	if (jlp->fl_flags & (RPMFILE_EXCLUDE | RPMFILE_GHOST))
	    continue;

Think of this example:

%install
mkdir -p %buildroot/foo
head -c 1024 /dev/zero >%buildroot/foo/1
ln %buildroot/foo/1 %buildroot/foo/2

%files
/foo/1
%ghost /foo/2

> > +			bingo = 0;	/* don't tally hardlink yet. */
> > +			break;
> > +		    }
> > +		}
> > +		if (bingo)
> > +		    fl->totalFileSize += flp->fl_size;
> > +	    }
Re: noarch sub-packages
On Mon, Jun 16, 2008 at 01:57:13AM -0400, Jeff Johnson wrote:
> > It looks like you cannot specify "BuildArch: %_target_cpu" in the toplevel
> > package.
>
> Ah, I have a reproducer for the flaw now, fixing todo++.

I believe there's an easier way to introduce noarch subpackages (i.e. without the explicit spec->toplevel and pkg->noarch bookkeeping). You can simply check for (pkg == spec->packages).

http://git.altlinux.org/people/at/packages/rpm.git?a=commitdiff;h=3ad2b101
Re: noarch sub-packages
On Mon, Jun 16, 2008 at 01:20:31AM -0400, Jeff Johnson wrote:
> Certainly "they really work" (unless I missed pushing some patch to rpm-5.1.3):

It looks like you cannot specify "BuildArch: %_target_cpu" in the toplevel package.
[PATCH] fix RPMTAG_SIZE counter
RPMTAG_SIZE counter is broken a bit.

1) Duplicate file entries are not getting merged.

%install
mkdir -p %buildroot/foo
head -c 1024 /dev/zero >%buildroot/foo/1

%files
/foo/1
/foo/1

RPMTAG_SIZE == 2048.

2) File flags are not getting merged.

%install
mkdir -p %buildroot/foo
head -c 1024 /dev/zero >%buildroot/foo/1

%files
/foo/1
%exclude /foo/1

RPMTAG_SIZE == 1024 (empty package).

3) Similar problems with %ghost files.

To fix the problem, the RPMTAG_SIZE counter should be invoked after file dups are merged. This means that the code should be moved from addFile() to genCpioListAndHeader().

--- rpm-5.1.3/build/files.c-	2008-04-06 12:47:47 +0400
+++ rpm-5.1.3/build/files.c	2008-06-16 07:16:45 +0400
@@ -1601,15 +1601,6 @@
     if (!(_rpmbuildFlags & 4))
	/[EMAIL PROTECTED] =noeffectuncon @*/
	sxfn = _free(sxfn);
 
-    ui32 = fl->totalFileSize;
-    he->tag = RPMTAG_SIZE;
-    he->t = RPM_UINT32_TYPE;
-    he->p.ui32p = &ui32;
-    he->c = 1;
-    he->append = 1;
-    xx = headerPut(h, he, 0);
-    he->append = 0;
-
     if (_rpmbuildFlags & 4) {
	(void) rpmlibNeedsFeature(h, "PayloadFilesHavePrefix", "4.0-1");
	(void) rpmlibNeedsFeature(h, "CompressedFileNames", "3.0.4-1");
@@ -1713,7 +1704,44 @@
	if (_rpmbuildFlags & 4) {
	    if (isSrc)
		fi->fmapflags[i] |= IOSM_FOLLOW_SYMLINKS;
+	if (S_ISREG(flp->fl_mode) && flp->fl_nlink == 1)
+	    fl->totalFileSize += flp->fl_size;
+	else if (S_ISREG(flp->fl_mode) && flp->fl_nlink > 1) {
+	    /* Hard links need be counted only once. */
+	    FileListRec jlp;
+	    int j;
+	    int found = 0;
+	    for (j = i+1, jlp = flp+1; (unsigned)j < fi->fc; j++, jlp++) {
+		while (((jlp - fl->fileList) < (fl->fileListRecsUsed - 1)) &&
+			!strcmp(jlp->fileURL, jlp[1].fileURL))
+		    jlp++;
+		if (!S_ISREG(jlp->fl_mode))
+		    continue;
+		if (flp->fl_nlink != jlp->fl_nlink)
+		    continue;
+		if (flp->fl_ino != jlp->fl_ino)
+		    continue;
+		if (flp->fl_dev != jlp->fl_dev)
+		    continue;
+		if (jlp->flags & (RPMFILE_EXCLUDE | RPMFILE_GHOST))
+		    continue;
+		found = 1;
+		break;
+	    }
+	    if (!found)	/* last entry in hardlink set */
+		fl->totalFileSize += flp->fl_size;
+	}
	}
 
+    ui32 = fl->totalFileSize;
+    he->tag = RPMTAG_SIZE;
+    he->t = RPM_UINT32_TYPE;
+    he->p.ui32p = &ui32;
+    he->c = 1;
+    he->append = 1;
+    xx = headerPut(h, he, 0);
+    he->append = 0;
+
 /[EMAIL PROTECTED]@*/
     if (fip)
	*fip = fi;
@@ -1947,26 +1975,6 @@ static int addFile(FileList fl, const ch
     flp->specdFlags = fl->currentSpecdFlags;
     flp->verifyFlags = fl->currentVerifyFlags;
 
-    /* Hard links need be counted only once. */
-    if (S_ISREG(flp->fl_mode) && flp->fl_nlink > 1) {
-	FileListRec ilp;
-	for (i = 0; i < fl->fileListRecsUsed; i++) {
-	    ilp = fl->fileList + i;
-	    if (!S_ISREG(ilp->fl_mode))
-		continue;
-	    if (flp->fl_nlink != ilp->fl_nlink)
-		continue;
-	    if (flp->fl_ino != ilp->fl_ino)
-		continue;
-	    if (flp->fl_dev != ilp->fl_dev)
-		continue;
-	    break;
-	}
-    } else
-	i = fl->fileListRecsUsed;
-
-    if (!(flp->flags & RPMFILE_EXCLUDE) && S_ISREG(flp->fl_mode) && i >= fl->fileListRecsUsed)
-	fl->totalFileSize += flp->fl_size;
 }
 
 fl->fileListRecsUsed++;
@@ -2758,8 +2766,6 @@ int processSourceFiles(Spec spec)
 #endif
	flp->langs = xstrdup("");
 
-	fl.totalFileSize += flp->fl_size;
-
	if (! (flp->uname && flp->gname)) {
	    rpmlog(RPMLOG_ERR, _("Bad owner/group: %s\n"), diskURL);
	    rc = fl.processingFailed = 1;
noarch sub-packages
Hello,

Do they really work?

rpm-5.1.3 $ ./rpmbuild -bb ~/test.spec
error: line 8: Only "noarch" sub-packages are supported: BuildArch: x86_64
error: Package has no %description: test-1.0-alt1.x86_64
rpm-5.1.3 $ cat ~/test.spec
Name: test
Version: 1.0
Release: alt1
Summary: test
License: GPL
Source: foo
Group: Development/Other
BuildArch: %_target_cpu
%description
rpm-5.1.3 $

__
RPM Package Manager			http://rpm5.org
Developer Communication List		rpm-devel@rpm5.org