On Jun 15, 2012, at 11:58 PM, Alexey Tourbin wrote: > On Fri, Jun 1, 2012 at 6:07 AM, Jeffrey Johnson <n3...@me.com> wrote: >> I asked 2 very specific questions … the rest is quite important also, >> but I need to understand precisely what properties set:versions have in order >> to implement correctly (and I don't fully understand your reply). >> >> Specifically: >> >> 1) Is the set:versions VERSION independent of the order of the >> calls to rpmsetAdd()? (you know the routine as set_add()) > > Completely independent - you can add symbols in any order. The symbols > are then hashed and sorted by their numeric values. The underlying > idea is that a set-version is just a (sorted) set of numbers. You can > add whatever symbol to it, possibly twice, the symbol will be hashed > to a number, in a unique manner, and finally you can get the string > representation of the set of numbers. This involves much fuss under > the hood, but basically, you should think of the set of symbols, which > is just the set of numbers, after each symbol has been hashed > individually. > >> 2) Can the set:versions encoding be compared for more than equality? >> What set/arithmetic property is the basis for the comparison? What >> circumstances/constraints are there related to >> … You cannot always compare >> set-versions in terms of "greater or equal" (but when you >> can, it's >> important). > > Set-versions compare as sets. There are Euler diagrams to visaulise > set comparsion, which is an undergraduate matetrial. The idea is that, > real numbers are linear order: you can always tell either V1<V2 or > V1>=V2. Sets are quite another matter: you cannot always apply for > "tertium non datur" (either lt or ge). Which is to say that sets can > be quite different and do not compare easily. The order can be imposed > on the sets, though, by requiring the "greater" sets to have at least > the same elements they compare against (perhaps I'm starting to retell > the undergraduate material, which is not going to last). To sum up, > there IS a mathematical basis behind "Requires: foo >= set:asdf" > dependencies. > >> I can of course answer my own questions with try-and-see test cases.
Good: the above confirmation of the characteristics allows a set:versions implementation to proceed. What I am hearing is that set:versions is not fundamentally different from these transforms Create a Bloom filter (with acceptable false positive rate) Compress the Bloom filter parameters and bits. Encode the compression in base{16,32,62,64} for transport. The "contained in" or subset semantic that applies to the operations "<" and ">=" is rather easy to do as well. E.g. if (assuming on;y existence, not versioned inequality ranges) P == Bloom filter of Provides: tokens R == Bloom filter of a (possible) subset of tokens then the subset computation is nothing more than (P & R) == R The important difference is that Bloom filters have a false positive failure probability which is likely not present in the Golomb-Rice coding in the current implementation. I'm expecting that Bloom+Compression+Encoding is comparable in size to the Golomb-Rice encoding. The only way to tell (that I know of) is to try-and-see, which I will likely also do in order to tell how good the compression is with Golomb-Rice coding. Meanwhile there's lots of usage cases for the set:versions implementation in RPM, not just as a container of Elf symbols that is a de facto ABI sub-set. 73 de Jeff > ______________________________________________________________________ > RPM Package Manager http://rpm5.org > Developer Communication List rpm-devel@rpm5.org ______________________________________________________________________ RPM Package Manager http://rpm5.org Developer Communication List rpm-devel@rpm5.org