On Jun 23, 2012, at 2:10 PM, Jeffrey Johnson wrote:
>
> On Jun 23, 2012, at 1:49 PM, Alexey Tourbin <alexey.tour...@gmail.com> wrote:
>
<…>
> Perhaps making base61 encoding MANDATORY in rpm would displease
> everyone equally: choosing a prime is as "pretty" as all other
> encoding criteria.
>

In the interest of getting off negative, nerdy, obscure discussions, let's try a positive alternative application for Golomb-Rice subset operations.

All RPMv4 packages attach a (lightly filtered) file(1) magic string to every file. The file(1) data is mostly usable as a "keyword" namespace exactly as is. Yes, there are flaws; however, magic strings from file(1) are about as good as any other de facto keyword tagging of file content.

Keywords are strings just like ELF symbols are, and set:versions (or Bloom filters) are a compact representation from which it's rather easy to do subset computations.

One extension that would be needed is a "closest" metric in order to "prefer" the largest subset overlap: with set:versions, any contained subset will satisfy the logical assertions, and there's no easy way to prefer the larger subset.

There's a similar application with dual/triple/... licensed software and computing per-file, not per-package, license affinity, precisely where set:versions (or Bloom filters) will represent keywords (like "LGPLv2" or "BSD") easily.

Licenses, unlike file(1) magic keywords, will require name space administration. Surely LSB and LFF are seeking something useful to do for RPM packaging these days, and might be convinced to make some set of license tokens "standard" so that license affinity can be precisely computed in distributed software.

73 de Jeff
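
As a rough illustration of the "closest" metric described in the message, here is a minimal sketch. It uses plain Python sets, not Golomb-Rice encoded set:versions or Bloom filters, and it is not rpm code. The function names, package names, and keyword tokens are all hypothetical, chosen only to show the difference between "is a subset" and "has the largest overlap".

```python
from typing import Dict, FrozenSet, Optional


def satisfies(required: FrozenSet[str], provided: FrozenSet[str]) -> bool:
    """True when every required keyword is among the provided ones."""
    return required <= provided


def closest(wanted: FrozenSet[str],
            candidates: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Name of the candidate sharing the most keywords with `wanted`.

    Ties go to the alphabetically first name; None if nothing overlaps.
    """
    best_name, best_score = None, 0
    for name in sorted(candidates):
        score = len(wanted & candidates[name])
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical keyword sets, loosely modeled on file(1) output tokens.
packages = {
    "pkg-a": frozenset({"ELF", "64-bit"}),
    "pkg-b": frozenset({"ELF", "64-bit", "shared object"}),
    "pkg-c": frozenset({"ASCII text"}),
}
wanted = frozenset({"ELF", "64-bit", "shared object", "stripped"})
```

Here the subset test alone says only yes or no for each candidate, while `closest(wanted, packages)` ranks candidates by how many keywords they share with the query. The same shape would apply to per-file license tokens such as "LGPLv2" or "BSD", assuming an agreed token namespace.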