On May 27, 2011, at 9:36 AM, Anders F Björklund wrote:

> Jeff Johnson:
>
>>>> Header tags _ARE_ ordered, just the collation isn't
>>>> what no brainer conversions into native data types
>>>> are implementing.
>
>> And the collation is integer numeric based on tag numbers.
>
> Okay. Does this tag ordering need to be preserved ?
>

All depends on context. *IF* there is a canonical representation
in the spewage (basically a defined, not de facto, order), then
standard "plaintext" digests and signatures can be retrofitted to
secure the spewage and achieve interoperability. The alternative
is de facto createrepo toolchain lock-in.

> If so, we should change the --yaml and --json now.
>

It's too early to finalize CLI options until the goals
(interoperability through stricter spewage definitions) are well
understood.

Basically all I'm saying is the same as DER and BER (and PER and ...)
encodings for spewage, where one doesn't have the luxury of ASN.1
"standards" when dealing with spewage. See the lengths that XML-SEC
has to go to in order to specify what is essentially a retrofitted
canonical definition of "plaintext" on which signatures can be defined.

But yes, --yaml and --json would need to change as the usage case and
goals are more clearly understood.

>> There's a superficial and a deep answer here.
>>
>> Somehow it needs to be indicated that header spewage
>> is _NOT_ random, but rather carefully sorted in many
>> ways.
>
> All the sequences/arrays maintain their sort order,
> but the mappings/objects do not (they're unordered)
>

Yes, but it's "implementation defined" for YAML iirc, and bindings
are free to interpret !!omap however they wish. So a "mapping"
might be a sorted array instead of a hash table, depending on
implementation.

Meanwhile these are largely (imho) moot, technically obscure
discussions that can only meaningfully be answered by looking at the
"real world" of usage cases and implementations.

>> The "LSB packaging standard" totally blew it with respect to
>> tag ordering.
>>
>> And the spewage -- if not carefully controlled -- will be useless
>> for RPM itself, whose task is to import/export through spewage
>> into a "header" blob.
>
> If spewage is to preserve tag ordering, then the
> currently used markup/schema needs to be changed.
>
> From:
> {
>   Tag1: Value1,
>   Tag2: Value2,
>   Tag3: Value3
> }
>
> To:
> [
>   { Tag1: Value1 },
>   { Tag2: Value2 },
>   { Tag3: Value3 }
> ]
>
> This will make it slightly trickier to handle,
> but it will preserve the order of the keys/tags.
>

Yes, a specification that ALSO defines the ordering starts to become
pretty complex. See XML-SEC.

>>>> But it hardly matters with spewage, fewer tokens to ask
>>>> about KISS simplicity trumps everything else in FL/OSS.
>>>
>>> For the YAML and JSON formats, it's easier unordered.
>>
>> Easier for whom? Lusers who don't understand what
>> "canonically represented plaintext" actually means?
>>
>> Or why sorted data can be accessed in logN not linear time?
>
> Easier formats, i.e. not needing nested structures ?
>

You are correct that RPM metadata doesn't need all the generality
provided by various spewage formats.

> But if it's needed, it's needed. I thought it wasn't.
>

The specific usage case that I see short-term is Poky/Yocto. Instead of
using *.spec templating, YAML (or XML or JSON) would be used as
a better (than *.spec) templating for driving packaging (i.e. just
producing *.rpm from a build not performed by rpmbuild).

The risk there is that almost instantly not just Poky/Yocto will
be attempting to produce *.rpm packages from markup, and so
I worry up front about issues like

    How SHOULD the ordering criteria be hinted?

It's not OPTIONAL: *.rpm data has all sorts of implicit constraints,
and you will NOT be happy just typing up some markup and feeding that
spewage to a backend that attempts to produce a *.rpm package from
%{buildroot} and markup.

>>> And I think the XML would need a DTD, to do ordering ?
>>>
>>
>> No idea what XML "needs". I do know from rpmrepo that
>> it's _IMPOSSIBLE_ to be bit for bit compatible because
>> tag data is being run through a python dict which
>> _DESTROYS_ the ordering of the original data.
>
> Right. The same goes for using a mongo document iirc ?
>

No.
A python dict is a hash, and the loss of sort order comes from walking
hash buckets serially. A "document structured" MongoDB has the
ability to add an ordering key that a python dict (as used in
createrepo) does not.

But yes, explicit means to preserve order WILL need to be undertaken
to simplify generating header blobs (which is also not the general,
but rather the de facto first and most common "container" representation
in use by RPM where order _IS_ important).

>>> i.e. XML does ordering now, but I think a parser is
>>> "free" to reorder the elements without invalidating ?
>>>
>>
>> Please note that I'm not disagreeing with your patch whatsoever.
>>
>> But somehow and somewhere it needs to be hinted to all
>> the "spewage suckers" that there are most definitely
>> performance and interoperability wins by establishing
>> a sorted and canonical ordering on the spewage items.
>>
>> Yes I know how to use qsort(3) wherever needed. I'm interested
>> in proper spewage specification on which it becomes feasible
>> to define digests/signatures and simplify interoperability
>> and implementations. And most definitely I'm not holding my breath
>> waiting for FedEx to ship me a pony ...
>
> If metadata must be sorted, it should be specified
> and required by any export/import (in any format).
>

Specified how? Required by ... ? The tools atm are vapor ware,
and even if vapor ware, vendors/applications WILL rip out what they
don't think is important.

> Was under the impression that it was only "needed"
> for arrays like Requires/Files, but probably wrong.
>

The general principle -- and this really should be obvious -- followed
in RPM is:

    Optimize the data stores as much as possible in rpmbuild so that
    installers are as high performing as possible.

The basic "win" there is that packages are built once but installed
zillions of times, so rpmbuild is the natural place where optimizations
(like sorted tag data) SHOULD be done.
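
To make the logN point concrete, here is a minimal sketch, and not RPM's
actual header code. The names HEADER, TAGS, lookup_sorted and
lookup_unsorted are hypothetical, and the tag numbers and values are
placeholders. The assumption is that entries are sorted by integer tag
number once, at build time, so that every later lookup can use binary
search instead of a linear scan.

```python
from bisect import bisect_left

# Hypothetical header index: (tag number, value) pairs, sorted by integer
# tag number once, "at build time". Numbers and values are placeholders.
HEADER = sorted(
    [(1004, "summary"), (1000, "name"), (1002, "release"), (1001, "version")],
    key=lambda entry: entry[0],
)
TAGS = [tag for tag, _ in HEADER]


def lookup_sorted(tag):
    """Binary search over the sorted tag numbers: O(log N) comparisons."""
    i = bisect_left(TAGS, tag)
    if i < len(TAGS) and TAGS[i] == tag:
        return HEADER[i][1]
    return None


def lookup_unsorted(entries, tag):
    """Linear scan, which is all an unsorted list allows: O(N) comparisons."""
    for t, value in entries:
        if t == tag:
            return value
    return None
```

The sort cost is paid once by whoever builds the data; each consumer
then gets the cheaper lookup for free.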

What is happening instead is that package monkeys are minimizing their
build maintenance efforts, and thereby preventing (by choosing not to
sort tag data) higher performing installations with data in packaging
that is tuned to minimize additional processing while installing.

73 de Jeff

> --anders
>
> ______________________________________________________________________
> RPM Package Manager http://rpm5.org
> Developer Communication List rpm-devel@rpm5.org
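
As an illustration of the order-preserving layout and the "canonical
plaintext" idea discussed in this thread, here is a rough sketch. It is
not what rpm, --json, or createrepo actually emit: canonical_bytes and
digest are hypothetical helpers, the tag numbers and values are
placeholders, and the chosen rules (sort by integer tag number, JSON
array of single-key objects, fixed separators, ASCII output, SHA-256) are
just one assumed definition of "canonical".

```python
import hashlib
import json

# Hypothetical (tag number, value) pairs in arbitrary input order.
entries = [(1002, "1"), (1000, "hello"), (1001, "2.10")]


def canonical_bytes(pairs):
    """Serialize pairs as a JSON array of single-key objects.

    The array is sorted by integer tag number, and the separators and
    character encoding are fixed, so equal data yields equal bytes.
    """
    ordered = [{str(tag): value} for tag, value in sorted(pairs, key=lambda p: p[0])]
    text = json.dumps(ordered, separators=(",", ":"), ensure_ascii=True)
    return text.encode("ascii")


def digest(pairs):
    """SHA-256 over the canonical bytes, independent of input order."""
    return hashlib.sha256(canonical_bytes(pairs)).hexdigest()


print(canonical_bytes(entries).decode("ascii"))
print(digest(entries) == digest(list(reversed(entries))))
```

Because the order is part of the definition rather than an accident of
whichever container a binding happens to use, two independent producers
can arrive at the same bytes, and a digest or signature over those bytes
becomes meaningful.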