hdrfmt.c

Jeff Johnson Fri, 27 May 2011 07:34:46 -0700

On May 27, 2011, at 9:36 AM, Anders F Björklund wrote:

> Jeff Johnson:
> 
>>>> Header tags _ARE_ ordered, just the collation isn't
>>>> what no brainer conversions into native data types
>>>> are implementing.
> 
>> And the collation is integer numeric based on tag numbers.
> 
> Okay. Does this tag ordering need to be preserved ?


All depends on context.

*IF* there is a canonical representation in the spewage (basically
a defined, not de facto, order), then standard "plaintext" digests
and signatures can be retrofitted to secure the spewage and
achieve interoperability. The alternative is de facto
createrepo toolchain lock-in.
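
A minimal sketch of that idea, assuming tags are keyed by integer tag
number and the values are JSON-serializable. The function names and the
choice of JSON and SHA-256 are illustrative assumptions, not anything
RPM defines: once the order is defined (here, ascending tag number),
the digest no longer depends on how the producer happened to store the
tags.

```python
import hashlib
import json


def canonical_bytes(tags):
    """Serialize {tag_number: value} as a JSON array sorted by tag number.

    Hypothetical canonical form: a list of [tag, value] pairs with no
    optional whitespace, so equal metadata always yields equal bytes.
    """
    ordered = [[int(tag), tags[tag]] for tag in sorted(tags, key=int)]
    return json.dumps(ordered, separators=(",", ":"), ensure_ascii=True).encode("ascii")


def digest(tags):
    """SHA-256 over the canonical byte string."""
    return hashlib.sha256(canonical_bytes(tags)).hexdigest()


# Same metadata, built in two different orders (tag numbers are illustrative).
a = {1000: "hello", 1001: "1.0", 1002: "1"}
b = {1002: "1", 1000: "hello", 1001: "1.0"}
assert digest(a) == digest(b)
```

Without the sort step, two tools emitting the same tags in different
orders would produce different "plaintext" and so different digests.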

> If so, we should change the --yaml and --json now.
> 

It's too early to finalize CLI options until the goals
        interoperability through stricter spewage definitions
are well understood. Basically all I'm saying is the
same as with DER and BER (and PER and ...) encodings, except
that with spewage one doesn't have the luxury of ASN.1 "standards".
See the lengths to which XML-SEC has to go to specify what is
essentially a retrofitted canonical definition of "plaintext"
on which signatures can be defined.

But yes, --yaml and --json would need to change as the usage
case and goals are more clearly understood.

>> There's a superficial and a deep answer here.
>> 
>> Somehow it needs to be indicated that header spewage
>> is _NOT_ random, but rather carefully sorted in many
>> ways.
> 
> All the sequences/arrays maintain their sort order,
> but the mappings/objects do not (they're unordered)
> 

Yes, but it's "implementation defined" for YAML iirc, and
bindings are free to interpret !!omap however they wish.
So a "mapping" might be a sorted array instead of a hash
table, depending on implementation.

Meanwhile these are largely (imho) moot, technically obscure discussions
that can only meaningfully be answered by looking at the "real world"
of usage cases and implementations.

>> The "LSB packaging standard" totally blew it with respect to
>> tag ordering.
>> 
>> And the spewage -- if not carefully controlled -- will be useless
>> for RPM itself, whose task is to import/export through spewage
>> into a "header" blob.
> 
> If spewage is to preserve tag ordering, then the
> currently used markup/schema needs to be changed.
> 
> From:
> {
>  Tag1: Value1,
>  Tag2: Value2,
>  Tag3: Value3
> }
> 
> To:
> [
>  { Tag1: Value1 },
>  { Tag2: Value2 },
>  { Tag3: Value3 }
> ]
> 
> This will make it slightly trickier to handle,
> but it will preserve the order of the keys/tags.
> 

Yes, a specification that ALSO defines the ordering starts
to become pretty complex. See XML-SEC.
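
The From/To change quoted above can be sketched as follows. This is an
illustration only, using hypothetical helper names and JSON as the
markup: a list of single-key mappings keeps its order in any conforming
parser, because arrays are ordered even where objects are not.

```python
import json


def to_ordered_list(pairs):
    """Turn [(tag, value), ...] into [{tag: value}, ...], one key per mapping."""
    return [{tag: value} for tag, value in pairs]


def from_ordered_list(items):
    """Inverse of to_ordered_list; rejects mappings that do not hold exactly one key."""
    pairs = []
    for item in items:
        if len(item) != 1:
            raise ValueError("each entry must hold exactly one tag")
        (tag, value), = item.items()
        pairs.append((tag, value))
    return pairs


pairs = [("Tag1", "Value1"), ("Tag2", "Value2"), ("Tag3", "Value3")]
text = json.dumps(to_ordered_list(pairs))
assert from_ordered_list(json.loads(text)) == pairs
```

The cost is the extra check on import: every consumer now has to verify
the one-key-per-entry rule, which is part of the added complexity.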

>>>> But it hardly matters with spewage, fewer tokens to ask
>>>> about KISS simplicity trumps everything else in FL/OSS.
>>> 
>>> For the YAML and JSON formats, it's easier unordered.
>> 
>> Easier for whom? Lusers who don't understand what
>> "canonically represented plaintext" actually means?
>> 
>> Or why sorted data can be accessed in logN not linear time?
> 
> Easier formats, i.e. not needing nested structures ?
> 

You are correct that RPM metadata doesn't need all the generality
provided by various spewage formats.

> But if it's needed, it's needed. I thought it wasn't.
> 

The specific usage case that I see short-term is Poky/Yocto.

Instead of using *.spec templating, YAML (or XML or JSON) would
be used as a better (than *.spec) templating for driving
packaging (i.e. just producing *.rpm from a build not performed
by rpmbuild).

The risk there is that almost instantly not just Poky/Yocto will be attempting
to produce *.rpm packages from markup and so I worry up front about
issues like
        How SHOULD the ordering criteria be hinted?
It's not OPTIONAL: *.rpm data has all sorts of implicit
constraints, and you will NOT be happy just typing up
some markup and feeding that spewage to a backend that
attempts to produce a *.rpm package from %{buildroot} and markup.

>>> And I think the XML would need a DTD, to do ordering ?
>>> 
>> 
>> No idea what XML "needs". I do know from rpmrepo that
>> its _IMPOSSIBLE_ to be bit for bit compatible because
>> tag data is being run through a python dict which
>> _DESTROYS_ the ordering of the original data.
> 
> Right. The same goes for using a mongo document iirc ?
> 

No. A python dict is a hash, and the loss of sort order comes
from walking hash buckets serially.

A "document structured" MongoDB has the ability to add an
ordering key that a python dict (as used in createrepo) does not.
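
A rough sketch of what an explicit ordering key buys, using plain
Python lists of documents rather than any MongoDB API. The field names
"seq", "tag" and "value" are hypothetical: the original position is
stored with each item, so the order survives any container that
shuffles the items.

```python
def with_order_key(pairs):
    """Attach the original position to each (tag, value) pair as 'seq'."""
    return [{"seq": i, "tag": tag, "value": value}
            for i, (tag, value) in enumerate(pairs)]


def restore(docs):
    """Recover the original (tag, value) sequence from documents in any order."""
    return [(d["tag"], d["value"]) for d in sorted(docs, key=lambda d: d["seq"])]


pairs = [("Name", "hello"), ("Version", "1.0"), ("Release", "1")]
docs = with_order_key(pairs)
docs.reverse()  # simulate a store that returns documents in another order
assert restore(docs) == pairs
```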

But yes, explicit means to preserve order WILL need to be undertaken
to simplify generating header blobs (which is also not
the general, but rather the de facto first and most common
"container" representation in use by RPM where order _IS_ important).

>>> i.e. XML does ordering now, but I think a parser is
>>> "free" to reorder the elements without invalidating ?
>>> 
>> 
>> Please note that I'm not disagreeing with your patch whatsoever.
>> 
>> But somehow and somewhere it needs to be hinted to all
>> the "spewage suckers" that there are most definitely
>> performance and interoperability wins by establishing
>> a sorted and canonical ordering on the spewage items.
>> 
>> Yes I know how to use qsort(3) wherever needed. I'm interested
>> in proper spewage specification on which it becomes feasible
>> to define digests/signatures and simplify interoperability
>> and implementations. And most definitely I'm not holding my breath
>> waiting for FedEx to ship me a pony ...
> 
> If metadata must be sorted, it should be specified
> and required by any export/import (in any format).
> 

Specified how?
Required by ... ? The tools atm are vaporware, and
even if not vaporware, vendors/applications WILL rip
out what they don't think is important.

> Was under the impression that it was only "needed"
> for arrays like Requires/Files, but probably wrong.
> 

The general principle -- and this really should be obvious --
followed in RPM is:
        Optimize the data stores as much as possible in rpmbuild
        so that installers are as high performing as possible.
The basic "win" there is
        packages are built once, but installed zillions of times.
so rpmbuild is the natural place where optimizations (like sorted tag data)
SHOULD be done.

What is happening instead is that package monkeys are minimizing their
build maintenance efforts, and thereby preventing (by choosing not
to sort tag data) higher performing installations with data in
packaging that is tuned to minimize additional processing while installing.
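
To make the build-once, install-many argument concrete, here is a small
sketch under assumed names (build_index and find_tag are hypothetical,
and this is not how the RPM header code is written): the sort is paid
once at build time, and every later lookup is a binary search rather
than a linear scan.

```python
from bisect import bisect_left


def build_index(entries):
    """Build step, done once: sort (tag_number, value) pairs by tag number.

    Returns two parallel lists so lookups can search the tag list directly.
    """
    ordered = sorted(entries, key=lambda entry: entry[0])
    tags = [tag for tag, _ in ordered]
    values = [value for _, value in ordered]
    return tags, values


def find_tag(tags, values, tag):
    """Install step, done many times: O(log N) lookup in the sorted tag list."""
    i = bisect_left(tags, tag)
    if i < len(tags) and tags[i] == tag:
        return values[i]
    return None


# Tag numbers are illustrative.
tags, values = build_index([(1002, "1"), (1000, "hello"), (1001, "1.0")])
assert find_tag(tags, values, 1001) == "1.0"
assert find_tag(tags, values, 999) is None
```

If the data arrives unsorted, each installer either scans linearly or
has to repeat the sort that the build could have done once.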

73 de Jeff
> --anders
> 
> ______________________________________________________________________
> RPM Package Manager                                    http://rpm5.org
> Developer Communication List                        rpm-devel@rpm5.org


Re: [CVS] RPM: rpm/rpmdb/ hdrfmt.c
