Re: rpmrepo

Jeffrey Johnson Wed, 04 Jan 2012 10:03:11 -0800

On Jan 4, 2012, at 12:03 PM, Jeremy Huntwork wrote:

> On Jan 4, 2012, at 11:43 AM, Jeffrey Johnson wrote:
>> Well "broken" is perhaps a bit harsh as a euphemism for
>>      Not useful to you.
>> 
>> ;-) included just in case; I don't believe we disagree here.
> 
> Chalk it down to perhaps a misunderstanding of the intended usage, but I'm 
> not entirely certain on that point yet. :-)
>


No need to apologize for misunderstanding:

        We both agree that rpmrepo is only useful atm as "bug" fodder for
        @rpm5.org code.

>> The concept of mirrored markup used for depsolvers like 
>> yum/zypper/urpmi/smart
>> is what is "broken" imho. rpmrepo is/was intended as a client side,
>> not a server-side, tool. As a client side tool, the generated markup
>> is entirely intact, was known to work with (at least) smart at the
>> time rpmrepo was implemented, and the markup is still the same,
>> and would work with that version of smart, on the client no matter
>> what the createrepo du jour chooses to generate to feed to
>> yum/zypper/urpmi/smart.
> 
> Alright, but two things:
> 
> 1. Please explain the use case as a client side tool. I'm not sure I 
> understand the intent there.
> 

A complete explanation is what is needed in a "blueprint", not in e-mail. But
I can supply a recent "real world" example of the consequences of choosing
what I am calling a "server-side" (i.e. vendor distro metadata is created and
mirrored and downloaded by vendor clients) depsolver implementation.

In the last week createrepo was modified to add a "compat" compression
functionality.

The reason given for adding "compat" was to support RHEL6, where the
distributed version of yum expects gzip'ed XML and bzip'ed sqlite3 for
downloading, while Fedorable yum is using XZ! XZ! XZ! and perhaps
deltification (I forget whether used in RHEL6: I don't use or believe in
delta rpm's …)

The "server-side" design flaw in the above is this:
        all the metadata is static content for efficient mirror transport
        using rsync.
One can easily see *already* that there are multiple duplicated
formats/compressions that MUST be distributed to mirrors because of
        XML vs sqlite3
        gzip/bzip2 vs XZ!
which leads to mirror bloat (yes there are fewer mirrors than clients).
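
To make the bloat concrete, here is a minimal Python sketch that takes the
union of what each client generation expects. The client labels are
hypothetical, and the assumption that Fedora's yum wants XZ for both formats
is for illustration only; it has not been checked against createrepo.

```python
# Each client generation expects specific (format, compression) pairs.
# The labels and the Fedora pairing are assumptions for illustration.
CLIENT_NEEDS = {
    "rhel6-yum": {("xml", "gzip"), ("sqlite3", "bzip2")},
    "fedora-yum": {("xml", "xz"), ("sqlite3", "xz")},
}


def mirrored_variants(client_needs):
    """Return every (format, compression) pair a mirror must carry."""
    variants = set()
    for needs in client_needs.values():
        variants |= needs
    return variants


variants = mirrored_variants(CLIENT_NEEDS)
for fmt, compression in sorted(variants):
    print(fmt, compression)
```

Every client expectation not already in the set adds one more nearly
identical file that each mirror has to carry.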

The "server-side" problem is pervasive however: any change to the
one-size-fits-all markup distributed (in multiple "compatible" forms)
leads to almost instant "legacy compatibility" issues. Consider what
happens if one adds a tag which is useful for client side tools: one has
to be very careful about phasing in the new change, or one will be forced
on the "server-side" to distribute Yet Another Type of almost identical
markup.

So it takes months and years to change "server-side" markup, and a huge
amount of planning to deploy any change.

There's also the problem that most metadata is being distributed monolithically:
        Any change to the metadata forces a full download.
(aside)
Yes I know about various implementations intended to address the
"monolithic" problem that are underway; I'm speaking generally here, and
generally a full download is needed for any change at all to "server-side"
metadata.
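
For contrast, an incremental scheme only needs to move what changed. The
Python sketch below assumes a hypothetical per-package index (name mapped to
a digest string) on both sides; it is not how any of the implementations
mentioned above work, just the general idea.

```python
def incremental_plan(local_index, remote_index):
    """Compare two name -> digest maps and return (to_fetch, to_drop).

    to_fetch: entries that are new or whose digest changed remotely.
    to_drop:  entries that no longer exist remotely.
    """
    to_fetch = sorted(
        name for name, digest in remote_index.items()
        if local_index.get(name) != digest
    )
    to_drop = sorted(set(local_index) - set(remote_index))
    return to_fetch, to_drop


# Hypothetical package names and digests, for illustration only.
local = {"bash": "aa11", "popt": "bb22", "zlib": "cc33"}
remote = {"bash": "aa11", "popt": "dd44", "xz": "ee55"}

to_fetch, to_drop = incremental_plan(local, remote)
print("fetch:", to_fetch)
print("drop:", to_drop)
```

With monolithic metadata the same change would mean downloading the whole
index again, however small the difference.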

I believe it's obvious how a "client-side" tool addresses the problems of
incompatibilities. But I will list a couple of the more important points:

        1) a "client-side" tool scales because the #clients >> #servers creating
        a markup representation. IOW, there are more cpu cycles available
        on clients (in general) than on servers. And the compute intensive
        operations are digests/decompresses that are solely needed for
        bandwidth savings and integrity checks that are already provided
        by RPM in *.rpm packages that are going to be checked anyways.

        2) version compatibilities cease to matter when a "client-side" tool
        chooses to continue generating gzip'ed XML and bzip'd sqlite3; in
        fact there's no need for both XML/sqlite3 on the client, only one
        format will be used by whatever depsolver is in use.

This is the distinction I make between "client-side" and "server-side"
spewage generators like rpmrepo (and createrepo).

I'm pretty sure that you are trying to use rpmrepo as a "server-side" tool
that generates sewage on a server. The intent is/was a "client-side" tool
that generated markup (in the case of rpm's depsolver, it's called a "solvedb",
basically just like an rpmdb of "everything" but not installed) locally for
speedy access in a specific format.

> 2. Regardless of _where_ the tool is used, what is the <open-checksum> 
> supposed to be a hash of? It doesn't correspond to anything else I see in the 
> generated repo, whereas the schema leads me to believe that it was intended 
> to show the hash of the uncompressed file (as does the working functionality 
> of createrepo) - if so, then it actually is broken.
> 

I don't believe there is any single convention for <open-checksum> content. IIRC
(I would have to look, this is from memory) SuSE wished a digest on uncompressed
content, while RedHat wanted a digest of compressed content.

So 2 tags were added to the markup, thereby further increasing the complexity
of the XML markup.
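
A minimal Python sketch of the two digests, assuming the createrepo
convention described in the question: <checksum> covers the compressed file
and <open-checksum> covers the uncompressed content. The function names and
the sample payload are hypothetical, and SHA-256 is an arbitrary choice.

```python
import gzip
import hashlib


def repomd_digests(uncompressed: bytes, algo: str = "sha256") -> dict:
    """Return both digests for one metadata file."""
    # mtime=0 keeps the gzip header stable so the digest is reproducible.
    compressed = gzip.compress(uncompressed, mtime=0)
    return {
        "checksum": hashlib.new(algo, compressed).hexdigest(),
        "open-checksum": hashlib.new(algo, uncompressed).hexdigest(),
    }


def matches_open_checksum(compressed: bytes, expected: str,
                          algo: str = "sha256") -> bool:
    """Decompress a downloaded file and compare against <open-checksum>."""
    actual = hashlib.new(algo, gzip.decompress(compressed)).hexdigest()
    return actual == expected


# Hypothetical stand-in for a primary.xml payload.
primary_xml = b'<metadata packages="0"></metadata>\n'
digests = repomd_digests(primary_xml)
print(digests["checksum"])
print(digests["open-checksum"])
```

A value in <open-checksum> that matches neither digest of any generated file
would be consistent with the bug being reported.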

(aside)
And in fact, using a digest of the *.rpm file is rather broken design as
well, where various "server-side" tools are forced to duplicate
functionality that is already mostly present in *.rpm packages. One can
attempt a KISS argument, that the entire file needs to be digest checked,
not just using the existing header SHA1 etc etc etc, but these discussions
quickly become very complex and security related and are out of scope in
what I am saying here.

>> I personally think (and I believe we agree) that a NoSQL! distributed
>> store instead of mirrored content is more soundly engineered, mostly
>> because it's rather easy to design an incremental update, which
>> monolithic mounds of spewage (as generated by createrepo and mirrored by
>> rsync and downloaded by url-grabber for use by yum) is still commonly
>> doing.
> 
> Yes, I'm still interested about the NoSQL solutions, but I'm a little hazy 
> about the implementation details, how to get up and running with a mongo 
> store (for this purpose).
> 

rpm-5.3.x had a JSON spewage implementation that was routinely uploading
the usual primary/filelists/other JSON analogue by piping to /usr/bin/mongo.

The script with details is in rpm-5_4 somewhere.

Note that Anders Bjorkland changed --json to --mongodb or something last
May because JSON and the markup used by the mongo shell are really
different; for proof-of-concept I just needed _SOME_ option to use for
development.
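
To illustrate the difference, here is a small Python sketch. Plain JSON is
just a document, while the mongo shell reads JavaScript statements, so the
document has to be wrapped in a call such as db.<collection>.insert(...).
The collection name and package fields are hypothetical, and this is not
the output that rpm's --json or --mongodb option produces.

```python
import json


def mongo_insert_statement(collection: str, document: dict) -> str:
    """Wrap a document in a mongo shell insert() statement."""
    return "db.%s.insert(%s);" % (collection,
                                  json.dumps(document, sort_keys=True))


# Hypothetical package record, for illustration only.
pkg = {"name": "popt", "version": "1.16", "arch": "x86_64"}

as_json = json.dumps(pkg, sort_keys=True)             # a bare JSON document
as_shell = mongo_insert_statement("packages", pkg)    # a shell statement

print(as_json)
print(as_shell)
```

Statements in the second form are what could be piped to /usr/bin/mongo;
the bare JSON form would need some other loader.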

And the implementation has mostly sat there since last May while I have dealt
with other matters like generating an income.

>> Meanwhile, ROADMAP discussions for rpm-5_4 "features" are currently active at
>>      http://launchpad.net/rpm
>> 
>> Feel free to add to this "blueprint"
>>      http://rpm5.org/community/rpm-devel/4444.html
>> and spell out the usage case (for others, not me: I already
>> know what I intended when I wrote rpmrepo).
>> 
>> Adding bug reports detailing the "broken" and attaching to the "blueprint"
>> (I will do the attach if you don't wish to) so that the amount of effort
>> (and cost <-> benefit in general) can be estimated and scheduled/targeted
>> at some milestone is what is going on this month.
>> 
>> Any/all additional information will only increase the likelihood of finishing
>> rpmrepo (or starting to use the embedded mongo-c-driver already present
>> in rpm-5.3).
> 
> I'll try to see if I can spend a few cycles assisting with the above…
> 

This month (through January 20th) is the time slot allocated to devising
a ROADMAP feature list.

Thanks to the Gregorian calendar, you have a 10 day headstart on the
Russians at ROSA, who will be celebrating Christmas/NewYear's on
a different calendar this week ;-)

hth

73 de Jeff
> JH
> 
> ______________________________________________________________________
> RPM Package Manager                                    http://rpm5.org
> Developer Communication List                        rpm-devel@rpm5.org
