On Jan 4, 2012, at 12:03 PM, Jeremy Huntwork wrote:

> On Jan 4, 2012, at 11:43 AM, Jeffrey Johnson wrote:
>> Well "broken" is perhaps a bit harsh as a euphemism for
>> Not useful to you.
>>
>> ;-) included just in case; I don't believe we disagree here.
>
> Chalk it down to perhaps a misunderstanding of the intended usage, but I'm
> not entirely certain on that point yet. :-)
>

No need to apologize for misunderstanding: we both agree that rpmrepo
is only useful atm as "bug" fodder for @rpm5.org code.

>> The concept of mirrored markup used for depsolvers like
>> yum/zypper/urpmi/smart
>> is what is "broken" imho. rpmrepo is/was intended as a client side,
>> not a server-side, tool. As a client side tool, the generated markup
>> is entirely intact, was known to work with (at least) smart at the
>> time rpmrepo was implemented, and the markup is still the same,
>> and would work with that version of smart, on the client no matter
>> what the createrepo de jure chooses to generate to feed to
>> yum/zypper/urpmi/smart.
>
> Alright, but two things:
>
> 1. Please explain the use case as a client side tool. I'm not sure I
> understand the intent there.
>

A complete explanation is what is needed in a "blueprint", not in e-mail.

But I can supply a recent "real world" example of the consequences
of choosing what I am calling a "server-side" (i.e. vendor distro
metadata is created and mirrored and downloaded by vendor clients)
depsolver implementation.

In the last week createrepo was modified to add a "compat" compression
functionality. The reason given for adding "compat" was to support RHEL6,
where the distributed version of yum expects gzip'ed XML and bzip'ed
sqlite3 for downloading, while Fedorable yum is using XZ! XZ! XZ! and
perhaps deltification (I forget whether used in RHEL6: I don't use or
believe in delta rpm's …)

The "server-side" design flaw in the above is this: all the metadata
is static content for efficient mirror transport using rsync. One can
easily see *already* that there are multiple duplicated
formats/compressions that MUST be distributed to mirrors because of

    XML vs sqlite3
    gzip/bzip2 vs XZ!

which leads to mirror bloat (yes there are fewer mirrors than clients).

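To make the mirror-bloat point concrete, here is a minimal Python sketch.
It is not taken from createrepo or rpmrepo: the payload and the names
PRIMARY_XML, compressed_variants and mirror_footprint are made up for
illustration. It compresses one metadata payload with the three
compressions mentioned above and adds up what a mirror has to carry if
every variant is published side by side.

```python
import bz2
import gzip
import lzma

# Hypothetical stand-in for a repository's primary metadata.
# Real metadata is far larger; only the shape of the problem matters here.
PRIMARY_XML = (
    "<metadata>"
    + "".join(
        f'<package><name>pkg{i}</name><version ver="1.{i}"/></package>'
        for i in range(500)
    )
    + "</metadata>"
).encode("utf-8")


def compressed_variants(payload: bytes) -> dict:
    """Return one compressed copy of the payload per compression format."""
    return {
        "gz": gzip.compress(payload, mtime=0),  # mtime=0 keeps the output stable
        "bz2": bz2.compress(payload),
        "xz": lzma.compress(payload, format=lzma.FORMAT_XZ),
    }


def mirror_footprint(variants: dict) -> int:
    """Total bytes a mirror stores when it carries every variant."""
    return sum(len(blob) for blob in variants.values())


variants = compressed_variants(PRIMARY_XML)
for suffix, blob in variants.items():
    print(f"primary.xml.{suffix}: {len(blob)} bytes")
print(f"all variants together: {mirror_footprint(variants)} bytes")
```

Every variant decompresses to the same bytes, so the extra copies buy
nothing except compatibility with a particular client version; a client
that generates its own metadata only ever needs one of them.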
The "server-side" problem is pervasive however: any change to the
one-size-fits-all metadata distributed (in multiple "compatible" forms)
leads to almost instant "legacy compatibility" issues.

Consider what happens if one adds a tag which is useful for client side
tools: one has to be very careful about phasing in the new change, or one
will be forced on the "server-side" to distribute Yet Another Type of
almost identical markup. So it takes months and years to change
"server-side" markup, and a huge amount of planning to deploy any change.

There's also the problem that most metadata is being distributed
monolithically: any change to the metadata forces a full download.

(aside) Yes I know about various implementations intended to address
the "monolithic" problem that are underway; I'm speaking generally here,
and generally a full download is needed for any change at all to
"server-side" metadata.

I believe it's obvious how a "client-side" tool addresses the problems
of incompatibilities. But I will list a couple of the more important
points:

    1) a "client-side" tool scales because the #clients >> #servers
    creating a markup representation. IOW, there are more cpu cycles
    available on clients (in general) than on servers. And the compute
    intensive operations are digests/decompresses that are solely needed
    for bandwidth savings and integrity checks that are already provided
    by RPM in *.rpm packages that are going to be checked anyways.

    2) version compatibilities cease to matter when a "client-side" tool
    chooses to continue generating gzip'ed XML and bzip'd sqlite3; in fact
    there's no need for both XML/sqlite3 on the client, only one format
    will be used by whatever depsolver is in use.

This is the distinction I make between "client-side" and "server-side"
spewage generators like rpmrepo (and createrepo).

I'm pretty sure that you are trying to use rpmrepo as a "server-side"
tool that generates sewage on a server.
The intent is/was a "client-side" tool that generated markup (in the
case of rpm's depsolver, it's called a "solvedb", basically just like an
rpmdb of "everything" but not installed) locally for speedy access in a
specific format.

> 2. Regardless of _where_ the tool is used, what is the <open-checksum>
> supposed to be a hash of? It doesn't correspond to anything else I see in the
> generated repo, whereas the schema leads me to believe that it was intended
> to show the hash of the uncompressed file (as does the working functionality
> of createrepo) - if so, then it actually is broken.
>

I don't believe there is any single convention for <open-checksum> content.
IIRC (I would have to look, this is from memory) SuSE wished a digest on
uncompressed content, while RedHat wanted a digest of compressed content.
So 2 tags were added to the markup, thereby further increasing the
complexity of the XML markup.

(aside) And in fact, using a digest of the *.rpm file is rather broken
design as well, where various "server-side" tools are forced to duplicate
functionality that is already mostly present in *.rpm packages. One can
attempt a KISS argument, that the entire file needs to be digest checked,
not just using the existing header SHA1 etc etc etc, but these discussions
quickly become very complex and security related and are out of scope
in what I am saying here.

>> I personally think (and I believe we agree) that a NoSQL! distributed store
>> instead
>> of mirrored content is more soundly engineered, mostly because its rather
>> easy to design an incremental update, which monolithic mounds of
>> spewage (as generated by createrepo and mirrored by rsync and downloaded
>> by url-grabber for use by yum) is still commonly doing.
>
> Yes, I'm still interested about the NoSQL solutions, but I'm a little hazy
> about the implementation details, how to get up and running with a mongo
> store (for this purpose).
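
Returning briefly to the <open-checksum> question: the two conventions
mentioned above differ only in whether the digest is taken before or
after compression. A small Python sketch of that difference follows. It
assumes gzip and SHA-256 purely for illustration, is not rpmrepo's or
createrepo's actual code, and the names repo_digests and matches_either
are hypothetical.

```python
import gzip
import hashlib


def repo_digests(uncompressed: bytes, algo: str = "sha256") -> dict:
    """Digest the same metadata both before and after gzip compression."""
    compressed = gzip.compress(uncompressed, mtime=0)  # mtime=0: stable output
    return {
        "compressed_blob": compressed,
        "digest_of_compressed": hashlib.new(algo, compressed).hexdigest(),
        "digest_of_uncompressed": hashlib.new(algo, uncompressed).hexdigest(),
    }


def matches_either(downloaded: bytes, expected_hex: str, algo: str = "sha256") -> str:
    """Report which convention, if any, an advertised digest follows."""
    if hashlib.new(algo, downloaded).hexdigest() == expected_hex:
        return "compressed"
    if hashlib.new(algo, gzip.decompress(downloaded)).hexdigest() == expected_hex:
        return "uncompressed"
    return "neither"


info = repo_digests(b"<metadata><package name='example'/></metadata>")
print(matches_either(info["compressed_blob"], info["digest_of_compressed"]))
print(matches_either(info["compressed_blob"], info["digest_of_uncompressed"]))
```

A digest that matches neither form of the downloaded file is the
situation described in the question: it cannot be verified against
anything in the generated repo.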

rpm-5.3.x had a JSON spewage implementation that was routinely uploading
the usual primary/filelists/other JSON analogue by piping to
/usr/bin/mongo. The script with details is in rpm-5_4 somewhere.

Note that Anders Bjorkland changed --json to --mongodb or something last
May because JSON and the markup used by the mongo shell are really
different; for proof-of-concept I just needed _SOME_ option to use for
development.

And the implementation has mostly sat there since last May while I have
dealt with other matters like generating an income.

>> Meanwhile, ROADMAP discussions for rpm-5_4 "features" are currently active at
>> http://launchpad.net/rpm
>>
>> Feel free to add to this "blueprint"
>> http://rpm5.org/community/rpm-devel/4444.html
>> and spell out the usage case (for others, not me: I already
>> know what I intended when I wrote rpmrepo).
>>
>> Adding bug reports detailing the "broken" and attaching to the "blueprint"
>> (I will do the attach if you don't wish to) so that the amount of effort
>> (and cost <-> benefit in general) can be estimated and scheduled/targeted
>> at some milestone is what is going on this month.
>>
>> Any/all additional information will only increase the likelihood of finishing
>> rpmrepo (or starting to use the embedded mongo-c-driver already present
>> in rpm-5.3).
>
> I'll try to see if I can spend a few cycles assisting with the above…
>

This month (through January 20th) is the time slot allocated to
devising a ROADMAP feature list.
Thanks to the Gregorian calendar, you have a 10 day head start on the
Russians at ROSA, who will be celebrating Christmas/New Year's on a
different calendar this week ;-)

hth

73 de Jeff

> JH

______________________________________________________________________
RPM Package Manager                                    http://rpm5.org
Developer Communication List                        rpm-devel@rpm5.org