On Dec 14, 2010, at 3:00 PM, Per Øyvind Karlsen wrote:

> 
> On a related note though I've started giving parentdir & symlink deps
> some more thoughts again though, skimming the surface on practical
> issues and drawbacks of such as ie. the size of files.xml.lzma in
> main/release currently being 3736704 bytes compared to the gzip
> compressed synthesis.hdlist.cz being as small as 775230 bytes (if
> switching to xz compression, size could be trimmed down  ~30%
> further), so something more optimal than requiring this metadata which
> currently for a single repository alone is bigger in size than the sum
> of synthesis for all repos together. The necessary cleaning implied by
> stricter policies on ownership and getting rid of all the incorrectly
> multiple owned directories might improve it marginally, putting some
> thinking into the format of the metadata (with something as repetitive
> data as lists of paths are, I bet there must be some room for improvement
> utilizing this knowledge about the data to compress) might improve
> things a bit as well, but deltas are the only known solution to
> worthwhile size reduction gains to at least reducing the pain of having
> to download large metadata files with minimal changes between
> frequently when being on a rolling distro such as ie. cooker, although
> putting maintenance headaches of maintaining several deltas for it to
> work (bundling several deltas into some indexed revision history to
> match against older revisions of the metadata could perhaps be an idea
> to make things slightly less messy perhaps?).
> 

The issues of the size of files.xml* and synthesis.hdlist* have nothing
whatsoever to do with parentdir/linkto dependencies.

The Mandriva flaw (imho) was adding
        Provides: /path/to/file
so that file dependencies end up in (the equiv of) primary.xml*
and files.xml* doesn't need to be downloaded. Well, you can do that,
but "cherry picking" the dependencies that you think you need and putting
a representation in a different file addresses only the bandwidth
savings (by avoiding the need for a files.xml* download) without really
addressing the fundamental problem of dependency data reduction.

Deltas (and compression) aren't the only means to data reduction.
There is --rsyncable (and a change in transport) and there is
also "binary xml" used to some extent in Apple *.plist and there's
a similar binary representation (with associated data reduction) of
XML here:
        http://www.ccnx.org/releases/latest/doc/technical/BinaryEncoding.html
The point is that there are more solutions available than
deltafication/compression.
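
As one illustration of a format-level reduction for path lists (the
repetitive data mentioned in the quoted text), here is a minimal sketch of
prefix ("front") coding in Python. It is not something either message
proposes, and the function names are made up for the example: each sorted
path is stored as the number of leading characters it shares with the
previous path plus the remaining suffix.

```python
import os


def front_encode(paths):
    """Encode paths as (shared_prefix_length, suffix) pairs, in sorted order."""
    encoded = []
    previous = ""
    for path in sorted(paths):
        # Character-wise common prefix with the previously emitted path.
        shared = len(os.path.commonprefix([previous, path]))
        encoded.append((shared, path[shared:]))
        previous = path
    return encoded


def front_decode(encoded):
    """Rebuild the sorted path list from (shared_prefix_length, suffix) pairs."""
    paths = []
    previous = ""
    for shared, suffix in encoded:
        previous = previous[:shared] + suffix
        paths.append(previous)
    return paths
```

Whether this gains anything once a general-purpose compressor is applied
on top would have to be measured; the sketch only shows the idea.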

There's also nothing wrong with a "less == more" approach to bandwidth
reduction: release repository metadata monthly or weekly rather than
daily or hourly.

This will cause some pain for developers, but those are exactly the
people likelier to have bandwidth and be interested in timeliness.

Most users (who drive the RFE for bandwidth/data reduction ultimately) would
benefit from _NOT_ having to download all the "stuff" repeatedly.

A properly engineered solution needs to be incremental (perhaps
that's what you meant instead of delta?), distributing metadata
only for the changed packages, not for everything all over again.
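
A minimal sketch of what "incremental" could mean here, in Python. The
data model is an assumption made for illustration (a plain dict mapping a
package name to an opaque metadata string); it is not how rpm or any
existing repository format represents its metadata, and the function names
are hypothetical.

```python
def metadata_increment(old, new):
    """Return only what a client holding `old` needs in order to reach `new`."""
    # Entries that are new or whose metadata differs from the old snapshot.
    changed = {name: entry for name, entry in new.items()
               if old.get(name) != entry}
    # Packages that disappeared from the repository.
    removed = sorted(name for name in old if name not in new)
    return {"changed": changed, "removed": removed}


def apply_increment(old, increment):
    """Client side: update a local snapshot using an increment."""
    result = dict(old)
    for name in increment["removed"]:
        result.pop(name, None)
    result.update(increment["changed"])
    return result
```

With this shape, the amount of data distributed scales with the number of
changed packages rather than with the size of the repository.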

But overall, I would not expect parentdir/linkto dependencies to affect
repository metadata size directly very much at all.

There is a secondary/implicit effect of parentdir dependencies, which is
that they cause one to look more carefully at dependency graphs, where
there's all sorts of unnecessary data. The current automagic dependency
generation trades redundancy for reliability: over-generating multiple
types of dependencies (file, soname, interpreter, --rpm-requires, ...
parentdir, linkto) increases reliability through redundancy, and attempts
to avoid the cost of direct manual developer packaging changes.

If you happen to have a zoo full of "package monkeys" willing to work for
bananas adding additional package metadata markup, well, the automation
just isn't needed or useful.

> Just some thoughts of mine, I bet others have put a lot more thinking
> with more actual insight, knowledge and even experience
> on the matters, I'd love to hear others' thoughts on the matter, with
> perhaps some suggestions and ideas to look into when
> getting there..
> 
> Olivier & Giuseppe, I CC'ed you two as I know these are fields which
> you both have/had some interest in at least on some level. :)
> 

hth

73 de Jeff

> --
> Regards,
> Per Øyvind
> ______________________________________________________________________
> RPM Package Manager                                    http://rpm5.org
> Developer Communication List                        rpm-devel@rpm5.org

