On 28.05.2013 21:18, seth vidal wrote:
On Tue, 28 May 2013 20:42:13 +0300
Alek Paunov <a...@declera.com> wrote:
So, it seems that yum already has the "filelists on demand"
optimization implemented. Why are you asking to remove a feature
which does not make things worse ... ?

I'm not.

But when you download the filelists - it is A LOT of data.

It is, of course :-). It is big and slow now, but it implements one more distinctive and convenient Fedora feature ... and with a careful schema and encoding, it can be scaled down several times in both space and query time.

Actually, every "positive" (install, update) yum operation implies access to the repos. The repos contain everything. If our software were perfectly optimized, not only the filelists but all other parts of the database (including primary.files, which you cited initially) should be lazily synced, right?
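
A rough sketch of what "lazily synced" could mean in practice. This is only an illustration, not how yum works internally: LazyRepoMetadata and the fetch callable are hypothetical names, and the part names are the ones mentioned in this thread.

```python
class LazyRepoMetadata:
    """Fetch each metadata part of a repo only when it is first needed."""

    def __init__(self, fetch):
        # fetch: hypothetical callable that takes a part name ("primary",
        # "filelists", "other") and returns the downloaded data.
        self._fetch = fetch
        self._parts = {}

    def get(self, part):
        # Download on first access, then serve from the local cache.
        if part not in self._parts:
            self._parts[part] = self._fetch(part)
        return self._parts[part]


def demo():
    downloaded = []

    def fake_fetch(part):
        # Stand-in for a real download; records which parts were requested.
        downloaded.append(part)
        return "<%s data>" % part

    repo = LazyRepoMetadata(fake_fetch)
    repo.get("primary")
    repo.get("primary")  # served from the cache, no second download
    return downloaded
```

In this sketch an operation that never touches the filelists never pays for them, and the same rule would apply to every other part.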


I'd rather not have filedeps so it doesn't get pulled in for other
things in depsolving.


Sorry, I do not know how this amount of data will impact libsolv in the future. IMO, for yum (I mean in the sqlite-based solution) it is a matter of optimization.

I have a few questions:

   * What is the reasoning behind the splitting of the database across
many .sqlite files?

many? it's 3 afaik. primary, filelists, other.

how do you mean 'many'?

Multiplied by the number of repos. That is what I am trying to understand: why not just a single .sqlite file for the whole yum database?
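
To make the single-file idea concrete, here is a minimal sketch that folds per-repo database files into one combined database using SQLite's ATTACH DATABASE. The packages table, its columns, the repo names and the version strings are all made up for the illustration and are far simpler than the real yum schema.

```python
import os
import sqlite3
import tempfile

# Hypothetical, heavily simplified schemas: one row per package.
COMBINED_SCHEMA = "CREATE TABLE packages (repo TEXT, name TEXT, version TEXT)"
REPO_SCHEMA = "CREATE TABLE packages (name TEXT, version TEXT)"


def merge_repo_db(conn, repo_id, repo_db_path):
    """Copy one per-repo database file into the combined database."""
    conn.execute("ATTACH DATABASE ? AS src", (repo_db_path,))
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO packages (repo, name, version) "
                "SELECT ?, name, version FROM src.packages",
                (repo_id,),
            )
    finally:
        conn.execute("DETACH DATABASE src")


def demo():
    # Made-up sample data standing in for two repos.
    repos = {
        "fedora": [("bash", "1.0")],
        "updates": [("bash", "1.1"), ("yum", "1.0")],
    }
    combined = sqlite3.connect(":memory:")
    combined.execute(COMBINED_SCHEMA)
    with tempfile.TemporaryDirectory() as tmpdir:
        for repo_id, rows in repos.items():
            path = os.path.join(tmpdir, repo_id + ".sqlite")
            src = sqlite3.connect(path)
            src.execute(REPO_SCHEMA)
            src.executemany("INSERT INTO packages VALUES (?, ?)", rows)
            src.commit()
            src.close()
            merge_repo_db(combined, repo_id, path)
    # One query now spans every repo.
    return combined.execute(
        "SELECT repo, name, version FROM packages ORDER BY repo, name"
    ).fetchall()
```

With everything in one file, a cross-repo question such as "which repos carry bash?" becomes a single indexed query instead of one query per file.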

   * Why is the sql schema so denormalized (IMO, this leads to both
bandwidth and disk overspending without speed benefits)? For
example: why do the provides and requires tables not use a common
domain table?

B/c it was designed 8yrs ago and we were going for compressible space
and making it as quick as possible to search?

In the provides and requires example, the missing common domain (dependency + dependency_evr tables) gains us nothing in space or speed. In the current situation we have fat and slow text duplication and indexes instead of integer references to the domain subnodes (dependencies are the biggest domain in the primary). Yes, in a bunch of cases a little denormalization is inevitable when we fight for speed, but IMO this and a few other space flaws have a negative impact on speed too.
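
A minimal sketch of the common-domain layout, assuming a much simplified schema. The dependency and dependency_evr table names come from the paragraph above; every column name, the flags/evr encoding and the helper functions are hypothetical. Matching here is by dependency name only, with no version comparison.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dependency (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- each name is stored exactly once
);
CREATE TABLE dependency_evr (
    id     INTEGER PRIMARY KEY,
    dep_id INTEGER NOT NULL REFERENCES dependency(id),
    flags  TEXT NOT NULL DEFAULT '',
    evr    TEXT NOT NULL DEFAULT '',
    UNIQUE (dep_id, flags, evr)
);
-- provides/requires hold only integer references into the domain.
CREATE TABLE provides (pkg_key INTEGER NOT NULL, dep_evr_id INTEGER NOT NULL);
CREATE TABLE requires (pkg_key INTEGER NOT NULL, dep_evr_id INTEGER NOT NULL);
CREATE INDEX provides_dep ON provides (dep_evr_id);
CREATE INDEX requires_pkg ON requires (pkg_key);
"""


def intern_dep(conn, name, flags="", evr=""):
    """Return the dependency_evr id for (name, flags, evr), creating it once."""
    conn.execute("INSERT OR IGNORE INTO dependency (name) VALUES (?)", (name,))
    dep_id = conn.execute(
        "SELECT id FROM dependency WHERE name = ?", (name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO dependency_evr (dep_id, flags, evr) "
        "VALUES (?, ?, ?)",
        (dep_id, flags, evr),
    )
    return conn.execute(
        "SELECT id FROM dependency_evr WHERE dep_id = ? AND flags = ? AND evr = ?",
        (dep_id, flags, evr),
    ).fetchone()[0]


def add_dep(conn, table, pkg_key, name, flags="", evr=""):
    if table not in ("provides", "requires"):
        raise ValueError("unknown table: %r" % (table,))
    dep_evr_id = intern_dep(conn, name, flags, evr)
    conn.execute(
        "INSERT INTO %s (pkg_key, dep_evr_id) VALUES (?, ?)" % table,
        (pkg_key, dep_evr_id),
    )


def providers_for(conn, pkg_key):
    """Packages providing something pkg_key requires (integer joins only)."""
    rows = conn.execute(
        "SELECT DISTINCT p.pkg_key FROM requires r "
        "JOIN dependency_evr re ON re.id = r.dep_evr_id "
        "JOIN dependency_evr pe ON pe.dep_id = re.dep_id "
        "JOIN provides p ON p.dep_evr_id = pe.id "
        "WHERE r.pkg_key = ? ORDER BY p.pkg_key",
        (pkg_key,),
    )
    return [row[0] for row in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_dep(conn, "requires", 1, "libfoo.so.1")
    add_dep(conn, "provides", 2, "libfoo.so.1")
    add_dep(conn, "provides", 3, "libbar.so.2")
    names = conn.execute("SELECT COUNT(*) FROM dependency").fetchone()[0]
    return providers_for(conn, 1), names
```

The point of the sketch is that a name shared by provides and requires is stored once as text, and the match between the two tables runs over integer keys rather than repeated strings.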


   * Why was an incremental update mechanism (e.g. applying xml diffs to
the sqlite database) not considered from the very beginning?

It wasn't necessary? There was a massively smaller number of pkgs to
consider.


Indeed. Also, 8 years ago the possibilities and the number of ideas to reuse were definitely different :-)
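
For illustration, applying an xml diff to the sqlite database could look roughly like the sketch below. The diff format (remove/add elements with pkgid, name and version attributes) and the packages table are invented for this example; no such format is defined anywhere in this thread.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical diff format, invented for this sketch.
DIFF = """<diff>
  <remove pkgid="aaa"/>
  <add pkgid="ccc" name="newpkg" version="1.0"/>
</diff>"""


def apply_diff(conn, diff_xml):
    """Apply removals and additions to the local database in one transaction."""
    root = ET.fromstring(diff_xml)
    with conn:  # all changes land together, or none do
        for node in root.findall("remove"):
            conn.execute(
                "DELETE FROM packages WHERE pkgid = ?", (node.get("pkgid"),)
            )
        for node in root.findall("add"):
            conn.execute(
                "INSERT OR REPLACE INTO packages (pkgid, name, version) "
                "VALUES (?, ?, ?)",
                (node.get("pkgid"), node.get("name"), node.get("version")),
            )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE packages (pkgid TEXT PRIMARY KEY, name TEXT, version TEXT)"
    )
    conn.executemany(
        "INSERT INTO packages VALUES (?, ?, ?)",
        [("aaa", "oldpkg", "1.0"), ("bbb", "kept", "2.0")],
    )
    apply_diff(conn, DIFF)
    return conn.execute(
        "SELECT pkgid, name FROM packages ORDER BY pkgid"
    ).fetchall()
```

Only the changed rows travel over the network and touch the disk; the unchanged package in the demo is never rewritten.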

Thank you,
Alek

--
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
