Ciaran McCreesh wrote:
> On Tue, 10 Jun 2008 19:40:22 +0530
> "Arun Raghavan" <[EMAIL PROTECTED]> wrote:
>> On Tue, Jun 10, 2008 at 7:32 PM, Ciaran McCreesh
>> <[EMAIL PROTECTED]> wrote:
>> [...]
>>>> I don't think that filename-vs-first-line is going to make a big
>>>> difference in practical performance.
>>>
>>> It's about a factor of five difference in cold-cache resolution
>>> performance for Paludis.
>>
>> Could you please share more details on the experiment that showed this
>> kind of performance degradation and the numbers, if possible?
>
> Shove an open and a getline in when doing what would otherwise be a
> successful cache read. It adds a couple of thousand new open()s on
> files that would otherwise be left alone, and changes the nice linear
> "slurp up things in this directory in order until we don't need
> anything else" pattern into "bounce backwards and forwards between lots
> of files in two different directories".
>
> On a cold cache, which is the most common use case, this is very very
> nasty.

I'm curious as to what operation in particular we're looking at. Let's
say I type in "paludis --sync":
Obviously the first step is to rsync the portage tree. Then we find all
modified files (already output by rsync) and update the caches. We
already need to fully parse every file to create a dependency tree, so
why is it slow to cache the EAPI for each file while we're at it?
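
To make the question concrete, here is a rough Python sketch of what I have in mind. It is not Paludis or Portage code: parse_metadata and update_cache are hypothetical names, and the toy parser assumes plain KEY=value lines, whereas real ebuilds are bash scripts that have to be sourced to get their metadata. The point is only that EAPI lands in the cache in the same pass as everything else.

```python
def parse_metadata(path):
    """Toy parser (assumption): collect KEY=value lines from one file."""
    wanted = ("EAPI", "DEPEND", "RDEPEND", "KEYWORDS", "IUSE")
    meta = {}
    with open(path) as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep and key in wanted:
                meta[key] = value.strip('"')
    return meta


def update_cache(cache, changed_paths):
    """Re-parse only the files rsync reported as changed.

    EAPI is stored next to the other keys, so no extra open() is
    needed for it later.
    """
    for path in changed_paths:
        cache[path] = parse_metadata(path)
    return cache
```

With something like this, a later lookup of the EAPI is a dictionary read rather than a second trip to the ebuild.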
Next, suppose I type in "paludis -pi world":
Why wouldn't everything you need already be in the cache? And if
something wasn't, then why is reading in the EAPI any slower than
reading in (R)DEPEND/KEYWORDS/IUSE/etc?
What specific operation (from an end-user perspective) is improved by
putting EAPI in the filename? I'm not interested in theoretical
operations like "given a portage tree, find the EAPI of every file" -
users don't do those operations routinely in isolation, but as part of
larger operations where doing an open and a getline now just speeds up
the open and getline that would be executed 500 lines later anyway.
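
For reference, the two lookups being compared look roughly like this in Python. Both formats are my assumptions for illustration: a ".ebuild-<eapi>" filename suffix for the filename variant, and an EAPI=<value> assignment on the very first line for the other. The function names are hypothetical too.

```python
import re

# Assumed naming scheme: "foo-1.0.ebuild-2" carries EAPI "2" in its suffix.
_SUFFIX = re.compile(r"\.ebuild-(?P<eapi>[A-Za-z0-9_.+-]+)$")


def eapi_from_filename(filename):
    """No file access at all: the directory listing is enough."""
    m = _SUFFIX.search(filename)
    return m.group("eapi") if m else None


def eapi_from_first_line(path):
    """The 'open and getline' case: one open() and one line read per file."""
    with open(path) as f:
        first = f.readline().strip()
    key, sep, value = first.partition("=")
    return value.strip('"') if sep and key == "EAPI" else None
```

The second function is the extra cost under discussion. My question is whether that open() is really extra, or whether the same file gets opened shortly afterwards in any operation a user actually runs.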
Again, I'm not completely opposed to the idea of putting EAPI in the
filename, but I'm not convinced that the arguments in favor of it are as
clear-cut as some are suggesting. If everybody were open about the
pros/cons of each solution, instead of suggesting that one solution is
near-perfect and the others are total non-starters, then this whole
debate would probably be a lot less contentious. When people
perceive that somebody isn't honest about the shortcomings of their own
proposal then they're less likely to trust them - it is just a human
thing...
--
gentoo-dev@lists.gentoo.org mailing list