Bug#630799: Splitting tools out to help build-deps install time

2011-07-09 Thread Colin Watson
On Fri, Jun 17, 2011 at 05:06:29PM +0200, Loïc Minier wrote:
> On Fri, Jun 17, 2011, Colin Watson wrote:
> > I really don't want to do this.  I'd rather optimise mandb.
>
> Ok; just so that I understand, is this about avoiding confusion for
> users, or complexity, or...?

Splitting packages is for life, not just for Christmas.  Once I do it,
I'm pretty much stuck with it, or at least some vestige of it, forever.
Thus, I'm reluctant to do it solely for performance reasons which I feel
can be addressed in other ways.

If I exhaust the possibilities for optimising mandb without reaching
acceptable performance, then I'm willing to revisit splitting some tools
out into a separate package.

> I guess we can repurpose this bug as "man-db is too slow on armel/ppc"
> or something; those are the architectures where I've witnessed this.

Actually, I think I could do a lot better generally.  For example,
compare these two operations, which have identical output, run with a hot
cache on a reasonably decent i386 laptop with a fast SSD:

  cjwatson@sarantium /usr/share/man$ time find -type f | xargs cat | zcat >/dev/null

  real    0m2.494s
  user    0m2.440s
  sys     0m0.324s

  cjwatson@sarantium /usr/share/man$ time find -type f | xargs -n1 zcat >/dev/null

  real    1m27.988s
  user    0m7.940s
  sys     0m16.373s

mandb is currently behaving more like the latter than the former (and, for
that matter, has a similar runtime).  OK, so it isn't actually execing
zcat every time: instead it forks and has one of the child processes
run an in-process function that uses zlib, saving an execve per
process and all the associated process startup costs.  I seem to
remember that this made a noticeable performance difference; but even
so, simply forking 2-odd processes (as in my example, which is in a
fairly complete environment with lots of manual pages installed;
probably very much less in a build chroot) isn't cheap.

In fact, strace indicates that mandb is forking on the order of four
processes per page.  Just the cost of forking, exiting, and waiting for
that number of processes comes to 23 seconds on my system out of mandb's
total runtime of around 100 seconds, and I strongly suspect that doing
any non-trivial multi-process work like this gives the scheduler trouble
and slows everything down further due to the sheer number of context
switches involved (thrashing CPU caches, doing TLB flushes, etc.).

My plan here is to beef up libpipeline so that I can do all of mandb's
work in a single process.  In fact, I've had a to-do entry in the code
for some time: ideally, could there be a facility to execute
non-blocking functions without needing to fork?  These would be
something like coroutines or generators.  If I do this in libpipeline,
then the changes in man-db can be very small and wouldn't make the code
much harder to maintain: it would still look like running a pipeline of
processes, except that some of them happen to be non-forking function
calls, much as some of them can currently be function calls executed in
a child process.  The called functions would just need to be written
such that they can yield control and be re-entered later rather than
blocking.

If that doesn't speed things up enough, then I can look at having more
things done by passing buffers around rather than reading and writing
over pipes.  That breaks some useful abstraction layers, though (less
common compression methods are implemented by calling programs like
bzcat, and I'd rather not have to link directly against lots of
decompression libraries), and I'm not sure that it will be necessary.
My instinct is that I can make a very serious dent in mandb's runtime
without resorting to that.

Cheers,

-- 
Colin Watson   [cjwat...@debian.org]



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#630799: Splitting tools out to help build-deps install time

2011-06-17 Thread Loïc Minier
Package: man-db
Version: 2.6.0.2-1
Severity: wishlist

Hey there

 I've seen this while installing build-deps in chroots for years:
   Building database of manual pages ...

 and that's likely because I am too lazy to set man-db/auto-update
 properly.  Pbuilder offers an opt-in hook to do this, and I'm sure this
 could be done in other software, but it turns out most build software
 doesn't bother with this.  I checked random buildd logs of qemu and
 qemu-linaro in Debian and Ubuntu and found:
   Setting up man-db (2.6.0.2-1) ...
   Building database of manual pages ...
 This is particularly common because debhelper depends on man-db (it
 seems dh_installman calls man); lintian also depends on man-db, but
 this is likely less of an issue on buildds.

 In the interest of saving buildd time without anyone having to set
 man-db/auto-update, I propose that we split the tools and the trigger /
 database handling into separate packages so that debhelper/lintian just
 depend on the tools, not on the presence of a database.

 NB: this is particularly bad on ports architectures where it often
 takes minutes to generate the DB for some reason.

   Cheers,
-- 
Loïc Minier






Bug#630799: Splitting tools out to help build-deps install time

2011-06-17 Thread Colin Watson
On Fri, Jun 17, 2011 at 03:19:09PM +0200, Loïc Minier wrote:
> In the interest of saving buildd time without anyone having to set
> man-db/auto-update, I propose that we split the tools and the trigger /
> database handling into separate packages so that debhelper/lintian just
> depend on the tools, not on the presence of a database.

I really don't want to do this.  I'd rather optimise mandb.

-- 
Colin Watson   [cjwat...@debian.org]






Bug#630799: Splitting tools out to help build-deps install time

2011-06-17 Thread Loïc Minier
On Fri, Jun 17, 2011, Colin Watson wrote:
> I really don't want to do this.  I'd rather optimise mandb.

 Ok; just so that I understand, is this about avoiding confusion for
 users, or complexity, or...?

 I guess we can repurpose this bug as "man-db is too slow on armel/ppc"
 or something; those are the architectures where I've witnessed this.

-- 
Loïc Minier


