[Last e-mail from me on this sub-thread.  I don't really want to
contribute to the length of this mega-thread.]

On Fri, Aug 28, 2009 at 04:00:49PM -0700, Brock Pytlik wrote:
> Also remember that update time is very important in our model. What I 
> found when testing other search engines was that a) if you optimized 
> them after every install/uninstall (doing the equivalent of rebuilding 
> the index) they were slow to finish and b) if you didn't optimize them, 
> their performance dropped off rather quickly. Of course, having not 
> tested SQLite3, maybe it's solved these problems. In fact, for most 
> users, I'd argue that insertion/install time matters much more to them 
> than local search performance. Also remember that the indexes can't go 
> on the live CD (unless the indexes produced by SQLite are 
> microscopically small), that search needs to work identically (if more 
> slowly) when the index isn't present or that building an index from 
> scratch on a netbook is fast and unnoticeable to a user.

I don't recall the time and DB size from my roboporter days, so I just
tried an experiment.  I sucked in the SVR4 contents file from a Nevada
box (301713 lines in total) into a SQLite3 DB.  Below are the results.
Are the load+index ptime numbers acceptable for install time?  What
about the DB sizes?

 - non-indexed load from CSV:

real        4.446439769
user        4.287435051
sys         0.155222414

   with DB size ~31MB (31164KB).

 - indexed load from CSV:

real        9.712495976
user        8.789524901
sys         0.893463404

   with DB size ~53MB (53752KB).

   Loading first, then indexing takes about the same total time.

   This is with a single index on basename and dirname.  Other indexes
   would be needed, no doubt.

 - The DB can be loaded entirely into memory too, if you want, including
   indexes, so that you avoid disk I/O (assuming you're not paging out
   anon pages).  The indexed load from CSV into a :memory: DB took:

real        8.679847874
user        8.478430602
sys         0.195397422

 - time to search for "ls", including fork() + exec() + SQL compilation
   time:

real        0.008348750
user        0.001272678
sys         0.003709161

(This is on a Sun Fire X4200.)
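For anyone who wants to try something similar, here is a rough sketch of the indexed load and the "ls" lookup using Python's sqlite3 module. It is not the script I ran: the table name, the column names, the index name and the sample rows are all made up for illustration, and the real contents file has more fields than this.

```python
import sqlite3

def load_contents(rows, db_path=":memory:"):
    """Load (dirname, basename, pkg) rows into a fresh DB and return the connection.

    The schema is hypothetical; ":memory:" keeps the whole DB, index
    included, in RAM.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE contents (dirname TEXT, basename TEXT, pkg TEXT)"
    )
    # Single index on basename and dirname, as in the timings above.
    conn.execute(
        "CREATE INDEX contents_name ON contents (basename, dirname)"
    )
    # One transaction for the whole load.
    with conn:
        conn.executemany("INSERT INTO contents VALUES (?, ?, ?)", rows)
    return conn

def search(conn, name):
    """Return every row whose basename matches name, ordered by dirname."""
    cur = conn.execute(
        "SELECT dirname, basename, pkg FROM contents "
        "WHERE basename = ? ORDER BY dirname",
        (name,),
    )
    return cur.fetchall()

# Made-up sample rows standing in for the contents file.
sample_rows = [
    ("/usr/bin", "ls", "pkg-a"),
    ("/usr/xpg4/bin", "ls", "pkg-b"),
    ("/usr/bin", "cat", "pkg-a"),
]

conn = load_contents(sample_rows)
matches = search(conn, "ls")
```

Passing a file name instead of ":memory:" gives the on-disk variant. Creating the index after the load instead of before it is a one-line move of the CREATE INDEX statement.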

Finally, SQLite3 does its fsync() operations per-transaction (in order
to implement ACID properties); the write(2)s within each transaction
are otherwise asynchronous.  So it pays to use as few transactions as
possible, each as large as possible, preferably just one.  If pkg(1) is
installing 10 pkgs, say, it should use a single transaction to update
the DB.
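A minimal sketch of what that batching could look like, again in Python's sqlite3 module. This is not pkg(1) code: install_packages(), the contents table and the package names are hypothetical and only here to show the shape of it.

```python
import sqlite3

def install_packages(conn, packages):
    """Record the files of several packages in ONE transaction.

    packages maps a package name to a list of (dirname, basename) pairs.
    Hypothetical helper, not part of pkg(1).
    """
    # "with conn" commits once on success and rolls back on an exception,
    # so the sync cost is paid per batch rather than per package.
    with conn:
        for pkg, files in packages.items():
            conn.executemany(
                "INSERT INTO contents VALUES (?, ?, ?)",
                [(dirname, basename, pkg) for dirname, basename in files],
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contents (dirname TEXT, basename TEXT, pkg TEXT)")

# Ten made-up packages with two files each.
batch = {
    "pkg-%d" % i: [("/opt/pkg-%d" % i, "bin"), ("/opt/pkg-%d" % i, "lib")]
    for i in range(10)
}
install_packages(conn, batch)

row_count = conn.execute("SELECT COUNT(*) FROM contents").fetchone()[0]
```

The alternative, committing after each package, would mean ten commits (and ten rounds of syncing) for the same ten packages.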

Nico
-- 
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
