Great analysis.

GList is the most absolutely rudimentary list structure you could
imagine. g_list_append walks the list to the end, then adds the new
element. Totally O(n^2). A bit of googling turned up some very defensive
arguments against changing GList to be a more opaque type with efficient
head and tail pointers and length information.

We have several calls to g_list_length in the codebase, too. These walk
all elements to find the length. Yuck.

I think we should use g_list* as the internals to our own opaque list
type. Given that most messages have only 15-20 headers, a hash table is
probably not going to be more efficient due to its overhead -- it would
have better worst-case performance, though.

Aaron

On Mon, 2006-03-06 at 22:00 -0600, Matthew Sayler wrote:
> > I compiled dbmail and glib with profiling and gprof shows almost all
> > time being spent in IA__g_list_last.  (Note the CPU time reported by
> > time was 277.17s -- the rest is probably in glibc and mysql libs..)
> > 
> >  time   seconds   seconds    calls   s/call   s/call  name    
> >  99.35    272.99   272.99   344704     0.00     0.00  IA__g_list_last
> >   0.20    273.54     0.55 18702183     0.00     0.00  ucmp
> >   0.10    273.82     0.28   344681     0.00     0.00  g_tree_node_insert
> >   0.06    273.99     0.17   860545     0.00     0.00  g_tree_node_lookup
> >   0.05    274.12     0.13   344771     0.00     0.00  _g_list_alloc
> >   0.03    274.20     0.08       14     0.01     0.01  db_getmailbox
> > (full profile log attached)
> 
> OK -- O(n^2) behavior is bad.  The traverse_tree_{keys, values, merge}
> functions all ended up appending message pointers to the end of a linked
> list.  When you do this for 80,000 messages, you end up following about
> 3.2 billion pointers.. Yuck.
> 
> Attached patch created copies of these functions (traverse_tree_keys_rev
> etc) which prepends list items and then uses a g_list_reverse to sort
> out the mess.  It seems to work for me but I've given it minimal
> testing.
> 
> There are better ways of doing this, but for me this reduces the above
> mailbox query from 272cpusec to 5.1cpusec, for a tidy 50x speedup.
> 
> Matt
> 
> _______________________________________________
> Dbmail-dev mailing list
> Dbmail-dev@dbmail.org
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev
> 

Reply via email to