So running commands on alh's massively split folder in the new
conversation-splitting-world that we have on master, I get this annotated
callgrind.
Commands:
. login
. select INBOX.alerts
. xconvsort (reverse arrival) (conversations position (691 30)) utf-8 all
. logout
1 OK Completed (in 92.756 secs) (it's about 15 times slower than running
without callgrind - this is a 6s command usually)
16,154,183,725 /usr/src/cyrus-next-build/cyrus/master/service.c:main
[/usr/cyrus-next/libexec/imapd]
16,148,984,824 /usr/src/cyrus-next-build/cyrus/imap/imapd.c:service_main
[/usr/cyrus-next/libexec/imapd]
16,148,289,216 /usr/src/cyrus-next-build/cyrus/imap/imapd.c:cmdloop
[/usr/cyrus-next/libexec/imapd]
15,826,028,681
/usr/src/cyrus-next-build/cyrus/imap/mailbox.c:mailbox_reload_index_record
[/usr/cyrus-next/lib/libcyrus_imap.so.0.0.0]
15,820,943,656
/usr/src/cyrus-next-build/cyrus/imap/mailbox.c:mailbox_read_index_record
[/usr/cyrus-next/lib/libcyrus_imap.so.0.0.0]
15,447,150,662
/usr/src/cyrus-next-build/cyrus/imap/annotate.c:annotatemore_msg_lookup
[/usr/cyrus-next/lib/libcyrus_imap.so.0.0.0]
And there we have it. We're spending 95% of that time doing annotation lookups
for split conversations:
if ((record->system_flags & FLAG_SPLITCONVERSATION)) {
struct buf annotval = BUF_INITIALIZER;
annotatemore_msg_lookup(mailbox->name, record->uid, IMAP_ANNOT_NS
"basethrid", "", &annotval);
if (annotval.len == 16) {
const char *p = buf_cstring(&annotval);
/* we have a new canonical CID */
r = parsehex(p, &p, 16, &record->basecid);
}
buf_free(&annotval);
}
Which is all well and good, and we could possibly optimise this a bit more, but
over 50% is just seeking to find the record in the twoskip file. That's not
going to optimise much. Random access lookups on every read.
Alternative pathway ahead here is to change EVERYONE's cyrus.index files to
have space to store a separate basecid field. Downside, it adds 8 bytes to
every single cyrus.index record, regardless of whether they have any split
conversations. Upside, fast access and simplicity on the split conversation
handling.
(we also want to add some other interesting fields at the same time, for
example a CREATEDDATE which will allow us to implement purge based on when a
message was added to a folder by a move rather than on its INTERNALDATE).
... another alternative here is to delay reading basecid until we actually need
it. There aren't many places that request it. If we had rsto's soon-to-land
wrapper around index records, we could probably delay reading until someone
asks for the basecid.
Bron.
--
Bron Gondwana
[email protected]