2013/2/12 Bron Gondwana <[email protected]>:
> One of the perennial topics on #cyrus is "what about a more configurable set
> of cached headers".
>
Indeed.

> As you can see, there are some normalised things from some headers. The same
> information normalised in a DIFFERENT way in the ENVELOPE and then a
> BODYSTRUCTURE and a BODY response.

Yes, it's redundant.

> 1) keep the BODYSTRUCTURE, it's the result of parsing the entire message, and
> can't be calculated cheaply again
> 2) keep the SECTION data (possibly along with the bodystructure) - it's the
> offsets for the various parts of the message, same issue
> 3) add a list of "SUPPRESSED HEADERS". This would list any header which is
> present in the file, but NOT in the cache.
> 4) cache every other header, including all the To:, From:, Subject:, etc - in
> as close to raw form as possible.
>
> The entire list of headers to suppress would initially be:
>
> received
> dkim-signature
> domainkey-signature
> domainkey-x509
>
> But it would be configurable as an imapd.conf option.
>
> NOTE: you can still infer the presence or absence just by querying the
> suppressed list - so many messages the entire suppressed list would just be
> 'received'.
>
> This should take fairly similar space to what we have now, be more flexible,
> and be more future-proof.

However, I think the cache file is already big today. It causes extra disk I/O.

> No matter how you want to parse the fields, the original values is what
> you've got! Even if you change the list of headers you suppress, each cache
> record is complete in itself, so there's no loss of fidelity.
>
> It means a little more CPU to calculate the ENVELOPE, but seriously... I
> don't think it's a worry in the current world, and it's not so commonly
> requested anyway.

Completely agree.

> =====
>
> Thoughts?

Your proposal sounds good.
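As a rough illustration of points 3) and 4) above, here is a minimal Python sketch (not Cyrus code) of how a cache record could be split into raw cached headers plus a list of suppressed header names. The function name split_headers, the constant DEFAULT_SUPPRESSED and the record layout are my own assumptions for the example; the proposal only says the list would come from an imapd.conf option.

```python
# Default list taken from the proposal. In the proposal it would be read
# from an imapd.conf option; the option name is not specified there.
DEFAULT_SUPPRESSED = frozenset({
    "received",
    "dkim-signature",
    "domainkey-signature",
    "domainkey-x509",
})


def split_headers(raw_header_block, suppressed=DEFAULT_SUPPRESSED):
    """Split a raw header block into (cached, suppressed_present).

    cached: raw header fields, folding preserved, for every header whose
        name is NOT in the suppressed set.
    suppressed_present: sorted lowercase names of suppressed headers that
        do occur in the message, so their presence can still be inferred.
    """
    # First pass: glue folded continuation lines back onto their field.
    fields = []
    for line in raw_header_block.splitlines(keepends=True):
        if line[:1] in (" ", "\t") and fields:
            fields[-1] += line
        else:
            fields.append(line)

    # Second pass: keep the field raw, or only remember its name.
    cached = []
    present = set()
    for field in fields:
        name, sep, _ = field.partition(":")
        if not sep:
            continue  # not a header field (e.g. a stray blank line)
        key = name.strip().lower()
        if key in suppressed:
            present.add(key)
        else:
            cached.append(field)
    return cached, sorted(present)
```

With something like this, To:, From: and Subject: stay in the record in raw form, so an ENVELOPE could be rebuilt from the cached fields, while a client asking for Received: would be told by the suppressed list that the message file has to be read.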
The proposal is quite close to current Dovecot behavior, according to the documentation:

> Cache file may contain the following information for messages:
>
> Message headers (some, not all)
> Sent date (parsed Date: header)
> Received date (IMAP's INTERNALDATE field)
> Physical and virtual message sizes
> Message's parsed MIME structure, allowing to quickly read only a specific
> MIME part (IMAP's FETCH BODY[1.2.3] command)
> IMAP's BODY and BODYSTRUCTURE fields
>   If both are used, only BODYSTRUCTURE is saved, since BODY can be
>   generated from it
> IMAP's ENVELOPE isn't cached currently. Instead the headers used to build
> it are cached directly.

I also like the ability to drop old cached data that is no longer needed, and
the adaptive behavior that depends on how the IMAP clients work:

http://wiki2.dovecot.org/IndexFiles
http://wiki2.dovecot.org/Design/Indexes/Cache

However, I wonder what happens when a webmail user asks to sort mails by
sender if the From headers are not all cached!

Regards,
Sébastien
