>Thu Sep 25 2008 13:00:43 EDT from IGnatius T [EMAIL PROTECTED]
>Subject: Re: Citadel commit log: revision 6628
>
> samjam: welcome :)
thanks!

> Let's get up to speed. First of all, the place you want to be looking isn't the MSG2 command, which is only used in the client protocol. You have to go deeper, into functions such as CtdlOutputMsg() and CtdlOutputPreloadedMsg(), which then tell the back end either to fetch the body or not to fetch it.
>
> Today I've finished some work that I thought was already completed -- previously, a caller could specify HEADERS_ALL, HEADERS_NONE, or HEADERS_ONLY. Now there is an additional mode, HEADERS_FAST, which will only fetch the "top level" headers -- in other words, the headers that are encoded into the database rather than the ones left in RFC822 format in the message body.
>
> The sieve module is using HEADERS_ONLY. This means that things like X-Spam-Status: should work perfectly now. Since the sieve module runs in the background (from the user's perspective), we don't have to worry too much about the performance hit.
>
> IMAP is another story. IMAP is using HEADERS_FAST because it is *absolutely* *imperative* that a FETCH operation on a large folder (10,000 to 100,000 messages, many of which contain multi-megabyte attachments) complete in a reasonable amount of time.

aye

> The common replies are:
>
>  * "No one should be keeping mailboxes that big."
>
>  * "If it's that big then they should just deal with the performance hit."
>
> Well, that sounds great if you're just a developer, but it doesn't work in the real world. For example, we use Citadel here at my workplace, and during the first year or so after we switched our email to Citadel, I had to deal with upper management types screaming directly at me because IMAP performance was not acceptable. If we switch away from HEADERS_FAST mode, I guarantee that Citadel will be removed from places which depend on the expected level of performance.
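To make the difference between the four modes concrete, here is a minimal C sketch of what each mode implies for the back end. The enum names (MODE_HEADERS_*) and both helper functions are hypothetical stand-ins; Citadel's real HEADERS_* constants and the logic inside CtdlOutputPreloadedMsg() are not reproduced here. The only point taken from the thread is that HEADERS_FAST can be served from the database-encoded top-level fields alone, while every other mode needs the stored message text, because that is where the RFC822 headers live.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Citadel's HEADERS_* modes; the real
 * constants and their values are defined in the Citadel source. */
enum header_mode {
    MODE_HEADERS_ALL,   /* top-level headers, RFC822 headers and body text */
    MODE_HEADERS_NONE,  /* body text without headers */
    MODE_HEADERS_ONLY,  /* top-level plus RFC822 headers, no body text */
    MODE_HEADERS_FAST   /* only the headers encoded into the database */
};

/* Must the back end load the stored message text for this mode?
 * RFC822 headers (Received:, Content-type:, X-Spam-Status:, ...) sit
 * inside that text, so only HEADERS_FAST can skip it. */
static bool mode_needs_body(enum header_mode mode)
{
    assert(mode <= MODE_HEADERS_FAST);
    return mode != MODE_HEADERS_FAST;
}

/* Will the output include headers that are still in RFC822 form? */
static bool mode_emits_rfc822_headers(enum header_mode mode)
{
    assert(mode <= MODE_HEADERS_FAST);
    return mode == MODE_HEADERS_ALL || mode == MODE_HEADERS_ONLY;
}
```

Under this split, a FETCH over a 100,000-message folder in HEADERS_FAST mode never loads message text, which is the cost IMAP is trying to avoid; the price is that headers such as Content-type: never reach the client.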
>
> That having been said, I do appreciate that we need to make things like Pocket Outlook work properly, so let's see if we can come up with a way to get the headers we need.

Pocket Outlook is just an (important) special case. It also worries me that an IMAP headers request should ever fail to return all of the headers. It's quite reasonable that IMAP spam filters will use Received: headers for spam decisions. (It's also weird that the WebCit headers view doesn't show all headers.)

> We've had to work with this problem before. For example, check out r5398, where we had to make a change in order to accommodate the iPhone, which does
>
>     UID FETCH (BODY.PEEK[HEADER] BODY.PEEK[TEXT])
>
> ...and then gets all bent out of shape when it doesn't see the Content-type: header where it expects it.
>
> One may ask, why isn't Content-type: in the top level headers, if it's so important? Normally, we *do* move headers into the top level when we determine that they're important. For example, List-ID: was moved just a couple of weeks ago. The answer has to do with MIME parsing. Since multipart nesting is a very recursive operation, and every level of nesting has its own Content-type: header, it made far more sense to keep Content-type: where it is, in order to allow the very same parser code to work at every level of nesting. We would have had to make ugly changes to the parser to say "at the top level, go with this content type, at other levels, extract it from the nested part..."

Indeed. There's MIME-Version:, as well.

> One solution might be to store Content-type: as *both* a top-level header *and* in the message body. There are two caveats to this approach:
>
> 1. It will only be effective for messages which arrive after we make the change. Existing messages will still appear wrong. Is this an acceptable caveat?

It is for me, because I'm a new adopter, but existing users who start using Pocket Outlook might squeak a bit.

> 2. It breaks our data model slightly, because up until now, one could count on a particular header only appearing in one place or the other. Additional code will be required to handle Content-type: as a special case, where it is copied to the top level headers, *not* stripped from the body, and then one copy *ignored* whenever it appears twice. Is the resulting benefit worth doing this?

I've got some other suggestions. The "other headers" could be stored separately, like the body is (but not in the same place), so that they can be fetched without fetching the body. (I don't like that idea.) Or the body could be fetched in small chunks until the end of the headers is spotted. Maybe the header size could be put in a top-level header? (Sounds fine to me.)

[I can't find the un-indent in the Citadel composer; I've had to use the HTML view to close and re-open the blockquote tags.]

Sam
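The chunked-read suggestion can be sketched as follows. Everything here is hypothetical: chunk_reader stands in for whatever partial-read facility the database layer might offer, and fetch_headers_chunked() is not a Citadel function. The loop reads fixed-size chunks, rescans only the last two bytes already seen plus the new chunk, and stops at the first blank line (LF LF or CRLF CRLF), so the remainder of a multi-megabyte body is never read. If the header length were recorded as a top-level field, as also proposed above, the scan could be replaced by a single read of exactly that many bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk reader: copies up to `max` bytes of the stored message
 * into `dst` and returns the count, or 0 once the message is exhausted.
 * The interface is an assumption, not part of Citadel's database API. */
typedef size_t (*chunk_reader)(void *ctx, char *dst, size_t max);

/* Search buf[from..len) for the blank line ending an RFC822 header block.
 * Returns the offset just past the blank line, or 0 if not found yet. */
static size_t blank_line_end(const char *buf, size_t from, size_t len)
{
    for (size_t i = from; i < len; i++) {
        if (buf[i] != '\n')
            continue;
        if (i + 1 < len && buf[i + 1] == '\n')
            return i + 2;                               /* LF LF */
        if (i + 2 < len && buf[i + 1] == '\r' && buf[i + 2] == '\n')
            return i + 3;                               /* CRLF CRLF */
    }
    return 0;
}

/* Read the message chunk by chunk until the end of the headers is seen.
 * Returns a malloc'd, NUL-terminated copy of the headers (blank line
 * included) and stores its length in *hdr_len; returns NULL if there is
 * no blank line or memory runs out. */
static char *fetch_headers_chunked(chunk_reader reader, void *ctx,
                                   size_t chunk_size, size_t *hdr_len)
{
    char *buf = NULL;
    size_t len = 0, cap = 0;

    assert(reader != NULL && chunk_size > 0 && hdr_len != NULL);
    for (;;) {
        if (len + chunk_size + 1 > cap) {           /* room for chunk + NUL */
            size_t newcap = cap ? cap * 2 : chunk_size * 4;
            while (newcap < len + chunk_size + 1)
                newcap *= 2;
            char *tmp = realloc(buf, newcap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap = newcap;
        }
        size_t n = reader(ctx, buf + len, chunk_size);
        if (n == 0) {                 /* end of message, no terminator */
            free(buf);
            return NULL;
        }
        /* A terminator may straddle the chunk boundary: back up 2 bytes. */
        size_t from = len > 2 ? len - 2 : 0;
        len += n;
        size_t end = blank_line_end(buf, from, len);
        if (end != 0) {
            buf[end] = '\0';          /* drop body bytes read past headers */
            *hdr_len = end;
            return buf;
        }
    }
}

/* Memory-backed reader, used only to exercise the function. */
struct mem_src {
    const char *data;
    size_t len, pos;
};

static size_t mem_read(void *ctx, char *dst, size_t max)
{
    struct mem_src *s = ctx;
    size_t n = s->len - s->pos;
    if (n > max)
        n = max;
    memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}
```

A real caller would pick a chunk size of a few kilobytes so that typical header blocks fit in one or two reads; the tiny chunk sizes in a test only serve to exercise the boundary handling.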