Re: [Dbmail-dev] header storage schema changes. (W)Here we (may) go.

Aaron Stone Mon, 3 Jan 2005 11:01:23 +0100 (CET)

Paul J Stevens <[EMAIL PROTECTED]> said:

>>>> This structure will make it very easy to query all the headers from a 
>>>> given message or find all the messages with a given header, or a 
>>>> given header value.  It also leaves our current structure intact 
>>>> which will make it easier to phase in.
> 
> It will work, but you still require a lookup on an extra table to do
> header searches. There has to be hidden cost there. Also the added
> db-interaction at insertion time is very significant.


We could probably do it in one query:

  SELECT header, header_idnr FROM dbmail_headers
    WHERE header IN ('To', 'From', 'Subject', etc...)

[snip]
> preparse certain headers for common attributes: the in-reply-to and
> references headers will be used for threading. These headers contain 
> one or more message-id header values, which are stored separately.
> With this approach building message-threads can be done with fully
> indexed, single-table queries! Of course a union with the msgids from
> the opened mailbox is still required, but you can't beat such a setup
> wrt threading.

Neato!

[snip]
>>>> What do you think?  I don't think we need to special case any headers 
>>>> not even sendername or subject.
> 
> The more I think about this, the more convinced I am that we do need to
> treat common headers differently in the end. And your suggestion to
> preseed the header_names table and hardcode their ids tells me we're
> agreed here.

I'm skeptical. If we can remain generic at similar cost, let's go generic.
Performance numbers showing that special casing is *significantly* faster
should be a hard requirement here.

[snip]
> If I want to store mime-part headers the same as message-headers, first
> thing I have to do is change the physmessage_id references in the header
> tables to messageblk_idnr references. Next I would have to store mime-part
> headers same as message-headers.

Is that going to be smart enough to know *which* mime part is in which
messageblk? I'm still only getting a mental picture of knowing which
messageblks correspond to which physmessage and not the association of
mime parts to messageblks. Wouldn't we need one more layer of relation
table?

> Finally I would have to change the messageblks model
> 
> from:
> 
> block[0]: message-header
> block[1]: message-body[slice1]
> block[N]: message-body[sliceN]
> 
> to
> 
> block[0]: raw message-headers
> block[1]: mime-part[0]
> block[N]: mime-part[N]

block[0]: raw message-headers
block[1]: mime-part[0][slice1]
block[N]: etc.

If we decide to keep the slicing model for large mime parts.

> Where the only really tricky part is converting an existing dbmail
> storage to this modified approach. Still, way cool stuff.

Reparse everything! It'll be fun to write the migration tool :-P


Aaron

Re: [Dbmail-dev] header storage schema changes. (W)Here we (may) go.

Reply via email to