I've modified my first setup. I'm now storing each mime-chunk as follows:

CREATE TABLE dbmail_partlists (
        physmessage_id  INTEGER NOT NULL,
        is_header       BOOLEAN DEFAULT '0' NOT NULL, -- each mime-header chunk is a header blob
        part_key        INTEGER DEFAULT '0' NOT NULL, -- simply a sequence per message
        part_depth      INTEGER DEFAULT '0' NOT NULL, -- used for message/rfc822 attachments
        part_order      INTEGER DEFAULT '0' NOT NULL,
        part_id         TEXT NOT NULL -- foreign key to mimeparts
);

CREATE TABLE dbmail_mimeparts (
        id      TEXT NOT NULL, -- primary key, sha1 digest
        data    TEXT NOT NULL,
        size    INTEGER NOT NULL
);

I've got insertion working beautifully. Attachments are stored encoded
but separate from the mime-headers that come with them; the headers are
inserted as their own chunks. The jury is still out on decoding the
attachments first, but I'll probably add that as well. In fact, that would
mean you could insert the same attachment under different filenames
using different encodings (base64/uuencode) and it would still be stored
only once. Me like much.
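
To illustrate why decoding before hashing would collapse different encodings
into one stored part, here is a rough sketch in Python (not dbmail's C code).
The helper name decoded_digest is made up for this example, and the uuencode
branch only handles bare body lines, without the "begin"/"end" framing a
real attachment carries.

```python
import base64
import binascii
import hashlib


def decoded_digest(payload: bytes, encoding: str) -> str:
    """Decode a transfer-encoded body and return the SHA-1 of the raw bytes.

    Hypothetical helper for illustration only.
    """
    if encoding == "base64":
        # b64decode ignores the line breaks that MIME inserts by default.
        raw = base64.b64decode(payload)
    elif encoding == "uuencode":
        # Each uuencoded line carries its own length byte; decode per line.
        raw = b"".join(binascii.a2b_uu(line) for line in payload.splitlines() if line)
    else:
        # Unknown or identity encoding: hash the payload as-is.
        raw = payload
    return hashlib.sha1(raw).hexdigest()


attachment = b"the same attachment bytes, sent twice"

# Two different wire representations of the same attachment.
as_base64 = base64.encodebytes(attachment)
as_uu = b"".join(
    binascii.b2a_uu(attachment[i:i + 45]) for i in range(0, len(attachment), 45)
)

digest_b64 = decoded_digest(as_base64, "base64")
digest_uu = decoded_digest(as_uu, "uuencode")
```

The encoded payloads differ byte for byte, so hashing them directly would give
two ids; hashing the decoded bytes gives one.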

So: inserting the same messages over and over doesn't add *anything* in
the mimeparts table.
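
The effect can be sketched roughly as follows, in Python with an in-memory
SQLite database rather than the actual dbmail code. The function names
store_chunk and store_message are invented for the example, and the
"INSERT OR IGNORE" plus PRIMARY KEY combination is just one SQLite-specific
way to skip a blob whose sha1 is already present.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE dbmail_mimeparts (
    id   TEXT NOT NULL PRIMARY KEY,  -- sha1 hex digest of data
    data TEXT NOT NULL,
    size INTEGER NOT NULL
);
CREATE TABLE dbmail_partlists (
    physmessage_id INTEGER NOT NULL,
    is_header      BOOLEAN DEFAULT '0' NOT NULL,
    part_key       INTEGER DEFAULT '0' NOT NULL,
    part_depth     INTEGER DEFAULT '0' NOT NULL,
    part_order     INTEGER DEFAULT '0' NOT NULL,
    part_id        TEXT NOT NULL
);
"""


def store_chunk(conn, physmessage_id, part_key, is_header, data):
    """Store one chunk; the blob row is written only if its digest is new."""
    digest = hashlib.sha1(data.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO dbmail_mimeparts (id, data, size) VALUES (?, ?, ?)",
        (digest, data, len(data)),
    )
    # The per-message list row is always added and points at the shared blob.
    conn.execute(
        "INSERT INTO dbmail_partlists (physmessage_id, is_header, part_key, part_id)"
        " VALUES (?, ?, ?, ?)",
        (physmessage_id, int(is_header), part_key, digest),
    )
    return digest


def store_message(conn, physmessage_id, chunks):
    """Store (is_header, data) chunks with part_key as a per-message sequence."""
    for key, (is_header, data) in enumerate(chunks, start=1):
        store_chunk(conn, physmessage_id, key, is_header, data)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

message = [
    (True, "Content-Type: text/plain\r\n"),
    (False, "hello world\r\n"),
]
store_message(conn, 1, message)
store_message(conn, 2, message)  # same content, second physmessage

part_rows = conn.execute("SELECT COUNT(*) FROM dbmail_mimeparts").fetchone()[0]
list_rows = conn.execute("SELECT COUNT(*) FROM dbmail_partlists").fetchone()[0]
```

After both inserts the partlists table has one row per chunk per message,
while mimeparts still holds only the two distinct blobs.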

Retrieval is almost done but not quite there yet. I'm still playing
around with the reconstruction and addition of the proper mime boundary
strings. Almost there though. Once that is done, I can update the rest
of the code (mostly some minor parts of imap) that talk to the
messageblks table directly.
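
For the retrieval side, the part that follows directly from the schema is
fetching a message's chunks back in order. A minimal sketch, again in Python
with SQLite and with a made-up function name (fetch_chunks) and placeholder
ids instead of real sha1 digests; it deliberately leaves out the boundary
reconstruction, which is the piece still being worked on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dbmail_mimeparts (
    id   TEXT NOT NULL PRIMARY KEY,
    data TEXT NOT NULL,
    size INTEGER NOT NULL
);
CREATE TABLE dbmail_partlists (
    physmessage_id INTEGER NOT NULL,
    is_header      BOOLEAN DEFAULT '0' NOT NULL,
    part_key       INTEGER DEFAULT '0' NOT NULL,
    part_depth     INTEGER DEFAULT '0' NOT NULL,
    part_order     INTEGER DEFAULT '0' NOT NULL,
    part_id        TEXT NOT NULL
);
""")

# Placeholder ids stand in for real sha1 digests.
conn.executemany(
    "INSERT INTO dbmail_mimeparts (id, data, size) VALUES (?, ?, ?)",
    [("id-hdr", "Subject: test\r\n", 15), ("id-body", "hello\r\n", 7)],
)
# Rows are inserted out of order on purpose; part_key restores the sequence.
conn.executemany(
    "INSERT INTO dbmail_partlists (physmessage_id, is_header, part_key, part_id)"
    " VALUES (?, ?, ?, ?)",
    [(1, 0, 2, "id-body"), (1, 1, 1, "id-hdr")],
)


def fetch_chunks(conn, physmessage_id):
    """Return (is_header, data) pairs for one message, ordered by part_key."""
    rows = conn.execute(
        "SELECT l.is_header, p.data"
        " FROM dbmail_partlists l"
        " JOIN dbmail_mimeparts p ON p.id = l.part_id"
        " WHERE l.physmessage_id = ?"
        " ORDER BY l.part_key",
        (physmessage_id,),
    ).fetchall()
    return [(bool(is_header), data) for is_header, data in rows]


chunks = fetch_chunks(conn, 1)
```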

git-branch:

http://nfg3.nfgs.net/var/git/dbmail.git#mimechunk




Jake Anderson wrote:
> Paul J Stevens wrote:
>> Aaron Stone wrote:
>>   
>>> I think we should keep things encoded, because that's what clients
>>> expect to receive. OTOH, encoded data cannot be searched.
>>>     
>>
>> I don't see how we can do both decoding and sha1 digests reliably at the
>> same time. Seems like asking for a *lot* of trouble that is simply not
>> worth it.
>>
>>   
> With all the talk of reducing file size I thought a basically free 30%
> reduction would be worth it lol.
> I don't see why it is so difficult if you are already separating the
> chunks? Does it not tell you if that chunk is encoded?
> If it is, then decode it, then hash it, then store it.
> 
> It wouldn't be on the fly, but since the whole message is assembled
> anyway before it gets sent to the database I don't see the big issue?
> (i.e. trying to avoid a memory copy at the expense of a 30% reduction in
> writes to disk doesn't sound worth it to me)
>> Doing search on binary attachments is not required by any RFC, and if
>> required should be done through a separate decode/stringify/index setup
>> (check the wiki).
>>   
> I agree with you there, there isn't a valid reason I can come up with
> for an end user wanting to search a binary attachment. Although
> especially with the adoption of plain text file formats for documents I
> can see that becoming a potentially winning feature. I have often been
> in the situation of needing to find a particular document somebody
> emailed amidst four bazillion (hah bazillion passes spell check) other
> emails from them about the same time. If the mime parts are decoded then
> they could conceivably use the same fulltext indexing the message bodies
> use?
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> Dbmail-dev mailing list
> Dbmail-dev@dbmail.org
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev


-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl