I've modified my first setup. I'm now storing each mime-chunk as follows:

CREATE TABLE dbmail_partlists (
    physmessage_id INTEGER NOT NULL,
    is_header BOOLEAN DEFAULT '0' NOT NULL,   -- each mime-header chunk is a header blob
    part_key INTEGER DEFAULT '0' NOT NULL,    -- simply a sequence per message
    part_depth INTEGER DEFAULT '0' NOT NULL,  -- used for message/rfc822 attachments
    part_order INTEGER DEFAULT '0' NOT NULL,
    part_id TEXT NOT NULL                     -- foreign key to mimeparts
);

CREATE TABLE dbmail_mimeparts (
    id TEXT NOT NULL,       -- primary key: sha1 of the data
    data TEXT NOT NULL,
    size INTEGER NOT NULL
);

I've got insertion working beautifully. Attachments are stored encoded
but separate from the mime-headers that come with them. Those are
inserted separately.

The jury is still out as to decoding the attachments first. I'll
probably add that as well. In fact that would mean you could insert the
same attachment under different filenames using different encodings
(base64/uuencode) and it would still be stored only once. Me like much.

So: inserting the same messages over and over doesn't add *anything* in
the mimeparts table.

Retrieval is almost done but not quite there yet. I'm still playing
around with the reconstruction and addition of the proper mime boundary
strings. Almost there though.

Once that is done, I can update the rest of the code (mostly some minor
parts of imap) that talk to the messageblks table directly.

git-branch: http://nfg3.nfgs.net/var/git/dbmail.git#mimechunk

Jake Anderson wrote:
> Paul J Stevens wrote:
>> Aaron Stone wrote:
>>
>>> I think we should keep things encoded, because that's what clients
>>> expect to receive. OTOH, encoded data cannot be searched.
>>>
>>
>> I don't see how we can do both decoding and sha1 digests reliably at the
>> same time. Seems like asking for a *lot* of trouble that is simply not
>> worth it.
>>
>>
> With all the talk of reducing file size I thought a basically free 30%
> reduction would be worth it lol.
> I don't see why it is so difficult if you are already separating the
> chunks? Does it not tell you if that chunk is encoded?
> If it is then decode it, then hash it then store it.
>
> It wouldn't be on the fly but since the whole message is assembled
> anyway before it gets sent to the database I don't see the big issue?
> (i.e. trying to halt a memory copy at the expense of a 30% reduction in
> write to disk doesn't sound worth it to me)
>> Doing search on binary attachments is not required by any RFC, and if
>> required should be done through a separate decode/stringify/index setup
>> (check the wiki).
>>
> I agree with you there, there isn't a valid reason I can come up with
> for an end user wanting to search a binary attachment. Although
> especially with the adoption of plain text file formats for documents I
> can see that becoming a potentially winning feature. I have often been
> in the situation of needing to find a particular document somebody
> emailed amidst four bazillion (hah bazillion passes spell check) other
> emails from them about the same time. If the mime parts are decoded then
> they could conceivably use the same fulltext indexing the message bodies
> use?
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Dbmail-dev mailing list
> Dbmail-dev@dbmail.org
> http://twister.fastxs.net/mailman/listinfo/dbmail-dev

-- 
  ________________________________________________________________
  Paul Stevens                                      paul at nfg.nl
  NET FACILITIES GROUP                     GPG/PGP: 1024D/11F8CD31
  The Netherlands________________________________http://www.nfg.nl
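
A minimal sketch of the deduplicating insert discussed in this thread,
written in Python against an in-memory SQLite database rather than
dbmail's own C code. It is only an illustration under stated
assumptions: the helper names (open_store, store_part, count_parts) are
hypothetical, the data column is declared BLOB here instead of TEXT so
raw bytes can be stored, and "INSERT OR IGNORE" is SQLite-specific. The
idea shown is the one proposed above: decode the chunk, hash it with
sha1, and use the digest as the primary key so identical content is
stored only once.

```python
import base64
import hashlib
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory table shaped like dbmail_mimeparts (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dbmail_mimeparts ("
        " id TEXT NOT NULL PRIMARY KEY,"  # sha1 hex digest of the data
        " data BLOB NOT NULL,"            # assumption: BLOB instead of TEXT
        " size INTEGER NOT NULL)"
    )
    return conn


def store_part(conn: sqlite3.Connection, data: bytes) -> str:
    """Store a chunk keyed by its sha1; an existing key is left untouched."""
    part_id = hashlib.sha1(data).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO dbmail_mimeparts (id, data, size) VALUES (?, ?, ?)",
        (part_id, data, len(data)),
    )
    return part_id


def count_parts(conn: sqlite3.Connection) -> int:
    """Number of distinct chunks currently stored."""
    return conn.execute("SELECT COUNT(*) FROM dbmail_mimeparts").fetchone()[0]


conn = open_store()
payload = b"the same attachment bytes"

# Inserting the same chunk twice yields the same key and a single row.
first = store_part(conn, payload)
second = store_part(conn, payload)

# Two different base64 renderings of the same payload (the second one
# carries a trailing newline). Decoding before hashing maps both back
# to the same key.
encoded_a = base64.b64encode(payload)
encoded_b = base64.encodebytes(payload)
third = store_part(conn, base64.b64decode(encoded_a))
fourth = store_part(conn, base64.b64decode(encoded_b))

print(count_parts(conn))
```

Hashing the encoded text instead would give encoded_a and encoded_b
different digests, which is why decoding first lets the same attachment
be stored once regardless of how it was encoded in transit.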