Re: [Dovecot] (Single instance) attachment storage

William Blunn Mon, 19 Jul 2010 09:29:43 -0700

Timo Sirainen wrote:

Now that v2.0.0 is only waiting for people to report bugs (and me to figure out 
how to fix them), I've finally had time to start doing what I actually came 
here (Portugal Telecom/SAPO) to do. :)


The idea is to have dbox and mdbox support saving attachments (or MIME parts in 
general) to separate files, which with some magic gives a possibility to do 
single instance attachment storage. Comments welcome.


Cool.

Extra features
--------------

The attachment files begin with an extensible header. This allows a couple of 
extra features to reduce disk space:

1) The attachment could be compressed (header contains compressed-flag)


Cool.

2) If base64 attachment is in a standardized form that can be 100% reliably 
converted back to its original form, it could be stored decoded and then 
encoded back to original on the fly.


Cool.

I have thought about this issue in the past. What follows may be obvious to you already, but might as well mention rather than missing something.

Presumably you want to be able to recreate the original base64 stream exactly verbatim?

Under base64, the number of 4-byte (encoded) / 3-byte (decoded) cells per line is not fixed by the specs.

I believe the optimal value is 19 cells per line, but I have seen some systems use 18 cells per line, and I think I have seen 15 as well. Once you have three possibilities, you might as well just store the number of cells per line.

I would suggest considering the base64 format as (conceptually) having an (integer) parameter for the number of cells in each line (except for the last line).

So base64(19) would have on each line 19 cells encoding 57 (19 × 3) bytes into 76 (19 × 4) bytes.
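To make the arithmetic concrete, here is a small sketch in Python. The function name `line_sizes` is made up for illustration. The 76-character limit on encoded lines comes from RFC 2045, which is why 19 is the largest cell count that fits.

```python
from typing import Tuple

MAX_ENCODED_LINE = 76  # RFC 2045 limit on the length of an encoded line


def line_sizes(cells_per_line: int) -> Tuple[int, int]:
    """Return (decoded_bytes, encoded_chars) for one full line of base64(N)."""
    return cells_per_line * 3, cells_per_line * 4


# Largest whole number of 4-character cells that fits in one line.
max_cells = MAX_ENCODED_LINE // 4

for cells in (15, 18, 19):
    decoded, encoded = line_sizes(cells)
    print(f"base64({cells}): {decoded} bytes -> {encoded} chars per line")
```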

Probably you would need to have a base64 matcher/decoder which is smarter than normal base64 decoders and checks to make sure that all lines (apart from the last) are encoded (a) canonically (e.g. with no trailing whitespace), and (b) using the same number of cells per line.

The base64 matcher/decoder needs to return information about the cell count as well as the decoded data.

If any line is not canonical base64 or uses a different number of cells, then the base64 may still be valid but "weird" so would just be stored as the original base64 stream.
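A rough sketch of such a strict matcher/decoder, in Python. The name `match_base64` and the `newline` parameter are hypothetical, and this is not how Dovecot does it. It assumes the stream passed in ends at the last encoded character, and that lines are separated by a single known terminator (LF here; pass `b"\r\n"` for CRLF). Returning `None` means "weird, store the original stream as-is".

```python
import base64
import binascii
from typing import Optional, Tuple


def match_base64(stream: bytes, newline: bytes = b"\n") -> Optional[Tuple[bytes, int]]:
    """Return (decoded_data, cells_per_line), or None if the stream is "weird"."""
    lines = stream.split(newline)
    width = len(lines[0])
    # The first line fixes the cell count; it must be a whole number of cells.
    if width == 0 or width % 4 != 0:
        return None
    # Every line except the last must use exactly the same number of cells.
    if any(len(line) != width for line in lines[:-1]):
        return None
    # The last line may be shorter, but not empty and not longer.
    last = lines[-1]
    if not last or len(last) > width:
        return None
    flat = b"".join(lines)
    try:
        # validate=True rejects whitespace and other non-alphabet characters.
        decoded = base64.b64decode(flat, validate=True)
    except binascii.Error:
        return None
    # Re-encoding must give back the same characters, otherwise the input
    # was valid but not canonical (e.g. stray bits before the padding).
    if base64.b64encode(decoded) != flat:
        return None
    return decoded, width // 4
```

The final re-encode comparison is deliberately paranoid: if it passes, an encoder given the same cell count can reproduce the stream byte for byte.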

When recovering message data, obviously your base64 encoder needs to use a parameter which is the number of cells per line to encode. Then you get back your original base64 stream verbatim.
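The encoding side could look something like this Python sketch. The name `encode_base64` and the `newline` parameter are my own, and I am assuming the line terminator is known (LF by default) and that no terminator follows the last line.

```python
import base64


def encode_base64(data: bytes, cells_per_line: int, newline: bytes = b"\n") -> bytes:
    """Encode data as base64 with a fixed number of 4-character cells per line."""
    if cells_per_line < 1:
        raise ValueError("cells_per_line must be at least 1")
    flat = base64.b64encode(data)
    width = cells_per_line * 4
    # Cut the flat encoding into lines of `width` characters; only the
    # last line may be shorter.
    lines = [flat[i:i + width] for i in range(0, len(flat), width)]
    return newline.join(lines)
```

For example, `encode_base64(b"ABCDEF", 1)` gives `b"QUJD\nREVG"`, and with 19 cells per line the output has the familiar 76-character lines.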

==

Some systems finish the base64 stream with a newline (which in a multipart manifests as a blank line between the base64 stream and the '--' of the MIME boundary), whereas some systems finish the base64 stream at the end of the final 4-byte cell (which in a multipart manifests as the '--' of the MIME boundary appearing on the line immediately following the base64 encoded data). Your encoding allows for arbitrary data between the objects, so you would have no problem storing these two cases verbatim. But something to watch out for when storing.
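One way to handle both cases, sketched in Python: split off the final line terminator, if there is one, and leave it with the data that surrounds the attachment. The helper name `split_trailing_newline` is hypothetical, and the terminator is assumed to be known (LF by default).

```python
from typing import Tuple


def split_trailing_newline(stream: bytes, newline: bytes = b"\n") -> Tuple[bytes, bytes]:
    """Split a base64 stream into (body, tail).

    `tail` is the final line terminator if the stream has one, else empty.
    The tail can stay in the surrounding message data, so only `body`
    needs to go through the base64 decoder.
    """
    if stream.endswith(newline):
        return stream[:-len(newline)], newline
    return stream, b""
```

In both cases `body + tail` is the original stream, so nothing is lost either way.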

==

Maybe it would be a good idea to have the ability to say that an object was base64 decoded AND compressed (i.e. to recover the original stream fragment you need to decompress and base64 encode (with the relevant number of base64 cells per line)) --- as well as options for just base64 decoded or just compressed.

You could go nuts and say that it is an arbitrarily-sized filter stack, but my first guess would be that this would be too much flexibility.


It might be better to say that there can be
zero or one decode/encode layers (like base64 or something else), and
zero or one compression layers (like gzip or bzip2 or xz/LZMA).

Obviously whatever translations are required to recover the original stream should be encoded into the attachment file so that sysadmins can tune the storage algorithm without affecting previously stored attachments.
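A sketch of what I mean, in Python. Everything here is hypothetical: the record layout (a plain dict standing in for the attachment file header), the field names, and the choice of zlib as the one compression layer. It is not the actual dbox format. The point is only that each stored object records which layers were applied, so recovery never depends on the current configuration.

```python
import base64
import binascii
import zlib


def wrap(flat: bytes, cells: int) -> bytes:
    """Break flat base64 text into LF-separated lines of `cells` cells each."""
    width = cells * 4
    return b"\n".join(flat[i:i + width] for i in range(0, len(flat), width))


def store(stream: bytes, compress: bool = True) -> dict:
    """Apply zero or one decode layer, then zero or one compression layer."""
    record = {"encoding": "none", "cells": 0, "compression": "none"}
    payload = stream
    first_line = stream.split(b"\n", 1)[0]
    if first_line and len(first_line) % 4 == 0:
        cells = len(first_line) // 4
        try:
            decoded = base64.b64decode(stream.replace(b"\n", b""), validate=True)
        except binascii.Error:
            decoded = None
        # Store decoded only if re-encoding reproduces the stream exactly.
        if decoded is not None and wrap(base64.b64encode(decoded), cells) == stream:
            record["encoding"] = "base64"
            record["cells"] = cells
            payload = decoded
    if compress:
        payload = zlib.compress(payload)
        record["compression"] = "zlib"
    record["payload"] = payload
    return record


def recover(record: dict) -> bytes:
    """Undo the layers named in the record, in reverse order."""
    payload = record["payload"]
    if record["compression"] == "zlib":
        payload = zlib.decompress(payload)
    if record["encoding"] == "base64":
        payload = wrap(base64.b64encode(payload), record["cells"])
    return payload
```

A sysadmin could turn compression off later (`compress=False`) and objects stored earlier would still be recovered correctly, since `recover` looks only at the record.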


Bill
