On Mon, 2010-07-19 at 18:30 +0100, William Blunn wrote:
> Consider storing the recovery filter stack in the dbox metadata rather
> than the attachment file.
>
> This has a couple of upshots:
>
> 1. If one person receives a message with an attachment which is encoded
> with base64 at say 19 cells (76 bytes) per line, and then re-sends the
> same file as an attachment to someone else but their MUA encodes base64
> at say 18 cells (72 bytes) per line, the attachment file can contain
> exactly the same data, allowing for deduplication even in this case.

I thought about that also, but it would require calculating and using a
hash of the decoded message (but not the compressed message). Could get
complex.

> 2. Assuming we have configured Dovecot to decode base64 but not to
> compress, then the file in which we store the attachment data contains
> literally the exact same byte stream as if the attachment were saved out
> from the MUA. I don't know what practical use this might be, but it
> /sounds/ cool :-) Perhaps a suitable filesystem or backup-system could
> deduplicate both a file *and* its instance as a message attachment.

I was thinking about adding some small header to the dbox file, so they
wouldn't be completely identical.

BTW, I was thinking about using "number of characters per base64 line"
rather than "number of cells". I don't think it's required that a line
ends with a complete cell.
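
To make the idea concrete, here is a rough Python sketch (not Dovecot
code). It hashes the decoded attachment and keeps the line length in
characters as the only recovery information. The function names, the CRLF
line endings and the choice of SHA-256 are all assumptions made for
illustration; nothing here reflects an actual dbox format.

```python
import base64
import hashlib


def encode_lines(data: bytes, chars_per_line: int) -> bytes:
    """Base64-encode data, wrapped at chars_per_line characters (CRLF assumed)."""
    b64 = base64.b64encode(data)
    lines = [b64[i:i + chars_per_line] for i in range(0, len(b64), chars_per_line)]
    return b"\r\n".join(lines) + b"\r\n"


def decode_attachment(encoded: bytes):
    """Return (decoded bytes, chars per line, hash of decoded bytes).

    Returns None when re-encoding with the recorded line length would not
    reproduce the original bytes exactly; such an attachment would have to
    be stored as-is.
    """
    chars_per_line = len(encoded.split(b"\r\n")[0])
    if chars_per_line == 0:
        return None
    # b64decode ignores the CRLFs between lines by default.
    data = base64.b64decode(encoded)
    if encode_lines(data, chars_per_line) != encoded:
        return None
    # Hash the decoded bytes, so the MUA's line wrapping does not matter.
    return data, chars_per_line, hashlib.sha256(data).hexdigest()


payload = bytes(range(256)) * 4
first = decode_attachment(encode_lines(payload, 76))
second = decode_attachment(encode_lines(payload, 72))
# Same hash for both, so the decoded attachment could be stored once.
print(first[2] == second[2], first[1], second[1])
```

The line length is counted in characters, so a value like 75 that does not
end on a complete cell round-trips just as well as 76 or 72.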