On Nov 27, 2009, at 1:59 PM, Joshua Rodman wrote:

On Fri, Nov 27, 2009 at 01:06:04PM -0800, Joshua Juran wrote:
On Nov 27, 2009, at 10:04 AM, Simon Wistow wrote:

On Fri, Nov 27, 2009 at 01:29:26PM +0000, Roger Burton West said:
I should like to find the person who decided that since "bookmarks" and
"history" were both lists of URLs they ought to be integrated in a
single database. I should like to shake him warmly by the throat until
his head comes off.

ObJWZ: http://www.jwz.org/doc/mailsum.html

The loosely related formats specified only vaguely (or not at all) and
known collectively as 'mbox' are too broken to live.

From what I've seen, at any rate. I don't care how long it's been in use -- "we've always done it this way" is no excuse for data corruption. And yes, inserting garbage characters into my mail because mbox uses a common English term as a record separator counts as data corruption, especially
when it's the same character used for quoting.
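
To make the complaint concrete, here is a minimal sketch of the common "From "-quoting convention (often called "mboxo"). The function names are made up for illustration and are not taken from any particular mail tool. It assumes the writer prefixes '>' to body lines starting with "From " and the reader strips it again, which is lossy when the original body already contained a ">From " line.

```python
def mboxo_escape(body: str) -> str:
    """Prefix '>' to every body line that starts with 'From ' (assumed writer behaviour)."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.split("\n")
    )


def mboxo_unescape(body: str) -> str:
    """Strip one '>' from every line that starts with '>From ' (assumed reader behaviour)."""
    return "\n".join(
        line[1:] if line.startswith(">From ") else line
        for line in body.split("\n")
    )


# A body with one unquoted "From " line and one line that was already quoted.
original = "From what I've seen, it works.\n>From the quoted reply."
stored = mboxo_escape(original)
restored = mboxo_unescape(stored)

# The round trip does not give back the original text.
print(restored == original)
```

Two different bodies ("From x" and ">From x") end up stored as the same bytes, so no reader can tell which one was meant.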

Horribly, horribly broken. Although the format's brokenness isn't half as
hateful as the community's complacency in allowing it to persist.

Josh

True, but I prefer that kind of corruption to the corruption I
experienced every 2 weeks or so when I tried out the joys known as
Entourage and Thunderbird.

mbox sure sucks, but other people seem to manage to suck more.

Currently I'm trying out maildir with offlineimap, but the cost of
processing thousands of independent files can suck too.

Fixing mbox isn't hard -- it's easy enough to mandate that a leading '>' must be escaped -- but the trick is making sure people don't continue to use broken legacy tools, which is basically impossible. So instead I propose a new format where a mailbox is stored as a concatenation of messages, where each message is terminated by a dot and leading dots are prefixed with another dot. If you also used CRLF as a line terminator, it would match the SMTP wire protocol.
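
A rough sketch of the dot-terminated format proposed above, in Python. The function names and the choice to represent a message as a list of lines are assumptions for illustration only; lines are assumed not to contain the line terminator themselves.

```python
EOL = "\r\n"  # CRLF, matching the SMTP wire format mentioned above


def encode_mailbox(messages, eol=EOL):
    """Concatenate messages; each ends with a lone '.', leading dots are doubled."""
    out = []
    for lines in messages:
        for line in lines:
            if line.startswith("."):
                line = "." + line  # dot-stuffing
            out.append(line + eol)
        out.append("." + eol)  # message terminator
    return "".join(out)


def decode_mailbox(data, eol=EOL):
    """Inverse of encode_mailbox: split on lone '.', undo dot-stuffing."""
    messages, current = [], []
    # The data ends with a terminator, so the final split element is empty.
    for line in data.split(eol)[:-1]:
        if line == ".":
            messages.append(current)
            current = []
        else:
            if line.startswith("."):
                line = line[1:]  # remove the stuffed dot
            current.append(line)
    return messages


mailbox = [
    ["From what I've seen, at any rate.", ".", "..two dots", ">From quoted"],
    ["A second message."],
]
encoded = encode_mailbox(mailbox)
decoded = decode_mailbox(encoded)
print(decoded == mailbox)
```

A lone "." can only ever be a terminator, because any body line that starts with a dot is stored with an extra one, so the round trip is exact for any line content.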

This format would have the same complexity characteristics as mbox (i.e. using it would be no more or less efficient), would be just as human-readable, and would actually be more machine-readable, since there's no ambiguity as to what constitutes a record separator.

And as a plus, the messages wouldn't get corrupted.

Josh

