Petri Lehtinen <pe...@digip.org> added the comment:

endolith wrote:
> > - If the mailbox is written using the mboxrd format and read using
> > - the mboxo format, lines that were meant to start with ">From "
> > - are changed to ">>>From ". This is a new type of corruption.
>
> Well, yes. So the choices are:
>
> mboxrd as default: Sometimes results in corruption
> mboxo as default: Always results in corruption
I don't think so. Assuming that mboxo (the current default) was used to write the mailbox, both formats sometimes result in corruption:

mboxo as default: "From " lines get written (and subsequently read) as ">From ".

mboxrd as default: ">From " lines were written as ">From " but are read as "From ".

Furthermore, if Python's mailbox module is used to write the mbox file and another program that only supports mboxo (e.g. mutt) is used to read it, having mboxrd as the default would cause ">From " lines to be written as ">>From ". The reading software would then read these lines as ">>From ".

So, I'd like to keep the default as is, and add a parameter to switch to mboxrd when it's OK for the use case at hand. We should also clearly document that mboxrd is recommended, as it never corrupts data if used for both reading and writing.

> Is there a way to reliably detect the format of the file and produce
> an error if it seems to be reading it wrong?
>
> If not, maybe just include a function that guesses the format so the
> correct option can be found easily? If there are consecutive ">"
> quoted lines, like this, for instance:
>
> >This is the body.
> >>From my point of view
> >there are 3 lines.
>
> then it was probably encoded with mboxrd? If instead you find:
>
> >This is the body.
> >From my point of view
> >there are 3 lines.
>
> then it was probably encoded with mboxo?

It's not possible to detect the format automatically. Guessing as you suggested is too fragile: it might work in some situations but would fail in others. If it were possible to detect the format by guessing, I'm sure RFC 4155 would mention it, as it aims for the best possible outcome when reading any of the formats without knowing which one is actually in use.
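To make the read/write combinations above concrete, here is a minimal sketch that models "From " quoting one body line at a time. The helper names (`mboxo_write`, `mboxrd_read`, `round_trip`, etc.) are hypothetical and are not part of Python's `mailbox` module. The mboxo reader is assumed to do no unquoting, matching the mboxo behaviour described above; the mboxrd rules follow the usual description (writers add one ">" to lines matching `^>*From `, readers remove one from lines matching `^>+From `).

```python
import re

# Hypothetical helpers modeling per-line "From " quoting.
# They are NOT the API of Python's mailbox module.
_MBOXRD_NEEDS_QUOTE = re.compile(r">*From ")  # what an mboxrd writer quotes
_MBOXRD_QUOTED = re.compile(r">+From ")       # what an mboxrd reader unquotes


def mboxo_write(line: str) -> str:
    """mboxo: only lines starting with "From " get a '>' prepended."""
    return ">" + line if line.startswith("From ") else line


def mboxo_read(line: str) -> str:
    """Assumed mboxo reader that does no unquoting at all."""
    return line


def mboxrd_write(line: str) -> str:
    """mboxrd: every line matching ^>*From gets one more '>'."""
    return ">" + line if _MBOXRD_NEEDS_QUOTE.match(line) else line


def mboxrd_read(line: str) -> str:
    """mboxrd: every line matching ^>+From loses exactly one '>'."""
    return line[1:] if _MBOXRD_QUOTED.match(line) else line


WRITERS = {"mboxo": mboxo_write, "mboxrd": mboxrd_write}
READERS = {"mboxo": mboxo_read, "mboxrd": mboxrd_read}


def round_trip(line: str, writer: str, reader: str) -> str:
    """Write a body line with one format and read it back with another."""
    return READERS[reader](WRITERS[writer](line))


if __name__ == "__main__":
    for original in ("From here", ">From here"):
        for w in WRITERS:
            for r in READERS:
                result = round_trip(original, w, r)
                status = "ok" if result == original else "CORRUPTED"
                print(f"{original!r:14} {w:>6} -> {r:<6} {result!r:14} {status}")
```

Running it prints one row per writer/reader combination, and only mboxrd/mboxrd round-trips both lines. Note also that `mboxo_write("From here")` and `mboxo_write(">From here")` both return `">From here"`, so once a file has been written as mboxo, no reader can tell those two original lines apart.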
_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13698>
_______________________________________