On Wed, Sep 14, 2016 at 12:38:20PM -0700, Jeff King wrote:
> On Wed, Sep 14, 2016 at 12:30:06PM -0700, Junio C Hamano wrote:
>
> > Another small thing I am not sure about is if the \ quoting can hide
> > an embedded newline in the author name. Would we end up turning
> >
> >     From: "Jeff \
> >         King" <p...@peff.net>
> >
> > or somesuch into
> >
> >     Author: Jeff
> >     King
> >     Email: p...@peff.net
> >
> > ;-)
>
> Heh, yeah. That is another reason to clean up and sanitize as much as
> possible before stuffing it into another text format that will be
> parsed.

A quoted string cannot contain newlines according to the RFC, so I think
we don't need to care about that.

> > > So let's roll the \" -> " into mailinfo.
> >
> > I am not sure if we also should remove the surrounding "", i.e. we
> > currently do not turn this
> >
> >     From: "Jeff King" <p...@peff.net>
> >
> > into this:
> >
> >     Author: Jeff King
> >     Email: p...@peff.net
> >
> > I think we probably should, and remove the one that does so from the
> > reader.
>
> I think you have to, or else you cannot tell the difference between
> surrounding quotes that need to be stripped, and ones that were
> backslash-escaped. Like:
>
>   From: "Jeff King" <p...@peff.net>
>   From: \"Jeff King\" <p...@peff.net>
>
> which would both become:
>
>   Author: "Jeff King"
>   Email: p...@peff.net
>
> though I am not sure the latter one is actually valid; you might need to
> be inside syntactic quotes in order to include backslashed quotes. I
> haven't read rfc2822 carefully recently enough to know.
>
> Anyway, I think that:
>
>   From: One "Two \"Three\" Four" Five
>
> may also be valid. So the quote-stripping in the reader is not just "at
> the outside", but may need to handle interior syntactic quotes, too. So
> it really makes sense for me to clean and sanitize as much as possible
> in one step, and then make the parser of mailinfo as dumb as possible.

Makes sense. The current iteration of my patch already strips exterior
quotes, no matter where they happen. I will send a patch soon.
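
To make the one-pass cleanup concrete, here is a rough sketch of the
unquoting discussed above. It is written in Python for brevity and is not
the mailinfo patch itself. The function name unquote_display_name is
hypothetical, and leaving a backslash untouched when it appears outside
syntactic quotes is an assumption, since the thread leaves open whether
that input is valid at all.

```python
def unquote_display_name(raw: str) -> str:
    """Drop syntactic double quotes and undo backslash escapes inside them.

    Illustrative only: the name and the handling of backslashes outside
    quotes are assumptions, not taken from git's mailinfo code.
    """
    out = []
    in_quotes = False
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == '"':
            # An unescaped quote is syntactic: toggle state and drop it,
            # wherever in the name it occurs.
            in_quotes = not in_quotes
        elif ch == "\\" and in_quotes and i + 1 < len(raw):
            # Inside a quoted string, a backslash makes the next
            # character literal, so keep that character and skip the "\".
            i += 1
            out.append(raw[i])
        else:
            out.append(ch)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    print(unquote_display_name('"Jeff King"'))
    print(unquote_display_name('One "Two \\"Three\\" Four" Five'))
```

Because quotes are dropped wherever they toggle the state, interior
syntactic quotes are handled the same way as the outermost pair, which
lets the reader of the mailinfo output stay dumb.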