On Mon, Sep 19, 2016 at 12:51:33PM +0200, Kevin Daudt wrote:

> > I didn't look in the RFC. Is:
> > 
> >   From: my \"name\" <f...@example.com>
> > 
> > really the same as:
> > 
> >   From: "my \\\"name\\\"" <f...@example.com>
> > 
> > ? That seems weird, but I think it may be that the former is simply
> > bogus (you are not supposed to use backslashes outside of the quoted
> > section at all).
> 
> Correct, the quoted-pair (escape sequence) can only occur in a quoted
> string or a comment. Even more so, the display name *needs* to be quoted
> when consisting of more than one word according to the RFC.

Hmm. So, I guess a follow-up question is: what would be OK to do if
we see a quoted-pair outside of quotes? If the top one above violates
the RFC, it seems like stripping the backslashes would be a reasonable
outcome.

So if that's the case, do we actually need to care if we see any
parenthesized comments? I think we should just leave comments in place
either way, so syntactically they are only interesting insofar as we
replace quoted pairs or not.

IOW, I wonder if:

  while ((c = *in++)) {
        switch (c) {
        case '\\':
                if (!*in)
                        return 0; /* ignore trailing backslash */
                /* quoted pair */
                strbuf_addch(out, *in++);
                break;
        case '"':
                /*
                 * This may be starting or ending a quoted section,
                 * but we do not care whether we are in such a section.
                 * We _do_ need to remove the quotes, though, as they
                 * are syntactic.
                 */
                break;
        default:
                /*
                 * Anything else is a normal character we keep. These
                 * _might_ be violating the RFC if they are magic
                 * characters outside of a quoted section, but we'd
                 * rather be liberal and pass them through.
                 */
                strbuf_addch(out, c);
                break;
        }
  }

would work. I certainly do not mind following the RFC more closely, but
AFAICT the very simple code above gives a pretty forgiving outcome.
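
To make it easy to try the two From: examples from the top of this mail,
here is a minimal standalone sketch of the same loop. It assumes a
caller-supplied fixed-size buffer in place of a strbuf, and the name
unquote_display_name() is made up for illustration; it is not a proposed
patch.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not git's actual code): copy "in" to "out",
 * dropping double quotes and replacing each quoted pair with the
 * character it escapes. Returns the number of bytes written (not
 * counting the NUL), or -1 if "out" is too small.
 */
static int unquote_display_name(const char *in, char *out, size_t outlen)
{
    size_t n = 0;
    char c;

    if (!outlen)
        return -1;
    while ((c = *in++)) {
        if (c == '"')
            continue; /* quotes are syntactic; drop them */
        if (c == '\\') {
            if (!*in)
                break; /* ignore trailing backslash */
            c = *in++; /* quoted pair: keep the escaped character */
        }
        if (n + 1 >= outlen)
            return -1; /* no room for this byte plus the NUL */
        out[n++] = c;
    }
    out[n] = '\0';
    return (int)n;
}
```

With this sketch, the unquoted form (my \"name\") comes out as
my "name", while the quoted form ("my \\\"name\\\"") comes out as
my \"name\", so the loop does not treat the two spellings as equivalent.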

> > This is obviously getting pretty silly, but if we are going to follow
> > the RFC, I think you actually have to do a recursive parse, and keep
> > track of an arbitrary depth of context.
> > 
> > I dunno. This method probably covers most cases in practice, and it's
> > easy to reason about.
> 
> The problem is, how do you differentiate between nested comments, and
> escaped braces within a comment after one run?

I'm not sure what you mean. Escaped characters are always handled first
in your loop. Can you give an example (although if you agree with what I
wrote above, it may not be worth discussing further)?
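
To illustrate what I mean by "handled first", here is a rough sketch
(not the loop from your patch; comment_depth() is a made-up name, and it
deliberately ignores quoted strings). Because the backslash case
consumes the following character before the parentheses are looked at,
an escaped paren never changes the nesting depth:

```c
#include <stddef.h>

/*
 * Hypothetical illustration: return the maximum comment nesting depth
 * in "in", or -1 if the parentheses are unbalanced. Quoted pairs are
 * consumed before parentheses are examined, so "\(" and "\)" are
 * treated as ordinary characters.
 */
static int comment_depth(const char *in)
{
    int depth = 0, max = 0;
    char c;

    while ((c = *in++)) {
        if (c == '\\') {
            if (!*in)
                break; /* trailing backslash; nothing to skip */
            in++; /* skip the escaped character entirely */
            continue;
        }
        if (c == '(') {
            if (++depth > max)
                max = depth;
        } else if (c == ')') {
            if (--depth < 0)
                return -1; /* close without a matching open */
        }
    }
    return depth ? -1 : max;
}
```

Here "(a (b) c)" has depth 2, while "(a \(b\) c)" has depth 1.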

-Peff
