From: Jonathan Tan [mailto:jonathanta...@google.com]
> On 01/12/2017 01:20 AM, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawil...@microsoft.com>
> >
> > Extend the --scissors mechanism to strip off the preamble created by
> > forwarding a patch.  There are a couple of extra headers ("Sent" and
> > "To") added by forwarding, but other than that, the --scissors option
> > will now remove patches forwarded from Microsoft Outlook to a Linux
> > email account.
> >
> > Signed-off-by: Matthew Wilcox <mawil...@microsoft.com>
> 
> Also add a test showing the kind of message that the current code
> doesn't handle, and that this commit addresses.

OK.  For the sake of discussion, here's what the base64-decoded message looks 
like:

--- 8< ---

-----Original Message-----
From: Rehas Sachdeva [mailto:aquan...@gmail.com]
Sent: Wednesday, January 4, 2017 11:55 AM
To: Matthew Wilcox <mawil...@microsoft.com>; r...@surriel.com
Subject: [PATCH v3] radix tree test suite: Dial down verbosity with -v

Make the output of radix tree test suite less verbose by default and add
-v and -vv command line options for increasing level of verbosity.

--- >8 ---

> > ---
> >  mailinfo.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/mailinfo.c b/mailinfo.c
> > index 2059704a8..fc1275532 100644
> > --- a/mailinfo.c
> > +++ b/mailinfo.c
> > @@ -332,7 +332,7 @@ static void cleanup_subject(struct mailinfo *mi, struct strbuf *subject)
> >
> >  #define MAX_HDR_PARSED 10
> >  static const char *header[MAX_HDR_PARSED] = {
> > -   "From","Subject","Date",
> > +   "From","Subject","Date","Sent","To",
> 
> Are these extra headers used in both the "real" e-mail headers and the
> in-body headers, or only one of them? (If the latter, they should
> probably be handled only in the relevant function - my previous patches
> to this file were in that direction too, if I remember correctly.) Also,
> I suspect that these will have to be handled differently to the other 3,
> but that will be clearer when you add the test with an example message.

Without this part of the patch, using --scissors means it stops parsing headers 
when it hits the 'Sent' line.  So all I'm looking for is to have 
'is_inbody_header()' return true.  If there's a better way to achieve that, I'm 
all for it.

Without any of this patch, the commit looks like this:

Author: Matthew Wilcox <mawil...@microsoft.com>
Date:   Sat Jan 7 18:33:43 2017 +0000

    FW: [PATCH v3] radix tree test suite: Dial down verbosity with -v

    -----Original Message-----
    From: Rehas Sachdeva [mailto:aquan...@gmail.com]
    Sent: Wednesday, January 4, 2017 11:55 AM
    To: Matthew Wilcox <mawil...@microsoft.com>; r...@surriel.com
    Subject: [PATCH v3] radix tree test suite: Dial down verbosity with -v

    Make the output of radix tree test suite less verbose by default and add
    -v and -vv command line options for increasing level of verbosity.

Without this hunk of the patch, the commit looks like this:

Author: Rehas Sachdeva <[mailto:aquan...@gmail.com]>
Date:   Sat Jan 7 18:33:43 2017 +0000

    FW: [PATCH v3] radix tree test suite: Dial down verbosity with -v

    Sent: Wednesday, January 4, 2017 11:55 AM
    To: Matthew Wilcox <mawil...@microsoft.com>; r...@surriel.com
    Subject: [PATCH v3] radix tree test suite: Dial down verbosity with -v

    Make the output of radix tree test suite less verbose by default and add
    -v and -vv command line options for increasing level of verbosity.

And with this hunk, it turns into this:

Author: Rehas Sachdeva <[mailto:aquan...@gmail.com]>
Date:   Sat Jan 7 18:33:43 2017 +0000

    radix tree test suite: Dial down verbosity with -v

    Make the output of radix tree test suite less verbose by default and add
    -v and -vv command line options for increasing level of verbosity.


There's more work to do here, turning that silly <[mailto:]> into a proper 
email address, for example, and parsing Sent: like we parse Date:, but I 
thought it worth sending out patches 1 & 2 before writing patch 3.
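
For the sake of discussion, the address cleanup mentioned above could look
roughly like the sketch below.  This is not code from the patch series or
from git; unwrap_mailto() and its buffer-based interface are hypothetical,
and the only assumption about Outlook's format is the "[mailto:ADDRESS]"
shape visible in the example message.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of git): if `in` has the form
 * "[mailto:ADDRESS]", copy ADDRESS into `out` and return 1.
 * Otherwise copy `in` unchanged and return 0.  The result is
 * truncated if it does not fit in `outsz` bytes.
 */
static int unwrap_mailto(const char *in, char *out, size_t outsz)
{
	static const char prefix[] = "[mailto:";
	size_t plen = strlen(prefix);
	size_t len = strlen(in);
	int wrapped = 0;

	if (!outsz)
		return 0;

	if (len > plen + 1 && !strncmp(in, prefix, plen) &&
	    in[len - 1] == ']') {
		in += plen;        /* skip "[mailto:" */
		len -= plen + 1;   /* drop the prefix and the closing ']' */
		wrapped = 1;
	}

	if (len >= outsz)
		len = outsz - 1;
	memcpy(out, in, len);
	out[len] = '\0';
	return wrapped;
}
```

A real version would presumably operate on a strbuf like the rest of
mailinfo.c rather than on a fixed-size buffer.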

> > @@ -685,6 +685,13 @@ static int is_scissors_line(const char *line)
> >                     c++;
> >                     continue;
> >             }
> > +           if (!memcmp(c, "Original Message", 16)) {
> 
> 1) You can use starts_with or skip_prefix.

I was following the existing logic in this function that looks for 8< or >8 or 
...

> 2) This seems vulnerable to false positives. If "Original Message"
> always follows a certain kind of line, it might be better to check for
> that. (Again, it will be clearer when we have an example message.)

I don't think it's terribly vulnerable to false positives.  The logic at the 
end of the function tries to make sure that the scissor marks make up a 
substantial component of the line.

We could do this differently; I'm not sure if there's a regex library built 
into git, but we could even custom-write a separate matcher that looks only for 
^-{3,}Original Message-{3,}$
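
A hand-written matcher for that pattern needs no regex library.  The sketch
below is only an illustration of the idea: is_forwarded_marker() is a
hypothetical name, and tolerating trailing whitespace (so that a line ending
in "\r\n" still matches) is an assumption rather than something the pattern
above requires.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of git): returns 1 if `line` is
 * three or more dashes, the text "Original Message", and three or
 * more dashes, optionally followed by trailing whitespace.
 */
static int is_forwarded_marker(const char *line)
{
	static const char marker[] = "Original Message";
	size_t lead, trail;

	lead = strspn(line, "-");
	if (lead < 3)
		return 0;
	line += lead;

	if (strncmp(line, marker, strlen(marker)))
		return 0;
	line += strlen(marker);

	trail = strspn(line, "-");
	if (trail < 3)
		return 0;
	line += trail;

	/* Allow only trailing whitespace (e.g. "\r\n") after the dashes. */
	line += strspn(line, " \t\r\n");
	return *line == '\0';
}
```

Anchoring the match at both ends of the line means that prose which merely
mentions "Original Message" is rejected.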

Or we could use a different option, e.g. --forwarded, that trims off the extra 
gunk associated with forwarding messages, instead of overloading --scissors.

> > +                   in_perforation = 1;
> > +                   perforation += 16;
> > +                   scissors += 16;
> > +                   c += 15;
> 
> Why 15? Also, can skip_prefix avoid these magic numbers?

Again, following the rest of the function.  The loop's own c++ advances c by 1, 
so here it only needs to be advanced by the other 15 characters of the 16 
character string "Original Message".
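
For comparison, here is a rough sketch of how a skip_prefix-style helper
could derive those numbers from the string itself.  skip_prefix_sketch() is
a local stand-in written so the example compiles on its own (git's real
skip_prefix() lives in git-compat-util.h), and consume_marker() is a
hypothetical wrapper, not something in mailinfo.c.

```c
#include <string.h>

/*
 * Stand-in for git's skip_prefix(): if `str` begins with `prefix`,
 * store a pointer just past the prefix in `*out` and return 1;
 * otherwise return 0 and leave `*out` untouched.
 */
static int skip_prefix_sketch(const char *str, const char *prefix,
			      const char **out)
{
	size_t len = strlen(prefix);

	if (strncmp(str, prefix, len))
		return 0;
	*out = str + len;
	return 1;
}

/*
 * Hypothetical helper: if `marker` starts at `*c`, count its length
 * as both perforation and scissors, and leave `*c` on the marker's
 * last byte so that the enclosing loop's c++ steps past it.
 */
static int consume_marker(const char **c, const char *marker,
			  int *perforation, int *scissors)
{
	const char *end;

	if (!*marker)
		return 0;	/* an empty marker would move *c backwards */
	if (!skip_prefix_sketch(*c, marker, &end))
		return 0;

	*perforation += (int)(end - *c);
	*scissors += (int)(end - *c);
	*c = end - 1;
	return 1;
}
```

With something like this, the 16 and the 15 both fall out of the length of
the marker string instead of being spelled out by hand.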
