From: Jonathan Tan [mailto:jonathanta...@google.com] > On 01/12/2017 01:20 AM, Matthew Wilcox wrote: > > From: Matthew Wilcox <mawil...@microsoft.com> > > > > Extend the --scissors mechanism to strip off the preamble created by > > forwarding a patch. There are a couple of extra headers ("Sent" and > > "To") added by forwarding, but other than that, the --scissors option > > will now remove patches forwarded from Microsoft Outlook to a Linux > > email account. > > > > Signed-off-by: Matthew Wilcox <mawil...@microsoft.com> > > Also add a test showing the kind of message that the current code > doesn't handle, and that this commit addresses.
OK. For the sake of discussion, here's what the base64-decoded message looks like: --- 8< --- -----Original Message----- From: Rehas Sachdeva [mailto:aquan...@gmail.com] Sent: Wednesday, January 4, 2017 11:55 AM To: Matthew Wilcox <mawil...@microsoft.com>; r...@surriel.com Subject: [PATCH v3] radix tree test suite: Dial down verbosity with -v Make the output of radix tree test suite less verbose by default and add -v and -vv command line options for increasing level of verbosity. --- >8 --- > > --- > > mailinfo.c | 9 ++++++++- > > 1 file changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/mailinfo.c b/mailinfo.c > > index 2059704a8..fc1275532 100644 > > --- a/mailinfo.c > > +++ b/mailinfo.c > > @@ -332,7 +332,7 @@ static void cleanup_subject(struct mailinfo *mi, > struct strbuf *subject) > > > > #define MAX_HDR_PARSED 10 > > static const char *header[MAX_HDR_PARSED] = { > > - "From","Subject","Date", > > + "From","Subject","Date","Sent","To", > > Are these extra headers used in both the "real" e-mail headers and the > in-body headers, or only one of them? (If the latter, they should > probably be handled only in the relevant function - my previous patches > to this file were in that direction too, if I remember correctly.) Also, > I suspect that these will have to be handled differently to the other 3, > but that will be clearer when you add the test with an example message. Without this part of the patch, using --scissors means it stops parsing headers when it hits the 'Sent' line. So all I'm looking for is to have 'is_inbody_header()' return true. If there's a better way to achieve that, I'm all for it. Without this hunk of the patch, the commit looks like this: Without any of this patch, the commit looks like this: Author: Matthew Wilcox <mawil...@microsoft.com> Date: Sat Jan 7 18:33:43 2017 +0000 FW: [PATCH v3] radix tree test suite: Dial down verbosity with -v -----Original Message----- From: Rehas Sachdeva [mailto:aquan...@gmail.com] Sent: Wednesday, January 4, 2017 11:55 AM To: Matthew Wilcox <mawil...@microsoft.com>; r...@surriel.com Subject: [PATCH v3] radix tree test suite: Dial down verbosity with -v Make the output of radix tree test suite less verbose by default and add -v and -vv command line options for increasing level of verbosity. Without this hunk of the patch, the commit looks like this: Author: Rehas Sachdeva <[mailto:aquan...@gmail.com]> Date: Sat Jan 7 18:33:43 2017 +0000 FW: [PATCH v3] radix tree test suite: Dial down verbosity with -v Sent: Wednesday, January 4, 2017 11:55 AM To: Matthew Wilcox <mawil...@microsoft.com>; r...@surriel.com Subject: [PATCH v3] radix tree test suite: Dial down verbosity with -v Make the output of radix tree test suite less verbose by default and add -v and -vv command line options for increasing level of verbosity. And with this hunk, it turns into this: Author: Rehas Sachdeva <[mailto:aquan...@gmail.com]> Date: Sat Jan 7 18:33:43 2017 +0000 radix tree test suite: Dial down verbosity with -v Make the output of radix tree test suite less verbose by default and add -v and -vv command line options for increasing level of verbosity. There's more work to do here, turning that silly <[mailto:]> into a proper email address, for example, and parsing Sent: like we parse Date:, but I thought it worth sending out patches 1 & 2 before writing patch 3. > > @@ -685,6 +685,13 @@ static int is_scissors_line(const char *line) > > c++; > > continue; > > } > > + if (!memcmp(c, "Original Message", 16)) { > > 1) You can use starts_with or skip_prefix. I was following the existing logic in this function that looks for 8< or >8 or ... > 2) This seems vulnerable to false positives. If "Original Message" > always follows a certain kind of line, it might be better to check for > that. (Again, it will be clearer when we have an example message.) I don't think it's terribly vulnerable to false positives. The logic at the end of the function tries to make sure that the scissor marks make up a substantial component of the line. We could do this differently; I'm not sure if there's a regex library built into git, but we could even custom-write a separate matcher that looks only for ^-{3,}Original Message-{3,}$ Or we could use a different option; eg --forwarded that will trim off the extra gunk associated with forwarding messages, instead of overloading --scissors. > > + in_perforation = 1; > > + perforation += 16; > > + scissors += 16; > > + c += 15; > > Why 15? Also, can skip_prefix avoid these magic numbers? Again, following the rest of the function. c has already been advanced by 1, now it needs to be advanced to the end of the 16 character string "Original Message".