Re: [PATCH v2 0/4] handle multiline in-body headers

2016-09-19 Thread Junio C Hamano
Jonathan Tan  writes:

> This is an iteration of the patch set after the discussion in
>  .
>
> Changes:
> o largely rewritten to follow Junio's suggested design (refactor of
>   check_header to separate out ">From" and "[PATCH]", and an
>   is_inbody_header function performing an airtight check on whether a
>   line is an in-body header)
> o simpler try_convert_to_utf8 API
> o one file of the expected output of t/t5100/*0015 is modified (instead
>   of the input)
> o t/t5100/*0018--no-inbody-headers test files added
> o example in commit message improved following Peff's suggestion

Overall it looks much nicer, but I still do not understand why we
want to do try-convert-to-utf8.  The _only_ caller of that helper
function is in 4/4 where it tries to convert the line to see if a
line is a scissors line.

It seems to me that the only thing doing so gives us is that you
could later enhance the definition of what a scissors-line looks
like by adding unicode characters [*1*], but I would strongly
suggest not doing that kind of enhancement [*2*].  With the current
definition of what a scissors-line looks like, it can be checked
before you do the UTF-8 conversion, I think, which would mean that
the helper is not strictly necessary.

So...


[Footnotes]

*1* E.g. ✂ encoded in non-UTF8 may have to be converted to UTF-8
first before being recognized as such.

*2* Encouraging people to use non-ASCII that older version of Git
did not take for structural things like "what is a scissors line"
merely for the looks is a bad trade-off; patch e-mails that use the
enhanced definition will break for those with older version of Git,
and the benefit of "prettier scissors" is not really worth it, I
would think.







[PATCH v2 0/4] handle multiline in-body headers

2016-09-19 Thread Jonathan Tan
This is an iteration of the patch set after the discussion in
 .

Changes:
o largely rewritten to follow Junio's suggested design (refactor of
  check_header to separate out ">From" and "[PATCH]", and an
  is_inbody_header function performing an airtight check on whether a
  line is an in-body header)
o simpler try_convert_to_utf8 API
o one file of the expected output of t/t5100/*0015 is modified (instead
  of the input)
o t/t5100/*0018--no-inbody-headers test files added
o example in commit message improved following Peff's suggestion

Jonathan Tan (4):
  mailinfo: separate in-body header processing
  mailinfo: refactor to support utf8 decode attempts
  mailinfo: make is_scissors_line take plain char *
  mailinfo: handle in-body header continuations

 mailinfo.c   | 164 ++-
 mailinfo.h   |   1 +
 t/t4150-am.sh|  23 +
 t/t5100-mailinfo.sh  |   2 +-
 t/t5100/info0018 |   5 ++
 t/t5100/info0018--no-inbody-headers  |   5 ++
 t/t5100/msg0015  |   2 -
 t/t5100/msg0018  |   2 +
 t/t5100/msg0018--no-inbody-headers   |   8 ++
 t/t5100/patch0018|   6 ++
 t/t5100/patch0018--no-inbody-headers |   6 ++
 t/t5100/sample.mbox  |  19 
 12 files changed, 198 insertions(+), 45 deletions(-)
 create mode 100644 t/t5100/info0018
 create mode 100644 t/t5100/info0018--no-inbody-headers
 create mode 100644 t/t5100/msg0018
 create mode 100644 t/t5100/msg0018--no-inbody-headers
 create mode 100644 t/t5100/patch0018
 create mode 100644 t/t5100/patch0018--no-inbody-headers

-- 
2.10.0.rc2.20.g5b18e70