Re: [PATCH v2] format-patch: respect --stat in cover letter's diffstat

2018-11-12 Thread Laszlo Ersek
On 11/10/18 06:46, Nguyễn Thái Ngọc Duy wrote:
> Commit 43662b23ab (format-patch: keep cover-letter diffstat wrapped in
> 72 columns - 2018-01-24) uncondtionally sets stat width to 72 when
> generating diffstat for the cover letter, ignoring --stat from command
> line. But it should only do so when stat width is still default
> (i.e. stat_width == 0).
> 
> In order to fix this, we should only set stat_width if stat_width is
> zero. But it will never be. Commit 071dd0ba43 (format-patch: reduce
> patch diffstat width to 72 - 2018-02-01) makes sure that default stat
> width will be 72 (ignoring $COLUMNS, but could still be overriden by
> --stat). So all we need to do here is drop the assignment.
> 
> Reported-by: Laszlo Ersek 
> Helped-by: Leif Lindholm 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  builtin/log.c  |  2 --
>  t/t4052-stat-output.sh | 48 +-
>  2 files changed, 33 insertions(+), 17 deletions(-)

* This submission should have been posted as v3, not v2. V2 was posted at

https://public-inbox.org/git/20181107164953.24965-1-pclo...@gmail.com/

* Comparing the patch emails, the only difference is that this version
renames "expect.40" to "expect.60". This should have been mentioned in a
cover letter, or in the Notes section of the current submission.

* In my response to the (original) v2 posting, at

https://public-inbox.org/git/f0f95dd0-1a9e-01d0-70f4-3c6d5450d...@redhat.com/

I stated that I didn't try to run the test suite, and gave my T-b (under
the circumstances described there). Given that the change in v3 (= this
submission) is limited to the test case, I think my T-b should have been
carried forward.

Thanks
Laszlo



> diff --git a/builtin/log.c b/builtin/log.c
> index 061d4fd864..1a39c6e52a 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1009,8 +1009,6 @@ static void show_diffstat(struct rev_info *rev,
>  
>   memcpy(, >diffopt, sizeof(opts));
>   opts.output_format = DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
> - opts.stat_width = MAIL_DEFAULT_WRAP;
> -
>   diff_setup_done();
>  
>   diff_tree_oid(get_commit_tree_oid(origin),
> diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
> index 6e2cf933f7..28c053849a 100755
> --- a/t/t4052-stat-output.sh
> +++ b/t/t4052-stat-output.sh
> @@ -44,42 +44,50 @@ show --stat
>  log -1 --stat
>  EOF
>  
> -while read cmd args
> +cat >expect.60 <<-'EOF'
> + ...a | 1 +
> +EOF
> +cat >expect.6030 <<-'EOF'
> + ...aaa | 1 +
> +EOF
> +cat >expect2.60 <<-'EOF'
> + ...a | 1 +
> + ...a | 1 +
> +EOF
> +cat >expect2.6030 <<-'EOF'
> + ...aaa | 1 +
> + ...aaa | 1 +
> +EOF
> +while read expect cmd args
>  do
> - cat >expect <<-'EOF'
> -  ...a | 1 +
> - EOF
>   test_expect_success "$cmd --stat=width: a long name is given more room 
> when the bar is short" '
>   git $cmd $args --stat=40 >output &&
>   grep " | " output >actual &&
> - test_cmp expect actual
> + test_cmp $expect.60 actual
>   '
>  
>   test_expect_success "$cmd --stat-width=width with long name" '
>   git $cmd $args --stat-width=40 >output &&
>   grep " | " output >actual &&
> - test_cmp expect actual
> + test_cmp $expect.60 actual
>   '
>  
> - cat >expect <<-'EOF'
> -  ...aaa | 1 +
> - EOF
>   test_expect_success "$cmd --stat=...,name-width with long name" '
>   git $cmd $args --stat=60,30 >output &&
>   grep " | " output >actual &&
> - test_cmp expect actual
> + test_cmp $expect.6030 actual
>   '
>  
>   test_expect_success "$cmd --stat-name-width with long name" '
>   git $cmd $args --stat-name-width=30 >output &&
>   grep " | " output >actual &&
> - test_cmp expect actual
> + test_cmp $expect.6030 actual
>   '
>  done <<\EOF
> -format-patch -1 --stdout
> -diff HEAD^ HEAD --stat
> -show --stat
> -log -1 --stat
> +expect2 format-patch --cover-letter -1 --stdout
> +expect diff HEAD^ HEAD --stat
> +expect show --stat
> +expect log -1 --stat
>  EOF
>  
>  
> @@ -95,6 +103,16 @@ t

Re: [PATCH v2] format-patch: respect --stat in cover letter's diffstat

2018-11-07 Thread Laszlo Ersek
On 11/07/18 17:49, Nguyễn Thái Ngọc Duy wrote:
> Commit 43662b23ab (format-patch: keep cover-letter diffstat wrapped in
> 72 columns - 2018-01-24) uncondtionally sets stat width to 72 when
> generating diffstat for the cover letter, ignoring --stat from command
> line. But it should only do so when stat width is still default
> (i.e. stat_width == 0).
> 
> In order to fix this, we should only set stat_width if stat_width is
> zero. But it will never be. Commit 071dd0ba43 (format-patch: reduce
> patch diffstat width to 72 - 2018-02-01) makes sure that default stat
> width will be 72 (ignoring $COLUMNS, but could still be overriden by
> --stat). So all we need to do here is drop the assignment.
> 
> Reported-by: Laszlo Ersek 
> Helped-by: Leif Lindholm 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  builtin/log.c  |  2 --
>  t/t4052-stat-output.sh | 48 +-
>  2 files changed, 33 insertions(+), 17 deletions(-)
> 
> diff --git a/builtin/log.c b/builtin/log.c
> index 061d4fd864..1a39c6e52a 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1009,8 +1009,6 @@ static void show_diffstat(struct rev_info *rev,
>  
>   memcpy(, >diffopt, sizeof(opts));
>   opts.output_format = DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
> - opts.stat_width = MAIL_DEFAULT_WRAP;
> -
>   diff_setup_done();
>  
>   diff_tree_oid(get_commit_tree_oid(origin),

Because I plan to use the patch on top of v2.19.1 (until the next major
release, v2.20, is made), that's also where I applied and tested the patch.

With master @ a4b8ab5363a3, this patch targets show_diffstat(). At
v2.19.1, commit fa5b7ea670f4 ("format-patch: allow additional generated
content in make_cover_letter()", 2018-07-23) had not occurred yet, so
there the subject code still lived in make_cover_letter(). On my end,
git-am has applied the hunk to make_cover_letter() seamlessly.

I tested the patch with "--stat=1000 --stat-graph-width=20", formatting
an edk2 series that contained commit 1ed6498c4a02
("UefiCpuPkg/CommonFeature: Skip locking when the feature is disabled",
2018-11-07). The long pathname
"UefiCpuPkg/Library/CpuCommonFeaturesLib/FeatureControl.c" is no longer
truncated in the cumulative diffstat, in the cover letter.

Tested-by: Laszlo Ersek 

> diff --git a/t/t4052-stat-output.sh b/t/t4052-stat-output.sh
> [...]

I didn't try to run the test suite (I wasn't conscious of it anyway); I
built & installed git with

  nice make -j4 prefix=... all doc info
  nice make prefix=... install install-doc install-html install-info

I also wasn't watching the make log. So if those make targets don't
include the test suite, then I didn't exercise the new test case.

Thank you!
Laszlo


Re: [PATCH] format-patch: respect --stat when explicitly specified

2018-11-06 Thread Laszlo Ersek
On 11/06/18 16:56, Duy Nguyen wrote:
> On Tue, Nov 6, 2018 at 11:48 AM Leif Lindholm  
> wrote:
>>
>> Commit 43662b23abbd
>> ("format-patch: keep cover-letter diffstat wrapped in 72 columns") made
>> format-patch keep the diffstat to within 72 characters. However, it does
>> this even when --stat is explicitly set on the command line.
>>
>> Make it possible to explicitly override the new mechanism, using --stat,
>> matching the functionality before this change. This also matches the
>> output in the case of non-cover-letter files.
>>
>> Cc: Nguyễn Thái Ngọc Duy 
>> Cc: Junio C Hamano 
>> Reported-by: Laszlo Ersek 
>> Signed-off-by: Leif Lindholm 
>> ---
>>
>> Note:
>> In TianoCore we have LotsOfGloriousFilesNamedInReallyLongCamelCase, so
>> our official submission guidelines specify the use of --stat.
>>
>>  builtin/log.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/builtin/log.c b/builtin/log.c
>> index 061d4fd86..07e6ae2c1 100644
>> --- a/builtin/log.c
>> +++ b/builtin/log.c
>> @@ -1009,7 +1009,8 @@ static void show_diffstat(struct rev_info *rev,
>>
>> memcpy(, >diffopt, sizeof(opts));
>> opts.output_format = DIFF_FORMAT_SUMMARY | DIFF_FORMAT_DIFFSTAT;
>> -   opts.stat_width = MAIL_DEFAULT_WRAP;
>> +   if (rev->diffopt.stat_width == -1)
> 
> I don't think we get -1 here when stat_width is not defined. The
> "undefined" value is zero but I'm pretty sure we get MAIL_DEFAULT_WRAP
> in here unless you specify --stat.
> 
> So I think you can just drop the below assignment. But if you want to
> be on the safe side, check for zero stat_width instead of -1 and set
> MAIL_DEFAULT_WRAP.

Looks like I'll have to test v2 then...

> 
>> +   opts.stat_width = MAIL_DEFAULT_WRAP;
> 
> How about a test to make sure this will not be broken in future?

Oh, looks like I won't have to test this patch at all! ;)

(Just kidding, I'll test the next iteration.)

Thanks,
Laszlo

> 
>>
>> diff_setup_done();
>>
>> --
>> 2.11.0



Re: recent glob expansion breakage on Windows?

2018-03-09 Thread Laszlo Ersek
On 03/08/18 23:03, Jonathan Nieder wrote:
> +git-for-windows
> Hi,
> 
> Laszlo Ersek wrote:
> 
>> Jaben reports that git-send-email is suddenly failing to expand the
>> "*.patch" glob for him, at the Windows CMD prompt:
>>
>> -
>> E:\...>git send-email --suppress-cc=author --suppress-cc=self 
>> --suppress-cc=cc --suppress-cc=sob --dry-run *.patch
>>
>> No patch files specified!
>> -
>>
>> Whereas, moving the same patch files to another subdir, and then passing
>> the subdir to git-send-email, works fine.
>>
>> I seem to have found some $^O based perl code in the git tree that
>> expands the glob manually (in place of the shell) on Windows -- however,
>> that code looks ancient ^W very stable, and doesn't seem to explain the
>> sudden breakage.
>>
>> Is it possible that a separate perl update on Jaben's Windows box is
>> causing this? Or does the breakage look consistent with a recent git change?
>>
>> Has anyone else reported something similar recently?
> 
> This reminds me of https://github.com/git-for-windows/git/issues/339.
> There, Johannes Schindelin writes (about a different command):
> 
> | This is expected because neither PowerShell nor cmd.exe nor git.exe
> | expand wildcards. Those examples you found were written with a shell
> | in mind, and the shell expands wildcards (hence Git does not think
> | it needs to).
> 
> That may or may not also apply to send-email.

Thank you for the reference -- I can't say whether closing issue #339 as
WONTFIX was justified or not, but it certainly seems inconsistent with
Jaben's earlier experience (to my understanding), i.e. that git did
expand the glob.

> In what version did it work?

Jaben, can you please answer that? (One version in which it is broken is
2.14.1.windows.1.) Can you perhaps ask your teammates about their
git/windows versions (assuming the *.patch glob is expanded correctly
for them)?

Thank you, Jonathan,
Laszlo


recent glob expansion breakage on Windows?

2018-03-08 Thread Laszlo Ersek
Hi,

Jaben reports that git-send-email is suddenly failing to expand the
"*.patch" glob for him, at the Windows CMD prompt:

-
E:\...>git send-email --suppress-cc=author --suppress-cc=self
--suppress-cc=cc --suppress-cc=sob --dry-run *.patch

No patch files specified!
-

Whereas, moving the same patch files to another subdir, and then passing
the subdir to git-send-email, works fine.

I seem to have found some $^O based perl code in the git tree that
expands the glob manually (in place of the shell) on Windows -- however,
that code looks ancient ^W very stable, and doesn't seem to explain the
sudden breakage.

Is it possible that a separate perl update on Jaben's Windows box is
causing this? Or does the breakage look consistent with a recent git change?

Has anyone else reported something similar recently?

Thanks (and sorry about the noise; this list might not be the best place
to ask)!
Laszlo


Re: FORMAT_PATCH_NAME_MAX increase

2017-05-30 Thread Laszlo Ersek
On 05/30/17 16:35, Ævar Arnfjörð Bjarmason wrote:
> On Tue, May 30, 2017 at 3:37 PM, Junio C Hamano  wrote:
>> Ævar Arnfjörð Bjarmason  writes:
>>
>>> Just curious do you know about https://github.com/trast/tbdiff ? If
>>> not it might have a high overlap with what you're doing.

Thank you for the suggestion, it does exactly what I need. It solves the
Assignment Problem a whole lot better than my current, crude script.

https://en.wikipedia.org/wiki/Assignment_problem
https://en.wikipedia.org/wiki/Hungarian_algorithm

(Mildly amusing, to me anyway: I happen to be Hungarian.)

The output of git-tbdiff is not exactly the way I like it (I like to
review interdiffs colorized, and fed to $PAGER individually), but I can
easily do that on top of the "--no-patches" output.

>> Yes, that is a very good suggestion.  You'd need to be able to
>> actually apply the patches but the way I often do a review is very
>> similar to (actually, I'd say it is identical workflow) Laszlo's,
>> and it goes like this:
>>
>> $ git checkout topic ;# previous round
>> $ git checkout master... ;# check out the fork point of previous one
>> $ git am mbox ;# apply the updated one
>> $ git tbdiff ..@{-1} @{-1}..
> 
> I wonder if this tool seemingly everyone on-list uses should just be
> integrated into git.git.
> 
> I only learned about it <2 weeks ago and it's been great. The diff
> output was a bit nonsensical in some cases because it uses python's
> diff engine instead of git's.

Yes, I'd probably plug in git's diff engine (the patience algorithm at
that).

> 
> Would you mind patches to just integrate it to git in python as-is,
> then we could slowly convert it to C as we're doing with everything
> else.

That would be wonderful. My preference would be to use core git features
for solving the problem; as a second preference, it would be great if
git-tbdiff were packaged & shipped by GNU/Linux distros.

Thomas (I see you are on To:/Cc: now -- I meant to CC you, snarfing your
email from the git.git history :) ), are you aware of any distros
already packaging git-tbdiff?

Still the best would be if git provided this feature out of the box;
incremental review of rerolled series appears a common activity for any
serious reviewer / subsystem co-maintainer.

Thanks!
Laszlo


Re: FORMAT_PATCH_NAME_MAX increase

2017-05-30 Thread Laszlo Ersek
(apologies for the self-followup:)

On 05/30/17 14:28, Laszlo Ersek wrote:

> Note that in such an incremental review, I specifically wish to compare
> patches against each other (i.e., I'd like to see diffs of diffs, AKA
> interdiffs), and not the source tree at two, v1<->v2, commits into the
> two series. The latter would only give me the difference between the v1
> and v2 patch application results, and while that is valuable as well,
> I'd really like to diff the actual patches.

... One (but not the only) reason for that is that I'd like to see the
v1->v2 commit message improvements as well. I quite frequently suggest
commit message improvements in review.

Thanks
Laszlo


Re: FORMAT_PATCH_NAME_MAX increase

2017-05-30 Thread Laszlo Ersek
On 05/30/17 13:36, Ævar Arnfjörð Bjarmason wrote:
> On Mon, May 29, 2017 at 10:49 AM, Laszlo Ersek <ler...@redhat.com> wrote:
>> Hi,
>>
>> would it be possible to
>>
>> - increase the FORMAT_PATCH_NAME_MAX macro from 64 to, say, 128?
>>
>> - Or else to introduce a new git-config knob for it?
>>
>> I have a small review-helper / interdiff script that matches patches
>> from adjacent versions of a series against each other, based on subject
>> line. (Using the formatted file name with the patch number stripped.)
>> The project in question uses long common prefixes in subjects, and the
>> current limit of 64 does not always ensure unicity (again, with the
>> number stripped).
> 
> I don't see why this shouldn't be made configurable, but more
> generally if you have a script like this it seems like a futile effort
> in general to just make the subject longer to solve the general case,
> consider:
> 
> (cd /tmp/; rm -rf test; git init test; cd test && for i in {1..3};
> do touch $i && git add $i && git commit -m"test"; done && git
> format-patch -2 && git reset --hard HEAD~2 && git am *patch)
> 
> Which now yields:
> 
> 0001-test.patch
> 0002-test.patch
> 
> Git projects in general will have plenty of patches like this, e.g.
> "fix build" or "update tests" or whatever. Isn't a better solution for
> your script to e.g. key on git-patch-id?
> 
> $ grep "^From " *patch
> 0001-test.patch:From 870e37afa1a5aeb7eef76e607345adcfd4a9022d Mon
> Sep 17 00:00:00 2001
> 0002-test.patch:From de8c37a1532a4f6ae71ffa65400479ba77438f3b Mon
> Sep 17 00:00:00 2001
> $ cat *patch | git patch-id
> c71eb8f2c8c461ba6040668e9d79f996f5a04a61
> 870e37afa1a5aeb7eef76e607345adcfd4a9022d
> 735aff6fb601d7ce99506dc7701be3e8a9b5d38c
> de8c37a1532a4f6ae71ffa65400479ba77438f3b
> 
> Other potential heuristics could be keying not just on the subject but
> on the subject + subject of the last N commits for each commit, which
> should give more of a unique key, or key on the whole commit message
> etc.
> 

My use case is the following:

- Contributor posts version 1 of his/her patch series to the mailing
list, and also pushes the series to his/her publicly accessible git repo
on some branch. I review it and suggest a number of improvements.

- Contributor posts version 2 of his/her patch series to the mailing
list, and pushes the series to another branch in his/her repo too. I
want to review the v2 series incrementally; that is, I'd like to diff
each individual v2 patch against its v1 counterpart.

While patches may be dropped, added, replaced between patch set
versions, the general case is that most patches remain in the series,
and they preserve their original subjects. Thus, finding the v1
counterpart for a v2 patch, based on the subject, as formatted by
git-format-patch into filenames, works reliably most of the time.

Note that in such an incremental review, I specifically wish to compare
patches against each other (i.e., I'd like to see diffs of diffs, AKA
interdiffs), and not the source tree at two, v1<->v2, commits into the
two series. The latter would only give me the difference between the v1
and v2 patch application results, and while that is valuable as well,
I'd really like to diff the actual patches.

Of course I could technically parse the Subject: header of the formatted
patch files, and rename the files for interdiffing using full-length
filenames. But then I'd have to care about filesystem safety and such,
and that's already handled perfectly by git-format-patch itself. The
only bit I'm missing is the arbitrary length in the formatted file
names. (I had patched git earlier to use 128 for FORMAT_PATCH_NAME_MAX,
but I'd like to stop carrying that change.)

"git-patch-id" doesn't look applicable here, as I'd like to assign
patches (across v1 and v2) to each other based on "purpose". I.e., the
assignment should occur regardless of even functional changes between
corresponding v1 and v2 patches. I need the coupling exactly so I can
review those differences.

I use the "join" utility on the number-stripped patch filenames to find
the pairs of patches between v1 and v2 that should be compared.

Thanks!
Laszlo


Re: FORMAT_PATCH_NAME_MAX increase

2017-05-30 Thread Laszlo Ersek
On 05/30/17 03:34, Junio C Hamano wrote:

> I cannot offhand guess what other places would suffer from such a
> project convention, because I do not work with such a project, but
> you may be able to come up with a list of various places in Git
> where the commit titles are used, and that if there were a mechanism
> to take these commit titles, pass them to your cutomization script,
> which abbreviates these "long common prefixes" to a shorter string,
> and to use the output from that script instead of the actual commit
> title, it would help all of these places.

The problem is that I can't really automate the subject munging. The
concrete subjects in this case were:

> OvmfPkg/QemuFwCfgLib: Implement SEV internal function for SEC phase
> OvmfPkg/QemuFwCfgLib: Implement SEV internal functions for PEI phase
> OvmfPkg/QemuFwCfgLib: Implement SEV internal function for Dxe phase

which got formatted as

> 0010-OvmfPkg-QemuFwCfgLib-Implement-SEV-internal-function.patch
> 0011-OvmfPkg-QemuFwCfgLib-Implement-SEV-internal-function.patch
> 0012-OvmfPkg-QemuFwCfgLib-Implement-SEV-internal-function.patch

and these filenames differ only in the running counter on the left.

The patch subjects themselves aren't overlong (the longest is 68
characters).

At best I could "normalize away" the "OvmfPkg/QemuFwCfgLib" prefix, but
that exact prefix is pretty accidental. Any standalone module (driver or
library instance) in the edk2 project is supposed to be named like this
in patch subjects, so all those prefixes would have to be normalized
somehow.

We generally try to limit subjects (and commit messages in general) to
74 columns. I think for one source of inspiration, we used the kernel
documentation, when setting that limit.

says, under section 14, "The canonical patch format",

> The canonical patch subject line is::
>
> Subject: [PATCH 001/123] subsystem: summary phrase
>
> The canonical patch message body contains the following:
>
>   [...]
>
>   - The body of the explanation, line wrapped at 75 columns, which will
> be copied to the permanent changelog to describe this patch.

It does not specify the subject length, but perhaps we can apply the
body line length to the subject as well.

So, even in kernel land, if subjects up to 75 columns are permitted, but
FORMAT_PATCH_NAME_MAX is 64, conflicts are possible, at least in theory,
aren't they? With the numbers stripped, of course.

Thanks,
Laszlo


FORMAT_PATCH_NAME_MAX increase

2017-05-29 Thread Laszlo Ersek
Hi,

would it be possible to

- increase the FORMAT_PATCH_NAME_MAX macro from 64 to, say, 128?

- Or else to introduce a new git-config knob for it?

I have a small review-helper / interdiff script that matches patches
from adjacent versions of a series against each other, based on subject
line. (Using the formatted file name with the patch number stripped.)
The project in question uses long common prefixes in subjects, and the
current limit of 64 does not always ensure unicity (again, with the
number stripped).

Thank you,
Laszlo


Re: appending a pattern to the default "diff.cpp.xfuncname"

2016-08-03 Thread Laszlo Ersek
On 08/03/16 20:02, Jeff King wrote:
> On Wed, Aug 03, 2016 at 12:16:14PM +0200, Laszlo Ersek wrote:
> 
>> I've used diff..xfuncname with great success for file s that
>> I defined myself. However, now I would like to append an extra pattern
>> to the TYPE=cpp case (for which git has builtin patterns). Is there an
>> easy way to do this?
>>
>> I figured I could open-code the builtin patterns from "userdiff.c", and
>> then append my new pattern to those, but it looks kinda gross :)
> 
> Unfortunately, no, the config system has no notion of "append to this
> value". So you are stuck with extracting the builtin value (which
> annoyingly, you cannot even get without looking at the source code!),
> and repeating it in your config file.

Thank you for confirming!
Laszlo

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


appending a pattern to the default "diff.cpp.xfuncname"

2016-08-03 Thread Laszlo Ersek
Hi,

I've used diff..xfuncname with great success for file s that
I defined myself. However, now I would like to append an extra pattern
to the TYPE=cpp case (for which git has builtin patterns). Is there an
easy way to do this?

I figured I could open-code the builtin patterns from "userdiff.c", and
then append my new pattern to those, but it looks kinda gross :)

Thanks!
Laszlo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept /dev/null\r

2014-09-24 Thread Laszlo Ersek
On 09/23/14 23:35, Junio C Hamano wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 [...]

 The important thing to note here is that use of text/plain for
 patches, if you want to have distinction between CRLF and LF in your
 payload, is not designed to work over e-mails.

That's good to know, thanks!

 Now if we accept that this issue is coming from lossy nature of
 transporting patches via e-mails, we would realize that the right
 place to solve this is not in apply's parsing of structural part
 of the diff output (e.g. diff --git ..., rename from ... or
 --- filename), but the payload part (i.e.   followed by context,
 - followed by removed and + followed by added).

I can agree with this, yes.

 Removal of CR
 by am - mailsplit - mailinfo - apply callchain is not losing
 any information, as the input does not have useful information to
 let us answer are the lines in this path supposed to end with
 CRLF? in the first place; /dev/null\r patch is barking up a wrong
 tree.

OK.

 Our line-endings infrastructure (e.g. core.autocrlf configuration
 variable, `eol` attribute) is all geared toward supporting cross
 platform projects in that the BCP is to use LF line endings as the
 canonical line endings for in-repository data and convert it to CRLF
 for working tree files when necessary.  We do not have direct
 support (other than declaring contents for paths as binary aka no
 conversion) to use CRLF in in-repository data (and that is partly
 deliberate).

I see. The edk2 mirror-of-svn git repo is then somewhat incompatible
with the original design.

 But it is conceivable to enhance the attribute system to allow us to
 mark certain paths (or all paths, which is a trivial special case)
 as using CRLF line endings in in-repository and in-working-tree.  It
 could be setting `eol` attribute to `both-crlf` or something.
 
 Then am - mailsplit - mailinfo - apply chain could:
 
  * mailsplit and mailinfo does not have to do anything special,
other than stripping CR and make sure apply only sees LF
endings;
 
  * apply is taught a new option --fix-mta-corruption which
causes it to pay attention to the `eol` attribute set to
`both-crlf`, and when it reads a patch
 
   diff --git a/one b/one
 --- one
 +++ one
 @@ -4,6 +4,6 @@
  common1
common2
 -old1
 -old2
 +new1
 +new2
  common3
  common4
 
and notices that path one is marked as such, it pretends as if
the input were
 
   diff --git a/one b/one
 --- one
 +++ one
 @@ -4,6 +4,6 @@
  common1^M
common2^M
 -old1^M
 -old2^M
 +new1^M
 +new2^M
  common3^M
  common4^M
 
to compensate for possible breakage during the mail transit.
 
  * am is taught to pass --fix-mta-corruption to apply perhaps
by default.
 
 Because code paths that internally run git am, e.g. git rebase,
 work on data that is *not* corrupt by mta (we generate diff and
 apply ourselves), these places need to tell am not to use the
 --fix-mta-corruption when running apply.

Thank you for taking the time to describe this. It does look like the
by-the-book solution.

Obviously, I can't implement it myself -- first, I have no experience
with the git codebase, second, I have no time nor mandate to get
familiar with it and to make a serious effort to improve it in this
direction. (IOW git is a tool for my job, not my job.) The one-liner
patch and this discussion is all I can manage -- I thought I'd take a
stab at it myself rather than just complain.

If someone finds the time to implement and document this feature, a
small part of the community will be very grateful. (Not much of a
compensation for a corner case like this, admittedly.)

Thanks,
Laszlo

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept /dev/null\r

2014-09-23 Thread Laszlo Ersek
On 09/23/14 20:54, Junio C Hamano wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
   git format-patch master..branch1
 
 The output from this has these (excerpt from od -xc output):
 
 360   f   2  \n  \n   d   i   f   f   -   -   g   i   t
66200a32640a666920662d2d69672074
 400   a   /   f   2   b   /   f   2  \n   n   e   w   f   i
2f6132666220662f0a32656e20776966
 420   l   e   m   o   d   e   1   0   0   6   4   4  \n   i
656c6d20646f2065303136303434690a
 440   n   d   e   x   0   0   0   0   0   0   0   .   .   f   3
646e786530203030303030302e2e3366
 460   5   d   3   e   6  \n   -   -   -   /   d   e   v   /   n
643565330a362d2d202d642f76656e2f
 500   u   l   l  \n   +   +   +   b   /   f   2  \n   @   @
6c750a6c2b2b202b2f623266400a2040
 520   -   0   ,   0   +   1   @   @  \n   +   h   e   l   l
302d302c2b20203140402b0a65686c6c
 540   o   w   o   r   l   d  \r  \n   -   -  \n   2   .   1
206f6f776c720d642d0a202d320a312e
 
 The structural parts of the diff, including --- /dev/null line,
 are all terminated by \n (as they should be), and the only CR
 appears in the message is at the end of +hello world line.

That's right -- until the patch email goes through an MTA that turns all
line endings into CRLF. (Did you email the patch to yourself as
requested in the reproducer?)

Such CRLFs are normally transparent because git-am strips them. The
keepcr=true setting preserves them, but not only for the source code
lines (where it's the right thing to do): it also preserves them in the
git diff header lines. Which is not a problem in general, *except* when
said header line includes /dev/null.

 So I do not think apply should need to loosen its sanity check and
 take a random whitespace after the /dev/null as a valid this is a
 creation event for the path marker (e.g. --- /dev/null whoa?).
 
 is_dev_null() is used to in the fallback code path that parses
 traditional patch output (e.g. GNU diff) which throws random cruft
 (e.g. timestamp) after the /dev/null marker, e.g.
 
 $ diff -u /dev/null f2
 --- /dev/null   2014-09-17 18:22:57.995111003 -0700
 +++ f2  2014-09-23 11:37:09.0 -0700
 @@ -0,0 +1 @@
 +hello world
 
 and we'd be hesitant to allow that kind of looseness for Git patches
 where we know we end the line after the /dev/null marker.

Okay, let's say I only relax the check in question to accept \r\n in
addition to the current \n. Will you take that?

 3. In the reviewer / tester / maintainer role, save the patch from your
 email client to a local file. Assume that your email client does not
 corrupt the patch when saving it.
 
 Perhaps compare this saved file with the output from the above
 format-patch to see where things got broken?
 
 SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
 turn out that what you are trying to do might be an equilvalent of
 
   git format-patch ... |
 # first lose all \r\n
 dos2unix | 
   # then make everything \r\n
 unix2dos |
 # and apply
 git am
 
 which is not workable in the first place.  I dunno.

I agree with your analysis. It is indeed the MTA (or the MUA, not sure)
that turns all line endings into uniform CRLFs -- it is a requirement in
RFC 2822 / 822bis.

http://cr.yp.to/docs/smtplf.html
http://www.rfc-editor.org/rfc/rfc2822.txt

 2.3. Body

The body of a message is simply lines of US-ASCII characters.  The
only two limitations on the body are as follows:

- CR and LF MUST only occur together as CRLF; they MUST NOT appear
  independently in the body.

But why is this situation not workable? The same happens with *all*
patches that people mail around, it's just not visible to them, because
git-am strips all CRs indiscriminately.

People who are forced to work with CRLF repositories don't have this
luxury with git-am, and bump into the /dev/null problem all the time.

What do you think about accepting only /dev/null\n and /dev/null\r\n?

Another question I had about gitdiff_verify_name() -- what ensures there
that the memcmp(), with the fixed size of 9 bytes, won't fall off the
end of the line? Some of the outer caller functions verify the line
length before their comparisons, but I don't see any length checks in
gitdiff_verify_name() for /dev/null specifically.

Thank you,
Laszlo
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept /dev/null\r

2014-09-23 Thread Laszlo Ersek
On 09/23/14 22:02, Junio C Hamano wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 On 09/23/14 20:54, Junio C Hamano wrote:
 ...
 SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
 turn out that what you are trying to do might be an equilvalent of

 git format-patch ... |
 # first lose all \r\n
 dos2unix | 
 # then make everything \r\n
 unix2dos |
 # and apply
 git am

 which is not workable in the first place.  I dunno.

 I agree with your analysis. It is indeed the MTA...
- CR and LF MUST only occur together as CRLF; they MUST NOT appear
  independently in the body.

 But why is this situation not workable? The same happens with *all*
 patches that people mail around, it's just not visible to them, because
 git-am strips all CRs indiscriminately.
 
 It is not git am or git apply that strips all CRs
 indiscriminately.  I just tried to apply 0001-add-f2 without
 letting your MTA/MUA corrupt it on master branch in the repository
 you prepared that patch from, i.e.
 
   git checkout master^0 ;# go back
 git am 0001-add-f2* ;# apply that +hello world\r\n patch
 git diff branch ;# nothing

When you did this, was am.keepcr=true in effect?

Thanks
Laszlo
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept /dev/null\r

2014-09-23 Thread Laszlo Ersek
On 09/23/14 21:56, Junio C Hamano wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 What do you think about accepting only /dev/null\n and /dev/null\r\n?
 
 I thought we agreed that what you are doing is not workable in the
 first place, no?
 
 I suspect one way to handle In this project, the files that are
 checked out must be with CRLF line endings no matter what the
 platform is might be to use the line ending attributes to force
 that while keeping the in-repository data with LF line endings.  The
 diff output (format-patch output is just one of them) comes from
 comparing the in-repository representation, so you won't have \r\n
 that will be stripped via MTA in it, apply and am will apply the
 patch without having to worry about \r\n, _and_ the line ending
 attributes would end the lines in your in-working-tree files with
 CRLF that way.

This would be a perfect solution if the git repository was not a mirror
of a Subversion repository that contains all files with embedded CRLFs.

Anyway I accept defeat, thanks for your time.

Laszlo

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept /dev/null\r

2014-09-23 Thread Laszlo Ersek
On 09/23/14 22:35, Junio C Hamano wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 On 09/23/14 22:02, Junio C Hamano wrote:
 Laszlo Ersek ler...@redhat.com writes:

 On 09/23/14 20:54, Junio C Hamano wrote:
 ...
 SMTP transport may be CRLF-unsafe, so I have a suspicion that it may
 turn out that what you are trying to do might be an equilvalent of

   git format-patch ... |
 # first lose all \r\n
 dos2unix | 
   # then make everything \r\n
 unix2dos |
 # and apply
 git am

 which is not workable in the first place.  I dunno.

 I agree with your analysis. It is indeed the MTA...
- CR and LF MUST only occur together as CRLF; they MUST NOT appear
  independently in the body.

 But why is this situation not workable? The same happens with *all*
 patches that people mail around, it's just not visible to them, because
 git-am strips all CRs indiscriminately.

 It is not git am or git apply that strips all CRs
 indiscriminately.  I just tried to apply 0001-add-f2 without
 letting your MTA/MUA corrupt it on master branch in the repository
 you prepared that patch from, i.e.

 git checkout master^0 ;# go back
 git am 0001-add-f2* ;# apply that +hello world\r\n patch
 git diff branch ;# nothing

 When you did this, was am.keepcr=true in effect?
 
 I actually briefly scratched my head but realized when I saw it work
 as expected without me passing --keep-cr to am myself.
 
 But I did that experiment in the repository created by following
 your reproduction recipe, in which it had these:
 
   git config core.whitespace cr-at-eol
   git config am.keepcr true
 
 so yes I had keepcr set.

Thank you for confirming, I expected so.

Because in this case the test doesn't refute my claim that git-am
strips all CRs indiscriminately.

Git-am *does* strip all CRs indiscriminately, undoing the CRs that the
email servers / clients introduce. Your above test worked out because
you prevented git-am from stripping the CRs, with the keepcr=true
setting. If you turn that off, then either your git am command won't
succeed (because it will run into context conflicts due to different
line endings -- although not in this example), or the final git-diff
will report differences.

In summary:
- the email infrastructure turns all line terminators into CRLFs
- git-am strips these by default, from source code lines and from git
  diff header lines alike,
- this is fine for repos that store files with \n terminators,
- not fine for repos with embedded \r\n terminators -- the default
  stripping behavior of git-am breaks the source code in that case
  (runs into conflicts with existing files, and creates new files with
  wrong line endings)
- if you set am.keepcr=true, then the source code remains intact (no
  conflicts for existing files), but new files cannot be created,
  because the /dev/null\r header lines are rejected.

Thanks
Laszlo

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH for-maint] apply: gitdiff_verify_name(): accept /dev/null\r

2014-09-23 Thread Laszlo Ersek
On 09/23/14 22:40, Junio C Hamano wrote:
 Laszlo Ersek ler...@redhat.com writes:
 
 On 09/23/14 21:56, Junio C Hamano wrote:
 Laszlo Ersek ler...@redhat.com writes:

 What do you think about accepting only /dev/null\n and /dev/null\r\n?

 I thought we agreed that what you are doing is not workable in the
 first place, no?

 I suspect one way to handle In this project, the files that are
 checked out must be with CRLF line endings no matter what the
 platform is might be to use the line ending attributes to force
 that while keeping the in-repository data with LF line endings.  The
 diff output (format-patch output is just one of them) comes from
 comparing the in-repository representation, so you won't have \r\n
 that will be stripped via MTA in it, apply and am will apply the
 patch without having to worry about \r\n, _and_ the line ending
 attributes would end the lines in your in-working-tree files with
 CRLF that way.

 This would be a perfect solution if the git repository was not a mirror
 of a Subversion repository that contains all files with embedded CRLFs.
 
 Yikes.
 
 Anyway I accept defeat, thanks for your time.
 
 I do not consider that a defeat.  It is just I do not think it is
 the right solution for the problem you are having to butcher apply.
 You'd need to find some other way to solve it, and other people on
 the list may be able to offer a solution neither of us thought of in
 this thread.
 
 Perhaps those who are on Windows have more experience in situations
 like yours?

I'm not on Windows, obviously. :)

The overwhelming majority of the EDK II developers use windows, and
connect directly to subversion. They work with CRLF line endings natively.

Jordan (CC'd) operates a robot that mirrors SVN commits to the git repo
on github, with git svn. git svn rebase --use-log-author fetches the
new SVN commits to the robots local git clone, and then git push (as
usual) pushes them to github. This is being done so that people knowing
git don't lose their sanity, trying to work with SVN.

The process works very well, up to a point (git-loving people clone the
github mirror, and submit patches with git-format-patch / git send-email
to edk2-devel). The problem is when you want to apply patches with
git-am from the list -- the CRLFs are a mess. Hence my thread starter here.

We can get around this by maintaining personal forks on github, pushing
our patches there too in parallel with the email postings. People can
then fetch directly, and avoid git-am altogether. But this is very
cumbersome -- you need a github account, you need an edk2 fork on
github, others need to add your repo as a remote, etc etc, while the
review occurs anyway in our MUAs.

Thanks
Laszlo
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH for-maint] apply: gitdiff_verify_name(): accept /dev/null\r

2014-09-22 Thread Laszlo Ersek
The edk2 (EFI Development Kit II) project at
https://github.com/tianocore/edk2/ uses CRLF line endings.

The following small reproducer demonstrates how gitdiff_verify_name()
breaks when it meets the usual git patches workflow in combination with
CRLF line endings:

1. Prepare the test repo:

  mkdir testdir
  cd testdir
  git init
  git config core.whitespace cr-at-eol
  git config am.keepcr true
  touch f1
  git add f1
  git commit -m 'initial import'

2. In the contributor role, write a patch that creates a new file
(adhering to the CRLF convention), submit it, then clean up:

  git checkout -b branch1 master
  echo 'hello world' | unix2dos f2
  git add f2
  git commit -m 'add f2'
  git format-patch master..branch1
  git send-email 0001-add-f2.patch
  # send it to yourself -- make sure it goes through your MTA
  git clean -fdx

3. In the reviewer / tester / maintainer role, save the patch from your
email client to a local file. Assume that your email client does not
corrupt the patch when saving it. (Thunderbird doesn't corrupt it, for
example.) Once saved, try to apply the patch email to a new branch.

  git checkout -b branch2 master
  git am /path/to/the/saved/file

  - Applying: add f2
  - fatal: git apply: bad git-diff - expected /dev/null on line 9

This happens because am.keepcr==true keeps the CRLFs intact (as it should
in fact), but then /dev/null\r in the diff header trips up
gitdiff_verify_name().

Fix it by reusing the is_dev_null() helper function, which in effect
changes the condition from

  memcmp(/dev/null, line, 9) || line[9] != '\n'

to

  memcmp(/dev/null, line, 9) || !isspace(line[9])

Signed-off-by: Laszlo Ersek ler...@redhat.com
---

Notes:
I'm not subscribed to the list; please keep me CC'd. Thanks.

 builtin/apply.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 6b7c764..a9c6a08 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -955,7 +955,7 @@ static char *gitdiff_verify_name(const char *line, int 
isnull, char *orig_name,
}
else {
/* expect /dev/null */
-   if (memcmp(/dev/null, line, 9) || line[9] != '\n')
+   if (!is_dev_null(line))
die(_(git apply: bad git-diff - expected /dev/null on 
line %d), linenr);
return NULL;
}
-- 
1.8.3.1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html