Re: RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Junio C Hamano
Jonathan Nieder writes:

>> So it seems most sensible to me if this is going to be supported that we
>> go a bit beyond the call of duty and fake up the start of it, namely:
>>
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>>
>> To be:
>>
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>
> Right.  We may want to handle diff.mnemonicPrefix as well.

I definitely think under the --stable option, we should pretend as
if the canonical a/ vs b/ prefixes were given with the "diff --git"
header, just like we try to reverse the effect of diff-orderfile,
etc.

I am unsure what the right behaviour under --unstable is, though.




Re: RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Fri, Dec 07 2018, Jonathan Nieder wrote:

>> The patch-id appears to only care about the diff text, so it should be
>> able to handle this.  So if we have a better heuristic for where the
>> diff starts, it would be good to use it.
>
> No, the patch-id doesn't just care about the diff, it cares about the
> context before the diff too.

Sorry, I did a bad job of communicating.  When I said "diff text", I was
including context.

[...]
> Observe that the diff --git line matters, we hash it:
>
> $ git diff-tree -p HEAD~.. | git patch-id
> 5870d115b7e2a9a936ab8fdc254932234413c710
>
> $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
> 5870d115b7e2a9a936ab8fdc254932234413c710
>
> $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
> 4cd136f2b98760150f700ac6a5b126389d6d05a7
> 

Oh, hm.  That's unfortunate.

[...]
> So it seems most sensible to me if this is going to be supported that we
> go a bit beyond the call of duty and fake up the start of it, namely:
>
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
>
> To be:
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c

Right.  We may want to handle diff.mnemonicPrefix as well.

Jonathan


Re: RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Ævar Arnfjörð Bjarmason


On Fri, Dec 07 2018, Jonathan Nieder wrote:

> Hi,
>
> Konstantin Ryabitsev wrote:
>
>> Every now and again I come across a patch sent to LKML without a leading
>> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>>
>> https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/
>>
>> I am guessing quilt does not bother including the leading "diff a/foo
>> b/foo" because it's redundant with the next two lines, however this
>> remains a valid patch recognized by git-am.
>>
>> If you pipe that patch via git-patch-id, it produces nothing, but if I
>> put in the leading "diff", like so:
>>
>> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>>
>> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Interesting.  As Ævar mentioned, the relevant code is
>
>   /* Ignore commit comments */
>   if (!patchlen && !starts_with(line, "diff "))
>           continue;
>
> which is trying to handle a case where a line that is special to the
> parser appears before the diff begins.
>
> The patch-id appears to only care about the diff text, so it should be
> able to handle this.  So if we have a better heuristic for where the
> diff starts, it would be good to use it.

No, the patch-id doesn't just care about the diff, it cares about the
context before the diff too.

See this patch:

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~..
diff --git x/refs/files-backend.c y/refs/files-backend.c
index 9183875dad..dd8abe9185 100644
--- x/refs/files-backend.c
+++ y/refs/files-backend.c
@@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store *refs,
 		break;
 	case REF_TYPE_OTHER_PSEUDOREF:
 	case REF_TYPE_MAIN_PSEUDOREF:
-		return files_reflog_path_other_worktrees(refs, sb, refname);
+		files_reflog_path_other_worktrees(refs, sb, refname);
+		break;
 	case REF_TYPE_NORMAL:
 		strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
 		break;

Observe that the diff --git line matters, we hash it:

$ git diff-tree -p HEAD~.. | git patch-id
5870d115b7e2a9a936ab8fdc254932234413c710 

$ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable
5870d115b7e2a9a936ab8fdc254932234413c710

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable
4cd136f2b98760150f700ac6a5b126389d6d05a7
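
For patches that arrive with non-default prefixes, one workaround outside
of git is to rewrite the header lines to the canonical a/ and b/ prefixes
before hashing. The following is only a sketch of that idea, not something
patch-id does itself: canon_prefixes is a hypothetical helper name, and it
assumes the two prefixes are known, the paths are unquoted, and the patch
contains no renames.

```shell
# canon_prefixes SRC DST  (hypothetical helper, not part of git)
# Reads a patch on stdin and rewrites the SRC and DST prefixes (for
# example "x/" and "y/") to "a/" and "b/" on the header lines only.
# Caveat: a removed content line that happens to start with "-- SRC"
# shows up as "--- SRC" in a hunk and would be rewritten as well.
canon_prefixes() {
    awk -v src="$1" -v dst="$2" '
        # "diff --git SRC<path> DST<path>": with the same path on both
        # sides, the path length follows from the line length.
        index($0, "diff --git " src) == 1 {
            rest = substr($0, length("diff --git ") + 1)
            n = (length(rest) - length(src) - length(dst) - 1) / 2
            old = substr(rest, length(src) + 1, n)
            new = substr(rest, length(src) + n + 1 + length(dst) + 1)
            if (n == int(n) && old == new) {
                print "diff --git a/" old " b/" new
                next
            }
        }
        index($0, "--- " src) == 1 {
            print "--- a/" substr($0, length("--- " src) + 1)
            next
        }
        index($0, "+++ " dst) == 1 {
            print "+++ b/" substr($0, length("+++ " dst) + 1)
            next
        }
        { print }
    '
}
```

It would be used as a filter in front of patch-id, e.g.
"git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | canon_prefixes x/ y/ | git patch-id --stable".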


The thing it doesn't care about is the "index" between the "diff" and
patch:

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id --stable
4cd136f2b98760150f700ac6a5b126389d6d05a7


We also care about the +++ and --- lines:

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id
56985c2c38cce6079de2690082e1770a8e81214c


Then we normalize the @@ line, e.g.:

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id
4cd136f2b98760150f700ac6a5b126389d6d05a7

$ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/\d+/123/g' | git patch-id
4cd136f2b98760150f700ac6a5b126389d6d05a7



There are other caveats (see the code, e.g. "strip space") but to a first
approximation a patch id is a hash of something that looks like this:

diff --git x/refs/files-backend.c y/refs/files-backend.c
--- x/refs/files-backend.c
+++ y/refs/files-backend.c
@@ -123,123 +123,123 @@ static void files_reflog_path(struct files_ref_store *refs,
 		break;
 	case REF_TYPE_OTHER_PSEUDOREF:
 	case REF_TYPE_MAIN_PSEUDOREF:
-		return files_reflog_path_other_worktrees(refs, sb, refname);
+		files_reflog_path_other_worktrees(refs, sb, refname);
+		break;
 	case REF_TYPE_NORMAL:
 		strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname);
 		break;

Which means that accepting a patch like this as input would actually
give you a different patch-id than if it had the proper header.

So it seems most sensible to me if this is going to be supported that we
go a bit beyond the call of duty and fake up the start of it, namely:

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c

To be:

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
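
Until patch-id learns to do this itself, the same faking-up can be
approximated as a filter in front of it. This is a rough sketch only:
add_diff_header is a hypothetical helper name, it handles just the
"--- a/<path>" form, and it does not cover new or deleted files
("/dev/null") or renames.

```shell
# add_diff_header  (hypothetical helper, not part of git)
# Reads a patch on stdin and writes it to stdout, inserting a
# "diff --git a/<path> b/<path>" line before every "--- a/<path>" line
# that has not already been preceded by a "diff " header.
# Caveat: a removed content line starting with "-- a/" appears as
# "--- a/" inside a hunk and would be misdetected as a file header.
add_diff_header() {
    awk '
        # An existing header: remember it so nothing is added twice.
        /^diff / { have_header = 1; print; next }
        /^--- a\// {
            if (!have_header) {
                path = substr($0, 7)       # text after "--- a/"
                sub(/\t.*$/, "", path)     # drop a trailing timestamp, if any
                printf "diff --git a/%s b/%s\n", path, path
            }
            have_header = 0
            print
            next
        }
        { print }
    '
}
```

A headerless patch could then be hashed with
"add_diff_header <some.patch | git patch-id --stable", where some.patch
stands for whatever file holds the quilt-style patch.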

It'll make the state machine a bit more complex, but IMO it would suck
more if 

Re: RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Jonathan Nieder
Hi,

Konstantin Ryabitsev wrote:

> Every now and again I come across a patch sent to LKML without a leading
> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>
> https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/
>
> I am guessing quilt does not bother including the leading "diff a/foo
> b/foo" because it's redundant with the next two lines, however this
> remains a valid patch recognized by git-am.
>
> If you pipe that patch via git-patch-id, it produces nothing, but if I
> put in the leading "diff", like so:
>
> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>
> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".

Interesting.  As Ævar mentioned, the relevant code is

	/* Ignore commit comments */
	if (!patchlen && !starts_with(line, "diff "))
		continue;

which is trying to handle a case where a line that is special to the
parser appears before the diff begins.

The patch-id appears to only care about the diff text, so it should be
able to handle this.  So if we have a better heuristic for where the
diff starts, it would be good to use it.

"git apply" uses apply.c::find_header, which is more permissive.
Maybe it would be possible to unify these somehow.  (I haven't looked
closely enough to tell how painful that would be.)

Thanks and hope that helps,
Jonathan


Re: RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Ævar Arnfjörð Bjarmason


On Fri, Dec 07 2018, Konstantin Ryabitsev wrote:

> Hi, all:
>
> Every now and again I come across a patch sent to LKML without a leading
> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>
> https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/
>
> I am guessing quilt does not bother including the leading "diff a/foo
> b/foo" because it's redundant with the next two lines, however this
> remains a valid patch recognized by git-am.
>
> If you pipe that patch via git-patch-id, it produces nothing, but if I
> put in the leading "diff", like so:
>
> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>
> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".
>
> Can we please teach git-patch-id to work without the leading diff a/foo
> b/foo, same as git-am?
>
> Best,
> -K

The state machine is sensitive to there being a "diff" line, then "index"
etc.

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 970d0d30b4..b99e4455fd 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -97,7 +97,9 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
 		}
 
 		/* Ignore commit comments */
-		if (!patchlen && !starts_with(line, "diff "))
+		if (!patchlen && starts_with(line, "--- a/"))
+			;
+		else if (!patchlen && !starts_with(line, "diff "))
 			continue;
 
 		/* Parsing diff header?  */

This would make it produce a patch-id for that input. Note, however, that
I've matched "--- a/" there; with just "--- " (which is legit) we'd get
confused and start earlier, before the diffstat.

So if you're interested in having this, I leave it to you to run with it
and write tests for it, but, more convincingly, to run it on the git &
LKML archives and check that the output is the same (or only has extra
entries in cases where we now find patches) with --stable etc.
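
A comparison run along those lines could be sketched as below. Everything
in it is an assumption for illustration: compare_patch_ids is a
hypothetical helper, and OLD_GIT and NEW_GIT stand for two git
executables, one unpatched and one built with the change above.

```shell
# compare_patch_ids OLD_GIT NEW_GIT PATCH...  (hypothetical helper)
# Runs "patch-id --stable" from both executables on every patch file.
# Prints "extra:" where only the new build finds a patch (expected) and
# "CHANGED:" where an existing id differs (a regression).
# Returns non-zero if any id changed.
compare_patch_ids() {
    old_git=$1 new_git=$2
    shift 2
    rc=0
    for patch in "$@"; do
        old_id=$("$old_git" patch-id --stable <"$patch")
        new_id=$("$new_git" patch-id --stable <"$patch")
        if [ "$old_id" = "$new_id" ]; then
            continue
        elif [ -z "$old_id" ]; then
            echo "extra: $patch"
        else
            echo "CHANGED: $patch"
            rc=1
        fi
    done
    return "$rc"
}
```

The patch files would be individual messages extracted from the archive,
one per file; how they get extracted is left open here.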


RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Konstantin Ryabitsev
Hi, all:

Every now and again I come across a patch sent to LKML without a leading
"diff a/foo b/foo" -- usually produced by quilt. E.g.:

https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/

I am guessing quilt does not bother including the leading "diff a/foo
b/foo" because it's redundant with the next two lines, however this
remains a valid patch recognized by git-am.

If you pipe that patch via git-patch-id, it produces nothing, but if I
put in the leading "diff", like so:

diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c

then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".

Can we please teach git-patch-id to work without the leading diff a/foo
b/foo, same as git-am?

Best,
-K

