Re: [PATCH v3 2/3] patch-id: document new behaviour

2014-04-03 Thread Junio C Hamano
Michael S. Tsirkin m...@redhat.com writes:

 So I think I prefer using an option and not a heuristic if you
 are fine with that.

Sure. Changing behaviour only by explicit user request (command line
or configuration) is much safer than heuristics that does not work
reliably.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] patch-id: document new behaviour

2014-04-02 Thread Junio C Hamano
Michael S. Tsirkin m...@redhat.com writes:

 On Mon, Mar 31, 2014 at 12:54:46PM -0700, Junio C Hamano wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  The hash used is mostly an internal implementation detail, isn't it?
 
 Yes, but that does not mean we can break people who keep an external
 database indexed with the patch-id by changing the default under
 them, and they can give --unstable option to work it around is a
 workaround, not a fix.  Without this change, they did not have to do
 anything.
 
 I would imagine that most of these people will be using the plain
 vanilla git show output without any ordering or hunk splitting
 when coming up with such a key.  A possible way forward to allow the
 configuration that corresponds to -Oorderfile while not breaking
 the existing users could be to make the patch-id --stable kick in
 automatically (of course, do this only when the user did not give
 the --unstable command line option to override) when we see the
 orderfile configuration in the repository, or when we see that the
 incoming patch looks like reordered (e.g. has multiple diff --git
 header lines that refer to the same path,

 This would require us to track affected files in memory.
 Issue?

Don't we already do that in order to handle a patch that touches the
same path more than once anyway?  I think a possibly larger issue
might be that you would still want to do the hashing in a single
pass so you may need to always keep two accumulated hashes, before
you can decide if the patch is or is not a straight-forward one and
use one of the two, but that hopefully should not require a rocket
scientist.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] patch-id: document new behaviour

2014-04-02 Thread Michael S. Tsirkin
On Wed, Apr 02, 2014 at 11:18:26AM -0700, Junio C Hamano wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Mon, Mar 31, 2014 at 12:54:46PM -0700, Junio C Hamano wrote:
  Michael S. Tsirkin m...@redhat.com writes:
  
   The hash used is mostly an internal implementation detail, isn't it?
  
  Yes, but that does not mean we can break people who keep an external
  database indexed with the patch-id by changing the default under
  them, and they can give --unstable option to work it around is a
  workaround, not a fix.  Without this change, they did not have to do
  anything.
  
  I would imagine that most of these people will be using the plain
  vanilla git show output without any ordering or hunk splitting
  when coming up with such a key.  A possible way forward to allow the
  configuration that corresponds to -Oorderfile while not breaking
  the existing users could be to make the patch-id --stable kick in
  automatically (of course, do this only when the user did not give
  the --unstable command line option to override) when we see the
  orderfile configuration in the repository, or when we see that the
  incoming patch looks like reordered (e.g. has multiple diff --git
  header lines that refer to the same path,
 
  This would require us to track affected files in memory.
  Issue?
 
 Don't we already do that in order to handle a patch that touches the
 same path more than once anyway?

At least I don't see it in builtin/patch-id.c

 I think a possibly larger issue
 might be that you would still want to do the hashing in a single
 pass so you may need to always keep two accumulated hashes, before
 you can decide if the patch is or is not a straight-forward one and
 use one of the two, but that hopefully should not require a rocket
 scientist.

But the issue is that equivalent patches would get a different hash.
This is what I tried to fix, after all.

So I think I prefer using an option and not a heuristic if you
are fine with that.
At some point in the future we might flip the default.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] patch-id: document new behaviour

2014-03-31 Thread Junio C Hamano
Michael S. Tsirkin m...@redhat.com writes:

 Clarify that patch ID is now a sum of hashes, not a hash.
 Document --stable and --unstable flags.

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---

 changes from v2:
   explicitly list the kinds of changes against which patch ID is stable

  Documentation/git-patch-id.txt | 23 ++-
  1 file changed, 18 insertions(+), 5 deletions(-)

 diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt
 index 312c3b1..30923e0 100644
 --- a/Documentation/git-patch-id.txt
 +++ b/Documentation/git-patch-id.txt
 @@ -8,14 +8,14 @@ git-patch-id - Compute unique ID for a patch
  SYNOPSIS
  
  [verse]
 -'git patch-id'  patch
 +'git patch-id' [--stable | --unstable]  patch

Thanks.  It seems taht we are fairly inconsistent when writing
alternatives on the SYNOPSIS line.  A small minority seems to spell
the above as [--stable|--unstable], which may want to be fixed
(outside the context of this series, of course).

  
  DESCRIPTION
  ---
 -A patch ID is nothing but a SHA-1 of the diff associated with a patch, with
 -whitespace and line numbers ignored.  As such, it's reasonably stable, but 
 at
 -the same time also reasonably unique, i.e., two patches that have the same 
 patch
 -ID are almost guaranteed to be the same thing.
 +A patch ID is nothing but a sum of SHA-1 of the diff hunks associated with 
 a
 +patch, with whitespace and line numbers ignored.  As such, it's reasonably
 +stable, but at the same time also reasonably unique, i.e., two patches that
 +have the same patch ID are almost guaranteed to be the same thing.

Perhaps nothing but can go by now?

  
  IOW, you can use this thing to look for likely duplicate commits.
  
 @@ -27,6 +27,19 @@ This can be used to make a mapping from patch ID to commit 
 ID.
  
  OPTIONS
  ---
 +
 +--stable::
 + Use a symmetrical sum of hashes as the patch ID.
 + With this option, reordering file diffs that make up a patch or
 + splitting a diff up to multiple diffs that touch the same path
 + does not affect the ID.
 + This is the default.
 +
 +--unstable::
 + Use a non-symmetrical sum of hashes, such that reordering
 + or splitting the patch does affect the ID.
 + This was the default value for git 1.9 and older.

I am not sure if swapping the default in this series is a wise
decision.  We typically introduce a new shiny toy to play with in a
release and then later when the shiny toy proves to be useful, start
to think about changing the default, but not before.

  patch::
   The diff to create the ID of.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] patch-id: document new behaviour

2014-03-31 Thread Junio C Hamano
Michael S. Tsirkin m...@redhat.com writes:

 The hash used is mostly an internal implementation detail, isn't it?

Yes, but that does not mean we can break people who keep an external
database indexed with the patch-id by changing the default under
them, and they can give --unstable option to work it around is a
workaround, not a fix.  Without this change, they did not have to do
anything.

I would imagine that most of these people will be using the plain
vanilla git show output without any ordering or hunk splitting
when coming up with such a key.  A possible way forward to allow the
configuration that corresponds to -Oorderfile while not breaking
the existing users could be to make the patch-id --stable kick in
automatically (of course, do this only when the user did not give
the --unstable command line option to override) when we see the
orderfile configuration in the repository, or when we see that the
incoming patch looks like reordered (e.g. has multiple diff --git
header lines that refer to the same path, or the set of files
mentioned by the diff --git lines are not in ascending order),
perhaps?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 2/3] patch-id: document new behaviour

2014-03-31 Thread Michael S. Tsirkin
On Mon, Mar 31, 2014 at 12:54:46PM -0700, Junio C Hamano wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  The hash used is mostly an internal implementation detail, isn't it?
 
 Yes, but that does not mean we can break people who keep an external
 database indexed with the patch-id by changing the default under
 them, and they can give --unstable option to work it around is a
 workaround, not a fix.  Without this change, they did not have to do
 anything.
 
 I would imagine that most of these people will be using the plain
 vanilla git show output without any ordering or hunk splitting
 when coming up with such a key.  A possible way forward to allow the
 configuration that corresponds to -Oorderfile while not breaking
 the existing users could be to make the patch-id --stable kick in
 automatically (of course, do this only when the user did not give
 the --unstable command line option to override) when we see the
 orderfile configuration in the repository, or when we see that the
 incoming patch looks like reordered (e.g. has multiple diff --git
 header lines that refer to the same path,

This would require us to track affected files in memory.
Issue?

 or the set of files
 mentioned by the diff --git lines are not in ascending order),
 perhaps?

I hope a patch-id configuration flag plus maybe checking the orderfile if not
specified together should be good enough.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/3] patch-id: document new behaviour

2014-03-30 Thread Michael S. Tsirkin
Clarify that patch ID is now a sum of hashes, not a hash.
Document --stable and --unstable flags.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---

changes from v2:
explicitly list the kinds of changes against which patch ID is stable

 Documentation/git-patch-id.txt | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt
index 312c3b1..30923e0 100644
--- a/Documentation/git-patch-id.txt
+++ b/Documentation/git-patch-id.txt
@@ -8,14 +8,14 @@ git-patch-id - Compute unique ID for a patch
 SYNOPSIS
 
 [verse]
-'git patch-id'  patch
+'git patch-id' [--stable | --unstable]  patch
 
 DESCRIPTION
 ---
-A patch ID is nothing but a SHA-1 of the diff associated with a patch, with
-whitespace and line numbers ignored.  As such, it's reasonably stable, but at
-the same time also reasonably unique, i.e., two patches that have the same 
patch
-ID are almost guaranteed to be the same thing.
+A patch ID is nothing but a sum of SHA-1 of the diff hunks associated with a
+patch, with whitespace and line numbers ignored.  As such, it's reasonably
+stable, but at the same time also reasonably unique, i.e., two patches that
+have the same patch ID are almost guaranteed to be the same thing.
 
 IOW, you can use this thing to look for likely duplicate commits.
 
@@ -27,6 +27,19 @@ This can be used to make a mapping from patch ID to commit 
ID.
 
 OPTIONS
 ---
+
+--stable::
+   Use a symmetrical sum of hashes as the patch ID.
+   With this option, reordering file diffs that make up a patch or
+   splitting a diff up to multiple diffs that touch the same path
+   does not affect the ID.
+   This is the default.
+
+--unstable::
+   Use a non-symmetrical sum of hashes, such that reordering
+   or splitting the patch does affect the ID.
+   This was the default value for git 1.9 and older.
+
 patch::
The diff to create the ID of.
 
-- 
MST

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html