Re: [PATCH v3 2/3] patch-id: document new behaviour
Michael S. Tsirkin m...@redhat.com writes: So I think I prefer using an option and not a heuristic if you are fine with that. Sure. Changing behaviour only by explicit user request (command line or configuration) is much safer than heuristics that does not work reliably. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/3] patch-id: document new behaviour
Michael S. Tsirkin m...@redhat.com writes: On Mon, Mar 31, 2014 at 12:54:46PM -0700, Junio C Hamano wrote: Michael S. Tsirkin m...@redhat.com writes: The hash used is mostly an internal implementation detail, isn't it? Yes, but that does not mean we can break people who keep an external database indexed with the patch-id by changing the default under them, and they can give --unstable option to work it around is a workaround, not a fix. Without this change, they did not have to do anything. I would imagine that most of these people will be using the plain vanilla git show output without any ordering or hunk splitting when coming up with such a key. A possible way forward to allow the configuration that corresponds to -Oorderfile while not breaking the existing users could be to make the patch-id --stable kick in automatically (of course, do this only when the user did not give the --unstable command line option to override) when we see the orderfile configuration in the repository, or when we see that the incoming patch looks like reordered (e.g. has multiple diff --git header lines that refer to the same path, This would require us to track affected files in memory. Issue? Don't we already do that in order to handle a patch that touches the same path more than once anyway? I think a possibly larger issue might be that you would still want to do the hashing in a single pass so you may need to always keep two accumulated hashes, before you can decide if the patch is or is not a straight-forward one and use one of the two, but that hopefully should not require a rocket scientist. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/3] patch-id: document new behaviour
On Wed, Apr 02, 2014 at 11:18:26AM -0700, Junio C Hamano wrote: Michael S. Tsirkin m...@redhat.com writes: On Mon, Mar 31, 2014 at 12:54:46PM -0700, Junio C Hamano wrote: Michael S. Tsirkin m...@redhat.com writes: The hash used is mostly an internal implementation detail, isn't it? Yes, but that does not mean we can break people who keep an external database indexed with the patch-id by changing the default under them, and they can give --unstable option to work it around is a workaround, not a fix. Without this change, they did not have to do anything. I would imagine that most of these people will be using the plain vanilla git show output without any ordering or hunk splitting when coming up with such a key. A possible way forward to allow the configuration that corresponds to -Oorderfile while not breaking the existing users could be to make the patch-id --stable kick in automatically (of course, do this only when the user did not give the --unstable command line option to override) when we see the orderfile configuration in the repository, or when we see that the incoming patch looks like reordered (e.g. has multiple diff --git header lines that refer to the same path, This would require us to track affected files in memory. Issue? Don't we already do that in order to handle a patch that touches the same path more than once anyway? At least I don't see it in builtin/patch-id.c I think a possibly larger issue might be that you would still want to do the hashing in a single pass so you may need to always keep two accumulated hashes, before you can decide if the patch is or is not a straight-forward one and use one of the two, but that hopefully should not require a rocket scientist. But the issue is that equivalent patches would get a different hash. This is what I tried to fix, after all. So I think I prefer using an option and not a heuristic if you are fine with that. At some point in the future we might flip the default. -- MST -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/3] patch-id: document new behaviour
Michael S. Tsirkin m...@redhat.com writes: Clarify that patch ID is now a sum of hashes, not a hash. Document --stable and --unstable flags. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- changes from v2: explicitly list the kinds of changes against which patch ID is stable Documentation/git-patch-id.txt | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt index 312c3b1..30923e0 100644 --- a/Documentation/git-patch-id.txt +++ b/Documentation/git-patch-id.txt @@ -8,14 +8,14 @@ git-patch-id - Compute unique ID for a patch SYNOPSIS [verse] -'git patch-id' patch +'git patch-id' [--stable | --unstable] patch Thanks. It seems taht we are fairly inconsistent when writing alternatives on the SYNOPSIS line. A small minority seems to spell the above as [--stable|--unstable], which may want to be fixed (outside the context of this series, of course). DESCRIPTION --- -A patch ID is nothing but a SHA-1 of the diff associated with a patch, with -whitespace and line numbers ignored. As such, it's reasonably stable, but at -the same time also reasonably unique, i.e., two patches that have the same patch -ID are almost guaranteed to be the same thing. +A patch ID is nothing but a sum of SHA-1 of the diff hunks associated with a +patch, with whitespace and line numbers ignored. As such, it's reasonably +stable, but at the same time also reasonably unique, i.e., two patches that +have the same patch ID are almost guaranteed to be the same thing. Perhaps nothing but can go by now? IOW, you can use this thing to look for likely duplicate commits. @@ -27,6 +27,19 @@ This can be used to make a mapping from patch ID to commit ID. OPTIONS --- + +--stable:: + Use a symmetrical sum of hashes as the patch ID. + With this option, reordering file diffs that make up a patch or + splitting a diff up to multiple diffs that touch the same path + does not affect the ID. + This is the default. + +--unstable:: + Use a non-symmetrical sum of hashes, such that reordering + or splitting the patch does affect the ID. + This was the default value for git 1.9 and older. I am not sure if swapping the default in this series is a wise decision. We typically introduce a new shiny toy to play with in a release and then later when the shiny toy proves to be useful, start to think about changing the default, but not before. patch:: The diff to create the ID of. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/3] patch-id: document new behaviour
Michael S. Tsirkin m...@redhat.com writes: The hash used is mostly an internal implementation detail, isn't it? Yes, but that does not mean we can break people who keep an external database indexed with the patch-id by changing the default under them, and they can give --unstable option to work it around is a workaround, not a fix. Without this change, they did not have to do anything. I would imagine that most of these people will be using the plain vanilla git show output without any ordering or hunk splitting when coming up with such a key. A possible way forward to allow the configuration that corresponds to -Oorderfile while not breaking the existing users could be to make the patch-id --stable kick in automatically (of course, do this only when the user did not give the --unstable command line option to override) when we see the orderfile configuration in the repository, or when we see that the incoming patch looks like reordered (e.g. has multiple diff --git header lines that refer to the same path, or the set of files mentioned by the diff --git lines are not in ascending order), perhaps? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 2/3] patch-id: document new behaviour
On Mon, Mar 31, 2014 at 12:54:46PM -0700, Junio C Hamano wrote: Michael S. Tsirkin m...@redhat.com writes: The hash used is mostly an internal implementation detail, isn't it? Yes, but that does not mean we can break people who keep an external database indexed with the patch-id by changing the default under them, and they can give --unstable option to work it around is a workaround, not a fix. Without this change, they did not have to do anything. I would imagine that most of these people will be using the plain vanilla git show output without any ordering or hunk splitting when coming up with such a key. A possible way forward to allow the configuration that corresponds to -Oorderfile while not breaking the existing users could be to make the patch-id --stable kick in automatically (of course, do this only when the user did not give the --unstable command line option to override) when we see the orderfile configuration in the repository, or when we see that the incoming patch looks like reordered (e.g. has multiple diff --git header lines that refer to the same path, This would require us to track affected files in memory. Issue? or the set of files mentioned by the diff --git lines are not in ascending order), perhaps? I hope a patch-id configuration flag plus maybe checking the orderfile if not specified together should be good enough. -- MST -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/3] patch-id: document new behaviour
Clarify that patch ID is now a sum of hashes, not a hash. Document --stable and --unstable flags. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- changes from v2: explicitly list the kinds of changes against which patch ID is stable Documentation/git-patch-id.txt | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/Documentation/git-patch-id.txt b/Documentation/git-patch-id.txt index 312c3b1..30923e0 100644 --- a/Documentation/git-patch-id.txt +++ b/Documentation/git-patch-id.txt @@ -8,14 +8,14 @@ git-patch-id - Compute unique ID for a patch SYNOPSIS [verse] -'git patch-id' patch +'git patch-id' [--stable | --unstable] patch DESCRIPTION --- -A patch ID is nothing but a SHA-1 of the diff associated with a patch, with -whitespace and line numbers ignored. As such, it's reasonably stable, but at -the same time also reasonably unique, i.e., two patches that have the same patch -ID are almost guaranteed to be the same thing. +A patch ID is nothing but a sum of SHA-1 of the diff hunks associated with a +patch, with whitespace and line numbers ignored. As such, it's reasonably +stable, but at the same time also reasonably unique, i.e., two patches that +have the same patch ID are almost guaranteed to be the same thing. IOW, you can use this thing to look for likely duplicate commits. @@ -27,6 +27,19 @@ This can be used to make a mapping from patch ID to commit ID. OPTIONS --- + +--stable:: + Use a symmetrical sum of hashes as the patch ID. + With this option, reordering file diffs that make up a patch or + splitting a diff up to multiple diffs that touch the same path + does not affect the ID. + This is the default. + +--unstable:: + Use a non-symmetrical sum of hashes, such that reordering + or splitting the patch does affect the ID. + This was the default value for git 1.9 and older. + patch:: The diff to create the ID of. -- MST -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html