Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering

2014-04-24 Thread Michael S. Tsirkin
On Wed, Apr 23, 2014 at 03:05:42PM -0700, Junio C Hamano wrote:
 Michael S. Tsirkin m...@redhat.com writes:
 
  On Wed, Apr 23, 2014 at 10:39:23AM -0700, Junio C Hamano wrote:
  Are these three patches the same as what has been queued on
  mt/patch-id-stable topic and cooking in 'next' for a few weeks?
 
  Not exactly - at your request I implemented git config
  options to control patch id behaviour.
  Documentation and tests updated to match.
 
 After comparing the patches 4-6 and the one that has been in 'next'
 for a few weeks, I tried to like the new one, but I couldn't.

I'm fine with the one in next too.
I was under the impression that you wanted me to change
the behaviour so I worked on this, but previous version was sufficient
for my purposes (which is really all about putting diff.orderfile
upstream).

 The new one, if I am reading the patch correctly, makes the stable
 variant the default if
 
  - the configuration explicitly asks to use it; or
 
  - diff.orderfile, which is a new configuration that did not exist,
is used.
 
 At the first glance, the latter makes it look as if that will not
 hurt any existing users/repositories, but the thing is, the producer
 of the patches is different from the consumer of the patches.  There
 needs to be a way for a consumer to say I need stable variant on
 the patches when computing patch-id on a patch that arrived.  As
 long as two different producers use two different orders, the
 consumer of the patches from these two sources is forced to use the
 stable variant if some sort of cache is kept keyed with the
 patch-ids.
 
 But diff.orderfile configuration is all and only about the
 producer, and does not help the case at all.

OK, we can just drop that, that's easy.

 Compared to it, the previous version forced people who do not have a
 particular reason to choose between the variants to use the new
 stable variant, which was a lot simpler solution.
 
 The reason why I merged the previous one to 'next' was because I
 wanted to see if the behaviour change is as grave as I thought, or I
 was just being unnecessary cautious.  If there is no third-party
 script that precomputes something and stores them under these hashes
 that breaks with this change, I do not see any reason not to make
 the new stable one the default.

Ah okay, so we just wait a bit and see and if all is quiet the
existing patch will graduate to master with time?
Totally cool.
I thought you wanted me to add the config option, but if everyone
is happy as is, I don't need it personally at all.

 I however suspect that the ideal stable implementation may be
 different from what you implemented.  What if we compute a patch-id
 on a reordered patch as if the files came in the usual order?

ATM patch id does not have any concept of the usual order,
so that's one problem - how does one figure out what the order would be?
I have no idea - is this documented anywhere?
Also I'm guessing this would depend on the state of the git tree which
would be another behaviour change: previously patch-id worked
fine outside any git tree.


 That would be another way to achieve the stable-ness for sane cases
 (i.e. no funny you could split one patch with two hunks into two
 patches with one hunk each twice mentioning the same path which is
 totally an uninteresting use case---diff.orderfile would not even
 produce such a patch)

Yes I'm thinking we should drop this hunk in the patch:
let's support reordering files but not splitting them.
This makes the change even smaller, so I now think we should
just go for it.

 and a huge difference would be that it would
 produce the same patch-id as existing versions of Git, if the patch
 is not reordered.  Is that asking for a moon (I admit that I haven't
 looked at the code involved too deeply)?

Yes this would be a bunch of code to write - certainly much more complex than
the existing small patch which just tweaks the checksum slightly to make
it stable.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering

2014-04-24 Thread Junio C Hamano
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Apr 23, 2014 at 03:05:42PM -0700, Junio C Hamano wrote:

 After comparing the patches 4-6 and the one that has been in 'next'
 for a few weeks, I tried to like the new one, but I couldn't.

 I'm fine with the one in next too.
 I was under the impression that you wanted me to change
 the behaviour so I worked on this,...

What I wanted to see was to make sure this would not kick in unless
the user explicitly asks.  patchid.stable configuration variable
is very much welcome, and if it defaulted to false with or without
diff.orderfile, that would have been very much welcome.  Then
nobody will be surprised.

 But diff.orderfile configuration is all and only about the
 producer, and does not help the case at all.

 OK, we can just drop that, that's easy.

 Compared to it, the previous version forced people who do not have a
 particular reason to choose between the variants to use the new
 stable variant, which was a lot simpler solution.
 
 The reason why I merged the previous one to 'next' was because I
 wanted to see if the behaviour change is as grave as I thought, or I
 was just being unnecessary cautious.  If there is no third-party
 script that precomputes something and stores them under these hashes
 that breaks with this change, I do not see any reason not to make
 the new stable one the default.

 Ah okay, so we just wait a bit and see and if all is quiet the
 existing patch will graduate to master with time?
 Totally cool.
 I thought you wanted me to add the config option, but if everyone
 is happy as is, I don't need it personally at all.

... or if we see problems, then build a fix on top to introduce
patchid.stable that defaults to false and not linking the feature
with diff.ordefile.

Adding a new feature turned-off by default is the safer thing to do.
When nothing changes, by defintion there won't be a new breakage.
That is the default stance this project has taken for a long time,
and what made us trusted by projects that manage their source files
using our product.

It however is to favour stability and no surprise over progress,
and it may not be an optimal thing to do if the new feature is
compellingly good.  In some cases like this one, we may need to
experiment to find out the need of the users, and introducing the
feature with a configuration that is explicitly opt-in is one way to
do so.  We may or may not see many people using that feature without
disrupting existing users that way.

Cooking in 'next' with the feature turned-on by default is another
way that is riskier, but in this particular case, the population
that is likely to be affected are power users, which would match the
audience of 'next'---those who still want stability but want to be
closer to the cutting edge.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 4/6] patch-id: make it stable against hunk reordering

2014-04-23 Thread Michael S. Tsirkin
Patch id changes if users
1. reorder file diffs that make up a patch
or
2. split a patch up to multiple diffs that touch the same path
(keeping hunks within a single diff ordered to make patch valid).

As the result is functionally equivalent, a different patch id is
surprising to many users.
In particular, reordering files using diff -O is helpful to make patches
more readable (e.g. API header diff before implementation diff).

Add an option to change patch-id behaviour making it stable against
these two kinds of patch change:
1. calculate SHA1 hash for each hunk separately and sum all hashes
(using a symmetrical sum) to get patch id
2. hash the file-level headers together with each hunk (not just the
first hunk)

We use a 20byte sum and not xor - since xor would give 0 output
for patches that have two identical diffs, which isn't all that
unlikely (e.g. append the same line in two places).

The new behaviour is enabled
- when diff.orderfile config option is present
  (as that is likely to reorder patches)
- when patchid.stable is true
- when --stable flag is present

Using a new flag --unstable or setting patchid.stable to false force
the historical behaviour.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 builtin/patch-id.c | 94 --
 1 file changed, 78 insertions(+), 16 deletions(-)

diff --git a/builtin/patch-id.c b/builtin/patch-id.c
index 3cfe02d..e154405 100644
--- a/builtin/patch-id.c
+++ b/builtin/patch-id.c
@@ -1,17 +1,14 @@
 #include builtin.h
 
-static void flush_current_id(int patchlen, unsigned char *id, git_SHA_CTX *c)
+static void flush_current_id(int patchlen, unsigned char *id, unsigned char 
*result)
 {
-   unsigned char result[20];
char name[50];
 
if (!patchlen)
return;
 
-   git_SHA1_Final(result, c);
memcpy(name, sha1_to_hex(id), 41);
printf(%s %s\n, sha1_to_hex(result), name);
-   git_SHA1_Init(c);
 }
 
 static int remove_space(char *line)
@@ -56,10 +53,31 @@ static int scan_hunk_header(const char *p, int *p_before, 
int *p_after)
return 1;
 }
 
-static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct 
strbuf *line_buf)
+static void flush_one_hunk(unsigned char *result, git_SHA_CTX *ctx)
 {
-   int patchlen = 0, found_next = 0;
+   unsigned char hash[20];
+   unsigned short carry = 0;
+   int i;
+
+   git_SHA1_Final(hash, ctx);
+   git_SHA1_Init(ctx);
+   /* 20-byte sum, with carry */
+   for (i = 0; i  20; ++i) {
+   carry += result[i] + hash[i];
+   result[i] = carry;
+   carry = 8;
+   }
+}
+
+static int get_one_patchid(unsigned char *next_sha1, unsigned char *result,
+  struct strbuf *line_buf, int stable)
+{
+   int patchlen = 0, found_next = 0, hunks = 0;
int before = -1, after = -1;
+   git_SHA_CTX ctx, header_ctx;
+
+   git_SHA1_Init(ctx);
+   hashclr(result);
 
while (strbuf_getwholeline(line_buf, stdin, '\n') != EOF) {
char *line = line_buf-buf;
@@ -98,7 +116,19 @@ static int get_one_patchid(unsigned char *next_sha1, 
git_SHA_CTX *ctx, struct st
if (before == 0  after == 0) {
if (!memcmp(line, @@ -, 4)) {
/* Parse next hunk, but ignore line numbers.  */
+   if (stable) {
+   /* Hash the file-level headers together 
with each hunk. */
+   if (hunks) {
+   flush_one_hunk(result, ctx);
+   /* Prepend saved header ctx for 
next hunk.  */
+   memcpy(ctx, header_ctx, 
sizeof(ctx));
+   } else {
+   /* Save header ctx for next 
hunk.  */
+   memcpy(header_ctx, ctx, 
sizeof(ctx));
+   }
+   }
scan_hunk_header(line, before, after);
+   hunks++;
continue;
}
 
@@ -107,7 +137,10 @@ static int get_one_patchid(unsigned char *next_sha1, 
git_SHA_CTX *ctx, struct st
break;
 
/* Else we're parsing another header.  */
+   if (stable  hunks)
+   flush_one_hunk(result, ctx);
before = after = -1;
+   hunks = 0;
}
 
/* If we get here, we're inside a hunk.  */
@@ -119,39 +152,68 @@ static int get_one_patchid(unsigned char *next_sha1, 
git_SHA_CTX *ctx, struct st
/* Compute the sha without whitespace */

Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering

2014-04-23 Thread Junio C Hamano
Are these three patches the same as what has been queued on
mt/patch-id-stable topic and cooking in 'next' for a few weeks?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering

2014-04-23 Thread Michael S. Tsirkin
On Wed, Apr 23, 2014 at 10:39:23AM -0700, Junio C Hamano wrote:
 Are these three patches the same as what has been queued on
 mt/patch-id-stable topic and cooking in 'next' for a few weeks?

Not exactly - at your request I implemented git config
options to control patch id behaviour.
Documentation and tests updated to match.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering

2014-04-23 Thread Junio C Hamano
Michael S. Tsirkin m...@redhat.com writes:

 On Wed, Apr 23, 2014 at 10:39:23AM -0700, Junio C Hamano wrote:
 Are these three patches the same as what has been queued on
 mt/patch-id-stable topic and cooking in 'next' for a few weeks?

 Not exactly - at your request I implemented git config
 options to control patch id behaviour.
 Documentation and tests updated to match.

After comparing the patches 4-6 and the one that has been in 'next'
for a few weeks, I tried to like the new one, but I couldn't.

The new one, if I am reading the patch correctly, makes the stable
variant the default if

 - the configuration explicitly asks to use it; or

 - diff.orderfile, which is a new configuration that did not exist,
   is used.

At the first glance, the latter makes it look as if that will not
hurt any existing users/repositories, but the thing is, the producer
of the patches is different from the consumer of the patches.  There
needs to be a way for a consumer to say I need stable variant on
the patches when computing patch-id on a patch that arrived.  As
long as two different producers use two different orders, the
consumer of the patches from these two sources is forced to use the
stable variant if some sort of cache is kept keyed with the
patch-ids.

But diff.orderfile configuration is all and only about the
producer, and does not help the case at all.

Compared to it, the previous version forced people who do not have a
particular reason to choose between the variants to use the new
stable variant, which was a lot simpler solution.

The reason why I merged the previous one to 'next' was because I
wanted to see if the behaviour change is as grave as I thought, or I
was just being unnecessary cautious.  If there is no third-party
script that precomputes something and stores them under these hashes
that breaks with this change, I do not see any reason not to make
the new stable one the default.

I however suspect that the ideal stable implementation may be
different from what you implemented.  What if we compute a patch-id
on a reordered patch as if the files came in the usual order?
That would be another way to achieve the stable-ness for sane cases
(i.e. no funny you could split one patch with two hunks into two
patches with one hunk each twice mentioning the same path which is
totally an uninteresting use case---diff.orderfile would not even
produce such a patch) and a huge difference would be that it would
produce the same patch-id as existing versions of Git, if the patch
is not reordered.  Is that asking for a moon (I admit that I haven't
looked at the code involved too deeply)?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html