Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering
On Wed, Apr 23, 2014 at 03:05:42PM -0700, Junio C Hamano wrote: Michael S. Tsirkin m...@redhat.com writes: On Wed, Apr 23, 2014 at 10:39:23AM -0700, Junio C Hamano wrote: Are these three patches the same as what has been queued on mt/patch-id-stable topic and cooking in 'next' for a few weeks? Not exactly - at your request I implemented git config options to control patch id behaviour. Documentation and tests updated to match. After comparing the patches 4-6 and the one that has been in 'next' for a few weeks, I tried to like the new one, but I couldn't. I'm fine with the one in next too. I was under the impression that you wanted me to change the behaviour so I worked on this, but previous version was sufficient for my purposes (which is really all about putting diff.orderfile upstream). The new one, if I am reading the patch correctly, makes the stable variant the default if - the configuration explicitly asks to use it; or - diff.orderfile, which is a new configuration that did not exist, is used. At the first glance, the latter makes it look as if that will not hurt any existing users/repositories, but the thing is, the producer of the patches is different from the consumer of the patches. There needs to be a way for a consumer to say I need stable variant on the patches when computing patch-id on a patch that arrived. As long as two different producers use two different orders, the consumer of the patches from these two sources is forced to use the stable variant if some sort of cache is kept keyed with the patch-ids. But diff.orderfile configuration is all and only about the producer, and does not help the case at all. OK, we can just drop that, that's easy. Compared to it, the previous version forced people who do not have a particular reason to choose between the variants to use the new stable variant, which was a lot simpler solution. The reason why I merged the previous one to 'next' was because I wanted to see if the behaviour change is as grave as I thought, or I was just being unnecessary cautious. If there is no third-party script that precomputes something and stores them under these hashes that breaks with this change, I do not see any reason not to make the new stable one the default. Ah okay, so we just wait a bit and see and if all is quiet the existing patch will graduate to master with time? Totally cool. I thought you wanted me to add the config option, but if everyone is happy as is, I don't need it personally at all. I however suspect that the ideal stable implementation may be different from what you implemented. What if we compute a patch-id on a reordered patch as if the files came in the usual order? ATM patch id does not have any concept of the usual order, so that's one problem - how does one figure out what the order would be? I have no idea - is this documented anywhere? Also I'm guessing this would depend on the state of the git tree which would be another behaviour change: previously patch-id worked fine outside any git tree. That would be another way to achieve the stable-ness for sane cases (i.e. no funny you could split one patch with two hunks into two patches with one hunk each twice mentioning the same path which is totally an uninteresting use case---diff.orderfile would not even produce such a patch) Yes I'm thinking we should drop this hunk in the patch: let's support reordering files but not splitting them. This makes the change even smaller, so I now think we should just go for it. and a huge difference would be that it would produce the same patch-id as existing versions of Git, if the patch is not reordered. Is that asking for a moon (I admit that I haven't looked at the code involved too deeply)? Yes this would be a bunch of code to write - certainly much more complex than the existing small patch which just tweaks the checksum slightly to make it stable. -- MST -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering
Michael S. Tsirkin m...@redhat.com writes: On Wed, Apr 23, 2014 at 03:05:42PM -0700, Junio C Hamano wrote: After comparing the patches 4-6 and the one that has been in 'next' for a few weeks, I tried to like the new one, but I couldn't. I'm fine with the one in next too. I was under the impression that you wanted me to change the behaviour so I worked on this,... What I wanted to see was to make sure this would not kick in unless the user explicitly asks. patchid.stable configuration variable is very much welcome, and if it defaulted to false with or without diff.orderfile, that would have been very much welcome. Then nobody will be surprised. But diff.orderfile configuration is all and only about the producer, and does not help the case at all. OK, we can just drop that, that's easy. Compared to it, the previous version forced people who do not have a particular reason to choose between the variants to use the new stable variant, which was a lot simpler solution. The reason why I merged the previous one to 'next' was because I wanted to see if the behaviour change is as grave as I thought, or I was just being unnecessary cautious. If there is no third-party script that precomputes something and stores them under these hashes that breaks with this change, I do not see any reason not to make the new stable one the default. Ah okay, so we just wait a bit and see and if all is quiet the existing patch will graduate to master with time? Totally cool. I thought you wanted me to add the config option, but if everyone is happy as is, I don't need it personally at all. ... or if we see problems, then build a fix on top to introduce patchid.stable that defaults to false and not linking the feature with diff.ordefile. Adding a new feature turned-off by default is the safer thing to do. When nothing changes, by defintion there won't be a new breakage. That is the default stance this project has taken for a long time, and what made us trusted by projects that manage their source files using our product. It however is to favour stability and no surprise over progress, and it may not be an optimal thing to do if the new feature is compellingly good. In some cases like this one, we may need to experiment to find out the need of the users, and introducing the feature with a configuration that is explicitly opt-in is one way to do so. We may or may not see many people using that feature without disrupting existing users that way. Cooking in 'next' with the feature turned-on by default is another way that is riskier, but in this particular case, the population that is likely to be affected are power users, which would match the audience of 'next'---those who still want stability but want to be closer to the cutting edge. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 4/6] patch-id: make it stable against hunk reordering
Patch id changes if users 1. reorder file diffs that make up a patch or 2. split a patch up to multiple diffs that touch the same path (keeping hunks within a single diff ordered to make patch valid). As the result is functionally equivalent, a different patch id is surprising to many users. In particular, reordering files using diff -O is helpful to make patches more readable (e.g. API header diff before implementation diff). Add an option to change patch-id behaviour making it stable against these two kinds of patch change: 1. calculate SHA1 hash for each hunk separately and sum all hashes (using a symmetrical sum) to get patch id 2. hash the file-level headers together with each hunk (not just the first hunk) We use a 20byte sum and not xor - since xor would give 0 output for patches that have two identical diffs, which isn't all that unlikely (e.g. append the same line in two places). The new behaviour is enabled - when diff.orderfile config option is present (as that is likely to reorder patches) - when patchid.stable is true - when --stable flag is present Using a new flag --unstable or setting patchid.stable to false force the historical behaviour. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- builtin/patch-id.c | 94 -- 1 file changed, 78 insertions(+), 16 deletions(-) diff --git a/builtin/patch-id.c b/builtin/patch-id.c index 3cfe02d..e154405 100644 --- a/builtin/patch-id.c +++ b/builtin/patch-id.c @@ -1,17 +1,14 @@ #include builtin.h -static void flush_current_id(int patchlen, unsigned char *id, git_SHA_CTX *c) +static void flush_current_id(int patchlen, unsigned char *id, unsigned char *result) { - unsigned char result[20]; char name[50]; if (!patchlen) return; - git_SHA1_Final(result, c); memcpy(name, sha1_to_hex(id), 41); printf(%s %s\n, sha1_to_hex(result), name); - git_SHA1_Init(c); } static int remove_space(char *line) @@ -56,10 +53,31 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after) return 1; } -static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct strbuf *line_buf) +static void flush_one_hunk(unsigned char *result, git_SHA_CTX *ctx) { - int patchlen = 0, found_next = 0; + unsigned char hash[20]; + unsigned short carry = 0; + int i; + + git_SHA1_Final(hash, ctx); + git_SHA1_Init(ctx); + /* 20-byte sum, with carry */ + for (i = 0; i 20; ++i) { + carry += result[i] + hash[i]; + result[i] = carry; + carry = 8; + } +} + +static int get_one_patchid(unsigned char *next_sha1, unsigned char *result, + struct strbuf *line_buf, int stable) +{ + int patchlen = 0, found_next = 0, hunks = 0; int before = -1, after = -1; + git_SHA_CTX ctx, header_ctx; + + git_SHA1_Init(ctx); + hashclr(result); while (strbuf_getwholeline(line_buf, stdin, '\n') != EOF) { char *line = line_buf-buf; @@ -98,7 +116,19 @@ static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct st if (before == 0 after == 0) { if (!memcmp(line, @@ -, 4)) { /* Parse next hunk, but ignore line numbers. */ + if (stable) { + /* Hash the file-level headers together with each hunk. */ + if (hunks) { + flush_one_hunk(result, ctx); + /* Prepend saved header ctx for next hunk. */ + memcpy(ctx, header_ctx, sizeof(ctx)); + } else { + /* Save header ctx for next hunk. */ + memcpy(header_ctx, ctx, sizeof(ctx)); + } + } scan_hunk_header(line, before, after); + hunks++; continue; } @@ -107,7 +137,10 @@ static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct st break; /* Else we're parsing another header. */ + if (stable hunks) + flush_one_hunk(result, ctx); before = after = -1; + hunks = 0; } /* If we get here, we're inside a hunk. */ @@ -119,39 +152,68 @@ static int get_one_patchid(unsigned char *next_sha1, git_SHA_CTX *ctx, struct st /* Compute the sha without whitespace */
Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering
Are these three patches the same as what has been queued on mt/patch-id-stable topic and cooking in 'next' for a few weeks? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering
On Wed, Apr 23, 2014 at 10:39:23AM -0700, Junio C Hamano wrote: Are these three patches the same as what has been queued on mt/patch-id-stable topic and cooking in 'next' for a few weeks? Not exactly - at your request I implemented git config options to control patch id behaviour. Documentation and tests updated to match. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4 4/6] patch-id: make it stable against hunk reordering
Michael S. Tsirkin m...@redhat.com writes: On Wed, Apr 23, 2014 at 10:39:23AM -0700, Junio C Hamano wrote: Are these three patches the same as what has been queued on mt/patch-id-stable topic and cooking in 'next' for a few weeks? Not exactly - at your request I implemented git config options to control patch id behaviour. Documentation and tests updated to match. After comparing the patches 4-6 and the one that has been in 'next' for a few weeks, I tried to like the new one, but I couldn't. The new one, if I am reading the patch correctly, makes the stable variant the default if - the configuration explicitly asks to use it; or - diff.orderfile, which is a new configuration that did not exist, is used. At the first glance, the latter makes it look as if that will not hurt any existing users/repositories, but the thing is, the producer of the patches is different from the consumer of the patches. There needs to be a way for a consumer to say I need stable variant on the patches when computing patch-id on a patch that arrived. As long as two different producers use two different orders, the consumer of the patches from these two sources is forced to use the stable variant if some sort of cache is kept keyed with the patch-ids. But diff.orderfile configuration is all and only about the producer, and does not help the case at all. Compared to it, the previous version forced people who do not have a particular reason to choose between the variants to use the new stable variant, which was a lot simpler solution. The reason why I merged the previous one to 'next' was because I wanted to see if the behaviour change is as grave as I thought, or I was just being unnecessary cautious. If there is no third-party script that precomputes something and stores them under these hashes that breaks with this change, I do not see any reason not to make the new stable one the default. I however suspect that the ideal stable implementation may be different from what you implemented. What if we compute a patch-id on a reordered patch as if the files came in the usual order? That would be another way to achieve the stable-ness for sane cases (i.e. no funny you could split one patch with two hunks into two patches with one hunk each twice mentioning the same path which is totally an uninteresting use case---diff.orderfile would not even produce such a patch) and a huge difference would be that it would produce the same patch-id as existing versions of Git, if the patch is not reordered. Is that asking for a moon (I admit that I haven't looked at the code involved too deeply)? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html