Re: [PATCH 2/2] pickaxe: use textconv for -S counting

Jeff King <p...@peff.net> writes:

> On Tue, Nov 13, 2012 at 03:13:19PM -0800, Junio C Hamano wrote:
>
>>>  static int has_changes(struct diff_filepair *p, struct diff_options *o,
>>>  		       regex_t *regexp, kwset_t kws)
>>>  {
>>> +	struct userdiff_driver *textconv_one = get_textconv(p->one);
>>> +	struct userdiff_driver *textconv_two = get_textconv(p->two);
>>> +	mmfile_t mf1, mf2;
>>> +	int ret;
>>> +
>>>  	if (!o->pickaxe[0])
>>>  		return 0;
>>>
>>> -	if (!DIFF_FILE_VALID(p->one)) {
>>> -		if (!DIFF_FILE_VALID(p->two))
>>> -			return 0; /* ignore unmerged */
>>
>> What happened to this part that avoids showing nonsense for unmerged
>> paths?
>
> It's moved down. fill_one will return an empty mmfile if
> !DIFF_FILE_VALID, so we end up here:
>
>   fill_one(p->one, &mf1, &textconv_one);
>   fill_one(p->two, &mf2, &textconv_two);
>
>   if (!mf1.ptr) {
>           if (!mf2.ptr)
>                   ret = 0; /* ignore unmerged */
>
> Prior to this change, we didn't use fill_one, so we had to check
> manually.
>
>>> +	/*
>>> +	 * If we have an unmodified pair, we know that the count will be the
>>> +	 * same and don't even have to load the blobs. Unless textconv is in
>>> +	 * play, _and_ we are using two different textconv filters (e.g.,
>>> +	 * because a pair is an exact rename with different textconv attributes
>>> +	 * for each side, which might generate different content).
>>> +	 */
>>> +	if (textconv_one == textconv_two && diff_unmodified_pair(p))
>>> +		return 0;
>>
>> I am not sure about this part that cares about the textconv.
>>
>> Wouldn't the normal "git diff A B" skip the filepair that are
>> unmodified in the first place at the object name level without even
>> looking at the contents (see e.g. diff_flush_patch())?
>
> Hmph. The point was to find the case when the paths are different (e.g.,
> in a rename), and therefore the textconvs might be different. But I
> think I missed the fact that diff_unmodified_pair will note the
> difference in paths. So just calling diff_unmodified_pair would be
> sufficient, as the code prior to my patch does.
>
> I thought the point was an optimization to avoid comparing contains() on
> the same data (which we can know will match without looking at it).

Yes.

> Exact renames are the obvious one, but they are not handled here.

That is half true. Before this change, we will find the same number
of needles and this function would have said "no differences" in a
very inefficient way. After this change, we may apply different
textconv filters and this function will say "there is a difference",
even though we wouldn't see such a difference at the content level
if there wasn't any rename.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
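
The scenario in the last paragraph can be sketched in plain C. This is
only an illustration under stated assumptions, not git's code: the two
toy filters (conv_identity, conv_upper), the count_needle helper and
pair_differs are hypothetical stand-ins for textconv drivers, for
contains() and for has_changes(). The counting here is non-overlapping,
which is a choice made for the sketch. The same raw content is fed to
both sides, as it would be for an exact rename.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* A toy "textconv filter": returns a newly allocated, converted copy. */
typedef char *(*toy_textconv_fn)(const char *raw);

/* Hypothetical filter 1: pass the content through unchanged. */
static char *conv_identity(const char *raw)
{
	size_t len = strlen(raw);
	char *out = malloc(len + 1);

	if (out)
		memcpy(out, raw, len + 1);
	return out;
}

/* Hypothetical filter 2: upper-case every character. */
static char *conv_upper(const char *raw)
{
	size_t i, len = strlen(raw);
	char *out = malloc(len + 1);

	if (!out)
		return NULL;
	for (i = 0; i < len; i++)
		out[i] = (char)toupper((unsigned char)raw[i]);
	out[len] = '\0';
	return out;
}

/* Count non-overlapping occurrences of needle in a NUL-terminated text. */
static unsigned int count_needle(const char *text, const char *needle)
{
	unsigned int cnt = 0;
	size_t nlen = strlen(needle);

	if (!nlen)
		return 0;
	while ((text = strstr(text, needle)) != NULL) {
		cnt++;
		text += nlen;
	}
	return cnt;
}

/*
 * Both sides share the same raw content (an exact rename), but each
 * side goes through its own filter before the needle is counted.
 * Returns 1 when the counts differ, 0 otherwise (or on allocation
 * failure).
 */
static int pair_differs(const char *raw, toy_textconv_fn one,
			toy_textconv_fn two, const char *needle)
{
	char *a = one(raw);
	char *b = two(raw);
	int ret = 0;

	if (a && b)
		ret = count_needle(a, needle) != count_needle(b, needle);
	free(a);
	free(b);
	return ret;
}
```

With the same filter on both sides the counts always agree, so loading
the content is wasted work. With different filters the counts can
disagree even though the underlying blob is identical.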
Re: [PATCH 2/2] pickaxe: use textconv for -S counting

On Tue, Nov 13, 2012 at 03:13:19PM -0800, Junio C Hamano wrote:

>>  static int has_changes(struct diff_filepair *p, struct diff_options *o,
>>  		       regex_t *regexp, kwset_t kws)
>>  {
>> +	struct userdiff_driver *textconv_one = get_textconv(p->one);
>> +	struct userdiff_driver *textconv_two = get_textconv(p->two);
>> +	mmfile_t mf1, mf2;
>> +	int ret;
>> +
>>  	if (!o->pickaxe[0])
>>  		return 0;
>>
>> -	if (!DIFF_FILE_VALID(p->one)) {
>> -		if (!DIFF_FILE_VALID(p->two))
>> -			return 0; /* ignore unmerged */
>
> What happened to this part that avoids showing nonsense for unmerged
> paths?

It's moved down. fill_one will return an empty mmfile if
!DIFF_FILE_VALID, so we end up here:

  fill_one(p->one, &mf1, &textconv_one);
  fill_one(p->two, &mf2, &textconv_two);

  if (!mf1.ptr) {
          if (!mf2.ptr)
                  ret = 0; /* ignore unmerged */

Prior to this change, we didn't use fill_one, so we had to check
manually.

>> +	/*
>> +	 * If we have an unmodified pair, we know that the count will be the
>> +	 * same and don't even have to load the blobs. Unless textconv is in
>> +	 * play, _and_ we are using two different textconv filters (e.g.,
>> +	 * because a pair is an exact rename with different textconv attributes
>> +	 * for each side, which might generate different content).
>> +	 */
>> +	if (textconv_one == textconv_two && diff_unmodified_pair(p))
>> +		return 0;
>
> I am not sure about this part that cares about the textconv.
>
> Wouldn't the normal "git diff A B" skip the filepair that are
> unmodified in the first place at the object name level without even
> looking at the contents (see e.g. diff_flush_patch())?

Hmph. The point was to find the case when the paths are different (e.g.,
in a rename), and therefore the textconvs might be different. But I
think I missed the fact that diff_unmodified_pair will note the
difference in paths. So just calling diff_unmodified_pair would be
sufficient, as the code prior to my patch does.

I thought the point was an optimization to avoid comparing contains() on
the same data (which we can know will match without looking at it).
Exact renames are the obvious one, but they are not handled here. So I
am not sure of the point (to catch "git diff $blob1 $blob2" when the two
are identical? I am not sure at what layer we cull that from the diff
queue).

So there is room for optimization here on exact renames, but
diff_unmodified_pair is too forgiving of what is interesting (a rename
is interesting to diff_flush_patch, because it wants to mention the
rename, but it is not interesting to pickaxe, because we did not change
the content, and it could be culled here).

I don't know that it is that big a deal in general. Pure renames are
going to be the minority of blobs we look at, so it is probably not even
measurable. You could construct a pathological case (e.g., an otherwise
small repo with a 2G file, rename the 2G file without modification, then
running "git log -Sfoo" will unnecessarily load the giant blob while
examining the rename commit).

> Shouldn't this part of the code be emulating that behaviour no matter
> what textconv filter(s) are configured for these paths?

Yeah, I just missed that it is checking the path already. It may still
make sense to tighten the optimization, but that is a separate issue. It
should just check diff_unmodified_pair as before; textconv only matters
if you are trying to optimize out exact renames.

-Peff
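
The "tighten the optimization" idea can be sketched as follows. This is
a hedged illustration, not a proposed patch: toy_filespec, toy_filepair,
toy_pair, toy_same_driver, toy_unmodified_pair and toy_can_skip_pickaxe
are all hypothetical names, and toy_unmodified_pair models only the two
properties discussed in this thread (same content, same path), not
everything the real diff_unmodified_pair checks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one side of a diff pair. */
struct toy_filespec {
	const char *path;    /* path name of this side */
	const char *blob_id; /* object name of the content */
	const char *driver;  /* textconv driver name, NULL if none */
};

struct toy_filepair {
	struct toy_filespec one, two;
};

/* Convenience constructor so that callers can build a pair inline. */
static struct toy_filepair toy_pair(const char *path1, const char *id1,
				    const char *drv1, const char *path2,
				    const char *id2, const char *drv2)
{
	struct toy_filepair p;

	p.one.path = path1;
	p.one.blob_id = id1;
	p.one.driver = drv1;
	p.two.path = path2;
	p.two.blob_id = id2;
	p.two.driver = drv2;
	return p;
}

/* Two sides use the same filter if both have none or both name the same one. */
static int toy_same_driver(const char *a, const char *b)
{
	if (!a || !b)
		return a == b;
	return !strcmp(a, b);
}

/*
 * Path-sensitive check: a rename counts as "modified" here, so an
 * exact rename is not skipped even though its content is unchanged.
 */
static int toy_unmodified_pair(struct toy_filepair p)
{
	return !strcmp(p.one.path, p.two.path) &&
	       !strcmp(p.one.blob_id, p.two.blob_id);
}

/*
 * Tightened check for pickaxe only: identical content run through the
 * same filter must give identical counts, whatever the paths are, so
 * neither blob needs to be loaded.
 */
static int toy_can_skip_pickaxe(struct toy_filepair p)
{
	return !strcmp(p.one.blob_id, p.two.blob_id) &&
	       toy_same_driver(p.one.driver, p.two.driver);
}
```

In the pathological case above (a 2G file renamed without modification
and with the same textconv setting on both paths), the tightened check
would return 1 and the blob would never be read, while the
path-sensitive check returns 0.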
Re: [PATCH 2/2] pickaxe: use textconv for -S counting

Jeff King <p...@peff.net> writes:

> We currently just look at raw blob data when using -S to
> pickaxe. This is mostly historical, as pickaxe predates the
> textconv feature. If the user has bothered to define a
> textconv filter, it is more likely that their search string
> will be on the textconv output, as that is what they will
> see in the diff (and we do not even provide a mechanism for
> them to search for binary needles that contain NUL
> characters).

Oookay, I suppose...

>  static int has_changes(struct diff_filepair *p, struct diff_options *o,
>  		       regex_t *regexp, kwset_t kws)
>  {
> +	struct userdiff_driver *textconv_one = get_textconv(p->one);
> +	struct userdiff_driver *textconv_two = get_textconv(p->two);
> +	mmfile_t mf1, mf2;
> +	int ret;
> +
>  	if (!o->pickaxe[0])
>  		return 0;
>
> -	if (!DIFF_FILE_VALID(p->one)) {
> -		if (!DIFF_FILE_VALID(p->two))
> -			return 0; /* ignore unmerged */

What happened to this part that avoids showing nonsense for unmerged
paths?

> +	/*
> +	 * If we have an unmodified pair, we know that the count will be the
> +	 * same and don't even have to load the blobs. Unless textconv is in
> +	 * play, _and_ we are using two different textconv filters (e.g.,
> +	 * because a pair is an exact rename with different textconv attributes
> +	 * for each side, which might generate different content).
> +	 */
> +	if (textconv_one == textconv_two && diff_unmodified_pair(p))
> +		return 0;

I am not sure about this part that cares about the textconv.

Wouldn't the normal "git diff A B" skip the filepair that are
unmodified in the first place at the object name level without even
looking at the contents (see e.g. diff_flush_patch())?

Shouldn't this part of the code be emulating that behaviour no matter
what textconv filter(s) are configured for these paths?
[PATCH 2/2] pickaxe: use textconv for -S counting

We currently just look at raw blob data when using -S to
pickaxe. This is mostly historical, as pickaxe predates the
textconv feature. If the user has bothered to define a
textconv filter, it is more likely that their search string
will be on the textconv output, as that is what they will
see in the diff (and we do not even provide a mechanism for
them to search for binary needles that contain NUL
characters).

This patch teaches -S to use textconv, just as we
already do for -G.

Signed-off-by: Jeff King <p...@peff.net>
---
 diffcore-pickaxe.c       | 56 +---
 t/t4030-diff-textconv.sh | 12 +++
 2 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 61f628c..b097fa7 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -157,17 +157,15 @@ static unsigned int contains(struct diff_filespec *one, struct diff_options *o,
 	return;
 }
 
-static unsigned int contains(struct diff_filespec *one, struct diff_options *o,
+static unsigned int contains(mmfile_t *mf, struct diff_options *o,
 			     regex_t *regexp, kwset_t kws)
 {
 	unsigned int cnt;
 	unsigned long sz;
 	const char *data;
-	if (diff_populate_filespec(one, 0))
-		return 0;
 
-	sz = one->size;
-	data = one->data;
+	sz = mf->size;
+	data = mf->ptr;
 	cnt = 0;
 
 	if (regexp) {
@@ -197,29 +195,53 @@ static int has_changes(struct diff_filepair *p, struct diff_options *o,
 				cnt++;
 		}
 	}
-	diff_free_filespec_data(one);
 	return cnt;
 }
 
 static int has_changes(struct diff_filepair *p, struct diff_options *o,
 		       regex_t *regexp, kwset_t kws)
 {
+	struct userdiff_driver *textconv_one = get_textconv(p->one);
+	struct userdiff_driver *textconv_two = get_textconv(p->two);
+	mmfile_t mf1, mf2;
+	int ret;
+
 	if (!o->pickaxe[0])
 		return 0;
 
-	if (!DIFF_FILE_VALID(p->one)) {
-		if (!DIFF_FILE_VALID(p->two))
-			return 0; /* ignore unmerged */
+	/*
+	 * If we have an unmodified pair, we know that the count will be the
+	 * same and don't even have to load the blobs. Unless textconv is in
+	 * play, _and_ we are using two different textconv filters (e.g.,
+	 * because a pair is an exact rename with different textconv attributes
+	 * for each side, which might generate different content).
+	 */
+	if (textconv_one == textconv_two && diff_unmodified_pair(p))
+		return 0;
+
+	fill_one(p->one, &mf1, &textconv_one);
+	fill_one(p->two, &mf2, &textconv_two);
+
+	if (!mf1.ptr) {
+		if (!mf2.ptr)
+			ret = 0; /* ignore unmerged */
 		/* created */
-		return contains(p->two, o, regexp, kws) != 0;
-	}
-	if (!DIFF_FILE_VALID(p->two))
-		return contains(p->one, o, regexp, kws) != 0;
-	if (!diff_unmodified_pair(p)) {
-		return contains(p->one, o, regexp, kws) !=
-			contains(p->two, o, regexp, kws);
+		ret = contains(&mf2, o, regexp, kws) != 0;
 	}
-	return 0;
+	else if (!mf2.ptr) /* removed */
+		ret = contains(&mf1, o, regexp, kws) != 0;
+	else
+		ret = contains(&mf1, o, regexp, kws) !=
+		      contains(&mf2, o, regexp, kws);
+
+	if (textconv_one)
+		free(mf1.ptr);
+	if (textconv_two)
+		free(mf2.ptr);
+	diff_free_filespec_data(p->one);
+	diff_free_filespec_data(p->two);
+
+	return ret;
 }
 
 static void diffcore_pickaxe_count(struct diff_options *o)
diff --git a/t/t4030-diff-textconv.sh b/t/t4030-diff-textconv.sh
index 461d27a..53ec330 100755
--- a/t/t4030-diff-textconv.sh
+++ b/t/t4030-diff-textconv.sh
@@ -96,6 +96,18 @@ test_expect_success 'grep-diff (-G) operates on textconv data (modification)' '
 	test_cmp expect actual
 '
 
+test_expect_success 'pickaxe (-S) operates on textconv data (add)' '
+	echo one >expect &&
+	git log --root --format=%s -S0 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'pickaxe (-S) operates on textconv data (modification)' '
+	echo two >expect &&
+	git log --root --format=%s -S1 >actual &&
+	test_cmp expect actual
+'
+
 cat >expect.stat <<'EOF'
  file | Bin 2 -> 4 bytes
  1 file changed, 0 insertions(+), 0 deletions(-)
-- 
1.8.0.3.g3456896
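
For readers who want to see the counting logic of the patch in
isolation, here is a minimal, self-contained sketch. It is an
assumption-laden model rather than the patch itself: toy_mmfile stands
in for mmfile_t, toy_contains handles only the fixed-string case (no
regexp, no kwset) and counts non-overlapping matches, and
toy_mmfile_from is a hypothetical helper for building test input. In
the hunk as it appears above, the "ret = 0" for the unmerged case is
followed unconditionally by the "created" assignment, so the sketch
spells the four cases out as an explicit if/else chain instead.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for mmfile_t: a pointer plus an explicit length. */
struct toy_mmfile {
	const char *ptr; /* NULL when this side does not exist */
	size_t size;
};

/* Hypothetical helper: wrap a NUL-terminated string (or NULL). */
static struct toy_mmfile toy_mmfile_from(const char *s)
{
	struct toy_mmfile mf;

	mf.ptr = s;
	mf.size = s ? strlen(s) : 0;
	return mf;
}

/*
 * Count non-overlapping occurrences of needle in a length-delimited
 * buffer. The buffer is never assumed to be NUL-terminated.
 */
static unsigned int toy_contains(struct toy_mmfile mf, const char *needle)
{
	size_t nlen = strlen(needle);
	size_t i = 0;
	unsigned int cnt = 0;

	if (!nlen || !mf.ptr)
		return 0;
	while (i + nlen <= mf.size) {
		if (!memcmp(mf.ptr + i, needle, nlen)) {
			cnt++;
			i += nlen; /* skip past this match */
		} else {
			i++;
		}
	}
	return cnt;
}

/* A pair is interesting when the number of needles changes. */
static int toy_has_changes(struct toy_mmfile mf1, struct toy_mmfile mf2,
			   const char *needle)
{
	if (!mf1.ptr && !mf2.ptr)
		return 0; /* ignore unmerged */
	else if (!mf1.ptr) /* created */
		return toy_contains(mf2, needle) != 0;
	else if (!mf2.ptr) /* removed */
		return toy_contains(mf1, needle) != 0;
	else
		return toy_contains(mf1, needle) != toy_contains(mf2, needle);
}
```

The buffers passed in would be the textconv output when a filter is
configured and the raw blob otherwise; the counting itself does not
need to know which one it was given.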