On Tue, Feb 27, 2018 at 04:25:38PM -0500, Jeff King wrote:
> On Tue, Feb 27, 2018 at 10:05:17PM +0100, Torsten Bögershausen wrote:
>
> > The other question is:
> > Would this help showing diffs of UTF-16 encoded files on a "git hoster",
> > github/bitbucket/.... ?
>
> Almost. There's probably one more thing needed. We don't currently read
> in-tree .gitattributes when doing a diff in a bare repository. And most
> hosting sites will store bare repositories.
>
> And of course it would require the users to actually set the attributes
> themselves.
>
> > Or would the auto-magic UTF-16 avoid binary patch that I send out be more
> > helpful ?
> > Or both ?
> > Or the w-t-e encoding ?
>
> Of the three solutions, I think the relative merits are something like
> this:
>
> 1. baked-in textconv (my patch)
>
> - reuses an existing diff feature, so minimal code and not likely to
> break things
>
> - requires people to add a .gitattributes entry
>
> - needs work to make bare-repo .gitattributes work (though I think
> this would be useful for other features, too)
>
> - has a run-time cost at each diff to do the conversion
>
> - may sometimes annoy people when it doesn't kick in (e.g.,
> emailed patches from format-patch won't have a readable diff)
>
> - doesn't combine with other custom-diff config (e.g., utf-16
> storing C code should still use diff=c funcname rules, but
> wouldn't with my patch)
>
> 2. auto-detect utf-16 (your patch)
> - Just Works for existing repositories storing utf-16
>
> - carries some risk of kicking in when people would like it not to
> (e.g., when they really do want a binary patch that can be
> applied).
The binary patch is still supported, but that detail may need some more
explanation
in the commit message. Please see t4066-diff-encoding.sh
test_expect_success 'diff --binary against local change' '
cp file2 file &&
test_tick &&
cat >expect <<-\EOF &&
diff --git a/file b/file
index
26acf09b0aad19fb22566956d1a39cb4e2a3b420..e98d27acfb90cfcfc84fcc5173baa4aa7828290f
100644
GIT binary patch
literal 28
ecmezW?;ArgLn;Fo!ykquAe{qbJq3!C0BHb{ln3Pi
literal 32
icmezW?+HT@Lpnn$kmO?c#!w7oaWVX1NCMJ1Ko$VA_z0~4
EOF
git diff --binary file >actual &&
test_cmp expect actual
>
> I think it would probably be OK if this kicked in only when
> ALLOW_TEXTCONV is set (the default for porcelain), and --binary
> is not (i.e., when we would otherwise just say "binary
> files differ").
The user can still use "git diff" (Where auto-detection of UTF-16 kicks in
and replaces "binary files differ" with an UTF-8 diff.
When the user wants a patch, "git diff --binary" will generate a binary patch,
as before.
The only thing which is missing is the line "binary files differ", which may be
a
regression. I can re-add it in V2.
>
> - similar to (1), carries a run-time cost for each diff, and users
> may sometimes still see binary diffs
>
> 3. w-t-e (Lars's patch)
>
> - requires no server-side modifications; the diff is plain vanilla
>
> - works everywhere you diff, plumbing and porcelain
>
> - does require people to add a .gitattributes entry
>
> - run-time cost is per-checkout, not per-diff
>
> So I can see room for (3) to co-exist alongside the others. Between (1)
> and (2), I think (2) is probably the better direction.
>
> -Peff