On Mon, Feb 26, 2018 at 03:46:35PM -0500, Jeff King wrote:
> On Mon, Feb 26, 2018 at 06:35:33PM +0100, Torsten Bögershausen wrote:
>
> > > diff --git a/userdiff.c b/userdiff.c
> > > index dbfb4e13cd..48fa7e8bdd 100644
> > > --- a/userdiff.c
> > > +++ b/userdiff.c
> > > @@ -161,6 +161,7 @@ IPATTERN("css",
> > > "-?[_a-zA-Z][-_a-zA-Z0-9]*" /* identifiers */
> > > "|-?[0-9]+|\\#[0-9a-fA-F]+" /* numbers */
> > > ),
> > > +{ "utf16", NULL, -1, { NULL, 0 }, NULL, "iconv:utf16" },
> > > { "default", NULL, -1, { NULL, 0 } },
> > > };
> > > #undef PATTERNS
> >
> > The patch looks like a possible step into the right direction -
> > some minor notes: "utf8" is better written as "UTF-8", when talking
> > to iconv.h, same for utf16.
> >
> > But, how do I activate the diff ?
> > I have in .gitattributes
> > XXXenglish.txt diff=UTF-16
> >
> > and in .git/config
> > [diff "UTF-16"]
> > command = iconv:UTF-16
> >
> >
> > What am I doing wrong ?
>
> After applying the patch, if I do:
>
> git init
> echo hello | iconv -f utf8 -t utf16 >file
> git add file
> git commit -m one
> echo goodbye | iconv -f utf8 -t utf16 >file
> git add file
> git commit -m two
>
> then:
>
> git log -p
>
> shows "binary files differ" but:
>
> echo "file diff=utf16" >.gitattributes
> git log -p
>
> shows text diffs. I assume you tweaked the patch before switching to
> the UTF-16 spelling in your example. Did you use a plumbing command to
> show the diff? textconv isn't enabled for plumbing, because the
> resulting patches cannot actually be applied (in that sense an encoding
> switch is potentially special, since in theory one could convert to the
> canonical text format, apply the patch, and then convert back).
>
> -Peff
Thanks for helping me out.
I didn't use "git log -p", but a simple "git diff".
(And after switching back to the lowercase "utf16", it works as you described.)
I wasn't aware of "git log -p"; something learned (or re-learned).
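As far as I understand it, the "iconv:utf16" textconv in the patch mostly has to decode the UTF-16 blob into UTF-8 text before the normal diff machinery runs. Here is a minimal sketch of that step in C with iconv(3). It assumes glibc's iconv and BOM-prefixed input like the files created with "iconv -t utf16" above. utf16_to_utf8 is a hypothetical helper, not code from the patch:

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from userdiff.c): decode a UTF-16 buffer
 * into a malloc'd, NUL-terminated UTF-8 string; NULL on failure.
 * Using "UTF-16" as the source lets iconv pick the byte order
 * from a leading BOM (the BOM itself is not copied to the output).
 */
static char *utf16_to_utf8(const char *in, size_t in_len)
{
	iconv_t cd;
	char *out, *inp, *outp;
	size_t in_left, out_left;

	assert(in != NULL);
	cd = iconv_open("UTF-8", "UTF-16");
	if (cd == (iconv_t)-1)
		return NULL;

	/* 2 input bytes never yield more than 3 UTF-8 bytes, so
	   2 * in_len (+1 for the NUL) is a safe upper bound. */
	out = malloc(in_len * 2 + 1);
	if (!out) {
		iconv_close(cd);
		return NULL;
	}

	inp = (char *)in;	/* iconv() wants char ** even for input */
	in_left = in_len;
	outp = out;
	out_left = in_len * 2;	/* keep one byte for the NUL */

	/* Fails on invalid or truncated (e.g. odd-length) input. */
	if (iconv(cd, &inp, &in_left, &outp, &out_left) == (size_t)-1) {
		free(out);
		iconv_close(cd);
		return NULL;
	}
	*outp = '\0';
	iconv_close(cd);
	return out;
}
```

The caller owns the result, e.g. char *text = utf16_to_utf8(buf, len); ... free(text);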
The other question is:
Would this help show diffs of UTF-16 encoded files on a "git hoster"
like GitHub/Bitbucket/...?
Or would the patch I sent out, which automagically avoids treating
UTF-16 files as binary, be more helpful?
Or both?
Or the w-t-e (working-tree-encoding) approach?
Questions upon questions.