Hello,

I’ve noticed that git gui and gitk seem to have problems decoding certain
Unicode characters. E.g., when a commit contains the character «👍»
(thumbs up sign; U+1F44D) in UTF-8 encoding, this character will show
as «ðŸ‘» in gitk. git gui also displays it using the same sequence.
When trying to stage lines whose context contains such characters, git
gui will error out (corrupt patch).

The character sequence appears to be mojibake introduced by decoding
UTF-8 as ISO-8859-1. However, my locale is set to «en_US.utf8». git gui
is also set to assume UTF-8 encoding for files, and in the list menu
where this encoding is selected, it lists the UTF-8 option under
«system encoding», which suggests that my locale is correctly picked
up.
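
For reference, the observed sequences can be reproduced outside of git
with a few lines of Python. This is only a sketch of the suspected
mis-decoding, not of what gitk or git gui actually do internally; the
helper name mojibake() is made up for illustration. Note that «Ÿ» and
«‘» exist in Windows-1252 but not in ISO-8859-1 proper, so the sketch
decodes as cp1252 and drops the one byte (0x8D) which cp1252 leaves
undefined. That choice of codec is an assumption on my part.

```python
def mojibake(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    raw = text.encode("utf-8")
    # errors="ignore" drops bytes the wrong codec cannot map
    # (e.g. 0x8D, which is undefined in Python's cp1252 table).
    return raw.decode(wrong_encoding, errors="ignore")


thumbs_up = "\U0001F44D"  # U+1F44D, UTF-8 bytes F0 9F 91 8D
sharp_s = "\u00DF"        # U+00DF, UTF-8 bytes C3 9F

if __name__ == "__main__":
    print(mojibake(thumbs_up))  # sequence seen in gitk and git gui
    print(mojibake(sharp_s))    # sequence seen in the options dialog
```

With "latin-1" as the second argument, the same bytes decode to «ð»
followed by three invisible C1 control characters instead.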

Are there perchance any heuristics in place which try decoding files
as Unicode, with a fall-back to latin1? If so, then potentially the bug
could be due to U+1F44D tripping up the decoder, triggering a
fall-back, and rendering the characters as mojibake.
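
To make the hypothesis concrete, here is a minimal sketch of the kind of
heuristic I have in mind. It is purely hypothetical: I do not know
whether git gui or gitk contain anything like it, the function name
decode_with_fallback() is invented, and the assumption that the decoder
only copes with code points up to U+FFFF is a guess.

```python
def decode_with_fallback(raw):
    """Hypothetical heuristic: try UTF-8, otherwise fall back to latin1."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 at all: treat every byte as a latin1 character.
        return raw.decode("latin-1")
    # Assumption: the decoder cannot represent code points beyond U+FFFF
    # and therefore rejects the whole input when it meets one.
    if any(ord(ch) > 0xFFFF for ch in text):
        return raw.decode("latin-1")
    return text
```

Under these assumptions, a file containing only «ß» would decode
correctly, while one containing U+1F44D would come out as mojibake.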

I’ve noticed a perhaps related glitch when the options dialog in git gui is
shown. My committer name contains the character «ß» (latin small letter
sharp s; U+00DF). The text field in the options dialog displays this as
«ÃŸ», which also seems to be UTF-8 to latin1 mojibake. Curiously, the
same character displays just fine when staging parts of files via git
gui, so the issue is not quite the same as the one described above.

Best regards,
Tobias

