On 16.04.2015 at 20:39, Junio C Hamano wrote:
> This is on top of the ".gitignore can start with UTF8 BOM" patch
> from Carlos.
> 
> Second try; the first patch is new to clarify the logic in the
> codeflow after Carlos's patch, and the second one has been adjusted
> accordingly.
> 
> Junio C Hamano (4):
>   add_excludes_from_file: clarify the bom skipping logic
>   utf8-bom: introduce skip_utf8_bom() helper
>   config: use utf8_bom[] from utf.[ch] in git_parse_source()
>   attr: skip UTF8 BOM at the beginning of the input file
> 


Wouldn't it be better to just strip the BOM on commit, e.g. via a clean filter 
or pre-commit hook (as suggested in [1])? Or is this patch series only meant to 
supplement such a solution (i.e. only strip the BOM when reading files from the 
working-copy rather than the committed tree)?
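To make the clean-filter idea concrete, here is a minimal sketch in Python. The names strip_utf8_bom and run_clean_filter are made up for illustration and are not part of Git; the only assumption about Git is that a clean filter receives the file content on stdin and writes the cleaned content to stdout.

```python
import io

UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


def run_clean_filter(src, dst) -> None:
    """Copy the binary stream src to dst, dropping a leading BOM."""
    dst.write(strip_utf8_bom(src.read()))


# Example with in-memory streams standing in for stdin/stdout:
out = io.BytesIO()
run_clean_filter(io.BytesIO(UTF8_BOM + b"*.o\n"), out)
```

A real filter script would call run_clean_filter(sys.stdin.buffer, sys.stdout.buffer) and would be registered under a filter name of your choosing (say, "stripbom") via filter.stripbom.clean and a matching filter=stripbom attribute.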


According to RFC 3629, section 6 [2], a protocol SHOULD forbid the use of a 
BOM as an encoding signature if the encoding is *known* to always be UTF-8. And 
.gitignore, .gitattributes and .gitmodules contain path names, which are always 
UTF-8 as of Git for Windows v1.7.10.

IOW, allowing a BOM would mean that files *without* a BOM are *not* UTF-8 and 
need to be decoded from e.g. the system encoding (which unfortunately cannot be 
set to UTF-8 on Windows). But this makes no sense, as the repository would not 
be portable. E.g. a .gitattributes file created on a Greek Windows system, 
containing Greek path names in Cp1253, would not work on platforms with a 
different encoding.
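A quick way to see the portability problem, sketched in Python. The file name is a made-up example and decodes_as_utf8 is a helper invented for this illustration.

```python
name = "αρχείο.txt"  # hypothetical Greek file name

cp1253_bytes = name.encode("cp1253")  # what a Cp1253 editor would write
utf8_bytes = name.encode("utf-8")     # what Git expects for path names


def decodes_as_utf8(data: bytes) -> bool:
    """Report whether data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

The Cp1253 bytes are not even valid UTF-8, so a reader that assumes UTF-8 cannot match them against the path names stored in the repository.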

On the other hand, just ignoring the BOM (as this patch series does) leaves us 
with two alternative binary representations of the same file content, i.e. 
we'll eventually end up with spurious first-line changes as users add / remove 
BOMs from committed .git[ignore|attributes|modules] files, depending on their 
editor preference...
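To illustrate the "two representations" point, a small Python sketch. git_blob_id is a helper written for this example; it relies only on the fact that Git names a blob by the SHA-1 of a "blob <size>" header, a NUL byte, and the content.

```python
import hashlib

UTF8_BOM = b"\xef\xbb\xbf"


def git_blob_id(data: bytes) -> str:
    """Object id of a blob: SHA-1 over the 'blob <size>' header, NUL, and content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


plain = b"*.o\n"
with_bom = UTF8_BOM + plain

# Identical rules once the BOM is skipped ...
same_rules = with_bom[len(UTF8_BOM):] == plain
# ... but two different objects as far as the object store is concerned.
same_blob = git_blob_id(with_bom) == git_blob_id(plain)
```

So both files mean the same thing to the parser, yet switching between them shows up as a change to the first line.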


For local files (.gitconfig, .git/info/exclude, .git/COMMIT_EDITMSG...), 
auto-detecting encoding based on the presence of a BOM makes somewhat more 
sense. However, this will most likely break editors that follow the 
recommendation of the Unicode specification ("Use of a BOM is neither required 
nor recommended for UTF-8" [3]). So we'd probably need a core.editorEncoding or 
core.editorUseBom setting to tell git whether "no BOM" means UTF-8 or system 
encoding...
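Roughly what I have in mind, as a Python sketch rather than a proposal for the actual C code. decode_local_file and its no_bom_encoding parameter are hypothetical; the parameter stands in for a setting such as core.editorEncoding.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def decode_local_file(data: bytes, no_bom_encoding: str = "utf-8") -> str:
    """Decode data, treating a leading BOM as a UTF-8 signature.

    no_bom_encoding says what "no BOM" means; it is a hypothetical
    stand-in for a configuration setting, not an existing Git option.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):].decode("utf-8")
    return data.decode(no_bom_encoding)
```

With the default, BOM-less files are read as UTF-8, which suits editors that follow the Unicode recommendation; passing e.g. "cp1253" models an editor that writes the system encoding unless it adds a BOM.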

Just as a reminder: we should update the Git for Windows Unicode document [4] 
if we improve support for BOM-adamant editors.

Cheers,
Karsten

[1] 
http://stackoverflow.com/questions/27223985/git-ignore-bom-prevent-git-diff-from-showing-byte-order-mark-changes
[2] https://tools.ietf.org/html/rfc3629
[3] http://www.unicode.org/versions/Unicode7.0.0/ch02.pdf  p.40
[4] 
https://github.com/msysgit/msysgit/wiki/Git-for-Windows-Unicode-Support#editor

