Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
On Sun, Aug 04, 2013 at 11:14:40AM -0700, Jonathan Nieder wrote:
> Alexey Shumkin wrote:
> > On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
>
> >> 1. Log messages use the configured log output encoding, which is
> >>    meant to be whatever encoding works best with local terminals
> >>    (and does not have much to do with what encoding should be used
> >>    for email)
> >>
> >> 2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
> >>    port (which uses Unicode filesystem APIs), always UTF-8
> >
> > I cannot say exactly if it makes sense for THIS patch, but I'd like to
> > remind about Cygwin port, which definitely does not use UTF-8 encoding
> > (in my case it is Windows-1251) for filenames.
> >
> >>
> >> 3. The "This is an automated email" preface uses a project description
> >>    from .git/description, which is typically in UTF-8 to support
> >>    gitweb.
>
> Thanks for clarifying. So in the context you describe, (1) is
> configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
> there is no way with current git facilities to force the email to use
> a single encoding unless (3) happens to contain no special characters.
>
> What is the value of the "[i18n] commitEncoding" setting in your
> project?

commitEncoding is the same as the filename encoding, Windows-1251, of course.

> What encoding do the raw commit messages (shown with
> "git log --format=raw") use for their text, and what do they declare
> with an in-commit 'encoding' header, if any?

Well, despite what `git log --help` says,
--8<--
raw
    The raw format shows the entire commit exactly as stored in the
    commit object
--8<--
on a Linux box (UTF-8) I see "readable" commit messages even though they
are stored in Windows-1251 (so they are converted to UTF-8).
To be sure, I've checked their actual content with `git cat-file commit`.

Actually, to be honest, I usually use a modified version of Git (see
ecaee8050cec23eb4cf082512e907e3e52c20b57) from the 'next' branch, which
could affect the results, so I've also checked `git log --format=raw`
with unmodified Git v1.8.3.3.
But let's go back to your question: the commit encoding stored as a
header in the raw commit messages is 'Windows-1251'.

>
> Does everyone on this project use Cygwin?

This is a "closed" (commercial) project and every developer uses Cygwin,
except me. I use a Linux box as a desktop (mail, IM, web browsing; but
development goes on Cygwin). Sometimes I also run utility scripts
included in that project on my desktop (since Linux works with files
much faster than Cygwin does ;)).
Also, the Git server is a coLinux box (http://www.colinux.org/) on
Windows Server 2003, but I guess that does not matter much here.

> That should be fine, but
> I'd expect there to be problems as soon as someone wants to try the
> Mingw port ("Git for Windows").

Yep, one of our developers tried to use a modern version of TortoiseGit
with the MinGW port of Git. That was a failure, because since v1.7.9 the
MinGW port transcodes filenames to store them internally in UTF-8.
This problem could be solved by converting those non-ASCII filenames to
UTF-8 once, but I do not want to use the MinGW port. I like the Cygwin
"infrastructure", which is more Linux-like than MinGW.

>
> I wonder if there should be an "[i18n] repositoryPathEncoding"
> configuration item to support this kind of repository. Then git could
> be aware of the intended encoding of paths, could recode them for
> display to a terminal, and at least on Linux and Mingw could recode
> them for use in filenames on disk. "repositoryPathEncoding = none"
> would mean the current behavior of treating paths as raw sequences of
> bytes.

I'd be happy if such a setting existed. It could solve many problems
with cross-platform projects that have non-ASCII filenames.
Indeed, the MinGW port does resolve that problem somehow!

>
> What do you think?
> Jonathan

-- 
Alexey Shumkin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
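
The check described in the message above (what a commit declares in its 'encoding' header, as opposed to what `git log` prints after re-encoding) can be sketched roughly as follows. This is only an illustration in a throwaway repository: `commit_encoding` is a hypothetical helper name, the user name, e-mail address and commit message are made up, and the message bytes are the Windows-1251 form of the Russian word "файл".

```shell
#!/bin/sh
# Print the value of the in-commit "encoding" header of commit $1, or
# nothing if the commit has no such header.
# commit_encoding is a hypothetical helper name, not a git command.
commit_encoding() {
    # The header block ends at the first empty line; stop there so that a
    # message line starting with "encoding " is never mistaken for a header.
    git cat-file commit "$1" | sed -n -e '/^$/q' -e 's/^encoding //p'
}

# Throwaway repository (assumption: git and mktemp are installed).
scratch=$(mktemp -d) && cd "$scratch" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
git config i18n.commitEncoding Windows-1251

# Commit message "файл" written as Windows-1251 bytes via octal escapes.
printf '\364\340\351\353\n' | git commit -q --allow-empty -F -

recorded_encoding=$(commit_encoding HEAD)
echo "in-commit encoding header: $recorded_encoding"

# For comparison, ask git log to re-encode the same commit for output.
git log -1 --encoding=UTF-8 --format=raw
```

Reading the object with `git cat-file commit` avoids any output re-encoding, which is why it is the safer way to see what is actually stored.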
Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
Alexey Shumkin wrote:
> On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:

>> 1. Log messages use the configured log output encoding, which is
>>    meant to be whatever encoding works best with local terminals
>>    (and does not have much to do with what encoding should be used
>>    for email)
>>
>> 2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
>>    port (which uses Unicode filesystem APIs), always UTF-8
>
> I cannot say exactly if it makes sense for THIS patch, but I'd like to
> remind about Cygwin port, which definitely does not use UTF-8 encoding
> (in my case it is Windows-1251) for filenames.
>
>>
>> 3. The "This is an automated email" preface uses a project description
>>    from .git/description, which is typically in UTF-8 to support
>>    gitweb.

Thanks for clarifying. So in the context you describe, (1) is
configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
there is no way with current git facilities to force the email to use
a single encoding unless (3) happens to contain no special characters.

What is the value of the "[i18n] commitEncoding" setting in your
project? What encoding do the raw commit messages (shown with
"git log --format=raw") use for their text, and what do they declare
with an in-commit 'encoding' header, if any?

Does everyone on this project use Cygwin? That should be fine, but
I'd expect there to be problems as soon as someone wants to try the
Mingw port ("Git for Windows").

I wonder if there should be an "[i18n] repositoryPathEncoding"
configuration item to support this kind of repository. Then git could
be aware of the intended encoding of paths, could recode them for
display to a terminal, and at least on Linux and Mingw could recode
them for use in filenames on disk. "repositoryPathEncoding = none"
would mean the current behavior of treating paths as raw sequences of
bytes.

What do you think?
Jonathan
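
As a rough illustration of what "recoding a path" would mean at the byte level: the sketch below converts a Windows-1251 filename to UTF-8 and back with `iconv`. Note that "repositoryPathEncoding" is only a proposal in this thread, not an existing git setting, and `recode_path` is a hypothetical helper invented for this example.

```shell
#!/bin/sh
# recode_path FROM TO PATH
# Recode the bytes of PATH from encoding FROM to encoding TO.
# Hypothetical helper: git itself has no such function or setting.
recode_path() {
    printf '%s' "$3" | iconv -f "$1" -t "$2"
}

# "файл.txt" as Windows-1251 bytes; octal escapes make the byte values
# explicit regardless of the encoding this script is saved in.
cp1251_name=$(printf '\364\340\351\353.txt')

# What a UTF-8 filesystem (or the MinGW port) would want to see.
utf8_name=$(recode_path WINDOWS-1251 UTF-8 "$cp1251_name")

# And back again, as a Cygwin setup using Windows-1251 would need.
roundtrip_name=$(recode_path UTF-8 WINDOWS-1251 "$utf8_name")

# Dump bytes instead of text so the result does not depend on the
# terminal encoding.
printf '%s' "$utf8_name" | od -An -c
```

The round trip only works because every character in the name exists in both encodings; a real implementation would also have to decide what to do with paths that cannot be represented in the target encoding.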
Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
> Git commands write commit messages in UTF-8 by default, but that
> default can be overridden by the [i18n] commitEncoding and
> logOutputEncoding settings. With such a setting, the emails written
> by the post-receive-email hook use a mixture of encodings:
>
> 1. Log messages use the configured log output encoding, which is
>    meant to be whatever encoding works best with local terminals
>    (and does not have much to do with what encoding should be used
>    for email)
>
> 2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
>    port (which uses Unicode filesystem APIs), always UTF-8

I cannot say exactly if it makes sense for THIS patch, but I'd like to
remind you about the Cygwin port, which definitely does not use UTF-8
(in my case it is Windows-1251) for filenames.

>
> 3. The "This is an automated email" preface uses a project description
>    from .git/description, which is typically in UTF-8 to support
>    gitweb.
>
> So (1) is configurable, and (2) and (3) are unconfigurable and
> typically UTF-8. Override the log output encoding to always use UTF-8
> when writing the email to get the best chance of a comprehensible
> single-encoding email.

I cannot agree to receiving e-mails in UTF-8 only for Windows projects
that use a non-UTF-8 encoding.
I want to read correctly formed e-mails without corrupted symbols in
place of filenames (that is the main problem here, since filenames are
not converted, unlike log messages).

>
> Signed-off-by: Jonathan Nieder
> ---
>  contrib/hooks/post-receive-email | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
> index 72084511..ba93a0d8 100755
> --- a/contrib/hooks/post-receive-email
> +++ b/contrib/hooks/post-receive-email
> @@ -471,7 +471,7 @@ generate_delete_branch_email()
>  	echo " was $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git diff-tree -s --always --pretty=oneline $oldrev
> +	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
>  	echo $LOGEND
>  }
>
> @@ -571,7 +571,7 @@ generate_delete_atag_email()
>  	echo " was $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git diff-tree -s --always --pretty=oneline $oldrev
> +	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
>  	echo $LOGEND
>  }
>
> @@ -617,7 +617,7 @@ generate_general_email()
>  	echo ""
>  	if [ "$newrev_type" = "commit" ]; then
>  		echo $LOGBEGIN
> -		git diff-tree -s --always --pretty=medium $newrev
> +		git diff-tree -s --always --encoding=UTF-8 --pretty=medium $newrev
>  		echo $LOGEND
>  	else
>  		# What can we do here? The tag marks an object that is not
> @@ -636,7 +636,7 @@ generate_delete_general_email()
>  	echo " was $oldrev"
>  	echo ""
>  	echo $LOGBEGIN
> -	git diff-tree -s --always --pretty=oneline $oldrev
> +	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
>  	echo $LOGEND
>  }
>
> --
> 1.8.4.rc1
>

-- 
Alexey Shumkin
[PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8
Git commands write commit messages in UTF-8 by default, but that
default can be overridden by the [i18n] commitEncoding and
logOutputEncoding settings. With such a setting, the emails written
by the post-receive-email hook use a mixture of encodings:

 1. Log messages use the configured log output encoding, which is
    meant to be whatever encoding works best with local terminals
    (and does not have much to do with what encoding should be used
    for email)

 2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
    port (which uses Unicode filesystem APIs), always UTF-8

 3. The "This is an automated email" preface uses a project description
    from .git/description, which is typically in UTF-8 to support
    gitweb.

So (1) is configurable, and (2) and (3) are unconfigurable and
typically UTF-8. Override the log output encoding to always use UTF-8
when writing the email to get the best chance of a comprehensible
single-encoding email.

Signed-off-by: Jonathan Nieder
---
 contrib/hooks/post-receive-email | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index 72084511..ba93a0d8 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -471,7 +471,7 @@ generate_delete_branch_email()
 	echo " was $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git diff-tree -s --always --pretty=oneline $oldrev
+	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -571,7 +571,7 @@ generate_delete_atag_email()
 	echo " was $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git diff-tree -s --always --pretty=oneline $oldrev
+	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
@@ -617,7 +617,7 @@ generate_general_email()
 	echo ""
 	if [ "$newrev_type" = "commit" ]; then
 		echo $LOGBEGIN
-		git diff-tree -s --always --pretty=medium $newrev
+		git diff-tree -s --always --encoding=UTF-8 --pretty=medium $newrev
 		echo $LOGEND
 	else
 		# What can we do here? The tag marks an object that is not
@@ -636,7 +636,7 @@ generate_delete_general_email()
 	echo " was $oldrev"
 	echo ""
 	echo $LOGBEGIN
-	git diff-tree -s --always --pretty=oneline $oldrev
+	git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
 	echo $LOGEND
 }
 
-- 
1.8.4.rc1
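
To see what the added `--encoding=UTF-8` option changes, here is a small sketch that runs the hook's `git diff-tree` invocation with and without it in a throwaway repository. The repository, identity and commit message are invented for the demonstration, and the sketch assumes a git build with iconv support that recognizes the name "Windows-1251".

```shell
#!/bin/sh
# Throwaway repository configured like the project discussed in this
# thread: commits stored and logged in Windows-1251.
scratch=$(mktemp -d) && cd "$scratch" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"
git config i18n.commitEncoding Windows-1251
git config i18n.logOutputEncoding Windows-1251

# Commit message "файл" written as Windows-1251 bytes via octal escapes.
printf '\364\340\351\353\n' | git commit -q --allow-empty -F -

# Before the patch: the subject comes out in the configured log output
# encoding (Windows-1251 here). cut drops the leading commit hash.
subject_before=$(git diff-tree -s --always --pretty=oneline HEAD |
    cut -d' ' -f2-)

# After the patch: the subject is forced to UTF-8 regardless of
# i18n.logOutputEncoding.
subject_after=$(git diff-tree -s --always --encoding=UTF-8 \
    --pretty=oneline HEAD | cut -d' ' -f2-)

# Dump bytes rather than text so the terminal encoding does not matter.
printf 'before: '; printf '%s' "$subject_before" | od -An -c
printf 'after:  '; printf '%s' "$subject_after" | od -An -c
```

Only the log message is affected by the option; the filenames printed elsewhere by the hook are passed through as raw bytes either way.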