Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8

2013-08-05 Thread Alexey Shumkin
On Sun, Aug 04, 2013 at 11:14:40AM -0700, Jonathan Nieder wrote:
> Alexey Shumkin wrote:
> > On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
> 
> >>  1. Log messages use the configured log output encoding, which is
> >> meant to be whatever encoding works best with local terminals
> >> (and does not have much to do with what encoding should be used
> >> for email)
> >>
> >>  2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
> >> port (which uses Unicode filesystem APIs), always UTF-8
> >
> > I cannot say exactly if it makes sense for THIS patch, but I'd like to
> > remind you about the Cygwin port, which definitely does not use UTF-8
> > encoding (in my case it is Windows-1251) for filenames.
> >
> >> 
> >>  3. The "This is an automated email" preface uses a project description
> >> from .git/description, which is typically in UTF-8 to support
> >> gitweb.
> 
> Thanks for clarifying.  So in the context you describe, (1) is
> configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
> there is no way with current git facilities to force the email to use
> a single encoding unless (3) happens to contain no special characters.
> 
> What is the value of the "[i18n] commitEncoding" setting in your
> project?
commitEncoding is equal to filenames' encoding, Windows-1251, of course.

> What encoding do the raw commit messages (shown with
> "git log --format=raw") use for their text, and what do they declare
> with an in-commit 'encoding' header, if any?
Well, despite what `git log --help` says:
--8<--
raw
   The raw format shows the entire commit exactly as stored in
   the commit object
--8<--
on a Linux box (UTF-8) I can see "readable" commit messages even though
they are stored in 'Windows-1251' (so they are converted to UTF-8). To
be sure, I've checked their actual content with `git cat-file commit`.
To be honest, I usually use a modified version of Git (see
ecaee8050cec23eb4cf082512e907e3e52c20b57) from the 'next' branch, which
could affect the results, so I've checked `git log --format=raw` with
unmodified v1.8.3.3 of Git.

But let's go back to your question. The commit encoding stored as a
header in the raw commits is 'Windows-1251'.
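
A minimal sketch of checking that header by hand, assuming a POSIX shell
and awk. The helper name `commit_encoding` is hypothetical and not part of
git; it reads a raw commit object on stdin and prints the value of its
'encoding' header, if there is one. The sample commit fed to it below is
made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper (not part of git): print the value of the
# "encoding" header of a raw commit object read from stdin.
commit_encoding() {
	# Headers end at the first empty line; the commit message follows,
	# so stop there to avoid matching text inside the message.
	awk '/^$/ { exit } $1 == "encoding" { print $2; exit }'
}

# Hand-written sample object (tree id and identity are placeholders).
printf 'tree 0000000000000000000000000000000000000000\nauthor A U Thor <author@example.com> 1375600000 +0400\nencoding Windows-1251\n\nSubject line\n' |
	commit_encoding
```

In a real repository the input would come from something like
`git cat-file commit HEAD | commit_encoding`. No output means the commit
carries no 'encoding' header.
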
> 
> Does everyone on this project use Cygwin?
This is a "closed" (commercial) project, and every developer uses Cygwin
except me. I use a Linux box as a desktop (mail, IM, web browsing; but
development happens on Cygwin). Sometimes I run utility scripts
included in that project on my desktop (since Linux works with files
much faster than Cygwin does ;))
Also, the Git server is a coLinux box (http://www.colinux.org/) on
Windows Server 2003, but I guess that does not matter much here.
>  That should be fine, but
> I'd expect there to be problems as soon as someone wants to try the
> Mingw port ("Git for Windows").
Yep, one of our developers tried to use a modern version of TortoiseGit
with the MinGW port of Git. That was a failure, because since v1.7.9 the
MinGW port transcodes filenames to store them internally in UTF-8. This
problem could be solved by converting those non-ASCII filenames to
UTF-8 once, but I do not want to use the MinGW port. I like the Cygwin
"infrastructure", which is more Linux-like than MinGW.
> 
> I wonder if there should be an "[i18n] repositoryPathEncoding"
> configuration item to support this kind of repository.  Then git could
> be aware of the intended encoding of paths, could recode them for
> display to a terminal, and at least on Linux and Mingw could recode
> them for use in filenames on disk.  "repositoryPathEncoding = none"
> would mean the current behavior of treating paths as raw sequences of
> bytes.
I'd be happy if such a setting existed. It could solve many problems
in cross-platform projects with non-ASCII filenames.
Indeed, the MinGW port does resolve that problem somehow!
> 
> What do you think?
> Jonathan

-- 
Alexey Shumkin
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8

2013-08-04 Thread Jonathan Nieder
Alexey Shumkin wrote:
> On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:

>>  1. Log messages use the configured log output encoding, which is
>> meant to be whatever encoding works best with local terminals
>> (and does not have much to do with what encoding should be used
>> for email)
>>
>>  2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
>> port (which uses Unicode filesystem APIs), always UTF-8
>
> I cannot say exactly if it makes sense for THIS patch, but I'd like to
> remind you about the Cygwin port, which definitely does not use UTF-8
> encoding (in my case it is Windows-1251) for filenames.
>
>> 
>>  3. The "This is an automated email" preface uses a project description
>> from .git/description, which is typically in UTF-8 to support
>> gitweb.

Thanks for clarifying.  So in the context you describe, (1) is
configurable, (2) is Windows-1251, (3) is unconfigurably UTF-8, and
there is no way with current git facilities to force the email to use
a single encoding unless (3) happens to contain no special characters.

What is the value of the "[i18n] commitEncoding" setting in your
project?  What encoding do the raw commit messages (shown with
"git log --format=raw") use for their text, and what do they declare
with an in-commit 'encoding' header, if any?

Does everyone on this project use Cygwin?  That should be fine, but
I'd expect there to be problems as soon as someone wants to try the
Mingw port ("Git for Windows").

I wonder if there should be an "[i18n] repositoryPathEncoding"
configuration item to support this kind of repository.  Then git could
be aware of the intended encoding of paths, could recode them for
display to a terminal, and at least on Linux and Mingw could recode
them for use in filenames on disk.  "repositoryPathEncoding = none"
would mean the current behavior of treating paths as raw sequences of
bytes.
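
To make the idea concrete, here is a rough sketch of the recoding step,
assuming a POSIX shell and an `iconv` utility that knows the
WINDOWS-1251 charset name. The function `recode_path` and its argument
are purely illustrative: git has no "repositoryPathEncoding" setting
today, and the argument only stands in for the value it might hold.

```shell
#!/bin/sh
# Sketch only: $1 stands in for the value of the proposed (hypothetical)
# i18n.repositoryPathEncoding setting; git has no such setting.
recode_path() {
	path_encoding=$1
	if [ "$path_encoding" = none ]; then
		# Current behavior: treat the path as a raw sequence of bytes.
		cat
	else
		# Recode from the repository's path encoding to UTF-8.
		iconv -f "$path_encoding" -t UTF-8
	fi
}

# Byte 0xC0 (octal 300) is CYRILLIC CAPITAL LETTER A in Windows-1251.
printf '\300.txt\n' | recode_path WINDOWS-1251
```

With "none" the bytes pass through unchanged, which corresponds to
treating paths as raw byte sequences.
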

What do you think?
Jonathan


Re: [PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8

2013-08-04 Thread Alexey Shumkin
On Fri, Aug 02, 2013 at 04:23:38PM -0700, Jonathan Nieder wrote:
> Git commands write commit messages in UTF-8 by default, but that
> default can be overridden by the [i18n] commitEncoding and
> logOutputEncoding settings.  With such a setting, the emails written
> by the post-receive-email hook use a mixture of encodings:
> 
>  1. Log messages use the configured log output encoding, which is
> meant to be whatever encoding works best with local terminals
> (and does not have much to do with what encoding should be used
> for email)
> 
>  2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
> port (which uses Unicode filesystem APIs), always UTF-8
I cannot say exactly if it makes sense for THIS patch, but I'd like to
remind you about the Cygwin port, which definitely does not use UTF-8
encoding (in my case it is Windows-1251) for filenames.
> 
>  3. The "This is an automated email" preface uses a project description
> from .git/description, which is typically in UTF-8 to support
> gitweb.
> 
> So (1) is configurable, and (2) and (3) are unconfigurable and
> typically UTF-8.  Override the log output encoding to always use UTF-8
> when writing the email to get the best chance of a comprehensible
> single-encoding email.
I cannot agree to receiving UTF-8-only e-mails for Windows projects
that use a non-UTF-8 encoding. I want to read correctly formed e-mail
without corrupted symbols in place of filenames (that is the main
problem here, since filenames are not converted, unlike log messages).
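
One way to see the filename problem, sketched under the assumption of a
POSIX shell and an `iconv` that exits non-zero on an invalid input
sequence. The helper `is_utf8` is hypothetical and not part of the hook;
it only reports whether a byte string would survive being labelled as
UTF-8:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if stdin is well-formed UTF-8.
# Relies on iconv failing when it meets an invalid input sequence.
is_utf8() {
	iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# 0xC0 (octal 300) is a letter in Windows-1251 but is not valid UTF-8.
for name in 'readme.txt' "$(printf '\300.txt')"; do
	if printf '%s' "$name" | is_utf8; then
		echo "valid UTF-8"
	else
		echo "not valid UTF-8, would show up corrupted"
	fi
done
```

A Windows-1251 filename fails this check, which is why forcing only the
log messages to UTF-8 still leaves the filenames unreadable.
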
> 
> Signed-off-by: Jonathan Nieder 
> ---
>  contrib/hooks/post-receive-email | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
> index 72084511..ba93a0d8 100755
> --- a/contrib/hooks/post-receive-email
> +++ b/contrib/hooks/post-receive-email
> @@ -471,7 +471,7 @@ generate_delete_branch_email()
>   echo "   was  $oldrev"
>   echo ""
>   echo $LOGBEGIN
> - git diff-tree -s --always --pretty=oneline $oldrev
> + git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
>   echo $LOGEND
>  }
>  
> @@ -571,7 +571,7 @@ generate_delete_atag_email()
>   echo "   was  $oldrev"
>   echo ""
>   echo $LOGBEGIN
> - git diff-tree -s --always --pretty=oneline $oldrev
> + git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
>   echo $LOGEND
>  }
>  
> @@ -617,7 +617,7 @@ generate_general_email()
>   echo ""
>   if [ "$newrev_type" = "commit" ]; then
>   echo $LOGBEGIN
> - git diff-tree -s --always --pretty=medium $newrev
> + git diff-tree -s --always --encoding=UTF-8 --pretty=medium $newrev
>   echo $LOGEND
>   else
>   # What can we do here?  The tag marks an object that is not
> @@ -636,7 +636,7 @@ generate_delete_general_email()
>   echo "   was  $oldrev"
>   echo ""
>   echo $LOGBEGIN
> - git diff-tree -s --always --pretty=oneline $oldrev
> + git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
>   echo $LOGEND
>  }
>  
> -- 
> 1.8.4.rc1
> 

-- 
Alexey Shumkin


[PATCH 2/3] hooks/post-receive-email: force log messages in UTF-8

2013-08-02 Thread Jonathan Nieder
Git commands write commit messages in UTF-8 by default, but that
default can be overridden by the [i18n] commitEncoding and
logOutputEncoding settings.  With such a setting, the emails written
by the post-receive-email hook use a mixture of encodings:

 1. Log messages use the configured log output encoding, which is
meant to be whatever encoding works best with local terminals
(and does not have much to do with what encoding should be used
for email)

 2. Filenames are left as is: on Linux, usually UTF-8, and in the Mingw
port (which uses Unicode filesystem APIs), always UTF-8

 3. The "This is an automated email" preface uses a project description
from .git/description, which is typically in UTF-8 to support
gitweb.

So (1) is configurable, and (2) and (3) are unconfigurable and
typically UTF-8.  Override the log output encoding to always use UTF-8
when writing the email to get the best chance of a comprehensible
single-encoding email.

Signed-off-by: Jonathan Nieder 
---
 contrib/hooks/post-receive-email | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index 72084511..ba93a0d8 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -471,7 +471,7 @@ generate_delete_branch_email()
echo "   was  $oldrev"
echo ""
echo $LOGBEGIN
-   git diff-tree -s --always --pretty=oneline $oldrev
+   git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
echo $LOGEND
 }
 
@@ -571,7 +571,7 @@ generate_delete_atag_email()
echo "   was  $oldrev"
echo ""
echo $LOGBEGIN
-   git diff-tree -s --always --pretty=oneline $oldrev
+   git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
echo $LOGEND
 }
 
@@ -617,7 +617,7 @@ generate_general_email()
echo ""
if [ "$newrev_type" = "commit" ]; then
echo $LOGBEGIN
-   git diff-tree -s --always --pretty=medium $newrev
+   git diff-tree -s --always --encoding=UTF-8 --pretty=medium $newrev
echo $LOGEND
else
# What can we do here?  The tag marks an object that is not
@@ -636,7 +636,7 @@ generate_delete_general_email()
echo "   was  $oldrev"
echo ""
echo $LOGBEGIN
-   git diff-tree -s --always --pretty=oneline $oldrev
+   git diff-tree -s --always --encoding=UTF-8 --pretty=oneline $oldrev
echo $LOGEND
 }
 
-- 
1.8.4.rc1
