Re: Patch RFA: Support non-ASCII file names in git-changelog

2021-01-07 Thread Ian Lance Taylor via Gcc-patches
On Wed, Jan 6, 2021 at 5:37 AM Martin Liška  wrote:
>
> On 1/6/21 8:25 AM, Martin Liška wrote:
> > Anyway, I've got a workaround that I'm going to push.
>
> It's fixed now.
>
> @Ian: Can you please try to push the changes now?

It worked.

Thanks.

Ian
b87ec922c4090fcacf802c73b6bfd59a8632f8a5
diff --git "a/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204foo.go" "b/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204foo.go"
new file mode 100644
index 000..8b6a814c3c4
--- /dev/null
+++ "b/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204foo.go"
@@ -0,0 +1,13 @@
+package Äfoo
+
+var ÄbarV int = 101
+
+func Äbar(x int) int {
+   defer func() { ÄbarV += 3 }()
+   return Äblix(x)
+}
+
+func Äblix(x int) int {
+   defer func() { ÄbarV += 9 }()
+   return ÄbarV + x
+}
diff --git "a/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204main.go" "b/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204main.go"
new file mode 100644
index 000..25d2c71fc00
--- /dev/null
+++ "b/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204main.go"
@@ -0,0 +1,13 @@
+package main
+
+import (
+   "fmt"
+
+   "./Äfoo"
+   Äblix "./Äfoo"
+)
+
+func main() {
+   fmt.Printf("Äfoo.Äbar(33) returns %v\n", Äfoo.Äbar(33))
+   fmt.Printf("Äblix.Äbar(33) returns %v\n", Äblix.Äbar(33))
+}
diff --git a/gcc/testsuite/go.test/test/fixedbugs/issue27836.go b/gcc/testsuite/go.test/test/fixedbugs/issue27836.go
new file mode 100644
index 000..128cf9d06ad
--- /dev/null
+++ b/gcc/testsuite/go.test/test/fixedbugs/issue27836.go
@@ -0,0 +1,7 @@
+// compiledir
+
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package ignored


Re: Patch RFA: Support non-ASCII file names in git-changelog

2021-01-06 Thread Martin Liška

On 1/6/21 8:25 AM, Martin Liška wrote:

Anyway, I've got a workaround that I'm going to push.


It's fixed now.

@Ian: Can you please try to push the changes now?

Thanks,
Martin


Re: Patch RFA: Support non-ASCII file names in git-changelog

2021-01-05 Thread Martin Liška

On 1/4/21 12:47 PM, Martin Liška wrote:

On 1/4/21 12:01 PM, Martin Liška wrote:

Anyway, I'm going to update the server hook first, and I'll create an issue for
GitPython.


So I was not correct about this: the server hooks now also use GitPython
to identify modified files.

I've just created an issue for that:
https://github.com/gitpython-developers/GitPython/issues/1099


This one has been fixed, and the fix is in the newly released v3.1.12.

Anyway, I've got a workaround that I'm going to push.

Martin



Martin
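
For readers following the patch below, here is a minimal sketch of what the decode_path workaround does, unrolled into separate steps. It mirrors the expression in the patch; the intermediate variable names are only there for illustration.

```python
def decode_path(path):
    """Undo git's default core.quotepath quoting of a file name."""
    if path.startswith('"') and path.endswith('"'):
        inner = path.strip('"')
        # 1. Interpret the backslash-octal escapes; each one becomes a
        #    single code point in the range 0-255.
        escaped = inner.encode('utf8').decode('unicode-escape')
        # 2. Turn those code points back into raw bytes (latin-1 maps
        #    code points 0-255 to the same byte values).
        raw = escaped.encode('latin-1')
        # 3. Decode the raw bytes as the UTF-8 they really are.
        return raw.decode('utf8')
    return path


# The example from the comment in the patch.
print(decode_path('"b/ko\\304\\215ka.txt"'))  # b/kočka.txt
```

Unquoted paths are returned unchanged, so the helper can be applied to every path without checking first.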


From ed9ffe47d6964dc92c92cfddbb8aac555c28e085 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 6 Jan 2021 08:11:57 +0100
Subject: [PATCH] gcc-changelog: workaround for utf8 filenames

contrib/ChangeLog:

	* gcc-changelog/git_commit.py: Add decode_path function.
	* gcc-changelog/git_email.py: Use it in order to solve
	utf8 encoding filename issues.
	* gcc-changelog/git_repository.py: Likewise.
	* gcc-changelog/test_email.py: Test it.
---
 contrib/gcc-changelog/git_commit.py | 26 +
 contrib/gcc-changelog/git_email.py  |  6 +++---
 contrib/gcc-changelog/git_repository.py |  6 +++---
 contrib/gcc-changelog/test_email.py |  3 ++-
 4 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index d2e5dbe294a..ee1973371be 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -174,6 +174,24 @@ REVIEW_PREFIXES = ('reviewed-by: ', 'reviewed-on: ', 'signed-off-by: ',
 DATE_FORMAT = '%Y-%m-%d'
 
 
+def decode_path(path):
+    # When core.quotepath is true (default value), utf8 chars are encoded like:
+    # "b/ko\304\215ka.txt"
+    #
+    # The upstream bug is fixed:
+    # https://github.com/gitpython-developers/GitPython/issues/1099
+    #
+    # but we still need a workaround for older versions of the library.
+    # Please take a look at the explanation of the transformation:
+    # https://stackoverflow.com/questions/990169/how-do-convert-unicode-escape-sequences-to-unicode-characters-in-a-python-string
+
+    if path.startswith('"') and path.endswith('"'):
+        return (path.strip('"').encode('utf8').decode('unicode-escape')
+                .encode('latin-1').decode('utf8'))
+    else:
+        return path
+
+
 class Error:
 def __init__(self, message, line=None):
 self.message = message
@@ -303,14 +321,6 @@ class GitCommit:
  'separately from normal commits'))
 return
 
-# check for an encoded utf-8 filename
-hint = 'git config --global core.quotepath false'
-for modified, _ in self.info.modified_files:
-if modified.startswith('"') or modified.endswith('"'):
-self.errors.append(Error('Quoted UTF8 filename, please set: '
- f'"{hint}"', modified))
-return
-
 all_are_ignored = (len(project_files) + len(ignored_files)
== len(self.info.modified_files))
 self.parse_lines(all_are_ignored)
diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py
index 5b53ca4a6a9..00ad00458f4 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -22,7 +22,7 @@ from itertools import takewhile
 
 from dateutil.parser import parse
 
-from git_commit import GitCommit, GitInfo
+from git_commit import GitCommit, GitInfo, decode_path
 
 from unidiff import PatchSet, PatchedFile
 
@@ -52,8 +52,8 @@ class GitEmail(GitCommit):
 modified_files = []
 for f in diff:
 # Strip "a/" and "b/" prefixes
-source = f.source_file[2:]
-target = f.target_file[2:]
+source = decode_path(f.source_file)[2:]
+target = decode_path(f.target_file)[2:]
 
 if f.is_added_file:
 t = 'A'
diff --git a/contrib/gcc-changelog/git_repository.py b/contrib/gcc-changelog/git_repository.py
index 8edcff91ad6..a0e293d756d 100755
--- a/contrib/gcc-changelog/git_repository.py
+++ b/contrib/gcc-changelog/git_repository.py
@@ -26,7 +26,7 @@ except ImportError:
 print('  Debian, Ubuntu: python3-git')
 exit(1)
 
-from git_commit import GitCommit, GitInfo
+from git_commit import GitCommit, GitInfo, decode_path
 
 
 def parse_git_revisions(repo_path, revisions, strict=True):
@@ -51,11 +51,11 @@ def parse_git_revisions(repo_path, revisions, strict=True):
 # Consider that renamed files are two operations:
 # the deletion of the original name
 # and the addition of the new one.
-modified_files.append((file.a_path, 'D'))
+modified_files.append((decode_path(file.a_path), 'D'))
 t = 'A'
 else:
 t = 'M'
-modified_files.append((file.b_path, t))
+

Re: Patch RFA: Support non-ASCII file names in git-changelog

2021-01-04 Thread Martin Liška

On 1/4/21 12:01 PM, Martin Liška wrote:

Anyway, I'm going to update the server hook first, and I'll create an issue for
GitPython.


So I was not correct about this: the server hooks now also use GitPython
to identify modified files.

I've just created an issue for that:
https://github.com/gitpython-developers/GitPython/issues/1099

Martin


Re: Patch RFA: Support non-ASCII file names in git-changelog

2021-01-04 Thread Martin Liška

On 12/24/20 1:16 PM, Joel Brobecker wrote:

I have no idea who that is (if it is a single user at all,
if it isn't any user with git write permissions).


CCing Joel; he should be able to help us set a git config
that will be used by the server hooks.


I am not sure that requiring both the server and the user to agree
on a non-default configuration value would be a practical idea.


I agree with that, but I was unable to find a way to "decode"
the filenames:



From what I understand of the problem, I think the proper fix
is really to adapt the git-changelog script to avoid the need
for any assumption about the user's configuration. In particular,
how does the script get the list of files?


On the server we use:
git diff HEAD~ --name-status

which works fine with the -z option:
Mcontrib/gcc-changelog/git_repository.pyAšpatně.txt

without it, the path is quoted as well:

git diff HEAD~ --name-status | cat
M   contrib/gcc-changelog/git_repository.py
A   "\305\241patn\304\233.txt"
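
To illustrate the -z behaviour shown above, here is a minimal Python sketch of how such NUL-terminated output could be parsed. The function name parse_name_status_z and the sample bytes are illustrative assumptions, not code taken from gcc-changelog or the git-hooks.

```python
def parse_name_status_z(data):
    """Parse the bytes printed by `git diff --name-status -z`.

    With -z every field is terminated by a NUL byte and paths are
    printed verbatim, so no unquoting is needed.
    """
    fields = data.split(b'\0')
    if fields and fields[-1] == b'':
        fields.pop()  # drop the empty piece after the trailing NUL
    entries = []
    i = 0
    while i < len(fields):
        status = fields[i].decode('ascii')
        # Renames and copies (R<score> / C<score>) are followed by two paths.
        npaths = 2 if status[0] in 'RC' else 1
        paths = [f.decode('utf-8') for f in fields[i + 1:i + 1 + npaths]]
        entries.append((status, *paths))
        i += 1 + npaths
    return entries


# Sample bytes shaped like the -z output above (the NULs are invisible there).
sample = '\0'.join(['M', 'contrib/gcc-changelog/git_repository.py',
                    'A', 'špatně.txt', '']).encode('utf-8')
print(parse_name_status_z(sample))
```

The paths are decoded as UTF-8 here, which assumes the repository stores file names in that encoding.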


Poking around, it looks like
you guys are using the GitPython module, which I'm not familiar with,
unfortunately.  But as a reference point, the git-hooks simply use
the -z option to get the information in raw format, and thus avoid
the problem of filename quoting entirely. Does GitPython support
something similar? For instance, browsing the GitPython documentation,
I found attributes a_rawpath and b_rawpath. Could that be the
solution (instead of using a_path and b_path)?


Thanks for looking into it. Unfortunately, for a file called "špatně.txt",
a_rawpath and b_rawpath give me:
b'"\\305\\241patn\\304\\233.txt"' b'"\\305\\241patn\\304\\233.txt"'
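
Since the raw path still carries git's quoted form, the bytes would have to be unquoted by hand. Below is a hedged sketch of one way to do that directly on bytes, covering the C-style escapes that git documents for core.quotePath. unquote_git_path is a hypothetical helper; it is not part of GitPython or gcc-changelog.

```python
import re

# Single-character escapes used in git's C-style quoted paths.
_SIMPLE = {b'a': b'\a', b'b': b'\b', b'f': b'\f', b'n': b'\n',
           b'r': b'\r', b't': b'\t', b'v': b'\v',
           b'"': b'"', b'\\': b'\\'}
# A backslash followed by up to three octal digits, or by any one byte.
_ESCAPE = re.compile(rb'\\([0-7]{1,3}|.)', re.DOTALL)


def unquote_git_path(raw):
    """Return the real path bytes for a possibly quoted path."""
    if not (len(raw) >= 2 and raw.startswith(b'"') and raw.endswith(b'"')):
        return raw  # not quoted, nothing to do

    def replace(match):
        token = match.group(1)
        if token[:1] in b'01234567':
            return bytes([int(token, 8)])  # octal escape -> one byte
        return _SIMPLE.get(token, token)

    return _ESCAPE.sub(replace, raw[1:-1])


raw = b'"\\305\\241patn\\304\\233.txt"'
print(unquote_git_path(raw).decode('utf-8'))  # špatně.txt
```

Working on bytes avoids guessing an encoding until the very end; the final decode assumes UTF-8 file names.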



Either way, the solution will be independent of the git-hooks,
as I don't think they are actually involved, here.



Anyway, I'm going to update the server hook first, and I'll create an issue for
GitPython.

Thanks for help,
Martin


Re: Patch RFA: Support non-ASCII file names in git-changelog

2020-12-24 Thread Joel Brobecker
> > I have no idea who that is (if it is a single user at all,
> > if it isn't any user with git write permissions).
> 
> CCing Joel; he should be able to help us set a git config
> that will be used by the server hooks.

I am not sure that requiring both the server and the user to agree
on a non-default configuration value would be a practical idea.

From what I understand of the problem, I think the proper fix
is really to adapt the git-changelog script to avoid the need
for any assumption about the user's configuration. In particular,
how does the script get the list of files? Poking around, it looks like
you guys are using the GitPython module, which I'm not familiar with,
unfortunately.  But as a reference point, the git-hooks simply use
the -z option to get the information in raw format, and thus avoid
the problem of filename quoting entirely. Does GitPython support
something similar? For instance, browsing the GitPython documentation,
I found attributes a_rawpath and b_rawpath. Could that be the
solution (instead of using a_path and b_path)?

Either way, the solution will be independent of the git-hooks,
as I don't think they are actually involved, here.

-- 
Joel


Re: Patch RFA: Support non-ASCII file names in git-changelog

2020-12-21 Thread Martin Liška

On 12/21/20 10:48 AM, Jakub Jelinek wrote:

I have no idea who that is (if it is a single user at all,
if it isn't any user with git write permissions).


CCing Joel; he should be able to help us set a git config
that will be used by the server hooks.

Martin


Re: Patch RFA: Support non-ASCII file names in git-changelog

2020-12-21 Thread Jakub Jelinek via Gcc-patches
On Mon, Dec 21, 2020 at 10:39:31AM +0100, Martin Liška wrote:
> On 12/18/20 7:30 PM, Ian Lance Taylor wrote:
> > I don't know the tradeoffs here.  This approach sounds fine to me.
> 
> The trade-off is that we need to set up the server (that's fine).
> And people have to do the same locally; otherwise they'll now see:
> 
> $ git gcc-verify  -p
> Checking 6c439cadf0362cc0f8f2b894c1b596bbf822849b: FAILED
> ERR: Quoted UTF8 filename, please set: "git config --global core.quotepath 
> false":""kon\303\255\304\215ek.txt""
> 
> @Jakub: Can you please update the server hook (git_commit.py) file and set:
> 
> git config --global core.quotepath false
> 
> for the user that runs the server hooks?

I have no idea who that is (if it is a single user at all,
if it isn't any user with git write permissions).

Jakub



Re: Patch RFA: Support non-ASCII file names in git-changelog

2020-12-21 Thread Martin Liška

On 12/18/20 7:30 PM, Ian Lance Taylor wrote:

I don't know the tradeoffs here.  This approach sounds fine to me.


The trade-off is that we need to set up the server (that's fine).
And people have to do the same locally; otherwise they'll now see:

$ git gcc-verify  -p
Checking 6c439cadf0362cc0f8f2b894c1b596bbf822849b: FAILED
ERR: Quoted UTF8 filename, please set: "git config --global core.quotepath 
false":""kon\303\255\304\215ek.txt""

@Jakub: Can you please update the server hook (git_commit.py) file and set:

git config --global core.quotepath false

for the user that runs the server hooks?

Thanks,
Martin


Re: Patch RFA: Support non-ASCII file names in git-changelog

2020-12-18 Thread Ian Lance Taylor via Gcc-patches
On Fri, Dec 18, 2020 at 2:28 AM Martin Liška  wrote:
>
> On 12/17/20 5:26 AM, Ian Lance Taylor via Gcc-patches wrote:
> > As discussed at
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561995.html ,
> > the ChangeLog checker does not correctly handle files with non-ASCII
> > file names.
> >
> > This patch fixes the problem.  I have little experience with Python,
> > so I may have made some foolish mistakes here.
> >
> > OK to commit?
> >
> > Thanks.
> >
> > Ian
> >
> > * gcc-changelog/git_repository.py: Ignore quotation marks added by git
> > for non-ASCII file names.
> >
>
> First, sorry for the slow response to the previous
> thread (Change to gcc/testsuite/go.test/test rejected by ChangeLog checker).
>
> Well, the suggested change will not help us because we will not be able
> to find a file with a given path (\xxx\yyy...).
>
> The proper solution is likely to run:
> $ git config --global core.quotepath false
>
> on both the server side and the client side.
>
> With that, git properly displays non-ASCII filenames:


> Thoughts?


I don't know the tradeoffs here.  This approach sounds fine to me.

Ian


Re: Patch RFA: Support non-ASCII file names in git-changelog

2020-12-18 Thread Martin Liška

On 12/17/20 5:26 AM, Ian Lance Taylor via Gcc-patches wrote:

As discussed at
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561995.html ,
the ChangeLog checker does not correctly handle files with non-ASCII
file names.

This patch fixes the problem.  I have little experience with Python,
so I may have made some foolish mistakes here.

OK to commit?

Thanks.

Ian

* gcc-changelog/git_repository.py: Ignore quotation marks added by git
for non-ASCII file names.



First, sorry for the slow response to the previous
thread (Change to gcc/testsuite/go.test/test rejected by ChangeLog checker).

Well, the suggested change will not help us because we will not be able
to find a file with a given path (\xxx\yyy...).
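
A small illustration of the point above, assuming the quoted form that git prints for "špatně.txt" elsewhere in this thread: removing only the quotation marks leaves the octal escapes in place, so the result still does not name the file. strip_quotes is a hypothetical stand-in for the logic in the proposed patch.

```python
def strip_quotes(path):
    """Remove the surrounding quotation marks only, as the proposed patch does."""
    if path.startswith('"') and path.endswith('"'):
        path = path[1:len(path) - 1]
    return path


quoted = '"\\305\\241patn\\304\\233.txt"'  # what git reports by default
actual = 'špatně.txt'                      # the file name in the tree

stripped = strip_quotes(quoted)
print(stripped)            # \305\241patn\304\233.txt
print(stripped == actual)  # False
```

The stripped string is still the escaped spelling, which is why a lookup by that path fails.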

The proper solution is likely to run:
$ git config --global core.quotepath false

on both the server side and the client side.

With that, git properly displays non-ASCII filenames:

commit 1814a090a816884892240752c927f7dbb50a10da
Author: Martin Liska 
Date:   Fri Dec 18 11:00:52 2020 +0100

Add new file.

ChangeLog:

* špatně.txt: New file.


diff --git a/špatně.txt b/špatně.txt
new file mode 100644
index 000..e69de29bb2d
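
As a sketch only: a script that shells out to git can also turn quoting off for a single invocation with git's -c option, instead of relying on the global setting. The helper names below are hypothetical, and this is not what gcc-changelog does (it goes through GitPython).

```python
import subprocess


def name_status_command(revision='HEAD~'):
    """Build a git command line with core.quotepath disabled for this call."""
    return ['git', '-c', 'core.quotepath=false',
            'diff', revision, '--name-status']


def parse_name_status(text):
    """Split tab-separated --name-status lines into tuples."""
    return [tuple(line.split('\t')) for line in text.split('\n') if line]


def changed_files(repo_path, revision='HEAD~'):
    """Run git in repo_path and return (status, path, ...) tuples."""
    result = subprocess.run(name_status_command(revision), cwd=repo_path,
                            check=True, capture_output=True)
    # Assumes file names are stored as UTF-8.
    return parse_name_status(result.stdout.decode('utf-8'))
```

Note that git still escapes double quotes, backslashes and control characters in paths regardless of core.quotepath, so this only covers non-ASCII names.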

Thoughts?
Martin


Patch RFA: Support non-ASCII file names in git-changelog

2020-12-16 Thread Ian Lance Taylor via Gcc-patches
As discussed at
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561995.html ,
the ChangeLog checker does not correctly handle files with non-ASCII
file names.

This patch fixes the problem.  I have little experience with Python,
so I may have made some foolish mistakes here.

OK to commit?

Thanks.

Ian

* gcc-changelog/git_repository.py: Ignore quotation marks added by git
for non-ASCII file names.
diff --git a/contrib/gcc-changelog/git_repository.py b/contrib/gcc-changelog/git_repository.py
index 8edcff91ad6..86b470b0881 100755
--- a/contrib/gcc-changelog/git_repository.py
+++ b/contrib/gcc-changelog/git_repository.py
@@ -55,7 +55,10 @@ def parse_git_revisions(repo_path, revisions, strict=True):
 t = 'A'
 else:
 t = 'M'
-modified_files.append((file.b_path, t))
+path = file.b_path
+if path.startswith('"') and path.endswith('"'):
+path = path[1:len(path)-1]
+modified_files.append((path, t))
 
 date = datetime.utcfromtimestamp(c.committed_date)
 author = '%s  <%s>' % (c.author.name, c.author.email)