Re: Patch RFA: Support non-ASCII file names in git-changelog
On Wed, Jan 6, 2021 at 5:37 AM Martin Liška wrote:
>
> On 1/6/21 8:25 AM, Martin Liška wrote:
> > Anyway, I've got a workaround that I'm going to push.
>
> It's fixed now.
>
> @Ian: Can you please try to push the changes now?

It worked.  Thanks.

Ian

b87ec922c4090fcacf802c73b6bfd59a8632f8a5
diff --git "a/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204foo.go" "b/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204foo.go"
new file mode 100644
index 000..8b6a814c3c4
--- /dev/null
+++ "b/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204foo.go"
@@ -0,0 +1,13 @@
+package Äfoo
+
+var ÄbarV int = 101
+
+func Äbar(x int) int {
+	defer func() { ÄbarV += 3 }()
+	return Äblix(x)
+}
+
+func Äblix(x int) int {
+	defer func() { ÄbarV += 9 }()
+	return ÄbarV + x
+}
diff --git "a/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204main.go" "b/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204main.go"
new file mode 100644
index 000..25d2c71fc00
--- /dev/null
+++ "b/gcc/testsuite/go.test/test/fixedbugs/issue27836.dir/\303\204main.go"
@@ -0,0 +1,13 @@
+package main
+
+import (
+	"fmt"
+
+	"./Äfoo"
+	Äblix "./Äfoo"
+)
+
+func main() {
+	fmt.Printf("Äfoo.Äbar(33) returns %v\n", Äfoo.Äbar(33))
+	fmt.Printf("Äblix.Äbar(33) returns %v\n", Äblix.Äbar(33))
+}
diff --git a/gcc/testsuite/go.test/test/fixedbugs/issue27836.go b/gcc/testsuite/go.test/test/fixedbugs/issue27836.go
new file mode 100644
index 000..128cf9d06ad
--- /dev/null
+++ b/gcc/testsuite/go.test/test/fixedbugs/issue27836.go
@@ -0,0 +1,7 @@
+// compiledir
+
+// Copyright 2018 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+package ignored
Re: Patch RFA: Support non-ASCII file names in git-changelog
On 1/6/21 8:25 AM, Martin Liška wrote:
> Anyway, I've got a workaround that I'm going to push.

It's fixed now.

@Ian: Can you please try to push the changes now?

Thanks,
Martin
Re: Patch RFA: Support non-ASCII file names in git-changelog
On 1/4/21 12:47 PM, Martin Liška wrote:
> On 1/4/21 12:01 PM, Martin Liška wrote:
>> Anyway, I'm going to update server hook first and I'll create an issue for GitPython.
>
> So I was not correct about this. Also, the server hooks now use GitPython to identify modified files.
> I've just created an issue for that:
> https://github.com/gitpython-developers/GitPython/issues/1099

This one got fixed and the fix is included in the newly published release v3.1.12.

Anyway, I've got a workaround that I'm going to push.

Martin

From ed9ffe47d6964dc92c92cfddbb8aac555c28e085 Mon Sep 17 00:00:00 2001
From: Martin Liska
Date: Wed, 6 Jan 2021 08:11:57 +0100
Subject: [PATCH] gcc-changelog: workaround for utf8 filenames

contrib/ChangeLog:

	* gcc-changelog/git_commit.py: Add decode_path function.
	* gcc-changelog/git_email.py: Use it in order to solve
	utf8 encoding filename issues.
	* gcc-changelog/git_repository.py: Likewise.
	* gcc-changelog/test_email.py: Test it.
---
 contrib/gcc-changelog/git_commit.py     | 26 +
 contrib/gcc-changelog/git_email.py      |  6 +++---
 contrib/gcc-changelog/git_repository.py |  6 +++---
 contrib/gcc-changelog/test_email.py     |  3 ++-
 4 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/contrib/gcc-changelog/git_commit.py b/contrib/gcc-changelog/git_commit.py
index d2e5dbe294a..ee1973371be 100755
--- a/contrib/gcc-changelog/git_commit.py
+++ b/contrib/gcc-changelog/git_commit.py
@@ -174,6 +174,24 @@ REVIEW_PREFIXES = ('reviewed-by: ', 'reviewed-on: ', 'signed-off-by: ',
 DATE_FORMAT = '%Y-%m-%d'


+def decode_path(path):
+    # When core.quotepath is true (default value), utf8 chars are encoded like:
+    # "b/ko\304\215ka.txt"
+    #
+    # The upstream bug is fixed:
+    # https://github.com/gitpython-developers/GitPython/issues/1099
+    #
+    # but we still need a workaround for older versions of the library.
+    # Please take a look at the explanation of the transformation:
+    # https://stackoverflow.com/questions/990169/how-do-convert-unicode-escape-sequences-to-unicode-characters-in-a-python-string
+
+    if path.startswith('"') and path.endswith('"'):
+        return (path.strip('"').encode('utf8').decode('unicode-escape')
+                .encode('latin-1').decode('utf8'))
+    else:
+        return path
+
+
 class Error:
     def __init__(self, message, line=None):
         self.message = message
@@ -303,14 +321,6 @@ class GitCommit:
                                      'separately from normal commits'))
             return

-        # check for an encoded utf-8 filename
-        hint = 'git config --global core.quotepath false'
-        for modified, _ in self.info.modified_files:
-            if modified.startswith('"') or modified.endswith('"'):
-                self.errors.append(Error('Quoted UTF8 filename, please set: '
-                                         f'"{hint}"', modified))
-                return
-
         all_are_ignored = (len(project_files) + len(ignored_files)
                            == len(self.info.modified_files))
         self.parse_lines(all_are_ignored)
diff --git a/contrib/gcc-changelog/git_email.py b/contrib/gcc-changelog/git_email.py
index 5b53ca4a6a9..00ad00458f4 100755
--- a/contrib/gcc-changelog/git_email.py
+++ b/contrib/gcc-changelog/git_email.py
@@ -22,7 +22,7 @@ from itertools import takewhile

 from dateutil.parser import parse

-from git_commit import GitCommit, GitInfo
+from git_commit import GitCommit, GitInfo, decode_path

 from unidiff import PatchSet, PatchedFile

@@ -52,8 +52,8 @@ class GitEmail(GitCommit):
         modified_files = []
         for f in diff:
             # Strip "a/" and "b/" prefixes
-            source = f.source_file[2:]
-            target = f.target_file[2:]
+            source = decode_path(f.source_file)[2:]
+            target = decode_path(f.target_file)[2:]

             if f.is_added_file:
                 t = 'A'
diff --git a/contrib/gcc-changelog/git_repository.py b/contrib/gcc-changelog/git_repository.py
index 8edcff91ad6..a0e293d756d 100755
--- a/contrib/gcc-changelog/git_repository.py
+++ b/contrib/gcc-changelog/git_repository.py
@@ -26,7 +26,7 @@ except ImportError:
     print(' Debian, Ubuntu: python3-git')
     exit(1)

-from git_commit import GitCommit, GitInfo
+from git_commit import GitCommit, GitInfo, decode_path


 def parse_git_revisions(repo_path, revisions, strict=True):
@@ -51,11 +51,11 @@ def parse_git_revisions(repo_path, revisions, strict=True):
                     # Consider that renamed files are two operations:
                     # the deletion of the original name
                     # and the addition of the new one.
-                    modified_files.append((file.a_path, 'D'))
+                    modified_files.append((decode_path(file.a_path), 'D'))
                     t = 'A'
                 else:
                     t = 'M'
-                modified_files.append((file.b_path, t))
+                modified_
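The transformation inside decode_path can be exercised on its own. The sketch below copies the function body from the patch and feeds it the sample path from its comment; the second sample ("špatně.txt") comes from earlier in the thread. It shows the three steps: `unicode-escape` turns each octal escape into one code point in the range 0-255, `latin-1` maps those code points back to the raw bytes, and the final `utf8` decode reassembles the multi-byte characters.

```python
def decode_path(path):
    # Same logic as decode_path in the patch above.
    if path.startswith('"') and path.endswith('"'):
        return (path.strip('"').encode('utf8').decode('unicode-escape')
                .encode('latin-1').decode('utf8'))
    else:
        return path


# git wraps the name in quotes and writes each UTF-8 byte as an octal escape.
quoted = '"b/ko\\304\\215ka.txt"'
print(decode_path(quoted))          # b/kočka.txt
print(decode_path('b/plain.txt'))   # b/plain.txt (unquoted paths pass through)
```

One caveat worth keeping in mind: `str.strip('"')` removes every leading and trailing quote, so a name that itself ends in an escaped quote (git writes it as `\"`) would lose that character and leave a dangling backslash; slicing with `path[1:-1]` would strip exactly one quote on each side.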
Re: Patch RFA: Support non-ASCII file names in git-changelog
On 1/4/21 12:01 PM, Martin Liška wrote:
> Anyway, I'm going to update server hook first and I'll create an issue for GitPython.

So I was not correct about this. Also, the server hooks now use GitPython to identify modified files. I've just created an issue for that:
https://github.com/gitpython-developers/GitPython/issues/1099

Martin
Re: Patch RFA: Support non-ASCII file names in git-changelog
On 12/24/20 1:16 PM, Joel Brobecker wrote:
>>> I have no idea who that is (if it is a single user at all, if it isn't any user with git write permissions).
>> CCing Joel, he should help us how to set a git config that will be used by the server hooks.
> I am not sure that requiring both the server and the user to agree on a non-default configuration value would be a practical idea.

I agree with that, but I was unable to find a way to "decode" the filenames:

> From what I understand of the problem, I think the proper fix is really to adapt the git-changelog script to avoid the need for any assumption about the user's configuration. In particular, how does the script get the list of files?

On the server we use:
git diff HEAD~ --name-status

which works really fine with the -z option:
Mcontrib/gcc-changelog/git_repository.pyAšpatně.txt

Without it, the path is quoted:
git diff HEAD~ --name-status | cat
M contrib/gcc-changelog/git_repository.py
A "\305\241patn\304\233.txt"

> Poking around, it looks like you guys are using the GitPython module, which I'm not familiar with, unfortunately. But as a reference point, the git-hooks simply use the -z option to get the information in raw format, and thus avoid the problem of filename quoting entirely. Does GitPython support something similar? For instance, browsing the GitPython documentation, I found attributes a_raw_path and b_raw_path. Could that be the solution (instead of using a_path and b_path)?

Thanks for looking into it. Unfortunately, for a file called "špatně.txt" I get for a_rawpath and b_rawpath:
b'"\\305\\241patn\\304\\233.txt"'
b'"\\305\\241patn\\304\\233.txt"'

> Either way, the solution will be independent of the git-hooks, as I don't think they are actually involved, here.

Anyway, I'm going to update server hook first and I'll create an issue for GitPython.

Thanks for help,
Martin
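The -z route Joel describes can be consumed from Python without depending on any quoting configuration. A minimal sketch, assuming the script shells out to git through `subprocess` instead of going through GitPython; `parse_name_status_z` and `changed_files` are hypothetical names, not part of gcc-changelog, and paths are assumed to be UTF-8. With -z each status and each path is terminated by a NUL byte (which is why the -z output quoted above looks run together on a terminal), and rename/copy entries (R or C plus a similarity score) carry two paths.

```python
import subprocess


def parse_name_status_z(raw):
    """Parse the NUL-terminated output of `git diff --name-status -z`.

    Returns a list of (status, path) tuples. Rename and copy entries
    (status starting with R or C) are returned as (status, (old, new)).
    """
    fields = raw.split(b'\0')
    if fields and fields[-1] == b'':
        fields.pop()  # every field, including the last one, ends with NUL
    entries = []
    i = 0
    while i < len(fields):
        status = fields[i].decode('ascii')
        if status[0] in 'RC':
            old = fields[i + 1].decode('utf-8')
            new = fields[i + 2].decode('utf-8')
            entries.append((status, (old, new)))
            i += 3
        else:
            entries.append((status, fields[i + 1].decode('utf-8')))
            i += 2
    return entries


def changed_files(rev='HEAD~'):
    # With -z git prints path names verbatim, so core.quotepath is irrelevant.
    out = subprocess.run(['git', 'diff', rev, '--name-status', '-z'],
                         check=True, capture_output=True).stdout
    return parse_name_status_z(out)


sample = b'M\0contrib/gcc-changelog/git_repository.py\0A\0\xc5\xa1patn\xc4\x9b.txt\0'
print(parse_name_status_z(sample))
```

Splitting on NUL keeps the parser trivial because a NUL byte can never appear inside a path name.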
Re: Patch RFA: Support non-ASCII file names in git-changelog
> > I have no idea who that is (if it is a single user at all,
> > if it isn't any user with git write permissions).
>
> CCing Joel, he should help us how to set a git config
> that will be used by the server hooks.

I am not sure that requiring both the server and the user to agree on a non-default configuration value would be a practical idea.

From what I understand of the problem, I think the proper fix is really to adapt the git-changelog script to avoid the need for any assumption about the user's configuration. In particular, how does the script get the list of files?

Poking around, it looks like you guys are using the GitPython module, which I'm not familiar with, unfortunately. But as a reference point, the git-hooks simply use the -z option to get the information in raw format, and thus avoid the problem of filename quoting entirely. Does GitPython support something similar? For instance, browsing the GitPython documentation, I found attributes a_raw_path and b_raw_path. Could that be the solution (instead of using a_path and b_path)?

Either way, the solution will be independent of the git-hooks, as I don't think they are actually involved, here.

-- 
Joel
Re: Patch RFA: Support non-ASCII file names in git-changelog
On 12/21/20 10:48 AM, Jakub Jelinek wrote:
> I have no idea who that is (if it is a single user at all, if it isn't any user with git write permissions).

CCing Joel, he should help us how to set a git config that will be used by the server hooks.

Martin
Re: Patch RFA: Support non-ASCII file names in git-changelog
On Mon, Dec 21, 2020 at 10:39:31AM +0100, Martin Liška wrote:
> On 12/18/20 7:30 PM, Ian Lance Taylor wrote:
> > I don't know the tradeoffs here.  This approach sounds fine to me.
>
> Trade off is that we need to setup server (that's fine).
> And people have to locally do the same, otherwise they'll newly see:
>
> $ git gcc-verify -p
> Checking 6c439cadf0362cc0f8f2b894c1b596bbf822849b: FAILED
> ERR: Quoted UTF8 filename, please set: "git config --global core.quotepath
> false":""kon\303\255\304\215ek.txt""
>
> @Jakub: Can you please update the server hook (git_commit.py) file and set:
>
> git config --global core.quotepath false
>
> for the user that runs the server hooks?

I have no idea who that is (if it is a single user at all, if it isn't any user with git write permissions).

	Jakub
Re: Patch RFA: Support non-ASCII file names in git-changelog
On 12/18/20 7:30 PM, Ian Lance Taylor wrote:
> I don't know the tradeoffs here.  This approach sounds fine to me.

Trade off is that we need to setup server (that's fine).
And people have to locally do the same, otherwise they'll newly see:

$ git gcc-verify -p
Checking 6c439cadf0362cc0f8f2b894c1b596bbf822849b: FAILED
ERR: Quoted UTF8 filename, please set: "git config --global core.quotepath false":""kon\303\255\304\215ek.txt""

@Jakub: Can you please update the server hook (git_commit.py) file and set:

git config --global core.quotepath false

for the user that runs the server hooks?

Thanks,
Martin
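Instead of changing the global configuration on the server account and on every developer's machine, git also accepts a per-invocation override via `git -c name=value`. A minimal sketch, assuming the checker calls git directly; `git_command` and `run_git` are hypothetical helpers, not part of gcc-changelog. Note that per the git-config documentation, double quotes, backslashes and control characters are still escaped even with core.quotePath set to false, so the -z output format remains the more robust choice.

```python
import subprocess


def git_command(*args):
    # `git -c name=value` overrides a configuration variable for this
    # invocation only, so no global or per-user setting is required.
    return ['git', '-c', 'core.quotePath=false', *args]


def run_git(*args, cwd=None):
    # Non-ASCII names are now printed as raw bytes; assume they are UTF-8.
    return subprocess.run(git_command(*args), cwd=cwd, check=True,
                          capture_output=True, encoding='utf-8').stdout


print(git_command('diff', 'HEAD~', '--name-status'))
```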
Re: Patch RFA: Support non-ASCII file names in git-changelog
On Fri, Dec 18, 2020 at 2:28 AM Martin Liška wrote:
>
> On 12/17/20 5:26 AM, Ian Lance Taylor via Gcc-patches wrote:
> > As discussed at
> > https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561995.html ,
> > the ChangeLog checker does not correctly handle files with non-ASCII
> > file names.
> >
> > This patch fixes the problem.  I have little experience with Python,
> > so I may have made some foolish mistakes here.
> >
> > OK to commit?
> >
> > Thanks.
> >
> > Ian
> >
> > * gcc-changelog/git_repository.py: Ignore quotation marks added by git
> > for non-ASCII file names.
> >
>
> First, sorry for a slow response about the previous
> thread (Change to gcc/testsuite/go.test/test rejected by ChangeLog checker).
>
> Well, the suggested change will not help us because we will not be able
> to find a file with a given path (\xxx\yyy...).
>
> Proper solution is likely doing:
> $ git config --global core.quotepath false
>
> both on server side (and client side).
>
> Having that, git properly displays non-ascii filenames:
> Thoughts?

I don't know the tradeoffs here.  This approach sounds fine to me.

Ian
Re: Patch RFA: Support non-ASCII file names in git-changelog
On 12/17/20 5:26 AM, Ian Lance Taylor via Gcc-patches wrote:
> As discussed at
> https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561995.html ,
> the ChangeLog checker does not correctly handle files with non-ASCII
> file names.
>
> This patch fixes the problem.  I have little experience with Python,
> so I may have made some foolish mistakes here.
>
> OK to commit?
>
> Thanks.
>
> Ian
>
> * gcc-changelog/git_repository.py: Ignore quotation marks added by git
> for non-ASCII file names.

First, sorry for a slow response about the previous thread (Change to gcc/testsuite/go.test/test rejected by ChangeLog checker).

Well, the suggested change will not help us because we will not be able to find a file with a given path (\xxx\yyy...).

Proper solution is likely doing:
$ git config --global core.quotepath false

both on server side (and client side).

Having that, git properly displays non-ascii filenames:

commit 1814a090a816884892240752c927f7dbb50a10da
Author: Martin Liska
Date:   Fri Dec 18 11:00:52 2020 +0100

    Add new file.

    ChangeLog:

            * špatně.txt: New file.

diff --git a/špatně.txt b/špatně.txt
new file mode 100644
index 000..e69de29bb2d

Thoughts?
Martin
Patch RFA: Support non-ASCII file names in git-changelog
As discussed at
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561995.html ,
the ChangeLog checker does not correctly handle files with non-ASCII
file names.

This patch fixes the problem.  I have little experience with Python,
so I may have made some foolish mistakes here.

OK to commit?

Thanks.

Ian

	* gcc-changelog/git_repository.py: Ignore quotation marks added by git
	for non-ASCII file names.

diff --git a/contrib/gcc-changelog/git_repository.py b/contrib/gcc-changelog/git_repository.py
index 8edcff91ad6..86b470b0881 100755
--- a/contrib/gcc-changelog/git_repository.py
+++ b/contrib/gcc-changelog/git_repository.py
@@ -55,7 +55,10 @@ def parse_git_revisions(repo_path, revisions, strict=True):
                     t = 'A'
                 else:
                     t = 'M'
-                modified_files.append((file.b_path, t))
+                path = file.b_path
+                if path.startswith('"') and path.endswith('"'):
+                    path = path[1:len(path)-1]
+                modified_files.append((path, t))

             date = datetime.utcfromtimestamp(c.committed_date)
             author = '%s <%s>' % (c.author.name, c.author.email)
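Martin's objection in his reply is that removing the quotes leaves git's octal escapes in place, so the resulting name does not match any file on disk. The sketch below contrasts the slice from this patch with a hand-written reversal of git's C-style quoting. `unquote_git_path` is a hypothetical helper, not part of gcc-changelog; it assumes file names are UTF-8 and handles only the escapes git emits in quoted paths (`\a \b \t \n \v \f \r \" \\` and three-digit octal bytes).

```python
# Single-character escapes used by git's C-style path quoting.
_SIMPLE_ESCAPES = {
    'a': 0x07, 'b': 0x08, 't': 0x09, 'n': 0x0A,
    'v': 0x0B, 'f': 0x0C, 'r': 0x0D, '"': 0x22, '\\': 0x5C,
}


def unquote_git_path(path):
    # Unquoted names are returned unchanged.
    if not (len(path) >= 2 and path.startswith('"') and path.endswith('"')):
        return path
    body = path[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            out += ch.encode('utf-8')
            i += 1
            continue
        nxt = body[i + 1]
        if nxt in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[nxt])
            i += 2
        else:
            # Three octal digits encode one raw byte, e.g. \305.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
    return out.decode('utf-8')


quoted = '"\\305\\241patn\\304\\233.txt"'
print(quoted[1:len(quoted) - 1])   # \305\241patn\304\233.txt (quotes gone, escapes left)
print(unquote_git_path(quoted))    # špatně.txt
```

Collecting bytes first and decoding once at the end matters: a single character such as "š" is spread over two escapes (`\305\241`), so decoding escape by escape would fail.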