[PATCH] remote-hg: unquote C-style paths when exporting
git-fast-import documentation says that paths can be C-style quoted. Unfortunately, the current remote-hg helper doesn't unquote quoted path and pass them as-is to Mercurial when the commit is created. This result in the following situation: - clone a mercurial repository with git - Add a file with space: `mkdir dir/foo\ bar` - Commit that new file, and push the change to mercurial - The mercurial repository as now a new directory named '"dir', which contains a file named 'foo bar"' Use python ast.literal_eval to unquote the string if it starts with ". It has been tested with quotes, spaces, and utf-8 encoded file-names. Signed-off-by: Antoine Pelisse --- contrib/remote-helpers/git-remote-hg | 3 +++ 1 file changed, 3 insertions(+) diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg index 92d994e..0141949 100755 --- a/contrib/remote-helpers/git-remote-hg +++ b/contrib/remote-helpers/git-remote-hg @@ -14,6 +14,7 @@ from mercurial import hg, ui, bookmarks, context, encoding, node, error, extensions, discovery, util +import ast import re import sys import os @@ -742,6 +743,8 @@ def parse_commit(parser): f = { 'deleted' : True } else: die('Unknown file command: %s' % line) +if path.startswith('"'): +path = ast.literal_eval(path) files[path] = f # only export the commits if we are on an internal proxy repo -- 1.8.4.1.507.g9768648.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remote-hg: unquote C-style paths when exporting
Antoine Pelisse writes: > git-fast-import documentation says that paths can be C-style quoted. > Unfortunately, the current remote-hg helper doesn't unquote quoted > path and pass them as-is to Mercurial when the commit is created. > > This result in the following situation: > > - clone a mercurial repository with git > - Add a file with space: `mkdir dir/foo\ bar` > - Commit that new file, and push the change to mercurial > - The mercurial repository as now a new directory named '"dir', which > contains a file named 'foo bar"' > > Use python ast.literal_eval to unquote the string if it starts with ". > It has been tested with quotes, spaces, and utf-8 encoded file-names. > > Signed-off-by: Antoine Pelisse > --- A path you read in fast-import input indeed needs to be unquoted when it begins with a dq, and I _think_ by using ast.literal_eval(), you probably can correctly unquote any valid C-quoted string. But it bothers me somewhat that what the patch does seems to be overly broad. Doesn't ast.literal_eval() take a lot more than just strings? ast.literal_eval(node_or_string) Safely evaluate an expression node or a Unicode or Latin-1 encoded string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None. Also doesn't Python's double-quoted string have a lot more magic than C-quoted string, e.g. $ python -i >>> import ast >>> not_cq_path = '"abc" "def"' >>> ast.literal_eval(not_cq_path) 'abcdef' > contrib/remote-helpers/git-remote-hg | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/contrib/remote-helpers/git-remote-hg > b/contrib/remote-helpers/git-remote-hg > index 92d994e..0141949 100755 > --- a/contrib/remote-helpers/git-remote-hg > +++ b/contrib/remote-helpers/git-remote-hg > @@ -14,6 +14,7 @@ > > from mercurial import hg, ui, bookmarks, context, encoding, node, error, > extensions, discovery, util > > +import ast > import re > import sys > import os > @@ -742,6 +743,8 @@ def parse_commit(parser): > f = { 'deleted' : True } > else: > die('Unknown file command: %s' % line) > +if path.startswith('"'): > +path = ast.literal_eval(path) > files[path] = f > > # only export the commits if we are on an internal proxy repo -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remote-hg: unquote C-style paths when exporting
On Tue, Oct 22, 2013 at 9:13 PM, Junio C Hamano wrote: > Antoine Pelisse writes: > >> git-fast-import documentation says that paths can be C-style quoted. >> Unfortunately, the current remote-hg helper doesn't unquote quoted >> path and pass them as-is to Mercurial when the commit is created. >> >> This result in the following situation: >> >> - clone a mercurial repository with git >> - Add a file with space: `mkdir dir/foo\ bar` Note to myself, mkdir doesn't create a "file" >> - Commit that new file, and push the change to mercurial >> - The mercurial repository as now a new directory named '"dir', which >> contains a file named 'foo bar"' >> >> Use python ast.literal_eval to unquote the string if it starts with ". >> It has been tested with quotes, spaces, and utf-8 encoded file-names. >> >> Signed-off-by: Antoine Pelisse >> --- > > A path you read in fast-import input indeed needs to be unquoted > when it begins with a dq, and I _think_ by using ast.literal_eval(), > you probably can correctly unquote any valid C-quoted string. > > But it bothers me somewhat that what the patch does seems to be > overly broad. Doesn't ast.literal_eval() take a lot more than just > strings? Good point > ast.literal_eval(node_or_string) > > Safely evaluate an expression node or a Unicode or Latin-1 > encoded string containing a Python expression. The string or > node provided may only consist of the following Python literal > structures: strings, numbers, tuples, lists, dicts, booleans, > and None. Fortunately, I don't believe any of the other type can start with a dq. So currently, I don't believe we can end-up with anything else but a string. We could certainly check that this is always true though. > Also doesn't Python's double-quoted string have a lot more magic > than C-quoted string, e.g. > > $ python -i > >>> import ast > >>> not_cq_path = '"abc" "def"' > >>> ast.literal_eval(not_cq_path) > 'abcdef' It is true that I have expected "valid output" from git-fast-export. And I don't have in mind any easy solution to detect that the output is broken, yet still accepted as a valid string by python. We could obviously write a unquote_c_style() equivalent in python if needed. Thanks, -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remote-hg: unquote C-style paths when exporting
On Tue, Oct 22, 2013 at 3:49 PM, Antoine Pelisse wrote: > It is true that I have expected "valid output" from git-fast-export. > And I don't have in mind any easy solution to detect that the output > is broken, yet still accepted as a valid string by python. We could > obviously write a unquote_c_style() equivalent in python if needed. Something like this? def c_style_unescape(string): if string[0] == string[-1] == '"': return string.decode('string-escape')[1:-1] return string It's in git-remote-bzr.py. -- Felipe Contreras -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] remote-hg: unquote C-style paths when exporting
On Wed, Oct 23, 2013 at 2:45 AM, Felipe Contreras wrote: > On Tue, Oct 22, 2013 at 3:49 PM, Antoine Pelisse wrote: > >> It is true that I have expected "valid output" from git-fast-export. >> And I don't have in mind any easy solution to detect that the output >> is broken, yet still accepted as a valid string by python. We could >> obviously write a unquote_c_style() equivalent in python if needed. > > Something like this? > > def c_style_unescape(string): > if string[0] == string[-1] == '"': > return string.decode('string-escape')[1:-1] > return string > > It's in git-remote-bzr.py. Yeah, that's certainly better, Thanks, -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re* [PATCH] remote-hg: unquote C-style paths when exporting
Antoine Pelisse writes: >> def c_style_unescape(string): >> if string[0] == string[-1] == '"': >> return string.decode('string-escape')[1:-1] >> return string >> >> It's in git-remote-bzr.py. > > Yeah, that's certainly better, > > Thanks, OK, so an amended one will look like this? -- >8 -- From: Antoine Pelisse Subject: remote-hg: unquote C-style paths when exporting git-fast-import documentation says that paths can be C-style quoted. Unfortunately, the current remote-hg helper doesn't unquote quoted path and pass them as-is to Mercurial when the commit is created. This result in the following situation: - clone a mercurial repository with git - Add a file with space: `mkdir dir/foo\ bar` - Commit that new file, and push the change to mercurial - The mercurial repository as now a new directory named '"dir', which contains a file named 'foo bar"' Use python ast.literal_eval to unquote the string if it starts with ". It has been tested with quotes, spaces, and utf-8 encoded file-names. Signed-off-by: Antoine Pelisse Signed-off-by: Junio C Hamano --- contrib/remote-helpers/git-remote-hg | 6 ++ 1 file changed, 6 insertions(+) diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg index 0194c67..85abbed 100755 --- a/contrib/remote-helpers/git-remote-hg +++ b/contrib/remote-helpers/git-remote-hg @@ -678,6 +678,11 @@ def get_merge_files(repo, p1, p2, files): f = { 'ctx' : repo[p1][e] } files[e] = f +def c_style_unescape(string): +if string[0] == string[-1] == '"': +return string.decode('string-escape')[1:-1] +return string + def parse_commit(parser): global marks, blob_marks, parsed_refs global mode @@ -720,6 +725,7 @@ def parse_commit(parser): f = { 'deleted' : True } else: die('Unknown file command: %s' % line) +path = c_style_unescape(path).decode('utf-8') files[path] = f # only export the commits if we are on an internal proxy repo -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re* [PATCH] remote-hg: unquote C-style paths when exporting
On Wed, Oct 23, 2013 at 5:44 PM, Junio C Hamano wrote: > Antoine Pelisse writes: > >>> def c_style_unescape(string): >>> if string[0] == string[-1] == '"': >>> return string.decode('string-escape')[1:-1] >>> return string >>> >>> It's in git-remote-bzr.py. >> >> Yeah, that's certainly better, >> >> Thanks, > > OK, so an amended one will look like this? The commit message needs to be updated as well. > -- >8 -- > From: Antoine Pelisse > Subject: remote-hg: unquote C-style paths when exporting > > git-fast-import documentation says that paths can be C-style quoted. > Unfortunately, the current remote-hg helper doesn't unquote quoted > path and pass them as-is to Mercurial when the commit is created. > > This result in the following situation: s/result/&s/ > - clone a mercurial repository with git > - Add a file with space: `mkdir dir/foo\ bar` - Add a file with space in a directory: `>dir/foo\ bar` > - Commit that new file, and push the change to mercurial > - The mercurial repository as now a new directory named '"dir', which > contains a file named 'foo bar"' I'm so ashamed I'd rather not report this one: s/as/has/ > Use python ast.literal_eval to unquote the string if it starts with ". Use python str.decode('string-escape') to unquote the string if it starts and ends with ". > It has been tested with quotes, spaces, and utf-8 encoded file-names. > > Signed-off-by: Antoine Pelisse > Signed-off-by: Junio C Hamano > --- > contrib/remote-helpers/git-remote-hg | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/contrib/remote-helpers/git-remote-hg > b/contrib/remote-helpers/git-remote-hg > index 0194c67..85abbed 100755 > --- a/contrib/remote-helpers/git-remote-hg > +++ b/contrib/remote-helpers/git-remote-hg > @@ -678,6 +678,11 @@ def get_merge_files(repo, p1, p2, files): > f = { 'ctx' : repo[p1][e] } > files[e] = f > > +def c_style_unescape(string): > +if string[0] == string[-1] == '"': > +return string.decode('string-escape')[1:-1] > +return string > + > def parse_commit(parser): > global marks, blob_marks, parsed_refs > global mode > @@ -720,6 +725,7 @@ def parse_commit(parser): > f = { 'deleted' : True } > else: > die('Unknown file command: %s' % line) > +path = c_style_unescape(path).decode('utf-8') > files[path] = f > > # only export the commits if we are on an internal proxy repo That is consistent with git-remote-bzr, Thanks -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html