[PATCH] remote-hg: unquote C-style paths when exporting

2013-10-18 Thread Antoine Pelisse
git-fast-import documentation says that paths can be C-style quoted.
Unfortunately, the current remote-hg helper doesn't unquote quoted
path and pass them as-is to Mercurial when the commit is created.

This result in the following situation:

- clone a mercurial repository with git
- Add a file with space: `mkdir dir/foo\ bar`
- Commit that new file, and push the change to mercurial
- The mercurial repository as now a new directory named '"dir', which
contains a file named 'foo bar"'
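
As a rough illustration (not part of the patch), splitting the still-quoted
path on '/' shows where the stray quote characters end up. The literal path
below is an assumption about what fast-export emits for this file:

```python
# Assumed fast-export path for a file "foo bar" inside directory "dir".
raw_path = '"dir/foo bar"'

# If the quotes are never stripped, the path components keep them.
components = raw_path.split('/')
print(components)
```

The first component carries the opening quote and the last one carries the
closing quote, which matches the directory and file names described above.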

Use python ast.literal_eval to unquote the string if it starts with ".
It has been tested with quotes, spaces, and utf-8 encoded file-names.

Signed-off-by: Antoine Pelisse 
---
 contrib/remote-helpers/git-remote-hg | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
index 92d994e..0141949 100755
--- a/contrib/remote-helpers/git-remote-hg
+++ b/contrib/remote-helpers/git-remote-hg
@@ -14,6 +14,7 @@
 
 from mercurial import hg, ui, bookmarks, context, encoding, node, error, extensions, discovery, util
 
+import ast
 import re
 import sys
 import os
@@ -742,6 +743,8 @@ def parse_commit(parser):
             f = { 'deleted' : True }
         else:
             die('Unknown file command: %s' % line)
+        if path.startswith('"'):
+            path = ast.literal_eval(path)
         files[path] = f
 
     # only export the commits if we are on an internal proxy repo
-- 
1.8.4.1.507.g9768648.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] remote-hg: unquote C-style paths when exporting

2013-10-22 Thread Junio C Hamano
Antoine Pelisse  writes:

> git-fast-import documentation says that paths can be C-style quoted.
> Unfortunately, the current remote-hg helper doesn't unquote quoted
> path and pass them as-is to Mercurial when the commit is created.
>
> This result in the following situation:
>
> - clone a mercurial repository with git
> - Add a file with space: `mkdir dir/foo\ bar`
> - Commit that new file, and push the change to mercurial
> - The mercurial repository as now a new directory named '"dir', which
> contains a file named 'foo bar"'
>
> Use python ast.literal_eval to unquote the string if it starts with ".
> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>
> Signed-off-by: Antoine Pelisse 
> ---

A path you read in fast-import input indeed needs to be unquoted
when it begins with a dq, and I _think_ by using ast.literal_eval(),
you probably can correctly unquote any valid C-quoted string.

But it bothers me somewhat that what the patch does seems to be
overly broad.  Doesn't ast.literal_eval() take a lot more than just
strings?

ast.literal_eval(node_or_string)

Safely evaluate an expression node or a Unicode or Latin-1
encoded string containing a Python expression. The string or
node provided may only consist of the following Python literal
structures: strings, numbers, tuples, lists, dicts, booleans,
and None.

Also doesn't Python's double-quoted string have a lot more magic
than C-quoted string, e.g.

$ python -i
>>> import ast
>>> not_cq_path = '"abc" "def"'
>>> ast.literal_eval(not_cq_path)
'abcdef'

>  contrib/remote-helpers/git-remote-hg | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
> index 92d994e..0141949 100755
> --- a/contrib/remote-helpers/git-remote-hg
> +++ b/contrib/remote-helpers/git-remote-hg
> @@ -14,6 +14,7 @@
>  
>  from mercurial import hg, ui, bookmarks, context, encoding, node, error, extensions, discovery, util
>  
> +import ast
>  import re
>  import sys
>  import os
> @@ -742,6 +743,8 @@ def parse_commit(parser):
>              f = { 'deleted' : True }
>          else:
>              die('Unknown file command: %s' % line)
> +        if path.startswith('"'):
> +            path = ast.literal_eval(path)
>          files[path] = f
>
>      # only export the commits if we are on an internal proxy repo


Re: [PATCH] remote-hg: unquote C-style paths when exporting

2013-10-22 Thread Antoine Pelisse
On Tue, Oct 22, 2013 at 9:13 PM, Junio C Hamano  wrote:
> Antoine Pelisse  writes:
>
>> git-fast-import documentation says that paths can be C-style quoted.
>> Unfortunately, the current remote-hg helper doesn't unquote quoted
>> path and pass them as-is to Mercurial when the commit is created.
>>
>> This result in the following situation:
>>
>> - clone a mercurial repository with git
>> - Add a file with space: `mkdir dir/foo\ bar`

Note to self: mkdir doesn't create a "file".

>> - Commit that new file, and push the change to mercurial
>> - The mercurial repository as now a new directory named '"dir', which
>> contains a file named 'foo bar"'
>>
>> Use python ast.literal_eval to unquote the string if it starts with ".
>> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>>
>> Signed-off-by: Antoine Pelisse 
>> ---
>
> A path you read in fast-import input indeed needs to be unquoted
> when it begins with a dq, and I _think_ by using ast.literal_eval(),
> you probably can correctly unquote any valid C-quoted string.
>
> But it bothers me somewhat that what the patch does seems to be
> overly broad.  Doesn't ast.literal_eval() take a lot more than just
> strings?

Good point

> ast.literal_eval(node_or_string)
>
> Safely evaluate an expression node or a Unicode or Latin-1
> encoded string containing a Python expression. The string or
> node provided may only consist of the following Python literal
> structures: strings, numbers, tuples, lists, dicts, booleans,
> and None.

Fortunately, I don't believe any of the other types can start with a
dq. So currently, I don't believe we can end up with anything but
a string. We could certainly check that this is always true though.
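
A minimal sketch of such a check, written for Python 3 (an assumption; the
helper itself targets Python 2) and using a made-up function name. It also
shows one input that starts with a dq and still does not evaluate to a
string, namely a comma-separated tuple:

```python
import ast

def unquote_with_literal_eval(path):
    """Hypothetical helper: unquote via ast.literal_eval, strings only."""
    if not path.startswith('"'):
        return path
    value = ast.literal_eval(path)
    # '"a", "b"' starts with a double quote but evaluates to a tuple.
    if not isinstance(value, str):
        raise ValueError('not a quoted string: %r' % path)
    return value
```

The type check rejects '"a", "b"', but adjacent literals such as
'"abc" "def"' are still concatenated by the parser, so the check alone does
not make the parsing strict.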

> Also doesn't Python's double-quoted string have a lot more magic
> than C-quoted string, e.g.
>
> $ python -i
> >>> import ast
> >>> not_cq_path = '"abc" "def"'
> >>> ast.literal_eval(not_cq_path)
> 'abcdef'

It is true that I expected "valid output" from git-fast-export.
And I don't have in mind any easy solution to detect that the output
is broken, yet still accepted as a valid string by Python. We could
obviously write an unquote_c_style() equivalent in Python if needed.
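
For illustration only, a hand-written unquoter could look roughly like the
sketch below. It is Python 3 and works on bytes (both assumptions), and the
accepted escape set (the single-letter C escapes, \" and \\, and three-digit
octal) is my guess at what a C-style quoted path needs, not something checked
against git's C implementation:

```python
# Hypothetical sketch; the escape table is an assumption, not git's own.
_SIMPLE_ESCAPES = {
    ord('a'): 0x07, ord('b'): 0x08, ord('f'): 0x0c, ord('n'): 0x0a,
    ord('r'): 0x0d, ord('t'): 0x09, ord('v'): 0x0b,
    ord('"'): ord('"'), ord('\\'): ord('\\'),
}

def unquote_c_style(quoted: bytes) -> bytes:
    """Return the unquoted bytes or raise ValueError on malformed input."""
    if len(quoted) < 2 or quoted[:1] != b'"' or quoted[-1:] != b'"':
        raise ValueError('not a double-quoted string: %r' % quoted)
    body = quoted[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == ord('"'):
            # A bare quote in the middle means more than one token.
            raise ValueError('unescaped quote in %r' % quoted)
        if ch != ord('\\'):
            out.append(ch)
            i += 1
            continue
        i += 1  # skip the backslash
        if i >= len(body):
            raise ValueError('dangling backslash in %r' % quoted)
        esc = body[i]
        if esc in _SIMPLE_ESCAPES:
            out.append(_SIMPLE_ESCAPES[esc])
            i += 1
        elif ord('0') <= esc <= ord('3'):
            digits = body[i:i + 3]
            if len(digits) != 3 or any(d < ord('0') or d > ord('7') for d in digits):
                raise ValueError('bad octal escape in %r' % quoted)
            out.append(int(digits.decode('ascii'), 8))
            i += 3
        else:
            raise ValueError('unknown escape in %r' % quoted)
    return bytes(out)
```

Unlike ast.literal_eval(), this raises on '"abc" "def"' instead of silently
joining the two pieces, at the cost of maintaining the escape table by hand.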

Thanks,


Re: [PATCH] remote-hg: unquote C-style paths when exporting

2013-10-22 Thread Felipe Contreras
On Tue, Oct 22, 2013 at 3:49 PM, Antoine Pelisse  wrote:

> It is true that I expected "valid output" from git-fast-export.
> And I don't have in mind any easy solution to detect that the output
> is broken, yet still accepted as a valid string by Python. We could
> obviously write an unquote_c_style() equivalent in Python if needed.

Something like this?

def c_style_unescape(string):
    if string[0] == string[-1] == '"':
        return string.decode('string-escape')[1:-1]
    return string

It's in git-remote-bzr.py.
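
Note that the 'string-escape' codec only exists in Python 2. For anyone
trying the same idea on Python 3, a rough approximation (my assumption, not
what git-remote-bzr does) is to go through 'unicode_escape' and back to bytes
via latin-1. It reuses the function name from above but takes and returns
bytes, and adds a length check:

```python
def c_style_unescape(string: bytes) -> bytes:
    """Python 3 approximation of the Python 2 helper shown above."""
    if len(string) >= 2 and string[:1] == string[-1:] == b'"':
        # unicode_escape turns octal and C escapes into code points;
        # latin-1 maps code points below 256 back to the same byte values.
        return string.decode('unicode_escape').encode('latin-1')[1:-1]
    return string

# A UTF-8 name arrives as octal escapes and is decoded in a second step.
name = c_style_unescape(b'"caf\\303\\251"').decode('utf-8')
```

One caveat: 'unicode_escape' also understands \u and \N escapes, which are
not part of C-style quoting, and such input can make the latin-1 step fail.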

-- 
Felipe Contreras


Re: [PATCH] remote-hg: unquote C-style paths when exporting

2013-10-23 Thread Antoine Pelisse
On Wed, Oct 23, 2013 at 2:45 AM, Felipe Contreras wrote:
> On Tue, Oct 22, 2013 at 3:49 PM, Antoine Pelisse  wrote:
>
>> It is true that I expected "valid output" from git-fast-export.
>> And I don't have in mind any easy solution to detect that the output
>> is broken, yet still accepted as a valid string by Python. We could
>> obviously write an unquote_c_style() equivalent in Python if needed.
>
> Something like this?
>
> def c_style_unescape(string):
>     if string[0] == string[-1] == '"':
>         return string.decode('string-escape')[1:-1]
>     return string
>
> It's in git-remote-bzr.py.

Yeah, that's certainly better,

Thanks,


Re* [PATCH] remote-hg: unquote C-style paths when exporting

2013-10-23 Thread Junio C Hamano
Antoine Pelisse  writes:

>> def c_style_unescape(string):
>>     if string[0] == string[-1] == '"':
>>         return string.decode('string-escape')[1:-1]
>>     return string
>>
>> It's in git-remote-bzr.py.
>
> Yeah, that's certainly better,
>
> Thanks,

OK, so an amended one will look like this?

-- >8 --
From: Antoine Pelisse 
Subject: remote-hg: unquote C-style paths when exporting

git-fast-import documentation says that paths can be C-style quoted.
Unfortunately, the current remote-hg helper doesn't unquote quoted
path and pass them as-is to Mercurial when the commit is created.

This result in the following situation:

- clone a mercurial repository with git
- Add a file with space: `mkdir dir/foo\ bar`
- Commit that new file, and push the change to mercurial
- The mercurial repository as now a new directory named '"dir', which
contains a file named 'foo bar"'

Use python ast.literal_eval to unquote the string if it starts with ".
It has been tested with quotes, spaces, and utf-8 encoded file-names.

Signed-off-by: Antoine Pelisse 
Signed-off-by: Junio C Hamano 
---

 contrib/remote-helpers/git-remote-hg | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
index 0194c67..85abbed 100755
--- a/contrib/remote-helpers/git-remote-hg
+++ b/contrib/remote-helpers/git-remote-hg
@@ -678,6 +678,11 @@ def get_merge_files(repo, p1, p2, files):
             f = { 'ctx' : repo[p1][e] }
             files[e] = f
 
+def c_style_unescape(string):
+    if string[0] == string[-1] == '"':
+        return string.decode('string-escape')[1:-1]
+    return string
+
 def parse_commit(parser):
     global marks, blob_marks, parsed_refs
     global mode
@@ -720,6 +725,7 @@ def parse_commit(parser):
             f = { 'deleted' : True }
         else:
             die('Unknown file command: %s' % line)
+        path = c_style_unescape(path).decode('utf-8')
         files[path] = f
 
     # only export the commits if we are on an internal proxy repo


Re: Re* [PATCH] remote-hg: unquote C-style paths when exporting

2013-10-23 Thread Antoine Pelisse
On Wed, Oct 23, 2013 at 5:44 PM, Junio C Hamano  wrote:
> Antoine Pelisse  writes:
>
>>> def c_style_unescape(string):
>>>     if string[0] == string[-1] == '"':
>>>         return string.decode('string-escape')[1:-1]
>>>     return string
>>>
>>> It's in git-remote-bzr.py.
>>
>> Yeah, that's certainly better,
>>
>> Thanks,
>
> OK, so an amended one will look like this?

The commit message needs to be updated as well.

> -- >8 --
> From: Antoine Pelisse 
> Subject: remote-hg: unquote C-style paths when exporting
>
> git-fast-import documentation says that paths can be C-style quoted.
> Unfortunately, the current remote-hg helper doesn't unquote quoted
> path and pass them as-is to Mercurial when the commit is created.
>
> This result in the following situation:

s/result/&s/

> - clone a mercurial repository with git
> - Add a file with space: `mkdir dir/foo\ bar`

- Add a file with space in a directory: `>dir/foo\ bar`

> - Commit that new file, and push the change to mercurial
> - The mercurial repository as now a new directory named '"dir', which
> contains a file named 'foo bar"'

I'm so ashamed I'd rather not report this one: s/as/has/

> Use python ast.literal_eval to unquote the string if it starts with ".

Use python str.decode('string-escape') to unquote the string if it
starts and ends with ".

> It has been tested with quotes, spaces, and utf-8 encoded file-names.
>
> Signed-off-by: Antoine Pelisse 
> Signed-off-by: Junio C Hamano 
> ---
>  contrib/remote-helpers/git-remote-hg | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/contrib/remote-helpers/git-remote-hg b/contrib/remote-helpers/git-remote-hg
> index 0194c67..85abbed 100755
> --- a/contrib/remote-helpers/git-remote-hg
> +++ b/contrib/remote-helpers/git-remote-hg
> @@ -678,6 +678,11 @@ def get_merge_files(repo, p1, p2, files):
>              f = { 'ctx' : repo[p1][e] }
>              files[e] = f
>
> +def c_style_unescape(string):
> +    if string[0] == string[-1] == '"':
> +        return string.decode('string-escape')[1:-1]
> +    return string
> +
>  def parse_commit(parser):
>      global marks, blob_marks, parsed_refs
>      global mode
> @@ -720,6 +725,7 @@ def parse_commit(parser):
>              f = { 'deleted' : True }
>          else:
>              die('Unknown file command: %s' % line)
> +        path = c_style_unescape(path).decode('utf-8')
>          files[path] = f
>
>      # only export the commits if we are on an internal proxy repo

That is consistent with git-remote-bzr,

Thanks