Re: Raw string substitution problem

2010-01-01 Thread Aahz
In article <7p2juvfu8...@mid.individual.net>,
Gregory Ewing   wrote:
>MRAB wrote:
>>
>> In simple cases you might be replacing with the same string every time,
>> but other cases you might want the replacement to contain substrings
>> captured by the regex.
>
>But you can give it a function that has access to the match object and
>can produce whatever replacement string it wants.

Assuming I remember correctly, the function capability came after the
replacement capability.  I think that breaking replacement would be a
Bad Idea.
-- 
Aahz (a...@pythoncraft.com)   <*> http://www.pythoncraft.com/

Weinberg's Second Law: If builders built buildings the way programmers wrote 
programs, then the first woodpecker that came along would destroy civilization.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Raw string substitution problem

2009-12-19 Thread Rhodri James
On Fri, 18 Dec 2009 17:58:08 -, Alan G Isaac   
wrote:



On 12/17/2009 7:59 PM, Rhodri James wrote:

"re.compile('a\\nc')" passes a sequence of four characters to
re.compile: 'a', '\', 'n' and 'c'.  re.compile() then does its own
interpretation: 'a' passes through as is, '\' flags an escape which
combined with 'n' produces the newline character (0x0a), and 'c' passes
through as is.



I got that from MRAB's posts. (Thanks.)
What I'm not getting is why the replacement string
gets this particular interpretation.  What is the payoff?


So that the substitution escapes \1, \2 and so on work.

--
Rhodri James *-* Wildebeeste Herder to the Masses


Re: Raw string substitution problem

2009-12-18 Thread Steven D'Aprano
On Sat, 19 Dec 2009 02:24:00 +, MRAB wrote:

> Gregory Ewing wrote:
>> MRAB wrote:
>> 
>>> In simple cases you might be replacing with the same string every
>>> time, but other cases you might want the replacement to contain
>>> substrings captured by the regex.
>> 
>> But you can give it a function that has access to the match object and
>> can produce whatever replacement string it wants.
>> 
>> You already have a complete programming language at your disposal.
>> There's no need to invent yet another mini-language for the replacement
>> string.
>> 
> There's no need for list comprehensions either, but they're much-used
> shorthand.

The same can't be said for regex replacement strings, which are far more 
specialised.

And list comps don't make anything *harder*, they just make things 
easier. In contrast, the current behaviour of regex replacements makes it 
difficult to use special characters as part of the replacement string. 
That's not good.



-- 
Steven


Re: Raw string substitution problem

2009-12-18 Thread MRAB

Gregory Ewing wrote:

MRAB wrote:


In simple cases you might be replacing with the same string every
time, but other cases you might want the replacement to contain
substrings captured by the regex.


But you can give it a function that has access to the match object
and can produce whatever replacement string it wants.

You already have a complete programming language at your disposal.
There's no need to invent yet another mini-language for the
replacement string.


There's no need for list comprehensions either, but they're much-used
shorthand.


Re: Raw string substitution problem

2009-12-18 Thread Gregory Ewing

MRAB wrote:


In simple cases you might be replacing with the same string every time,
but other cases you might want the replacement to contain substrings
captured by the regex.


But you can give it a function that has access to the
match object and can produce whatever replacement string
it wants.

You already have a complete programming language at
your disposal. There's no need to invent yet another
mini-language for the replacement string.

--
Greg


Re: Raw string substitution problem

2009-12-18 Thread Lie Ryan

On 12/19/2009 4:59 AM, Alan G Isaac wrote:

On 12/18/2009 12:17 PM, MRAB wrote:

In simple cases you might be replacing with the same string every time,
but other cases you might want the replacement to contain substrings
captured by the regex.



Of course that "conversion" is needed in the replacement.
But e.g. Vim substitutions handle this fine without the
odd (to non perlers) handling of backslashes in replacement.

Alan Isaac


Short answer: Python is not Perl, Python's re.sub is not Vim's :s.

Slightly longer answer: Different environments have different needs; 
Vim users more often need to replace with just plain text. All in all, 
decisions about default behavior are often made so that fewer backslashes 
are needed for the more common case in the particular environment.



Re: Raw string substitution problem

2009-12-18 Thread Alan G Isaac

On 12/18/2009 12:17 PM, MRAB wrote:

In simple cases you might be replacing with the same string every time,
but other cases you might want the replacement to contain substrings
captured by the regex.



Of course that "conversion" is needed in the replacement.
But e.g. Vim substitutions handle this fine without the
odd (to non perlers) handling of backslashes in replacement.

Alan Isaac



Re: Raw string substitution problem

2009-12-18 Thread Alan G Isaac

On 12/17/2009 7:59 PM, Rhodri James wrote:

"re.compile('a\\nc')" passes a sequence of four characters to
re.compile: 'a', '\', 'n' and 'c'.  re.compile() then does its own
interpretation: 'a' passes through as is, '\' flags an escape which
combined with 'n' produces the newline character (0x0a), and 'c' passes
through as is.



I got that from MRAB's posts. (Thanks.)
What I'm not getting is why the replacement string
gets this particular interpretation.  What is the payoff?
(Contrast e.g. Vim's substitution syntax.)

Thanks,
Alan



Re: Raw string substitution problem

2009-12-18 Thread MRAB

Gregory Ewing wrote:

MRAB wrote:


Regular expressions and replacement strings have their own escaping
mechanism, which also uses backslashes.


This seems like a misfeature to me. It makes sense for a regular
expression to give special meanings to backslash sequences, because
it's a sublanguage with its own syntax. But I can't see any earthly
reason to do that with the *replacement* string, which is just data.

It looks like a feature that's been blindly copied over from Perl
without thinking about whether it makes sense in Python.


In simple cases you might be replacing with the same string every time,
but other cases you might want the replacement to contain substrings
captured by the regex.

For example, swapping pairs of words:


re.sub(r'(\w+) (\w+)', r'\2 \1', r'first second third fourth')

'second first fourth third'


Python also allows you to provide a function that returns the
replacement string, but that seems a bit long-winded for those cases
when a simple replacement template would suffice.
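
For comparison, here is the same word swap written both ways. This is a minimal sketch assuming Python 3; the helper name swap is made up for illustration and is not from the thread.

```python
import re

text = 'first second third fourth'
pattern = r'(\w+) (\w+)'

# Replacement template: \2 and \1 are expanded by the re module.
by_template = re.sub(pattern, r'\2 \1', text)

# Replacement function (hypothetical name): it receives the match object
# and builds the replacement string itself, with no escape processing.
def swap(match):
    return match.group(2) + ' ' + match.group(1)

by_function = re.sub(pattern, swap, text)

print(by_template)   # second first fourth third
print(by_function)   # second first fourth third
```

The function form costs a few extra lines, but its return value is inserted as-is.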


Re: Raw string substitution problem

2009-12-18 Thread Sion Arrowsmith
Gregory Ewing   wrote:
>MRAB wrote:
>> Regular expressions and replacement strings have their own escaping
>> mechanism, which also uses backslashes.
>This seems like a misfeature to me. It makes sense for
>a regular expression to give special meanings to backslash
>sequences, because it's a sublanguage with its own syntax.
>But I can't see any earthly reason to do that with the
>*replacement* string, which is just data.

>>> re.sub('a(.)c', r'\1', "123abcdefg")
'123bdefg'

Still think the replacement string is "just data"?

-- 
\S

   under construction



Re: Raw string substitution problem

2009-12-17 Thread Gregory Ewing

MRAB wrote:


Regular expressions and replacement strings have their own escaping
mechanism, which also uses backslashes.


This seems like a misfeature to me. It makes sense for
a regular expression to give special meanings to backslash
sequences, because it's a sublanguage with its own syntax.
But I can't see any earthly reason to do that with the
*replacement* string, which is just data.

It looks like a feature that's been blindly copied over
from Perl without thinking about whether it makes sense
in Python.

--
Greg


Re: Raw string substitution problem

2009-12-17 Thread Rhodri James
On Thu, 17 Dec 2009 20:18:12 -, Alan G Isaac   
wrote:



So is the bottom line the following?
A string replacement is not just "converted"
as described in the documentation, essentially
it is compiled?


That depends entirely on what you mean.


But that cannot quite be right.  E.g., \b will be a back
space not a word boundary.  So then the question arises
again, why isn't '\\' a backslash? Just because?
Why does it not get the "obvious" conversion?


'\\' *is* a backslash.  That string containing a single backslash is then  
processed by the re module which sees a backslash, tries to interpret it  
as an escape, fails and barfs.


"re.compile('a\\nc')" passes a sequence of four characters to re.compile:  
'a', '\', 'n' and 'c'.  re.compile() then does its own interpretation:  
'a' passes through as is, '\' flags an escape which combined with 'n'  
produces the newline character (0x0a), and 'c' passes through as is.


"re.compile('a\nc')" by contrast passes a sequence of three character to  
re.compile: 'a', 0x0a and 'c'.  re.compile() does its own interpretation,  
which happens not to change any of the characters, resulting in the same  
regular expression as before.


Your problem is that you are conflating the compile-time processing of  
string literals with the run-time processing of strings specific to re.
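
The two stages can be seen separately in a short sketch (assuming Python 3; the variable names are only for illustration). The string literal is decoded first, and re then interprets whatever characters it was handed.

```python
import re

with_escape = 'a\\nc'   # four characters: a, backslash, n, c
with_newline = 'a\nc'   # three characters: a, newline, c

print(len(with_escape), len(with_newline))   # 4 3

subject = 'xa\ncx'      # contains a real newline

# The strings differ, but re turns both into a pattern matching a, newline, c.
m1 = re.search(with_escape, subject)
m2 = re.search(with_newline, subject)
print(m1.span(), m2.span())   # (1, 4) (1, 4)
```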


--
Rhodri James *-* Wildebeeste Herder to the Masses


Re: Raw string substitution problem

2009-12-17 Thread MRAB

Alan G Isaac wrote:

On 12/17/2009 2:45 PM, MRAB wrote:

re.compile('a\\nc') _does_ compile to the same regex as
re.compile('a\nc').

However, regex objects never compare equal to each other, so, strictly
speaking, re.compile('a\nc') != re.compile('a\nc').

However, having said that, the re module contains a cache (keyed on the
string and options supplied), so the first re.compile('a\nc') will put
the regex object in the cache and the second re.compile('a\nc') will
return that same regex object from the cache. If you clear the cache in
between the two calls (do re._cache.clear()) you'll get two different
regex objects which won't compare equal even though they are to all
intents identical.



OK, this is helpful.
(I did check equality but did not understand
I got True only because re used caching.)
So is the bottom line the following?
A string replacement is not just "converted"
as described in the documentation, essentially
it is compiled?

But that cannot quite be right.  E.g., \b will be a back
space not a word boundary.  So then the question arises
again, why isn't '\\' a backslash? Just because?
Why does it not get the "obvious" conversion?


If you give the re module a string containing \b, eg. '\\b' or r'\b',
then it will compile it to a word boundary if it's in a regex string or
a backspace if it's in a replacement string. This is different from
giving the re module a string which actually contains a backspace, eg,
'\b'.

Because the re module uses backslashes for escaping, you'll need to
escape a literal backslash with a backslash in the string you give it.
But string literals also use backslashes for escaping, so you'll need to
escape each of those backslashes with a backslash.
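
A small sketch of those cases, assuming Python 3 (the sample strings are made up for illustration):

```python
import re

# In a pattern, the two characters backslash + b mean "word boundary".
print(re.sub(r'\bx\b', '-', 'x ax x'))        # - ax -

# In a replacement template, the same two characters mean "backspace".
replaced = re.sub('x', r'\b', 'axb')
print(repr(replaced))                          # 'a\x08b'

# The string literal '\b' is already a backspace before re ever sees it.
print(re.sub('x', '\b', 'axb') == replaced)    # True

# One literal backslash in the result needs two in the template,
# hence four in a non-raw string literal.
one_backslash = re.sub('x', '\\\\', 'axb')
print(one_backslash == re.sub('x', r'\\', 'axb') == 'a\\b')   # True
```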


Re: Raw string substitution problem

2009-12-17 Thread Alan G Isaac

On 12/17/2009 2:45 PM, MRAB wrote:

re.compile('a\\nc') _does_ compile to the same regex as
re.compile('a\nc').

However, regex objects never compare equal to each other, so, strictly
speaking, re.compile('a\nc') != re.compile('a\nc').

However, having said that, the re module contains a cache (keyed on the
string and options supplied), so the first re.compile('a\nc') will put
the regex object in the cache and the second re.compile('a\nc') will
return that same regex object from the cache. If you clear the cache in
between the two calls (do re._cache.clear()) you'll get two different
regex objects which won't compare equal even though they are to all
intents identical.



OK, this is helpful.
(I did check equality but did not understand
I got True only because re used caching.)
So is the bottom line the following?
A string replacement is not just "converted"
as described in the documentation, essentially
it is compiled?

But that cannot quite be right.  E.g., \b will be a back
space not a word boundary.  So then the question arises
again, why isn't '\\' a backslash? Just because?
Why does it not get the "obvious" conversion?

Thanks,
Alan Isaac


Re: Raw string substitution problem

2009-12-17 Thread MRAB

Alan G Isaac wrote:

Alan G Isaac  wrote:
  >>>  re.sub('abc', r'a\nb\n.c\a','123abcdefg') == 
re.sub('abc', 'a\\nb\\n.c\\a','123abcdefg') == re.sub('abc', 
'a\nb\n.c\a','123abcdefg')

  True
Why are the first two strings being treated as if they are the last one?
 


On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:

They aren't.  The last string is different.


Of course it is different.
That is the basis of my question.
Why is it being treated as if it is the same?
(See the end of this post.)



Alan G Isaac  wrote:

More simply, consider::

  >>>  re.sub('abc', '\\', '123abcdefg')
  Traceback (most recent call last):
File "", line 1, in
File "C:\Python26\lib\re.py", line 151, in sub
  return _compile(pattern, 0).sub(repl, string, count)
File "C:\Python26\lib\re.py", line 273, in _subx
  template = _compile_repl(template, pattern)
File "C:\Python26\lib\re.py", line 260, in _compile_repl
  raise error, v # invalid expression
  sre_constants.error: bogus escape (end of line)

Why is this the proper handling of what one might think would be an
obvious substitution?



On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:

Is this what you want?  What you have is a re expression consisting of
a single backslash that doesn't escape anything (EOL) so it barfs.

>>> re.sub('abc', r'\\', '123abcdefg')
'123\\defg'


Turning again to the documentation:
"if it is a string, any backslash escapes in it are processed.
That is, \n is converted to a single newline character, \r is
converted to a linefeed, and so forth."
So why is '\n' converted to a newline but '\\' does not become a literal
backslash?  OK, I don't do much string processing, so perhaps this is where
I am missing the point: how is the replacement being "converted"?
(As Peter's example shows, if you supply the replacement via
a function, this does not happen.) You suggest it is just a matter of
it being an re, but::

>>> re.sub('abc', 'a\\nc','1abcd') == re.sub('abc', 'a\nc','1abcd')
True
>>> re.compile('a\\nc') == re.compile('a\nc')
False

So I have two strings that are not the same, nor do they compile
equivalently, yet apparently they are "converted" to something
equivalent for the substitution. Why? Is my question clearer?


re.compile('a\\nc') _does_ compile to the same regex as
re.compile('a\nc').

However, regex objects never compare equal to each other, so, strictly
speaking, re.compile('a\nc') != re.compile('a\nc').

However, having said that, the re module contains a cache (keyed on the
string and options supplied), so the first re.compile('a\nc') will put
the regex object in the cache and the second re.compile('a\nc') will
return that same regex object from the cache. If you clear the cache in
between the two calls (do re._cache.clear()) you'll get two different
regex objects which won't compare equal even though they are to all
intents identical.
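
The caching effect can be observed with a short sketch. It assumes Python 3 on CPython, where re.purge() is the documented way to clear the cache (rather than touching re._cache directly). It checks identity with "is" rather than "==", since equality semantics for pattern objects may differ between Python versions.

```python
import re

first = re.compile('a\nc')
second = re.compile('a\nc')
print(first is second)   # True: the second call is served from the cache

re.purge()               # clear the module's compiled-pattern cache
third = re.compile('a\nc')
print(third is first)    # False: a fresh object was compiled
```

The caching itself is an implementation detail, so code should not rely on getting the same object back.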


If the answer looks too obvious to state, assume I'm missing it anyway
and please state it.  As I said, I seldom use the re module.




Re: Raw string substitution problem

2009-12-17 Thread Alan G Isaac

Alan G Isaac  wrote:

  >>>  re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 
'a\\nb\\n.c\\a','123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
  True
Why are the first two strings being treated as if they are the last one?
 


On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:

They aren't.  The last string is different.


Of course it is different.
That is the basis of my question.
Why is it being treated as if it is the same?
(See the end of this post.)



Alan G Isaac  wrote:

More simply, consider::

  >>>  re.sub('abc', '\\', '123abcdefg')
  Traceback (most recent call last):
File "", line 1, in
File "C:\Python26\lib\re.py", line 151, in sub
  return _compile(pattern, 0).sub(repl, string, count)
File "C:\Python26\lib\re.py", line 273, in _subx
  template = _compile_repl(template, pattern)
File "C:\Python26\lib\re.py", line 260, in _compile_repl
  raise error, v # invalid expression
  sre_constants.error: bogus escape (end of line)

Why is this the proper handling of what one might think would be an
obvious substitution?



On 12/17/2009 12:19 PM, D'Arcy J.M. Cain wrote:

Is this what you want?  What you have is a re expression consisting of
a single backslash that doesn't escape anything (EOL) so it barfs.

>>> re.sub('abc', r'\\', '123abcdefg')
'123\\defg'


Turning again to the documentation:
"if it is a string, any backslash escapes in it are processed.
That is, \n is converted to a single newline character, \r is
converted to a linefeed, and so forth."
So why is '\n' converted to a newline but '\\' does not become a literal
backslash?  OK, I don't do much string processing, so perhaps this is where
I am missing the point: how is the replacement being "converted"?
(As Peter's example shows, if you supply the replacement via
a function, this does not happen.) You suggest it is just a matter of
it being an re, but::

>>> re.sub('abc', 'a\\nc','1abcd') == re.sub('abc', 'a\nc','1abcd')
True
>>> re.compile('a\\nc') == re.compile('a\nc')
False

So I have two strings that are not the same, nor do they compile
equivalently, yet apparently they are "converted" to something
equivalent for the substitution. Why? Is my question clearer?

If the answer looks too obvious to state, assume I'm missing it anyway
and please state it.  As I said, I seldom use the re module.

Alan Isaac


Re: Raw string substitution problem

2009-12-17 Thread MRAB

Alan G Isaac wrote:

On 12/17/2009 11:24 AM, Richard Brodie wrote:

A raw string is not a distinct type from an ordinary string
in the same way byte strings and Unicode strings are. It
is merely a notation for constants, like writing integers
in hexadecimal.


(r'\n', u'a', 0x16)

('\\n', u'a', 22)




Yes, that was a mistake.  But the problem remains::

>>> re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 
'a\\nb\\n.c\\a',' 123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')

True
>>> r'a\nb\n.c\a' == 'a\\nb\\n.c\\a' == 'a\nb\n.c\a'
False

Why are the first two strings being treated as if they are the last one?
That is, why isn't '\\' being processed in the obvious way?
This still seems wrong.  Why isn't it?

More simply, consider::

>>> re.sub('abc', '\\', '123abcdefg')
Traceback (most recent call last):
  File "", line 1, in 
  File "C:\Python26\lib\re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Python26\lib\re.py", line 273, in _subx
template = _compile_repl(template, pattern)
  File "C:\Python26\lib\re.py", line 260, in _compile_repl
raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)

Why is this the proper handling of what one might think would be an
obvious substitution?


Regular expressions and replacement strings have their own escaping
mechanism, which also uses backslashes.

Some of these regex escape sequences are the same as those of string
literals, eg \n represents a newline; others are different, eg \b in a
regex represents a word boundary and not a backspace as in a string
literal.

You can match a newline in a regex by either using an actual newline
character ('\n' in a string literal) or an escape sequence ('\\n' or
r'\n' in a string literal). If you want a regex to match an actual
backslash followed by a letter 'n' then you need to escape the backslash
in the regex and then either use a raw string literal or escape it again
in a non-raw string literal.

Match characters: <newline>
Regex: \n
Raw string literal: r'\n'
Non-raw string literal: '\\n'

Match characters: \n
Regex: \\n
Raw string literal: r'\\n'
Non-raw string literal: '\\\\n'

Replace with characters: <newline>
Replacement: \n
Raw string literal: r'\n'
Non-raw string literal: '\\n'

Replace with characters: \n
Replacement: \\n
Raw string literal: r'\\n'
Non-raw string literal: '\\\\n'
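
The four cases above can be checked directly. This is a sketch assuming Python 3, and the sample strings are invented for the purpose:

```python
import re

newline_text = 'a\nb'        # a, newline, b
backslash_n_text = 'a\\nb'   # a, backslash, n, b

# Match a newline: the regex is \n
matches_newline = bool(re.search(r'\n', newline_text))
same_with_cooked = bool(re.search('\\n', newline_text))

# Match backslash + n: the regex is \\n
matches_backslash_n = bool(re.search(r'\\n', backslash_n_text))
same_with_four = bool(re.search('\\\\n', backslash_n_text))

# Replace with a newline: the template is \n
replaced_newline = re.sub('x', r'\n', 'axb')

# Replace with backslash + n: the template is \\n
replaced_backslash_n = re.sub('x', r'\\n', 'axb')

print(matches_newline, same_with_cooked)          # True True
print(matches_backslash_n, same_with_four)        # True True
print(replaced_newline == newline_text)           # True
print(replaced_backslash_n == backslash_n_text)   # True
```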


Re: Raw string substitution problem

2009-12-17 Thread D'Arcy J.M. Cain
On Thu, 17 Dec 2009 11:51:26 -0500
Alan G Isaac  wrote:
>  >>> re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 
> 'a\\nb\\n.c\\a',' 123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
>  True

Was this a straight cut and paste or did you make a manual change?  Is
that leading space in the middle one a copying error?  I get False for
what you actually have there for obvious reasons.

>  >>> r'a\nb\n.c\a' == 'a\\nb\\n.c\\a' == 'a\nb\n.c\a'
>  False
> 
> Why are the first two strings being treated as if they are the last one?

They aren't.  The last string is different.

>>> for x in (r'a\nb\n.c\a', 'a\\nb\\n.c\\a', 'a\nb\n.c\a'): print repr(x)
...
'a\\nb\\n.c\\a'
'a\\nb\\n.c\\a'
'a\nb\n.c\x07'

> That is, why isn't '\\' being processed in the obvious way?
> This still seems wrong.  Why isn't it?

What do you think is wrong?  What would the "obvious" way of handling
'\\' be?
> 
> More simply, consider::
> 
>  >>> re.sub('abc', '\\', '123abcdefg')
>  Traceback (most recent call last):
>File "", line 1, in 
>File "C:\Python26\lib\re.py", line 151, in sub
>  return _compile(pattern, 0).sub(repl, string, count)
>File "C:\Python26\lib\re.py", line 273, in _subx
>  template = _compile_repl(template, pattern)
>File "C:\Python26\lib\re.py", line 260, in _compile_repl
>  raise error, v # invalid expression
>  sre_constants.error: bogus escape (end of line)
> 
> Why is this the proper handling of what one might think would be an
> obvious substitution?

Is this what you want?  What you have is a re expression consisting of
a single backslash that doesn't escape anything (EOL) so it barfs.

>>> re.sub('abc', r'\\', '123abcdefg')
'123\\defg'
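
The same contrast as a sketch for Python 3, where the failure surfaces as re.error (the exact message text may differ from the 2.6 traceback above):

```python
import re

try:
    # The template is a single backslash with nothing left to escape.
    re.sub('abc', '\\', '123abcdefg')
    failed = False
except re.error as exc:
    failed = True
    print('rejected:', exc)

# Escaping the backslash in the template yields one backslash in the result.
result = re.sub('abc', r'\\', '123abcdefg')
print(result)        # 123\defg
print(len(result))   # 8
```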

-- 
D'Arcy J.M. Cain  |  Democracy is three wolves
http://www.druid.net/darcy/|  and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP)   |  what's for dinner.


Re: Raw string substitution problem

2009-12-17 Thread Alan G Isaac

On 12/17/2009 11:24 AM, Richard Brodie wrote:

A raw string is not a distinct type from an ordinary string
in the same way byte strings and Unicode strings are. It
is merely a notation for constants, like writing integers
in hexadecimal.


(r'\n', u'a', 0x16)

('\\n', u'a', 22)




Yes, that was a mistake.  But the problem remains::

>>> re.sub('abc', r'a\nb\n.c\a','123abcdefg') == re.sub('abc', 
'a\\nb\\n.c\\a',' 123abcdefg') == re.sub('abc', 'a\nb\n.c\a','123abcdefg')
True
>>> r'a\nb\n.c\a' == 'a\\nb\\n.c\\a' == 'a\nb\n.c\a'
False

Why are the first two strings being treated as if they are the last one?
That is, why isn't '\\' being processed in the obvious way?
This still seems wrong.  Why isn't it?

More simply, consider::

>>> re.sub('abc', '\\', '123abcdefg')
Traceback (most recent call last):
  File "", line 1, in 
  File "C:\Python26\lib\re.py", line 151, in sub
return _compile(pattern, 0).sub(repl, string, count)
  File "C:\Python26\lib\re.py", line 273, in _subx
template = _compile_repl(template, pattern)
  File "C:\Python26\lib\re.py", line 260, in _compile_repl
raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)

Why is this the proper handling of what one might think would be an
obvious substitution?

Thanks,
Alan Isaac
 


Re: Raw string substitution problem

2009-12-17 Thread Richard Brodie

"Alan G Isaac"  wrote in message 
news:qemdnrut0jvj1lfwnz2dnuvz_vqdn...@rcn.net...

> Naturally enough.  So I think the right answer is:
>
> 1. this is a documentation bug (i.e., the documentation
>fails to specify unexpected behavior for raw strings), or
> 2. this is a bug (i.e., raw strings are not handled correctly
>when used as replacements)

 There is no raw string. 

A raw string is not a distinct type from an ordinary string
in the same way byte strings and Unicode strings are. It
is merely a notation for constants, like writing integers
in hexadecimal.

>>> (r'\n', u'a', 0x16)
('\\n', u'a', 22)
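
Put another way, as a sketch for Python 3: once the literal has been read, nothing remains that marks the value as "raw".

```python
raw = r'\n'      # written with the raw-string notation
cooked = '\\n'   # written with an escaped backslash

print(raw == cooked)               # True: same two characters
print(type(raw) is type(cooked))   # True: both are plain str objects
print(len(raw))                    # 2
print(0x16 == 22)                  # True: hex is likewise only a notation
```

So re.sub cannot tell which notation was used for its replacement argument.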







Re: Raw string substitution problem

2009-12-17 Thread Alan G Isaac

En Wed, 16 Dec 2009 11:09:32 -0300, Ed Keith  escribió:


I am having a problem when substituting a raw string. When I do the
following:

re.sub('abc', r'a\nb\nc', '123abcdefg')

I get

"""
123a
b
cdefg
"""

what I want is

r'123a\nb\ncdefg'


 
On 12/16/2009 9:35 AM, Gabriel Genellina wrote:

 From http://docs.python.org/library/re.html#re.sub

re.sub(pattern, repl, string[, count])

...repl can be a string or a function; if
it is a string, any backslash escapes in
it are processed. That is, \n is converted
to a single newline character, \r is
converted to a linefeed, and so forth.

So you'll have to double your backslashes:




I'm not persuaded that the docs are clear.  Consider:

>>> 'ab\\ncd' == r'ab\ncd'
True

Naturally enough.  So I think the right answer is:

1. this is a documentation bug (i.e., the documentation
   fails to specify unexpected behavior for raw strings), or
2. this is a bug (i.e., raw strings are not handled correctly
   when used as replacements)

I vote for 2.

Peter's use of a function highlights just how odd this is:
getting the raw string via a function produces a different
result than providing it directly.  If this is really the
way things ought to be, I'd appreciate a clear explanation
of why.

Alan Isaac



Re: Raw string substitution problem

2009-12-16 Thread Ed Keith
--- On Wed, 12/16/09, Peter Otten <__pete...@web.de> wrote:

> Another possibility:
> 
> >>> print re.sub('abc', lambda m: r'a\nb\n.c\a',
> '123abcdefg')
> 123a\nb\n.c\adefg

I'm not sure whether that is clever, ugly, or just plain strange! 

I think I'll stick with:

>>> m = re.match('^(.*)abc(.*)$', '123abcdefg')
>>> print m.group(1) + r'a\nb\n.c\a' + m.group(2)
123a\nb\n.c\adefg

It's much less likely to fry the poor maintenance programmer's mind.
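
A sketch of that splice approach without hard-coding the surrounding groups, assuming Python 3. The name splice_first is hypothetical. Unlike re.sub it replaces only the first match.

```python
import re

def splice_first(pattern, replacement, subject):
    """Replace the first match of pattern with replacement, taken literally."""
    match = re.search(pattern, subject)
    if match is None:
        return subject   # nothing to replace
    # Slice around the match so the replacement never passes through re.
    return subject[:match.start()] + replacement + subject[match.end():]

print(splice_first('abc', r'a\nb\n.c\a', '123abcdefg'))   # 123a\nb\n.c\adefg
```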

-EdK

Ed Keith
e_...@yahoo.com

Blog: edkeith.blogspot.com



  


Re: Raw string substitution problem

2009-12-16 Thread Peter Otten
Gabriel Genellina wrote:

> En Wed, 16 Dec 2009 14:51:08 -0300, Peter Otten <__pete...@web.de>
> escribió:
> 
>> Ed Keith wrote:
>>
>>> --- On Wed, 12/16/09, Gabriel Genellina  wrote:
>>>
>>>> Ed Keith
>>>> escribió:
>>>>
>>>> > I am having a problem when substituting a raw string.
>>>> When I do the following:
>>>> >
>>>> > re.sub('abc', r'a\nb\nc', '123abcdefg')
>>>> >
>>>> > I get
>>>> >
>>>> > """
>>>> > 123a
>>>> > b
>>>> > cdefg
>>>> > """
>>>> >
>>>> > what I want is
>>>> >
>>>> > r'123a\nb\ncdefg'
>>>>
>>>> So you'll have to double your backslashes:
>>>>
>>>> py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
>>>> '123a\\nb\\ncdefg'

>>> That is going to be a nontrivial exercise. I have control over the
>>> pattern, but the texts to be substituted and substituted into will be
>>> read
>>> from user supplied files. I need to reproduce the exact text that is read
>>> from the file.
>>
>> There is a helper function re.escape() that you can use to sanitize the
>> substitution:
>>
>> >>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
>> 123a\nb\ncdefg
> 
> Unfortunately re.escape does much more than that:
> 
> py> print re.sub('abc', re.escape(r'a.b.c'), '123abcdefg')
> 123a\.b\.cdefg

Sorry, I didn't think of that.
 
> I think the string_escape encoding is what the OP needs:
> 
> py> print re.sub('abc', r'a\n(b.c)\nd'.encode("string_escape"),
> '123abcdefg')
> 123a\n(b.c)\nddefg

Another possibility:

>>> print re.sub('abc', lambda m: r'a\nb\n.c\a', '123abcdefg')
123a\nb\n.c\adefg
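
The lambda can be wrapped in a small helper so the intent is visible at the call site. This is a sketch assuming Python 3, and sub_literal is a made-up name, not part of the re module.

```python
import re

def sub_literal(pattern, replacement, subject, count=0):
    """Like re.sub, but the replacement text is inserted verbatim."""
    # A function's return value is not scanned for backslash escapes.
    return re.sub(pattern, lambda match: replacement, subject, count=count)

print(sub_literal('abc', r'a\nb\n.c\a', '123abcdefg'))   # 123a\nb\n.c\adefg
print(sub_literal('b', r'\1', 'abcabc'))                  # a\1ca\1c
```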

Peter


Re: Raw string substitution problem

2009-12-16 Thread Gabriel Genellina
En Wed, 16 Dec 2009 14:51:08 -0300, Peter Otten <__pete...@web.de>  
escribió:



Ed Keith wrote:


--- On Wed, 12/16/09, Gabriel Genellina  wrote:


Ed Keith 
escribió:

> I am having a problem when substituting a raw string.
When I do the following:
>
> re.sub('abc', r'a\nb\nc', '123abcdefg')
>
> I get
>
> """
> 123a
> b
> cdefg
> """
>
> what I want is
>
> r'123a\nb\ncdefg'

So you'll have to double your backslashes:

py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
'123a\\nb\\ncdefg'


That is going to be a nontrivial exercise. I have control over the
pattern, but the texts to be substituted and substituted into will be  
read

from user supplied files. I need to reproduce the exact text that is read
from the file.


There is a helper function re.escape() that you can use to sanitize the
substitution:


print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')

123a\nb\ncdefg


Unfortunately re.escape does much more than that:

py> print re.sub('abc', re.escape(r'a.b.c'), '123abcdefg')
123a\.b\.cdefg

I think the string_escape encoding is what the OP needs:

py> print re.sub('abc', r'a\n(b.c)\nd'.encode("string_escape"),  
'123abcdefg')

123a\n(b.c)\nddefg
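
The string_escape codec is specific to Python 2. A rough Python 3 equivalent for this purpose is to double the backslashes by hand, since the backslash is the only character with special meaning in a replacement template. A minimal sketch, with escape_template as a hypothetical helper name:

```python
import re

def escape_template(text):
    """Double every backslash so re treats the text as a literal replacement."""
    return text.replace('\\', '\\\\')

replacement = r'a\n(b.c)\nd'
result = re.sub('abc', escape_template(replacement), '123abcdefg')
print(result)   # 123a\n(b.c)\nddefg
```

Unlike re.escape, this leaves '.', '(' and ')' alone, so nothing extra shows up in the output.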

--
Gabriel Genellina



Re: Raw string substitution problem

2009-12-16 Thread Peter Otten
Ed Keith wrote:

> --- On Wed, 12/16/09, Gabriel Genellina  wrote:
> 
>> From: Gabriel Genellina 
>> Subject: Re: Raw string substitution problem
>> To: python-list@python.org
>> Date: Wednesday, December 16, 2009, 9:35 AM
>> En Wed, 16 Dec 2009 11:09:32 -0300,
>> Ed Keith 
>> escribió:
>> 
>> > I am having a problem when substituting a raw string.
>> When I do the following:
>> > 
>> > re.sub('abc', r'a\nb\nc', '123abcdefg')
>> > 
>> > I get
>> > 
>> > """
>> > 123a
>> > b
>> > cdefg
>> > """
>> > 
>> > what I want is
>> > 
>> > r'123a\nb\ncdefg'
>> 
>> From http://docs.python.org/library/re.html#re.sub
>> 
>> re.sub(pattern, repl, string[, count])
>> 
>> ...repl can be a string or a function;
>> if
>> it is a string, any backslash escapes
>> in
>> it are processed. That is, \n is
>> converted
>> to a single newline character, \r is
>> converted to a linefeed, and so forth.
>> 
>> So you'll have to double your backslashes:
>> 
>> py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
>> '123a\\nb\\ncdefg'
>> 
>> --Gabriel Genellina
>> 
>> --http://mail.python.org/mailman/listinfo/python-list
>> 
> 
> That is going to be a nontrivial exercise. I have control over the
> pattern, but the texts to be substituted and substituted into will be read
> from user supplied files. I need to reproduce the exact text that is read
> from the file.

There is a helper function re.escape() that you can use to sanitize the  
substitution:

>>> print re.sub('abc', re.escape(r'a\nb\nc'), '123abcdefg')
123a\nb\ncdefg

Peter


Re: Raw string substitution problem

2009-12-16 Thread Ed Keith
--- On Wed, 12/16/09, Gabriel Genellina  wrote:

> From: Gabriel Genellina 
> Subject: Re: Raw string substitution problem
> To: python-list@python.org
> Date: Wednesday, December 16, 2009, 9:35 AM
> En Wed, 16 Dec 2009 11:09:32 -0300,
> Ed Keith 
> escribió:
> 
> > I am having a problem when substituting a raw string.
> When I do the following:
> > 
> > re.sub('abc', r'a\nb\nc', '123abcdefg')
> > 
> > I get
> > 
> > """
> > 123a
> > b
> > cdefg
> > """
> > 
> > what I want is
> > 
> > r'123a\nb\ncdefg'
> 
> From http://docs.python.org/library/re.html#re.sub
> 
>     re.sub(pattern, repl, string[, count])
> 
>     ...repl can be a string or a function;
> if
>     it is a string, any backslash escapes
> in
>     it are processed. That is, \n is
> converted
>     to a single newline character, \r is
>     converted to a linefeed, and so forth.
> 
> So you'll have to double your backslashes:
> 
> py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
> '123a\\nb\\ncdefg'
> 
> --Gabriel Genellina
> 
> --http://mail.python.org/mailman/listinfo/python-list
> 

That is going to be a nontrivial exercise. I have control over the pattern, but 
the texts to be substituted and substituted into will be read from user-supplied 
files. I need to reproduce the exact text that is read from the file. 

Maybe what I should do is use re to break the string into two pieces, the part 
before the pattern to be replaced and the part after it, then splice the 
replacement text in between them. Seems like doing it the hard way, but it 
should work. 

Thanks,

   -EdK



  


Re: Raw string substitution problem

2009-12-16 Thread Chris Hulan
On Dec 16, 9:09 am, Ed Keith  wrote:
> I am having a problem when substituting a raw string. When I do the following:
>
> re.sub('abc', r'a\nb\nc', '123abcdefg')
>
> I get
>
> """
> 123a
> b
> cdefg
> """
>
> what I want is
>
> r'123a\nb\ncdefg'
>
> How do I get what I want?
>
> Thanks,
>
>     -EdK
>
> Ed Keith
> e_...@yahoo.com
>
> Blog: edkeith.blogspot.com

Looks like raw strings let you avoid having to escape backslashes when
specifying the literal, but don't preserve them during operations.
Changing your replacement string to r'a\\nb\\nc' seems to give the
desired output.

cheers


Re: Raw string substitution problem

2009-12-16 Thread Gabriel Genellina

En Wed, 16 Dec 2009 11:09:32 -0300, Ed Keith  escribió:

I am having a problem when substituting a raw string. When I do the  
following:


re.sub('abc', r'a\nb\nc', '123abcdefg')

I get

"""
123a
b
cdefg
"""

what I want is

r'123a\nb\ncdefg'


From http://docs.python.org/library/re.html#re.sub

re.sub(pattern, repl, string[, count])

...repl can be a string or a function; if
it is a string, any backslash escapes in
it are processed. That is, \n is converted
to a single newline character, \r is
converted to a linefeed, and so forth.

So you'll have to double your backslashes:

py> re.sub('abc', r'a\\nb\\nc', '123abcdefg')
'123a\\nb\\ncdefg'

--
Gabriel Genellina



Raw string substitution problem

2009-12-16 Thread Ed Keith
I am having a problem when substituting a raw string. When I do the following:

re.sub('abc', r'a\nb\nc', '123abcdefg')

I get

"""
123a
b
cdefg
"""

what I want is 

r'123a\nb\ncdefg'

How do I get what I want?

Thanks,

-EdK

Ed Keith
e_...@yahoo.com

Blog: edkeith.blogspot.com


  