7stud wrote:
> On Apr 18, 11:08 pm, Steven Bethard <[EMAIL PROTECTED]> wrote:
>> EMC ROY wrote:
>> > Original Sentence: An apple for you.
>> > Present: An<AT0> apple<NN1> for<PRP> you<PNP> .<.>
>> > Desire: <AT0>An <NN1>apple <PRP>for <PNP>you <.>.
>>
>> >>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
>> >>> import re
>> >>> re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
>> '<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'
>
> If you end up calling re.sub() repeatedly, e.g. for each line in your
> file, then you should "compile" the regular expression so that python
> doesn't have to recompile it for every call:
>
> import re
>
> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')

re.compile() doesn't accept a replacement pattern:

"""
Help on function compile in module re:

compile(pattern, flags=0)
    Compile a regular expression pattern, returning a pattern object.
"""

> re.sub(myR, r'\2\1\3', text)
>
>
> Unfortunately, I must be doing something wrong because I can't get
> that code to work. When I run it, I get the error:
>
> Traceback (most recent call last):
>   File "2pythontest.py", line 3, in ?
>     myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 180, in compile
>     return _compile(pattern, flags)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 225, in _compile
>     p = sre_compile.compile(pattern, flags)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_compile.py", line 496, in compile
>     p = sre_parse.parse(p, flags)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 668, in parse
>     p = _parse_sub(source, pattern, 0)
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 308, in _parse_sub
>     itemsappend(_parse(source, state))
>   File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 396, in _parse
>     if state.flags & SRE_FLAG_VERBOSE:
> TypeError: unsupported operand type(s) for &: 'str' and 'int'
>
>
> Yet, these two examples work without error:
>
> ------
> import re
>
> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> #myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
> print re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
>
> myR = re.compile(r'(hello)')
> text = "hello world"
> print re.sub(myR, r"\1XXX", text)
>
> ---------output:
> <AT0>An <NN1>apple <PRP>for <PNP>you <.>.
> helloXXX world
>
>
> Can anyone help?
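
The TypeError in the quoted traceback follows from the compile(pattern, flags=0) signature: the second positional argument is read as flags, so the replacement string ends up in a bitwise & against an integer flag. A minimal sketch of that, written for Python 3 (the thread itself uses Python 2.4, and the exact error message may differ between versions); the names pattern, replacement, compile_error and tagged are only illustrative:

```python
import re

pattern = r'(\S+)(<[^>]+>)(\s*)'
replacement = r'\2\1\3'

# Mistake from the quoted post: the replacement string lands in the
# `flags` parameter of re.compile().
try:
    re.compile(pattern, replacement)
    compile_error = None
except TypeError as exc:
    compile_error = exc

print(type(compile_error).__name__)

# The second argument is for flags such as re.IGNORECASE (used here purely
# as an illustration); the replacement is passed to sub() instead.
tagged = re.compile(pattern, re.IGNORECASE)
print(tagged.sub(replacement, 'An<AT0> apple<NN1>'))
```

The second example in the quoted post works for the same reason: re.compile(r'(hello)') is given only a pattern, and the replacement goes to re.sub().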

You can precompile the regular expression like this:

>>> import re
>>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
>>> r = re.compile(r'(\S+)(<[^>]+>)(\s*)')
>>> r.sub(r'\2\1\3', text)
'<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'

or even

>>> sub = re.compile(r'(\S+)(<[^>]+>)(\s*)').sub
>>> sub(r'\2\1\3', text)
'<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'

Note that the gain in efficiency is smaller than you might expect, since
re.sub() and the other module-level re functions look up already compiled
regexps in a cache.

Peter
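
For the per-line use case raised in the quoted post, the precompiled pattern can be wrapped in a small helper. This is only a sketch in Python 3 syntax; TAG_AFTER_WORD, move_tags_before_words and the sample lines list are hypothetical names and data, not part of the thread:

```python
import re

# Compile once; the replacement template is given to sub(), not compile().
TAG_AFTER_WORD = re.compile(r'(\S+)(<[^>]+>)(\s*)')


def move_tags_before_words(line):
    """Rewrite each 'word<TAG>' as '<TAG>word', keeping trailing whitespace."""
    return TAG_AFTER_WORD.sub(r'\2\1\3', line)


# Hypothetical input; in practice these would be the lines of the file.
lines = ['An<AT0> apple<NN1> for<PRP> you<PNP> .<.>']
for line in lines:
    print(move_tags_before_words(line))
```

Because the module-level functions cache compiled patterns, the main benefit here is likely readability (a named pattern and one place to change it) rather than a large speedup.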