On Apr 18, 11:08 pm, Steven Bethard <[EMAIL PROTECTED]> wrote:
> EMC ROY wrote:
> > Original Sentence: An apple for you.
> > Present: An<AT0> apple<NN1> for<PRP> you<PNP> .<.>
> > Desire: <AT0>An <NN1>apple <PRP>for <PNP>you <.>.
>
> >>> text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
> >>> import re
> >>> re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)
> '<AT0>An <NN1>apple <PRP>for <PNP>you <.>.'

If you end up calling re.sub() repeatedly, e.g. for each line in your file, then you should "compile" the regular expression so that Python doesn't have to recompile it for every call:

    import re

    text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
    myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
    re.sub(myR, r'\2\1\3', text)

Unfortunately, I must be doing something wrong because I can't get that code to work. When I run it, I get the error:

    Traceback (most recent call last):
      File "2pythontest.py", line 3, in ?
        myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
      File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 180, in compile
        return _compile(pattern, flags)
      File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre.py", line 225, in _compile
        p = sre_compile.compile(pattern, flags)
      File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_compile.py", line 496, in compile
        p = sre_parse.parse(p, flags)
      File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 668, in parse
        p = _parse_sub(source, pattern, 0)
      File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 308, in _parse_sub
        itemsappend(_parse(source, state))
      File "/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/sre_parse.py", line 396, in _parse
        if state.flags & SRE_FLAG_VERBOSE:
    TypeError: unsupported operand type(s) for &: 'str' and 'int'

Yet, these two examples work without error:

------
import re

text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'
#myR = re.compile(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3')
print re.sub(r'(\S+)(<[^>]+>)(\s*)', r'\2\1\3', text)

myR = re.compile(r'(hello)')
text = "hello world"
print re.sub(myR, r"\1XXX", text)
---------output:
<AT0>An <NN1>apple <PRP>for <PNP>you <.>.
helloXXX world

Can anyone help?
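
A likely explanation, going by the documented signature re.compile(pattern, flags=0): the second argument to re.compile() is treated as flags, not as a replacement string. Passing r'\2\1\3' there would put a str into the `state.flags & SRE_FLAG_VERBOSE` test, which is consistent with the TypeError in the traceback. A minimal sketch of the compiled version follows; the names tag_re and move_tags are illustrative and do not come from the original post:

```python
import re

text = 'An<AT0> apple<NN1> for<PRP> you<PNP> .<.>'

# Compile only the pattern. re.compile() takes (pattern, flags),
# so the replacement string must not be passed here.
tag_re = re.compile(r'(\S+)(<[^>]+>)(\s*)')


def move_tags(line):
    """Move each <TAG> from after its word to before it."""
    # The replacement string is supplied at substitution time.
    return tag_re.sub(r'\2\1\3', line)


result = move_tags(text)
print(result)
```

Calling re.sub(tag_re, r'\2\1\3', text) should behave the same way, since re.sub() accepts a compiled pattern as its first argument; that would also explain why the "hello" example runs without error.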