Re: text processing problem

Matt Thu, 07 Apr 2005 18:05:05 -0700

Maurice LING wrote:
> Matt wrote:
> >
> >
> > Try this:
> > import re
> > my_expr = re.compile(r'(\w+) (\(\1\))')
> > s = "this is (is) a test"
> > print my_expr.sub(r'\1', s)
> > #prints 'this is a test'
> >
> > M@
> >
>
> Thank you Matt. It works out well. The only thing that gives it a
> problem is cases such as "there  (there)", where there is more than
> one whitespace character between the word and the same bracketed
> word...
>
> Cheers
> Maurice


Maurice,
I'd HIGHLY suggest purchasing the excellent "Mastering Regular
Expressions" by Jeff Friedl
(http://www.oreilly.com/catalog/regex2/index.html).  Although it's
mostly geared towards Perl, it will answer all your questions about
regular expressions.  If you're going to work with regexes, this is a
must-have.

That being said, here's what the new regular expression should be with
a bit of instruction (in the spirit of teaching someone to fish after
giving them a fish ;-)   )

my_expr = re.compile(r'(\w+)\s*(\(\1\))')

Note the "\s*" in place of the single space " ".  The "\s" means "any
whitespace character" (equivalent to [ \t\n\r\f\v]).  The "*" following
it means "0 or more occurrences".  So this will now match:

"there  (there)"
"there (there)"
"there(there)"
"there                                          (there)"
"there\t(there)" (tab)
"there\t\t\t\t\t\t\t\t\t\t\t\t(there)"
etc.
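
If you want to check this yourself, here is a small sketch that runs
the new expression over a few of the strings above.  The "samples" and
"results" names are just mine for illustration, and it assumes Python 3
(print as a function) rather than the print statement used earlier in
the thread.

```python
import re

# Same pattern as above: a word, any amount of whitespace (including
# none), then the same word again inside parentheses.
my_expr = re.compile(r'(\w+)\s*(\(\1\))')

# A few of the cases listed above (illustrative sample list).
samples = [
    "there  (there)",
    "there (there)",
    "there(there)",
    "there\t(there)",
    "there\t\t\t(there)",
]

# Replace each "word (word)" with just "word".
results = [my_expr.sub(r'\1', s) for s in samples]

for s, r in zip(samples, results):
    print(repr(s), '->', repr(r))
```

Each sample should come back as just 'there', and the original
"this is (is) a test" still becomes "this is a test".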

Hope that's helpful.  Pick up the book!

M@

-- 
http://mail.python.org/mailman/listinfo/python-list
