George Sakkis wrote:
On Nov 21, 4:46 pm, harijay <[EMAIL PROTECTED]> wrote:
Hi
I am a few months new into python. I have used regexps before in perl
and java but am a little confused with this problem.
I want to parse a number of strings and extract only those that
contain a 4-digit number anywhere inside the string.
However, the regexp
p = re.compile(r'\d{4}')
matches even sentences that contain numbers longer than 4 digits;
for example, it matches "I have 3324234 and more".
I am very confused. Shouldn't \d{4} match only four-digit
numbers, so that a sentence with a 5-digit number is not matched?
No, why should it? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:
p = re.compile(r'''
(?:\D|\b) # find a non-digit or word boundary..
(\d{4}) # .. followed by the 4 digits to be matched as group #1..
(?:\D|\b) # .. which are followed by non-digit or word boundary
''', re.VERBOSE)
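A quick check of the difference between the bare pattern and the bounded one may help. This is only a sketch: the sample strings and the names naive, bounded, and results are made up for illustration.

```python
import re

# The original pattern: any 4 consecutive digits, wherever they occur.
naive = re.compile(r'\d{4}')

# The bounded pattern from the reply above.
bounded = re.compile(r'''
    (?:\D|\b)   # a non-digit or a word boundary
    (\d{4})     # exactly four digits, captured as group 1
    (?:\D|\b)   # a non-digit or a word boundary
''', re.VERBOSE)

# Hypothetical sample strings, chosen only to show the behaviour.
samples = [
    "I have 3324234 and more",
    "I have 3324 and more",
    "year2008!",
]

# One row per sample: (text, naive matches?, bounded matches?)
results = [(s, bool(naive.search(s)), bool(bounded.search(s))) for s in samples]
for row in results:
    print(row)
```

The bare pattern matches all three strings because 3324 is a valid run of four digits inside 3324234; the bounded pattern rejects the first string and accepts the other two.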
You want to match a sequence of 4 digits: \d{4}
not preceded by a digit: (?<!\d)
not followed by a digit: (?!\d)
which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
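A minimal sketch of using this pattern to keep only the strings that contain a standalone 4-digit number. The sample strings and the helper name has_four_digit_number are assumptions made up for illustration, not part of the original question.

```python
import re

# Four digits, not preceded and not followed by another digit.
four_digits = re.compile(r'(?<!\d)\d{4}(?!\d)')

def has_four_digit_number(text):
    """Return True if text contains a run of exactly four digits."""
    return four_digits.search(text) is not None

# Hypothetical sample strings.
samples = [
    "I have 3324234 and more",
    "I have 3324 and more",
    "code1234a5678",
]

# Keep only the strings that contain a standalone 4-digit number.
kept = [s for s in samples if has_four_digit_number(s)]

# Extract every standalone 4-digit number from one string.
found = four_digits.findall("code1234a5678")

print(kept)
print(found)
```

Because the lookbehind and lookahead are zero-width, they do not consume the neighbouring characters, so both numbers in "code1234a5678" are found even though only a single letter separates them. A pattern that consumes the separator as part of the match would find only the first.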