On Nov 21, 4:46 pm, harijay <[EMAIL PROTECTED]> wrote:
> Hi
> I am a few months new to Python. I have used regexps before in Perl
> and Java but am a little confused by this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4-digit number anywhere inside the string.
>
> However, the regexp
> p = re.compile(r'\d{4}')
>
> matches even sentences that have numbers longer than 4 digits inside
> them. For example, it matches "I have 3324234 and more".
>
> I am very confused. Shouldn't \d{4} match exactly four-digit
> numbers, so that a sentence with a 5-digit number is not matched?

No, why should it? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:

    import re

    p = re.compile(r'''
        (?:\D|\b)   # find a non-digit or word boundary..
        (\d{4})     # .. followed by the 4 digits to be matched as group #1..
        (?:\D|\b)   # .. which are followed by a non-digit or word boundary
        ''', re.VERBOSE)

HTH,
George

--
http://mail.python.org/mailman/listinfo/python-list
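
To see the difference in practice, here is a minimal sketch that checks the delimited pattern from the reply against the sample sentence. The second pattern, q, and the helper has_four_digit_number are illustrative assumptions rather than anything from the thread: q swaps in lookaround assertions, which is a different technique that tests the neighbouring characters without consuming them.

```python
import re

# Pattern from the reply: four digits delimited on each side by a
# non-digit or a word boundary. The digits are captured as group 1.
p = re.compile(r'''
    (?:\D|\b)   # a non-digit or word boundary..
    (\d{4})     # .. followed by the 4 digits (group 1)..
    (?:\D|\b)   # .. followed by a non-digit or word boundary
    ''', re.VERBOSE)

# Assumed alternative (not from the thread): negative lookbehind and
# lookahead reject a digit on either side without consuming anything.
q = re.compile(r'(?<!\d)\d{4}(?!\d)')


def has_four_digit_number(text, pattern=q):
    """Return True if text contains a run of exactly four digits.

    Hypothetical helper for illustration only.
    """
    return pattern.search(text) is not None


if __name__ == '__main__':
    for s in ("I have 3324234 and more", "I have 3324 and more"):
        print(repr(s), has_four_digit_number(s, p), has_four_digit_number(s, q))
```

One design difference worth knowing: because p consumes the delimiter characters, two 4-digit runs separated by a single letter (such as "1234x5678") yield only the first run with p.findall, whereas q.findall reports both.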