Thanks, John Machin and Mark Tolonen. So I guess the correct approach is to use the word-boundary metacharacter "\b",
so r'\b\d{4}\b' is what I need, since it matches a 4-digit number between word boundaries.

Thanks a tonne. This is only my second post to comp.lang.python, and I am always amazed at how helpful everyone on this group is.

Hari

On Nov 21, 5:12 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Nov 22, 8:46 am, harijay <[EMAIL PROTECTED]> wrote:
>
> > Hi
> > I am a few months new into python. I have used regexps before in perl
> > and java but am a little confused with this problem.
>
> > I want to parse a number of strings and extract only those that
> > contain a 4 digit number anywhere inside a string
>
> > However the regexp
> > p = re.compile(r'\d{4}')
>
> > Matches even sentences that have longer than 4 numbers inside
> > strings ..for example it matches "I have 3324234 and more"
>
> No it doesn't. When used with re.search on that string it matches
> 3324, it doesn't "match" the whole sentence.
>
> > I am very confused. Shouldnt the \d{4,} match exactly four digit
> > numbers so a 5 digit number sentence should not be matched .
>
> {4} does NOT mean the same as {4,}.
> {4} is the same as {4,4}
> {4,} means {4,INFINITY}
>
> Ignoring {4,}:
>
> You need to specify a regex that says "4 digits followed by (non-digit
> or end-of-string)". Have a try at that and come back here if you have
> any more problems.
>
> some test data:
> xxx1234
> xxx12345
> xxx1234xxx
> xxx12345xxx
> xxx1234xxx1235xxx
> xxx12345xxx1234xxx
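
For anyone finding this thread later, here is a minimal sketch comparing r'\b\d{4}\b' against the test data quoted above. One caveat: letters and digits are both "word" characters, so \b does not fire between "xxx" and "1234", and the word-boundary pattern finds nothing in strings like xxx1234xxx. The lookaround pattern below is an alternative I am assuming fits the "4 digits not next to another digit" description; it is not something proposed in the thread. The names WORD_BOUNDARY, NO_ADJACENT_DIGITS, SAMPLES and four_digit_runs are made up for illustration.

```python
import re

# Pattern settled on in this post: four digits between word boundaries.
WORD_BOUNDARY = re.compile(r'\b\d{4}\b')

# Assumed alternative (not from the thread): four digits with no digit
# immediately before or after, using lookbehind/lookahead assertions.
NO_ADJACENT_DIGITS = re.compile(r'(?<!\d)\d{4}(?!\d)')

# Test strings quoted in the thread, plus the original example sentence.
SAMPLES = [
    "xxx1234",
    "xxx12345",
    "xxx1234xxx",
    "xxx12345xxx",
    "xxx1234xxx1235xxx",
    "xxx12345xxx1234xxx",
    "I have 3324234 and more",
    "I have 1234 and more",
]


def four_digit_runs(pattern, text):
    """Return every non-overlapping match of `pattern` in `text`."""
    return pattern.findall(text)


if __name__ == "__main__":
    for sample in SAMPLES:
        print(
            repr(sample),
            "word-boundary:", four_digit_runs(WORD_BOUNDARY, sample),
            "lookaround:", four_digit_runs(NO_ADJACENT_DIGITS, sample),
        )
```

If the numbers are always set off by spaces or punctuation, as in "I have 1234 and more", the two patterns agree and \b is the simpler choice. They only differ when a letter or underscore touches the digits.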