Sania wrote:

> Hi,
> So I am trying to get the number of casualties in a text. After 'death
> toll' in the text the number I need is presented as you can see from
> the variable called text. Here is my code
> I'm pretty sure my regex is correct, I think it's the group part
> that's the problem.

No. A regex like ".*(\d+)" is "greedy": the ".*" matches as much as possible.

>>> re.match(r".*(\d+)", "alpha 123 beta 456 gamma").group(1)
'6'

You want to find the first number, so you need the non-greedy form ".*?":

>>> re.match(r".*?(\d+)", "alpha 123 beta 456 gamma").group(1)
'123'

> I am using nltk by python. Group grabs the string in parenthesis and
> stores it in deadnum and I make deadnum into a list.
>
> text="accounts put the death toll at 637 and those missing at
> 653 , but the total number is likely to be much bigger"
> dead=re.match(r".*death toll.*(\d[,\d\.]*)", text)
> deadnum=dead.group(1)
> deaths.append(deadnum)
> print deaths
>
> Any help would be appreciated,
> Thank you,
> Sania

-- 
http://mail.python.org/mailman/listinfo/python-list
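
Applied to the text from the question, the same non-greedy fix yields the death toll instead of a single trailing digit. The following is a minimal sketch in Python 3 (the question used Python 2's `print` statement). It assumes `deaths` starts as an empty list, since the question does not show how it is initialised. The `if dead:` guard is also an addition: `re.match` returns `None` when nothing matches, and calling `.group()` on `None` raises `AttributeError`.

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

deaths = []  # assumption: the question does not show how deaths is initialised

# Greedy ".*" after "death toll" runs on to the last digit in the string,
# so only the final "3" of "653" is left for the group.
greedy = re.match(r".*death toll.*(\d[,\d\.]*)", text)
# greedy.group(1) == "3"

# Non-greedy ".*?" stops at the first digit after "death toll".
dead = re.match(r".*death toll.*?(\d[,\d\.]*)", text)
if dead:  # re.match returns None when the pattern does not match
    deaths.append(dead.group(1))

print(deaths)  # prints ['637']
```

The only change to the original pattern is the "?" after the second ".*". The first ".*" can stay greedy because "death toll" occurs only once in this text.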