Sania writes:
> So I am trying to get the number of casualties in a text. After 'death
> toll' in the text the number I need is presented as you can see from
> the variable called text. Here is my code
> I'm pretty sure my regex is correct, I think it's the group part
> that's the problem.
> I am using nltk by python. Group grabs the string in parenthesis and
> stores it in deadnum and I make deadnum into a list.
>
> text="accounts put the death toll at 637 and those missing at 653 , but the total number is likely to be much bigger"
> dead=re.match(r".*death toll.*(\d[,\d\.]*)", text)
> deadnum=dead.group(1)
> deaths.append(deadnum)
> print deaths
It's the regexp. The .* after "death toll" eats the input as far as it can without making the whole match fail, so the group matches only the last digit in the text (the final 3 of 653). You could allow only non-digits before the number. Or you could look up the variant of * that only matches as much as it must.
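A minimal sketch of both suggestions, written for Python 3 (so print() is a function, unlike the print statement in the quoted code). It uses only the standard re module; \D is "any non-digit" and *? is the non-greedy form of *. The variable names greedy, non_digits and lazy are just for illustration.

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# Original pattern: the greedy .* after "death toll" consumes the rest of the
# string, then backtracks only until (\d[,\d\.]*) can match, which first
# happens at the last digit in the text.
greedy = re.match(r".*death toll.*(\d[,\d\.]*)", text)
print(greedy.group(1))  # 3

# Fix 1: allow only non-digits between "death toll" and the number.
non_digits = re.match(r".*death toll\D*(\d[,\d\.]*)", text)
print(non_digits.group(1))  # 637

# Fix 2: the non-greedy .*? matches as little as it must, so the group
# starts at the first digit after "death toll".
lazy = re.match(r".*death toll.*?(\d[,\d\.]*)", text)
print(lazy.group(1))  # 637
```

Either fix works here. The \D* version is stricter about what may sit between the phrase and the number, while .*? will also skip over stray digits if some other number happens to come first.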