On Apr 19, 9:52 am, Jon Clements <jon...@googlemail.com> wrote:
> On Thursday, 19 April 2012 07:11:54 UTC+1, Sania wrote:
> > Hi,
> > So I am trying to get the number of casualties in a text. After 'death
> > toll' in the text the number I need is presented as you can see from
> > the variable called text. Here is my code
> > I'm pretty sure my regex is correct, I think it's the group part
> > that's the problem.
> > I am using nltk by python. Group grabs the string in parenthesis and
> > stores it in deadnum and I make deadnum into a list.
> >
> > text="accounts put the death toll at 637 and those missing at
> > 653 , but the total number is likely to be much bigger"
> > dead=re.match(r".*death toll.*(\d[,\d\.]*)", text)
> > deadnum=dead.group(1)
> > deaths.append(deadnum)
> > print deaths
> >
> > Any help would be appreciated,
> > Thank you,
> > Sania
>
> Or just don't fully rely on a regex. I would, for time, and the little sanity
> I believe I have left, would just do something like:
>
> death_toll = re.search(r'death toll.*\d+', text).group().rsplit(' ', 1)[1]
>
> hth,
>
> Jon.
Thank you all so much! I ended up using Jussi's advice (\D{0,20}).

Azrazer, what you suggested works, but I need to make sure that it catches numbers like 6,370 as well as 637. I tried tweaking the regex from the one in your reply, but it didn't work (it probably would have if I were more adept). But thanks!

Jon, I kind of see what you are doing. In the regex you say that after 'death toll' there can be 0 or more characters followed by 1 or more digits (although I would need to allow a comma within the digits so it catches 6,370). I can also see that you are splitting the string, but I don't understand the 1 in rsplit(' ', 1)[1]. I guess I am not really familiar with the syntax.

Thanks again!
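For anyone reading along, here is a rough sketch of the two pieces discussed above. In rsplit(' ', 1) the 1 is the maxsplit argument (split at most once, starting from the right), and the trailing [1] takes the second of the two resulting pieces. The function name find_death_toll and the exact comma-aware pattern are my own assumptions for illustration; only the \D{0,20} part comes from Jussi's advice.

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# Jon's one-liner, taken apart. The greedy .* runs as far right as it can,
# so the match ends at the LAST run of digits in the text.
matched = re.search(r'death toll.*\d+', text).group()

# rsplit(' ', 1): split on spaces from the right, at most 1 time,
# which yields exactly two pieces: everything before, and the last word.
pieces = matched.rsplit(' ', 1)
last_number = pieces[1]  # index 1 is the second piece


def find_death_toll(s):
    """Hypothetical helper: return the first number within 20 non-digit
    characters after 'death toll', allowing thousands separators."""
    m = re.search(r'death toll\D{0,20}(\d+(?:,\d{3})*)', s)
    return m.group(1) if m else None


print(pieces)
print(find_death_toll(text))
print(find_death_toll("the death toll rose to 6,370 overnight"))
```

Note that on this particular text the greedy version picks up 653 (the missing) rather than 637, which is one reason to bound the gap with \D{0,20} instead of .*.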