On 19/04/2012 14:02, Sania wrote:
On Apr 19, 2:48 am, Jussi Piitulainen<jpiit...@ling.helsinki.fi>
[...]
    text = ("accounts put the death toll at 637 and those missing at "
            "653 , but the total number is likely to be much bigger")
    dead = re.match(r".*death toll.*(\d[,\d\.]*)", text)
    deadnum = dead.group(1)
    deaths.append(deadnum)
    print(deaths)

It's the regexp. The .* after "death toll" eats the input as far as it
can without making the whole match fail. The group matches only the
last digit in the text.

You could allow only non-digits before the number. Or you could look
up the variant of * that only matches as much as it must.
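
A minimal sketch of both suggestions, assuming the sample sentence from the thread sits on one line. The variable names (greedy, lazy, non_digits) are only for illustration and are not from the original code.

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# Greedy .* runs to the end of the text, then backs off only as far as
# needed for \d to match, so the group is left with the last digit.
greedy = re.match(r".*death toll.*(\d[,\d\.]*)", text).group(1)

# Non-greedy .*? consumes as little as it can, so the group starts at
# the first digit after "death toll".
lazy = re.match(r".*death toll.*?(\d[,\d\.]*)", text).group(1)

# \D* allows only non-digits between "death toll" and the number.
non_digits = re.match(r".*death toll\D*(\d[,\d\.]*)", text).group(1)

print(greedy, lazy, non_digits)
```

On this sentence the greedy pattern captures "3", while the other two capture "637".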

Hey Thanks,
So now my regex is

     dead=re.match(r".*death toll.{0,20}(\d[,\d\.]*)", text)
Hi,
But your regex now matches:
<something>death toll<anything of length <= 20>, followed by what you capture (which consists of at least one digit).
There are at least two issues here:
 - the number of characters between "death toll" and the figure may be > 20
 - your {0,20} is greedy: .{0,20} matches as many characters as it can while still leaving one digit for (\d[,\d\.]*), since your group captures a digit optionally followed by digits, commas and dots. So " at 63" is consumed by .{0,20} and (\d[,\d\.]*) matches only the remaining digit "7".

A solution would be to follow what Jussi suggested and allow only non-digits before the number:
=> dead=re.match(r".*death toll\D*(\d*)", text)
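
A rough sketch comparing the {0,20} attempt with the \D* version on the sentence from the thread. The second sentence (sample) is a made-up input, added only to show what happens to a number with a thousands separator under each capture group.

```python
import re

text = ("accounts put the death toll at 637 and those missing at "
        "653 , but the total number is likely to be much bigger")

# .{0,20} is greedy: it first grabs 20 characters, then gives them back
# one at a time until a digit follows, which leaves only "7" for the group.
bounded = re.match(r".*death toll.{0,20}(\d[,\d\.]*)", text).group(1)

# \D* cannot consume digits, so the group starts at the first digit.
fixed = re.match(r".*death toll\D*(\d*)", text).group(1)

# Hypothetical input, not from the thread.
sample = "the death toll rose to 1,200 overnight"
plain = re.match(r".*death toll\D*(\d*)", sample).group(1)
with_sep = re.match(r".*death toll\D*(\d[,\d\.]*)", sample).group(1)

print(bounded, fixed, plain, with_sep)
```

Here bounded is "7" and fixed is "637". On the made-up sentence, (\d*) stops at the comma and yields "1", whereas the original character class keeps "1,200", so which group to use depends on how the figures are written in the data.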

But I only find 7 not 637. How is it that the group is only matching
the last digit?
=> because .{0,20} is greedy
The whole thing is in parentheses, not just the last part?
=> yes, but only one digit remains by the time your group gets to match...

Good luck understanding regexes, they're a powerful tool! :)

best,
azra.

--
http://mail.python.org/mailman/listinfo/python-list
