In <[EMAIL PROTECTED]>, dmbkiwi wrote:

> I'm trying to parse a line of html as follows:
> 
> <td style="width:20%" align="left">101.120:( KPA (-)</td>
> <td style="width:35%" align="left">Snow on Ground)0 </td>
> 
> however, sometimes it looks like this:
> 
> <td style="width:20%" align="left">N/A</td>
> <td style="width:35%" align="left">Snow on Ground)0 </td>
> 
> 
> I want to get either the numerical value 101.120 (which could be a
> different number depending on the data that's been fed into the page)
> or, in the second case, 'N/A'.
> 
> The regexp I'm using is:
> 
> .*?Pressure.*?"left">(?P<baro>\d+?|N/A)</td>|\sKPA.*?Snow\son\sGround
> 
> Can someone help me debug this?  It's not picking up the number, and
> I'm not sure I've got the syntax for '|' right, but can't find a
> detailed tutorial on how to use |.

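What about something like

   align="left">(?:(?P<baro>[\d.]+):\(\sKPA|(?P<na>N/A)).*?Ground\)

The '|' has the lowest precedence of all regex operators, so it splits
everything around it into two alternatives unless you confine it with a
group, here the non-capturing '(?:...)'.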

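You need the flag re.DOTALL when compiling the regular expression, so
that '.' also matches the newline between the two <td> lines.
re.MULTILINE only changes how '^' and '$' behave, so it makes no
difference to this pattern.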
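
To see what re.DOTALL changes, here is a minimal sketch. The two-line
string and the pattern are made up purely for illustration; they are not
taken from your page.

```python
import re

# Hypothetical two-line input, standing in for two adjacent <td> lines.
text = "first line\nsecond line"

# Without re.DOTALL, '.' never matches '\n', so the match cannot
# continue from the first line onto the second.
no_flag = re.search(r"first.*second", text)

# With re.DOTALL, '.' matches any character including '\n'.
with_flag = re.search(r"first.*second", text, re.DOTALL)

print(no_flag)             # no match object without the flag
print(with_flag.group(0))  # the match spans the newline
```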

You'll have to check the 'baro' and 'na' groups to decide if it matched a
numerical value or 'N/A'.
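
Putting it together, something along these lines should work. This is
only a sketch: it assumes the alternation is meant to be wrapped in a
non-capturing group, the helper name extract_pressure is my own
invention, and the two sample strings are copied from your post rather
than from the real page.

```python
import re

# The alternation sits inside (?:...) so that both branches share the
# 'align="left">' prefix and the trailing '.*?Ground\)'.
PATTERN = re.compile(
    r'align="left">(?:(?P<baro>[\d.]+):\(\sKPA|(?P<na>N/A)).*?Ground\)',
    re.DOTALL,  # let '.' cross the newline between the two <td> lines
)

NUMERIC = (
    '<td style="width:20%" align="left">101.120:( KPA (-)</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>\n'
)
NOT_AVAILABLE = (
    '<td style="width:20%" align="left">N/A</td>\n'
    '<td style="width:35%" align="left">Snow on Ground)0 </td>\n'
)


def extract_pressure(html):
    """Return the reading as a string, 'N/A', or None if nothing matched."""
    match = PATTERN.search(html)
    if match is None:
        return None
    # Only one of the two named groups takes part in a match;
    # the group from the other branch is None.
    if match.group("baro") is not None:
        return match.group("baro")
    return match.group("na")


print(extract_pressure(NUMERIC))
print(extract_pressure(NOT_AVAILABLE))
```

The reading is returned as a string, so convert it with float() yourself
if you need a number.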

Ciao,
        Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
