In <[EMAIL PROTECTED]>, dmbkiwi wrote:

> I'm trying to parse a line of html as follows:
>
> <td style="width:20%" align="left">101.120:( KPA (-)</td>
> <td style="width:35%" align="left">Snow on Ground)0 </td>
>
> however, sometimes it looks like this:
>
> <td style="width:20%" align="left">N/A</td>
> <td style="width:35%" align="left">Snow on Ground)0 </td>
>
>
> I want to get either the numerical value 101.120 (which could be a
> different number depending on the data that's been fed into the page,
> or in terms of the second option, 'N/A'.
>
> The regexp I'm using is:
>
> .*?Pressure.*?"left">(?P<baro>\d+?|N/A)</td>|\sKPA.*?Snow\son\sGround
>
> Can someone help me debug this. It's not picking up the number, and
> I'm not sure I've got the syntax for '|' right, but can't find a
> detailed tutorial on how to use |.
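
What about something like

  align="left">((?P<baro>[\d.]+):\(\sKPA|(?P<na>N/A)).*Ground\)

Note that '|' has the lowest precedence of all the operators, so it separates everything to its left from everything to its right unless you wrap the alternatives in a group, as done above.

You need the flag re.DOTALL when compiling the regular expression, so that '.*' can match across the line break between the two cells. (re.MULTILINE only changes how '^' and '$' behave, so it is not required for this pattern.) You'll have to check the 'baro' and 'na' groups to decide if it matched a numerical value or 'N/A'.

Ciao,
	Marc 'BlackJack' Rintsch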
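
For illustration, here is a minimal sketch of how a pattern along these lines might be compiled and how the 'baro' and 'na' groups could be checked. It rests on a few assumptions that are not in the post: the alternation is wrapped in a non-capturing group (?:...), '.*?' is used instead of '.*' so the match stops at the first "Ground)", and the names BARO_RE and parse_pressure are hypothetical, chosen only for this example.

```python
import re

# Assumption: the two alternatives are wrapped in (?:...) so that '|' only
# chooses between the numeric reading and 'N/A'.
# re.DOTALL lets '.' match the newline between the two table cells.
BARO_RE = re.compile(
    r'align="left">(?:(?P<baro>[\d.]+):\(\sKPA|(?P<na>N/A)).*?Ground\)',
    re.DOTALL,
)


def parse_pressure(html):
    """Return the reading as a string, 'N/A', or None if nothing matches."""
    match = BARO_RE.search(html)
    if match is None:
        return None
    # Exactly one of the two named groups takes part in a successful match;
    # the other one is None.
    if match.group('baro') is not None:
        return match.group('baro')
    return match.group('na')


# The two sample snippets from the question.
numeric = '''<td style="width:20%" align="left">101.120:( KPA (-)</td>
<td style="width:35%" align="left">Snow on Ground)0 </td>'''

missing = '''<td style="width:20%" align="left">N/A</td>
<td style="width:35%" align="left">Snow on Ground)0 </td>'''

print(parse_pressure(numeric))
print(parse_pressure(missing))
```

Returning the matched text rather than a float keeps the sketch safe for odd input: '[\d.]+' would also accept something like '1.2.3', which float() rejects.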