Re: Python Regex Question

Ivo Fri, 21 Sep 2007 12:11:20 -0700

crybaby wrote:
> On Sep 20, 4:12 pm, Tobiah <[EMAIL PROTECTED]> wrote:
>> [EMAIL PROTECTED] wrote:
>>> I need to extract the number in each <td> tag from an HTML file,
>>> e.g. 49.950 from the following:
>>> <td align=right width=80><font size=2 face="New Times
>>> Roman,Times,Serif">&nbsp;49.950&nbsp;</font></td>
>>> The actual number between: &nbsp;49.950&nbsp; can be any number of
>>> digits before decimal and after decimal.
>>> <td align=right width=80><font size=2 face="New Times
>>> Roman,Times,Serif">&nbsp;######.####&nbsp;</font></td>
>>> How can I just extract the real/integer number using regex?
>> '[0-9]*\.[0-9]*'
> 
> I am trying to use BeautifulSoup:
> 
>     soup = BeautifulSoup(page)
> 
>     td_tags = soup.findAll('td')
>     i=0
>     for td in td_tags:
>         i = i+1
>         print "td: ", td
>         # re.search('[0-9]*\.[0-9]*', td)
>         price = re.compile('[0-9]*\.[0-9]*').search(td)
> 
> I am getting an error:
> 
>            price= re.compile('[0-9]*\.[0-9]*').search(td)
> TypeError: expected string or buffer
> 
> Does BeautifulSoup return an array of objects? If so, how do I pass the
> "td" instance as a string to re.search?  What is the difference between
> re.search and re.compile().search?
>


I don't know anything about BeautifulSoup, but to the other questions:

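pattern = re.compile(expr) compiles the expression once. After that you
can use pattern as a reference to the compiled expression and call
pattern.search(string) on it, which costs less when you repeat the search.

re.search(expr, string) takes the pattern string on every call, so the
module has to compile it, or at least look it up in its internal cache of
compiled patterns, each time. That can be more expensive, especially if
you use the expression many times.

The way you use it (compiling inside the loop and searching straight
away), it doesn't matter: both forms do the same work.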
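A minimal sketch of the two forms side by side, assuming Python 3; the sample string is modelled on the cell contents quoted above, and the names sample and number_re are only illustrative:

```python
import re

# Illustrative sample text, modelled on the cell contents in the question.
sample = "&nbsp;49.950&nbsp;"

# Module-level call: the pattern string is handed over on every call.
m1 = re.search(r"[0-9]*\.[0-9]*", sample)

# Precompiled pattern: compile once, then reuse the pattern object.
number_re = re.compile(r"[0-9]*\.[0-9]*")
m2 = number_re.search(sample)

# Both calls find the same text.
print(m1.group(0), m2.group(0))
```

Both matches contain the same text, so the choice between the two forms is about reuse and readability, not about results.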

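Do this instead:

    pattern = re.compile(r'[0-9]*\.[0-9]*')
    result = pattern.findall(text)   # text must be a string

Now you can reuse pattern. Whatever you pass in has to be a string (that
is what the TypeError is complaining about), so something like str(td) is
probably needed. Note that with * on both sides a lone "." also matches;
use + instead if you need at least one digit.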
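As a rough sketch of the whole extraction, assuming Python 3 and assuming every number sits between two &nbsp; entities as in the question: the helper extract_numbers and the second sample value (1200) are made up for illustration, and the str() call assumes the parsed td object converts to its markup, which I have not verified against BeautifulSoup.

```python
import re

# First row copied from the question; the second value (1200) is made up.
html = (
    '<td align=right width=80><font size=2 face="New Times Roman,Times,Serif">'
    '&nbsp;49.950&nbsp;</font></td>\n'
    '<td align=right width=80><font size=2 face="New Times Roman,Times,Serif">'
    '&nbsp;1200&nbsp;</font></td>'
)

# Assumption: each number is wrapped in &nbsp; entities, as in the question.
# The optional group lets plain integers match as well as decimals.
number_re = re.compile(r"&nbsp;([0-9]+(?:\.[0-9]+)?)&nbsp;")

def extract_numbers(text):
    """Return every number found between &nbsp; entities, as strings."""
    # str() guards against non-string input; passing a non-string is what
    # raises "TypeError: expected string or buffer".
    return number_re.findall(str(text))

print(extract_numbers(html))
```

Anchoring on the &nbsp; entities keeps attribute values such as width=80 and size=2 out of the result, which a bare digit pattern would otherwise pick up.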

Cheers,
Ivo.
-- 
http://mail.python.org/mailman/listinfo/python-list
