It's an extremely bad idea to use regex for HTML. If you want to change one tiny thing, you have to write the regex all over again. If it's a throwaway script, then go ahead.

2010/3/20 Luis M. González <luis...@gmail.com>
> On Mar 20, 12:04 am, Jimbo <nill...@yahoo.com> wrote:
> > Hello
> >
> > I am trying to grab some numbers from a string containing HTML text.
> > Can you suggest any good functions that I could use to do this? What
> > would be the easiest way to extract the following numbers from this
> > string?
> >
> > My string has this layout, and I have commented what I want to grab:
> > [CODE] """</th>
> > <td class="last">43.200 </td>
> > <td class="change indicator" nowrap>0.040 </td>
> >
> > <td>43.150 </td> # I need to grab this number only
> > <td>43.200 </td>
> > <td>43.130 </td> # I need to grab this number only
> > <td>43.290 </td>
> > <td>43.100 </td> # I need to grab this number only
> > <td>7,450,447 </td>
> > <td class="middle"><a href="/asx/markets/optionPrices.do?by=underlyingCode&underlyingCode=BHP&expiryDate=&optionType=">Options</a></td>
> > <td class="middle"><a href="/asx/markets/warrantPrices.do?by=underlyingAsxCode&underlyingCode=BHP">Warrants & Structured Products</a></td>
> > <td class="middle"><a href="/asx/markets/cfdPrices.do?by=underlyingAsxCode&underlyingCode=BHP">CFDs</a></td>
> > <td class="middle"><a href="http://hfgapps.hubb.com/asxtools/Charts.aspx?TimeFrame=D6&compare=comp_index&indicies=XJO&pma1=20&pma2=20&asxCode=BHP"><img src="/images/chart.gif" border="0" height="15" width="15"></a>
> > </td>
> > <td><a href="/research/announcements/status_notes.htm#XD">XD</a>
> > </td>
> > <td><a href="/asx/statistics/announcements.do?by=asxCode&asxCode=BHP&timeframe=D&period=W">Recent</a>
> > </td>
> > </tr>"""[/CODE]
>
> You should use BeautifulSoup or perhaps regular expressions.
> Or if you are not very smart, like me, just try a brute-force approach:
>
> >>> for i in s.split('>'):
>         for e in i.split():
>             if '.' in e and e[0].isdigit():
>                 print(e)
>
> 43.200
> 0.040
> 43.150
> 43.200
> 43.130
> 43.290
> 43.100
> >>>
> --
> http://mail.python.org/mailman/listinfo/python-list
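For comparison with both regex and the brute-force split, here is a minimal sketch of the parser approach, assuming Python 3. It uses the standard-library `html.parser.HTMLParser` in place of BeautifulSoup (a swapped-in choice so nothing needs installing). The names `CellCollector`, `wanted_prices` and `SAMPLE` are hypothetical, the sample is a trimmed copy of Jimbo's fragment, and the rule that the wanted figures are the 1st, 3rd and 5th `<td>` cells without a class attribute is an assumption drawn from the rows he marked.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []          # one (class attribute or None, text) pair per <td>
        self._in_td = False
        self._td_class = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self._td_class = dict(attrs).get("class")
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_td:
            # Text from nested tags such as <a> was buffered too.
            self.cells.append((self._td_class, "".join(self._buf).strip()))
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._buf.append(data)


def wanted_prices(html):
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    # Keep only cells with no class attribute (the plain figure cells).
    plain = [text for cls, text in parser.cells if cls is None]
    # ASSUMPTION: the wanted numbers are the 1st, 3rd and 5th plain cells,
    # matching the rows marked in the question; adjust if the layout differs.
    return [float(plain[i]) for i in (0, 2, 4)]


# Trimmed copy of the fragment from the question (hypothetical test input).
SAMPLE = """</th>
<td class="last">43.200 </td>
<td class="change indicator" nowrap>0.040 </td>

<td>43.150 </td>
<td>43.200 </td>
<td>43.130 </td>
<td>43.290 </td>
<td>43.100 </td>
<td>7,450,447 </td>
<td class="middle"><a href="/asx/markets/cfdPrices.do?by=underlyingAsxCode&underlyingCode=BHP">CFDs</a></td>
<td><a href="/research/announcements/status_notes.htm#XD">XD</a>
</td>
</tr>"""

if __name__ == "__main__":
    print(wanted_prices(SAMPLE))
```

Running the script should print `[43.15, 43.13, 43.1]`. Because the parser walks tags rather than matching text, a small markup change (an extra attribute, different whitespace) should only mean adjusting the selection rule in `wanted_prices`, not rewriting a pattern.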