Re: BeautifulSoup and Problem Tables

Peter Pearson Sun, 21 Sep 2008 10:41:39 -0700

On Sat, 20 Sep 2008 20:51:52 -0700 (PDT), [EMAIL PROTECTED] wrote:
[snip]
> from BeautifulSoup import BeautifulSoup
> bst=file(r"c:\bstest.htm").read()
> soup=BeautifulSoup(bst)
> rows=soup.findAll('tr')
> len(rows)
> a=len(rows[0].findAll('td'))
> b=len(rows[1].findAll('td'))
> c=len(rows[2].findAll('td'))
> d=len(rows[3].findAll('td'))
> e=len(rows[4].findAll('td'))
> f=len(rows[5].findAll('td'))
> g=len(rows[6].findAll('td'))
> h=len(rows[8].findAll('td'))
> i=len(rows[9].findAll('td'))
> j=len(rows[10].findAll('td'))
> k=rows[1].findAll('td')[1].contents[0]
[snip]
> However, I discovered that my tables have inconsistent numbers of
> rows.  
[snip]
> I have been Googling for some insight into this and I have not been
> successful finding anything. I would really appreciate any suggestions
> or some direction about how to better describe the problem.


Would it be accurate to describe the problem as wanting to
extract the contents of the c-th column of the r-th row of a
table in spite of various pathologies in the construction of
the table?
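If the only pathology is that some rows have fewer cells than others, a bounds-checked lookup avoids hard-coding one variable per row. A minimal sketch, assuming a single table with no nested tables; it uses the standard library's html.parser instead of BeautifulSoup so it runs without third-party packages, and TableRows, parse_rows and cell are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>.

    Assumes one table with no nested tables; tolerates missing </td> and </tr>.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text fragments of the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.finish_cell()          # a new row implicitly closes an open cell
            self.rows.append([])
        elif tag in ("td", "th"):
            self.finish_cell()          # a new cell implicitly closes the previous one
            if not self.rows:           # cell appeared before any <tr>
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self.finish_cell()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def finish_cell(self):
        if self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def parse_rows(html):
    parser = TableRows()
    parser.feed(html)
    parser.close()
    parser.finish_cell()                # flush a cell left open at end of input
    return parser.rows


def cell(rows, r, c):
    """Text in column c of row r, or None if that row is absent or too short."""
    if 0 <= r < len(rows) and 0 <= c < len(rows[r]):
        return rows[r][c]
    return None


html = """
<table>
  <tr><td>Name</td><td>Value</td></tr>
  <tr><td>alpha</td><td>1</td></tr>
  <tr><td>beta</td></tr>
</table>
"""
rows = parse_rows(html)
print([len(row) for row in rows])   # [2, 2, 1]
print(cell(rows, 1, 1))             # 1
print(cell(rows, 2, 1))             # None
```

With BeautifulSoup itself, the per-row counts can likewise be gathered in one pass, e.g. [len(tr.findAll('td')) for tr in soup.findAll('tr')], rather than one variable per row.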

If so, maybe it would help to post sample HTML (trimmed to a
minimum) of the pathologies that must be handled.  I gotta
confess, though, that it doesn't take many rowspans or colspans
to put this problem beyond my reach.

-- 
To email me, substitute nowhere->spamcop, invalid->net.
--
http://mail.python.org/mailman/listinfo/python-list
