On 11.12.2010 22:38, Stef Mientki wrote:
On 11-12-2010 17:24, Martin Kaspar wrote:
Hello community,
I am new to Python and to Beautiful Soup as well!
It is said to be a great tool for parsing and extracting content. So here I
am...
I want to take the content of a <td> tag of a table in an HTML
document. For example, I have this table:
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
How can I use BeautifulSoup to take the text "This is a sample text"?
Should I use
soup.findAll('table' ,attrs={'class':'bp_ergebnis_tab_info'}) to get
the whole table?
See the target page:
http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323
Well, what do we have to do first?
The first thing is to find the table.
I do this with find. Using find rather than findAll returns the first
match directly
(rather than returning a list of all matches, in which case we'd have
to add an extra [0]
to take the first element of the list):
table = soup.find('table' ,attrs={'class':'bp_ergebnis_tab_info'})
Then use find again, this time on the table we just found, to get its first td:
first_td = table.find('td')
Then we have to use renderContents() to extract the textual contents:
text = first_td.renderContents()
... and the job is done (though we may also want to use strip() to
remove leading and trailing whitespace):
trimmed_text = text.strip()
This should give us:
print trimmed_text
This is a sample text
as desired.
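
For anyone reading this later, the steps above can be put together as one short script. This is only a minimal sketch, and it rests on assumptions that are not part of this thread: it uses the newer bs4 package on Python 3 (the thread uses BeautifulSoup 3 on Python 2), it parses the sample table from a string instead of fetching the real page, and it uses get_text(strip=True) in place of renderContents() plus strip().

```python
# Assumption: the bs4 package (pip install beautifulsoup4) on Python 3.
from bs4 import BeautifulSoup

# The sample table from the question, held in a string for illustration.
html = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match, or None when nothing matches.
table = soup.find("table", attrs={"class": "bp_ergebnis_tab_info"})

# Search inside the table rather than the whole document.
first_td = table.find("td")

# get_text(strip=True) returns the cell text without surrounding whitespace.
trimmed_text = first_td.get_text(strip=True)
print(trimmed_text)
```

On a real page it would be worth checking that table is not None before calling find on it, because find returns None when the class is not present.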
What do you think about the code? I would love to hear from you!
I've no opinion.
I'm just struggling with BeautifulSoup myself, finding it one of the toughest
libs I've seen ;-)
Really? While I'm by no means an expert, I find it very easy to work
with. It's very well structured IMHO.
So the simplest solution I came up with:
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 import
Text = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
"""
Content = BeautifulSoup(Text)
print Content.find('td').contents[0].strip()
This is a sample text
And now I wonder how to get the next contents!
Content = BeautifulSoup(Text)
for td in Content.findAll('td'):
    print td.string.strip() # or td.renderContents().strip()
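
A small caveat on the loop above, shown as a hedged sketch: it assumes the newer bs4 package on Python 3, which is not what this thread uses, and the "nested" cell is a made-up example. As far as I know, td.string is None as soon as a cell contains more than one child node (for example text plus a <b> tag), so calling strip() on it would fail, while get_text() joins all the text in the cell.

```python
# Assumption: the bs4 package (pip install beautifulsoup4) on Python 3.
from bs4 import BeautifulSoup

html = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>
<td>
This is the second sample text
</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect the stripped text of every cell; find_all() is the bs4 name for findAll().
cells = [td.get_text(strip=True) for td in soup.find_all("td")]
for text in cells:
    print(text)

# Hypothetical cell with nested markup, to show where .string falls short.
nested = BeautifulSoup("<td>plain <b>bold</b></td>", "html.parser").td
print(nested.string)      # None, because the cell has two child nodes
print(nested.get_text())  # all text in the cell, joined together
```

If every cell is known to hold plain text only, td.string works fine; get_text() is just the more forgiving choice when the markup is not under your control.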
--
http://mail.python.org/mailman/listinfo/python-list