Hello community, I am new to Python and also to Beautiful Soup! It is said to be a great tool for parsing and extracting content. So here I am:
I want to extract the content of a <td> tag from a table in an HTML document. For example, I have this table:

    <table class="bp_ergebnis_tab_info">
      <tr>
        <td> This is a sample text </td>
        <td> This is the second sample text </td>
      </tr>
    </table>

How can I use Beautiful Soup to get the text "This is a sample text"? Should I use soup.findAll('table', attrs={'class':'bp_ergebnis_tab_info'}) to get the whole table?

The target page is http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323

Here is what I have so far. The first thing is to find the table. Using find rather than findAll returns the first match directly (findAll returns a list of all matches, in which case we'd have to add an extra [0] to take the first element of the list):

    table = soup.find('table', attrs={'class':'bp_ergebnis_tab_info'})

Then use find again, this time on the table rather than on the whole soup, so that we get the first td inside that table and not the first td anywhere on the page:

    first_td = table.find('td')

Then use renderContents() to extract the contents of the cell:

    text = first_td.renderContents()

... and the job is done, though we may also want to use strip() to remove leading and trailing spaces:

    trimmed_text = text.strip()

This should give us:

    print trimmed_text
    This is a sample text

as desired.

What do you think about the code? I'd love to hear from you!

Greetings,
matze

--
http://mail.python.org/mailman/listinfo/python-list
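
P.S. For anyone reading this with the newer bs4 package installed, the same steps can be sketched roughly as below. This is only a minimal sketch under a few assumptions: it uses Python 3 and bs4 instead of the older BeautifulSoup 3 API from my post, it swaps renderContents() for get_text(), and it parses the sample table from a string (the html variable is made up for illustration) instead of fetching the real page.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the real page is not fetched here.
html = """
<table class="bp_ergebnis_tab_info">
  <tr>
    <td> This is a sample text </td>
    <td> This is the second sample text </td>
  </tr>
</table>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching element, or None if nothing matches.
table = soup.find("table", attrs={"class": "bp_ergebnis_tab_info"})

# Search inside the table only, not the whole document.
first_td = table.find("td")

# get_text() returns the cell's text; strip() removes surrounding whitespace.
trimmed_text = first_td.get_text().strip()
print(trimmed_text)
```

On the real page it is probably worth checking that table and first_td are not None before going on, since find() returns None when nothing matches.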