Hello,
Some columns in a database contain HTML so badly formed that BeautifulSoup
(or the underlying lxml parser?) fails:
=============
from bs4 import BeautifulSoup

# Some records start with a newline (0x0A) followed by a stray closing tag
soup = BeautifulSoup("\n</strong>", "lxml")
# Raises AttributeError: 'NoneType' object has no attribute 'text'
print(soup.body.text)
=============
What would be a nice way to solve the problem?
Is there a command to remove invalid tags altogether (e.g. strings that start
with </strong>), or should I just catch the error?
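For reference, here is a minimal sketch of a third option: checking for the
missing body instead of catching the exception. The helper name extract_text
and its parser argument are hypothetical, not part of BeautifulSoup or lxml,
and it assumes an empty string is an acceptable result for records where no
tree could be built:

```python
from bs4 import BeautifulSoup


def extract_text(raw_html, parser="lxml"):
    """Return the text of raw_html, falling back to the whole soup
    when the parser produced no <body> element (hypothetical helper)."""
    soup = BeautifulSoup(raw_html or "", parser)
    body = soup.body
    if body is None:
        # No <body> was built, e.g. the input was only a stray closing tag.
        return soup.get_text()
    return body.get_text()
```

Called as extract_text(record), this avoids the AttributeError because .text
is never read from None.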
Thank you.
_______________________________________________
lxml - The Python XML Toolkit mailing list -- [email protected]