Alec Taylor, 12.09.2011 10:33:
from creole import html2creole

from BeautifulSoup import BeautifulSoup

VALID_TAGS = ['strong', 'em', 'p', 'ul', 'li', 'br', 'b', 'i', 'a', 'h1', 'h2']

def sanitize_html(value):

    soup = BeautifulSoup(value)

    for tag in soup.findAll(True):
        if tag.name not in VALID_TAGS:
            tag.hidden = True

    return soup.renderContents()
html2creole(u(sanitize_html('''<h1
style="margin-left:76.8px;margin-right:0;text-indent:0;">Abstract</h1>
    <p class="Standard"
style="margin-left:76.8px;margin-right:0;text-indent:0;">
[more stuff here]
''')))

Hi,

I'm not sure what you are trying to say with the above code, but if it's the code that fails for you with the exception you posted, I would guess that the problem is in the "[more stuff here]" part, which likely contains a non-ASCII character. Note that you didn't declare the source file encoding above. Do as Gary told you.
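To illustrate the point about the encoding declaration, here is a minimal sketch, not the code from this thread. It compiles the same non-ASCII source bytes twice, once without and once with a PEP 263 declaration. The sample text, the `load_text` helper, and the choice of Latin-1 are all assumptions made for the illustration. The sketch targets Python 3, where undeclared source is assumed to be UTF-8; on Python 2, as used in this thread, the default is ASCII, so any non-ASCII byte in an undeclared file is a problem.

```python
# -*- coding: utf-8 -*-
# Sketch: effect of a PEP 263 encoding declaration on non-ASCII source bytes.
# The sample text and the helper name are hypothetical, for illustration only.

# One line of source code containing non-ASCII characters.
body = 'text = "r\u00e9sum\u00e9"\n'

# The same source as it would be saved by an editor using Latin-1,
# once without and once with an encoding declaration on the first line.
undeclared = body.encode('latin-1')
declared = b'# -*- coding: latin-1 -*-\n' + undeclared


def load_text(source_bytes):
    """Compile and run raw source bytes; return `text`, or None if rejected."""
    try:
        code = compile(source_bytes, '<example>', 'exec')
    except (SyntaxError, UnicodeDecodeError):
        # The bytes could not be decoded with the assumed source encoding.
        return None
    namespace = {}
    exec(code, namespace)
    return namespace['text']


without_declaration = load_text(undeclared)  # bytes are not valid UTF-8
with_declaration = load_text(declared)       # decoded as Latin-1
```

The declaration has to appear on the first or second line of the file and must name the encoding the editor actually used when saving it.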

Stefan

--
http://mail.python.org/mailman/listinfo/python-list
