Re: Clean "Durty" strings

irstas Mon, 02 Apr 2007 12:26:05 -0700

On Apr 2, 10:08 pm, Michael Hoffman <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > But it could be that he just wants all HTML tags to disappear, like in
> > his example. A code like this might be sufficient then:
> > re.sub(r'<[^>]+>', '', s).
>
> Won't work for, say, this:
>
> <img src="src" alt="<text>">
> --
> Michael Hoffman


True, but is that legal? I think the alt attribute is supposed to use
&lt; and &gt; there. I know what you're going to reply, though: that
BeautifulSoup will probably parse it even if it's invalid HTML. And I
agree, using BeautifulSoup is a better solution than custom regexps.
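
For illustration, here is a minimal sketch of the difference, assuming
Python 3. It uses the standard library's html.parser as a stand-in for
BeautifulSoup (which is a third-party package), so it only shows the
general idea of using a real parser, not BeautifulSoup's own behaviour.
The names strip_tags_regex, strip_tags_parser and the sample string are
made up for this example.

```python
import re
from html.parser import HTMLParser

# The regex from earlier in the thread: drop anything that looks like <...>.
TAG_RE = re.compile(r'<[^>]+>')


def strip_tags_regex(s):
    """Remove tags with the naive regex (hypothetical helper name)."""
    return TAG_RE.sub('', s)


class _TextCollector(HTMLParser):
    """Collect only the text nodes that the parser reports."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags_parser(s):
    """Remove tags by parsing the markup (hypothetical helper name)."""
    collector = _TextCollector()
    collector.feed(s)
    collector.close()
    return ''.join(collector.chunks)


# Michael's example, wrapped in a little text so the leftover is visible.
sample = 'Hello <img src="src" alt="<text>"> world'

print(repr(strip_tags_regex(sample)))   # 'Hello "> world'
print(repr(strip_tags_parser(sample)))  # 'Hello  world'
```

The regex stops at the first > it sees, which is the one inside the
quoted alt value, so the tail of the tag leaks into the output. The
parser treats the quoted value as part of the attribute and reports no
stray text.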

-- 
http://mail.python.org/mailman/listinfo/python-list
