" found in attribute value

Duncan Booth Wed, 27 Dec 2006 10:42:25 -0800

John Nagle <[EMAIL PROTECTED]> wrote:

>     It's worse than that.  Look at the last line of BeautifulSoup
>     output: 
> 
>      &linkurl;=/Europe/Spain/Madrid/Apartments/Offer/2408" />
> 
> That "/>" doesn't match anything.  We're outside a tag at that point.
> And it was introduced by BeautifulSoup.  That's both wrong and
> puzzling; given that this was created from a parse tree, that type
> of error shouldn't ever happen.  This looks like the parser didn't
> delete a string item after deciding it was actually part of a tag.


The /> was in the original input that you gave it:

<param name="movie" value="/images/offersBanners/sw04.swf?binfot=We
offer fantastic rates for selected weeks or days!!&blinkt=Click here
>>>&linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408" />

You don't actually *have* to escape > when it appears in HTML.

As I said before, it looks like BeautifulSoup decided that the tag ended
at the first > although it took text beyond that up to the closing " as
the value of the attribute. The remaining text was then simply treated
as text content of the unclosed param tag. Finally it inserted a
</param> to close the unclosed param tag. 
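
To make the difference concrete, here is a rough sketch of the two ways a
scanner could look for the end of a start tag. This is only an illustration
of the behaviour described above, not BeautifulSoup's actual code; the
function names and the shortened SAMPLE string are made up for the example.

```python
def naive_tag_end(markup):
    """Index of the first '>' in the markup, ignoring quotes entirely."""
    return markup.find(">")


def quote_aware_tag_end(markup):
    """Index of the first '>' that lies outside a quoted attribute value."""
    quote = None  # the quote character we are currently inside, if any
    for i, ch in enumerate(markup):
        if quote:
            if ch == quote:
                quote = None  # closing quote ends the attribute value
        elif ch in "\"'":
            quote = ch  # opening quote starts an attribute value
        elif ch == ">":
            return i
    return -1


# Shortened, hypothetical stand-in for the <param> element quoted above.
SAMPLE = ('<param name="movie" '
          'value="sw04.swf?blinkt=Click here >>>&linkurl=/Offer/2408" />')

naive_cut = SAMPLE[:naive_tag_end(SAMPLE) + 1]
aware_cut = SAMPLE[:quote_aware_tag_end(SAMPLE) + 1]
print(naive_cut)            # tag cut short inside the attribute value
print(aware_cut == SAMPLE)  # quote-aware scan reaches the real end
```

The naive scan stops inside the quoted value, which would leave the rest of
the value and the trailing " /> to be handled as something other than part
of the tag.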

... some time later ...

Ok, it looks like I was wrong and this is a bug in BeautifulSoup: it
seems that it *is* legal to have an unescaped > in an attribute value,
although it should (not must) be escaped: 

From the HTML 4.01 spec:
> Similarly, authors should use "&gt;" (ASCII decimal 62) in text
> instead of ">" to avoid problems with older user agents that
> incorrectly perceive this as the end of a tag (tag close delimiter)
> when it appears in quoted attribute values. 
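
As a quick check of what the spec is describing, the sketch below feeds both
the unescaped and the escaped form of an attribute value to the standard
library's html.parser (Python 3) and compares the results. It assumes a
shortened stand-in for the <param> element from this thread; TagCollector,
parse and VALUE are names invented for the example.

```python
import html
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every start tag (with its attributes) and any text content."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags via the default handle_startendtag.
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)


def parse(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()  # flush any buffered text
    return collector


# Hypothetical, shortened attribute value containing unescaped '>' characters.
VALUE = "sw04.swf?blinkt=Click here >>>&linkurl=/Offer/2408"

raw = '<param name="movie" value="%s" />' % VALUE
escaped = '<param name="movie" value="%s" />' % html.escape(VALUE, quote=True)

raw_result = parse(raw)
escaped_result = parse(escaped)

print(escaped)                                 # the "should" form, with &gt;
print(raw_result.tags == escaped_result.tags)  # both yield the same attribute
```

With this parser the unescaped form comes back as a single param tag with the
whole value intact, and the escaped form decodes to the same value, so
escaping only matters for parsers that stop at the first >.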

Thank you, it looks like I just learned something new.

Mind you, the sentence before that says 'should' for quoting < characters,
which is just plain silly.
-- 
http://mail.python.org/mailman/listinfo/python-list