The following tag, taken from a real web site, went into BeautifulSoup:

<param name="movie" value="/images/offersBanners/sw04.swf?binfot=We offer
fantastic rates for selected weeks or days!!&blinkt=Click here
>>>&linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408" />

And this came out, via prettify:

<addresssnippet siteurl="http%3A//apartmentsapart.com" 
url="http%3A//www.apartmentsapart.com/Europe/Spain/Madrid/FAQ">
     <param name="movie" value="/images/offersBanners/sw04.swf?binfot=We offer 
fantastic rates for selected weeks or days!!&amp;blinkt=Click here 
&gt;&gt;&gt;&amp;linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408">
 >>&linkurl;=/Europe/Spain/Madrid/Apartments/Offer/2408" />
</param>

BeautifulSoup seems to have been confused by the ">>>" inside a
quoted attribute value.  It parsed the tag correctly at first, but then
emitted an extra, totally bogus line.  Note the entity "&linkurl;", which
appears nowhere in the original.  It looks as though the code that
handles a missing quote mark did the wrong thing.
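
For comparison, here is a minimal sketch of the check I would expect any
parser to pass: the quoted value, ">>>" and line breaks included, should
come back as one attribute of one tag.  This uses the standard library's
html.parser rather than BeautifulSoup, so it says nothing about where the
BeautifulSoup bug lives.  The names AttrCollector and parse_tags are made
up for this example.

```python
from html.parser import HTMLParser

# The attribute value from the tag above, line breaks preserved.
ORIGINAL_VALUE = (
    "/images/offersBanners/sw04.swf?binfot=We offer\n"
    "fantastic rates for selected weeks or days!!&blinkt=Click here\n"
    ">>>&linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408"
)
MARKUP = '<param name="movie" value="' + ORIGINAL_VALUE + '" />'


class AttrCollector(HTMLParser):
    """Hypothetical helper: records the attributes of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))


def parse_tags(markup):
    """Feed markup to the stdlib parser and return the tags it saw."""
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


tags = parse_tags(MARKUP)
# Exactly one tag should be seen, with the value unchanged.
value_survived = len(tags) == 1 and tags[0][1].get("value") == ORIGINAL_VALUE
print("tags seen:", [name for name, _ in tags])
print("value survived intact:", value_survived)
```

The same comparison could be run against the attribute value that
BeautifulSoup reports for the tag, which would show whether the damage
happens during parsing or only when prettify writes the tree back out.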

                                John Nagle

-- 
http://mail.python.org/mailman/listinfo/python-list
