John Nagle <[EMAIL PROTECTED]> wrote:

> It's worse than that.  Look at the last line of BeautifulSoup
> output:
>
>     &linkurl;=/Europe/Spain/Madrid/Apartments/Offer/2408" />
>
> That "/>" doesn't match anything.  We're outside a tag at that point.
> And it was introduced by BeautifulSoup.  That's both wrong and
> puzzling; given that this was created from a parse tree, that type
> of error shouldn't ever happen.  This looks like the parser didn't
> delete a string item after deciding it was actually part of a tag.
The /> was in the original input that you gave it:

<param name="movie" value="/images/offersBanners/sw04.swf?binfot=We offer fantastic rates for selected weeks or days!!&blinkt=Click here >>>&linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408" />

You don't actually *have* to escape > when it appears in html.

As I said before, it looks like BeautifulSoup decided that the tag ended
at the first >, although it took text beyond that, up to the closing ", as
the value of the attribute. The remaining text was then simply treated as
text content of the unclosed param tag. Finally it inserted a </param> to
close the unclosed param tag.

... some time later ...

Ok, it looks like I was wrong and this is a bug in BeautifulSoup: it seems
that it *is* legal to have an unescaped > in an attribute value, although
it should (not must) be escaped.

From the HTML 4.01 spec:

> Similarly, authors should use "&gt;" (ASCII decimal 62) in text
> instead of ">" to avoid problems with older user agents that
> incorrectly perceive this as the end of a tag (tag close delimiter)
> when it appears in quoted attribute values.

Thank you, it looks like I just learned something new.

Mind you, the sentence before that says 'should' for quoting <
characters, which is just plain silly.
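For anyone who wants to check the point about quoted attribute values against a parser other than BeautifulSoup, here is a minimal sketch. It assumes Python 3 and its standard-library html.parser module, which is not what the thread was using. SAMPLE and TagCollector are hypothetical names made up for this illustration. It feeds the same <param> tag to the parser and records what comes out as tags and what comes out as loose text.

```python
from html.parser import HTMLParser

# The <param> tag from the thread, with the unescaped '>' characters intact.
SAMPLE = (
    '<param name="movie" value="/images/offersBanners/sw04.swf?'
    'binfot=We offer fantastic rates for selected weeks or days!!'
    '&blinkt=Click here >>>'
    '&linkurl=/Europe/Spain/Madrid/Apartments/Offer/2408" />'
)


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and any text outside tags."""

    def __init__(self):
        super().__init__()
        self.tags = []  # list of (tag name, dict of attributes)
        self.text = []  # character data seen outside of tags

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))

    def handle_data(self, data):
        self.text.append(data)


collector = TagCollector()
collector.feed(SAMPLE)
collector.close()

for tag, attrs in collector.tags:
    print(tag, sorted(attrs))
    print(attrs.get("value"))
print("text outside tags:", collector.text)
```

If the parser treats the quoted value as a single unit, as the spec passage implies it should, this reports one param tag whose value attribute still contains the >>> run, and no stray /> in the text outside tags.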