Paweł Widera mo...@man.poznan.pl added the comment:
No. As the value of the href attribute is not supposed to contain spaces, I'd
rather expect the parser to assume that the closing quote is missing before
the space.
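A minimal sketch of that heuristic, written as a pre-processing step rather than a change to the parser itself. The helper name `close_unterminated_href` and the regular expression are illustrative assumptions, not part of HTMLParser. It relies on the assumption stated above that href values never contain whitespace, so a correctly quoted value that does contain a space would be cut short.

```python
import re

# Hypothetical helper, not part of HTMLParser.
# Assumption: an href value never contains whitespace, so if a double-quoted
# value runs into whitespace or '>' before its closing quote, the quote is
# taken to be missing at that point.
_UNCLOSED_HREF = re.compile(r'(href\s*=\s*")([^"\s>]*)(?=[\s>])', re.IGNORECASE)


def close_unterminated_href(html):
    """Insert the missing closing quote after an unterminated href value."""
    return _UNCLOSED_HREF.sub(r'\1\2"', html)


if __name__ == "__main__":
    print(close_unterminated_href('<a href="page.html>link</a>'))
```

The repaired markup could then be fed to the parser as usual. Only double-quoted href attributes are handled here, to keep the sketch short.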
--
___
Python tracker rep
Paweł Widera mo...@man.poznan.pl added the comment:
Great! With one caveat, though: the second case *is* handled by browsers.
Browsers do not throw an exception on it as HTMLParser does, so improvement is
definitely possible here. Whether it is worth the effort is not for me to judge.
Changes by Paweł Widera mo...@man.poznan.pl:
--
nosy: +momat
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue670664
Paweł Widera mo...@man.poznan.pl added the comment:
A simple workaround for BeautifulSoup is the following wrapper. It
sanitizes the JavaScript code before passing it to the parser by joining
the disjoint strings, so that /scr+ipt becomes /script.
def bs(input):
    pattern = re.compile
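The idea of joining disjoint string literals before parsing can be sketched as below. This is not the wrapper from the message, which is cut off here; the function name `join_split_strings` and the pattern are assumptions made for illustration. It only merges adjacent literals that use the same quote character and are joined by a single `+`.

```python
import re

# Hypothetical sanitizer; the name and the pattern are illustrative assumptions.
# Matches: closing quote, optional whitespace, '+', optional whitespace, and an
# opening quote of the same kind, i.e. the seam between two adjacent literals.
_SPLIT_LITERAL = re.compile(r'''(["'])\s*\+\s*\1''')


def join_split_strings(source):
    """Merge adjacent string literals such as "</scr" + "ipt>" into one."""
    return _SPLIT_LITERAL.sub('', source)


if __name__ == "__main__":
    print(join_split_strings('document.write("</scr" + "ipt>");'))
```

The sanitized text would then be handed to the parser in place of the raw page. Because the pattern is applied to the whole input, it could also alter matching text outside script blocks, so a real wrapper might want to restrict it to script content.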
New submission from Paweł Widera mo...@man.poznan.pl:
Of course neither is correct HTML, but both are easy to guess, so I believe
the parser should not give up too quickly here.
1) extra comma between attributes
<form action=/xxx.php?a=1&amp;b=2&amp, method=post>
2) missing closing quotation mark
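A small harness along these lines can be used to check how a given parser version reacts to such input. It is a sketch assuming Python 3's `html.parser` module; the class name `StartTagCollector`, the helper `collect_start_tags`, and the sample markup are made up for illustration and are not taken from the report.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records every start tag the parser reports, with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


def collect_start_tags(html):
    """Feed the markup to a fresh parser and return the start tags it saw."""
    parser = StartTagCollector()
    parser.feed(html)
    parser.close()
    return parser.start_tags


if __name__ == "__main__":
    # Hypothetical sample with a stray comma between two attributes.
    sample = '<form action="/xxx.php?a=1&amp;b=2", method="post">'
    for tag, attrs in collect_start_tags(sample):
        print(tag, attrs)
```

Feeding a snippet with an unterminated quote through the same helper shows how that case is treated; the exact result may differ between Python versions, so none is stated here.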
Paweł Widera mo...@man.poznan.pl added the comment:
It depends on whether you want HTMLParser to be a useful tool that can
deal with real-world HTML or just a toy without practical meaning.
Crashing on every little deviation from the standard, where a more relaxed
approach is possible, doesn't