[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-04-21 Thread Paweł Widera
Paweł Widera mo...@man.poznan.pl added the comment: No. As the value of the href attribute is not supposed to contain spaces, I'd rather expect the parser to assume that the closing quotation mark is missing before the space.

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-04-14 Thread Paweł Widera
Paweł Widera mo...@man.poznan.pl added the comment: Great! With one caveat: the second case *is* handled by browsers. Browsers do not throw an exception on it as HTMLParser does. So improvement is definitely possible here. Whether it is worth the effort is not for me to judge

[issue670664] HTMLParser.py - more robust SCRIPT tag parsing

2009-06-04 Thread Paweł Widera
Changes by Paweł Widera mo...@man.poznan.pl: nosy: +momat

[issue670664] HTMLParser.py - more robust SCRIPT tag parsing

2009-06-04 Thread Paweł Widera
Paweł Widera mo...@man.poznan.pl added the comment: A simple workaround for BeautifulSoup is the following wrapper. It sanitizes the JavaScript code before passing it to the parser by joining the disjoint strings, so that /scr+ipt becomes /script. def bs(input): pattern = re.compile
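
The wrapper in the comment above is cut off in this archive, so the following is only a minimal sketch of the idea it describes, not the author's code. The function name `join_split_strings` and the regular expression are assumptions made for illustration: the pattern removes a closing quote, a `+`, and an opening quote of the same kind, so that a string literal split in two is joined back together before the markup reaches a parser.

```python
import re

# Assumed pattern (not from the original comment): a quote character,
# optional whitespace, '+', optional whitespace, then the same quote again.
_SPLIT_LITERAL = re.compile(r"""(["'])\s*\+\s*\1""")


def join_split_strings(markup):
    """Join adjacent string literals such as "</scr" + "ipt>"."""
    return _SPLIT_LITERAL.sub("", markup)


snippet = 'document.write("</scr" + "ipt>")'
print(join_split_strings(snippet))
```

Note that this sketch joins every such concatenation it finds anywhere in the input, not only those inside script blocks; a real wrapper would probably want to restrict it to script content before handing the result to BeautifulSoup.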

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread Paweł Widera
New submission from Paweł Widera mo...@man.poznan.pl: Of course neither is correct HTML, but both are easy to guess, so I believe the parser should not give up too quickly here. 1) extra comma between attributes: <form action="/xxx.php?a=1&amp;b=2&amp", method="post"> 2) missing closing quotation mark
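
To check how a parser copes with the first case, a small harness along these lines may help. It is a sketch under stated assumptions: the markup in `case_1` is a reconstruction of the garbled example, with the quotes, angle brackets, and ampersands restored by guesswork, and `StartTagCollector` and `collect_start_tags` are hypothetical helper names. It uses `html.parser.HTMLParser` from Python 3, which is more tolerant than the parser discussed in this report.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record every start tag together with its attribute list."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


def collect_start_tags(markup):
    """Feed markup to a fresh parser and return the start tags it reported."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


# Assumed reconstruction of case 1: a stray comma between two attributes.
case_1 = '<form action="/xxx.php?a=1&amp;b=2&amp", method="post">'

for tag, attrs in collect_start_tags(case_1):
    print(tag, attrs)
```

Running the same harness on a tag with a missing closing quotation mark shows whether the parser version in use recovers from the second case or not.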

[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2009-06-04 Thread Paweł Widera
Paweł Widera mo...@man.poznan.pl added the comment: It depends on whether you want HTMLParser to be a useful tool that can deal with real-world HTML or just a toy without practical meaning. Crashing on every little deviation from the standard, where a more relaxed approach is possible, doesn't