[issue6191] HTMLParser attribute parsing - 2 test cases when it fails

2011-05-14 Ezio Melotti
Ezio Melotti <ezio.melo...@gmail.com> added the comment:

What I described in my previous message is what Firefox does. If you think this should be changed, I suggest you open another issue, possibly attaching a test case with the desired behavior and a patch to change it.

2011-04-21 Paweł Widera
Paweł Widera <mo...@man.poznan.pl> added the comment:

No. As the value of the href attribute is not supposed to contain spaces, I'd rather expect the parser to assume that a closing quote is missing before the space.

2011-04-14 Ezio Melotti
Ezio Melotti <ezio.melo...@gmail.com> added the comment:

The first case has already been fixed in 1cbfeffea19f; the second case is not even handled by browsers, so I'm closing this.

--
resolution: -> fixed
stage: -> committed/rejected
status: open -> closed

2011-04-14 Paweł Widera
Paweł Widera <mo...@man.poznan.pl> added the comment:

Great! With one caveat: the second case *is* handled by browsers. Browsers do not throw an exception on it the way HTMLParser does, so improvement is definitely possible here. Whether it is worth the effort is not for me to judge.

2011-04-14 Ezio Melotti
Ezio Melotti <ezio.melo...@gmail.com> added the comment:

So you are suggesting that

  <a href="http://xxx.org/xxx.php?a=1 target="_blank">click me</a>

should result in an 'a' element with an href attribute equal to "http://xxx.org/xxx.php?a=1 target=", and that _blank should then be discarded as extra data?
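
The behaviour Ezio describes can be checked against the standard library's html.parser module. The markup here is a reconstruction (the archive stripped the angle brackets and quotes), so treat it as an assumption. This is a minimal sketch, assuming a recent Python 3 release in which HTMLParser no longer raises HTMLParseError; TagRecorder and record() are illustrative names, not part of any library. It records the callbacks the parser makes for the second test case.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Illustrative helper: logs parser callbacks as tuples, in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare names
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def record(markup):
    """Feed markup to a fresh TagRecorder and return the logged events."""
    parser = TagRecorder()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.events


# Reconstructed second test case: the closing quote after a=1 is missing.
events = record('<a href="http://xxx.org/xxx.php?a=1 target="_blank">click me</a>')
tag_event = events[0]
# The first attribute should be href, with a value running to the next double quote.
print(tag_event[1], tag_event[2][0])
```

The href value should come out as in Ezio's description, and no exception is raised; the text left over after the premature closing quote is reported as an extra attribute without a value rather than being discarded. The sketch only checks href, the text and the end tag.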

2011-04-05 Ezio Melotti
Changes by Ezio Melotti <ezio.melo...@gmail.com>:

--
versions: +Python 3.2, Python 3.3 -Python 2.6

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6191>
___

2009-06-06 Ezio Melotti
Ezio Melotti <ezio.melo...@gmail.com> added the comment:

BeautifulSoup uses SGMLParser for all versions < 3.1. BeautifulSoup 3.1 is supposed to be compatible with Python 3, and since SGMLParser is gone, it now uses HTMLParser, but it is no longer able to handle some things. For more [...]

2009-06-04 Paweł Widera
New submission from Paweł Widera <mo...@man.poznan.pl>:

Of course, neither is correct HTML, but both are easy to guess, so I believe the parser should not give up so quickly here.

1) extra comma between attributes
  <form action="/xxx.php?a=1&amp;b=2&amp", method="post">

2) missing closing quotation mark for [...]
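
A similar sketch for the first test case, again assuming a recent Python 3 and a reconstructed form tag (the archive dropped the brackets, quotes and ampersands, so the exact markup is a guess). StartTagAttrs is a hypothetical helper that only keeps start-tag attributes. With the tolerant attribute parsing the tracker says was added in 1cbfeffea19f, the stray comma should no longer stop the parser, and method should still be read as "post".

```python
from html.parser import HTMLParser


class StartTagAttrs(HTMLParser):
    """Illustrative helper: keeps (tag, attrs) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


# Reconstructed first test case: a stray comma between two attributes.
parser = StartTagAttrs()
parser.feed('<form action="/xxx.php?a=1&amp;b=2&amp", method="post">')
parser.close()

tag, attrs = parser.tags[0]
attr_dict = dict(attrs)  # the stray comma, if reported, becomes its own key
print(tag, attr_dict.get("method"))
```

The comma is expected to show up as a value-less attribute named "," rather than raising an error, but the sketch deliberately asserts only on the attributes a caller would actually use.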

2009-06-04 Georg Brandl
Georg Brandl <ge...@python.org> added the comment:

I do not think HTMLParser should guess. Guessing always opens the door to misinterpretation.

--
nosy: +georg.brandl
resolution: -> wont fix
status: open -> closed

2009-06-04 Paweł Widera
Paweł Widera <mo...@man.poznan.pl> added the comment:

It depends on whether you want HTMLParser to be a useful tool that can deal with real-world HTML or just a toy without practical value. Crashing on every little deviation from the standard, where a more relaxed approach is possible, doesn't [...]

2009-06-04 Georg Brandl
Georg Brandl <ge...@python.org> added the comment:

> Throwing an exception and giving up is just not good enough.

Yes it is, in some cases. There are forgiving HTML parsers out there; HTMLParser does not strive to be one. There are *so many* cases where HTML is a bit malformed that it takes more [...]

2009-06-04 R. David Murray
R. David Murray <rdmur...@bitdance.com> added the comment:

In doing web scraping, I started using BeautifulSoup precisely because it was very lenient in what HTML it accepted (I haven't written such an app for a while, so I'm not sure what BeautifulSoup currently does... I thought I heard it was now [...]

2009-06-04 Georg Brandl
Georg Brandl <ge...@python.org> added the comment:

So BeautifulSoup is using HTMLParser? That is interesting, because they claim to support broken HTML. In any case, if a quirky mode is added, it should have to be turned on explicitly by a flag.

--
resolution: wont fix ->