[issue32876] HTMLParser raises exception on some inputs

2022-01-14 Thread Irit Katriel
Irit Katriel added the comment:

Reopening to discuss what the correct behaviour should be.

----------
resolution: out of date ->
status: closed -> open
versions: +Python 3.11 -Python 2.7, Python 3.6, Python 3.7, Python 3.8

[issue32876] HTMLParser raises exception on some inputs

2022-01-14 Thread Hanno Boeck
Hanno Boeck added the comment: Now the example code raises an AssertionError(). Is that intended? I don't think that's any better. I usually wouldn't expect an HTML parser to raise any error if you pass it a string, but instead to do fault tolerant parsing. And if it's expected that some

[issue32876] HTMLParser raises exception on some inputs

2022-01-14 Thread Irit Katriel
Irit Katriel added the comment:

The error() method was removed in issue31844.

----------
resolution: -> out of date
stage: patch review -> resolved
status: open -> closed
superseder: -> HTMLParser: undocumented not implemented method

[issue32876] HTMLParser raises exception on some inputs

2021-09-09 Thread Irit Katriel

Irit Katriel added the comment:

I get a different error now:

>>> import html.parser
>>> html.parser.HTMLParser().feed("", line 1, in 
  File "/Users/iritkatriel/src/cpython-1/Lib/html/parser.py", 

[issue32876] HTMLParser raises exception on some inputs

2018-09-14 Thread Ezio Melotti

Ezio Melotti added the comment:

There are at least a couple of issues here.

The first one is the way the parser handles '' and since the parser currently checks 
for '

[issue32876] HTMLParser raises exception on some inputs

2018-09-14 Thread Ezio Melotti
Change by Ezio Melotti:

----------
keywords: +patch
pull_requests: +8724
stage: -> patch review

[issue32876] HTMLParser raises exception on some inputs

2018-08-23 Thread Berker Peksag
Berker Peksag added the comment: Issue 34480 is another relevant issue. The HTMLParser class doesn't have an error() method and it doesn't raise any exceptions, but its base class still does. I think there is a compatibility problem between html.parser.HTMLParser() and

[issue32876] HTMLParser raises exception on some inputs

2018-02-25 Thread Ezio Melotti
Change by Ezio Melotti:

----------
assignee: -> ezio.melotti

[issue32876] HTMLParser raises exception on some inputs

2018-02-19 Thread Ezio Melotti
Ezio Melotti added the comment: The HTMLParser has been updated to handle HTML5 and should never fail parsing a document, so if it raises an error it's probably a bug.
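
Editorial illustration, not part of the thread: while a parser bug of this kind is still open, a caller could guard against it defensively. The sketch below is a minimal example under stated assumptions. The names `TagCollector` and `safe_parse` are hypothetical, and catching `Exception` is a deliberately broad choice because the exception type reported in this thread differs between Python versions.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the name of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def safe_parse(markup):
    """Parse markup and return (tags, error).

    error is None when parsing finished normally; otherwise it is the
    exception raised by the parser, and tags holds whatever was collected
    before the failure.
    """
    parser = TagCollector()
    try:
        parser.feed(markup)
        parser.close()
    except Exception as exc:
        # Broad on purpose: the exception type varies with the Python version.
        return parser.tags, exc
    return parser.tags, None


if __name__ == "__main__":
    tags, error = safe_parse("<p>hi <b>there</b></p>")
    print(tags, error)
```

This only contains the failure. It does not make the parse itself fault tolerant, so the behaviour discussed in this issue still needs a fix in the parser.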

[issue32876] HTMLParser raises exception on some inputs

2018-02-19 Thread Hanno Boeck
Hanno Boeck added the comment: Actually BeautifulSoup also uses the python html parser in the backend, so it has the same problem. (It can use alternative backends, but the python parser is the default and they also describe it as "lenient", which I would interpret as "it

[issue32876] HTMLParser raises exception on some inputs

2018-02-19 Thread Steven D'Aprano
Steven D'Aprano added the comment: The stdlib HTML parser requires correct HTML. To parse broken HTML, as you find in the real world, you need a third-party library like BeautifulSoup. BeautifulSoup is much more complex (about 7-8 times as many LOC) but can handle

[issue32876] HTMLParser raises exception on some inputs

2018-02-19 Thread Serhiy Storchaka
Change by Serhiy Storchaka:

----------
nosy: +ezio.melotti

[issue32876] HTMLParser raises exception on some inputs

2018-02-19 Thread Hanno Boeck
New submission from Hanno Boeck:

I noticed that the HTMLParser will raise an exception on some inputs.
I'm not sure what the expectations here are, but given that real-world HTML 
often contains all kinds of broken content I would assume an HTMLParser to 
always try to