Jim Jewett <jimjjew...@gmail.com> added the comment:

It sounds like this is a case where the docs should mention an external 
library; perhaps something like changing the intro of 
http://docs.python.org/dev/library/html.parser.html from:

"""
19.2. html.parser — Simple HTML and XHTML parser
Source code: Lib/html/parser.py

This module defines a class HTMLParser which serves as the basis for parsing 
text files formatted in HTML (HyperText Mark-up Language) and XHTML.
"""

to:


"""
19.2. html.parser — Simple HTML and XHTML parser
Source code: Lib/html/parser.py

This module defines a class HTMLParser which serves as the basis for parsing 
text files formatted in HTML (HyperText Mark-up Language) and XHTML.  

Note that mainstream web browsers also attempt to repair invalid markup; the 
algorithms for this can be quite complex, and are evolving too quickly for the 
Python release cycle.  Applications that handle arbitrary web pages should 
consider using third-party modules.  The Python version of html5lib 
(http://code.google.com/p/html5lib/) is being developed in parallel with the 
HTML standard itself, and serves as a reference implementation.
"""

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue14538>
_______________________________________