Stefan Behnel <stefan...@behnel.de> wrote:

> Bill Janssen, 09.12.2011 19:15:
> > I think another thing that might go into "refreshing the batteries" is a
> > feature comparison of BeautifulSoup and HTML5lib against the stdlib
> > competition, to see what needs to be added/revised. Having to switch to
> > an outside package for parsing possibly invalid HTML is a pain.
>
> Such a feature request should be worth a separate thread.
>
> Note, however, that html5lib is likely way too big to add it to the
> stdlib, and that BeautifulSoup lacks a parser for non-conforming HTML
> in Python 3, which would be the target release series for better HTML
> support. So, whatever library or API you would want to use for HTML
> processing is currently only the second question as long as Py3 lacks
> a real-world HTML parser in the stdlib, as well as a robust character
> detection mechanism. I don't think that can be fixed all that easily.

Sounds like it needs a PEP.

I'm only advocating spending some thought on what needs to be done --
whether outside libraries need to be adopted into the stdlib would be a
step after that. But understanding *why* those libraries exist and are
widely used should be a prerequisite to "refreshing" the stdlib's support.

Bill