Re: [pypy-dev] HTMLParser compatibility with cPython 2.7.3

Amaury Forgeot d'Arc Mon, 18 Jun 2012 05:17:34 -0700

2012/6/18 Robert Zaremba <[email protected]>

> Hi, I would like to import changes from:
> The problem is that HTMLParser from 2.7.2 is not lenient and tends to throw
> exceptions when an HTML document is not well formed:
> http://bugs.python.org/issue13987
>
> This often triggers an exception from BeautifulSoup, which gets a great
> speedup when used from PyPy with the HTMLParser from the stdlib:
>    "RuntimeWarning: Python's built-in HTMLParser cannot parse the given
> document. This is not a bug in Beautiful Soup. The best solution is to
> install an external parser (lxml or html5lib), and use Beautiful Soup with
> that parser. See
> http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
> for help."
>
> However, lxml is not compatible with PyPy, and html5lib is slow.
>
> Can I port HTMLParser.py from Python 2.7.3 to PyPy?
>


In general, no, unless you port all the rest to 2.7.3 as well.
There is already work in progress for this, in the stdlib-2.7.3 branch.

It's almost finished (and definitely worth a try); there are nightly
builds there (only 32-bit Linux for the moment):
http://buildbot.pypy.org/nightly/stdlib-2.7.3/

Still missing are the implementation of randomized hashes (not enabled by
default anyway) and fixes for a couple of obscure bugs in the import system,
probably implementation details.
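A minimal sketch (not from this thread) for checking whether the interpreter you are running, for example one of the stdlib-2.7.3 nightlies, ships an HTMLParser that accepts malformed markup without raising. The names TagCollector and parses_leniently, and the sample markup, are hypothetical; whether any particular 2.7.2-era parser rejects this exact sample is an assumption, not something confirmed here. It imports from HTMLParser on Python 2 and from html.parser on Python 3.

```python
import platform
import sys

try:  # Python 2: the stdlib module is named HTMLParser
    from HTMLParser import HTMLParser
except ImportError:  # Python 3: the module was renamed to html.parser
    from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: records the name of every start tag it sees."""

    def __init__(self):
        # Explicit base call: HTMLParser is an old-style class on Python 2,
        # so super() would not work there.
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_leniently(markup):
    """Return True if the stdlib HTMLParser consumes `markup` without raising."""
    parser = TagCollector()
    try:
        parser.feed(markup)
        parser.close()
    except Exception:  # e.g. HTMLParseError raised by strict, older parsers
        return False
    return True


if __name__ == "__main__":
    # Made-up sample of sloppy markup (bad attribute, unclosed tags).
    sample = '<div class="a"b><p>unclosed <a href=x>link</div>'
    print(platform.python_implementation(), sys.version.split()[0])
    print("lenient:", parses_leniently(sample))
```

Catching Exception broadly keeps the check portable, since the strict parser's HTMLParseError does not exist on newer Pythons.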

-- 
Amaury Forgeot d'Arc

_______________________________________________
pypy-dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/pypy-dev
