In a message of Mon, 25 Aug 2014 03:20:55 -0400, Mike Kaplinskiy writes:
>Hey folks,
>
>One of the projects I'm working on in CPython is becoming a little CPU
>bound and I was hoping to use pypy. One problem though - one of the pieces
>uses the regex library (which claims to be CPython's re-next). Running
>regex through cpyext works, but is deadly slow.
>
>From reading the docs it seems like I have a few options:
> - rewrite all of regex in Python - seems like a bad idea
> - rewrite regex to be non-python specific & use cppyy or cffi to interface
>with it. I actually looked into this & unfortunately the CPython API seems
>quite deep in there.
> - get rid of the dependency somehow. What I'm missing are named lists
>(basically "L<a>", a=["1","2"] will match 1 or 2). Unfortunately creating
>one really long re string is out of the question - I have not seen
>compile() finish with that approach. Writing a custom DFA could be on the
>table, but I was hoping to avoid that error prone step.
> - somehow factor out the part using regex and keep using CPython for it.
> - add the missing functionality to pypy's re. This seems like the path of
>least resistance.
>
>I've started looking into the sre module and it looks like quite a few bits
>(parsing & compiling to byte code mostly) are reused from CPython. I would
>have to change some of those bits. My question is then - is there any hope
>of getting these changes upstream then? Do stdlib pieces have a "no touch"
>policy?
>
>Thanks,
>Mike.

Do you know about https://pypi.python.org/pypi/regex ?

If I were you, I would try to get the behaviour you want put into the
new replacement version -- which would, of course, be easiest if you
contributed the code. Then we can see about having pypy do the same ...

Laura
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev
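
For readers unfamiliar with the "named lists" mentioned in the quoted message: the idea is to match any one of a given list of literal strings at the current position. The sketch below illustrates that behaviour in pure Python using only the standard library, so it also runs under PyPy. It is an assumption-laden illustration, not how the regex library or PyPy's sre implements anything: the names `build_trie`, `match_named_list` and `find_all` are hypothetical, and it always prefers the longest matching entry, which may differ from a regex engine's choice.

```python
def build_trie(words):
    """Build a nested-dict trie; the key None marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def match_named_list(trie, text, pos=0):
    """Return the longest list entry that starts at text[pos], or None."""
    node = trie
    best = None
    i = pos
    while True:
        if None in node:
            # A complete entry ends here; remember it and keep looking
            # for a longer one.
            best = text[pos:i]
        if i >= len(text) or text[i] not in node:
            return best
        node = node[text[i]]
        i += 1


def find_all(trie, text):
    """Yield (start, matched_string) for non-overlapping matches."""
    pos = 0
    while pos < len(text):
        hit = match_named_list(trie, text, pos)
        if hit:
            yield pos, hit
            pos += len(hit)
        else:
            pos += 1


if __name__ == "__main__":
    names = build_trie(["1", "2", "12"])
    print(list(find_all(names, "a1b2c12")))
```

Building the trie takes time proportional to the total length of the list entries, and each lookup walks at most the length of the longest entry, so the cost of compiling one very long alternation is avoided. It only covers the literal-list part; combining it with the rest of a pattern is left open.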