Have you tried either of the non-stdlib re replacements?  I note there
are at least two: re2 and regex.  A monkey-patching wrapper around
either of those seems like it would be easier to distribute than a
patched version of Python.
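A minimal sketch of that wrapper idea, assuming the third-party "regex"
package (pip install regex) is largely re-compatible, which I believe it
aims to be — alias it into sys.modules before PLY is imported so that
PLY's master pattern compiles without the stdlib group limit:

```python
import sys

# Hedged sketch: install the third-party "regex" package as a drop-in
# replacement for stdlib "re".  Anything that does "import re" AFTER this
# runs (e.g. ply.lex) will get "regex" instead.
try:
    import regex
    sys.modules['re'] = regex
except ImportError:
    pass  # package not installed; stdlib re stays in place

import re  # resolves to "regex" when the patch above succeeded
```

Whether PLY actually works unmodified on top of regex is an assumption
you'd want to test against your token rules before deploying it.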

Besides the 100 groups limit, re also has limitations on total regex
length depending on compiled character width.
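For reference, the split-and-retry workaround that PLY applies (described
further down in the thread) can be sketched in pure stdlib re — the names
compile_in_chunks and match_token are mine, not PLY's, and the point is
that each token now potentially tries several compiled masters, which is
where the slowdown comes from:

```python
import re

def compile_in_chunks(patterns, max_groups=100):
    """Compile (name, pattern) token rules into several master regexes,
    each staying under max_groups capturing groups."""
    chunks, current, groups = [], [], 0
    for name, pat in patterns:
        # each rule adds one named group plus any groups inside the pattern
        n = 1 + re.compile(pat).groups
        if current and groups + n > max_groups:
            chunks.append(current)
            current, groups = [], 0
        current.append((name, pat))
        groups += n
    if current:
        chunks.append(current)
    return [re.compile('|'.join('(?P<%s>%s)' % (n, p) for n, p in chunk))
            for chunk in chunks]

def match_token(masters, text, pos=0):
    """Try each master regex in order; the first match wins."""
    for master in masters:
        m = master.match(text, pos)
        if m:
            return m.lastgroup, m.group()
    return None
```

With two masters instead of one, every token in the "wrong" chunk pays
for a failed match first — consistent with the 1.5x parse slowdown
reported below.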

On Mon, Aug 20, 2012 at 11:45 AM, Eugene Voytitsky <[email protected]> wrote:
> Thanks, Albert.
>
> It seems that I have really only 2 options:
> 0. Live with it, or
> 1. Recompile Python without the check for that limitation. (And you
> are right – I should ask the Python community. But I decided to share
> the info here.)
>
> Other options don't suit, because a lot of work has already been done
> and the code is already in production.
>
>
>
> On 20.08.12 18:18, A.T.Hofkamp wrote:
>>
>> On 08/17/2012 02:52 PM, viy wrote:
>>>
>>> Hi all, jfyi
>>>
>>> I've added just one token to my lexer rules and stuck in 100 groups
>>> limit in
>>> python re
>>>
>>> http://stackoverflow.com/questions/478458/python-regular-expressions-with-more-than-100-groups
>>>
>>>
>>>
>>> PLY has workaround in its code - when your master re exceeds 100
>>> groups, PLY
>>> catches AssertionError from python, splits master re into parts and
>>> retries.
>>>
>>> All works smoothly, but in my case my unit tests suite became 10x slower.
>>> Single parsing is about 1.5x slower.
>>>
>>> The solution is obvious - to get rid of the python limitation.
>>> Does anyone know the best way to do so?
>>
>>
>> Re-implement RE? :D
>> Much happiness would spread throughout the Python community, I am sure :)
>>
>>
>> Other solutions include
>>
>> 0. Live with it. Other solutions may cost more time than you are ever
>> going to save.
>>
>> 1. DIY: You can easily define your own scanner, using arbitrary Python
>> code. Just make sure you match the interface. String scanning is
>> relatively easy; it just takes a lot of code.
>>
>> 2. A long time ago (several years at least), someone wrote a Lex
>> framework. I forget the details, but the mailing-list archive or
>> Google can probably help you. IIRC, it was a true lex, and took a
>> different approach than using RE.
>>
>> 3. More exotic solutions like writing a scanner C extension (generated
>> with lex/flex) are also possible.
>>
>> 4. Even more exotic stuff like generating a DFA somehow, and
>> implementing that in Python can be done.
>>
>> 5. Other Python parser generators may have better solutions (I somewhat
>> doubt it, but it should be easy enough to scan through them, checking
>> how the scanner works).
>>
>>
>> Good luck
>>
>> Albert
>>
>
>
> --
> Best regards,
> Eugene Voytitsky
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "ply-hack" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
>



-- 
Joseph Tate
Personal e-mail: jtate AT dragonstrider DOT com
Web: http://www.dragonstrider.com
