Hi Eric,
> For information, removing the "str('<%').absent?" from the :text rule and
> parsing the same string takes about 0.5 seconds.
>
> For erb like parsers, having long plain text is quite usual, and I was
> wondering what would be the most efficient way to parse it?
I am going to answer this on two levels. Both levels are implemented in
the experiment 'optimizer.rb' in master. [1]
1) Currently, what I would suggest is to capture the pattern
(str('<%').absent? >> any).repeat(1)
in a custom parser atom and use that instead of the above. This is
illustrated by 'AbsentParser' in the referenced source code and already
makes the ERB parser quite a bit faster.
In fact, there is really no limit to what custom parser atoms can do -
parslet is exceptionally easy to extend that way. Unfortunately no one
is doing it.
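
To illustrate why collapsing that pattern into a single atom helps, here
is a minimal sketch in plain Ruby. It deliberately does not use parslet's
API, and it is not the 'AbsentParser' from the experiment; the method
names text_per_char and text_single_scan are made up for this example.
The first method mimics what the combinator version does (check for the
delimiter at every position, consume one character), the second does one
search and consumes everything in front of the delimiter.

```ruby
# Slow path: mimics (str('<%').absent? >> any).repeat(1) by testing for the
# delimiter at every position and advancing one character at a time.
# Returns the matched text, or nil if not even one character matches.
def text_per_char(input, pos, delimiter = '<%')
  start = pos
  pos += 1 while pos < input.length && input[pos, delimiter.length] != delimiter
  pos > start ? input[start...pos] : nil
end

# Fast path: a single search for the delimiter, then one slice.
# This is roughly the work a custom atom could do in place of the loop above.
def text_single_scan(input, pos, delimiter = '<%')
  stop = input.index(delimiter, pos) || input.length
  stop > pos ? input[pos...stop] : nil
end
```

Both methods accept the same input and return the same text; only the
amount of work per character differs. A real custom atom would wrap the
second idea in whatever interface parslet expects from an atom.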
2) In the near future, the pattern will be to write an optimizer that
optimizes the parser ahead of time. Using the optimized parser and the
'AbsentParser' atom from solution 1) will then make parsing faster
without making the parser description more complex.
Maybe someone will even jump at the possibility of implementing a
parslet-optimizer that generalizes the idea and implements a standard
set of optimisations.
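
As a rough sketch of what such an ahead-of-time rewrite could look like:
the code below walks a small parser tree and replaces the "absent, then
any, repeated at least once" shape with a single node. The node types
(Str, Absent, Sequence, Repeat, UntilStr) and the ANY marker are
hypothetical Structs invented for this example. They are not parslet
classes, and the experiment in the repository may work quite differently.

```ruby
# Hypothetical stand-ins for parser atoms (NOT parslet classes).
Str      = Struct.new(:text)        # match a literal string
Absent   = Struct.new(:atom)        # negative lookahead
Sequence = Struct.new(:atoms)       # a >> b
Repeat   = Struct.new(:atom, :min)  # atom.repeat(min)
UntilStr = Struct.new(:text)        # optimized: consume until text appears
ANY      = :any                     # match any single character

# True for the shape Sequence[Absent(Str), ANY].
def absent_then_any?(node)
  node.is_a?(Sequence) && node.atoms.length == 2 &&
    node.atoms[0].is_a?(Absent) && node.atoms[0].atom.is_a?(Str) &&
    node.atoms[1] == ANY
end

# Rewrite the tree bottom-up, leaving unknown shapes untouched.
def optimize(node)
  case node
  when Repeat
    inner = optimize(node.atom)
    if node.min == 1 && absent_then_any?(inner)
      UntilStr.new(inner.atoms[0].atom.text)
    else
      Repeat.new(inner, node.min)
    end
  when Sequence
    Sequence.new(node.atoms.map { |a| optimize(a) })
  when Absent
    Absent.new(optimize(node.atom))
  else
    node
  end
end

# The :text rule from the question, expressed with the stand-in nodes.
text_rule = Repeat.new(Sequence.new([Absent.new(Str.new('<%')), ANY]), 1)
optimized = optimize(text_rule)
```

The point of the design is that the grammar author keeps writing the
readable combinator form, and the rewrite swaps in the faster node
before any parsing happens.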
1) is available to programmers today, even with the released version of
parslet. 2) will be implemented and available in the next version (or
the one after that, depending on other factors).
Please note that 2) can be implemented using almost exclusively the
public and semi-public API of parslet. Nothing stops a motivated
programmer from doing it themselves.
And in the far future, we might even compile down to C... Either way:
Here's your answer. Hope it makes any sense at all.
regards,
kaspar
[1] https://github.com/kschiess/parslet/blob/872611321c4af390c8b89e5d54b24613b4280fba/experiments/optimizer.rb