> I'm having an issue now with input that is very large, specifically a
> document has a section with 5000+ subsections. The total input has
> over 500k lines!
>
> My problem is that parslet consumes all the RAM/CPU available  [...]
>
> I'm assuming that parslet is holding on to information so that it can
> backtrack when it doesn't match a rule. [...]

Hei Melissa,

I was kind of wondering when the first request like this would come 
along. Parslet currently will run into trouble with large inputs. Part 
of that is by design: returning one large result at the end of a 
lengthy parse means parslet has to hold on to all the intermediate 
output for the whole time.

But let's make sure that we've really run into this limitation. Is it 
possible to provide us with your parser for analysis? Can you be more 
specific (how much RAM, for instance)? How big is your document really? 
And is the information really so structured that it cannot be parsed 
with a simpler approach, like regexps for instance? (the right tool 
for the right job...) If you convert the parslet parser into a treetop 
parser, does that one manage to complete?

That it would consume 'all your CPU' is quite natural - that's called 
work ;)

Also, you are hinting that your document is really just a sequence of 
sections that you can easily split apart. That leaves me with the 
impression that there is a top layer of structure in your document 
that can be extracted using a simple str.split. If that solves your 
trouble, why look further?
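To illustrate what I mean, here is a minimal sketch. It assumes your 
sections start with a recognizable marker (I'm using "== " as a 
made-up delimiter here, and SectionParser as a placeholder for your 
actual parslet parser): split the big string first, then parse each 
small chunk on its own, so memory stays bounded by the largest section 
instead of the whole 500k-line file.

```ruby
# A huge document that is really just a flat sequence of sections.
document = <<~DOC
  == Section A
  line one
  == Section B
  line two
DOC

# Split before each section marker; each chunk is now small enough
# to hand to a parser independently.
sections = document.split(/\n(?=== )/)

results = sections.map do |chunk|
  # In your code this would be your parslet parser, e.g.:
  #   SectionParser.new.parse(chunk)
  # Stand-in "parse" for this sketch: grab the section heading.
  chunk.lines.first.strip
end
```

Each iteration only keeps one chunk and its result alive, and the GC 
can reclaim the rest between sections.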

regards,
kaspar
