Hmmm, what about just implementing mmap-as-string? Then, assuming the parsing process is somewhat stream-like, the OS will take care of swapping in chunks as you need them. You don't even need anything special to support backtracking -- it's just a memory address, after all.
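To make the idea concrete, here is a minimal sketch in Python rather than Perl 6, chosen only because Python's `re` module can match directly against an `mmap` object. The names `RECORD` and `iter_records` are hypothetical, and the FASTA-like record shape (a `>` header line followed by sequence lines) is an assumption for illustration, not a full FASTA grammar.

```python
import mmap
import re

# Assumed record shape: '>' + header line, then sequence text up to the next '>'.
RECORD = re.compile(rb">(?P<header>[^\n]*)\n(?P<seq>[^>]*)")


def iter_records(path):
    """Yield (header, sequence) pairs, matching against the mapped file itself.

    The file is never read into a Python string; the OS pages data in as the
    regex engine walks the mapping. Note that mmap refuses to map an empty file.
    """
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in RECORD.finditer(mm):
                header = match.group("header").decode("ascii")
                # Join the wrapped sequence lines into one string.
                seq = match.group("seq").replace(b"\n", b"").decode("ascii")
                yield header, seq
```

Something like `for header, seq in iter_records("reads.fa"): ...` would then walk the file one record at a time, where `reads.fa` is a placeholder path. The regex engine is free to backtrack within the mapping, since every offset stays addressable.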
-Martin

On Thu, 14 Aug 2014, Fields, Christopher J wrote:

> Yeah, I'm thinking of a Cat-like class that would chunkify the data and check
> for matches.
>
> The main reason I would like to stick with a consistent grammar-based
> approach is I have seen many instances in BioPerl where a parser is
> essentially rewritten based on its purpose (full parsing, lazy parsing,
> indexing of flat files, adding to a persistent data store, etc). Having a
> way to both parse a full grammar but also subparse for a specific token/rule
> is very handy, and when Cat comes around even more so.
>
> Chris
>
> Sent from my iPad
>
> > On Aug 14, 2014, at 6:40 AM, "Carl Mäsak" <cma...@gmail.com> wrote:
> >
> > I was going to pipe in and say that I wouldn't wait around for Cat,
> > I'd write something that reads chunks and then parses that. It'll be a
> > bit more code, but it'll work today. But I see you reached that
> > conclusion already. :)
> >
> > Lately I've found myself writing more and more grammars that parse
> > just one line of some input. Provided that the same action object gets
> > attached to the parse each time, that's an excellent place to store
> > information that you want to persist between lines. Actually, action
> > objects started to make a whole lot more sense to me after I found
> > that use case, because it takes on the role of a session/lifetime
> > object for the parse process itself.
> >
> > // Carl
> >
> > On Wed, Aug 13, 2014 at 3:19 PM, Fields, Christopher J
> > <cjfie...@illinois.edu> wrote:
> >> On Aug 13, 2014, at 8:11 AM, Christopher Fields <cjfie...@illinois.edu>
> >> wrote:
> >>
> >>>> On Aug 13, 2014, at 4:50 AM, Solomon Foster <colo...@gmail.com> wrote:
> >>>>
> >>>> On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J
> >>>> <cjfie...@illinois.edu> wrote:
> >>>>> I have a fairly simple question regarding the feasibility of using
> >>>>> grammars with commonly used biological data formats.
> >>>>>
> >>>>> My main question: if I wanted to parse() or subparse() very large files
> >>>>> (not unheard of to have FASTA/FASTQ or other similar data files exceed
> >>>>> 100’s of GB) would a grammar be the best solution? For instance, based
> >>>>> on what I am reading the semantics appear to be greedy; for instance:
> >>>>>
> >>>>> Grammar.parsefile($file)
> >>>>>
> >>>>> appears to be a convenient shorthand for:
> >>>>>
> >>>>> Grammar.parse($file.slurp)
> >>>>>
> >>>>> since Grammar.parse() works on a Str, not an IO::Handle or Buf. Or am I
> >>>>> misunderstanding how this could be accomplished?
> >>>>
> >>>> My understanding is it is intended that parsing can work on Cats
> >>>> (hypothetical lazy strings) but this hasn't been implemented yet
> >>>> anywhere.
> >>>>
> >>>> --
> >>>> Solomon Foster: colo...@gmail.com
> >>>> HarmonyWare, Inc: http://www.harmonyware.com
> >>>
> >>> Yeah, that’s what I recall as well. I see very little in the specs re:
> >>> Cat unfortunately.
> >>>
> >>> chris
> >>
> >> Ah, never mind. I did a search of the IRC channel and found it’s
> >> considered to be a ‘6.1’ feature:
> >>
> >> http://irclog.perlgeek.de/perl6/2014-07-06#i_8978974
> >>
> >> It is mentioned a few times in the specs, I’m guessing based on where it’s
> >> thought to fit in best. For the moment the proposal is to run grammar
> >> parsing on sized chunks of the input data, which might be how Cat would be
> >> implemented anyway.
> >>
> >> chris
> >>
>
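
For comparison, the chunk-at-a-time approach from the quoted thread (parse one line per call, with a single long-lived object carrying state between calls) might look roughly like the sketch below. It is in Python, not Perl 6, and `Actions`, `HEADER`, and `parse_stream` are hypothetical names; `Actions` only stands in for the role Carl describes for an action object and is not the Perl 6 grammar API.

```python
import io
import re

# Assumed header shape: '>' followed by an identifier (anything up to whitespace).
HEADER = re.compile(r"^>(\S+)")


class Actions:
    """Session-like object that persists across per-line parses."""

    def __init__(self):
        self.lengths = {}    # record id -> total sequence length seen so far
        self.current = None  # id of the record currently being read

    def line(self, text):
        match = HEADER.match(text)
        if match:
            # A new record starts; remember its id for the following lines.
            self.current = match.group(1)
            self.lengths[self.current] = 0
        elif self.current is not None:
            # A sequence line; only its length is kept, so memory stays bounded.
            self.lengths[self.current] += len(text.strip())


def parse_stream(handle, actions):
    """Feed the handle to the actions object one line at a time."""
    for line in handle:
        actions.line(line)
    return actions


result = parse_stream(io.StringIO(">a demo\nACGT\nAC\n>b\nG\n"), Actions())
print(result.lengths)
```

Only one line is held in memory at a time, at the cost of each per-line parse seeing nothing beyond its own line except what the shared object remembers.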