Yes, that looks like an even better option. I see that this is implemented in Perl 5 as File::Map, which is nicely portable.

Chris

> On Aug 16, 2014, at 7:51 AM, "Martin D Kealey" <mar...@kurahaupo.gen.nz> wrote:
>
> Hmmm, what about just implementing mmap-as-string?
>
> Then, assuming the parsing process is somewhat stream-like, the OS will take
> care of swapping in chunks as you need them. You don't even need anything
> special to support backtracking -- it's just a memory address, after all.
>
> -Martin
>
>> On Thu, 14 Aug 2014, Fields, Christopher J wrote:
>> Yeah, I'm thinking of a Cat-like class that would chunkify the data and
>> check for matches.
>>
>> The main reason I would like to stick with a consistent grammar-based
>> approach is I have seen many instances in BioPerl where a parser is
>> essentially rewritten based on its purpose (full parsing, lazy parsing,
>> indexing of flat files, adding to a persistent data store, etc). Having a
>> way to both parse a full grammar but also subparse for a specific token/rule
>> is very handy, and when Cat comes around even more so.
>>
>> Chris
>>
>> Sent from my iPad
>>
>>> On Aug 14, 2014, at 6:40 AM, "Carl Mäsak" <cma...@gmail.com> wrote:
>>>
>>> I was going to pipe in and say that I wouldn't wait around for Cat,
>>> I'd write something that reads chunks and then parses that. It'll be a
>>> bit more code, but it'll work today. But I see you reached that
>>> conclusion already. :)
>>>
>>> Lately I've found myself writing more and more grammars that parse
>>> just one line of some input. Provided that the same action object gets
>>> attached to the parse each time, that's an excellent place to store
>>> information that you want to persist between lines. Actually, action
>>> objects started to make a whole lot more sense to me after I found
>>> that use case, because it takes on the role of a session/lifetime
>>> object for the parse process itself.
>>>
>>> // Carl
>>>
>>> On Wed, Aug 13, 2014 at 3:19 PM, Fields, Christopher J
>>> <cjfie...@illinois.edu> wrote:
>>>> On Aug 13, 2014, at 8:11 AM, Christopher Fields <cjfie...@illinois.edu>
>>>> wrote:
>>>>
>>>>>> On Aug 13, 2014, at 4:50 AM, Solomon Foster <colo...@gmail.com> wrote:
>>>>>>
>>>>>> On Sat, Aug 9, 2014 at 7:26 PM, Fields, Christopher J
>>>>>> <cjfie...@illinois.edu> wrote:
>>>>>>> I have a fairly simple question regarding the feasibility of using
>>>>>>> grammars with commonly used biological data formats.
>>>>>>>
>>>>>>> My main question: if I wanted to parse() or subparse() very large files
>>>>>>> (not unheard of to have FASTA/FASTQ or other similar data files exceed
>>>>>>> 100s of GB) would a grammar be the best solution? For instance, based
>>>>>>> on what I am reading the semantics appear to be greedy; for instance:
>>>>>>>
>>>>>>> Grammar.parsefile($file)
>>>>>>>
>>>>>>> appears to be a convenient shorthand for:
>>>>>>>
>>>>>>> Grammar.parse($file.slurp)
>>>>>>>
>>>>>>> since Grammar.parse() works on a Str, not an IO::Handle or Buf. Or am I
>>>>>>> misunderstanding how this could be accomplished?
>>>>>>
>>>>>> My understanding is it is intended that parsing can work on Cats
>>>>>> (hypothetical lazy strings) but this hasn't been implemented yet
>>>>>> anywhere.
>>>>>>
>>>>>> --
>>>>>> Solomon Foster: colo...@gmail.com
>>>>>> HarmonyWare, Inc: http://www.harmonyware.com
>>>>>
>>>>> Yeah, that’s what I recall as well. I see very little in the specs re:
>>>>> Cat unfortunately.
>>>>>
>>>>> chris
>>>>
>>>> Ah, never mind. I did a search of the IRC channel and found it’s
>>>> considered to be a ‘6.1’ feature:
>>>>
>>>> http://irclog.perlgeek.de/perl6/2014-07-06#i_8978974
>>>>
>>>> It is mentioned a few times in the specs, I’m guessing based on where it’s
>>>> thought to fit in best. For the moment the proposal is to run grammar
>>>> parsing on sized chunks of the input data, which might be how Cat would be
>>>> implemented anyway.
>>>>
>>>> chris
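
For readers following the mmap-as-string suggestion in this thread, here is a rough sketch of the idea. It is written in Python only because its standard mmap module is widely available; it is not Perl 6 code and not how File::Map or a future Cat would necessarily work. The function name iter_fasta and the simplified FASTA layout (every record starts with '>' at the beginning of a line, and the file is non-empty) are assumptions made for illustration.

```python
import mmap


def iter_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file through a read-only mmap.

    Hypothetical helper: assumes a non-empty file in which every record
    starts with '>' at the beginning of a line.
    """
    with open(path, "rb") as handle, \
            mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        # The OS pages data in as these offsets advance; nothing is slurped.
        start = mapped.find(b">")
        while start != -1:
            # A record ends where the next line beginning with '>' starts.
            boundary = mapped.find(b"\n>", start)
            end = len(mapped) if boundary == -1 else boundary + 1
            record = mapped[start:end]  # copies only this one record
            header, _, body = record.partition(b"\n")
            name = header[1:].decode("ascii").strip()
            # Join the wrapped sequence lines, dropping all whitespace.
            sequence = b"".join(body.split()).decode("ascii")
            yield name, sequence
            start = -1 if boundary == -1 else boundary + 1
```

Because positions in the mapping are plain integer offsets, revisiting earlier input (the backtracking case Martin mentions) amounts to searching or slicing from a smaller offset again. A caller would loop with something like `for name, seq in iter_fasta("reads.fa"): ...`, where the file name is a placeholder.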