Thanks Edgar and Jérémie, this indeed seems to be the right track. I just hope that repeated use of input_char is not 10-100X slower than input_line :o).

ph.
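On the input_char concern: an OCaml in_channel is buffered by the runtime, so repeated input_char calls mostly read from an in-memory buffer rather than issuing one system call per character. Its actual overhead compared with input_line is still worth measuring on real data. What follows is a minimal sketch, using only the standard library (OCaml >= 4.08 for Fun.protect), of the sliding-window enumeration discussed in the quoted thread. It skips to a given line with input_char, then calls a callback on every k-length window (0,k-1), (1,k), ..., keeping only a k-byte ring buffer in memory. The names skip_line, iter_windows and windows_of_line are hypothetical and not part of Batteries. The sketch also passes windows to a callback instead of building a Substring.t Enum.t.

```ocaml
(* Sketch: enumerate the k-length windows (0,k-1), (1,k), ... of one line
   of a file, reading it with input_char and keeping only a k-byte ring
   buffer in memory. Function names are hypothetical, not a Batteries API. *)

(* Consume characters up to and including the next '\n'.
   Raises End_of_file if the channel ends first. *)
let rec skip_line ic =
  if input_char ic <> '\n' then skip_line ic

(* Call [f pos w] for every window [w] of length [k] starting at byte
   offset [pos] of the current line, stopping at '\n' or end of file. *)
let iter_windows ic k f =
  if k <= 0 then invalid_arg "iter_windows: k must be positive";
  let ring = Bytes.create k in
  let filled = ref 0 in (* bytes of the line read so far *)
  let rec loop () =
    match input_char ic with
    | exception End_of_file -> ()
    | '\n' -> ()
    | c ->
      (* The byte at line offset p is stored at ring index p mod k. *)
      Bytes.set ring (!filled mod k) c;
      incr filled;
      if !filled >= k then begin
        (* The oldest byte of the current window sits at index !filled mod k. *)
        let start = !filled mod k in
        let w = String.init k (fun i -> Bytes.get ring ((start + i) mod k)) in
        f (!filled - k) w
      end;
      loop ()
  in
  loop ()

(* Open [path], skip to line [line_no] (0-based) and enumerate its windows.
   The channel is closed even if [f] or skip_line raises. *)
let windows_of_line path line_no k f =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
    for _i = 1 to line_no do skip_line ic done;
    iter_windows ic k f)
```

For example, `windows_of_line "big.txt" 41 20 (fun pos w -> ...)` would visit the 20-byte windows of the 42nd line. Each window is copied with String.init. Following Edgar's suggestion, feeding each character directly to an incremental matcher inside the reading loop would avoid those copies altogether.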
2012/3/16 Edgar Friendly <thelema...@gmail.com>
> So given a large file and a line number, you want to:
> 1) extract that line from the file
> 2) produce an enum of all k-length slices of that line?
> 3) match each slice against your regexp set to produce a list/enum of
> substrings that match the regexps?
> Without reading the whole line into memory at once.
>
> I'm with Dimino on the right solution - just use a matcher that works
> incrementally, feed it one byte at a time, and have it return a list of
> match offsets. Then work backwards from these endpoints to figure out
> which substrings you want.
>
> There shouldn't be a reason to use substrings (0,k-1) and (1,k) - it
> should suffice to use (0,k-1) and (k,2k-1) with an incremental matching
> routine.
>
> E.
>
> On Fri, Mar 16, 2012 at 10:48 AM, Philippe Veber <philippe.ve...@gmail.com> wrote:
>
>> Thank you Edgar for your answer (and also Christophe). It seems my
>> question was a bit misleading: actually I target a subset of regexps whose
>> matching is really trivial, so this is no worry for me. I was more
>> interested in how to access a large line in a file by chunks of fixed
>> length k. For instance, how to build a [Substring.t Enum.t] from some line
>> in a file, without building the whole line in memory. This enum would yield
>> the substrings (0,k-1), (1,k), (2,k+1), etc., without doing too many
>> string copy/concat operations. I think I can do it myself but I'm not too
>> confident regarding good practices on buffered reads of files. Maybe there
>> are some good examples in Batteries?
>>
>> Thanks again,
>> ph.

--
Caml-list mailing list. Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs