Thanks Edgar and Jérémie, this indeed seems to be the right track. I just
hope that a repeated use of input_char is not 10-100X slower than
input_line :o).
ph.
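
For the record, here is a minimal sketch of what the input_char approach could look like. The names skip_lines and iter_windows are hypothetical, not taken from Batteries or the standard library. It keeps only k bytes of the line in memory (a ring buffer) and calls a function on each k-length slice. Standard in_channels are buffered by the runtime, so input_char should not cost a system call per byte, but whether it is fast enough compared to input_line is something to measure rather than assume.

```ocaml
(* Skip [n] complete lines by reading and discarding characters.
   Raises End_of_file if the file has fewer than [n] lines. *)
let skip_lines ic n =
  let remaining = ref n in
  while !remaining > 0 do
    if input_char ic = '\n' then decr remaining
  done

(* Call [f offset slice] for every k-length slice of the current line
   of [ic]. Only k bytes of the line are held at any time. *)
let iter_windows ic k f =
  if k <= 0 then invalid_arg "iter_windows: k must be positive";
  let ring = Bytes.create k in      (* last k characters, circular *)
  let window = Bytes.create k in    (* same characters, in order *)
  let count = ref 0 in              (* characters read on this line *)
  let next () =
    match input_char ic with
    | '\n' -> None
    | c -> Some c
    | exception End_of_file -> None
  in
  let rec loop () =
    match next () with
    | None -> ()
    | Some c ->
      Bytes.set ring (!count mod k) c;
      incr count;
      if !count >= k then begin
        (* The oldest character now sits at index [count mod k]. *)
        let start = !count mod k in
        Bytes.blit ring start window 0 (k - start);
        Bytes.blit ring 0 window (k - start) start;
        f (!count - k) (Bytes.to_string window)
      end;
      loop ()
  in
  loop ()
```

This still copies k bytes per slice (Bytes.to_string), so it is a starting point for benchmarking, not an optimized version. Passing the reused window buffer to f would avoid the copy, provided f does not keep a reference to it.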

2012/3/16 Edgar Friendly <thelema...@gmail.com>

> So given a large file and a line number, you want to:
> 1) extract that line from the file
> 2) produce an enum of all k-length slices of that line?
> 3) match each slice against your regexp set to produce a list/enum of
> substrings that match the regexps?
> Without reading the whole line into memory at once.
>
> I'm with Dimino on the right solution - just use a matcher that works
> incrementally, feed it one byte at a time, and have it return a list of
> match offsets.  Then work backwards from these endpoints to figure out
> which substrings you want.
>
> There shouldn't be a reason to use substrings (0,k-1) and (1,k) - it
> should suffice to use (0,k-1) and (k,2k-1) with an incremental matching
> routine.
>
> E.
>
>
>
> On Fri, Mar 16, 2012 at 10:48 AM, Philippe Veber <philippe.ve...@gmail.com> wrote:
>
>> Thank you Edgar for your answer (and also Christophe). It seems my
>> question was a bit misleading: actually I target a subset of regexps whose
>> matching is really trivial, so this is no worry for me. I was more
>> interested in how to access a large line in a file by chunks of fixed
>> length k. For instance, how to build a [Substring.t Enum.t] from some line
>> in a file, without building the whole line in memory. This enum would yield
>> the substrings (0,k-1), (1,k), (2,k+1), etc., without doing too many
>> string copy/concat operations. I think I can do it myself but I'm not too
>> confident regarding good practices on buffered reads of files. Maybe there
>> are some good examples in Batteries?
>>
>> Thanks again,
>>   ph.
>>
>>
>>
>
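
To make the incremental matching suggested above concrete, here is a rough sketch for the simplest case, a single literal pattern, using a Knuth-Morris-Pratt automaton. This is an assumption about what "trivial" regexps could mean, not the matcher Edgar or Jérémie had in mind, and the names make, feed and feed_string are hypothetical. The matcher is fed one byte at a time and reports the offset at which each match ends.

```ocaml
type matcher = {
  pattern : string;
  failure : int array;   (* KMP failure table for [pattern] *)
  mutable state : int;   (* number of pattern characters matched so far *)
  mutable pos : int;     (* number of bytes fed so far *)
}

let make pattern =
  let m = String.length pattern in
  if m = 0 then invalid_arg "make: empty pattern";
  let failure = Array.make m 0 in
  let k = ref 0 in
  for i = 1 to m - 1 do
    while !k > 0 && pattern.[i] <> pattern.[!k] do
      k := failure.(!k - 1)
    done;
    if pattern.[i] = pattern.[!k] then incr k;
    failure.(i) <- !k
  done;
  { pattern; failure; state = 0; pos = 0 }

(* Feed one byte; return [Some e] when a match ends at offset [e]. *)
let feed t c =
  while t.state > 0 && c <> t.pattern.[t.state] do
    t.state <- t.failure.(t.state - 1)
  done;
  if c = t.pattern.[t.state] then t.state <- t.state + 1;
  t.pos <- t.pos + 1;
  if t.state = String.length t.pattern then begin
    (* Fall back so that overlapping matches are still found. *)
    t.state <- t.failure.(t.state - 1);
    Some (t.pos - 1)
  end else None

(* Feed a whole chunk; return the end offsets of matches found in it. *)
let feed_string t s =
  let ends = ref [] in
  String.iter
    (fun c ->
       match feed t c with
       | Some e -> ends := e :: !ends
       | None -> ())
    s;
  List.rev !ends
```

A match ending at offset e starts at e - m + 1, where m is the pattern length, which is the "work backwards" step. Because the matcher state is carried from one call to the next, feeding the non-overlapping chunks (0,k-1), (k,2k-1), and so on finds the same matches as feeding the whole line at once.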

-- 
Caml-list mailing list.  Subscription management and archives:
https://sympa-roc.inria.fr/wws/info/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs