John Lato wrote:
I'm working
with moderate-sized files (tens to hundreds of MBs) that have some
ascii header data followed by a bunch of 32-bit ints.

but I don't know if [Int32] is actually the best choice.  It seems to me
that something like a lazy list of strict arrays (analogous to a lazy
bytestring) would be better.

Depends on your data access pattern. If you access the words strictly linearly, from the beginning of the file to the end, and that's all, then [Int32] is absolutely fine. A list is a data-structure equivalent of a for loop; it's the correct structure if you are dealing with things linearly or nearly-linearly. If you were using adjacent words together, that would be fine too (as in, e.g., zip xs (tail xs)).

If your data access pattern is more scattered or random-access in style, then [Int32] does not scale well to 10s of MBs. If you keep the data around, the overhead for [] is inappropriate (around 600-800% memory usage overhead on [Int32]) and its performance guarantees are not good either, for random access. In this case, as a first approximation, I would be inclined to try a library which simple backended onto lazy bytestring. For example the 'index' operation to fetch a single word would fetch four bytes and bit-twiddle them into a word. If that doesn't give the high speed you're after, then perhaps something *like* LBS, i.e. foreignptr behind the scenes, but directly accessing word-at-a-time.

Jules
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to