Re: [Haskell-cafe] newbie question about list performance

Jules Bean Sun, 28 Oct 2007 23:08:53 -0800

John Lato wrote:

I'm working
with moderate-sized files (tens to hundreds of MBs) that have some
ascii header data followed by a bunch of 32-bit ints.

but I don't know if [Int32] is actually the best choice.  It seems to me
that something like a lazy list of strict arrays (analogous to a lazy
bytestring) would be better.

Depends on your data access pattern. If you access the words strictly
linearly, from the beginning of the file to the end, and that's all,
then [Int32] is absolutely fine. A list is a data-structure equivalent
of a for loop; it's the correct structure if you are dealing with things
linearly or nearly-linearly. If you were using adjacent words together,
that would be fine too (as in, e.g., zip xs (tail xs)).
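As a rough sketch of the linear pattern described here (the function names sumWords and deltas are made up for illustration, and the list is assumed to have been parsed from the file already), both a single pass and an adjacent-words pass consume the list front to back, so cells that have been consumed can be garbage-collected as long as nothing else holds on to the head of the list:

```haskell
import Data.Int (Int32)
import Data.List (foldl')

-- Hypothetical helper: sum every word in one strict left-to-right pass.
-- The accumulator is an Int so the total does not wrap at 32 bits.
sumWords :: [Int32] -> Int
sumWords = foldl' (\acc w -> acc + fromIntegral w) 0

-- Hypothetical helper: difference between each word and its successor.
-- This is the zip xs (tail xs) idiom; drop 1 is used instead of tail
-- only to avoid calling a partial function.
deltas :: [Int32] -> [Int32]
deltas xs = zipWith subtract xs (drop 1 xs)
```

For example, deltas applied to a list of three words yields the two successive differences, and applied to an empty list yields an empty list.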

If your data access pattern is more scattered or random-access in style,
then [Int32] does not scale well to 10s of MBs. If you keep the data
around, the overhead for [] is inappropriate (around 600-800% memory
usage overhead on [Int32]) and its performance guarantees are not good
either, for random access. In this case, as a first approximation, I
would be inclined to try a library which is simply backended onto lazy
bytestring. For example, the 'index' operation to fetch a single word
would fetch four bytes and bit-twiddle them into a word. If that doesn't
give the high speed you're after, then perhaps something *like* LBS,
i.e. a ForeignPtr behind the scenes, but directly accessing
word-at-a-time.
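A minimal sketch of such an 'index' operation over a lazy bytestring might look like the following. The name indexInt32BE is hypothetical, and the code assumes big-endian words and that the ASCII header has already been dropped from the front of the bytestring; neither assumption comes from the original question.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Bits (shiftL, (.|.))
import Data.Int (Int32, Int64)
import Data.Word (Word32)

-- Hypothetical helper: fetch the i-th 32-bit word (0-based).
-- Assumes big-endian byte order and a header-free payload.
-- Returns Nothing if i is negative or fewer than four bytes remain.
indexInt32BE :: BL.ByteString -> Int64 -> Maybe Int32
indexInt32BE bs i
  | i < 0               = Nothing
  | BL.length chunk < 4 = Nothing
  | otherwise           = Just (fromIntegral (BL.foldl' step (0 :: Word32) chunk))
  where
    -- The four bytes that make up word i.
    chunk = BL.take 4 (BL.drop (4 * i) bs)
    -- Shift the accumulated bytes up and append the next byte.
    step acc b = (acc `shiftL` 8) .|. fromIntegral b
```

Note that BL.drop still has to walk the chunk list to reach the offset, so this is not constant-time indexing; it only avoids the per-element overhead of a list.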


Jules
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
