Jefferson Heard wrote:
> I'm using the Data.AltBinary package to read in a list of 4.8 million
> floats and 1.6 million ints. Doing so caused the memory footprint to
> blow up to more than 2gb, which on my laptop simply causes the program
> to crash. I can do it on my workstation, but I'd really rather not,
> because I want my program to be fairly portable.
>
> The file that I wrote out in packing the data structure was only 28MB,
> so I assume I'm just using the wrong data structure, or I'm using full
> laziness somewhere I shouldn't be.
The blowup is almost certainly the list representation: a lazy [Float] costs several machine words per boxed element, so millions of floats swamp the heap long before the 28MB of actual data does. Keeping the data in flat bytestrings avoids that.

Here's a quick example of how to efficiently read and write such a structure to disk, compressing and decompressing on the fly:

    $ time ./A
    Wrote 4800000 floats, and 1600000 ints
    Read 4800000 floats, and 1600000 ints
    ./A  0.93s user 0.06s system 89% cpu 1.106 total

It uses Data.Binary to provide quick serialisation, and the zlib library to compress the resulting stream. It builds the tables in memory, writes and compresses the result to disk, reads them back in, and checks we read the right number of CFloats and CInts. You'd then pass the CFloats over to your C library that needs them.

Compressing with zlib is a flourish, but it's cheap and simple, so we may as well do it. With zlib and Data.Binary, the core code just becomes:

    encodeFile "/tmp/table.gz" table
    table' <- decodeFile "/tmp/table.gz"

which transparently streams the data through zlib, onto the disk, and back. Simple and efficient.

-- Don

Here's the code:

-- some imports
import Text.Printf
import Data.Binary
import Codec.Compression.Zlib
import Control.Monad
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Base as B
import Foreign
import Foreign.C.Types

-- A simple table type for the arrays that will be easy to manipulate
data Table = Table { floats :: L.ByteString
                   , ints   :: L.ByteString }

-- Serialise this data in gzip-compressed form: a gzipping Binary instance.
instance Binary Table where
    put t = do put (compress (floats t))
               put (compress (ints   t))

    get = liftM2 Table (decompress `liftM` get)
                       (decompress `liftM` get)

-- Default sizes
floatSize = 4800000
intSize   = 1600000

-- Build a new empty table
newTable :: IO Table
newTable = do
    f <- mallocArray floatSize :: IO (Ptr CFloat)
    i <- mallocArray intSize   :: IO (Ptr CInt)

    -- fill them with data here --

    -- Convert to bytestrings, wrapping the raw buffers without copying.
    -- The no-op finalizer means the malloc'd memory is never freed,
    -- which is fine for this demo.
    bf <- B.packCStringFinalizer (castPtr f)
              (floatSize * sizeOf (undefined :: CFloat)) (return ())
    bi <- B.packCStringFinalizer (castPtr i)
              (intSize * sizeOf (undefined :: CInt)) (return ())
    return $ Table (L.fromChunks [bf]) (L.fromChunks [bi])

-- Now just build the table, serialise it, read it back in, and print the result
main = do
    table <- newTable

    -- write the data to disk, compressed with gzip as we go
    encodeFile "/tmp/table.gz" table
    draw "Wrote" table

    -- load it back in, decompressing on the fly
    table' <- decodeFile "/tmp/table.gz"
    draw "Read" table'

    -- now use 'floats' as a Ptr to pass back to C.

  where
    draw s v = printf "%s %d floats, and %d ints\n" s
                   (fromIntegral (lengthFloats v) `div` sizeOf (undefined :: CFloat))
                   (fromIntegral (lengthInts   v) `div` sizeOf (undefined :: CInt))

    lengthFloats = L.length . floats
    lengthInts   = L.length . ints
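For comparison, if you drop the compression flourish the instance shrinks to the plain byte-for-byte version below. A minimal sketch; use it instead of the gzipping instance above, not alongside it:

-- Plain instance: serialise the raw bytes with no compression.
instance Binary Table where
    put t = put (floats t) >> put (ints t)
    get   = liftM2 Table get get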
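The "-- fill them with data here --" step is left open above. One way to populate a buffer from Haskell is pokeArray; here's a minimal sketch with made-up values (the real data would come from wherever your floats do), assuming the module's imports above:

-- Hypothetical filler: writes 0, 1, 2, ... into the float buffer.
-- pokeArray comes from Foreign.Marshal.Array, re-exported by Foreign.
fillFloats :: Ptr CFloat -> IO ()
fillFloats f = pokeArray f (map fromIntegral [0 .. floatSize - 1])

You'd call it in newTable, between mallocArray and packCStringFinalizer.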
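And for the final step that main leaves as a comment: to hand the floats back to C, flatten the lazy chunks into one strict buffer and lend its pointer out. A sketch assuming a hypothetical C function process_floats, which is not part of the original post:

{-# LANGUAGE ForeignFunctionInterface #-}  -- needed at the top of the module

import qualified Data.ByteString as S      -- strict bytestrings

-- Hypothetical C entry point: name and signature are illustrative only.
foreign import ccall unsafe "process_floats"
    c_process_floats :: Ptr CFloat -> CInt -> IO ()

-- Flatten the lazy chunks and lend the buffer to C for the call's duration.
-- useAsCStringLen copies the bytes; there is a zero-copy unsafe variant if
-- the copy matters.
passFloatsToC :: Table -> IO ()
passFloatsToC t =
    S.useAsCStringLen (S.concat (L.toChunks (floats t))) $ \(p, len) ->
        c_process_floats (castPtr p)
                         (fromIntegral (len `div` sizeOf (undefined :: CFloat)))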