Claus Reinke wrote:
> At first guess it sounds like you're holding onto too much, if not the
> whole stream then perhaps bits within each chunk.
It is possible.
I split the string into lines, then map some functions over each line to
parse the data, and finally call toU to convert the result to a UArr.
> Just to make sure (code fragments or, better, reduced examples would
> make it easier to see what the discussion is about): are you forcing
> the UArr to be constructed before putting it into the Map?
parse handle = do
    contents <- S.hGetContents handle
    let v = map singleton' $ ratings contents
    let m = foldl1' (unionWith appendU) v
    v `seq` return $! m
  where
    -- Build an IntMap with a single movie rating
    singleton' :: (Word32, Word8) -> MovieRatings
    singleton' (id, rate) =
        singleton (fromIntegral id) (singletonU $ pairS (id, rate))
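One thing that may matter here (and may be the answer to the question
above): Data.IntMap is strict in its spine but lazy in its values, so
`return $! m` only forces the outermost constructor of the map; the
`appendU` results inside it stay unevaluated, each suspension holding on
to its input arrays. A minimal sketch of forcing every element, with a
hypothetical helper `forceElems` (it assumes that evaluating a UArr to
WHNF is enough to build the flat array; if it is not, forcing something
like `lengthU` of each element should do):

import qualified Data.IntMap as IM
import Data.List (foldl')

-- Hypothetical helper: force each element of the map to WHNF,
-- then hand back the original map.
forceElems :: IM.IntMap a -> IM.IntMap a
forceElems m = foldl' (\done v -> v `seq` done) () (IM.elems m) `seq` m

With that, `parse` would end with `return $! forceElems m` instead of
`return $! m`.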
This function gets called for each file, with:
    r <- mapM parse' [1..17770]
    let movieRatings = foldl1' (unionWith appendU) r
The `ratings` function parses the file contents and returns one
(movie id, rating) tuple per line. For each tuple I build a singleton
IntMap and merge them together; the per-file IntMaps are then further
merged in the main function, as shown above.
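The same laziness concern applies to this second merge: `foldl1'` only
forces each intermediate IntMap to WHNF, so the `appendU` thunks produced
by every `unionWith` can survive, together with the arrays they reference,
until the final map is demanded. A sketch, reusing the hypothetical
`forceElems` from above (and otherwise the same imports as the code
above), that forces the elements after each union:

import Data.List (foldl1')

-- Merge the per-file maps, forcing the concatenated arrays as we go
-- instead of letting appendU suspensions pile up across files.
mergeAll :: [MovieRatings] -> MovieRatings
mergeAll = foldl1' (\acc m -> forceElems (unionWith appendU acc m))

so that main would use `let movieRatings = mergeAll r`. Whether this is
really where the space goes is of course something only a heap profile
can confirm.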
NOTE that memory usage is the same even if I remove the array
concatenation. There are 100,000,000 ratings in total, so I end up
creating 100,000,000 arrays containing only one element each. However,
memory usage already reaches 1 GB after just 800 of the 17,770 files.
The data type is:

    type Rating = Word32 :*: Word8
    type MovieRatings = IntMap (UArr Rating)  -- UArr from uvector
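Independently of the laziness question, 100,000,000 one-element UArrs
plus the intermediate copies made by appendU carry a lot of overhead on
their own: every singleton is a separate heap object, and appendU has to
copy both of its arguments. Just as a sketch of an alternative (this is
not the posted code; it assumes the types above, toU from uvector's
Data.Array.Vector, and plain lists as the per-movie accumulator), one
could build each UArr only once, at the end:

import qualified Data.IntMap as IM
import Data.Array.Vector (UArr, toU)

-- Accumulate ordinary lists per movie while parsing...
collect :: [(Int, Rating)] -> IM.IntMap [Rating]
collect kvs = IM.fromListWith (++) [ (movie, [r]) | (movie, r) <- kvs ]

-- ...and convert each list to a flat unboxed array exactly once.
finalize :: IM.IntMap [Rating] -> MovieRatings
finalize = IM.map toU

Whether the list cells end up cheaper than the singleton arrays for 100
million ratings is again something a heap profile would have to show.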
Code is here: http://haskell.mperillo.ath.cx/netflix-0.0.1.tar.gz
but it is an old version (where I used lazy ByteString).
Thanks,
Manlio Perillo