That helps to make things clearer, I think. One issue is
the nature of Maps (strict in keys, non-strict in values).

- neither singleton nor unionWith are strict in the Map values, so
   nothing here forces the evaluation of rate or construction    of UArr

But, as I have written, in one of my tests I also tried rnf to force evaluation:
rnf v `seq` rnf m `seq` return m
Isn't this sufficient?

It will force the Map that results from the repeated unioning,
but does not ensure that this is done in an efficient way.

A standard trick to keep Map values evaluated by construction
is to make the availability of keys dependent on their values, eg
(singleton key) $! value. That won't help with unionWith and
the appendUs, but it should allow the source string references
to be dropped early, as the singletons are constructed.

Tried; but, even using union instead of unionWith, the memory grows fast as before.

Strange. I built myself a small wrapper to make your code
fragment compilable, and just replacing (unionWith appendU)
with (union) makes a drastic difference - as it should.

It is rather annoying that Data.IntMap doesn't provide a strict
form of unionWith or insertWith (Data.Map does at least provide
insertWith'). But we can define our own, at the cost of an extra
lookup. We can then foldl' that insertWith' directly over the ratings list, bypassing the non-strict parts of the Data.IntMap API (see
code below).

Claus (who still thinks that all Maps should be parameterized
   over their key-value pair type constructor, so that the default
   non-strict Maps would result from using non-strict pairs

   type IntMap = IntMapP (,)

while the often desired element-strict Maps would result from using strict pairs, with no other change in API

   type IntMapStrict = IntMapP (:*:)
   )

-------------------------------------------------------
{-# LANGUAGE TypeOperators #-}
import qualified Data.ByteString.Lazy as L
import Data.Array.Vector
import qualified Data.IntMap as IM
import Data.List
import Data.Word
import qualified Data.ByteString.Lazy.Char8 as L8
import Data.Maybe
import System.IO

-- don't use this for real, test wrapper only
ratings :: L.ByteString -> [(Word32,Word8)]
ratings = map (\[i,r]->(fromIntegral $ fst $ fromJust $ L8.readInt i
                      ,fromIntegral $ fst $ fromJust $ L8.readInt r))
       . map L8.words . L8.lines

parse handle = do
 contents <- L.hGetContents handle
 let v =  map singleton' $ ratings contents
 let m = foldl' (\m (kw,v)->insertWith' appendU (fromIntegral kw,v) m) IM.empty 
v
 -- let m = foldl1' (IM.unionWith appendU) v
 -- let m = foldl1' (IM.union) v
 return $! m

 where
   -- Build a Map with a single movie rating
   singleton' :: (Word32, Word8) -> (Int,UArr Rating)
   singleton' (id, rate) =
     ((fromIntegral $ id), (singletonU $ pairS (id, rate)))
     -- (IM.singleton (fromIntegral $ id)) $ (singletonU $ pairS (id, rate))

insertWith' op (k,v) m =
 maybe (IM.insert k v m)
       (\old->((IM.insert k) $! (v `op` old)) m)
       (IM.lookup k m)

type Rating = Word32 :*: Word8
type MovieRatings = IM.IntMap (UArr Rating) -- UArr from uvector

-- more test wrapper, some trivial input data
generate = withFile "in.data" WriteMode $ \h->
mapM_ (\(i,r)->hPutStrLn h $ show i++" "++show r) $ take 1000000 $ cycle [(i,i)|i<-[0..100]]

main = withFile "in.data" ReadMode parse >>= print

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to