Re: [Haskell-cafe] memory-efficient data type for Netflix data - UArray Int Int vs UArray Int Word8

Manlio Perillo Sun, 01 Mar 2009 06:03:41 -0800

Kenneth Hoste wrote:

Hello,
I'm having a go at the Netflix Prize using Haskell. Yes, I'm brave.

I kind of have an algorithm in mind that I want to implement using Haskell,
but up until now, the main issue has been to find a way to efficiently represent
the data...
For people who are not familiar with the Netflix data, in short, it consists of
roughly 100M (1e8) user ratings (1-5, integer) for 17,770 different movies, coming from
480,109 different users.


Hi Kenneth.

I have written a simple program that parses the Netflix training dataset, using this data structure:


type MovieRatings = IntMap (UArr Word32, UArr Word8)

The ratings are grouped by movies.
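
To make the layout concrete, here is a minimal sketch of the same shape. It uses the standard UArray from the array package as a stand-in for UArr, and packMovie, fromRatings, ratingCount, payloadBytes and the sample IDs are hypothetical names and made-up values for illustration, not the code of the actual parser.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.Array.Unboxed (UArray, bounds, listArray)
import Data.Ix (rangeSize)
import Data.Word (Word8, Word32)

-- Stand-in for the UArr-based type: one pair of unboxed arrays per movie,
-- holding the user IDs and the matching ratings at the same indices.
type MovieRatings = IM.IntMap (UArray Int Word32, UArray Int Word8)

-- Hypothetical helper: pack one movie's (user ID, rating) pairs into two arrays.
packMovie :: [(Word32, Word8)] -> (UArray Int Word32, UArray Int Word8)
packMovie rs = (listArray (0, n - 1) users, listArray (0, n - 1) stars)
  where
    n = length rs
    (users, stars) = unzip rs

-- Hypothetical helper: build the map from (movie ID, ratings) pairs.
fromRatings :: [(Int, [(Word32, Word8)])] -> MovieRatings
fromRatings xs = IM.fromList [ (movie, packMovie rs) | (movie, rs) <- xs ]

-- Total number of ratings stored across all movies.
ratingCount :: MovieRatings -> Int
ratingCount m = sum [ rangeSize (bounds users) | (users, _) <- IM.elems m ]

-- Payload only: 4 bytes per user ID plus 1 byte per rating.
-- IntMap nodes and array headers are not counted.
payloadBytes :: MovieRatings -> Int
payloadBytes m = 5 * ratingCount m

-- Made-up sample values, not taken from the dataset.
sample :: MovieRatings
sample = fromRatings [(1, [(101, 3), (102, 5)]), (2, [(101, 4)])]

main :: IO ()
main = do
  print (ratingCount sample)   -- 3
  print (payloadBytes sample)  -- 15
```

Keeping IDs and ratings in two parallel arrays, rather than one array of pairs, is what lets each rating take a single byte.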

The parsing is done in:
real    8m32.476s
user    3m5.276s
sys     0m8.681s

This is on a Dell Inspiron 6400 notebook with an
Intel Core2 T7200 @ 2.00GHz and 2 GB of memory.


However, the memory used is about 1.4 GB.
How did you manage to get 700 MB memory usage?

Note that the minimum space required is about 480 MB (assuming a 4-byte integer for the ID and a 1-byte integer for the rating). Using a 4-byte integer for both ID and rating, the space required is about 765 MB.

1.5 GB is the space required if one uses a total of 16 bytes to store both the ID and the rating.
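
The estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the rounded count of 1e8 ratings quoted earlier and takes 1 MB as 2^20 bytes, and ratings, bytesFor and toMiB are names made up for the example.

```haskell
-- Rounded number of ratings, as quoted in the thread (an assumption;
-- the real training set is slightly larger).
ratings :: Integer
ratings = 100000000

-- Total payload in bytes when each rating record takes the given size.
bytesFor :: Integer -> Integer
bytesFor perRating = ratings * perRating

-- Convert bytes to megabytes, taking 1 MB as 2^20 bytes.
toMiB :: Integer -> Double
toMiB b = fromIntegral b / (1024 * 1024)

-- Print the estimate for one record layout.
report :: (Integer, String) -> IO ()
report (size, label) =
  putStrLn (label ++ ": about " ++ show (round (toMiB (bytesFor size)) :: Integer) ++ " MB")

main :: IO ()
main = mapM_ report
  [ (5,  "4-byte ID + 1-byte rating")
  , (8,  "4-byte ID + 4-byte rating")
  , (16, "16 bytes per rating")
  ]
```

With the rounded count this gives roughly 477, 763 and 1526 MB, in line with the figures above; the small difference comes from the real dataset holding a bit more than 1e8 ratings.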

Maybe it is the garbage collector that does not release memory to the operating system?




Thanks
Manlio Perillo
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
