Kenneth Hoste ha scritto:
Hello,

I'm having a go at the Netflix Prize using Haskell. Yes, I'm brave.

I kind of have an algorithm in mind that I want to implement using Haskell,
but up until now, the main issue has been to find a way to efficiently represent
the data...

For people who are not familiar with the Netflix data, in short, it consist of roughly 100M (1e8) user ratings (1-5, integer) for 17,770 different movies, coming from
480,109 different users.


Hi Kenneth.

I have written a simple program that parses the Netflix training data set, using this data structure:

type MovieRatings = IntMap (UArr Word32, UArr Word8)

The ratings are grouped by movies.

The parsing is done in:
real    8m32.476s
user    3m5.276s
sys     0m8.681s

On a DELL Inspiron 6400 notebook,
Intel Core2 T7200 @ 2.00GHz, and 2 GB memory.


However the memory used is about 1.4 GB.
How did you manage to get 700 MB memory usage?


Note that the minimum space required is about 480 MB (assuming 4 byte integer for the ID, and 1 byte integer for rating). Using a 4 byte integer for both ID and rating, the space required is about 765 MB.

1.5 GB is the space required if one uses a total of 16 bytes to store both the ID and the rating.



Maybe it is the garbage collector that does not release memory to the operating system?



Thanks  Manlio Perillo
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to