Kenneth Hoste wrote:
> Hello,
>
> I'm having a go at the Netflix Prize using Haskell. Yes, I'm brave.
>
> I kind of have an algorithm in mind that I want to implement using Haskell,
> but up until now, the main issue has been to find a way to efficiently
> represent the data...
>
> For people who are not familiar with the Netflix data, in short, it
> consists of roughly 100M (1e8) user ratings (1-5, integer) for 17,770
> different movies, coming from 480,109 different users.

Hi Kenneth.
I have written a simple program that parses the Netflix training data
set, using this data structure:
type MovieRatings = IntMap (UArr Word32, UArr Word8)
The ratings are grouped by movie: each movie ID maps to a pair of
parallel arrays, one with the user IDs and one with the ratings.
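A minimal sketch of how a structure of this shape could be built. This is not the actual parser: it assumes UArray from the array package as a stand-in for uvector's UArr, and the helpers fromTriples, ratingCount and usersOf are hypothetical names introduced only for illustration.

```haskell
import qualified Data.IntMap.Strict as IM
import Data.Array.Unboxed (UArray, listArray, bounds, elems)
import Data.Ix (rangeSize)
import Data.Word (Word32, Word8)

-- Same shape as the type above, with UArray standing in for UArr (assumption).
-- Key: movie ID. Value: parallel arrays of user IDs and ratings.
type MovieRatings = IM.IntMap (UArray Int Word32, UArray Int Word8)

-- Hypothetical helper: group (movie, user, rating) triples by movie.
fromTriples :: [(Int, Word32, Word8)] -> MovieRatings
fromTriples triples = IM.map pack grouped
  where
    -- fromListWith (++) prepends newer entries, so each list ends up reversed.
    grouped = IM.fromListWith (++) [ (m, [(u, r)]) | (m, u, r) <- triples ]
    pack reversed =
      let urs = reverse reversed      -- restore input order
          n   = length urs
      in ( listArray (0, n - 1) (map fst urs)
         , listArray (0, n - 1) (map snd urs) )

-- Total number of ratings stored across all movies.
ratingCount :: MovieRatings -> Int
ratingCount = sum . map (rangeSize . bounds . fst) . IM.elems

-- User IDs that rated a given movie, in input order (empty if unknown).
usersOf :: Int -> MovieRatings -> [Word32]
usersOf movie = maybe [] (elems . fst) . IM.lookup movie

main :: IO ()
main = do
  let rs = fromTriples [(1, 10, 5), (1, 11, 3), (2, 10, 4)]
  print (IM.keys rs)
  print (ratingCount rs)
  print (usersOf 1 rs)
```

The intermediate lists are boxed, so this sketch says nothing about the memory behaviour of a real parser; only the final arrays are unboxed.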
Parsing the whole data set takes:
real 8m32.476s
user 3m5.276s
sys 0m8.681s
This is on a DELL Inspiron 6400 notebook with an
Intel Core2 T7200 @ 2.00GHz and 2 GB of memory.
However, the program uses about 1.4 GB of memory.
How did you manage to get 700 MB memory usage?
Note that the minimum space required is about 480 MB (assuming a 4-byte
integer for the ID and a 1-byte integer for the rating).
Using a 4-byte integer for both the ID and the rating, the space required
is about 765 MB.
1.5 GB is the space required if one uses a total of 16 bytes to store
each ID and rating pair.
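The arithmetic behind these figures can be checked with a small sketch. It assumes exactly 1e8 ratings (the "roughly 100M" above) and reads MB as 2^20 bytes, so the results land slightly below the quoted numbers; totalBytes and toMiB are names made up for this illustration.

```haskell
-- Assumption: exactly 1e8 ratings; the real count is only "roughly" this.
ratings :: Integer
ratings = 100000000

-- Total bytes needed when each rating occupies the given number of bytes.
totalBytes :: Integer -> Integer
totalBytes perRating = ratings * perRating

-- Bytes to mebibytes (2^20 bytes), rounded to the nearest integer.
toMiB :: Integer -> Integer
toMiB b = (b + 524288) `div` 1048576

main :: IO ()
main = mapM_ report
  [ (5,  "4-byte ID + 1-byte rating")
  , (8,  "4-byte ID + 4-byte rating")
  , (16, "16 bytes per ID/rating pair")
  ]
  where
    report (n, label) =
      putStrLn (label ++ ": " ++ show (toMiB (totalBytes n)) ++ " MiB")
```

The 16-byte case is the one closest to the observed 1.4 GB.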
Maybe it is the garbage collector that does not release memory to the
operating system?
Thanks,
Manlio Perillo
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe