Hello,

I'm having a go at the Netflix Prize using Haskell. Yes, I'm brave.

I kind of have an algorithm in mind that I want to implement using Haskell, but up until now, the main issue has been to find a way to efficiently represent
the data...

For people who are not familiar with the Netflix data, in short, it consist of roughly 100M (1e8) user ratings (1-5, integer) for 17,770 different movies, coming from
480,109 different users.

The idea I have in mind requires fast access to all the ratings for a particular user,
so I was thinking about an IntMap in terms of user ids, which maps to
movie id/rating pairs somehow.

To see if I could efficiently represent the data set in this way, I wrote a small
Haskell program (attached) which uses the following data type:

Attachment: testMemSizeUArray_UArray_Word8.hs
Description: Binary data




type data = IntMap (UArray Int Word8) -- map of user IDs to ratings (lacks movie IDs)

For convenience, and because I've been discussing this with various people in #haskell @ IRC, the code is also available here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=1671#a1671 .

I'm aware that the way in which the UArray's are constructed (i.e. by adding a single element each time), is far from efficient performance-wise, but that's not the point here... By reformatting the raw data, I can easily read in the data more efficiently.

The issue I ran into is that changing the data type to the following, doesn't yield any significant
different in memory usage.

type data = IntMap (UArray Int Int) -- use Int instead of Word8 for a user rating

Someone (dolio) @ #haskell suggested that maybe UArray is not byte packed for Word8, which would cause little difference with a UArray containing Int's, but someone else (dons @ #ghc)
was able to tell me it _is_ byte packed.

Does anyone know why the Word8 version is not significantly better in terms of memory usage?

greetings,

Kenneth

PS: My adventures on trying to tackle the Netflix Prize problem with Haskell can be followed at http://boegel.kejo.be.

--

Kenneth Hoste
ELIS - Ghent University
email: kenneth.ho...@elis.ugent.be
blog: http://www.elis.ugent.be/~kehoste/blog
website: http://www.elis.ugent.be/~kehoste

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to