Re: [Haskell-cafe] memory-efficient data type for Netflix data - UArray Int Int vs UArray Int Word8

Kenneth Hoste Thu, 26 Feb 2009 06:15:10 -0800


On Feb 26, 2009, at 13:00 , Manlio Perillo wrote:

Kenneth Hoste ha scritto:
Hello,
I'm having a go at the Netflix Prize using Haskell. Yes, I'm brave.
[...]
To see if I could efficiently represent the data set in this way, Iwrote a small
Haskell program (attached) which uses the following data type:
From what I see, to append a new integer to the Array, you convertthe array to a list, append the new element to the list, and thenconvert to array again.
Isn't this a bit inefficient?

Yes, performance-wise this is terribly inefficient, I agree. But, itwas just an artefact of how the raw data is organized.

My main concern was the memory usage of the huge IntMap with UArrayelements.Once I solved that, I would be able to get around the performanceissue by reorganizing the raw data.

However, as I posted yesterday, I've been able to circumvent the issueby rethinking my data type, i.e. usingthe ~18K movie IDs as key instead of the 480K user IDs, whichradically limits the overhead...That way, I'm able to fit the data set in <700M of memory, withouthaving to reorganize the raw data.

The uvector package implements a vector of unboxed types, and has ansnocU operation, to append an element to the array.
I don't know how efficient it is, however.

By the way, about uvector: it has a Stream data type, and you canbuild a vector from a stream.


Thanks for letting me know, I'll keep this in mind.

greetings,

Kenneth

--

Kenneth Hoste
Paris research group - ELIS - Ghent University, Belgium
email: kenneth.ho...@elis.ugent.be
website: http://www.elis.ugent.be/~kehoste
blog: http://boegel.kejo.be

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Re: [Haskell-cafe] memory-efficient data type for Netflix data - UArray Int Int vs UArray Int Word8

Reply via email to