Re: Reg. Netflix Prize Apache Mahout GSoC Application

Sean Owen Sun, 04 Apr 2010 03:40:38 -0700

I think you want to write this to accept "generic" data, and not
necessarily assume the Netflix input format. I suggest you accept CSV
data, in the form "userID,itemID,value", since that is what all the
recommenders do.

You may need a quick utility program to convert the Netflix data format to
this. This wouldn't be part of the project, or else we can put it in
utils later.
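
A minimal sketch of such a converter, assuming the Netflix Prize
training-file layout (a "movieID:" header line followed by
"customerID,rating,date" lines). The function name netflix_to_csv is
hypothetical and not part of Mahout; the rating date is simply dropped.

```python
import io


def netflix_to_csv(lines):
    """Yield 'userID,itemID,value' rows from Netflix-style training lines.

    Assumed input layout (not taken from this thread):
        1:
        1488844,3,2005-09-06
        822109,5,2005-05-13
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            # A header such as "1:" names the movie for the ratings that follow.
            movie_id = line[:-1]
            continue
        if movie_id is None:
            raise ValueError("rating line before any movie header: %r" % line)
        # Rating line: customerID,rating,date. The date is not needed here.
        user_id, rating, _date = line.split(",")
        yield "%s,%s,%s" % (user_id, movie_id, rating)


# Small in-memory example standing in for one training file.
sample = io.StringIO("1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n")
rows = list(netflix_to_csv(sample))
for row in rows:
    print(row)
```

Since the function takes any iterable of lines, it can be fed an open
file handle one training file at a time, with the output appended to a
single CSV.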

I don't think data storage is an issue here -- the files will live on
HDFS/S3, that's it. No code is needed. I don't think this has anything
to do with the classifier data stores unless I misunderstand the
project.

On Sun, Apr 4, 2010 at 11:17 AM, Sisir Koppaka <sisir.kopp...@gmail.com> wrote:
> Thanks, this is what I wanted to know. So, now, there would be a separate
> example that reads in the Netflix dataset in a distributed way, that would
> utilize the RBM implementation. Would that be right?
>
> The datastore I was referring to in the proposal was based on
> mahout.classifier.bayes.datastore. I understand the HBase, Cassandra and
> other adapters are being refactored out in a separate ticket, so I'll just
> stick with HDFS and S3.
>
> If there's anything else that I would need to add in the proposal, do let me
> know.
