I think you want to write this to accept "generic" data, and not necessarily assume the Netflix input format. I suggest you accept CSV data, in the form "userID,itemID,value", since that is what all the recommenders do.
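To make the suggested "userID,itemID,value" input concrete, here is a minimal sketch of reading it in plain Java. The class and method names (`PreferenceCsvReader`, `Preference`, `parseLine`, `parseAll`) are hypothetical and exist only for illustration. This is not the project's actual input code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: reads generic "userID,itemID,value" preference lines (names are hypothetical). */
public class PreferenceCsvReader {

  /** One parsed preference; a hypothetical type used only in this sketch. */
  public record Preference(long userID, long itemID, float value) {}

  /** Parses a single "userID,itemID,value" line and rejects anything with a different field count. */
  public static Preference parseLine(String line) {
    String[] fields = line.trim().split(",");
    if (fields.length != 3) {
      throw new IllegalArgumentException("Expected userID,itemID,value but got: " + line);
    }
    return new Preference(
        Long.parseLong(fields[0].trim()),
        Long.parseLong(fields[1].trim()),
        Float.parseFloat(fields[2].trim()));
  }

  /** Reads every non-blank line from the reader and parses it. */
  public static List<Preference> parseAll(BufferedReader reader) throws IOException {
    List<Preference> prefs = new ArrayList<>();
    String line;
    while ((line = reader.readLine()) != null) {
      if (!line.isBlank()) {
        prefs.add(parseLine(line));
      }
    }
    return prefs;
  }

  public static void main(String[] args) throws IOException {
    // Reads CSV preference lines from standard input and reports how many were parsed.
    BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
    List<Preference> prefs = parseAll(in);
    System.out.println("Read " + prefs.size() + " preferences");
  }
}
```

Because the format carries no dataset-specific structure, the same reader works for Netflix, MovieLens, or any other source once it has been converted. Within Mahout, the recommenders' `FileDataModel` already reads files in this shape, so reusing it is likely preferable to a hand-rolled parser.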
You may need a quick utility program to convert the Netflix data format to this CSV format. That utility wouldn't be part of the project, or else we can put it in utils later.

I don't think data storage is an issue here -- the files will live on HDFS/S3, that's it. No code is needed. I don't think this has anything to do with the classifier data stores, unless I misunderstand the project.

On Sun, Apr 4, 2010 at 11:17 AM, Sisir Koppaka <sisir.kopp...@gmail.com> wrote:
> Thanks, this is what I wanted to know. So, now, there would be a separate
> example that reads in the Netflix dataset in a distributed way, and that
> would utilize the RBM implementation. Would that be right?
>
> The datastore I was referring to in the proposal was based on
> mahout.classifier.bayes.datastore. I understand the HBase, Cassandra and
> other adapters are being refactored out in a separate ticket, so I'll just
> stick with HDFS and S3.
>
> If there's anything else that I would need to add in the proposal, do let me
> know.
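For the quick Netflix-to-CSV conversion utility mentioned above, a minimal sketch might look like the following. It assumes the Netflix Prize training-file layout: a `movieID:` header line followed by `customerID,rating,date` lines for that movie. It maps customer to userID, movie to itemID, and rating to value, and drops the date. The class name `NetflixToCsv` and method `convert` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: converts Netflix-style rating lines to "userID,itemID,value" CSV. */
public class NetflixToCsv {

  /**
   * Assumes a "movieID:" header line followed by "customerID,rating,date" lines
   * for that movie. The date field is dropped.
   */
  public static List<String> convert(List<String> lines) {
    List<String> out = new ArrayList<>();
    String movieID = null;
    for (String raw : lines) {
      String line = raw.trim();
      if (line.isEmpty()) {
        continue;
      }
      if (line.endsWith(":")) {
        // New movie block: remember its ID for the rating lines that follow.
        movieID = line.substring(0, line.length() - 1);
        continue;
      }
      if (movieID == null) {
        throw new IllegalStateException("Rating line before any movieID header: " + line);
      }
      String[] fields = line.split(",");
      if (fields.length < 2) {
        throw new IllegalArgumentException("Expected customerID,rating[,date] but got: " + line);
      }
      // customerID -> userID, current movie -> itemID, rating -> value.
      out.add(fields[0] + "," + movieID + "," + fields[1]);
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    // Reads one Netflix-style file from stdin and writes CSV lines to stdout.
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
      List<String> lines = new ArrayList<>();
      String line;
      while ((line = in.readLine()) != null) {
        lines.add(line);
      }
      for (String csv : convert(lines)) {
        System.out.println(csv);
      }
    }
  }
}
```

This version buffers its whole input in memory, so it is meant to be run once per training file rather than on the full dataset at once. The output can then be copied to HDFS/S3 as-is.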