Re: What is the most efficient and scalable way to get all the recommendation results from ALS model ?

2016-03-20 Thread Hiroyuki Yamada
each. I only use small-sized data set so far, like about 5 users and 5000 products with only about 10 ratings. Thanks. On Sat, Mar 19, 2016 at 7:58 PM, Hiroyuki Yamada <mogwa...@gmail.com> wrote: > Hi, > > I'm testing Collaborative Filtering with MLlib.

What is the most efficient and scalable way to get all the recommendation results from ALS model ?

2016-03-19 Thread Hiroyuki Yamada
Hi, I'm testing Collaborative Filtering with MLlib. Making a model by ALS.trainImplicit (or train) seems scalable as far as I have tested, but I'm wondering how I can get all the recommendation results efficiently. The predictAll method can get all the results, but it needs the whole

spark-submit with cluster deploy mode fails with ClassNotFoundException (jars are not passed around properly?)

2016-03-11 Thread Hiroyuki Yamada
Hi, I am trying to work with spark-submit with cluster deploy mode on a single node, but I keep getting ClassNotFoundException as shown below. (in this case, snakeyaml.jar is not found by the Spark cluster) === 16/03/12 14:19:12 INFO Remoting: Starting remoting 16/03/12 14:19:12 INFO Remoting:

Re: which is a more appropriate form of ratings ?

2016-02-25 Thread Hiroyuki Yamada
> > On Thu, 25 Feb 2016 at 13:26 Sabarish Sasidharan <sabarish@gmail.com> > wrote: > >> I believe the ALS algo expects the ratings to be aggregated (A). I don't >> see why you have to use decimals for rating. >> >> Regards >> Sab >> >> On

which is a more appropriate form of ratings ?

2016-02-25 Thread Hiroyuki Yamada
Hello. I just started working on CF in MLlib. I am using trainImplicit because I only have implicit ratings like page views. I am wondering which is a more appropriate form of ratings. Let's assume that view count is regarded as a rating and user 1 sees page 1 3 times and sees page 2 twice and

Re: What is the point of alpha value in Collaborative Filtering in MLlib ?

2016-02-25 Thread Hiroyuki Yamada
> best result. > > I think that generally sparser input needs higher alpha, and maybe > someone tells me that really alpha should be a function of the > sparsity, but I've never seen that done. > > > > On Thu, Feb 25, 2016 at 6:33 AM, Hiroyuki Yamada <mogwa...@gmail.com

Re: What is the point of alpha value in Collaborative Filtering in MLlib ?

2016-02-24 Thread Hiroyuki Yamada
Hi, I've been doing some POC for CF in MLlib. In my environment, ratings are all implicit, so I am trying to use the trainImplicit method (in Python). The trainImplicit method takes alpha as one of the arguments to specify a confidence for the ratings as described in <