Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-27 Thread Nick Pentreath
This is exactly the core problem in the linked issue - normally you would use the TrainValidationSplit or CrossValidator to do hyper-parameter selection using cross-validation. You could tune the factor size, regularization parameter and alpha (for implicit preference data), for example. Because
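For readers following along, here is a minimal sketch of the tuning Nick describes, cross-validating ALS over the rank and regularization. The column names, toy ratings, and grid values are illustrative assumptions, not taken from the thread; note that on data this small the folds will contain unseen users/items, which is exactly where the NaN problem from the linked issue shows up.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("als-tuning").master("local[*]").getOrCreate()
import spark.implicits._

// Toy explicit-feedback ratings: (userId, itemId, rating).
val ratings = Seq(
  (0, 0, 4.0), (0, 1, 2.0), (0, 2, 3.0),
  (1, 0, 5.0), (1, 1, 1.0), (1, 3, 4.0),
  (2, 1, 4.0), (2, 2, 5.0), (2, 3, 2.0),
  (3, 0, 3.0), (3, 2, 4.0), (3, 3, 5.0)
).toDF("userId", "itemId", "rating")

val als = new ALS()
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("rating")

val evaluator = new RegressionEvaluator()
  .setMetricName("rmse")
  .setLabelCol("rating")
  .setPredictionCol("prediction")

// Grid over the factor size (rank) and regularization; alpha would only be
// relevant with setImplicitPrefs(true).
val paramGrid = new ParamGridBuilder()
  .addGrid(als.rank, Array(5, 10))
  .addGrid(als.regParam, Array(0.01, 0.1, 1.0))
  .build()

val cv = new CrossValidator()
  .setEstimator(als)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// If a validation fold contains users or items absent from its training fold,
// the RMSE for that fold is NaN -- the problem this thread is about.
val cvModel = cv.fit(ratings)
println(cvModel.avgMetrics.mkString(", "))
```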

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-25 Thread Rohit Chaddha
Hi Krishna, Great! I had no idea about this. I tried your suggestion of using na.drop() and got an RMSE = 1.5794048211812495. Any suggestions on how this can be reduced and the model improved? Regards, Rohit On Mon, Jul 25, 2016 at 4:12 AM, Krishna Sankar wrote: > Thanks

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Nick Pentreath
Good suggestion, Krishna. One issue is that this doesn't work with TrainValidationSplit or CrossValidator for parameter tuning, hence my solution in the PR, which makes it work with the cross-validators. On Mon, 25 Jul 2016 at 00:42, Krishna Sankar wrote: > Thanks Nick. I
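For reference, the fix Nick refers to eventually shipped as the coldStartStrategy parameter on ALS (Spark 2.2 and later). A minimal sketch under that assumption, with made-up column names and toy data:

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("als-cold-start").master("local[*]").getOrCreate()
import spark.implicits._

val training = Seq(
  (0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0),
  (2, 0, 5.0), (2, 2, 1.0), (3, 1, 4.0), (3, 2, 5.0)
).toDF("userId", "itemId", "rating")

// User 9 never appears in the training data.
val test = Seq((0, 1, 2.0), (2, 2, 1.0), (9, 0, 3.0)).toDF("userId", "itemId", "rating")

// With coldStartStrategy = "drop", transform() drops rows whose user or item
// has no learned factors, so the evaluator (and hence CrossValidator /
// TrainValidationSplit) never sees a NaN prediction.
val model = new ALS()
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("rating")
  .setColdStartStrategy("drop")
  .fit(training)

val predictions = model.transform(test)   // the user-9 row is dropped here

val rmse = new RegressionEvaluator()
  .setMetricName("rmse")
  .setLabelCol("rating")
  .setPredictionCol("prediction")
  .evaluate(predictions)
println(s"RMSE = $rmse")
```

Because the NaN rows are dropped inside transform() itself, the cross-validators can evaluate each fold without any manual post-processing of the predictions.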

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Rohit Chaddha
Great, thanks to both of you. I was struggling with this issue as well. -Rohit On Mon, Jul 25, 2016 at 4:12 AM, Krishna Sankar wrote: > Thanks Nick. I also ran into this issue. > VG, One workaround is to drop the NaN from predictions (df.na.drop()) and > then use the dataset

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Krishna Sankar
Thanks Nick. I also ran into this issue. VG, one workaround is to drop the NaN rows from the predictions (df.na.drop()) and then use that dataset for the evaluator. In real life, you would probably detect the NaN and fall back to recommending the most popular items over some window. HTH. Cheers On Sun, Jul 24, 2016 at 12:49 PM, Nick Pentreath
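A minimal sketch of Krishna's workaround (the column names and toy data are mine; the one line that matters is the na.drop before the evaluator):

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("als-nan-workaround").master("local[*]").getOrCreate()
import spark.implicits._

val training = Seq(
  (0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0),
  (2, 0, 5.0), (2, 2, 1.0), (3, 1, 4.0), (3, 2, 5.0)
).toDF("userId", "itemId", "rating")

// User 9 is unseen during training, so its prediction will come back as NaN.
val test = Seq((0, 1, 2.0), (2, 2, 1.0), (9, 0, 3.0)).toDF("userId", "itemId", "rating")

val model = new ALS()
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("rating")
  .fit(training)

// Drop the NaN predictions before evaluating; otherwise the RMSE itself is NaN.
val predictions = model.transform(test).na.drop(Seq("prediction"))

val rmse = new RegressionEvaluator()
  .setMetricName("rmse")
  .setLabelCol("rating")
  .setPredictionCol("prediction")
  .evaluate(predictions)
println(s"RMSE after dropping NaN predictions: $rmse")
```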

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Nick Pentreath
It seems likely that you're running into https://issues.apache.org/jira/browse/SPARK-14489 - this occurs when the test dataset in the train/test split contains users or items that were not in the training set. Hence the model doesn't have computed factors for those ids, and ALS 'transform'
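A quick way to confirm this is what's happening (toy data; userId 9 and itemId 9 stand in for ids that appear only in the test split):

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, isnan}

val spark = SparkSession.builder.appName("als-unseen-ids").master("local[*]").getOrCreate()
import spark.implicits._

val training = Seq(
  (0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0), (2, 0, 5.0), (2, 2, 1.0)
).toDF("userId", "itemId", "rating")

// userId 9 and itemId 9 never occur in training, so the model has no factors for them.
val test = Seq((0, 1, 2.0), (9, 0, 3.0), (1, 9, 4.0)).toDF("userId", "itemId", "rating")

val model = new ALS()
  .setUserCol("userId").setItemCol("itemId").setRatingCol("rating")
  .fit(training)

// These rows come back with prediction = NaN, and a single NaN is enough to
// make the evaluator's RMSE NaN.
model.transform(test).filter(isnan(col("prediction"))).show()
```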

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread VG
Ping. Does anyone have any suggestions/advice for me? It would be really helpful. VG On Sun, Jul 24, 2016 at 12:19 AM, VG wrote: > Sean, > > I did this just to test the model. When I do a split of my data as > training to 80% and test to be 20% > > I get a Root-mean-square error

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-23 Thread VG
Any suggestions / ideas here? On Sun, Jul 24, 2016 at 12:19 AM, VG wrote: > Sean, > > I did this just to test the model. When I do a split of my data as > training to 80% and test to be 20% > > I get a Root-mean-square error = NaN > > So I am wondering where I might be

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-23 Thread VG
Sean, I did this just to test the model. When I split my data into 80% training and 20% test, I get a Root-mean-square error = NaN, so I am wondering where I might be going wrong. Regards, VG On Sun, Jul 24, 2016 at 12:12 AM, Sean Owen wrote: > No, that's

Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-23 Thread Sean Owen
No, that's certainly not to be expected. ALS works by computing a much lower-rank representation of the input. It would not reproduce the input exactly, and you don't want it to -- this would be seriously overfit. This is why in general you don't evaluate a model on the training set. On Sat, Jul
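To make Sean's point concrete, here is a sketch (toy data and parameters are illustrative) of fitting ALS and evaluating on the very same data: the RMSE is nonzero because a rank-k factorization only approximates the ratings, and a low training RMSE would say nothing about how the model generalizes anyway.

```scala
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("als-training-rmse").master("local[*]").getOrCreate()
import spark.implicits._

val ratings = Seq(
  (0, 0, 4.0), (0, 1, 2.0), (0, 2, 3.0),
  (1, 0, 5.0), (1, 1, 1.0), (1, 3, 4.0),
  (2, 1, 4.0), (2, 2, 5.0), (2, 3, 2.0),
  (3, 0, 3.0), (3, 2, 4.0), (3, 3, 5.0)
).toDF("userId", "itemId", "rating")

// A rank-2 model with regularization cannot (and should not) reproduce the
// ratings exactly, so even the training-set RMSE is not zero.
val model = new ALS()
  .setUserCol("userId").setItemCol("itemId").setRatingCol("rating")
  .setRank(2).setRegParam(0.1).setMaxIter(10)
  .fit(ratings)

val trainingRmse = new RegressionEvaluator()
  .setMetricName("rmse")
  .setLabelCol("rating")
  .setPredictionCol("prediction")
  .evaluate(model.transform(ratings))
println(s"RMSE on the training data: $trainingRmse")
```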

Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-23 Thread VG
I am trying to run ml.ALS to compute some recommendations. Just to test, I am using the same dataset both for training the ALSModel and for predicting results based on the model. When I evaluate the result using RegressionEvaluator I get a Root-mean-square error = 1.5544064263236066. I think