Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-27 Thread Nick Pentreath

This is exactly the core problem in the linked issue - normally you would
use the TrainValidationSplit or CrossValidator to do hyper-parameter
selection using cross-validation. You could tune the factor size,
regularization parameter and alpha (for implicit preference data), for
example.

Because of the NaN issue you cannot use the cross-validators currently with
ALS. So you would have to do it yourself manually (dropping the NaNs from
the prediction results as Krishna says).
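
A manual tuning loop of the kind described above might look like the following sketch. It assumes PySpark, hypothetical column names (`userId`, `movieId`, `rating`) and illustrative grid values; none of these come from the thread. `pyspark` is imported inside the function so the grid helper can be used without a Spark installation.

```python
from itertools import product


def param_grid(ranks, reg_params):
    """Return every (rank, regParam) combination to try."""
    return list(product(ranks, reg_params))


def manual_als_search(train, validation,
                      ranks=(5, 10, 20), reg_params=(0.01, 0.1, 1.0)):
    """Fit one ALS model per grid point; return (best_rmse, best_params).

    `train` and `validation` are Spark DataFrames. The column names used
    below are assumptions for illustration only.
    """
    # Imported lazily so this module loads even where Spark is not installed.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.recommendation import ALS

    evaluator = RegressionEvaluator(
        metricName="rmse", labelCol="rating", predictionCol="prediction"
    )
    best = None
    for rank, reg in param_grid(ranks, reg_params):
        als = ALS(rank=rank, regParam=reg, maxIter=10,
                  userCol="userId", itemCol="movieId", ratingCol="rating")
        model = als.fit(train)
        # Drop rows whose prediction is NaN (ids unseen during training),
        # otherwise the evaluator itself returns NaN.
        predictions = model.transform(validation).na.drop(subset=["prediction"])
        rmse = evaluator.evaluate(predictions)
        if best is None or rmse < best[0]:
            best = (rmse, {"rank": rank, "regParam": reg})
    return best
```

Selecting on a validation split and reporting the final error on a separate held-out set keeps the reported RMSE from being optimistic.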




Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-25 Thread Rohit Chaddha

Hi Krishna,

Great, I had no idea about this. I tried your suggestion of using
na.drop() and got an RMSE of 1.5794048211812495.
Any suggestions on how this can be reduced and the model improved?

Regards,
Rohit


Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Nick Pentreath

Good suggestion, Krishna.

One issue is that this doesn't work with TrainValidationSplit or
CrossValidator for parameter tuning. Hence my solution in the PR, which
makes it work with the cross-validators.


Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Rohit Chaddha

Great, thanks to both of you. I was struggling with this issue as well.

-Rohit



Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Krishna Sankar

Thanks Nick. I also ran into this issue.
VG, one workaround is to drop the NaN rows from the predictions (df.na.drop())
and then pass the resulting dataset to the evaluator. In real life, you would
probably detect the NaN and fall back to recommending the most popular items
over some window.
HTH.
Cheers
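
The "detect the NaN and recommend most popular" fallback could be sketched as below. This is plain Python rather than Spark code, and the function names, the `(user, item)` interaction format and the choice of window are all assumptions made for illustration.

```python
import math
from collections import Counter


def most_popular(interactions, n):
    """Return the n most frequent items among (user, item) pairs in a window."""
    counts = Counter(item for _, item in interactions)
    return [item for item, _ in counts.most_common(n)]


def recommend(scored_items, popular, n):
    """Return the top-n items for one user.

    scored_items: list of (item, score) from the model; a NaN score means
    the model has no factors for that user/item pair (cold start).
    popular: precomputed popularity ranking used as the fallback.
    """
    usable = [(item, s) for item, s in scored_items if not math.isnan(s)]
    if not usable:
        # Nothing the model can score: fall back to popularity.
        return popular[:n]
    usable.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in usable[:n]]


# Hypothetical window of recent interactions.
window = [("u1", "a"), ("u2", "a"), ("u3", "a"),
          ("u1", "b"), ("u2", "b"), ("u3", "c")]
popular = most_popular(window, 2)
cold_user = recommend([("x", float("nan")), ("y", float("nan"))], popular, 2)
warm_user = recommend([("x", 4.5), ("y", float("nan")), ("z", 4.9)], popular, 1)
```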



Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread Nick Pentreath

It seems likely that you're running into
https://issues.apache.org/jira/browse/SPARK-14489 - this occurs when the
test dataset in the train/test split contains users or items that were not
in the training set. Hence the model doesn't have computed factors for
those ids, and ALS 'transform' currently returns NaN for those ids. This in
turn results in NaN for the evaluator result.

I have a PR open on that issue that will hopefully address this soon.
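
To illustrate how a single unseen id turns the whole metric into NaN, here is a small pure-Python sketch. It is not Spark's implementation; the factor values and ids are made up, and `predict` only mimics the behaviour described above (NaN when either id has no learned factors).

```python
import math


def rmse(pairs):
    """Root-mean-square error over (label, prediction) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))


def predict(user_factors, item_factors, user, item):
    """Dot product of the factors, or NaN if either id was not in training."""
    if user not in user_factors or item not in item_factors:
        return float("nan")
    return sum(a * b for a, b in zip(user_factors[user], item_factors[item]))


# Hypothetical learned factors: only user 1 and item 10 were in training.
user_factors = {1: [1.0, 0.5]}
item_factors = {10: [4.0, 1.0]}

# Test rows as (user, item, rating); user 2 never appeared in training.
test_rows = [(1, 10, 5.0), (2, 10, 3.0)]
scored = [(r, predict(user_factors, item_factors, u, i)) for u, i, r in test_rows]

rmse_all = rmse(scored)  # one NaN prediction makes the sum, and the RMSE, NaN
rmse_known = rmse((y, p) for y, p in scored if not math.isnan(p))
```

Filtering out the NaN rows before computing the metric is the same idea as calling `na.drop()` on the predictions DataFrame.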



Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-24 Thread VG

Ping. Does anyone have suggestions or advice for me?
It would be really helpful.

VG


Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-23 Thread VG

Any suggestions or ideas here?




Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-23 Thread VG

Sean,

I did this just to test the model. When I split my data into 80% training
and 20% test, I get a root-mean-square error of NaN.

So I am wondering where I might be going wrong.

Regards,
VG


Re: Spark ml.ALS question -- RegressionEvaluator .evaluate giving ~1.5 output for same train and predict data

2016-07-23 Thread Sean Owen

No, that's certainly not to be expected. ALS works by computing a much
lower-rank representation of the input. It would not reproduce the
input exactly, and you don't want it to -- this would be seriously
overfit. This is why in general you don't evaluate a model on the
training set.
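
As a rough illustration of why the training error is not 0, the sketch below runs a tiny rank-1 ALS in plain Python on a 2x2 rating matrix that is not itself rank 1. It is a simplified stand-in, not Spark's algorithm (Spark's regularization scaling and default rank differ), and the ratings, `reg` and `iters` values are made up.

```python
import math


def als_rank1(ratings, n_users, n_items, reg=0.1, iters=20):
    """Rank-1 ALS with L2 regularization on (user, item, rating) triples."""
    x = [1.0] * n_users  # one factor per user
    y = [1.0] * n_items  # one factor per item
    for _ in range(iters):
        # Fix item factors; each user's factor is a 1-D ridge regression.
        for u in range(n_users):
            num = sum(r * y[i] for uu, i, r in ratings if uu == u)
            den = sum(y[i] ** 2 for uu, i, r in ratings if uu == u) + reg
            x[u] = num / den
        # Fix user factors; solve for each item's factor the same way.
        for i in range(n_items):
            num = sum(r * x[u] for u, ii, r in ratings if ii == i)
            den = sum(x[u] ** 2 for u, ii, r in ratings if ii == i) + reg
            y[i] = num / den
    return x, y


def train_rmse(ratings, x, y):
    """RMSE of the rank-1 reconstruction on the training triples themselves."""
    sq = sum((r - x[u] * y[i]) ** 2 for u, i, r in ratings)
    return math.sqrt(sq / len(ratings))


# Two users with opposite tastes: the full matrix has rank 2.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 5.0)]
x, y = als_rank1(ratings, n_users=2, n_items=2)
error = train_rmse(ratings, x, y)
```

Because a single factor per user and item cannot represent both users' opposite preferences, `error` stays well above zero even though the model is scored on the data it was trained on.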

On Sat, Jul 23, 2016 at 7:37 PM, VG  wrote:
> I am trying to run ml.ALS to compute some recommendations.
>
> Just to test, I am using the same dataset for training the ALSModel and for
> predicting the results based on the model.
>
> When I evaluate the result using RegressionEvaluator I get a
> root-mean-square error of 1.5544064263236066.
>
> I think this should be 0. Any suggestions as to what might be going wrong?
>
> Regards,
> Vipul
