Hi list,

I'm benchmarking MLlib on a regression task [1] and getting strange results.

Namely, with RidgeRegressionWithSGD the predictions seem to be missing the
intercept:

{code}
val trainedModel = RidgeRegressionWithSGD.train(trainingData, 1000)
...
// foreach rather than map: we only want the printing side effect
valuesAndPreds.take(10).foreach(println)
{code}

output:

(2007.0,-3.784588726958493E75)
(2003.0,-1.9562390324037716E75)
(2005.0,-4.147413202985629E75)
(2003.0,-1.524938024096847E75)
...

If I change the parameters (step size, regularization and iterations) I get
NaNs more often than not:
(2007.0,NaN)
(2003.0,NaN)
(2005.0,NaN)
...

On the other hand, the DecisionTree model gives sensible results.

I see there is a `setIntercept()` method in the abstract class
GeneralizedLinearAlgorithm that seems to enable the intercept,
but I'm unable to use it from the public interface :(
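
For reference, this is roughly what I would like to be able to write. It is
only a sketch: it assumes a Spark release in which the no-argument
constructor of RidgeRegressionWithSGD and its `optimizer` field are publicly
accessible, which may not hold for every version. The name
`trainWithIntercept` is just illustrative, not part of MLlib.

```scala
import org.apache.spark.mllib.regression.{LabeledPoint, RidgeRegressionModel, RidgeRegressionWithSGD}
import org.apache.spark.rdd.RDD

// Hypothetical helper: train ridge regression with an intercept term.
// Assumes `new RidgeRegressionWithSGD()` is public in the Spark version in use.
def trainWithIntercept(trainingData: RDD[LabeledPoint]): RidgeRegressionModel = {
  val algorithm = new RidgeRegressionWithSGD()

  // setIntercept is inherited from GeneralizedLinearAlgorithm
  algorithm.setIntercept(true)

  // Same iteration count as the train(trainingData, 1000) call above
  algorithm.optimizer.setNumIterations(1000)

  algorithm.run(trainingData)
}
```

Going through the class instance instead of the static `train` helper is the
only route I can see to reach `setIntercept()`, since `train` does not take
an intercept flag.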

Any help appreciated :)

Eustache

[1] https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD
