-dev, +user

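You’ll need to set the gradient descent step size to something small. The default step size of 1.0 is far too large for unscaled features that go up to 10000, so the iterations diverge and the weights (and therefore the MSE) end up as NaN. A bit of trial and error shows that 0.00000001 works.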
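Here is a rough sketch of why the step has to be that small. It assumes plain full-batch gradient descent on the single weight w (no intercept, no step decay), the least-squares gradient of (wx - y)^2 / 2, and the y = x data for x = 1..10000 from the question:

```latex
\begin{aligned}
g(w) &= \frac{1}{n}\sum_{i=1}^{n} x_i\,(w x_i - y_i) = (w-1)\,M,
  \qquad M = \frac{1}{n}\sum_{i=1}^{n} x_i^2 = \frac{(n+1)(2n+1)}{6} \approx 3.33\times 10^{7} \\
w_{t+1} - 1 &= (1 - \eta M)\,(w_t - 1) \\
|1 - \eta M| < 1 &\iff 0 < \eta < \frac{2}{M} \approx 6\times 10^{-8}
\end{aligned}
```

With a step size of 1.0 the error is multiplied by roughly 3 x 10^7 per iteration until it overflows to Infinity and then turns into NaN, while 0.00000001 sits below the 2/M bound. As far as I know MLlib also shrinks the step as stepSize / sqrt(t) over iterations, which changes the exact numbers but not the conclusion.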

To do that, create a LinearRegressionWithSGD instance and set the step size on its optimizer explicitly:

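import org.apache.spark.mllib.regression.LinearRegressionWithSGD

// parsedData is the unscaled RDD[LabeledPoint] built in the quoted code
val lr = new LinearRegressionWithSGD()
lr.optimizer.setStepSize(0.00000001)  // default is 1.0, which diverges here
lr.optimizer.setNumIterations(100)
val model = lr.run(parsedData)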
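To see the effect without a cluster, here is a plain-Scala sketch (no Spark dependency) that runs full-batch gradient descent on the same y = x data, once with the step size of 1.0 that LinearRegressionWithSGD.train(parsedData, numIterations) uses by default and once with 0.00000001. The object StepSizeDemo and its fit/mse helpers are hypothetical names for illustration; the stepSize / sqrt(t) decay is my assumption about how MLlib's updater schedules the step, not a copy of its code.

```scala
object StepSizeDemo {
  // Data from the question: y = x for x = 1..10000, so the ideal weight is 1.0.
  val xs: Array[Double] = (1 to 10000).map(_.toDouble).toArray
  val ys: Array[Double] = xs.clone()

  // Full-batch gradient descent on a single weight w (no intercept),
  // minimising the mean of (w*x - y)^2 / 2.
  // Assumption: the step decays as stepSize / sqrt(t), mimicking MLlib's schedule.
  def fit(stepSize: Double, numIterations: Int): Double = {
    var w = 0.0
    for (t <- 1 to numIterations) {
      var gradSum = 0.0
      var i = 0
      while (i < xs.length) {
        gradSum += xs(i) * (w * xs(i) - ys(i))
        i += 1
      }
      val grad = gradSum / xs.length
      w -= (stepSize / math.sqrt(t)) * grad
    }
    w
  }

  // Training mean squared error for weight w.
  def mse(w: Double): Double =
    xs.indices.map(i => math.pow(w * xs(i) - ys(i), 2)).sum / xs.length

  def main(args: Array[String]): Unit = {
    for (step <- Seq(1.0, 0.00000001)) {
      val w = fit(step, 100)
      println(s"stepSize=$step weight=$w MSE=${mse(w)}")
    }
  }
}
```

With 1.0 the weight blows up within a few dozen iterations and both the weight and the MSE come out as NaN; with 0.00000001 the weight ends up close to 1.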

On 15 Jan 2015, at 16:46, devl.development <devl.developm...@gmail.com> wrote:

> From what I gather, you use LinearRegressionWithSGD to predict y or the
> response variable given a feature vector x.
> 
> In a simple example I used a perfectly linear dataset such that x = y:
> y,x
> 1,1
> 2,2
> ...
> 10000,10000
> 
> Using the out-of-the-box example from the website (with and without scaling):
> 
> val data = sc.textFile(file)
> 
> val parsedData = data.map { line =>
>   val parts = line.split(',')
>   LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
> }
> val scaler = new StandardScaler(withMean = true, withStd = true)
>   .fit(parsedData.map(x => x.features))
> val scaledData = parsedData
>   .map(x =>
>     LabeledPoint(x.label,
>       scaler.transform(Vectors.dense(x.features.toArray))))
> 
> // Building the model
> val numIterations = 100
> val model = LinearRegressionWithSGD.train(parsedData, numIterations)
> 
> // Evaluate model on training examples and compute training error
> // (tried using both scaledData and parsedData)
> val valuesAndPreds = scaledData.map { point =>
>   val prediction = model.predict(point.features)
>   (point.label, prediction)
> }
> val MSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
> println("training Mean Squared Error = " + MSE)
> 
> Both scaled and unscaled attempts give:
> 
> training Mean Squared Error = NaN
> 
> I've even tried x, y + noise (sampled from a normal distribution with mean 0
> and stddev 1), and it still comes up with the same thing.
> 
> Is this not supposed to work for x and y, i.e. 2-dimensional plots? Is there
> something I'm missing or wrong in the code above? Or is there a limitation
> in the method?
> 
> Thanks for any advice.
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.