LinearRegressionWithSGD accuracy

devl.development Thu, 15 Jan 2015 08:50:48 -0800

From what I gather, you use LinearRegressionWithSGD to predict y, the
response variable, given a feature vector x.


In a simple example, I used a perfectly linear dataset such that x = y:
y,x
1,1
2,2
...

10000,10000
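
For anyone who wants to reproduce this, a file like the one above can be generated with a few lines of plain Scala. This is only a sketch: the object name and the "linear.csv" path are placeholders, and it assumes the file has no "y,x" header row, since the parsing code below calls toDouble on every line.

```scala
import java.io.PrintWriter

object LinearDataset {
  // Rows of the form "y,x" with y equal to x, for 1..n.
  def rows(n: Int): Seq[String] = (1 to n).map(i => s"$i,$i")

  def main(args: Array[String]): Unit = {
    // "linear.csv" is a placeholder path, not the file actually used.
    val out = new PrintWriter("linear.csv")
    try rows(10000).foreach(out.println) finally out.close()
  }
}
```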

Using the out-of-box example from the website (with and without scaling):

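    import org.apache.spark.mllib.feature.StandardScaler
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

    val data = sc.textFile(file)

    val parsedData = data.map { line =>
      val parts = line.split(',')
      // y and x
      LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble))
    }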
    val scaler = new StandardScaler(withMean = true, withStd = true)
      .fit(parsedData.map(x => x.features))
    val scaledData = parsedData
      .map(x =>
      LabeledPoint(x.label,
        scaler.transform(Vectors.dense(x.features.toArray))))

    // Building the model
    val numIterations = 100
    val model = LinearRegressionWithSGD.train(parsedData, numIterations)

    // Evaluate model on training examples and compute training error
    // (tried using both scaledData and parsedData)
    val valuesAndPreds = scaledData.map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }
    val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
    println("training Mean Squared Error = " + MSE)

Both scaled and unscaled attempts give:

training Mean Squared Error = NaN
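
As a side note on the metric itself, here is a small plain-Scala sketch of the same MSE expression on an in-memory Seq instead of an RDD (the object and method names are made up for illustration). It only shows that a single NaN prediction is enough to make the mean NaN; it says nothing about where the NaN comes from in the model.

```scala
object MseSketch {
  // Mean squared error over (label, prediction) pairs, mirroring the
  // valuesAndPreds.map{...}.mean() expression, but on a local Seq.
  def mse(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (v, p) => math.pow(v - p, 2) }.sum / pairs.size

  def main(args: Array[String]): Unit = {
    // All predictions finite: the result is an ordinary number.
    println(mse(Seq((1.0, 1.0), (2.0, 2.5))))
    // One NaN prediction: the whole mean becomes NaN.
    println(mse(Seq((1.0, 1.0), (2.0, Double.NaN))))
  }
}
```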

I've even tried x and y + noise (sampled from a normal distribution with
mean 0 and stddev 1), and it still comes up with the same thing.
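
The noisy variant can be sketched roughly like this in plain Scala. The object name and the seed parameter are illustrative assumptions, not what was actually run; the rows keep the same "y,x" layout as the original file.

```scala
import scala.util.Random

object NoisyDataset {
  // Rows of the form "y,x" where y = x + noise and the noise is drawn
  // from a normal distribution with mean 0 and stddev 1.
  def rows(n: Int, seed: Long): Seq[String] = {
    val rng = new Random(seed)
    (1 to n).map { i =>
      val y = i + rng.nextGaussian()
      s"$y,$i"
    }
  }

  def main(args: Array[String]): Unit =
    rows(5, seed = 42L).foreach(println)
}
```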

Is this not supposed to work for simple x and y (two-dimensional) data? Is
there something I'm missing, or something wrong in the code above? Or is
there a limitation in the method?

Thanks for any advice.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
