From what I gather, you use LinearRegressionWithSGD to predict y (the response variable) given a feature vector x.

In a simple example I used a perfectly linear dataset such that x = y:

    y,x
    1,1
    2,2
    ...
    10000,10000

Using the out-of-the-box example from the website (with and without scaling):

    import org.apache.spark.mllib.feature.StandardScaler
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

    val data = sc.textFile(file)
    val parsedData = data.map { line =>
      val parts = line.split(',')
      // Column 0 is y and column 1 is x; with x = y the order makes no difference
      LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble))
    }

    // Standardize the features to zero mean and unit standard deviation
    val scaler = new StandardScaler(withMean = true, withStd = true)
      .fit(parsedData.map(x => x.features))
    val scaledData = parsedData
      .map(x => LabeledPoint(x.label, scaler.transform(Vectors.dense(x.features.toArray))))

    // Building the model
    val numIterations = 100
    val model = LinearRegressionWithSGD.train(parsedData, numIterations)

    // Evaluate model on training examples and compute training error
    // (tried using both scaledData and parsedData)
    val valuesAndPreds = scaledData.map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }
    val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
    println("training Mean Squared Error = " + MSE)

Both scaled and unscaled attempts give:

    training Mean Squared Error = NaN

I've even tried x, y + (noise sampled from a normal distribution with mean 0 and stddev 1), and it still comes up with the same thing.

Is this not supposed to work for x and y, i.e. two-dimensional plots? Is there something I'm missing or wrong in the code above? Or is there a limitation in the method?

Thanks for any advice.
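
One thing that may be worth checking is how sensitive plain gradient descent is to the step size on this particular dataset. The following is a minimal sketch in plain Scala (no Spark), not MLlib's implementation: it runs ordinary full-batch gradient descent for a model y ≈ w * x with no intercept and a fixed step size. The object name GradientDescentScaleDemo, the helper names fit and mse, and the two step sizes (1.0 and 1e-8) are assumptions made for illustration; whether LinearRegressionWithSGD behaves the same way internally is not confirmed here.

```scala
object GradientDescentScaleDemo {
  // Full-batch gradient descent for y ≈ w * x (no intercept, fixed step size).
  // Hypothetical helper for illustration only; this is not MLlib code.
  def fit(xs: Array[Double], ys: Array[Double], stepSize: Double, numIterations: Int): Double = {
    var w = 0.0
    for (_ <- 1 to numIterations) {
      // Gradient of (1 / 2n) * sum((w * x - y)^2) with respect to w
      val grad = xs.zip(ys).map { case (x, y) => (w * x - y) * x }.sum / xs.length
      w -= stepSize * grad
    }
    w
  }

  // Mean squared error of the fitted line on the same data
  def mse(xs: Array[Double], ys: Array[Double], w: Double): Double =
    xs.zip(ys).map { case (x, y) => math.pow(y - w * x, 2) }.sum / xs.length

  def main(args: Array[String]): Unit = {
    // Same shape as the dataset in the question: x = y for 1..10000
    val xs = (1 to 10000).map(_.toDouble).toArray
    val ys = xs.clone()

    // Step size 1.0 (assumed): each update multiplies the error in w by roughly
    // (1 - mean(x^2)), about -3.3e7 here, so w overflows and ends up NaN.
    val wLargeStep = fit(xs, ys, stepSize = 1.0, numIterations = 100)
    println(s"step 1.0:  w = $wLargeStep, MSE = ${mse(xs, ys, wLargeStep)}")

    // Step size 1e-8 (assumed): the same factor is about 0.67, so w approaches 1.
    val wSmallStep = fit(xs, ys, stepSize = 1e-8, numIterations = 100)
    println(s"step 1e-8: w = $wSmallStep, MSE = ${mse(xs, ys, wSmallStep)}")
  }
}
```

In this sketch the only difference between the NaN run and the converging run is the step size relative to the magnitude of x, which suggests trying a much smaller step size, or training and evaluating on the same rescaled features, before concluding that the method cannot handle a single-feature dataset.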