It looks like you're training on the unscaled data but testing on the scaled data. Have you tried training and testing on only the scaled data? (Incidentally, the ~3.33E7 MSE reported below is roughly the mean of y^2 over 1..10000 — what you'd see if the predictions are negligible compared with the labels, which is what happens when a model fit on raw x values up to 10^4 is handed standardized features that sit within a couple of units of zero.)
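To illustrate the scaling effect outside of Spark, here's a minimal plain-Scala sketch (not MLlib's optimizer — `fitStandardized` is a made-up helper doing full-batch gradient descent): once both fitting and evaluation use the same standardized features, an ordinary step size converges quickly on y = x, no 1e-8 step required.

```scala
// Illustrative only: plain Scala, no Spark. Full-batch gradient descent
// on y = x for x in 1..10000, with the feature standardized to zero
// mean / unit variance (what StandardScaler does per column). Fitting
// and evaluating on the SAME scaled data converges with step size 0.1.
object ScalingDemo {
  def fitStandardized(): Double = {
    val xs = (1 to 10000).map(_.toDouble)
    val ys = xs // y = x exactly

    // Standardize the feature column.
    val mean = xs.sum / xs.length
    val std  = math.sqrt(xs.map(v => (v - mean) * (v - mean)).sum / xs.length)
    val scaled = xs.map(v => (v - mean) / std)

    var w = 0.0
    var b = 0.0
    val step = 0.1
    for (_ <- 1 to 300) {
      var gw = 0.0
      var gb = 0.0
      for (i <- scaled.indices) {
        val err = w * scaled(i) + b - ys(i)
        gw += err * scaled(i)
        gb += err
      }
      w -= step * gw / scaled.length
      b -= step * gb / scaled.length
    }

    // Training MSE on the same scaled features the model was fit on.
    scaled.zip(ys).map { case (x, y) =>
      val e = w * x + b - y
      e * e
    }.sum / scaled.length
  }

  def main(args: Array[String]): Unit =
    println(s"training MSE on standardized data: ${fitStandardized()}")
}
```

With standardized features the Hessian is essentially the identity, so the loss contracts by a factor of about 0.9 per pass and the MSE drops to effectively zero within a few hundred iterations.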
On Thu, Jan 15, 2015 at 10:42 AM, Devl Devel <devl.developm...@gmail.com> wrote:

> Thanks, that helps a bit at least with the NaN but the MSE is still very
> high even with that step size and 10k iterations:
>
> training Mean Squared Error = 3.3322561285919316E7
>
> Does this method need say 100k iterations?
>
> On Thu, Jan 15, 2015 at 5:42 PM, Robin East <robin.e...@xense.co.uk> wrote:
>
> > -dev, +user
> >
> > You’ll need to set the gradient descent step size to something small - a
> > bit of trial and error shows that 0.00000001 works.
> >
> > You’ll need to create a LinearRegressionWithSGD instance and set the step
> > size explicitly:
> >
> > val lr = new LinearRegressionWithSGD()
> > lr.optimizer.setStepSize(0.00000001)
> > lr.optimizer.setNumIterations(100)
> > val model = lr.run(parsedData)
> >
> > On 15 Jan 2015, at 16:46, devl.development <devl.developm...@gmail.com> wrote:
> >
> > From what I gather, you use LinearRegressionWithSGD to predict y or the
> > response variable given a feature vector x.
> >
> > In a simple example I used a perfectly linear dataset such that x=y:
> >
> > y,x
> > 1,1
> > 2,2
> > ...
> > 10000,10000
> >
> > Using the out-of-box example from the website (with and without scaling):
> >
> > val data = sc.textFile(file)
> >
> > val parsedData = data.map { line =>
> >   val parts = line.split(',')
> >   LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
> > }
> >
> > val scaler = new StandardScaler(withMean = true, withStd = true)
> >   .fit(parsedData.map(x => x.features))
> > val scaledData = parsedData
> >   .map(x =>
> >     LabeledPoint(x.label,
> >       scaler.transform(Vectors.dense(x.features.toArray))))
> >
> > // Building the model
> > val numIterations = 100
> > val model = LinearRegressionWithSGD.train(parsedData, numIterations)
> >
> > // Evaluate model on training examples and compute training error
> > // * tried using both scaledData and parsedData
> > val valuesAndPreds = scaledData.map { point =>
> >   val prediction = model.predict(point.features)
> >   (point.label, prediction)
> > }
> > val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
> > println("training Mean Squared Error = " + MSE)
> >
> > Both scaled and unscaled attempts give:
> >
> > training Mean Squared Error = NaN
> >
> > I've even tried x, y+(sample noise from normal with mean 0 and stddev 1),
> > still comes up with the same thing.
> >
> > Is this not supposed to work for x and y or 2-dimensional plots? Is there
> > something I'm missing or wrong in the code above? Or is there a limitation
> > in the method?
> >
> > Thanks for any advice.
> >
> > --
> > View this message in context:
> > http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
> > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
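On the NaN in the quoted code: with raw features running up to 10^4, the mean of x^2 is about 3.3e7, so a step size of 1.0 multiplies the parameter error by roughly that factor every pass — the weights overflow within about fifty iterations and the MSE stops being a finite number. A plain-Scala sketch of the same full-batch update (hypothetical `fitUnscaled` helper, not MLlib's actual optimizer, which also applies step decay and minibatching) shows the blow-up:

```scala
// Illustrative only: plain Scala, no Spark. The same gradient-descent
// update as above, but on the RAW features 1..10000 with step size 1.0.
// Because mean(x^2) is ~3.3e7, each update overshoots enormously; the
// weight overflows a Double within ~50 iterations and the final MSE
// is NaN or infinite — matching the NaN reported in the thread.
object DivergenceDemo {
  def fitUnscaled(): Double = {
    val xs = (1 to 10000).map(_.toDouble)
    val ys = xs // y = x exactly

    var w = 0.0
    var b = 0.0
    val step = 1.0 // far too large for features of this magnitude
    for (_ <- 1 to 100) {
      var gw = 0.0
      var gb = 0.0
      for (i <- xs.indices) {
        val err = w * xs(i) + b - ys(i)
        gw += err * xs(i)
        gb += err
      }
      w -= step * gw / xs.length
      b -= step * gb / xs.length
    }

    xs.zip(ys).map { case (x, y) =>
      val e = w * x + b - y
      e * e
    }.sum / xs.length
  }

  def main(args: Array[String]): Unit =
    println(s"training MSE with step 1.0 on raw data: ${fitUnscaled()}")
}
```

This is why either standardizing the features or shrinking the step to the 1e-8 range suggested above is needed — the two fixes compensate for the same ~3.3e7 curvature term.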