I'm working on LinearRegressionWithElasticNet using OWLQN now. It will standardize the data internally, so this is transparent to users. With OWLQN, you don't have to choose stepSize manually. I will send out a PR next week.
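To illustrate why a hand-picked step size matters on the x = y dataset from the question, here is a minimal sketch in plain Scala. It is not MLlib's SGD and not OWLQN: it is ordinary batch gradient descent on a one-feature model without an intercept, and the names StepSizeSketch, fit, mean and std are hypothetical helpers made up for this example. Under those assumptions, a step size of 1.0 on the raw features makes the weight overflow to a non-finite value within a few dozen iterations, while the same loop on standardized features (with centered labels) converges.

```scala
object StepSizeSketch {
  // Batch gradient descent for the one-feature model y ~ w * x (no intercept),
  // minimizing mean squared error with a fixed, hand-picked step size.
  def fit(xs: Array[Double], ys: Array[Double],
          stepSize: Double, numIterations: Int): Double = {
    var w = 0.0
    for (_ <- 0 until numIterations) {
      var grad = 0.0
      var i = 0
      while (i < xs.length) {
        grad += (w * xs(i) - ys(i)) * xs(i)
        i += 1
      }
      w -= stepSize * grad / xs.length
    }
    w
  }

  def mean(v: Array[Double]): Double = v.sum / v.length

  // Population standard deviation.
  def std(v: Array[Double]): Double = {
    val m = mean(v)
    math.sqrt(v.map(a => (a - m) * (a - m)).sum / v.length)
  }

  def main(args: Array[String]): Unit = {
    // Same shape as the dataset in the question: x = y for 1..10000.
    val xs = Array.tabulate(10000)(i => (i + 1).toDouble)
    val ys = xs.clone()

    // Raw features: each update multiplies the error by roughly
    // (1 - stepSize * mean(x^2)), which is hugely negative here, so w blows up.
    val wRaw = fit(xs, ys, stepSize = 1.0, numIterations = 100)
    println(s"raw features: w = $wRaw")

    // Standardized features and centered labels: mean(z^2) is 1,
    // so a step size of 0.5 shrinks the error by half per iteration.
    val (mx, sx, my) = (mean(xs), std(xs), mean(ys))
    val zs = xs.map(x => (x - mx) / sx)
    val cs = ys.map(_ - my)
    val wStd = fit(zs, cs, stepSize = 0.5, numIterations = 100)
    println(s"standardized: w = $wStd, prediction for x = 10: ${wStd * (10.0 - mx) / sx + my}")
  }
}
```

Note that the snippet quoted below always trains on parsedData, even in the "scaled" attempt, so the unscaled case above is the relevant one for both of the reported NaN results.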

Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai

On Thu, Jan 15, 2015 at 8:46 AM, devl.development <devl.developm...@gmail.com> wrote:
> From what I gather, you use LinearRegressionWithSGD to predict y or the
> response variable given a feature vector x.
>
> In a simple example I used a perfectly linear dataset such that x=y
> y,x
> 1,1
> 2,2
> ...
>
> 10000,10000
>
> Using the out-of-box example from the website (with and without scaling):
>
> val data = sc.textFile(file)
>
> val parsedData = data.map { line =>
>   val parts = line.split(',')
>   LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
> }
> val scaler = new StandardScaler(withMean = true, withStd = true)
>   .fit(parsedData.map(x => x.features))
> val scaledData = parsedData
>   .map(x =>
>     LabeledPoint(x.label,
>       scaler.transform(Vectors.dense(x.features.toArray))))
>
> // Building the model
> val numIterations = 100
> val model = LinearRegressionWithSGD.train(parsedData, numIterations)
>
> // Evaluate model on training examples and compute training error
> // * tried using both scaledData and parsedData
> val valuesAndPreds = scaledData.map { point =>
>   val prediction = model.predict(point.features)
>   (point.label, prediction)
> }
> val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
> println("training Mean Squared Error = " + MSE)
>
> Both scaled and unscaled attempts give:
>
> training Mean Squared Error = NaN
>
> I've even tried x, y+(sample noise from normal with mean 0 and stddev 1)
> still comes up with the same thing.
>
> Is this not supposed to work for x and y or 2 dimensional plots? Is there
> something I'm missing or wrong in the code above? Or is there a limitation
> in the method?
>
> Thanks for any advice.
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org