Hi Robin,

You can try out this PR. It has built-in feature scaling and ElasticNet regularization (an L1/L2 mix), and the implementation converges stably to the same model as R's glmnet package.
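(For readers unfamiliar with the L1/L2 mix: glmnet's elastic-net penalty is lambda * (alpha * ||w||_1 + (1 - alpha)/2 * ||w||_2^2), where alpha = 1 is pure lasso and alpha = 0 is pure ridge. A minimal plain-Scala sketch of just the penalty term — not the PR's actual code:

```scala
// Elastic-net penalty in the glmnet parameterization (sketch, not Spark's code).
// alpha = 1.0 -> pure L1 (lasso); alpha = 0.0 -> pure L2 (ridge).
def elasticNetPenalty(weights: Array[Double], lambda: Double, alpha: Double): Double = {
  val l1 = weights.map(math.abs).sum          // sum of |w_i|
  val l2 = weights.map(w => w * w).sum        // sum of w_i^2
  lambda * (alpha * l1 + (1.0 - alpha) / 2.0 * l2)
}
```
)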
https://github.com/apache/spark/pull/4259

Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Thu, Jan 15, 2015 at 9:42 AM, Robin East <robin.e...@xense.co.uk> wrote:
> -dev, +user
>
> You'll need to set the gradient descent step size to something small - a bit
> of trial and error shows that 0.00000001 works.
>
> You'll need to create a LinearRegressionWithSGD instance and set the step
> size explicitly:
>
> val lr = new LinearRegressionWithSGD()
> lr.optimizer.setStepSize(0.00000001)
> lr.optimizer.setNumIterations(100)
> val model = lr.run(parsedData)
>
> On 15 Jan 2015, at 16:46, devl.development <devl.developm...@gmail.com> wrote:
>
>> From what I gather, you use LinearRegressionWithSGD to predict y, the
>> response variable, given a feature vector x.
>>
>> In a simple example I used a perfectly linear dataset such that x=y:
>> y,x
>> 1,1
>> 2,2
>> ...
>> 10000,10000
>>
>> Using the out-of-the-box example from the website (with and without scaling):
>>
>> val data = sc.textFile(file)
>>
>> val parsedData = data.map { line =>
>>   val parts = line.split(',')
>>   LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
>> }
>>
>> val scaler = new StandardScaler(withMean = true, withStd = true)
>>   .fit(parsedData.map(x => x.features))
>> val scaledData = parsedData
>>   .map(x => LabeledPoint(x.label, scaler.transform(Vectors.dense(x.features.toArray))))
>>
>> // Building the model
>> val numIterations = 100
>> val model = LinearRegressionWithSGD.train(parsedData, numIterations)
>>
>> // Evaluate model on training examples and compute training error
>> // * tried using both scaledData and parsedData
>> val valuesAndPreds = scaledData.map { point =>
>>   val prediction = model.predict(point.features)
>>   (point.label, prediction)
>> }
>> val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
>> println("training Mean Squared Error = " + MSE)
>>
>> Both scaled and unscaled attempts give:
>>
>> training Mean Squared Error = NaN
>>
>> I've even tried x, y+(sample noise from normal with mean 0 and stddev 1) -
>> it still comes up with the same thing.
>>
>> Is this not supposed to work for x and y, i.e. two-dimensional data? Is there
>> something I'm missing or wrong in the code above? Or is there a limitation
>> in the method?
>>
>> Thanks for any advice.
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
>> Sent from the Apache Spark Developers List mailing list archive at
>> Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
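The NaN in the thread can be reproduced outside Spark: with features on the order of 10^4, the least-squares gradient is huge, so a step size of 1.0 overshoots and the weight blows up until the MSE becomes NaN, while the tiny step Robin suggests contracts toward the solution. A self-contained plain-Scala sketch of (batch) gradient descent on the same x = y data — an illustration of the numerics, not Spark's actual SGD implementation:

```scala
// Sketch: 1-D least-squares gradient descent on the thread's y = x data,
// showing why a large step size yields MSE = NaN and a tiny one converges.
def sgdMse(stepSize: Double, numIterations: Int): Double = {
  val data = (1 to 10000).map(i => (i.toDouble, i.toDouble)) // (x, y) pairs, y = x
  val n = data.length.toDouble
  var w = 0.0
  for (_ <- 1 to numIterations) {
    // Gradient of (1/n) * sum (w*x - y)^2 with respect to w.
    val grad = data.map { case (x, y) => 2.0 * (w * x - y) * x }.sum / n
    w -= stepSize * grad
  }
  data.map { case (x, y) => math.pow(w * x - y, 2) }.sum / n // training MSE
}

// With step size 1.0 the weight oscillates with exploding magnitude and the
// MSE comes out NaN; with 0.00000001 (the value from the thread) it converges:
// sgdMse(1.0, 100)  -> NaN
// sgdMse(1e-8, 100) -> essentially 0.0
```

Standardizing the features (as StandardScaler does) has the same effect as shrinking the step size: it rescales the gradient so the default step no longer overshoots — but the model must then be both trained and evaluated on the scaled data, whereas the quoted code trains on parsedData and predicts on scaledData.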