I'm working on LinearRegressionWithElasticNet using OWLQN now. It
will standardize the data internally, so the scaling is transparent to
users. With OWLQN, you don't have to choose stepSize manually. I will
send out a PR next week.
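
To illustrate why feature scaling and the step size matter so much here, below is a minimal sketch in plain Scala with no Spark dependency. It is not MLlib's SGD and not OWLQN: it is ordinary full-batch gradient descent on the same x = y data, and the names StepSizeSketch, fit, standardize and mse are hypothetical helpers made up for this example. The fixed step size of 1.0 is also an assumption chosen for illustration.

```scala
object StepSizeSketch {
  // Full-batch gradient descent for y ≈ w * x + b under mean squared error.
  // Returns (w, b) after numIterations updates with a fixed step size.
  def fit(xs: Array[Double], ys: Array[Double],
          stepSize: Double, numIterations: Int): (Double, Double) = {
    val n = xs.length.toDouble
    var w = 0.0
    var b = 0.0
    for (_ <- 1 to numIterations) {
      var gradW = 0.0
      var gradB = 0.0
      var i = 0
      while (i < xs.length) {
        val residual = w * xs(i) + b - ys(i)
        gradW += residual * xs(i)
        gradB += residual
        i += 1
      }
      w -= stepSize * gradW / n
      b -= stepSize * gradB / n
    }
    (w, b)
  }

  // Center to zero mean and scale to unit (population) standard deviation.
  def standardize(xs: Array[Double]): Array[Double] = {
    val mean = xs.sum / xs.length
    val std = math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / xs.length)
    xs.map(x => (x - mean) / std)
  }

  // Mean squared error of the fitted line on the given data.
  def mse(xs: Array[Double], ys: Array[Double], w: Double, b: Double): Double =
    xs.zip(ys).map { case (x, y) => math.pow(w * x + b - y, 2) }.sum / xs.length

  def main(args: Array[String]): Unit = {
    val raw = (1 to 10000).map(_.toDouble).toArray // perfectly linear: x = y

    // Unscaled features: the updates overshoot and the weights blow up,
    // so the error is expected to be non-finite.
    val (w1, b1) = fit(raw, raw, 1.0, 100)
    println("unscaled MSE = " + mse(raw, raw, w1, b1))

    // Standardized features: the same step size now converges.
    val scaled = standardize(raw)
    val (w2, b2) = fit(scaled, raw, 1.0, 100)
    println("scaled MSE = " + mse(scaled, raw, w2, b2))
  }
}
```

Note that in the sketch the model is both trained and evaluated on the standardized features; fitting on one representation and predicting on the other would not give meaningful errors.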

Sincerely,

DB Tsai
-------------------------------------------------------
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai



On Thu, Jan 15, 2015 at 8:46 AM, devl.development
<devl.developm...@gmail.com> wrote:
> From what I gather, you use LinearRegressionWithSGD to predict y or the
> response variable given a feature vector x.
>
> In a simple example I used a perfectly linear dataset such that x=y
> y,x
> 1,1
> 2,2
> ...
>
> 10000,10000
>
> Using the out-of-box example from the website (with and without scaling):
>
>  val data = sc.textFile(file)
>
>     val parsedData = data.map { line =>
>       val parts = line.split(',')
>       LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
>
>     }
>     val scaler = new StandardScaler(withMean = true, withStd = true)
>       .fit(parsedData.map(x => x.features))
>     val scaledData = parsedData
>       .map(x =>
>       LabeledPoint(x.label,
>         scaler.transform(Vectors.dense(x.features.toArray))))
>
>     // Building the model
>     val numIterations = 100
>     val model = LinearRegressionWithSGD.train(parsedData, numIterations)
>
>     // Evaluate model on training examples and compute training error
>     // (tried using both scaledData and parsedData)
>     val valuesAndPreds = scaledData.map { point =>
>       val prediction = model.predict(point.features)
>       (point.label, prediction)
>     }
>     val MSE = valuesAndPreds.map{case(v, p) => math.pow((v - p), 2)}.mean()
>     println("training Mean Squared Error = " + MSE)
>
> Both scaled and unscaled attempts give:
>
> training Mean Squared Error = NaN
>
> I've even tried x, y + (noise sampled from a normal distribution with
> mean 0 and stddev 1), and it still comes up with the same thing.
>
> Is this not supposed to work for x and y or 2 dimensional plots? Is there
> something I'm missing or wrong in the code above? Or is there a limitation
> in the method?
>
> Thanks for any advice.
>
>
>
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
