Re: LinearRegressionWithSGD accuracy

2015-01-28 Thread DB Tsai
Hi Robin,

You can try this PR out. It has built-in feature scaling and ElasticNet
regularization (an L1/L2 mix), and the implementation converges stably to the
same model that R's glmnet package produces.

https://github.com/apache/spark/pull/4259
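
For reference, here is a minimal sketch of how this kind of estimator is used
from the DataFrame-based spark.ml API (the DataFrame name "training" and the
parameter values are illustrative placeholders, not necessarily the final API
of the PR):

import org.apache.spark.ml.regression.LinearRegression

// Sketch only: assumes an existing DataFrame `training` with "label" and
// "features" columns (e.g. built from an RDD[LabeledPoint] via
// sqlContext.createDataFrame).
val lr = new LinearRegression()
  .setMaxIter(100)
  .setRegParam(0.1)          // overall regularization strength (illustrative)
  .setElasticNetParam(0.5)   // 0.0 = pure L2 (ridge), 1.0 = pure L1 (lasso)
val model = lr.fit(training) // feature scaling is handled internally
println("weights: " + model.weights + ", intercept: " + model.intercept)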

Sincerely,

DB Tsai
---
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai



On Thu, Jan 15, 2015 at 9:42 AM, Robin East robin.e...@xense.co.uk wrote:
 -dev, +user

 You’ll need to set the gradient descent step size to something small - a bit 
 of trial and error shows that 0.0001 works.

 You’ll need to create a LinearRegressionWithSGD instance and set the step 
 size explicitly:

 val lr = new LinearRegressionWithSGD()
 lr.optimizer.setStepSize(0.0001)
 lr.optimizer.setNumIterations(100)
 val model = lr.run(parsedData)

 On 15 Jan 2015, at 16:46, devl.development devl.developm...@gmail.com wrote:

 From what I gather, you use LinearRegressionWithSGD to predict y, the response variable, given a feature vector x.

 In a simple example I used a perfectly linear dataset such that x=y
 y,x
 1,1
 2,2
 ...

 1,1

 Using the out-of-box example from the website (with and without scaling):

 val data = sc.textFile(file)

val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
}
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(parsedData.map(x => x.features))
val scaledData = parsedData
  .map(x =>
    LabeledPoint(x.label,
      scaler.transform(Vectors.dense(x.features.toArray))))

// Building the model
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate model on training examples and compute training error
// (* tried using both scaledData and parsedData)
val valuesAndPreds = scaledData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
println("training Mean Squared Error = " + MSE)

 Both scaled and unscaled attempts give:

 training Mean Squared Error = NaN

 I've even tried x, y + (sample noise from a normal with mean 0 and stddev 1); it still comes up with the same thing.

 Is this not supposed to work for two-dimensional data like this (a single feature x and a response y)? Is there something I'm missing or wrong in the code above? Or is there a limitation in the method?

 Thanks for any advice.



 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
 Sent from the Apache Spark Developers List mailing list archive at 
 Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: LinearRegressionWithSGD accuracy

2015-01-17 Thread DB Tsai
I'm working on LinearRegressionWithElasticNet using OWLQN now. It will do
the data standardization internally, so it's transparent to users. With
OWLQN you don't have to manually choose a stepSize. I will send out a PR
next week.

Sincerely,

DB Tsai
---
Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai



On Thu, Jan 15, 2015 at 8:46 AM, devl.development
devl.developm...@gmail.com wrote:
 From what I gather, you use LinearRegressionWithSGD to predict y, the response variable, given a feature vector x.

 In a simple example I used a perfectly linear dataset such that x=y
 y,x
 1,1
 2,2
 ...

 1,1

 Using the out-of-box example from the website (with and without scaling):

  val data = sc.textFile(file)

val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
}
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(parsedData.map(x => x.features))
val scaledData = parsedData
  .map(x =>
    LabeledPoint(x.label,
      scaler.transform(Vectors.dense(x.features.toArray))))

// Building the model
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate model on training examples and compute training error
// (* tried using both scaledData and parsedData)
val valuesAndPreds = scaledData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
println("training Mean Squared Error = " + MSE)

 Both scaled and unscaled attempts give:

 training Mean Squared Error = NaN

 I've even tried x, y + (sample noise from a normal with mean 0 and stddev 1); it still comes up with the same thing.

 Is this not supposed to work for two-dimensional data like this (a single feature x and a response y)? Is there something I'm missing or wrong in the code above? Or is there a limitation in the method?

 Thanks for any advice.



 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: LinearRegressionWithSGD accuracy

2015-01-15 Thread Devl Devel
Thanks, that helps a bit, at least with the NaN, but the MSE is still very
high even with that step size and 10k iterations:

training Mean Squared Error = 3.3322561285919316E7

Does this method need, say, 100k iterations?
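
(For scale: an MSE of 3.3e7 corresponds to an RMSE of roughly sqrt(3.3e7) ≈ 5,770, so on average the predictions are off by thousands of units.)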






On Thu, Jan 15, 2015 at 5:42 PM, Robin East robin.e...@xense.co.uk wrote:

 -dev, +user

 You’ll need to set the gradient descent step size to something small - a
 bit of trial and error shows that 0.0001 works.

 You’ll need to create a LinearRegressionWithSGD instance and set the step
 size explicitly:

 val lr = new LinearRegressionWithSGD()
 lr.optimizer.setStepSize(0.0001)
 lr.optimizer.setNumIterations(100)
 val model = lr.run(parsedData)

 On 15 Jan 2015, at 16:46, devl.development devl.developm...@gmail.com
 wrote:

 From what I gather, you use LinearRegressionWithSGD to predict y, the response variable, given a feature vector x.

 In a simple example I used a perfectly linear dataset such that x=y
 y,x
 1,1
 2,2
 ...

 1,1

 Using the out-of-box example from the website (with and without scaling):

 val data = sc.textFile(file)

val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
}
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(parsedData.map(x => x.features))
val scaledData = parsedData
  .map(x =>
    LabeledPoint(x.label,
      scaler.transform(Vectors.dense(x.features.toArray))))

// Building the model
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate model on training examples and compute training error
// (* tried using both scaledData and parsedData)
val valuesAndPreds = scaledData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
println("training Mean Squared Error = " + MSE)

 Both scaled and unscaled attempts give:

 training Mean Squared Error = NaN

 I've even tried x, y + (sample noise from a normal with mean 0 and stddev 1); it still comes up with the same thing.

 Is this not supposed to work for two-dimensional data like this (a single feature x and a response y)? Is there something I'm missing or wrong in the code above? Or is there a limitation in the method?

 Thanks for any advice.



 --
 View this message in context:
 http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
 Sent from the Apache Spark Developers List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





Re: LinearRegressionWithSGD accuracy

2015-01-15 Thread Robin East
-dev, +user

You’ll need to set the gradient descent step size to something small - a bit of 
trial and error shows that 0.0001 works.

You’ll need to create a LinearRegressionWithSGD instance and set the step size 
explicitly:

val lr = new LinearRegressionWithSGD()
lr.optimizer.setStepSize(0.0001)
lr.optimizer.setNumIterations(100)
val model = lr.run(parsedData)
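
If you want to see the effect of the step size, here's a quick sketch that tries a few values (it assumes the parsedData RDD from your snippet below, ideally cached; large steps will typically blow up to NaN while very small ones converge slowly):

for (step <- Seq(1.0, 0.1, 0.01, 0.001, 0.0001)) {
  val lr = new LinearRegressionWithSGD()
  lr.optimizer.setStepSize(step)
  lr.optimizer.setNumIterations(100)
  val model = lr.run(parsedData)
  // training MSE for this step size
  val mse = parsedData.map { p =>
    val err = model.predict(p.features) - p.label
    err * err
  }.mean()
  println(s"stepSize=$step MSE=$mse")
}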

On 15 Jan 2015, at 16:46, devl.development devl.developm...@gmail.com wrote:

 From what I gather, you use LinearRegressionWithSGD to predict y, the response variable, given a feature vector x.
 
 In a simple example I used a perfectly linear dataset such that x=y
 y,x
 1,1
 2,2
 ...
 
 1,1
 
 Using the out-of-box example from the website (with and without scaling):
 
 val data = sc.textFile(file)
 
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
}
val scaler = new StandardScaler(withMean = true, withStd = true)
  .fit(parsedData.map(x => x.features))
val scaledData = parsedData
  .map(x =>
    LabeledPoint(x.label,
      scaler.transform(Vectors.dense(x.features.toArray))))

// Building the model
val numIterations = 100
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

// Evaluate model on training examples and compute training error
// (* tried using both scaledData and parsedData)
val valuesAndPreds = scaledData.map { point =>
  val prediction = model.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
println("training Mean Squared Error = " + MSE)
 
 Both scaled and unscaled attempts give:
 
 training Mean Squared Error = NaN
 
 I've even tried x, y + (sample noise from a normal with mean 0 and stddev 1); it still comes up with the same thing.

 Is this not supposed to work for two-dimensional data like this (a single feature x and a response y)? Is there something I'm missing or wrong in the code above? Or is there a limitation in the method?
 
 Thanks for any advice.
 
 
 
 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org
 



Re: LinearRegressionWithSGD accuracy

2015-01-15 Thread Joseph Bradley
It looks like you're training on the non-scaled data but testing on the
scaled data. Have you tried training and testing on only the scaled data?
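
For example, something along these lines (a sketch that reuses parsedData and the scaler from your quoted code, plus the small step size Robin suggested):

// Scale once, then train AND evaluate on the same scaled RDD.
val scaledData = parsedData
  .map(p => LabeledPoint(p.label, scaler.transform(p.features)))
  .cache()

val lr = new LinearRegressionWithSGD()
lr.optimizer.setStepSize(0.0001)
lr.optimizer.setNumIterations(100)
val model = lr.run(scaledData)

val valuesAndPreds = scaledData.map { p => (p.label, model.predict(p.features)) }
val mse = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
println("training Mean Squared Error = " + mse)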

On Thu, Jan 15, 2015 at 10:42 AM, Devl Devel devl.developm...@gmail.com
wrote:

 Thanks, that helps a bit, at least with the NaN, but the MSE is still very
 high even with that step size and 10k iterations:

 training Mean Squared Error = 3.3322561285919316E7

 Does this method need, say, 100k iterations?






 On Thu, Jan 15, 2015 at 5:42 PM, Robin East robin.e...@xense.co.uk
 wrote:

  -dev, +user
 
  You’ll need to set the gradient descent step size to something small - a
  bit of trial and error shows that 0.0001 works.
 
  You’ll need to create a LinearRegressionWithSGD instance and set the step
  size explicitly:
 
  val lr = new LinearRegressionWithSGD()
  lr.optimizer.setStepSize(0.0001)
  lr.optimizer.setNumIterations(100)
  val model = lr.run(parsedData)
 
  On 15 Jan 2015, at 16:46, devl.development devl.developm...@gmail.com
  wrote:
 
  From what I gather, you use LinearRegressionWithSGD to predict y, the response variable, given a feature vector x.
 
  In a simple example I used a perfectly linear dataset such that x=y
  y,x
  1,1
  2,2
  ...
 
  1,1
 
  Using the out-of-box example from the website (with and without scaling):
 
  val data = sc.textFile(file)
 
 val parsedData = data.map { line =>
   val parts = line.split(',')
   LabeledPoint(parts(1).toDouble, Vectors.dense(parts(0).toDouble)) // y and x
 }
 val scaler = new StandardScaler(withMean = true, withStd = true)
   .fit(parsedData.map(x => x.features))
 val scaledData = parsedData
   .map(x =>
     LabeledPoint(x.label,
       scaler.transform(Vectors.dense(x.features.toArray))))

 // Building the model
 val numIterations = 100
 val model = LinearRegressionWithSGD.train(parsedData, numIterations)

 // Evaluate model on training examples and compute training error
 // (* tried using both scaledData and parsedData)
 val valuesAndPreds = scaledData.map { point =>
   val prediction = model.predict(point.features)
   (point.label, prediction)
 }
 val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
 println("training Mean Squared Error = " + MSE)
 
  Both scaled and unscaled attempts give:
 
  training Mean Squared Error = NaN
 
  I've even tried x, y + (sample noise from a normal with mean 0 and stddev 1); it still comes up with the same thing.

  Is this not supposed to work for two-dimensional data like this (a single feature x and a response y)? Is there something I'm missing or wrong in the code above? Or is there a limitation in the method?
 
  Thanks for any advice.
 
 
 
  --
  View this message in context:
 
 http://apache-spark-developers-list.1001551.n3.nabble.com/LinearRegressionWithSGD-accuracy-tp10127.html
  Sent from the Apache Spark Developers List mailing list archive at
  Nabble.com.
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
  For additional commands, e-mail: dev-h...@spark.apache.org