[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141851806
  
Can you merge master to resolve the conflicts? Also, please add a warning to 
the training summary that it currently ignores the training weights (except for the 
objective trace).

Other than those small items, LGTM. You may remove WIP.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937401
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
@@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
   .zip(testSummary.residuals.select("residuals").collect())
   .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
   }
+
+  test("linear regression with weighted samples"){
+val (data, weightedData) = {
+  val activeData = LinearDataGenerator.generateLinearInput(
+6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
+
+  val rnd = new Random(8392)
+  val signedData = activeData map { case p: LabeledPoint =>
+(rnd.nextGaussian() > 0.0, p)
+  }
+
+  val data1 = signedData flatMap {
+case (true, p) => Iterator(p, p)
+case (false, p) => Iterator(p)
+  }
+
+  val weightedSignedData = signedData flatMap {
+case (true, LabeledPoint(label, features)) =>
+  Iterator(
+Instance(label, 1.2, features),
+Instance(label, 0.8, features)
+  )
+case (false, LabeledPoint(label, features)) =>
+  Iterator(
+Instance(label, 0.3, features),
+Instance(label, 0.1, features),
+Instance(label, 0.6, features)
+  )
+  }
+
+  val noiseData = LinearDataGenerator.generateLinearInput(
+2, Array(1, 3), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
+  val weightedNoiseData = noiseData map {
+case LabeledPoint(label, features) => Instance(label, 0, features)
+  }
+  val data2 = weightedSignedData ++ weightedNoiseData
+
+  (sqlContext.createDataFrame(sc.parallelize(data1, 4)),
+sqlContext.createDataFrame(sc.parallelize(data2, 4)))
+}
+
+val trainer1a = (new LinearRegression).setFitIntercept(true)
+  .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(true)
+val trainer1b = (new LinearRegression).setFitIntercept(true).setWeightCol("weight")
+  .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(true)
+val model1a0 = trainer1a.fit(data)
+val model1a1 = trainer1a.fit(weightedData)
+val model1b = trainer1b.fit(weightedData)
+assert(model1a0.weights !~= model1a1.weights absTol 1E-3)
+assert(model1a0.intercept !~= model1a1.intercept absTol 1E-3)
+assert(model1a0.weights ~== model1b.weights absTol 1E-3)
+assert(model1a0.intercept ~== model1b.intercept absTol 1E-3)
+
+val trainer2a = (new LinearRegression).setFitIntercept(true)
+  .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(false)
+val trainer2b = (new LinearRegression).setFitIntercept(true).setWeightCol("weight")
+  .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(false)
+val model2a0 = trainer2a.fit(data)
+val model2a1 = trainer2a.fit(weightedData)
+val model2b = trainer2b.fit(weightedData)
+assert(model2a0.weights !~= model2a1.weights absTol 1E-3)
+assert(model2a0.intercept !~= model2a1.intercept absTol 1E-3)
+assert(model2a0.weights ~== model2b.weights absTol 1E-3)
+assert(model2a0.intercept ~== model2b.intercept absTol 1E-3)
+
+val trainer3a = (new LinearRegression).setFitIntercept(false)
+  .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(true)
+val trainer3b = (new LinearRegression).setFitIntercept(false).setWeightCol("weight")
+  .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(true)
+val model3a0 = trainer3a.fit(data)
+val model3a1 = trainer3a.fit(weightedData)
+val model3b = trainer3b.fit(weightedData)
+assert(model3a0.weights !~= model3a1.weights absTol 1E-3)
+assert(model3a0.weights ~== model3b.weights absTol 1E-3)
+
+val trainer4a = (new LinearRegression).setFitIntercept(false)
+  .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(false)
+val trainer4b = (new LinearRegression).setFitIntercept(false).setWeightCol("weight")
+  .setElasticNetParam(0.38).setRegParam(0.21).setStandardization(false)
+val model4a0 = trainer4a.fit(data)
+val model4a1 = trainer4a.fit(weightedData)
+val model4b = trainer4b.fit(weightedData)
+assert(model4a0.weights !~= model4a1.weights absTol 1E-3)
+assert(model4a0.weights ~== model4b.weights absTol 1E-3)
+
--- End diff --

remove this extra line.
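
The test in the diff above relies on an equivalence: a point that appears twice should pull the fit exactly as hard as the same point split into instances whose weights sum to 2 (1.2 + 0.8), a single point matches weights that sum to 1 (0.3 + 0.1 + 0.6), and zero-weight noise should not matter at all. A minimal, Spark-free sketch of that idea for one feature and no regularization follows. `WeightedFitSketch`, `Sample` and `fit` are hypothetical names made up for this illustration and are not part of the PR.

```scala
// Hypothetical sketch: closed-form weighted least squares with one feature.
// This is NOT Spark's LinearRegression; it only illustrates the weighting idea.
object WeightedFitSketch {
  // (label, weight, feature): same field order as Instance in the diff, but a made-up class.
  final case class Sample(label: Double, weight: Double, feature: Double)

  /** Returns (slope, intercept) minimizing sum_i w_i * (y_i - slope * x_i - intercept)^2. */
  def fit(samples: Seq[Sample]): (Double, Double) = {
    val wSum = samples.map(_.weight).sum
    val xMean = samples.map(s => s.weight * s.feature).sum / wSum
    val yMean = samples.map(s => s.weight * s.label).sum / wSum
    val sxy = samples.map(s => s.weight * (s.feature - xMean) * (s.label - yMean)).sum
    val sxx = samples.map(s => s.weight * (s.feature - xMean) * (s.feature - xMean)).sum
    val slope = sxy / sxx
    (slope, yMean - slope * xMean)
  }

  // Toy (x, y) pairs, invented for the example.
  val points: Seq[(Double, Double)] = Seq((1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8))

  // Unweighted data set: the first two points appear twice, the rest once.
  val duplicated: Seq[Sample] =
    (points.take(2) ++ points.take(2) ++ points.drop(2)).map { case (x, y) => Sample(y, 1.0, x) }

  // Weighted data set: weights sum to 2 or 1 per point, plus a zero-weight outlier.
  val weighted: Seq[Sample] =
    points.take(2).flatMap { case (x, y) => Seq(Sample(y, 1.2, x), Sample(y, 0.8, x)) } ++
      points.drop(2).flatMap { case (x, y) =>
        Seq(Sample(y, 0.3, x), Sample(y, 0.1, x), Sample(y, 0.6, x))
      } ++
      Seq(Sample(100.0, 0.0, 2.5))
}
```

Calling `WeightedFitSketch.fit` on `duplicated` and on `weighted` should give the same slope and intercept up to floating-point rounding. The real test additionally sets elastic-net regularization, which this sketch leaves out.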



[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937392
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
@@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
   .zip(testSummary.residuals.select("residuals").collect())
   .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
   }
+
+  test("linear regression with weighted samples"){
+val (data, weightedData) = {
+  val activeData = LinearDataGenerator.generateLinearInput(
+6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
+
+  val rnd = new Random(8392)
+  val signedData = activeData map { case p: LabeledPoint =>
+(rnd.nextGaussian() > 0.0, p)
+  }
+
+  val data1 = signedData flatMap {
+case (true, p) => Iterator(p, p)
+case (false, p) => Iterator(p)
+  }
+
+  val weightedSignedData = signedData flatMap {
+case (true, LabeledPoint(label, features)) =>
+  Iterator(
+Instance(label, 1.2, features),
+Instance(label, 0.8, features)
+  )
+case (false, LabeledPoint(label, features)) =>
+  Iterator(
+Instance(label, 0.3, features),
+Instance(label, 0.1, features),
+Instance(label, 0.6, features)
+  )
+  }
+
+  val noiseData = LinearDataGenerator.generateLinearInput(
+2, Array(1, 3), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
+  val weightedNoiseData = noiseData map {
+case LabeledPoint(label, features) => Instance(label, 0, features)
--- End diff --

Please write this as `case LabeledPoint(label, features) => Instance(label, weight = 0.0, 
features)` for readability.
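
As a small illustration of why the named argument reads better, here is a sketch with a stand-in case class. The `Instance` below is hypothetical (its features are simplified to `Array[Double]`) and only mirrors the field order used in the diff.

```scala
object NamedArgSketch {
  // Hypothetical stand-in for the Instance class in the diff; not Spark's definition.
  final case class Instance(label: Double, weight: Double, features: Array[Double])

  val features: Array[Double] = Array(0.5, -1.0)

  // Positional: the reader must know that the second field is the weight,
  // and the Int literal 0 is silently widened to a Double.
  val positional: Instance = Instance(2.0, 0, features)

  // Named: the intent is visible at the call site. A named argument in its own
  // position may still be followed by positional arguments.
  val named: Instance = Instance(2.0, weight = 0.0, features)
}
```

Both calls build the same instance; only the call site changes.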





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937361
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
@@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
   .zip(testSummary.residuals.select("residuals").collect())
   .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
   }
+
+  test("linear regression with weighted samples"){
+val (data, weightedData) = {
+  val activeData = LinearDataGenerator.generateLinearInput(
+6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
+
+  val rnd = new Random(8392)
+  val signedData = activeData map { case p: LabeledPoint =>
+(rnd.nextGaussian() > 0.0, p)
+  }
+
+  val data1 = signedData flatMap {
--- End diff --

Ditto: please use `signedData.flatMap`.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937357
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
@@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
   .zip(testSummary.residuals.select("residuals").collect())
   .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
   }
+
+  test("linear regression with weighted samples"){
+val (data, weightedData) = {
+  val activeData = LinearDataGenerator.generateLinearInput(
+6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
+
+  val rnd = new Random(8392)
+  val signedData = activeData map { case p: LabeledPoint =>
--- End diff --

Please use `activeData.map`





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937365
  
--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
@@ -510,4 +513,90 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
   .zip(testSummary.residuals.select("residuals").collect())
   .forall { case (Row(r1: Double), Row(r2: Double)) => r1 ~== r2 relTol 1E-5 }
   }
+
+  test("linear regression with weighted samples"){
+val (data, weightedData) = {
+  val activeData = LinearDataGenerator.generateLinearInput(
+6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 500, 1, 0.1)
+
+  val rnd = new Random(8392)
+  val signedData = activeData map { case p: LabeledPoint =>
+(rnd.nextGaussian() > 0.0, p)
+  }
+
+  val data1 = signedData flatMap {
+case (true, p) => Iterator(p, p)
+case (false, p) => Iterator(p)
+  }
+
+  val weightedSignedData = signedData flatMap {
--- End diff --

Ditto: please use `signedData.flatMap`.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937291
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -598,17 +629,14 @@ private class LeastSquaresCostFun(
 featuresMean: Array[Double],
 effectiveL2regParam: Double) extends DiffFunction[BDV[Double]] {
 
-  override def calculate(weights: BDV[Double]): (Double, BDV[Double]) = {
-val w = Vectors.fromBreeze(weights)
+  override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
+val coeff = Vectors.fromBreeze(coefficients)
 
-val leastSquaresAggregator = data.treeAggregate(new LeastSquaresAggregator(w, labelStd,
+val leastSquaresAggregator = data.treeAggregate(new LeastSquaresAggregator(coeff, labelStd,
   labelMean, fitIntercept, featuresStd, featuresMean))(
-seqOp = (c, v) => (c, v) match {
-  case (aggregator, (label, features)) => aggregator.add(label, features)
-},
-combOp = (c1, c2) => (c1, c2) match {
-  case (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
-})
+seqOp = (aggregator, instance) => aggregator.add(instance),
+combOp = (aggregator1, aggregator2) => aggregator1.merge(aggregator2)
+)
 
--- End diff --

Move the closing `)` to the end of the `combOp` line.
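
For readers unfamiliar with the `seqOp`/`combOp` pattern in the diff above, the following Spark-free sketch shows the shape of the computation: each partition is folded with `seqOp`, then the partial aggregators are merged with `combOp`. `AggregateSketch`, `WeightedMean` and the local `aggregate` helper are hypothetical stand-ins written for this note. They are not Spark's `treeAggregate`, which additionally merges partial results in a multi-level tree.

```scala
object AggregateSketch {
  // Hypothetical mini-aggregator: tracks a weighted sum of labels and the total weight.
  final class WeightedMean(var weightedSum: Double = 0.0, var weightSum: Double = 0.0) {
    def add(label: Double, weight: Double): this.type = {
      weightedSum += weight * label
      weightSum += weight
      this
    }
    def merge(other: WeightedMean): this.type = {
      weightedSum += other.weightedSum
      weightSum += other.weightSum
      this
    }
    def mean: Double = weightedSum / weightSum
  }

  // Local stand-in for a distributed aggregate: fold every partition with seqOp,
  // then combine the per-partition results with combOp.
  def aggregate[T, U](partitions: Seq[Seq[T]])(zero: () => U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U): U =
    partitions.map(_.foldLeft(zero())(seqOp)).reduce(combOp)

  // Two "partitions" of (label, weight) pairs, invented for the example.
  val partitions: Seq[Seq[(Double, Double)]] =
    Seq(Seq((1.0, 1.0), (3.0, 1.0)), Seq((5.0, 2.0)))

  val result: WeightedMean = aggregate(partitions)(() => new WeightedMean())(
    seqOp = (aggregator, lw) => aggregator.add(lw._1, lw._2),
    combOp = (aggregator1, aggregator2) => aggregator1.merge(aggregator2))
}
```

Because `add` and `merge` both return the aggregator itself, the two operations can be passed as plain lambdas, which is what lets the new version of the diff drop the `match` blocks.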





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937180
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -31,21 +31,30 @@ import org.apache.spark.ml.util.Identifiable
 import org.apache.spark.mllib.evaluation.RegressionMetrics
 import org.apache.spark.mllib.linalg.{Vector, Vectors}
 import org.apache.spark.mllib.linalg.BLAS._
-import org.apache.spark.mllib.regression.LabeledPoint
 import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{DataFrame, Row}
-import org.apache.spark.sql.functions.{col, udf}
-import org.apache.spark.sql.types.StructField
+import org.apache.spark.sql.functions.{col, udf, lit}
 import org.apache.spark.storage.StorageLevel
-import org.apache.spark.util.StatCounter
+
--- End diff --

remove extra line





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937145
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -520,28 +544,28 @@ private class LeastSquaresAggregator(
* Add a new training data to this LeastSquaresAggregator, and update the loss and gradient
* of the objective function.
*
-   * @param label The label for this data point.
-   * @param data The features for one data point in dense/sparse vector format to be added
-   * into this aggregator.
+   * @param data  The data point to be added.
* @return This LeastSquaresAggregator object.
--- End diff --

Rename `data` to `instance`.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937155
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -493,26 +515,28 @@ private class LeastSquaresAggregator(
 featuresMean: Array[Double]) extends Serializable {
 
   private var totalCnt: Long = 0L
+  private var weightSum: Double = 0
--- End diff --

`private var weightSum: Double = 0.0`





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-20 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39937140
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -520,28 +544,28 @@ private class LeastSquaresAggregator(
* Add a new training data to this LeastSquaresAggregator, and update the loss and gradient
* of the objective function.
*
-   * @param label The label for this data point.
-   * @param data The features for one data point in dense/sparse vector format to be added
-   * into this aggregator.
+   * @param data  The data point to be added.
* @return This LeastSquaresAggregator object.
*/
-  def add(label: Double, data: Vector): this.type = {
-require(dim == data.size, s"Dimensions mismatch when adding new sample." +
-  s" Expecting $dim but got ${data.size}.")
+  def add(data: Instance): this.type = data match { case Instance(label, weight, features) =>
+require(dim == features.size, s"Dimensions mismatch when adding new sample." +
+  s" Expecting $dim but got ${features.size}.")
+require(weight >= 0.0, s"instance weight, ${weight} has to be >= 0.0")
 
--- End diff --

Please add `if (weight == 0) return this`.
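
A sketch of what the early return buys: a zero-weight instance cannot change the weighted loss, so the aggregator can skip the dot product for it. The classes below are simplified, hypothetical stand-ins (dense `Array[Double]` features, loss only, no gradient) and are not the `LeastSquaresAggregator` from the PR. Whether skipped instances should still count toward an instance counter is a design choice this sketch does not settle; here they are simply not counted.

```scala
object ZeroWeightSketch {
  // Hypothetical stand-in for the Instance class in the diff.
  final case class Instance(label: Double, weight: Double, features: Array[Double])

  // Simplified aggregator: accumulates the weighted squared error of a fixed linear model.
  final class LossAggregator(coefficients: Array[Double]) {
    private var lossSum = 0.0
    private var weightSum = 0.0
    private var count = 0L

    def add(instance: Instance): this.type = {
      val Instance(label, weight, features) = instance
      require(features.length == coefficients.length,
        s"Dimensions mismatch. Expecting ${coefficients.length} but got ${features.length}.")
      require(weight >= 0.0, s"instance weight, $weight has to be >= 0.0")
      // Zero-weight instances contribute nothing, so return before doing any work.
      if (weight == 0.0) return this

      val prediction = features.zip(coefficients).map { case (x, c) => x * c }.sum
      val diff = prediction - label
      lossSum += weight * diff * diff / 2.0
      weightSum += weight
      count += 1
      this
    }

    /** Weighted mean of the per-instance half squared errors. */
    def loss: Double = lossSum / weightSum

    /** Number of instances that actually contributed to the loss. */
    def instancesUsed: Long = count
  }
}
```

In this sketch, adding an instance with weight 0.0 leaves both `loss` and `instancesUsed` unchanged.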





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141533658
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141533663
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42670/
Test PASSed.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141533529
  
  [Test build #42670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42670/console) for PR 8631 at commit [`854d0bb`](https://github.com/apache/spark/commit/854d0bb58d0a6b43135ce9e750e4f9df36a65003).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141492966
  
  [Test build #42670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42670/consoleFull) for PR 8631 at commit [`854d0bb`](https://github.com/apache/spark/commit/854d0bb58d0a6b43135ce9e750e4f9df36a65003).





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141489800
  
Merged build started.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141489727
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141374317
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42640/
Test FAILed.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141374237
  
  [Test build #42640 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42640/console) for PR 8631 at commit [`1f731c2`](https://github.com/apache/spark/commit/1f731c28ad8a59f3bf432435253dc7b0984f46b4).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: String)`
  * `  require(censor == 1.0 || censor == 0.0, "censor of class AFTPoint must be 1.0 or 0.0")`






[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141374315
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141345895
  
  [Test build #42640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42640/consoleFull) for PR 8631 at commit [`1f731c2`](https://github.com/apache/spark/commit/1f731c28ad8a59f3bf432435253dc7b0984f46b4).





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141344287
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141344297
  
Merged build started.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread rotationsymmetry
Github user rotationsymmetry commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141344261
  
@dbtsai Thanks for the comment on indentation. I have fixed it in the patch.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141301253
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42621/
Test PASSed.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141301251
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141301161
  
  [Test build #42621 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42621/console) for PR 8631 at commit [`2afa2a1`](https://github.com/apache/spark/commit/2afa2a190368adb99ec398c64744fc7dafc98bed).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Interaction(override val uid: String) extends Transformer`






[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39812153
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -123,30 +132,41 @@ class LinearRegression(override val uid: String)
   def setTol(value: Double): this.type = set(tol, value)
   setDefault(tol -> 1E-6)
 
+  /**
+   * Whether to over-/under-sample training instances according to the given weights in weightCol.
+   * If empty, all instances are treated equally (weight 1.0).
+   * Default is empty, so all instances have weight one.
+   * @group setParam
+   */
+  def setWeightCol(value: String): this.type = set(weightCol, value)
+  setDefault(weightCol -> "")
+
   override protected def train(dataset: DataFrame): LinearRegressionModel = {
 // Extract columns from data.  If dataset is persisted, do not persist instances.
-val instances = extractLabeledPoints(dataset).map {
-  case LabeledPoint(label: Double, features: Vector) => (label, features)
+val w = if ($(weightCol).isEmpty) lit(1.0) else col($(weightCol))
+val instances: RDD[Instance] = dataset.select(col($(labelCol)), w, col($(featuresCol))).map {
+  case Row(label: Double, weight: Double, features: Vector) =>
+Instance(label, weight, features)
 }
+
 val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE
 if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
 
-val (summarizer, statCounter) = instances.treeAggregate(
-  (new MultivariateOnlineSummarizer, new StatCounter))(
-seqOp = (c, v) => (c, v) match {
-  case ((summarizer: MultivariateOnlineSummarizer, statCounter: StatCounter),
-  (label: Double, features: Vector)) =>
-(summarizer.add(features), statCounter.merge(label))
-  },
-combOp = (c1, c2) => (c1, c2) match {
-  case ((summarizer1: MultivariateOnlineSummarizer, statCounter1: StatCounter),
-  (summarizer2: MultivariateOnlineSummarizer, statCounter2: StatCounter)) =>
-(summarizer1.merge(summarizer2), statCounter1.merge(statCounter2))
-  })
-
-val numFeatures = summarizer.mean.size
-val yMean = statCounter.mean
-val yStd = math.sqrt(statCounter.variance)
+val (featuresSummarizer, ySummarizer) = {
+  val seqOp = (c: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer),
+   instance: Instance) =>
+(c._1.add(instance.features, instance.weight),
+  c._2.add(Vectors.dense(instance.label), instance.weight))
+  val combOp = (c1: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer),
+c2: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer)) =>
+(c1._1.merge(c2._1), c1._2.merge(c2._2))
--- End diff --

ditto





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39812144
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -123,30 +132,41 @@ class LinearRegression(override val uid: String)
   def setTol(value: Double): this.type = set(tol, value)
   setDefault(tol -> 1E-6)
 
+  /**
+   * Whether to over-/under-sample training instances according to the given weights in weightCol.
+   * If empty, all instances are treated equally (weight 1.0).
+   * Default is empty, so all instances have weight one.
+   * @group setParam
+   */
+  def setWeightCol(value: String): this.type = set(weightCol, value)
+  setDefault(weightCol -> "")
+
   override protected def train(dataset: DataFrame): LinearRegressionModel = {
 // Extract columns from data.  If dataset is persisted, do not persist instances.
-val instances = extractLabeledPoints(dataset).map {
-  case LabeledPoint(label: Double, features: Vector) => (label, features)
+val w = if ($(weightCol).isEmpty) lit(1.0) else col($(weightCol))
+val instances: RDD[Instance] = dataset.select(col($(labelCol)), w, col($(featuresCol))).map {
+  case Row(label: Double, weight: Double, features: Vector) =>
+Instance(label, weight, features)
 }
+
 val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE
 if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)
 
-val (summarizer, statCounter) = instances.treeAggregate(
-  (new MultivariateOnlineSummarizer, new StatCounter))(
-seqOp = (c, v) => (c, v) match {
-  case ((summarizer: MultivariateOnlineSummarizer, statCounter: StatCounter),
-  (label: Double, features: Vector)) =>
-(summarizer.add(features), statCounter.merge(label))
-  },
-combOp = (c1, c2) => (c1, c2) match {
-  case ((summarizer1: MultivariateOnlineSummarizer, statCounter1: StatCounter),
-  (summarizer2: MultivariateOnlineSummarizer, statCounter2: StatCounter)) =>
-(summarizer1.merge(summarizer2), statCounter1.merge(statCounter2))
-  })
-
-val numFeatures = summarizer.mean.size
-val yMean = statCounter.mean
-val yStd = math.sqrt(statCounter.variance)
+val (featuresSummarizer, ySummarizer) = {
+  val seqOp = (c: (MultivariateOnlineSummarizer, MultivariateOnlineSummarizer),
+   instance: Instance) =>
--- End diff --

Indentation. See LoR for an example.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141258078
  
[Test build #42621 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42621/consoleFull) for PR 8631 at commit [`2afa2a1`](https://github.com/apache/spark/commit/2afa2a190368adb99ec398c64744fc7dafc98bed).





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141257039
  
Merged build started.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141257022
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141217999
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141218002
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42611/





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141217897
  
[Test build #42611 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42611/console) for PR 8631 at commit [`3f98247`](https://github.com/apache/spark/commit/3f98247801368a86aaffabd78b3755bf36fab330).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141163296
  
[Test build #42611 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42611/consoleFull) for PR 8631 at commit [`3f98247`](https://github.com/apache/spark/commit/3f98247801368a86aaffabd78b3755bf36fab330).





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141160787
  
Merged build started.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141160754
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141160556
  
Jenkins, retest this please





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread rotationsymmetry
Github user rotationsymmetry commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141136871
  
retest this please.

"org.apache.spark.HeartbeatReceiverSuite.reregister if heartbeat from 
removed executor" failed, which should be unrelated to this patch.







[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-140996440
  
[Test build #42579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42579/console) for PR 8631 at commit [`3f98247`](https://github.com/apache/spark/commit/3f98247801368a86aaffabd78b3755bf36fab330).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-140996513
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42579/





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-140996512
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-140973032
  
[Test build #42579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42579/consoleFull) for PR 8631 at commit [`3f98247`](https://github.com/apache/spark/commit/3f98247801368a86aaffabd78b3755bf36fab330).





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-140972488
  
Merged build started.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-140972473
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-16 Thread rotationsymmetry
Github user rotationsymmetry commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-140972365
  
@dbtsai Thank you for your comments. I have revised the patch. Please test.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-15 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39580975
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -123,30 +123,48 @@ class LinearRegression(override val uid: String)
   def setTol(value: Double): this.type = set(tol, value)
   setDefault(tol -> 1E-6)
 
+  /**
+   * Whether to over-/undersamples each of training instance according to the given
--- End diff --

The doc is changed in LoR. Please sync with that. 





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-15 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39580918
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -123,30 +123,48 @@ class LinearRegression(override val uid: String)
   def setTol(value: Double): this.type = set(tol, value)
   setDefault(tol -> 1E-6)
 
+  /**
+   * Whether to over-/undersamples each of training instance according to the given
+   * weight in `weightCol`. If empty, all samples are supposed to have weights as 1.0.
+   * Default is empty, so all samples have weight one.
+   * @group setParam
+   */
+  def setWeightCol(value: String): this.type = set(weightCol, value)
+  setDefault(weightCol -> "")
+
   override protected def train(dataset: DataFrame): LinearRegressionModel = {
 // Extract columns from data.  If dataset is persisted, do not persist instances.
-val instances = extractLabeledPoints(dataset).map {
--- End diff --

Use `lit` and `col` to simplify the code. See the example in LoR.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-15 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39580848
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -572,7 +591,7 @@ private class LeastSquaresAggregator(
 this
   }
 
-  def count: Long = totalCnt
+  def count: Double = totalCnt
 
--- End diff --

We decided to keep `count` as it is, and add `weightSum`.
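
To illustrate the suggestion, here is a minimal sketch of an aggregator that keeps `count` as a `Long` instance count and exposes the sum of weights separately as `weightSum`. It is not the code from the patch: the class name `WeightedLossAggregator`, the `add` signature, and the private field names are assumptions made for this example only.

```scala
// Hypothetical, simplified aggregator. Only the member names `count` and
// `weightSum` come from the review comment; everything else is illustrative.
final class WeightedLossAggregator extends Serializable {
  private var totalCnt: Long = 0L        // number of instances seen
  private var totalWeight: Double = 0.0  // sum of the instance weights
  private var lossSum: Double = 0.0      // weighted sum of squared errors / 2

  /** Adds one instance, given its prediction error and its weight. */
  def add(error: Double, weight: Double): this.type = {
    require(weight >= 0.0, s"weight must be non-negative but got $weight")
    totalCnt += 1
    totalWeight += weight
    lossSum += weight * error * error / 2.0
    this
  }

  /** Merges another partial aggregate into this one. */
  def merge(other: WeightedLossAggregator): this.type = {
    totalCnt += other.totalCnt
    totalWeight += other.totalWeight
    lossSum += other.lossSum
    this
  }

  /** Number of instances; stays a Long regardless of the weights. */
  def count: Long = totalCnt

  /** Sum of weights; this is the normalizer for weighted statistics. */
  def weightSum: Double = totalWeight

  /** Weighted mean loss. */
  def loss: Double = {
    require(totalWeight > 0.0, "no weighted instances have been added")
    lossSum / totalWeight
  }
}

object WeightedLossAggregatorDemo {
  def main(args: Array[String]): Unit = {
    val left = new WeightedLossAggregator().add(1.0, 2.0)
    val right = new WeightedLossAggregator().add(3.0, 1.0)
    val merged = left.merge(right)
    println(s"count=${merged.count} weightSum=${merged.weightSum} loss=${merged.loss}")
  }
}
```

Keeping the two quantities apart means callers that need the number of rows are unaffected, while the loss is normalized by the total weight.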





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-15 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8631#discussion_r39580880
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -589,7 +608,7 @@ private class LeastSquaresAggregator(
  * It's used in Breeze's convex optimization routines.
  */
 private class LeastSquaresCostFun(
-data: RDD[(Double, Vector)],
+data: RDD[(Double, Vector, Double)],
--- End diff --

Refactor the `Instance` case class out of LoR, and use it here for code readability.
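
A rough sketch of the readability argument, assuming a dependency-free stand-in: `Array[Double]` replaces Spark's `Vector`, the tuple is read as (label, features, weight), and `InstanceDemo` with its two loss functions is hypothetical. The field order of `Instance` follows the `Instance(label, weight, features)` call that appears later in this thread.

```scala
// Stand-in for the case class discussed in the review. Array[Double] is used
// instead of Spark's Vector so that the sketch has no dependencies.
final case class Instance(label: Double, weight: Double, features: Array[Double])

object InstanceDemo {
  private def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    a.zip(b).map { case (x, y) => x * y }.sum
  }

  /** Tuple version: the two Double slots are easy to mix up at call sites. */
  def lossWithTuples(data: Seq[(Double, Array[Double], Double)], coef: Array[Double]): Double =
    data.map { case (label, features, weight) =>
      val err = dot(features, coef) - label
      weight * err * err / 2.0
    }.sum

  /** Case-class version: every field is named where it is used. */
  def lossWithInstances(data: Seq[Instance], coef: Array[Double]): Double =
    data.map { i =>
      val err = dot(i.features, coef) - i.label
      i.weight * err * err / 2.0
    }.sum

  def main(args: Array[String]): Unit = {
    val coef = Array(2.0)
    println(lossWithTuples(Seq((3.0, Array(1.0), 2.0)), coef))
    println(lossWithInstances(Seq(Instance(3.0, 2.0, Array(1.0))), coef))
  }
}
```

Both functions compute the same weighted squared-error loss; the difference is only that the named fields remove the need to remember which `Double` in the tuple is the label and which is the weight.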





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-15 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-140583259
  
Hello, the weighted `MultivariateOnlineSummarizer` has been merged, which
unblocks you. Thanks.
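
For readers unfamiliar with weighted summarization, the following is a scalar toy analogue, not Spark's `MultivariateOnlineSummarizer`. The class `WeightedMeanSummarizer` and the helper `summarize` are hypothetical names. The sketch shows the add-then-merge pattern and why an instance with weight 2 contributes to the mean the same way as two copies of that instance with weight 1.

```scala
// Hypothetical scalar analogue of a weighted online summarizer.
final class WeightedMeanSummarizer extends Serializable {
  private var totalWeight: Double = 0.0  // sum of weights
  private var weightedSum: Double = 0.0  // sum of weight * value

  /** Adds one observation with the given weight. */
  def add(value: Double, weight: Double = 1.0): this.type = {
    require(weight >= 0.0, s"weight must be non-negative but got $weight")
    totalWeight += weight
    weightedSum += weight * value
    this
  }

  /** Merges a partial summary, as a combOp would do across partitions. */
  def merge(other: WeightedMeanSummarizer): this.type = {
    totalWeight += other.totalWeight
    weightedSum += other.weightedSum
    this
  }

  /** Weighted mean of everything added so far. */
  def mean: Double = {
    require(totalWeight > 0.0, "no weighted observations have been added")
    weightedSum / totalWeight
  }
}

object WeightedMeanSummarizerDemo {
  /** Folds (value, weight) pairs into one summary, as a seqOp would do. */
  def summarize(data: Seq[(Double, Double)]): WeightedMeanSummarizer =
    data.foldLeft(new WeightedMeanSummarizer) { case (s, (v, w)) => s.add(v, w) }

  def main(args: Array[String]): Unit = {
    val weighted = summarize(Seq((1.0, 2.0), (4.0, 1.0)))
    val duplicated = summarize(Seq((1.0, 1.0), (1.0, 1.0), (4.0, 1.0)))
    println(s"weighted mean = ${weighted.mean}, duplicated mean = ${duplicated.mean}")
  }
}
```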





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-11 Thread rotationsymmetry
Github user rotationsymmetry commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-139587297
  
@dbtsai Thank you for OKing the test. My patch depends on the weighted
`MultivariateOnlineSummarizer` in your PR for applying weights to logistic
regression ([link](https://github.com/apache/spark/pull/7884)). My patch
should be OK to test after your PR is merged.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-139481651
  
[Test build #42318 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42318/console) for PR 8631 at commit [`e9093cb`](https://github.com/apache/spark/commit/e9093cbea2554fbc124899a58e3cbfdade5ea795).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class WeightedLabeledPoint(label: Double, features: Vector, weight: Double)`






[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-139481653
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-139481656
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42318/





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-139481021
  
[Test build #42318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42318/consoleFull) for PR 8631 at commit [`e9093cb`](https://github.com/apache/spark/commit/e9093cbea2554fbc124899a58e3cbfdade5ea795).





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-139480026
  
Merged build started.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-139479994
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-11 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-139479960
  
ok to test





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-11 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-139479926
  
Jenkins, add to whitelist





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-138123811
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-06 Thread rotationsymmetry
GitHub user rotationsymmetry opened a pull request:

https://github.com/apache/spark/pull/8631

[SPARK-9642] [ML] [WIP] LinearRegression should supported weighted data

In many modeling applications, data points are not necessarily sampled with 
equal probabilities. Linear regression should support sample weights that 
account for over- or under-sampling.

work in progress. 
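
To illustrate what sample weighting means here, the following is a minimal
sketch in plain Scala (no Spark), assuming a single feature and a closed-form
solution. The object `WeightedFitSketch`, the case class `Sample`, and the
function `fit` are hypothetical names for illustration only; they are not part
of this patch or of the Spark API. The sketch minimizes
sum_i w_i * (y_i - (a + b * x_i))^2, so a point with weight 2 should yield the
same fit as the same point listed twice with weight 1.

```scala
// Hypothetical illustration, not the code in this pull request.
object WeightedFitSketch {
  // One observation: feature x, label y, non-negative sample weight w.
  final case class Sample(x: Double, y: Double, w: Double)

  /** Weighted least squares for y = a + b * x. Returns (intercept a, slope b). */
  def fit(data: Seq[Sample]): (Double, Double) = {
    val wSum = data.map(_.w).sum
    // Weighted means of the feature and the label.
    val xBar = data.map(s => s.w * s.x).sum / wSum
    val yBar = data.map(s => s.w * s.y).sum / wSum
    // Weighted covariance and variance (unnormalized; the factor cancels).
    val sxy = data.map(s => s.w * (s.x - xBar) * (s.y - yBar)).sum
    val sxx = data.map(s => s.w * (s.x - xBar) * (s.x - xBar)).sum
    val slope = sxy / sxx
    (yBar - slope * xBar, slope)
  }

  def main(args: Array[String]): Unit = {
    // The first point carries weight 2 ...
    val weighted = Seq(Sample(0.0, 1.0, 2.0), Sample(1.0, 3.0, 1.0), Sample(2.0, 4.0, 1.0))
    // ... which should match listing it twice with weight 1.
    val duplicated = Seq(Sample(0.0, 1.0, 1.0), Sample(0.0, 1.0, 1.0),
      Sample(1.0, 3.0, 1.0), Sample(2.0, 4.0, 1.0))
    println(fit(weighted))
    println(fit(duplicated))
  }
}
```

Comparing a weighted data set against one with duplicated rows is also a
convenient way to check a weighted implementation, up to floating-point
tolerance.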


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rotationsymmetry/spark SPARK-9642

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8631.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8631


commit e9093cbea2554fbc124899a58e3cbfdade5ea795
Author: Meihua Wu 
Date:   2015-09-06T15:15:55Z

[WIP] Add support for weighted sample and associated test.



