[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-09 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r44308271
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  /** Number of instances in DataFrame predictions */
+  lazy val numInstances: Long = predictions.count()
+
+  /** Degrees of freedom */
+  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
+    numInstances - model.coefficients.size - 1
+  } else {
+    numInstances - model.coefficients.size
+  }
+
+  /**
+   * The weighted residuals, the usual residuals rescaled by
+   * the square root of the instance weights.
+   */
+  lazy val devianceResiduals: Array[Double] = {
--- End diff --

Sounds good!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-09 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r44254161
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  /** Number of instances in DataFrame predictions */
+  lazy val numInstances: Long = predictions.count()
+
+  /** Degrees of freedom */
+  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
+    numInstances - model.coefficients.size - 1
+  } else {
+    numInstances - model.coefficients.size
+  }
+
+  /**
+   * The weighted residuals, the usual residuals rescaled by
+   * the square root of the instance weights.
+   */
+  lazy val devianceResiduals: Array[Double] = {
--- End diff --

@jkbradley A "residuals" member already exists, so I called this one 
```devianceResiduals```. I agree with your point about adding other types of 
residuals later, so I think we can try to combine the two functions into one 
that takes different arguments. We also need to do some code cleanup for 
```LinearRegressionSummary``` because of redundant arguments; I can finish it in a 
follow-up PR. @mengxr





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r44046160
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  /** Number of instances in DataFrame predictions */
+  lazy val numInstances: Long = predictions.count()
+
+  /** Degrees of freedom */
+  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
+    numInstances - model.coefficients.size - 1
+  } else {
+    numInstances - model.coefficients.size
+  }
+
+  /**
+   * The weighted residuals, the usual residuals rescaled by
+   * the square root of the instance weights.
+   */
+  lazy val devianceResiduals: Array[Double] = {
+    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
+    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
+      .multiply(weighted).as("weightedResiduals"))
+      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
+      .first()
+    Array(dr.getDouble(0), dr.getDouble(1))
+  }
+
+  /**
+   * Standard error of estimated coefficients.
+   * Note that standard error of estimated intercept is not supported currently.
+   */
+  lazy val coefficientStandardErrors: Array[Double] = {
--- End diff --

Should we return a Vector (to match the type of coefficients)?  Same for 
tValues and pValues.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r44046167
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/CholeskyDecomposition.scala ---
@@ -40,4 +40,20 @@ private[spark] object CholeskyDecomposition {
     assert(code == 0, s"lapack.dpotrs returned $code.")
     bx
   }
+
+  /**
+   * Computes the inverse of a real symmetric positive definite matrix A
+   * using the Cholesky factorization A = U**T*U.
+   * The input arguments are modified in-place to store the inverse matrix.
+   * @param UAi the upper triangular factor U from the Cholesky factorization A = U**T*U
+   * @param k the dimension of A
+   * @return the upper triangle of the (symmetric) inverse of A
+   */
+  def inverse(UAi: Array[Double], k: Int): Array[Double] = {
+    val info = new intW(0)
+    lapack.dpptri("U", k, UAi, info)
+    val code = info.`val`
+    assert(code == 0, s"lapack.dpptri returned $code.")
--- End diff --

This throws an AssertionError on failure.  It'd be better to throw a 
RuntimeException (or one based on the return code, though that may be too much 
trouble).
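
As a rough illustration of the suggestion above, the check could map the LAPACK return code to an exception instead of asserting. This is only a sketch: `LapackCheck`, `LapackException` and `check` are hypothetical names, not part of Spark, and the only convention assumed is the standard LAPACK one that a negative `info` flags an illegal argument and a positive `info` flags a computational failure.

```scala
object LapackCheck {
  /** Hypothetical exception type carrying the LAPACK routine name and return code. */
  final class LapackException(val method: String, val code: Int, message: String)
    extends RuntimeException(message)

  /** Throws if `code` is non-zero; returns normally when the routine succeeded. */
  def check(method: String, code: Int): Unit = {
    if (code < 0) {
      // LAPACK convention: info = -i means the i-th argument had an illegal value.
      throw new IllegalArgumentException(
        s"lapack.$method: argument ${-code} had an illegal value.")
    } else if (code > 0) {
      // A positive code means the routine could not complete the computation.
      throw new LapackException(method, code,
        s"lapack.$method returned $code; the computation could not be completed.")
    }
  }
}
```

A caller would then write `LapackCheck.check("dpptri", info.`val`)` in place of the `assert`, so the failure survives even when assertions are elided.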





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-05 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r44046156
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  /** Number of instances in DataFrame predictions */
+  lazy val numInstances: Long = predictions.count()
+
+  /** Degrees of freedom */
+  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
+    numInstances - model.coefficients.size - 1
+  } else {
+    numInstances - model.coefficients.size
+  }
+
+  /**
+   * The weighted residuals, the usual residuals rescaled by
+   * the square root of the instance weights.
+   */
+  lazy val devianceResiduals: Array[Double] = {
--- End diff --

I'm late to comment, but am wondering:
* Why do we not return all deviance residuals as a DataFrame?  If we only 
return min,max, then that should be documented.  But I'd prefer we return a 
DataFrame with all deviance residuals.
* Should we follow R's example and just call this "residuals"?  That will 
let us add other types of residuals later (specified via an argument, with a 
default argument of "deviance").
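
To make the second point concrete, a single entry point with a residual-type argument might look roughly like the following. This is a plain in-memory sketch rather than the DataFrame-based API under review; `Residuals`, `Obs` and the `"response"` type name (borrowed from R's terminology) are assumptions made for illustration.

```scala
object Residuals {
  /** One observation; the weight defaults to 1.0 for the unweighted case. */
  final case class Obs(label: Double, prediction: Double, weight: Double = 1.0)

  /**
   * Returns one residual per observation.
   * "deviance": (label - prediction) * sqrt(weight), as in the diff above.
   * "response": the plain difference label - prediction.
   */
  def residuals(data: Seq[Obs], kind: String = "deviance"): Seq[Double] = kind match {
    case "deviance" => data.map(o => (o.label - o.prediction) * math.sqrt(o.weight))
    case "response" => data.map(o => o.label - o.prediction)
    case other => throw new IllegalArgumentException(s"Unsupported residual type: $other")
  }
}
```

Adding another residual type later would then be a new `case` rather than a new public member.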





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-03 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153289839
  
@mengxr I created 
[SPARK-11473](https://issues.apache.org/jira/browse/SPARK-11473) to track 
support for summary statistics for the intercept. I can work on it.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-03 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43770122
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -474,6 +487,75 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  /** Number of instances in DataFrame predictions */
+  lazy val numInstances: Long = predictions.count()
+
+  /** Degrees of freedom */
+  private val degreesOfFreedom: Long = if (model.getFitIntercept) {
+    numInstances - model.coefficients.size - 1
+  } else {
+    numInstances - model.coefficients.size
+  }
+
+  /**
+   * The weighted residuals, the usual residuals rescaled by
+   * the square root of the instance weights.
+   */
+  lazy val devianceResiduals: Array[Double] = {
+    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
+    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
+      .multiply(weighted).as("weightedResiduals"))
+      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
+      .first()
+    Array(dr.getDouble(0), dr.getDouble(1))
+  }
+
+  /**
+   * Standard error of estimated coefficients.
+   * Note that standard error of estimated intercept is not supported currently.
+   */
+  lazy val coefficientStandardErrors: Array[Double] = {
+    if (diagInvAtWA.length == 1 && diagInvAtWA(0) == 0) {
+      throw new UnsupportedOperationException(
+        "No Std. Error of coefficients available for this LinearRegressionModel")
+    } else {
+      val rss = if (model.getWeightCol.isEmpty) {
+        meanSquaredError * numInstances
+      } else {
+        val t = udf { (pred: Double, label: Double, weight: Double) =>
+          math.pow(label - pred, 2.0) * weight }
+        predictions.select(t(col(model.getPredictionCol), col(model.getLabelCol),
+          col(model.getWeightCol)).as("wse")).agg(sum(col("wse"))).first().getDouble(0)
+      }
+      val sigma2 = rss / degreesOfFreedom
+      diagInvAtWA.map(_ * sigma2).map(math.sqrt(_))
+    }
+  }
+
+  /** T-statistic of estimated coefficients.
--- End diff --

minor: This is ScalaDoc style, not the JavaDoc style used by the other doc comments in this diff. We can fix it in the next update.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/9413





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-03 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153406552
  
LGTM. Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread yanboliang
Github user yanboliang commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153031528
  
In the current implementation we provide ```Std. Error``` for 
```coefficients``` but not for ```intercept```, because we use an optimized 
method to calculate ```intercept```. To calculate ```Std. Error``` for 
```intercept```, we would need to concatenate the ```aBar``` array with 
```aaBar.values```, like
```scala
val newAtA = Array.concat(aaBar.values, summary.aBar.toArray, Array(1.0))
val newAtB = Array.concat(abBar.values, Array(bBar))

val xWithIntercept = CholeskyDecomposition.solve(newAtA, newAtB)
val newAtAi = CholeskyDecomposition.inverse(newAtA, summary.k)
```
I'm afraid this would degrade performance, so I propose outputting 
```Std. Error``` only for ```coefficients```. Maybe we should discuss this here, or 
figure out a better way to output ```Std. Error``` for ```intercept```. @mengxr
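
For intuition about what including the intercept involves, the single-feature case can be worked in closed form. The sketch below is not the `WeightedLeastSquares` code path: it is an unweighted, in-memory example where the design matrix is `[1, x]`, so the Gram matrix is 2x2 and its inverse can be written out by hand. `SimpleOlsStdError`, `Fit` and `fit` are hypothetical names.

```scala
object SimpleOlsStdError {
  /** Estimates and standard errors for the model y = intercept + slope * x. */
  final case class Fit(intercept: Double, slope: Double, seIntercept: Double, seSlope: Double)

  def fit(x: Array[Double], y: Array[Double]): Fit = {
    require(x.length == y.length && x.length > 2, "need more than two paired observations")
    val n = x.length.toDouble
    val sx = x.sum
    val sy = y.sum
    val sxx = x.map(v => v * v).sum
    val sxy = x.zip(y).map { case (a, b) => a * b }.sum

    // Gram matrix of the design [1, x] is [[n, sx], [sx, sxx]]; det is its determinant.
    val det = n * sxx - sx * sx
    require(det != 0.0, "x must not be constant")
    val slope = (n * sxy - sx * sy) / det
    val intercept = (sy - slope * sx) / n

    // Residual sum of squares and the variance estimate with n - 2 degrees of freedom.
    val rss = x.zip(y).map { case (a, b) =>
      val r = b - (intercept + slope * a)
      r * r
    }.sum
    val sigma2 = rss / (n - 2)

    // Diagonal of the inverse Gram matrix: sxx / det (intercept) and n / det (slope).
    Fit(intercept, slope, math.sqrt(sigma2 * sxx / det), math.sqrt(sigma2 * n / det))
  }
}
```

The intercept's standard error comes from the extra row and column of the Gram matrix, which is the augmentation the snippet above performs with `Array.concat`.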






[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153029589
  
**[Test build #44812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44812/consoleFull)** for PR 9413 at commit [`655fb43`](https://github.com/apache/spark/commit/655fb436950e44e1783a2bc3767e40a0295ce83f).





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43634982
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  lazy val numInstances: Long = predictions.count()
+
+  lazy val dfe = if (model.getFitIntercept) {
+    numInstances - model.weights.size -1
+  } else {
+    numInstances - model.weights.size
+  }
+
+  lazy val devianceResiduals: Array[Double] = {
+    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
+    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
+      .multiply(weighted).as("weightedResiduals"))
+      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
+      .take(1)(0)
+    Array(dr.getDouble(0), dr.getDouble(1))
--- End diff --

DataFrame does not currently provide an interface for calculating percentiles 
(only a Hive UDAF does), so here we provide only the max and min values of the 
deviance residuals. [SPARK-9299](https://issues.apache.org/jira/browse/SPARK-9299) 
is adding ```percentile``` and ```percentile_approx``` aggregate functions; 
once it is resolved we can also provide the quantiles (0.25, 0.5, 0.75) of the 
deviance residuals.
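
For reference, the R-like summary being aimed at (min, quartiles, max) can be sketched on a local collection as follows. This is an in-memory illustration only, not a distributed implementation and not the SPARK-9299 API; `ResidualSummary`, `quantile` and `fiveNumber` are hypothetical names, and the interpolation rule is assumed to be the linear one R uses by default.

```scala
object ResidualSummary {
  /** Linear-interpolation quantile of an already sorted, non-empty array. */
  def quantile(sorted: Array[Double], p: Double): Double = {
    require(sorted.nonEmpty && p >= 0.0 && p <= 1.0, "need data and 0 <= p <= 1")
    // Fractional 0-based position of the requested quantile.
    val pos = p * (sorted.length - 1)
    val lo = math.floor(pos).toInt
    val hi = math.ceil(pos).toInt
    sorted(lo) + (pos - lo) * (sorted(hi) - sorted(lo))
  }

  /** Returns Array(min, 1st quartile, median, 3rd quartile, max). */
  def fiveNumber(residuals: Seq[Double]): Array[Double] = {
    val s = residuals.toArray.sorted
    Array(0.0, 0.25, 0.5, 0.75, 1.0).map(quantile(s, _))
  }
}
```

Sorting the full residual column locally would not scale, which is why the comment above defers quantiles until an aggregate function is available.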





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/9413

[SPARK-9836] [ML] Provide R-like summary statistics for OLS via normal equation solver

https://issues.apache.org/jira/browse/SPARK-9836

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-9836

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9413.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9413


commit 655fb436950e44e1783a2bc3767e40a0295ce83f
Author: Yanbo Liang 
Date:   2015-11-02T14:07:56Z

Provide R-like summary statistics for OLS via normal equation solver







[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153263990
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153265915
  
**[Test build #44891 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44891/consoleFull)** for PR 9413 at commit [`42ac991`](https://github.com/apache/spark/commit/42ac991775af48ab80869d0d2d9874cadf665b3e).





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153281693
  
**[Test build #44891 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44891/consoleFull)** for PR 9413 at commit [`42ac991`](https://github.com/apache/spark/commit/42ac991775af48ab80869d0d2d9874cadf665b3e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153264106
  
Merged build started.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153281755
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44891/
Test PASSed.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153281753
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43685888
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -26,10 +26,12 @@ import org.apache.spark.rdd.RDD
  * Model fitted by [[WeightedLeastSquares]].
  * @param coefficients model coefficients
  * @param intercept model intercept
+ * @param diag diagonal of matrix (A^T * W * A)^-1
--- End diff --

`diag` is not a descriptive name. `diagInvAtWA`?





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43685972
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  lazy val numInstances: Long = predictions.count()
+
+  lazy val dfe = if (model.getFitIntercept) {
--- End diff --

Keep this one private, or a more descriptive name? We need explicit types 
for public/private members.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43685896
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala ---
@@ -109,6 +113,9 @@ private[ml] class WeightedLeastSquares(
 
     val x = new DenseVector(CholeskyDecomposition.solve(aaBar.values, abBar.values))
 
+    val aaInv = CholeskyDecomposition.inverse(aaBar.values, k)
+    val diag = new DenseVector((1 to k).map{ i => aaInv(i + (i - 1) * i / 2 - 1) / wSum }.toArray)
--- End diff --

Need an inline comment to explain the index mapping. It is sufficient to 
just mention that `aaInv` is a packed upper triangular matrix.
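
The index mapping can be illustrated with a small standalone sketch. It assumes the packed array stores the upper triangle column by column (the usual packed layout), so the entry at row i, column i (counting from 1) sits at 0-based position `i + (i - 1) * i / 2 - 1`, which is the expression used in the diff. The object and method names are hypothetical.

```scala
object PackedUpperSketch {
  /**
   * 0-based position of the diagonal entry (i, i), with i counted from 1,
   * in a column-major packed upper triangular array.
   * Column j holds j entries, so columns 1..(i - 1) occupy (i - 1) * i / 2 slots.
   */
  def diagIndex(i: Int): Int = i + (i - 1) * i / 2 - 1

  /** Extracts the k diagonal entries from a packed upper triangular array. */
  def diagonal(packed: Array[Double], k: Int): Array[Double] = {
    require(packed.length == k * (k + 1) / 2, "packed length must be k * (k + 1) / 2")
    (1 to k).map(i => packed(diagIndex(i))).toArray
  }
}
```

For k = 3 the packed order is a11, a12, a22, a13, a23, a33, so the diagonal entries are at positions 0, 2 and 5.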





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43685986
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  lazy val numInstances: Long = predictions.count()
+
+  lazy val dfe = if (model.getFitIntercept) {
+    numInstances - model.weights.size -1
+  } else {
+    numInstances - model.weights.size
+  }
+
+  lazy val devianceResiduals: Array[Double] = {
+    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
+    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
+      .multiply(weighted).as("weightedResiduals"))
+      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
+      .take(1)(0)
+    Array(dr.getDouble(0), dr.getDouble(1))
+  }
+
+  lazy val seCoef: Array[Double] = {
--- End diff --

`coefficientStandardErrors`? It is hard to guess what `seCoef` means. In 
the doc, we should say "intercept" is not supported.
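
For readers unfamiliar with the quantity being renamed, here is a rough sketch of the computation the patch performs elsewhere in this thread: the residual variance is the residual sum of squares divided by the degrees of freedom, and each standard error is the square root of that variance times the matching diagonal entry of the inverse matrix. It works on plain arrays instead of the model and DataFrame, and the object, method and parameter names are hypothetical.

```scala
object CoefficientStdErrorSketch {
  /**
   * Standard errors of the coefficients.
   * diagInv: diagonal entries of the (scaled) inverse normal-equation matrix,
   * rss: (weighted) residual sum of squares,
   * degreesOfFreedom: instances minus fitted parameters.
   */
  def coefficientStandardErrors(
      diagInv: Array[Double],
      rss: Double,
      degreesOfFreedom: Long): Array[Double] = {
    require(degreesOfFreedom > 0, "need positive degrees of freedom")
    val sigma2: Double = rss / degreesOfFreedom
    diagInv.map(d => math.sqrt(d * sigma2))
  }

  /** t value of each coefficient: the estimate divided by its standard error. */
  def tValues(coefficients: Array[Double], stdErrors: Array[Double]): Array[Double] = {
    require(coefficients.length == stdErrors.length, "lengths must match")
    coefficients.zip(stdErrors).map { case (c, se) => c / se }
  }
}
```

As in the patch, the intercept is not covered: only the entries that correspond to coefficients are passed in.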





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43685991
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  lazy val numInstances: Long = predictions.count()
+
+  lazy val dfe = if (model.getFitIntercept) {
+    numInstances - model.weights.size -1
+  } else {
+    numInstances - model.weights.size
+  }
+
+  lazy val devianceResiduals: Array[Double] = {
+    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
+    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
+      .multiply(weighted).as("weightedResiduals"))
+      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
+      .take(1)(0)
+    Array(dr.getDouble(0), dr.getDouble(1))
+  }
+
+  lazy val seCoef: Array[Double] = {
+    if (diag.length == 1 && diag(0) == 0) {
+      throw new UnsupportedOperationException(
+        "No Std. Error coefficients available for this LinearRegressionModel")
+    } else {
+      val rss = if (model.getWeightCol.isEmpty) {
+        meanSquaredError * numInstances
+      } else {
+        val t = udf { (pred: Double, label: Double, weight: Double) =>
+          math.pow(label - pred, 2.0) * weight }
+        predictions.select(t(col(model.getPredictionCol), col(model.getLabelCol),
+          col(model.getWeightCol)).as("wse")).agg(sum(col("wse"))).take(1)(0).getDouble(0)
+      }
+      val sigma2 = rss / dfe
+      diag.map(_ * sigma2).map(math.sqrt(_))
+    }
+  }
+
+  lazy val tVals: Array[Double] = {
--- End diff --

`tValues`





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43685999
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  lazy val numInstances: Long = predictions.count()
+
+  lazy val dfe = if (model.getFitIntercept) {
+    numInstances - model.weights.size -1
+  } else {
+    numInstances - model.weights.size
+  }
+
+  lazy val devianceResiduals: Array[Double] = {
+    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
+    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
+      .multiply(weighted).as("weightedResiduals"))
+      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
+      .take(1)(0)
+    Array(dr.getDouble(0), dr.getDouble(1))
+  }
+
+  lazy val seCoef: Array[Double] = {
+    if (diag.length == 1 && diag(0) == 0) {
+      throw new UnsupportedOperationException(
+        "No Std. Error coefficients available for this LinearRegressionModel")
+    } else {
+      val rss = if (model.getWeightCol.isEmpty) {
+        meanSquaredError * numInstances
+      } else {
+        val t = udf { (pred: Double, label: Double, weight: Double) =>
+          math.pow(label - pred, 2.0) * weight }
+        predictions.select(t(col(model.getPredictionCol), col(model.getLabelCol),
+          col(model.getWeightCol)).as("wse")).agg(sum(col("wse"))).take(1)(0).getDouble(0)
+      }
+      val sigma2 = rss / dfe
+      diag.map(_ * sigma2).map(math.sqrt(_))
+    }
+  }
+
+  lazy val tVals: Array[Double] = {
+    if (diag.length == 1 && diag(0) == 0) {
+      throw new UnsupportedOperationException(
+        "No t values available for this LinearRegressionModel")
+    } else {
+      model.weights.toArray.zip(seCoef).map { x => x._1 / x._2 }
+    }
+  }
+
+  lazy val pVals: Array[Double] = {
--- End diff --

`pValues`





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43685970
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  lazy val numInstances: Long = predictions.count()
--- End diff --

Missing doc (please also update the other public/private methods).





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43685982
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  lazy val numInstances: Long = predictions.count()
+
+  lazy val dfe = if (model.getFitIntercept) {
+    numInstances - model.weights.size -1
+  } else {
+    numInstances - model.weights.size
+  }
+
+  lazy val devianceResiduals: Array[Double] = {
+    val weighted = if (model.getWeightCol.isEmpty) lit(1.0) else sqrt(col(model.getWeightCol))
+    val dr = predictions.select(col(model.getLabelCol).minus(col(model.getPredictionCol))
+      .multiply(weighted).as("weightedResiduals"))
+      .select(min(col("weightedResiduals")).as("min"), max(col("weightedResiduals")).as("max"))
+      .take(1)(0)
--- End diff --

`.take(1)(0)` -> `.first()`





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43686227
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -471,6 +484,59 @@ class LinearRegressionSummary private[regression] (
     predictions.select(t(col(predictionCol), col(labelCol)).as("residuals"))
   }
 
+  lazy val numInstances: Long = predictions.count()
+
+  lazy val dfe = if (model.getFitIntercept) {
+    numInstances - model.weights.size -1
+  } else {
+    numInstances - model.weights.size
+  }
+
+  lazy val devianceResiduals: Array[Double] = {
--- End diff --

It is useful to document that this is weighted.
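
To make "weighted" concrete, here is a small sketch on plain Scala collections, not the DataFrame code from the patch: each residual `label - prediction` is multiplied by the square root of its instance weight, and the minimum and maximum are reported, mirroring the two-element array the patch returns. The object and method names are hypothetical.

```scala
object WeightedResidualSketch {
  /** Residuals rescaled by the square root of the instance weights. */
  def weightedResiduals(
      labels: Seq[Double],
      predictions: Seq[Double],
      weights: Seq[Double]): Seq[Double] = {
    require(labels.length == predictions.length && labels.length == weights.length,
      "all inputs must have the same length")
    labels.zip(predictions).zip(weights).map { case ((y, p), w) => math.sqrt(w) * (y - p) }
  }

  /** Minimum and maximum of the weighted residuals, in that order. */
  def minMax(residuals: Seq[Double]): Array[Double] = {
    require(residuals.nonEmpty, "need at least one residual")
    Array(residuals.min, residuals.max)
  }
}
```

With unit weights this reduces to the ordinary residuals.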





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153165800
  
@yanboliang The implementation looks good to me. I left some comments about 
comments and documentation. Could you address them today to catch 1.6? It is okay 
to address the issue with the intercept in a follow-up PR. You can create a JIRA 
for it. Thanks!





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9413#discussion_r43702995
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala 
---
@@ -715,4 +724,63 @@ class LinearRegressionSuite extends SparkFunSuite with MLlibTestSparkContext {
 .sliding(2)
 .forall(x => x(0) >= x(1)))
   }
+
+  test("linear regression training summary with weighted samples by normal solver") {
--- End diff --

Could you also add a test without intercept?





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153027517
  
Merged build started.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153027489
  
Merged build triggered.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153043631
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153043459
  
**[Test build #44812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44812/consoleFull)** for PR 9413 at commit [`655fb43`](https://github.com/apache/spark/commit/655fb436950e44e1783a2bc3767e40a0295ce83f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9836] [ML] Provide R-like summary stati...

2015-11-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9413#issuecomment-153043633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/44812/

