[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support a vertical display mod...

2017-04-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17733#discussion_r113370773
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -277,43 +279,73 @@ class Dataset[T] private[sql](
 
 val sb = new StringBuilder
 val numCols = schema.fieldNames.length
+// We set a minimum column width at '3'
+val minimumColWidth = 3
 
-// Initialise the width of each column to a minimum value of '3'
-val colWidths = Array.fill(numCols)(3)
+if (!vertical) {
+  // Initialise the width of each column to a minimum value
+  val colWidths = Array.fill(numCols)(minimumColWidth)
 
-// Compute the width of each column
-for (row <- rows) {
-  for ((cell, i) <- row.zipWithIndex) {
-colWidths(i) = math.max(colWidths(i), cell.length)
-  }
-}
-
-// Create SeparateLine
-val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", 
"+\n").toString()
-
-// column names
-rows.head.zipWithIndex.map { case (cell, i) =>
-  if (truncate > 0) {
-StringUtils.leftPad(cell, colWidths(i))
-  } else {
-StringUtils.rightPad(cell, colWidths(i))
+  // Compute the width of each column
+  for (row <- rows) {
+for ((cell, i) <- row.zipWithIndex) {
+  colWidths(i) = math.max(colWidths(i), cell.length)
+}
   }
-}.addString(sb, "|", "|", "|\n")
 
-sb.append(sep)
+  // Create SeparateLine
+  val sep: String = colWidths.map("-" * _).addString(sb, "+", "+", "+\n").toString()
 
-// data
-rows.tail.map {
-  _.zipWithIndex.map { case (cell, i) =>
+  // column names
+  rows.head.zipWithIndex.map { case (cell, i) =>
 if (truncate > 0) {
-  StringUtils.leftPad(cell.toString, colWidths(i))
+  StringUtils.leftPad(cell, colWidths(i))
 } else {
-  StringUtils.rightPad(cell.toString, colWidths(i))
+  StringUtils.rightPad(cell, colWidths(i))
 }
   }.addString(sb, "|", "|", "|\n")
-}
 
-sb.append(sep)
+  sb.append(sep)
+
+  // data
+  rows.tail.foreach {
+_.zipWithIndex.map { case (cell, i) =>
+  if (truncate > 0) {
+StringUtils.leftPad(cell.toString, colWidths(i))
+  } else {
+StringUtils.rightPad(cell.toString, colWidths(i))
+  }
+}.addString(sb, "|", "|", "|\n")
+  }
+
+  sb.append(sep)
+} else {
+  // Extended display mode enabled
+  val fieldNames = rows.head
+  val dataRows = rows.tail
+
+  // Compute the width of field name and data columns
+  val fieldNameColWidth = fieldNames.foldLeft(minimumColWidth) { case (curMax, fieldName) =>
+math.max(curMax, fieldName.length)
+  }
+  val dataColWidth = dataRows.foldLeft(minimumColWidth) { case (curMax, row) =>
+math.max(curMax, row.map(_.length).reduceLeftOption[Int] { case (cellMax, cell) =>
+  math.max(cellMax, cell)
+}.getOrElse(0))
+  }
+
+  dataRows.zipWithIndex.foreach { case (row, i) =>
--- End diff --

When no row exists, we at least need to output the column names.

```Scala
df.limit(0).show(20, 0, true)
```
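The zero-row case can be sketched as follows. This is an illustrative helper only, not the actual `Dataset.scala` code; `showVertical`, its parameters, and the exact record-header layout are assumptions. The point is that when `dataRows` is empty, the vertical branch should still emit the left-padded field names, so `df.limit(0).show(20, 0, true)` shows the schema rather than nothing:

```scala
// Hypothetical sketch of the vertical-mode fallback (not the Dataset.scala code):
// an empty Dataset should still render its column names.
def showVertical(fieldNames: Seq[String], dataRows: Seq[Seq[String]]): String = {
  val sb = new StringBuilder
  val nameWidth = fieldNames.foldLeft(3)((m, n) => math.max(m, n.length))
  def leftPad(s: String): String = (" " * (nameWidth - s.length)) + s
  if (dataRows.isEmpty) {
    // No rows: output only the padded column names.
    fieldNames.foreach(n => sb.append(leftPad(n)).append(" |\n"))
  } else {
    dataRows.zipWithIndex.foreach { case (row, i) =>
      sb.append(s"-RECORD $i").append("-" * 10).append('\n')
      fieldNames.zip(row).foreach { case (n, cell) =>
        sb.append(leftPad(n)).append(" | ").append(cell).append('\n')
      }
    }
  }
  sb.toString
}
```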


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17758
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76167/
Test PASSed.





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17758
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17758
  
**[Test build #76167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76167/testReport)** for PR 17758 at commit [`05a7a61`](https://github.com/apache/spark/commit/05a7a61259d87b8fa97214c96cedde9dc52dd3ec).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17768: [SPARK-20465][CORE] Throws a proper exception when any t...

2017-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17768
  
@joshrosen, could you take a look and see if it makes sense?





[GitHub] spark issue #17768: [SPARK-20465][CORE] Throws a proper exception when any t...

2017-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17768
  
This actually also happens when the directory exists but the user does not have permission.





[GitHub] spark issue #17768: [SPARK-20465][CORE] Throws a proper exception when any t...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17768
  
**[Test build #76172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76172/testReport)** for PR 17768 at commit [`bf21e3b`](https://github.com/apache/spark/commit/bf21e3bef93cd865744c603e373aef2916b2ce79).





[GitHub] spark issue #17768: [SPARK-20465][CORE] Throws a proper exception when any t...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17768
  
**[Test build #76171 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76171/testReport)** for PR 17768 at commit [`b9ce248`](https://github.com/apache/spark/commit/b9ce24832dcd0b91e70026cf71d379cd99f26ead).





[GitHub] spark issue #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentations for f...

2017-04-25 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17737
  
@holdenk do you have bandwidth to review this, or are you ok with me pushing this to master?





[GitHub] spark pull request #17130: [SPARK-19791] [ML] Add doc and example for fpgrow...

2017-04-25 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17130#discussion_r113366111
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala ---
@@ -268,12 +269,8 @@ class FPGrowthModel private[ml] (
 val predictUDF = udf((items: Seq[_]) => {
   if (items != null) {
 val itemset = items.toSet
-brRules.value.flatMap(rule =>
-  if (items != null && rule._1.forall(item => itemset.contains(item))) {
-rule._2.filter(item => !itemset.contains(item))
-  } else {
-Seq.empty
-  }).distinct
+brRules.value.filter(_._1.forall(itemset.contains))
+  .flatMap(_._2.filter(!itemset.contains(_))).distinct
--- End diff --

Right, two things. First, just calling out that while the PR says doc changes, there is this one code change here.
Second, this code previously checked `items != null`; do we not need to consider that now?
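The simplification under discussion can be sketched as follows. This is an illustrative stand-alone version, not the actual `FPGrowthModel` code: `rules` stands in for the broadcast association rules (`brRules.value`, antecedent/consequent pairs), and the outer `items != null` guard is what makes the removed inner null check redundant.

```scala
// Hypothetical stand-in for the broadcast association rules.
val rules: Seq[(Seq[String], Seq[String])] =
  Seq((Seq("a"), Seq("b", "c")), (Seq("a", "b"), Seq("d")))

def predictItems(items: Seq[String]): Seq[String] = {
  if (items != null) {  // outer guard: the inner `items != null` was redundant
    val itemset = items.toSet
    // Keep rules whose antecedent is contained in the itemset,
    // then suggest consequent items not already present.
    rules.filter(_._1.forall(itemset.contains))
      .flatMap(_._2.filter(!itemset.contains(_))).distinct
  } else {
    Seq.empty
  }
}
```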





[GitHub] spark pull request #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17728





[GitHub] spark pull request #17768: [SPARK-20465][CORE] Throws a proper exception whe...

2017-04-25 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/17768

[SPARK-20465][CORE] Throws a proper exception when any temp directory could not be got/created (rather than ArrayIndexOutOfBoundsException)

## What changes were proposed in this pull request?

This PR proposes to throw an exception with a better message, rather than `ArrayIndexOutOfBoundsException`, when temp directories could not be created.

**Before**

```
./bin/spark-shell --conf spark.local.dir=/NONEXISTENT_DIR_ONE,/NONEXISTENT_DIR_TWO
```

```
Exception in thread "main" java.lang.ExceptionInInitializerError
...
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
...
```

**After**

```
Exception in thread "main" java.lang.ExceptionInInitializerError
...
Caused by: java.io.IOException: Failed to get a temp directory under [/NONEXISTENT_DIR_ONE,/NONEXISTENT_DIR_TWO].
...
```

## How was this patch tested?

Unit tests in `LocalDirsSuite.scala`.
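The fix pattern described above can be sketched as follows. Illustrative names only (`getOrCreateLocalRootDirs` here is a simplified stand-in, not the actual `Utils.scala` code): resolve each configured directory and fail with a descriptive `IOException` when none is usable, instead of letting a later index into an empty array throw `ArrayIndexOutOfBoundsException`.

```scala
import java.io.{File, IOException}

// Sketch: resolve each spark.local.dir entry; if none exists or can be
// created, throw a descriptive IOException rather than letting a later
// localDirs(0) access fail with ArrayIndexOutOfBoundsException.
def getOrCreateLocalRootDirs(confDirs: String): Array[File] = {
  val localDirs = confDirs.split(",").flatMap { root =>
    val dir = new File(root)
    if (dir.isDirectory || dir.mkdirs()) Some(dir) else None
  }
  if (localDirs.isEmpty) {
    throw new IOException(s"Failed to get a temp directory under [$confDirs].")
  }
  localDirs
}
```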


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark throws-temp-dir-exception

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17768.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17768


commit 4500c2f2d989bfc1e76e07cbd38a9acbd384d3d5
Author: hyukjinkwon 
Date:   2017-04-26T04:52:33Z

Throws a proper exception rather than ArrayIndexOutOfBoundsException when temp directories could not be got/created







[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-25 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17728
  
merged to master





[GitHub] spark issue #17191: [SPARK-14471][SQL] Aliases in SELECT could be used in GR...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17191
  
**[Test build #76170 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76170/testReport)** for PR 17191 at commit [`7b32f46`](https://github.com/apache/spark/commit/7b32f46b1dd83007f066ebcc4dc92a48da6ca89a).





[GitHub] spark pull request #17191: [SPARK-14471][SQL] Aliases in SELECT could be use...

2017-04-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17191#discussion_r113365310
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -136,6 +136,7 @@ class Analyzer(
   ResolveGroupingAnalytics ::
   ResolvePivot ::
   ResolveOrdinalInOrderByAndGroupBy ::
+  ResolveAggAliasInGroupBy ::
--- End diff --

aha, ok. I'll move there. Thanks!





[GitHub] spark issue #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17757
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76168/
Test PASSed.





[GitHub] spark issue #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17757
  
**[Test build #76168 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76168/testReport)** for PR 17757 at commit [`a87d5c0`](https://github.com/apache/spark/commit/a87d5c0c578542916706745cdbeca58ae24269e8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17757
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17596
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76157/
Test FAILed.





[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17596
  
**[Test build #76157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76157/testReport)** for PR 17596 at commit [`7f41155`](https://github.com/apache/spark/commit/7f41155bf5c02485c5606f874c327a9330cb2c9f).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17596
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests

2017-04-25 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17757#discussion_r113362917
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R ---
@@ -284,22 +284,11 @@ test_that("spark.mlp", {
c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
 
   # test initialWeights
-  model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights =
+  model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights =
 c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9))
   mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
   expect_equal(head(mlpPredictions$prediction, 10),
-   c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
-
-  model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights =
-c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0))
-  mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
-  expect_equal(head(mlpPredictions$prediction, 10),
-   c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
-
-  model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2)
-  mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
-  expect_equal(head(mlpPredictions$prediction, 10),
-   c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "2.0", "1.0", "0.0"))
+   c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
--- End diff --

Checking more closely, it looks like earlier tests do call `predict`. I'm good with simplifying this part of the test with weights.






[GitHub] spark pull request #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests

2017-04-25 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17757#discussion_r113362748
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R ---
@@ -284,22 +284,11 @@ test_that("spark.mlp", {
c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
 
   # test initialWeights
-  model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights =
+  model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights =
 c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9))
   mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
   expect_equal(head(mlpPredictions$prediction, 10),
-   c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
-
-  model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights =
-c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0))
-  mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
-  expect_equal(head(mlpPredictions$prediction, 10),
-   c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
-
-  model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2)
-  mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
-  expect_equal(head(mlpPredictions$prediction, 10),
-   c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "2.0", "1.0", "0.0"))
+   c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
--- End diff --

I get the unconverged test with the maxIter.
My main concern here is to at least exercise calling from R into the JVM for each public API we export (i.e., by calling `predict` on the MLP model); we have had issues in the past where an API never worked, or was broken, and we didn't know.






[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17733
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17733
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76166/
Test PASSed.





[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17733
  
**[Test build #76166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76166/testReport)** for PR 17733 at commit [`f696d35`](https://github.com/apache/spark/commit/f696d357ae9aa2e850f82d408aa413750c4d84b8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...

2017-04-25 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17640#discussion_r113362108
  
--- Diff: R/pkg/inst/tests/testthat/test_Serde.R ---
@@ -28,6 +28,10 @@ test_that("SerDe of primitive types", {
   expect_equal(x, 1)
   expect_equal(class(x), "numeric")
 
+  x <- callJStatic("SparkRHandler", "echo", 1380742793415240)
--- End diff --

I did some Google searching. R can't specify a `bigint` type, so we can't directly test `bigint`.

We can remove the tests above, as we added `schema` tests and Scala API tests.





[GitHub] spark issue #17503: [SPARK-3159][MLlib] Check for reducible DecisionTree

2017-04-25 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17503
  
@srowen  I am not sure whether I understand your question correctly. RandomForest uses LearningNode to construct the tree model during training, and converts the nodes to Leaf or InternalNode at the end. Hence, during training all nodes are the same type and can be merged.

However, if two children of a node output the same prediction, should the node be collapsed to match its children? I don't know.
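The reducibility question above can be sketched as a bottom-up pruning pass. The node types below are illustrative, not the actual `org.apache.spark.ml.tree` classes: when both children reduce to leaves with the same prediction, the parent can itself be collapsed into a leaf.

```scala
// Illustrative tree types, not the actual ml.tree Node hierarchy.
sealed trait TreeNode { def prediction: Double }
case class Leaf(prediction: Double) extends TreeNode
case class Internal(prediction: Double, left: TreeNode, right: TreeNode) extends TreeNode

// Bottom-up pruning: if both pruned children are leaves with the same
// prediction, the split is redundant and the parent becomes a leaf too.
def prune(node: TreeNode): TreeNode = node match {
  case Internal(p, l, r) =>
    (prune(l), prune(r)) match {
      case (Leaf(a), Leaf(b)) if a == b => Leaf(a)
      case (pl, pr) => Internal(p, pl, pr)
    }
  case leaf => leaf
}
```

Collapsing propagates upward: pruning a subtree first may make its parent reducible as well, which is why the pass has to run bottom-up.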





[GitHub] spark pull request #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests

2017-04-25 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/17757#discussion_r113361559
  
--- Diff: R/pkg/inst/tests/testthat/test_mllib_classification.R ---
@@ -284,22 +284,11 @@ test_that("spark.mlp", {
c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
 
   # test initialWeights
-  model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights =
+  model <- spark.mlp(df, label ~ features, layers = c(4, 3), initialWeights =
 c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9))
   mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
   expect_equal(head(mlpPredictions$prediction, 10),
-   c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
-
-  model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2, initialWeights =
-c(0.0, 0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 9.0, 9.0, 9.0, 9.0, 9.0))
-  mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
-  expect_equal(head(mlpPredictions$prediction, 10),
-   c("1.0", "1.0", "1.0", "1.0", "2.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
-
-  model <- spark.mlp(df, label ~ features, layers = c(4, 3), maxIter = 2)
-  mlpPredictions <- collect(select(predict(model, mlpTestDF), "prediction"))
-  expect_equal(head(mlpPredictions$prediction, 10),
-   c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "0.0", "2.0", "1.0", "0.0"))
+   c("1.0", "1.0", "1.0", "1.0", "0.0", "1.0", "2.0", "2.0", "1.0", "0.0"))
--- End diff --

Yeah, here we just removed the unconverged test (with ```maxIter = 2```), 
since we can't guarantee any equality mid-iteration. I think the best way to 
verify that the API works is to check the number of iterations: with 
properly chosen initial weights, the number of iterations to converge should 
differ from runs with other initial weights or no initial weights. Let's 
open a separate JIRA to expose the training summary for MLP on the MLlib 
side; then we can expose it in SparkR and add the check here. Thanks.
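
A toy illustration of that testing idea: instead of asserting exact 
predictions from an unconverged model, compare iteration counts under 
different starting points. The solver below is a generic stand-in, not MLP 
training code:

```scala
// Count iterations of a fixed-point scheme x -> (x + 2/x) / 2, which
// converges to sqrt(2). A better initial guess converges in fewer steps,
// mirroring the "check number of iterations" suggestion above.
def iterationsToConverge(init: Double, tol: Double = 1e-9): Int = {
  var x = init
  var iters = 0
  while (math.abs(x * x - 2.0) > tol && iters < 1000) {
    x = (x + 2.0 / x) / 2.0
    iters += 1
  }
  iters
}
```

Starting near the solution (e.g. `1.5`) needs fewer iterations than 
starting far away (e.g. `100.0`), which is the kind of signal a training 
summary would let the SparkR test assert on.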




[GitHub] spark pull request #17191: [SPARK-14471][SQL] Aliases in SELECT could be use...

2017-04-25 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17191#discussion_r113360559
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -136,6 +136,7 @@ class Analyzer(
   ResolveGroupingAnalytics ::
   ResolvePivot ::
   ResolveOrdinalInOrderByAndGroupBy ::
+  ResolveAggAliasInGroupBy ::
--- End diff --

we have a `postHocResolutionRules`




[GitHub] spark pull request #17503: [SPARK-3159][MLlib] Check for reducible DecisionT...

2017-04-25 Thread facaiy
Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/17503#discussion_r113360409
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala 
---
@@ -61,6 +61,8 @@ import org.apache.spark.mllib.tree.impurity.{Entropy, 
Gini, Impurity, Variance}
  * @param subsamplingRate Fraction of the training data used for learning 
decision tree.
  * @param useNodeIdCache If this is true, instead of passing trees to 
executors, the algorithm will
  *   maintain a separate RDD of node Id cache for each 
row.
+ * @param canMergeChildren Merge pairs of leaf nodes of the same parent 
which
--- End diff --

A new parameter was added to the `Strategy` class, which fails the MiMa 
binary-compatibility tests. How should we deal with this?

```bash
[error]  * synthetic method $default$13()Int in object 
org.apache.spark.mllib.tree.configuration.Strategy has a different result type 
in current version, where it is Boolean rather than Int
```
[see failed 
logs](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3675/consoleFull)
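
For reference, MiMa failures of this kind are usually handled either by 
appending the new parameter at the end of the parameter list (so the 
existing synthetic `$default$N` methods keep their positions and types) or 
by whitelisting the change in `project/MimaExcludes.scala`. A hedged sketch 
of the latter — the exact problem type and member name must match what the 
MiMa report prints, so treat this entry as hypothetical:

```scala
// Hypothetical MimaExcludes.scala entry for the reported default-argument
// result-type change; verify the member name against the MiMa output.
ProblemFilters.exclude[IncompatibleResultTypeProblem](
  "org.apache.spark.mllib.tree.configuration.Strategy.$default$13")
```

Whether suppressing the warning is acceptable here (versus reordering the 
parameters) is a judgment call for the reviewers.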




[GitHub] spark pull request #16618: [SPARK-14409][ML][WIP] Add RankingEvaluator

2017-04-25 Thread ebernhardson
Github user ebernhardson commented on a diff in the pull request:

https://github.com/apache/spark/pull/16618#discussion_r113360277
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.evaluation
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.{Column, DataFrame}
+import org.apache.spark.sql.functions.{mean, sum}
+import org.apache.spark.sql.functions.udf
+import org.apache.spark.sql.types.DoubleType
+
+@Since("2.2.0")
+class RankingMetrics(
+  predictionAndObservations: DataFrame, predictionCol: String, labelCol: 
String)
+  extends Logging with Serializable {
+
+  /**
+   * Compute the Mean Percentile Rank (MPR) of all the queries.
+   *
+   * See the following paper for detail ("Expected percentile rank" in the 
paper):
+   * Hu, Y., Y. Koren, and C. Volinsky. “Collaborative Filtering for 
Implicit Feedback Datasets.”
+   * In 2008 Eighth IEEE International Conference on Data Mining, 
263–72, 2008.
+   * doi:10.1109/ICDM.2008.22.
+   *
+   * @return the mean percentile rank
+   */
+  lazy val meanPercentileRank: Double = {
+
+def rank = udf((predicted: Seq[Any], actual: Any) => {
+  val l_i = predicted.indexOf(actual)
+
+  if (l_i == -1) {
+1
+  } else {
+l_i.toDouble / predicted.size
+  }
+}, DoubleType)
+
+val R_prime = predictionAndObservations.count()
+val predictionColumn: Column = 
predictionAndObservations.col(predictionCol)
+val labelColumn: Column = predictionAndObservations.col(labelCol)
+
+val rankSum: Double = predictionAndObservations
+  .withColumn("rank", rank(predictionColumn, labelColumn))
+  .agg(sum("rank")).first().getDouble(0)
+
+rankSum / R_prime
+  }
+
+  /**
+   * Compute the average precision of all the queries, truncated at 
ranking position k.
+   *
+   * If for a query, the ranking algorithm returns n (n is less than k) 
results, the precision
+   * value will be computed as #(relevant items retrieved) / k. This 
formula also applies when
+   * the size of the ground truth set is less than k.
+   *
+   * If a query has an empty ground truth set, zero will be used as 
precision together with
+   * a log warning.
+   *
+   * See the following paper for detail:
+   *
+   * IR evaluation methods for retrieving highly relevant documents. K. 
Jarvelin and J. Kekalainen
+   *
+   * @param k the position to compute the truncated precision, must be 
positive
+   * @return the average precision at the first k ranking positions
+   */
+  @Since("2.2.0")
+  def precisionAt(k: Int): Double = {
+require(k > 0, "ranking position k should be positive")
+
+def precisionAtK = udf((predicted: Seq[Any], actual: Seq[Any]) => {
+  val actualSet = actual.toSet
+  if (actualSet.nonEmpty) {
+val n = math.min(predicted.length, k)
+var i = 0
+var cnt = 0
+while (i < n) {
+  if (actualSet.contains(predicted(i))) {
+cnt += 1
+  }
+  i += 1
+}
+cnt.toDouble / k
+  } else {
+logWarning("Empty ground truth set, check input data")
+0.0
+  }
+}, DoubleType)
+
+val predictionColumn: Column = 
predictionAndObservations.col(predictionCol)
+val labelColumn: Column = predictionAndObservations.col(labelCol)
+
+predictionAndObservations
+  .withColumn("predictionAtK", precisionAtK(predictionColumn, 
labelColumn))
+  .agg(mean("predictionAtK")).first().getDouble(0)
+  }
+
+  /**
+   * Returns the mean average precision (MAP) of all the queries.
+   * If a query has an empty ground truth 

[GitHub] spark pull request #16618: [SPARK-14409][ML][WIP] Add RankingEvaluator

2017-04-25 Thread ebernhardson
Github user ebernhardson commented on a diff in the pull request:

https://github.com/apache/spark/pull/16618#discussion_r113358325
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/RankingEvaluator.scala ---
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.evaluation
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.param.{IntParam, Param, ParamMap, 
ParamValidators}
+import org.apache.spark.ml.param.shared.{HasLabelCol, HasPredictionCol}
+import org.apache.spark.ml.util.{DefaultParamsReadable, 
DefaultParamsWritable, Identifiable, SchemaUtils}
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.expressions.Window
+import org.apache.spark.sql.functions.{coalesce, col, collect_list, 
row_number, udf}
+import org.apache.spark.sql.types.LongType
+
+/**
+ * Evaluator for ranking.
+ */
+@Since("2.2.0")
+@Experimental
+final class RankingEvaluator @Since("2.2.0")(@Since("2.2.0") override val 
uid: String)
+  extends Evaluator with HasPredictionCol with HasLabelCol with 
DefaultParamsWritable {
+
+  @Since("2.2.0")
+  def this() = this(Identifiable.randomUID("rankingEval"))
+
+  @Since("2.2.0")
+  val k = new IntParam(this, "k", "Top-K cutoff", (x: Int) => x > 0)
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getK: Int = $(k)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setK(value: Int): this.type = set(k, value)
+
+  setDefault(k -> 1)
+
+  @Since("2.2.0")
+  val metricName: Param[String] = {
+val allowedParams = ParamValidators.inArray(Array("mpr"))
+new Param(this, "metricName", "metric name in evaluation (mpr)", 
allowedParams)
+  }
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getMetricName: String = $(metricName)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setMetricName(value: String): this.type = set(metricName, value)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setPredictionCol(value: String): this.type = set(predictionCol, 
value)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setLabelCol(value: String): this.type = set(labelCol, value)
+
+  /**
+   * Param for query column name.
+   * @group param
+   */
+  val queryCol: Param[String] = new Param[String](this, "queryCol", "query 
column name")
+
+  setDefault(queryCol, "query")
+
+  /** @group getParam */
+  @Since("2.2.0")
+  def getQueryCol: String = $(queryCol)
+
+  /** @group setParam */
+  @Since("2.2.0")
+  def setQueryCol(value: String): this.type = set(queryCol, value)
+
+  setDefault(metricName -> "mpr")
+
+  @Since("2.2.0")
+  override def evaluate(dataset: Dataset[_]): Double = {
+val schema = dataset.schema
+SchemaUtils.checkNumericType(schema, $(predictionCol))
+SchemaUtils.checkNumericType(schema, $(labelCol))
+SchemaUtils.checkNumericType(schema, $(queryCol))
+
+val w = 
Window.partitionBy(col($(queryCol))).orderBy(col($(predictionCol)).desc)
+
+val topAtk: DataFrame = dataset
+  .na.drop("all", Seq($(predictionCol)))
+  .select(col($(predictionCol)), col($(labelCol)).cast(LongType), 
col($(queryCol)))
+  .withColumn("rn", row_number().over(w)).where(col("rn") <= $(k))
+  .drop("rn")
+  .groupBy(col($(queryCol)))
+  .agg(collect_list($(labelCol)).as("topAtk"))
+
+val mapToEmptyArray_ = udf(() => Array.empty[Long])
+
+val predictionAndLabels: DataFrame = dataset
+  .join(topAtk, Seq($(queryCol)), "outer")
+  .withColumn("topAtk", coalesce(col("topAtk"), mapToEmptyArray_()))
+  .select($(labelCol), "topAtk")
--- End diff --

Don't we also need to run an aggregation on the label column, roughly the 
same as the previous aggregation but using labelCol as the sort 

[GitHub] spark pull request #16618: [SPARK-14409][ML][WIP] Add RankingEvaluator

2017-04-25 Thread ebernhardson
Github user ebernhardson commented on a diff in the pull request:

https://github.com/apache/spark/pull/16618#discussion_r113355473
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/evaluation/RankingMetrics.scala ---
@@ -0,0 +1,202 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.evaluation
+
+import org.apache.spark.annotation.Since
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.{Column, DataFrame}
+import org.apache.spark.sql.functions.{mean, sum}
+import org.apache.spark.sql.functions.udf
+import org.apache.spark.sql.types.DoubleType
+
+@Since("2.2.0")
+class RankingMetrics(
+  predictionAndObservations: DataFrame, predictionCol: String, labelCol: 
String)
+  extends Logging with Serializable {
+
+  /**
+   * Compute the Mean Percentile Rank (MPR) of all the queries.
+   *
+   * See the following paper for detail ("Expected percentile rank" in the 
paper):
+   * Hu, Y., Y. Koren, and C. Volinsky. “Collaborative Filtering for 
Implicit Feedback Datasets.”
+   * In 2008 Eighth IEEE International Conference on Data Mining, 
263–72, 2008.
+   * doi:10.1109/ICDM.2008.22.
+   *
+   * @return the mean percentile rank
+   */
+  lazy val meanPercentileRank: Double = {
+
+def rank = udf((predicted: Seq[Any], actual: Any) => {
+  val l_i = predicted.indexOf(actual)
+
+  if (l_i == -1) {
+1
+  } else {
+l_i.toDouble / predicted.size
+  }
+}, DoubleType)
+
+val R_prime = predictionAndObservations.count()
+val predictionColumn: Column = 
predictionAndObservations.col(predictionCol)
+val labelColumn: Column = predictionAndObservations.col(labelCol)
+
+val rankSum: Double = predictionAndObservations
+  .withColumn("rank", rank(predictionColumn, labelColumn))
+  .agg(sum("rank")).first().getDouble(0)
+
+rankSum / R_prime
+  }
+
+  /**
+   * Compute the average precision of all the queries, truncated at 
ranking position k.
+   *
+   * If for a query, the ranking algorithm returns n (n is less than k) 
results, the precision
+   * value will be computed as #(relevant items retrieved) / k. This 
formula also applies when
+   * the size of the ground truth set is less than k.
+   *
+   * If a query has an empty ground truth set, zero will be used as 
precision together with
+   * a log warning.
+   *
+   * See the following paper for detail:
+   *
+   * IR evaluation methods for retrieving highly relevant documents. K. 
Jarvelin and J. Kekalainen
+   *
+   * @param k the position to compute the truncated precision, must be 
positive
+   * @return the average precision at the first k ranking positions
+   */
+  @Since("2.2.0")
+  def precisionAt(k: Int): Double = {
+require(k > 0, "ranking position k should be positive")
+
+def precisionAtK = udf((predicted: Seq[Any], actual: Seq[Any]) => {
+  val actualSet = actual.toSet
+  if (actualSet.nonEmpty) {
+val n = math.min(predicted.length, k)
+var i = 0
+var cnt = 0
+while (i < n) {
+  if (actualSet.contains(predicted(i))) {
+cnt += 1
+  }
+  i += 1
+}
+cnt.toDouble / k
+  } else {
+logWarning("Empty ground truth set, check input data")
+0.0
+  }
+}, DoubleType)
+
+val predictionColumn: Column = 
predictionAndObservations.col(predictionCol)
+val labelColumn: Column = predictionAndObservations.col(labelCol)
+
+predictionAndObservations
+  .withColumn("predictionAtK", precisionAtK(predictionColumn, 
labelColumn))
+  .agg(mean("predictionAtK")).first().getDouble(0)
+  }
+
+  /**
+   * Returns the mean average precision (MAP) of all the queries.
+   * If a query has an empty ground truth 

[GitHub] spark issue #17596: [SPARK-12837][CORE] Do not send the name of internal acc...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17596
  
**[Test build #76169 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76169/testReport)**
 for PR 17596 at commit 
[`4df95f2`](https://github.com/apache/spark/commit/4df95f2abc28b362b8330f11efeb801ca00f2f6e).




[GitHub] spark issue #17757: [Minor][ML] Fix some PySpark & SparkR flaky tests

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17757
  
**[Test build #76168 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76168/testReport)**
 for PR 17757 at commit 
[`a87d5c0`](https://github.com/apache/spark/commit/a87d5c0c578542916706745cdbeca58ae24269e8).




[GitHub] spark pull request #17693: [SPARK-16548][SQL] Inconsistent error handling in...

2017-04-25 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17693




[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17693
  
thanks, merging to master/2.2




[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-25 Thread facaiy
Github user facaiy commented on the issue:

https://github.com/apache/spark/pull/17556
  
Fixed the failed case; please retest it.




[GitHub] spark issue #17760: [SPARK-20439] [SQL] [Backport-2.1] Fix Catalog API listT...

2017-04-25 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/17760
  
thanks, merging to 2.1




[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17758
  
**[Test build #76167 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76167/testReport)**
 for PR 17758 at commit 
[`05a7a61`](https://github.com/apache/spark/commit/05a7a61259d87b8fa97214c96cedde9dc52dd3ec).




[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...

2017-04-25 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17640#discussion_r113358460
  
--- Diff: R/pkg/inst/tests/testthat/test_Serde.R ---
@@ -28,6 +28,10 @@ test_that("SerDe of primitive types", {
   expect_equal(x, 1)
   expect_equal(class(x), "numeric")
 
+  x <- callJStatic("SparkRHandler", "echo", 1380742793415240)
--- End diff --

I don't know how to enforce the bigint type from the R console.




[GitHub] spark pull request #17640: [SPARK-17608][SPARKR]:Long type has incorrect ser...

2017-04-25 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17640#discussion_r113358355
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -3043,6 +3043,23 @@ test_that("catalog APIs, currentDatabase, 
setCurrentDatabase, listDatabases", {
   expect_equal(dbs[[1]], "default")
 })
 
+test_that("dapply with bigint type", {
+  df <- createDataFrame(
+list(list(1380742793415240, 1, "1"), list(1380742793415240, 2, 
"2"),
+list(1380742793415240, 3, "3")), c("a", "b", "c"))
+  schema <- structType(structField("a", "bigint"), structField("b", 
"bigint"),
--- End diff --

This one tests bigint




[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17758
  
**[Test build #76165 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76165/testReport)**
 for PR 17758 at commit 
[`de11a5b`](https://github.com/apache/spark/commit/de11a5b9f063953cb77d53f666312a3df6ba9801).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.




[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17758
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76165/
Test FAILed.




[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17758
  
Merged build finished. Test FAILed.




[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Merged build finished. Test PASSed.




[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76163/
Test PASSed.




[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76163 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76163/testReport)**
 for PR 17765 at commit 
[`bd13a01`](https://github.com/apache/spark/commit/bd13a0178705bee4237ff30f6eabe7a5383b6dc5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.




[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Merged build finished. Test PASSed.




[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76162/
Test PASSed.




[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76162 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76162/testReport)**
 for PR 17765 at commit 
[`609d50e`](https://github.com/apache/spark/commit/609d50ed4568bb2bb8f22869543dacac2b51c42f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76160/
Test PASSed.





[GitHub] spark pull request #17744: [SPARK-20426] Lazy initialization of FileSegmentM...

2017-04-25 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request:

https://github.com/apache/spark/pull/17744#discussion_r113356306
  
--- Diff: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
 ---
@@ -93,14 +92,25 @@ protected void handleMessage(
 OpenBlocks msg = (OpenBlocks) msgObj;
 checkAuth(client, msg.appId);
 
-List<ManagedBuffer> blocks = Lists.newArrayList();
-long totalBlockSize = 0;
-for (String blockId : msg.blockIds) {
-  final ManagedBuffer block = blockManager.getBlockData(msg.appId, 
msg.execId, blockId);
-  totalBlockSize += block != null ? block.size() : 0;
-  blocks.add(block);
-}
-long streamId = streamManager.registerStream(client.getClientId(), 
blocks.iterator());
+Iterator<ManagedBuffer> iter = new Iterator<ManagedBuffer>() {
+  private int index = 0;
+
+  @Override
+  public boolean hasNext() {
+return index < msg.blockIds.length;
+  }
+
+  @Override
+  public ManagedBuffer next() {
+final ManagedBuffer block = 
blockManager.getBlockData(msg.appId, msg.execId,
+  msg.blockIds[index]);
--- End diff --

@tgravescs 
Thanks a lot for taking time looking into this :)
In my understanding, the `OpenBlocks` message will be kept on the heap after 
initialization (https://github.com/apache/spark/blob/master/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/protocol/OpenBlocks.java#L84).
Yes, `TransportRequestHandler.processRpcRequest` will release the 
`ByteBuf`, but the `OpenBlocks` object will not be released.
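The eager-vs-lazy trade-off being reviewed above can be sketched in plain Python (an illustrative reimplementation, not Spark's Java API; `get_block_data` and the block IDs here are hypothetical stand-ins for `blockManager.getBlockData`): the old code materialized every buffer into a list up front, while the new anonymous iterator resolves each block only when the stream consumer asks for it.

```python
class LazyBlockIterator:
    """Looks up each block on demand instead of building a full list."""

    def __init__(self, block_ids, get_block_data):
        self._block_ids = block_ids
        self._get_block_data = get_block_data  # stand-in for a block lookup
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._block_ids):
            raise StopIteration
        block = self._get_block_data(self._block_ids[self._index])
        self._index += 1
        return block

block_ids = ["shuffle_0_0_0", "shuffle_0_1_0"]
lookups = []

def get_block_data(block_id):
    lookups.append(block_id)           # record when a block is resolved
    return f"<data:{block_id}>"

lazy = LazyBlockIterator(block_ids, get_block_data)
assert lookups == []                   # nothing resolved at construction time
assert next(lazy) == "<data:shuffle_0_0_0>"
assert lookups == ["shuffle_0_0_0"]    # one block resolved, on demand
```

With the eager version, all blocks would be resident at once before the stream is registered; the lazy version bounds that to one at a time.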





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76160 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76160/testReport)**
 for PR 17765 at commit 
[`bd13a01`](https://github.com/apache/spark/commit/bd13a0178705bee4237ff30f6eabe7a5383b6dc5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17649: [SPARK-20380][SQL] Output table comment for DESC FORMATT...

2017-04-25 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/17649
  
The changes look good to me if we don't care about the case sensitivity 
issue.





[GitHub] spark issue #17766: [SPARK-20421][core] Mark internal listeners as deprecate...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17766
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76154/
Test PASSed.





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17725
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17766: [SPARK-20421][core] Mark internal listeners as deprecate...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17766
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17725
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76156/
Test FAILed.





[GitHub] spark issue #17766: [SPARK-20421][core] Mark internal listeners as deprecate...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17766
  
**[Test build #76154 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76154/testReport)**
 for PR 17766 at commit 
[`d16be2b`](https://github.com/apache/spark/commit/d16be2b004c7b2f7ca34faf8bdd993cf0445694b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `@deprecated(\"This class will be removed in a future release.\", 
\"2.2.0\")`
  * `@deprecated(\"This class will be removed in a future release.\", 
\"2.2.0\")`
  * `@deprecated(\"This class will be removed in a future release.\", 
\"2.2.0\")`
  * `@deprecated(\"This class will be removed in a future release.\", 
\"2.2.0\")`
  * `@deprecated(\"This class will be removed in a future release.\", 
\"2.2.0\")`





[GitHub] spark issue #17725: [SPARK-20435][CORE] More thorough redaction of sensitive...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17725
  
**[Test build #76156 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76156/testReport)**
 for PR 17725 at commit 
[`80e40ba`](https://github.com/apache/spark/commit/80e40ba57ab6779604fca87cb696d2e889c4ddd2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17767: Refactoring of the ALS code

2017-04-25 Thread hhbyyh
Github user hhbyyh commented on the issue:

https://github.com/apache/spark/pull/17767
  
Preparing a PR like this takes a lot of effort. Please try to follow the 
guidelines in http://spark.apache.org/contributing.html (create a JIRA and 
rename the title). 

Like you said, I doubt anyone would be able to review and confidently 
merge a PR of this scope. Could you please share some reasons for the 
refactoring or pain points of the current implementation? Then maybe we can 
find a way to break it down into some smaller changes. Thanks.






[GitHub] spark issue #17738: [SPARK-20422][Spark Core] Worker registration retries sh...

2017-04-25 Thread unsleepy22
Github user unsleepy22 commented on the issue:

https://github.com/apache/spark/pull/17738
  
Could someone take a look?





[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support an extended display mo...

2017-04-25 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17733#discussion_r113351508
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -663,8 +695,54 @@ class Dataset[T] private[sql](
* @group action
* @since 1.6.0
*/
+  def show(numRows: Int, truncate: Int): Unit = show(numRows, truncate, 
extendedMode = false)
+
+  /**
+   * Displays the Dataset in a tabular form. For example:
+   * {{{
+   *   year  month AVG('Adj Close) MAX('Adj Close)
+   *   1980  12    0.503218        0.595103
+   *   1981  01    0.523289        0.570307
+   *   1982  02    0.436504        0.475256
+   *   1983  03    0.410516        0.442194
+   *   1984  04    0.450090        0.483521
+   * }}}
+   *
+   * If `extendedMode` is enabled, this command prints one column value per line:
+   * {{{
+   * -RECORD 0-
+   *  c0  | 0.6988392500990668
+   *  c1  | 0.3035961718851606
+   *  c2  | 0.2446213804275899
+   *  c3  | 0.6132556607194246
+   *  c4  | 0.1904412430355646
+   *  c5  | 0.8856600775630444
+   * -RECORD 1-
+   *  c0  | 0.3942727621020799
+   *  c1  | 0.6501707200059537
+   *  c2  | 0.2550059028276454
+   *  c3  | 0.9806662488156962
+   *  c4  | 0.8533897091838063
+   *  c5  | 0.3911189623246518
+   * -RECORD 2-
+   *  c0  | 0.9024183805969801
+   *  c1  | 0.0242018765375147
+   *  c2  | 0.8508820250344251
+   *  c3  | 0.4593368817024575
+   *  c4  | 0.2216918145613194
+   *  c5  | 0.3756882647319614
+   * }}}
+   *
+   * @param numRows Number of rows to show
+   * @param truncate If set to more than 0, truncates strings to `truncate` characters and
+   *                 right-aligns all cells.
+   * @param extendedMode Enable the expanded table formatting mode to print one column
+   *                     value per line.
--- End diff --

Yes.
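The record-per-block layout shown in the Scaladoc example above can be sketched in plain Python (an illustrative reimplementation, not the Scala `showString` change under review; the real output pads each `-RECORD n-` separator with extra dashes and handles truncation):

```python
def show_vertical(field_names, rows):
    """Render each row as a `-RECORD n-` block with one `name | value`
    line per column, field names left-justified to a common width."""
    name_width = max(len(n) for n in field_names)
    lines = []
    for i, row in enumerate(rows):
        lines.append(f"-RECORD {i}-")
        for name, cell in zip(field_names, row):
            lines.append(f" {name.ljust(name_width)} | {cell}")
    return "\n".join(lines)

out = show_vertical(["c0", "c1"], [("0.698", "0.303"), ("0.394", "0.650")])
print(out)
# -RECORD 0-
#  c0 | 0.698
#  c1 | 0.303
# -RECORD 1-
#  c0 | 0.394
#  c1 | 0.650
```

This mode trades vertical space for readability when a row has too many columns to fit across the screen.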





[GitHub] spark issue #17733: [SPARK-20425][SQL] Support an extended display mode for ...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17733
  
**[Test build #76166 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76166/testReport)**
 for PR 17733 at commit 
[`f696d35`](https://github.com/apache/spark/commit/f696d357ae9aa2e850f82d408aa413750c4d84b8).





[GitHub] spark pull request #17191: [SPARK-14471][SQL] Aliases in SELECT could be use...

2017-04-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17191#discussion_r113351326
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -136,6 +136,7 @@ class Analyzer(
   ResolveGroupingAnalytics ::
   ResolvePivot ::
   ResolveOrdinalInOrderByAndGroupBy ::
+  ResolveAggAliasInGroupBy ::
--- End diff --

@gatorsmile ping





[GitHub] spark issue #17760: [SPARK-20439] [SQL] [Backport-2.1] Fix Catalog API listT...

2017-04-25 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/17760
  
cc @cloud-fan @sameeragarwal 





[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17693
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76153/
Test PASSed.





[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17693
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17693: [SPARK-16548][SQL] Inconsistent error handling in JSON p...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17693
  
**[Test build #76153 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76153/testReport)**
 for PR 17693 at commit 
[`91bb487`](https://github.com/apache/spark/commit/91bb48708f852ea65ada9ebc48d03e57cd95ebf4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #17737: [SPARK-20442][PYTHON][DOCS] Fill up documentation...

2017-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17737#discussion_r113350427
  
--- Diff: python/pyspark/sql/column.py ---
@@ -251,15 +285,16 @@ def __iter__(self):
 
 # string methods
 _rlike_doc = """
-Return a Boolean :class:`Column` based on a regex match.
+SQL RLIKE expression (LIKE with Regex). Returns a boolean 
:class:`Column` based on a regex
--- End diff --

Thank you.
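For context on the doc wording being reviewed: RLIKE performs an unanchored regex match, i.e. it is true when the pattern matches anywhere in the value, unlike SQL LIKE which must match the whole string. A tiny plain-Python sketch of that semantics (`rlike` here is a local helper for illustration, not the PySpark column method):

```python
import re

def rlike(value, pattern):
    # Unanchored regex match: true if the pattern matches anywhere in
    # the value, which is how RLIKE differs from a full-string LIKE.
    return re.search(pattern, value) is not None

assert rlike("spark-2.2.0", r"\d+\.\d+") is True   # matches "2.2"
assert rlike("spark", r"\d+") is False             # no digits anywhere
```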





[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...

2017-04-25 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/17077
  
🙁 





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r113346208
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
 ---
@@ -397,13 +392,38 @@ class ParquetHiveCompatibilitySuite extends 
ParquetCompatibilityTest with TestHi
 schema = new StructType().add("display", StringType).add("ts", 
TimestampType),
 options = options
   )
-  Seq(false, true).foreach { vectorized =>
-withClue(s"vectorized = $vectorized;") {
+
+  // also write out a partitioned table, to make sure we can 
access that correctly.
+  // add a column we can partition by (value doesn't particularly 
matter).
+  val partitionedData = adjustedRawData.withColumn("id", 
monotonicallyIncreasingId)
+  partitionedData.write.partitionBy("id")
+.parquet(partitionedPath.getCanonicalPath)
+  // unfortunately, catalog.createTable() doesn't let us specify 
partitioning, so just use
+  // a "CREATE TABLE" stmt.
+  val tblOpts = explicitTz.map { tz => raw"""TBLPROPERTIES 
($key="$tz")""" }.getOrElse("")
+  spark.sql(raw"""CREATE EXTERNAL TABLE partitioned_$baseTable (
+ |  display string,
+ |  ts timestamp
+ |)
+ |PARTITIONED BY (id bigint)
--- End diff --

Should we also test a table partitioned like `PARTITIONED BY (ts 
timestamp)`?





[GitHub] spark pull request #16781: [SPARK-12297][SQL] Hive compatibility for Parquet...

2017-04-25 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/16781#discussion_r113345272
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/ParquetHiveCompatibilitySuite.scala
 ---
@@ -17,14 +17,25 @@
 
 package org.apache.spark.sql.hive
 
+import java.io.File
 import java.sql.Timestamp
+import java.util.TimeZone
 
-import org.apache.spark.sql.Row
-import 
org.apache.spark.sql.execution.datasources.parquet.ParquetCompatibilityTest
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.parquet.hadoop.ParquetFileReader
+import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.spark.sql.{AnalysisException, Dataset, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import 
org.apache.spark.sql.execution.datasources.parquet.{ParquetCompatibilityTest, 
ParquetFileFormat}
+import org.apache.spark.sql.functions._
 import org.apache.spark.sql.hive.test.TestHiveSingleton
 import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.{StringType, StructField, StructType, 
TimestampType}
 
-class ParquetHiveCompatibilitySuite extends ParquetCompatibilityTest with 
TestHiveSingleton {
+class ParquetHiveCompatibilitySuite extends ParquetCompatibilityTest with 
TestHiveSingleton
+with BeforeAndAfterEach {
--- End diff --

We don't need `BeforeAndAfterEach` anymore.





[GitHub] spark issue #17758: [SPARK-20460][SQL] Make it more consistent to handle col...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17758
  
**[Test build #76165 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76165/testReport)**
 for PR 17758 at commit 
[`de11a5b`](https://github.com/apache/spark/commit/de11a5b9f063953cb77d53f666312a3df6ba9801).





[GitHub] spark issue #17767: Refactoring of the ALS code

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17767
  
Can one of the admins verify this patch?





[GitHub] spark pull request #17767: Als refactor

2017-04-25 Thread danielyli
GitHub user danielyli opened a pull request:

https://github.com/apache/spark/pull/17767

Als refactor

## What changes were proposed in this pull request?

This is a non-feature-changing refactoring of the ALS code (specifically, 
the `org.apache.spark.ml.recommendation` package), done to improve code 
maintainability and to add significant documentation to the existing code.  My 
motivation for this PR is that I've been working on an online streaming ALS 
implementation [[SPARK-6407](https://issues.apache.org/jira/browse/SPARK-6407)] 
(PR coming soon), and I've been refactoring the package to help me understand 
the existing code before adding to it.  I've also tried my best to include a 
fair bit of Scaladocs and inline comments where I felt they would have helped 
when I was reading the code.

I've done a fair bit of rebasing and sausage making to make the commits 
easy to follow, since no one likes to stare at a 2,700-line PR.  Please let me 
know if I can make anything clearer.  I'd be happy to answer any questions.

In a few places, you'll find a `PLEASE_ADVISE(danielyli):` tag in the code. 
 These are questions I had in the course of the refactoring.  I'd appreciate it 
if the relevant folks could help me with these.  Thanks.

## How was this patch tested?

As this is a non-feature-changing refactoring, existing tests were used.  
All existing ALS tests pass.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/danielyli/spark als-refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17767.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17767


commit deca4db3f234ea60c1494265d4f3ac9375869dd6
Author: Daniel Li 
Date:   2017-04-05T21:55:21Z

Split `ALS.scala` into multiple files

This commit moves the classes `ALS` and `ALSModel` and the traits
`ALSParams` and `ALSModelParams` into their own files.

commit 4086bc9d0c7689e0d2047ac17ada29fe236eb6e6
Author: Daniel Li 
Date:   2017-04-05T22:20:32Z

Move solver classes into their own file

This commit puts the classes `LeastSquaresNESolver`, `CholeskySolver`,
`NNLSSolver`, and `NormalEquation` into a mixin in a separate file in
order to reduce the size and improve the readability of `ALS.scala`.

commit 8aaa533df6f3c9a4b4e8c5d5023f831daf06fa9e
Author: Daniel Li 
Date:   2017-04-05T22:30:38Z

Minor cleanup of imports

  *  import java.util.Arrays
  *  import scala.collection.mutable.ArrayBuilder

commit b68680025e71ebd422087ef95d5ecb7af40fa26d
Author: Daniel Li 
Date:   2017-04-05T22:48:50Z

Create a package object to hold small type and class definitions

commit 83f849ee45fd7c80a1a50fcf12da1eb99d8b6346
Author: Daniel Li 
Date:   2017-04-06T02:17:14Z

Refactor `RatingBlock`-related code

This commit moves the following classes and methods into new files,
separating and encapsulating them as appropriate:

  *  RatingBlock
  *  RatingBlockBuilder
  *  UncompressedInBlock
  *  UncompressedInBlockBuilder
  *  KeyWrapper
  *  UncompressedInBlockSort
  *  LocalIndexEncoder
  *  partitionRatings
  *  makeBlocks

In the course of this refactoring we create a new class, `RatingBlocks`,
to hold the user/item in/out block data and associated logic.

commit 819a00f7fe7384e588ce78cb65e9413ac6588401
Author: Daniel Li 
Date:   2017-04-06T07:08:43Z

Pull out `RatingBlock` from `RatingBlocks` into its own file

This commit puts the `RatingBlock` class into a mixin for the
`RatingBlocks` companion object to extend.  This is done purely to
increase readability by reducing the file size of `RatingBlocks.scala`.

commit 56d10ba1fa627f343e67525e2a3b08e7287bfe2f
Author: Daniel Li 
Date:   2017-04-06T08:50:06Z

Tighten access modifiers where appropriate and make case classes `final`

commit b861d18784ba4ce688d3eacaea10169c9ce2d091
Author: Daniel Li 
Date:   2017-04-06T09:38:54Z

Improve code hygiene of `RatingBlocks`

Among other things, `while` loops that used manually incremented
counters have been changed to `for` loops to increase readability.
Performance should be only nominally affected.
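The loop rewrite described in this commit message can be sketched generically; the function names below are illustrative only, not the actual `RatingBlocks` code:

```scala
object LoopStyles {
  // Before: a `while` loop with a manually incremented counter.
  def sumSquaresWhile(xs: Array[Int]): Int = {
    var acc = 0
    var i = 0
    while (i < xs.length) {
      acc += xs(i) * xs(i)
      i += 1
    }
    acc
  }

  // After: the equivalent `for` loop; the counter bookkeeping disappears.
  def sumSquaresFor(xs: Array[Int]): Int = {
    var acc = 0
    for (x <- xs) acc += x * x
    acc
  }
}
```

Both versions compute the same result; the `for` form simply removes the index-management noise.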

commit 5dfee79a1280d0a72bbe7b8596cdf86654fa0fbc
Author: Daniel Li 
Date:   2017-04-06T09:57:11Z

Spruce up `ALS#fit`

This commit adds vertical whitespace to improve readability.

commit 056d6d0ecc962f94c83f43a6384607bf8833d083
Author: Daniel Li 
Date:   2017-04-25T23:31:54Z

Mark `RatingBlocks` constructor as `private`

commit 

[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76164/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17605
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17605
  
**[Test build #76164 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76164/testReport)**
 for PR 17605 at commit 
[`9d75094`](https://github.com/apache/spark/commit/9d750943860479fab48543038fa89cb1dec4037c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...

2017-04-25 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/17077
  
I think we should because branch-2.2 is cut out.





[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support an extended display mo...

2017-04-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17733#discussion_r113347799
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -663,8 +695,54 @@ class Dataset[T] private[sql](
* @group action
* @since 1.6.0
*/
+  def show(numRows: Int, truncate: Int): Unit = show(numRows, truncate, extendedMode = false)
+
+  /**
+   * Displays the Dataset in a tabular form. For example:
+   * {{{
+   *   year  month AVG('Adj Close) MAX('Adj Close)
+   *   1980  12    0.503218        0.595103
+   *   1981  01    0.523289        0.570307
+   *   1982  02    0.436504        0.475256
+   *   1983  03    0.410516        0.442194
+   *   1984  04    0.450090        0.483521
+   * }}}
+   *
+   * If `extendedMode` is enabled, this command prints one column of data per line:
+   * {{{
+   * -RECORD 0-
+   *  c0  | 0.6988392500990668
+   *  c1  | 0.3035961718851606
+   *  c2  | 0.2446213804275899
+   *  c3  | 0.6132556607194246
+   *  c4  | 0.1904412430355646
+   *  c5  | 0.8856600775630444
+   * -RECORD 1-
+   *  c0  | 0.3942727621020799
+   *  c1  | 0.6501707200059537
+   *  c2  | 0.2550059028276454
+   *  c3  | 0.9806662488156962
+   *  c4  | 0.8533897091838063
+   *  c5  | 0.3911189623246518
+   * -RECORD 2-
+   *  c0  | 0.9024183805969801
+   *  c1  | 0.0242018765375147
+   *  c2  | 0.8508820250344251
+   *  c3  | 0.4593368817024575
+   *  c4  | 0.2216918145613194
+   *  c5  | 0.3756882647319614
+   * }}}
+   *
+   * @param numRows Number of rows to show
+   * @param truncate If set to more than 0, truncates strings to `truncate` characters
+   *                 and all cells will be aligned right.
+   * @param extendedMode Enable expanded table formatting mode to print one column of
+   *                     data per line.
--- End diff --

This one? https://dev.mysql.com/doc/refman/5.7/en/mysql-command-options.html
`Print query output rows vertically (one line per column value)`?
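The vertical layout shown in the Javadoc above (a `-RECORD n-` banner per row, then one `name | value` line per column) can be sketched as follows. This is a simplified illustration of the formatting idea, not the actual `Dataset#show` implementation:

```scala
object VerticalShowSketch {
  // Render rows vertically: one banner line per record, then one
  // "name | value" line per column, with names padded to a common width.
  def renderVertical(fieldNames: Seq[String], rows: Seq[Seq[String]]): String = {
    val nameWidth = fieldNames.map(_.length).max
    val dataWidth = rows.flatten.map(_.length).max
    val sb = new StringBuilder
    rows.zipWithIndex.foreach { case (row, i) =>
      val banner = s"-RECORD $i"
      // Extend the banner with dashes so it spans the record width.
      sb.append(banner)
        .append("-" * math.max(0, nameWidth + dataWidth + 5 - banner.length))
        .append('\n')
      fieldNames.zip(row).foreach { case (name, cell) =>
        sb.append(s" ${name.padTo(nameWidth, ' ')} | $cell\n")
      }
    }
    sb.toString
  }
}
```

Calling `renderVertical(Seq("c0", "c1"), Seq(Seq("0.69", "0.30")))` produces a `-RECORD 0-` banner followed by ` c0 | 0.69` and ` c1 | 0.30` lines, mirroring the example output quoted above.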





[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support an extended display mo...

2017-04-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17733#discussion_r113347676
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -663,8 +695,54 @@ class Dataset[T] private[sql](
* @group action
* @since 1.6.0
*/
+  def show(numRows: Int, truncate: Int): Unit = show(numRows, truncate, extendedMode = false)
+
+  /**
+   * Displays the Dataset in a tabular form. For example:
+   * {{{
+   *   year  month AVG('Adj Close) MAX('Adj Close)
+   *   1980  12    0.503218        0.595103
+   *   1981  01    0.523289        0.570307
+   *   1982  02    0.436504        0.475256
+   *   1983  03    0.410516        0.442194
+   *   1984  04    0.450090        0.483521
+   * }}}
+   *
+   * If `extendedMode` is enabled, this command prints one column of data per line:
+   * {{{
+   * -RECORD 0-
+   *  c0  | 0.6988392500990668
+   *  c1  | 0.3035961718851606
+   *  c2  | 0.2446213804275899
+   *  c3  | 0.6132556607194246
+   *  c4  | 0.1904412430355646
+   *  c5  | 0.8856600775630444
+   * -RECORD 1-
+   *  c0  | 0.3942727621020799
+   *  c1  | 0.6501707200059537
+   *  c2  | 0.2550059028276454
+   *  c3  | 0.9806662488156962
+   *  c4  | 0.8533897091838063
+   *  c5  | 0.3911189623246518
+   * -RECORD 2-
+   *  c0  | 0.9024183805969801
+   *  c1  | 0.0242018765375147
+   *  c2  | 0.8508820250344251
+   *  c3  | 0.4593368817024575
+   *  c4  | 0.2216918145613194
+   *  c5  | 0.3756882647319614
--- End diff --

ok





[GitHub] spark pull request #17733: [SPARK-20425][SQL] Support an extended display mo...

2017-04-25 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17733#discussion_r113347666
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -194,6 +194,8 @@ setMethod("isLocal",
 #' 20 characters will be truncated. However, if set 
greater than zero,
 #' truncates strings longer than \code{truncate} 
characters and all cells
 #' will be aligned right.
+#' @param extendedMode enable expanded table formatting mode to print a column of data
--- End diff --

Yea, SGTM. I'll update. Thanks!





[GitHub] spark issue #17077: [SPARK-16931][PYTHON][SQL] Add Python wrapper for bucket...

2017-04-25 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/17077
  
@holdenk, @HyukjinKwon Do we retarget this to 2.3?





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17728
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17728
  
**[Test build #76158 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76158/testReport)**
 for PR 17728 at commit 
[`0da03b2`](https://github.com/apache/spark/commit/0da03b2d1e1c0e752329b6816bcf7e076c4450cd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17728: [SPARK-20437][R] R wrappers for rollup and cube

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17728
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76158/
Test PASSed.





[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17605
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17605
  
**[Test build #76161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76161/testReport)**
 for PR 17605 at commit 
[`3254f8e`](https://github.com/apache/spark/commit/3254f8e76402dadbf0a62b073864b6f4b85a2eb8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17605
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76161/
Test PASSed.





[GitHub] spark issue #17605: [SPARK-20290][MINOR][PYTHON][SQL] Add PySpark wrapper fo...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17605
  
**[Test build #76164 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76164/testReport)**
 for PR 17605 at commit 
[`9d75094`](https://github.com/apache/spark/commit/9d750943860479fab48543038fa89cb1dec4037c).





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and description for st...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76163 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76163/testReport)**
 for PR 17765 at commit 
[`bd13a01`](https://github.com/apache/spark/commit/bd13a0178705bee4237ff30f6eabe7a5383b6dc5).





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and an informative des...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76152/
Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and an informative des...

2017-04-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17765
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and an informative des...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76152 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76152/testReport)**
 for PR 17765 at commit 
[`07e182b`](https://github.com/apache/spark/commit/07e182ba36fa0499a8da1a2b480030b6785f78a5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17761: [SPARK-20461][Core][SS]Use UninterruptibleThread for Exe...

2017-04-25 Thread mridulm
Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/17761
  
@zsxwing Got it, thanks for clarifying.





[GitHub] spark issue #17765: [SPARK-20464][SS] Add a job group and an informative des...

2017-04-25 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17765
  
**[Test build #76162 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76162/testReport)**
 for PR 17765 at commit 
[`609d50e`](https://github.com/apache/spark/commit/609d50ed4568bb2bb8f22869543dacac2b51c42f).




