date:20170304

[GitHub] spark issue #16762: [SPARK-19419] [SPARK-19420] Fix the cross join detection

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16762
  
**[Test build #73921 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73921/testReport)**
 for PR 16762 at commit 
[`efdf04e`](https://github.com/apache/spark/commit/efdf04ee00c68c4914dd52e8262bda8dfef476da).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #17161: [SPARK-19819][SparkR] Use concrete data in SparkR DataFr...

2017-03-04 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17161
  
Firstly, I see this as slightly different from Python: in R it is common to 
have built-in datasets, and users are probably used to having them and to 
seeing examples that use them.

And as of now, many of our examples are not meant to be runnable, and they 
are clearly indicated as such.

I have done a pass on the changes in this PR and I'm happy with changing 
from non-existing json file to `mtcars`. I'm slightly concerned with the few 
cases of artificial 3 rows data (like 
[here](https://github.com/apache/spark/pull/17161/files#diff-508641a8bd6c6b59f3e77c80cdcfa6a9R2483))
 - more on that below on small dataset. 

That said, I wonder about the verbosity this adds to the examples, as in the 
Python discussions. And since we have more than 300 pages of API doc, changing 
them all is not a simple task.

But I do agree that not having broken or incorrect examples is very 
important.

My concerns are:
- how much work and how much change is it to change all examples (this is 
only 1 .R out of 20-something files we have, in a total of 300+ methods which 
is on the high side for R packages)
- how much churn will it be to keep them up-to-date when we are having 
changes to API (eg. `sparkR.session()`); especially since in order to have 
examples self-contained we tend to add additional calls to manipulate data and 
thereby increasing the number of references of API calls 
- perhaps more importantly, how practical or useful it would be to use 
built-in datasets or native R data.frame (`mtcars`, `cars`, `Titanic`, `iris`, 
or make up some; that are super small) on a scalable data platform like Spark? 
perhaps it is better to demonstrate, in examples, how to work with external 
data sources, multiple file formats etc.?
- and lastly, we still have about a dozen methods without examples that are 
being flagged by CRAN checks (but not enough to fail the check yet)

A couple of *random* thoughts (would be interested to see how they look 
first!):
- group smaller functions into a single page and share a longer, more 
concrete example (need to check whether it messes up parameter documentation or 
makes it more confusing, or how it might affect method help discoverability, 
like with `?predict`) (btw, this is the approach we have for ML methods)
- reference external example files
- have examples using datasets that come with Spark (like [this 
one](https://github.com/apache/spark/blob/master/examples/src/main/resources/people.json))
- have examples in templates and reuse them
- keep existing page breakdown but instead of scattering examples around in 
each, link to a special group of pages (via `@seealso`) with longer, more 
concrete examples (eg. column manipulation set)
- make examples run (i.e. remove `dontrun`). This, of course, would require 
making sure examples are self-contained and correct (this is a bigger effort; 
it could extend build time and/or make the build fail more often, as examples 
will then run as part of the CRAN check) (?!)

I suspect we would likely need a combination or subset of these techniques.
To me, the high-level priorities would be, in order: i) example correctness; 
ii) example coverage - we should have some examples for every method; iii) 
better, richer, self-contained examples in strategic places

Thoughts?




[GitHub] spark pull request #17161: [SPARK-19819][SparkR] Use concrete data in SparkR...

2017-03-04 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17161#discussion_r104306085
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -741,12 +724,12 @@ setMethod("coalesce",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
+#' df <- createDataFrame(mtcars)
+#' newDF <- coalesce(df, 1L)
--- End diff --

should probably not have coalesce in the example blob for repartition



[GitHub] spark pull request #17161: [SPARK-19819][SparkR] Use concrete data in SparkR...

2017-03-04 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17161#discussion_r104306095
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -548,10 +537,9 @@ setMethod("registerTempTable",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' df <- read.df(path, "parquet")
-#' df2 <- read.df(path2, "parquet")
-#' createOrReplaceTempView(df, "table1")
-#' insertInto(df2, "table1", overwrite = TRUE)
+#' df <- limit(createDataFrame(faithful), 5)
--- End diff --

why limit?



[GitHub] spark pull request #17161: [SPARK-19819][SparkR] Use concrete data in SparkR...

2017-03-04 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17161#discussion_r104306091
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -741,12 +724,12 @@ setMethod("coalesce",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
+#' df <- createDataFrame(mtcars)
+#' newDF <- coalesce(df, 1L)
 #' newDF <- repartition(df, 2L)
 #' newDF <- repartition(df, numPartitions = 2L)
-#' newDF <- repartition(df, col = df$"col1", df$"col2")
-#' newDF <- repartition(df, 3L, col = df$"col1", df$"col2")
+#' newDF <- repartition(df, col = df[[1]], df[[2]])
--- End diff --

showing a column reference with `$name` as an example is important too



[GitHub] spark pull request #17161: [SPARK-19819][SparkR] Use concrete data in SparkR...

2017-03-04 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17161#discussion_r104306047
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -2805,10 +2779,9 @@ setMethod("except",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
-#' write.df(df, "myfile", "parquet", "overwrite")
-#' saveDF(df, parquetPath2, "parquet", mode = saveMode, mergeSchema = mergeSchema)
+#' df <- createDataFrame(mtcars)
+#' write.df(df, tempfile(), "parquet", "overwrite")
--- End diff --

I think we should avoid having `tempfile()` as the output path in an example, 
as that might point users in the wrong direction - anything saved in a tempfile 
will disappear as soon as the R session ends.



[GitHub] spark pull request #17161: [SPARK-19819][SparkR] Use concrete data in SparkR...

2017-03-04 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17161#discussion_r104306070
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1123,10 +1096,9 @@ setMethod("dim",
 #' @examples
 #'\dontrun{
 #' sparkR.session()
-#' path <- "path/to/file.json"
-#' df <- read.json(path)
+#' df <- createDataFrame(mtcars)
 #' collected <- collect(df)
-#' firstName <- collected[[1]]$name
+#' collected[[1]]
--- End diff --

right, that seems rather unnecessary. any other idea on how to show it is a 
data.frame?



[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread mridulm

Github user mridulm commented on the issue:

https://github.com/apache/spark/pull/17166
  
What is the rationale for this change? Is it to propagate the task kill 
reason to the UI, as in this one line:
https://github.com/apache/spark/pull/17166/files#diff-b8adb646ef90f616c34eb5c98d1ebd16R357
Or did I miss some other use for this?



[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query result d...

2017-03-04 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17145
  
cc @srowen 



[GitHub] spark issue #17167: [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOper...

2017-03-04 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17167
  
cc @zsxwing 



[GitHub] spark issue #17167: [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOper...

2017-03-04 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17167
  
cc @srowen 



[GitHub] spark issue #17167: [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOper...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17167
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73920/
Test PASSed.



[GitHub] spark issue #17167: [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOper...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17167
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17167: [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOper...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17167
  
**[Test build #73920 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73920/testReport)**
 for PR 17167 at commit 
[`72f1963`](https://github.com/apache/spark/commit/72f1963a36f9f1abfe8ca10d30b01f52c2281d82).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query result d...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17145
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73919/
Test PASSed.



[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query result d...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17145
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query result d...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17145
  
**[Test build #73919 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73919/testReport)**
 for PR 17145 at commit 
[`2cff2b2`](https://github.com/apache/spark/commit/2cff2b2e3261bb988391200c366a10ca0f274fc8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16656: [SPARK-18116][DStream] Report stream input information a...

2017-03-04 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16656
  
ping @zsxwing 



[GitHub] spark issue #17167: [SPARK-19822][TEST] CheckpointSuite.testCheckpointedOper...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17167
  
**[Test build #73920 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73920/testReport)**
 for PR 17167 at commit 
[`72f1963`](https://github.com/apache/spark/commit/72f1963a36f9f1abfe8ca10d30b01f52c2281d82).



[GitHub] spark pull request #17167: [SPARK-19822][TEST] CheckpointSuite.testCheckpoin...

2017-03-04 Thread uncleGen

GitHub user uncleGen opened a pull request:

https://github.com/apache/spark/pull/17167

[SPARK-19822][TEST] CheckpointSuite.testCheckpointedOperation: should not 
check checkpointFilesOfLatestTime by the PATH string.

## What changes were proposed in this pull request?


https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73800/testReport/

```
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to eventually never returned normally.
Attempted 617 times over 10.003740484 seconds. Last failure message: 8 did not equal 2.
```

the check condition is:

```
val checkpointFilesOfLatestTime = Checkpoint.getCheckpointFiles(checkpointDir).filter {
  _.toString.contains(clock.getTimeMillis.toString)
}
// Checkpoint files are written twice for every batch interval. So assert that both
// are written to make sure that both of them have been written.
assert(checkpointFilesOfLatestTime.size === 2)
```

the path string (not just the file name) may contain the value of `clock.getTimeMillis.toString`, like:

```

file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-500
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-1000
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-1500
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-2000
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-2500
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3000
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3500.bk
file:/root/dev/spark/assembly/CheckpointSuite/spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/checkpoint-3500

------
```

so we should only check the filename, not the whole path.
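
The failure mode can be reproduced outside Spark. The following is a minimal plain-Python sketch, not the actual Scala fix. It assumes the test clock reads 3500 (inferred from the latest checkpoint file in the listing, not stated in the PR) and uses a hypothetical helper named `files_for_time`. The directory name `spark-20035007-...` happens to contain the substring `3500`, so matching against the full path selects every file, while matching against the file name alone selects only the two expected files.

```python
import posixpath

# Paths copied from the listing in the PR description.
BASE = ("file:/root/dev/spark/assembly/CheckpointSuite/"
        "spark-20035007-9891-4fb6-91c1-cc15b7ccaf15/")
NAMES = [
    "checkpoint-500", "checkpoint-1000", "checkpoint-1500", "checkpoint-2000",
    "checkpoint-2500", "checkpoint-3000", "checkpoint-3500.bk", "checkpoint-3500",
]
PATHS = [BASE + name for name in NAMES]


def files_for_time(paths, time_ms, filename_only):
    """Hypothetical helper: keep paths whose full path (or file name) contains time_ms."""
    needle = str(time_ms)
    if filename_only:
        return [p for p in paths if needle in posixpath.basename(p)]
    return [p for p in paths if needle in p]


# Assumed clock value: 3500 is the latest checkpoint time in the listing.
by_path = files_for_time(PATHS, 3500, filename_only=False)
by_name = files_for_time(PATHS, 3500, filename_only=True)

print(len(by_path))  # 8: every path matches through the "20035007" directory component
print(len(by_name))  # 2: checkpoint-3500.bk and checkpoint-3500
```

Substring matching on the file name is still loose in this sketch (a time of 500 would also match `checkpoint-1500`), so a stricter comparison may be preferable where that matters.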

## How was this patch tested?

Jenkins.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/uncleGen/spark flaky-CheckpointSuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17167.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17167


commit 72f1963a36f9f1abfe8ca10d30b01f52c2281d82
Author: uncleGen 
Date:   2017-03-03T10:11:52Z

flaky CheckpointSuite test failure





[GitHub] spark issue #17134: [SPARK-19795][SPARKR] add column functions to_json, from...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17134
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73918/
Test PASSed.



[GitHub] spark issue #17134: [SPARK-19795][SPARKR] add column functions to_json, from...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17134
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17134: [SPARK-19795][SPARKR] add column functions to_json, from...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17134
  
**[Test build #73918 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73918/testReport)**
 for PR 17134 at commit 
[`3748d9b`](https://github.com/apache/spark/commit/3748d9b081a83a0f97c4c711d3dba06ee350435b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17159: [SPARK-19818][SparkR] union should check for name consis...

2017-03-04 Thread felixcheung

Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/17159
  
hmm... this is somewhat by design - `union` could take in 2 DataFrames that 
might not match in column names or types. In that case values in one of the 
DataFrames will be coerced to make things fit:
```
>>> d = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
>>> l = spark.createDataFrame([(1, 2)])
>>> d.union(l).head(2)
[Row(age=1, name=u'Alice'), Row(age=1, name=u'2')]

>>> l.dtypes
[('_1', 'bigint'), ('_2', 'bigint')]
>>> d.dtypes
[('age', 'bigint'), ('name', 'string')]
```

Do you see this as something that might be unexpected for R users (in which 
case `rbind` might be the overload to look into) or SQL users (documented as 
equivalent to SQL UNION ALL)?
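
To make the behaviour in the snippet above concrete, here is a plain-Python sketch of a union that matches columns by position rather than by name. It is an illustration only, not Spark's implementation: the helper `union_by_position` and its simple coercion rule (a column becomes string if any row holds a string in it) are assumptions made for this example.

```python
def union_by_position(left_cols, left_rows, right_rows):
    """Hypothetical helper: stack rows positionally, keeping the left side's column names."""
    rows = list(left_rows) + list(right_rows)
    width = len(left_cols)
    if any(len(row) != width for row in rows):
        raise ValueError("both sides must have the same number of columns")
    # Assumed coercion rule for this sketch: if any value in a column is a
    # string, every value in that column is converted to a string.
    as_string = [any(isinstance(row[i], str) for row in rows) for i in range(width)]
    return [
        {name: (str(value) if to_str else value)
         for name, value, to_str in zip(left_cols, row, as_string)}
        for row in rows
    ]


d_cols, d_rows = ["age", "name"], [(1, "Alice")]  # stands in for `d`
l_rows = [(1, 2)]                                 # stands in for `l` (columns _1, _2)

result = union_by_position(d_cols, d_rows, l_rows)
print(result)
# [{'age': 1, 'name': 'Alice'}, {'age': 1, 'name': '2'}]
```

The right-hand column names (`_1`, `_2`) never enter the result, which is why a name mismatch goes unnoticed.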




[GitHub] spark issue #17145: [SPARK-19805][TEST] Log the row type when query result d...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17145
  
**[Test build #73919 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73919/testReport)**
 for PR 17145 at commit 
[`2cff2b2`](https://github.com/apache/spark/commit/2cff2b2e3261bb988391200c366a10ca0f274fc8).



[GitHub] spark issue #17144: [SPARK-19803][TEST] flaky BlockManagerReplicationSuite t...

2017-03-04 Thread uncleGen

Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/17144
  
@kayousterhout sure, I have been working on that flaky test.



[GitHub] spark pull request #17145: [SPARK-19805][TEST] Log the row type when query r...

2017-03-04 Thread uncleGen

Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/17145#discussion_r104304108
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala ---
@@ -312,13 +312,23 @@ object QueryTest {
   sparkAnswer: Seq[Row],
   isSorted: Boolean = false): Option[String] = {
 if (prepareAnswer(expectedAnswer, isSorted) != prepareAnswer(sparkAnswer, isSorted)) {
+  val getRowType: Option[Row] => String = row =>
+"RowType" + row.map(row =>
--- End diff --

@hvanhovell After using `schema.catalogString`, the output looks like this:

```
!== Correct Answer - 1 ==       == Spark Answer - 1 ==
!struct<_1:string,_2:string>    struct<_1:int,_2:string>
![1,a]                          [1,a]
```



[GitHub] spark issue #17134: [SPARK-19795][SPARKR] add column functions to_json, from...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17134
  
**[Test build #73918 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73918/testReport)**
 for PR 17134 at commit 
[`3748d9b`](https://github.com/apache/spark/commit/3748d9b081a83a0f97c4c711d3dba06ee350435b).



[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2017-03-04 Thread crackcell

Github user crackcell commented on the issue:

https://github.com/apache/spark/pull/17123
  
@imatiach-msft @cloud-fan I updated the code and replaced `java.lang.Double` 
with `isNullAt()` and `getDouble()`.



[GitHub] spark issue #16954: [SPARK-18874][SQL] First phase: Deferring the correlated...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16954
  
Merged build finished. Test PASSed.


[GitHub] spark issue #16954: [SPARK-18874][SQL] First phase: Deferring the correlated...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16954
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73915/
Test PASSed.


[GitHub] spark issue #16954: [SPARK-18874][SQL] First phase: Deferring the correlated...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16954
  
**[Test build #73915 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73915/testReport)** for PR 16954 at commit [`7178719`](https://github.com/apache/spark/commit/7178719aaae961a3b5b38132d09a0d4d91ade692).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73917/
Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #73917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73917/testReport)** for PR 17166 at commit [`91b8aef`](https://github.com/apache/spark/commit/91b8aeff8adca4454b9631a0bfa01876de71bb53).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskKilled(reason: String, override val shouldRetry: Boolean) extends TaskFailedReason `
  * `  case class KillTask(`


[GitHub] spark issue #17035: [SPARK-19705][SQL] Preferred location supporting HDFS ca...

2017-03-04 Thread tanejagagan

Github user tanejagagan commented on the issue:

https://github.com/apache/spark/pull/17035
  
@hvanhovell 
Can you help me with this pull request?


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #73917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73917/testReport)** for PR 17166 at commit [`91b8aef`](https://github.com/apache/spark/commit/91b8aeff8adca4454b9631a0bfa01876de71bb53).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73916/
Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #73916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73916/testReport)** for PR 17166 at commit [`1a716aa`](https://github.com/apache/spark/commit/1a716aa31ff2e8e6f5d8e3b73362d28b944319f2).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskKilled(reason: String, override val shouldRetry: Boolean) extends TaskFailedReason `
  * `  case class KillTask(`


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #73916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73916/testReport)** for PR 17166 at commit [`1a716aa`](https://github.com/apache/spark/commit/1a716aa31ff2e8e6f5d8e3b73362d28b944319f2).


[GitHub] spark pull request #17136: [SPARK-19783][SQL] Treat shorter/longer lengths o...

2017-03-04 Thread felixcheung

Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/17136#discussion_r104301364
  
--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
@@ -246,8 +246,8 @@ test_that("read/write csv as DataFrame", {
   mockLinesCsv <- c("year,make,model,comment,blank",
                     "\"2012\",\"Tesla\",\"S\",\"No comment\",",
                     "1997,Ford,E350,\"Go get one now they are going fast\",",
-                    "2015,Chevy,Volt",
-                    "NA,Dummy,Placeholder")
+                    "2015,Chevy,Volt,,",
--- End diff --

Is there no way to support a variable number of values (and commas) in a CSV row?


[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17094
  
Merged build finished. Test PASSed.


[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17094
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73914/
Test PASSed.


[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #73914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73914/testReport)** for PR 17094 at commit [`d7dceeb`](https://github.com/apache/spark/commit/d7dceebb5fecc22c74a4ba2a334ab8ca492a518b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #16954: [SPARK-18874][SQL] First phase: Deferring the correlated...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16954
  
**[Test build #73915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73915/testReport)** for PR 16954 at commit [`7178719`](https://github.com/apache/spark/commit/7178719aaae961a3b5b38132d09a0d4d91ade692).


[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-03-04 Thread dilipbiswal

Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/16954#discussion_r104301109
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -109,6 +109,26 @@ object TypeCoercion {
   }
 
   /**
+   * This function determines the target type of a comparison operator when one operand
+   * is a String and the other is not. It also handles when one op is a Date and the
+   * other is a Timestamp by making the target type to be String. Currently this is used
+   * to coerce types between LHS and RHS of the IN expression.
+   */
+  val findCommonTypeForBinaryComparison: (DataType, DataType) => Option[DataType] = {
+    case (StringType, DateType) => Some(StringType)
+    case (DateType, StringType) => Some(StringType)
+    case (StringType, TimestampType) => Some(StringType)
+    case (TimestampType, StringType) => Some(StringType)
+    case (TimestampType, DateType) => Some(StringType)
--- End diff --

@hvanhovell Thanks! I had tried to do this before as well, since it came up during the internal review. I have made another attempt. Please let me know what you think.
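
For readers following the diff above, here is a minimal, self-contained Scala sketch of how a pattern-matching function value of this shape behaves. It uses stand-in type objects rather than Spark's real `DataType` classes. The `IntegerType` stand-in, the `CoercionSketch` object, and the final `case _ => None` fallback are assumptions made for illustration, since the quoted diff is cut off after the `(TimestampType, DateType)` case.

```scala
// Stand-in types for illustration only; Spark's real DataType hierarchy is much larger.
sealed trait DataType
case object StringType extends DataType
case object DateType extends DataType
case object TimestampType extends DataType
case object IntegerType extends DataType

object CoercionSketch {
  // Mirrors the cases visible in the quoted diff. The final fallback is an
  // assumption: the diff is truncated, so the remaining cases are unknown.
  val findCommonTypeForBinaryComparison: (DataType, DataType) => Option[DataType] = {
    case (StringType, DateType) => Some(StringType)
    case (DateType, StringType) => Some(StringType)
    case (StringType, TimestampType) => Some(StringType)
    case (TimestampType, StringType) => Some(StringType)
    case (TimestampType, DateType) => Some(StringType)
    case _ => None // assumed fallback, not taken from the PR
  }

  def main(args: Array[String]): Unit = {
    // A Date compared with a String resolves to String in this sketch.
    println(findCommonTypeForBinaryComparison(DateType, StringType))
    // A pair with no listed case falls through to the assumed None.
    println(findCommonTypeForBinaryComparison(IntegerType, IntegerType))
  }
}
```

Returning `Option[DataType]` lets a caller tell "no common type" apart from a concrete target type without relying on exceptions or nulls.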


[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-03-04 Thread michalsenkyr

Github user michalsenkyr commented on the issue:

https://github.com/apache/spark/pull/16541
  
Also please note the [UnsafeArrayData-producing branch](https://github.com/michalsenkyr/spark/compare/dataset-seq-builder...michalsenkyr:dataset-seq-builder-unsafe) that is not yet merged into this branch. I'd like to get somebody's opinion on that before I do it.


[GitHub] spark issue #16541: [SPARK-19088][SQL] Optimize sequence type deserializatio...

2017-03-04 Thread michalsenkyr

Github user michalsenkyr commented on the issue:

https://github.com/apache/spark/pull/16541
  
Would it be possible for somebody to review this PR for me? I have a few ideas that depend on it, most notably support for Java Lists, and I'd like to get to work on them.
Maybe @cloud-fan could take a look at this?


[GitHub] spark issue #16842: [SPARK-19304] [Streaming] [Kinesis] fix kinesis slow che...

2017-03-04 Thread brkyvz

Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/16842
  
retest this please


[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16933
  
Merged build finished. Test PASSed.


[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16933
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73911/
Test PASSed.


[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16933
  
**[Test build #73911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73911/testReport)** for PR 16933 at commit [`680c3af`](https://github.com/apache/spark/commit/680c3afa4f29aeffadd17798b7a06f1664964683).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class RepartitionOperation extends UnaryNode `


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73913/
Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #73913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73913/testReport)** for PR 17166 at commit [`ba7cbd0`](https://github.com/apache/spark/commit/ba7cbd09ec0602ac8c9ad59966b2b45a70354bf7).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17094
  
**[Test build #73914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73914/testReport)** for PR 17094 at commit [`d7dceeb`](https://github.com/apache/spark/commit/d7dceebb5fecc22c74a4ba2a334ab8ca492a518b).


[GitHub] spark issue #17094: [SPARK-19762][ML] Hierarchy for consolidating ML aggrega...

2017-03-04 Thread sethah

Github user sethah commented on the issue:

https://github.com/apache/spark/pull/17094
  
Jenkins test this please.


[GitHub] spark issue #16998: [SPARK-19665][SQL] Improve constraint propagation

2017-03-04 Thread viirya

Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16998
  
@hvanhovell Do you have any thoughts on this already? Please let me know. 
Thanks!


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #73913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73913/testReport)** for PR 17166 at commit [`ba7cbd0`](https://github.com/apache/spark/commit/ba7cbd09ec0602ac8c9ad59966b2b45a70354bf7).


[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-04 Thread vanzin

Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/17163
  
If Avro is good at backwards compatibility it shouldn't be an issue; 
@JoshRosen seems to maintain the spark-avro package so he might have more 
insights.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73912/
Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #73912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73912/testReport)** for PR 17166 at commit [`e9178b6`](https://github.com/apache/spark/commit/e9178b61f356ecf4469a58a05ee4183e7beb4bf9).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskKilled(reason: String, override val shouldRetry: Boolean) extends TaskFailedReason `
  * `  case class KillTask(`


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17166
  
Merged build finished. Test FAILed.


[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16933
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73910/
Test PASSed.


[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16933
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16933
  
**[Test build #73910 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73910/testReport)**
 for PR 16933 at commit 
[`0f95a6f`](https://github.com/apache/spark/commit/0f95a6f564b044c7f866ab69edd2ba0a565bb47b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class RepartitionOperation(numPartitions: Int) extends 
UnaryNode `



[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17166
  
**[Test build #73912 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73912/testReport)**
 for PR 17166 at commit 
[`e9178b6`](https://github.com/apache/spark/commit/e9178b61f356ecf4469a58a05ee4183e7beb4bf9).



[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

2017-03-04 Thread ericl

GitHub user ericl opened a pull request:

https://github.com/apache/spark/pull/17166

[SPARK-19820] [core] Allow reason to be specified for task kill

## What changes were proposed in this pull request?

This refactors the task kill path to allow specifying a reason for the task 
kill. The reason is propagated opaquely through events, and will show up in the 
UI automatically as `(N tasks killed: $reason)` and `TaskKilled: $reason`.

Also, make the logic for whether a task failure should be retried explicit 
rather than special casing TaskKilled messages.
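
As a rough illustration of the two points above, here is a minimal sketch in plain Scala. The names `EndReason`, `Killed`, `Failed`, `shouldRetry` and `killedSummary` are hypothetical and are not the classes touched by this patch; whether a killed task is retried is likewise an assumption made only for the example.

```scala
// Hypothetical, simplified model of task end reasons; these are NOT Spark's
// actual classes, only an illustration of the idea described above.
sealed trait EndReason {
  def message: String
  // Explicit per-reason flag instead of special-casing one message type.
  def shouldRetry: Boolean
}

// Whether a killed task should be retried is an assumption made for this sketch.
final case class Killed(reason: String, shouldRetry: Boolean = false) extends EndReason {
  def message: String = s"TaskKilled: $reason"
}

final case class Failed(error: String, shouldRetry: Boolean = true) extends EndReason {
  def message: String = s"Failed: $error"
}

object KillReasonSketch {
  // Builds UI-style summaries of the form "(N tasks killed: reason)".
  def killedSummary(reasons: Seq[EndReason]): Seq[String] =
    reasons
      .collect { case Killed(reason, _) => reason }
      .groupBy(identity)
      .toSeq
      .sortBy { case (reason, _) => reason }
      .map { case (reason, group) => s"(${group.size} tasks killed: $reason)" }
}
```

The reason string is carried opaquely: nothing in the sketch inspects it, it is only grouped and displayed.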

cc @rxin

## How was this patch tested?

Existing tests, tried killing some stages in the UI and verified the 
messages are as expected.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ericl/spark kill-reason

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17166


commit e9178b61f356ecf4469a58a05ee4183e7beb4bf9
Author: Eric Liang 
Date:   2017-03-04T23:47:36Z

Allow reason to be specified for task kill





[GitHub] spark issue #17164: [SPARK-16844][SQL][WIP] Support codegen for sort-based a...

2017-03-04 Thread hvanhovell

Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/17164
  
@maropu I think this is pretty exciting. This is very useful in situations 
where we have a lot of groups; in that case I will happily take a 2x 
performance improvement any day. This is still pretty decent if you consider 
that this aggregate is dominated by sorting.




[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16933
  
**[Test build #73911 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73911/testReport)**
 for PR 16933 at commit 
[`680c3af`](https://github.com/apache/spark/commit/680c3afa4f29aeffadd17798b7a06f1664964683).



[GitHub] spark issue #16933: [SPARK-19601] [SQL] Fix CollapseRepartition rule to pres...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16933
  
**[Test build #73910 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73910/testReport)**
 for PR 16933 at commit 
[`0f95a6f`](https://github.com/apache/spark/commit/0f95a6f564b044c7f866ab69edd2ba0a565bb47b).



[GitHub] spark issue #12461: [SPARK-14409][ML] Adding a RankingEvaluator to ML

2017-03-04 Thread yongtang

Github user yongtang commented on the issue:

https://github.com/apache/spark/pull/12461
  
/cc @daniloascione please take a look.



[GitHub] spark pull request #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread jkbradley

Github user jkbradley closed the pull request at:

https://github.com/apache/spark/pull/17165



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104296156
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -142,18 +166,18 @@ class StringIndexerModel (
   }
 
   /** @group setParam */
-  @Since("1.6.0")
-  def setHandleInvalid(value: String): this.type = set(handleInvalid, 
value)
-  setDefault(handleInvalid, "error")
-
-  /** @group setParam */
   @Since("1.4.0")
   def setInputCol(value: String): this.type = set(inputCol, value)
 
   /** @group setParam */
   @Since("1.4.0")
   def setOutputCol(value: String): this.type = set(outputCol, value)
 
+  /** @group setParam */
+  @Since("2.2.0")
+  def setHandleInvalid(value: String): this.type = set(handleInvalid, 
value)
+  setDefault(handleInvalid, StringIndexer.ERROR_UNSEEN_LABEL)
--- End diff --

No need to set default here since it's set in the trait



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104296099
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -105,7 +125,11 @@ class StringIndexer @Since("1.4.0") (
 
 @Since("1.6.0")
 object StringIndexer extends DefaultParamsReadable[StringIndexer] {
-
+  private[feature] val SKIP_UNSEEN_LABEL: String = "skip"
+  private[feature] val ERROR_UNSEEN_LABEL: String = "error"
+  private[feature] val KEEP_UNSEEN_LABEL: String = "keep"
+  private[feature] val supportedHandleInvalids: Array[String] =
+Array(SKIP_UNSEEN_LABEL, ERROR_UNSEEN_LABEL, KEEP_UNSEEN_LABEL)
   @Since("1.6.0")
--- End diff --

style: add newline here



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104296562
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -163,25 +187,28 @@ class StringIndexerModel (
 }
 transformSchema(dataset.schema, logging = true)
 
+val metadata = NominalAttribute.defaultAttr
+  .withName($(outputCol)).withValues(labels).toMetadata()
+// If we are skipping invalid records, filter them out.
+val (filteredDataset, keepInvalid) = getHandleInvalid match {
--- End diff --

I'm OK with returning a tuple; that's a common pattern.  Do you mean that 
it makes the code inside the match statement confusing?
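
For reference, the tuple-returning match pattern can be sketched without Spark as below. This is only an illustration under assumptions: `prepare`, `rows` and `known` are hypothetical stand-ins for the dataset filtering in the diff, and plain collections replace the DataFrame.

```scala
object TupleMatchSketch {
  // Local stand-ins for the constants discussed in this PR.
  val Skip = "skip"
  val Error = "error"
  val Keep = "keep"

  // Returns the (possibly filtered) rows together with a keepInvalid flag,
  // mirroring the shape of `val (filteredDataset, keepInvalid) = ... match`.
  def prepare(rows: Seq[String], known: Set[String], mode: String): (Seq[String], Boolean) =
    mode match {
      case Skip => (rows.filter(known.contains), false)
      case _    => (rows, mode == Keep)
    }
}
```

A caller can then destructure the result in one step, for example `val (filtered, keepInvalid) = TupleMatchSketch.prepare(rows, known, mode)`.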



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104296367
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -163,25 +190,28 @@ class StringIndexerModel (
 }
 transformSchema(dataset.schema, logging = true)
 
+val metadata = NominalAttribute.defaultAttr
+  .withName($(outputCol)).withValues(labels).toMetadata()
--- End diff --

Yep, that's what I meant:  In ```withValues(labels)```, labels can be set 
as:
```
val labels = getHandleInvalid match {
  case StringIndexer.KEEP_UNSEEN_LABEL => labels :+ "__unknown"
  case _ => labels
}
```

I'm adding underscores to the attribute name to make it a little less 
likely to hit conflicts.
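
A self-contained version of that idea, as a sketch only: `outputLabels` is a hypothetical helper name, and the mode is passed as a plain string rather than read from the Param. A distinct result name is used so the new value does not shadow the `labels` it reads from.

```scala
object KeepLabelSketch {
  // Appends an extra "__unknown" attribute value only when unseen labels are kept,
  // so the metadata has numLabels + 1 values in that mode.
  def outputLabels(labels: Array[String], handleInvalid: String): Array[String] =
    handleInvalid match {
      case "keep" => labels :+ "__unknown"
      case _      => labels
    }
}
```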



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104296546
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -71,18 +92,17 @@ class StringIndexer @Since("1.4.0") (
   def this() = this(Identifiable.randomUID("strIdx"))
 
   /** @group setParam */
-  @Since("1.6.0")
-  def setHandleInvalid(value: String): this.type = set(handleInvalid, 
value)
-  setDefault(handleInvalid, "error")
-
-  /** @group setParam */
   @Since("1.4.0")
   def setInputCol(value: String): this.type = set(inputCol, value)
 
   /** @group setParam */
   @Since("1.4.0")
   def setOutputCol(value: String): this.type = set(outputCol, value)
 
+  /** @group setParam */
+  @Since("2.2.0")
+  def setHandleInvalid(value: String): this.type = set(handleInvalid, 
value)
--- End diff --

+1 for maintaining order.
setDefault will go in the trait (except in cases where it belongs in just 
one of the Estimator or Model)
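
To illustrate why the default belongs in the shared trait, here is a toy sketch. `MiniParams` is a hypothetical stand-in for the real Params machinery, and `IndexerBase`, `Indexer` and `IndexerModel` are simplified placeholders rather than the actual classes.

```scala
import scala.collection.mutable

// Toy parameter store; an assumption for illustration, not Spark's Params API.
trait MiniParams {
  private val defaults = mutable.Map.empty[String, String]
  private val values = mutable.Map.empty[String, String]

  protected def setDefault(name: String, value: String): Unit = defaults(name) = value

  protected def set(name: String, value: String): this.type = {
    values(name) = value
    this
  }

  def get(name: String): String = values.getOrElse(name, defaults(name))
}

// The default is declared once here, so both subclasses inherit it.
trait IndexerBase extends MiniParams {
  setDefault("handleInvalid", "error")
  def getHandleInvalid: String = get("handleInvalid")
}

class Indexer extends IndexerBase {
  def setHandleInvalid(value: String): this.type = set("handleInvalid", value)
}

class IndexerModel extends IndexerBase {
  def setHandleInvalid(value: String): this.type = set("handleInvalid", value)
}
```

With this layout neither setter needs its own `setDefault` call, and the estimator and the model cannot drift apart on the default value.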



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104296045
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -34,8 +36,27 @@ import org.apache.spark.util.collection.OpenHashMap
 /**
  * Base trait for [[StringIndexer]] and [[StringIndexerModel]].
  */
-private[feature] trait StringIndexerBase extends Params with HasInputCol 
with HasOutputCol
-with HasHandleInvalid {
+private[feature] trait StringIndexerBase extends Params with HasInputCol 
with HasOutputCol {
+
+  /**
+   * Param for how to handle unseen labels. Options are 'skip' (filter out 
rows with
+   * unseen labels), 'error' (throw an error), or 'keep' (put unseen 
labels in a special additional
+   * bucket, at index numLabels.
+   * Default: "error"
+   * @group param
+   */
+  @Since("2.1.0")
+  val handleInvalid: Param[String] = new Param[String](this, 
"handleInvalid", "how to handle " +
+"unseen labels. Options are 'skip' (filter out rows with unseen 
labels), " +
+"error (throw an error), or 'keep' (put unseen labels in a special 
additional bucket," +
--- End diff --

need space after comma: "bucket, "



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104296396
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -163,25 +190,28 @@ class StringIndexerModel (
 }
 transformSchema(dataset.schema, logging = true)
 
+val metadata = NominalAttribute.defaultAttr
+  .withName($(outputCol)).withValues(labels).toMetadata()
+// If we are skipping invalid records, filter them out.
+val (filteredDataset, keepInvalid) = getHandleInvalid match {
+  case SKIP_UNSEEN_LABEL =>
+val filterer = udf { label: String =>
+  labelToIndex.contains(label)
+}
+(dataset.where(filterer(dataset($(inputCol)))), false)
+  case _ => (dataset, getHandleInvalid == KEEP_UNSEEN_LABEL)
+}
+
 val indexer = udf { label: String =>
   if (labelToIndex.contains(label)) {
 labelToIndex(label)
+  } else if (keepInvalid) {
+labels.length
   } else {
 throw new SparkException(s"Unseen label: $label.")
--- End diff --

Can you improve the error message?
```
throw new SparkException(s"Unseen label: $label.  To handle unseen labels, 
set Param handleInvalid to ${StringIndexer.KEEP_UNSEEN_LABEL}.")
```
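
A Spark-free sketch of the lookup with the friendlier message, under assumptions: `indexOf` is a hypothetical helper, indices are plain `Int`s, and `IllegalArgumentException` stands in for `SparkException` so the example has no dependencies.

```scala
object UnseenLabelSketch {
  // Looks up a label; unseen labels either map to the extra bucket at index
  // numLabels (keepInvalid = true) or raise an error that names the remedy.
  def indexOf(label: String, labelToIndex: Map[String, Int], keepInvalid: Boolean): Int =
    labelToIndex.get(label) match {
      case Some(index)         => index
      case None if keepInvalid => labelToIndex.size
      case None =>
        throw new IllegalArgumentException(
          s"Unseen label: $label.  To handle unseen labels, " +
            "set Param handleInvalid to keep.")
    }
}
```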



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104296526
  
--- Diff: docs/ml-features.md ---
@@ -542,12 +543,13 @@ column, we should get the following:
 "a" gets index `0` because it is the most frequent, followed by "c" with 
index `1` and "b" with
 index `2`.
 
-Additionally, there are two strategies regarding how `StringIndexer` will 
handle
+Additionally, there are three strategies regarding how `StringIndexer` 
will handle
 unseen labels when you have fit a `StringIndexer` on one dataset and then 
use it
 to transform another:
 
 - throw an exception (which is the default)
 - skip the row containing the unseen label entirely
+- map the unseen labels with indices [numLabels]
--- End diff --

Or just match the phrasing in the doc param



[GitHub] spark pull request #16883: [SPARK-17498][ML] StringIndexer enhancement for h...

2017-03-04 Thread jkbradley

Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/16883#discussion_r104296075
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -105,7 +125,11 @@ class StringIndexer @Since("1.4.0") (
 
 @Since("1.6.0")
 object StringIndexer extends DefaultParamsReadable[StringIndexer] {
-
+  private[feature] val SKIP_UNSEEN_LABEL: String = "skip"
+  private[feature] val ERROR_UNSEEN_LABEL: String = "error"
+  private[feature] val KEEP_UNSEEN_LABEL: String = "keep"
--- End diff --

At some point, let's do that, but not yet.  I like keeping things private 
at first in case we find mistakes after release and need to change things.



[GitHub] spark issue #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17165
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17165
  
**[Test build #73909 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73909/testReport)**
 for PR 17165 at commit 
[`67f02d5`](https://github.com/apache/spark/commit/67f02d565685dc4b9be2709783539f7af1ea1bb5).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17165
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73909/
Test FAILed.



[GitHub] spark issue #17161: [SPARK-19819][SparkR] Use concrete data in SparkR DataFr...

2017-03-04 Thread actuaryzhang

Github user actuaryzhang commented on the issue:

https://github.com/apache/spark/pull/17161
  
I think most examples in R packages are (supposed to be) runnable. Coming 
from a user perspective, I find it useful if I can run the examples directly 
and see what the function does in action. Since we already have the pseudo-code 
here, wouldn't it be better to change it to real data? 
Especially for the more complicated cases like `join`, providing 
self-contained examples will save users much time in constructing their own 
examples.

Indeed, by making the examples runnable, I have found and fixed several 
issues with the pseudo example. For example, the original example in 
`insertInto` seems to be wrong:
``` 
createOrReplaceTempView(df, "table1")   # This should be saveAsTable
insertInto(df2, "table1", overwrite = TRUE)
```
This is very hard to find without running real examples. 

@srowen @felixcheung 



[GitHub] spark issue #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17165
  
**[Test build #73909 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73909/testReport)**
 for PR 17165 at commit 
[`67f02d5`](https://github.com/apache/spark/commit/67f02d565685dc4b9be2709783539f7af1ea1bb5).



[GitHub] spark pull request #17165: [DO NOT MERGE][TESTING] Vince shieh spark 17498

2017-03-04 Thread jkbradley

GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/17165

[DO NOT MERGE][TESTING] Vince shieh spark 17498

Temp PR to reproduce Jenkins compilation error

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark VinceShieh-spark-17498

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17165.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17165


commit b970728f48f22f0c2789a941c1fe1ac6b94a3b49
Author: VinceShieh 
Date:   2017-02-10T05:50:30Z

[SPARK-17498][ML] StringIndexer handles unseen labels

This PR is an enhancement to ML StringIndexer. Before this PR, String 
Indexer only supports "skip"/"error" options
to deal with unseen records. But sometimes those unseen records might still 
be useful in certain use cases, so user
would like to keep the unseen labels. This PR enables StringIndexer to 
support keeping unseen labels as indices
[numLabels].

'''Before
StringIndexer().setHandleInvalid("skip")
StringIndexer().setHandleInvalid("error")
'''After
support the third option "keep"
StringIndexer().setHandleInvalid("keep")

Signed-off-by: VinceShieh 

commit 5d4b07f517cdf52e5b3b0b786e1dba1993659b2e
Author: VinceShieh 
Date:   2017-02-10T07:02:44Z

fix compilation issue

Signed-off-by: VinceShieh 

commit 0eb7f0784a71cb695f4d936255abbe8ad30bd95d
Author: VinceShieh 
Date:   2017-02-10T08:16:57Z

code refactoring

Signed-off-by: VinceShieh 

commit 9a4174579aa811c99a81967dd829e506c0096ccd
Author: VinceShieh 
Date:   2017-02-10T09:08:30Z

add exclusion rules in mima to pass binary compability check

Signed-off-by: VinceShieh 

commit 1736057d055ad4a01dac3e9e79950bfcd9b91e1e
Author: VinceShieh 
Date:   2017-02-10T09:33:31Z

update document

Signed-off-by: VinceShieh 

commit ebe9ddb0dc3dd597d435f8a641fce790b4033a64
Author: VinceShieh 
Date:   2017-02-10T09:37:43Z

Revert "add exclusion rules in mima to pass binary compability check"

This reverts commit 9a4174579aa811c99a81967dd829e506c0096ccd.

commit 27c1b10f25db851cd1e670bd6a0d6e6f59c2ce1e
Author: VinceShieh 
Date:   2017-02-10T09:42:56Z

Mima changes to pass binary compatibility check

Signed-off-by: VinceShieh 

commit 9bcaffc19e7a11d31aa6bb9ebbcd96367fc1cd38
Author: VinceShieh 
Date:   2017-03-01T02:09:36Z

update

Signed-off-by: VinceShieh 

commit 4dc10e6390b30fa8df9789479430e0a3f7c65c39
Author: VinceShieh 
Date:   2017-03-01T02:16:29Z

update target version

Signed-off-by: VinceShieh 

commit fa24e433c3f9fe6f76fe0a55df4551881f194d7b
Author: VinceShieh 
Date:   2017-03-01T02:26:43Z

fix compilation on
val (filteredDataset, keepInvalid) = getHandleInvalid match {
  case ..
}

Signed-off-by: VinceShieh 

commit 67f02d565685dc4b9be2709783539f7af1ea1bb5
Author: Joseph K. Bradley 
Date:   2017-03-04T20:08:21Z

remove scala existentials import





[GitHub] spark issue #15274: [SPARK-17699] Support for parsing JSON string columns

2017-03-04 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15274
  
Based on @marmbrus's comment in a JIRA, we prefer to use our DDL 
format. For example, as we did for CREATE TABLE, we can specify the 
schema using `a int, b string`
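
To make the format concrete, here is a toy sketch of reading such a string into (name, type) pairs. `DdlSchemaSketch.parse` is a hypothetical helper and not Spark's parser; it assumes flat fields only and does not handle types that contain commas, such as nested structs or `decimal(10,2)`.

```scala
object DdlSchemaSketch {
  // Splits "a int, b string" into Seq(("a", "int"), ("b", "string")).
  // Assumption: every field is "<name> <type>" and types contain no commas.
  def parse(ddl: String): Seq[(String, String)] =
    ddl.split(",").toSeq.map(_.trim).filter(_.nonEmpty).map { field =>
      field.split("\\s+", 2) match {
        case Array(name, tpe) => (name, tpe.trim.toLowerCase)
        case _ =>
          throw new IllegalArgumentException(s"Cannot parse field: '$field'")
      }
    }
}
```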



[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17163
  
Merged build finished. Test PASSed.



[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17163
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73906/
Test PASSed.



[GitHub] spark issue #17163: [SPARK-16617][BUILD][CORE] Upgrade to Avro 1.8.x

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17163
  
**[Test build #73906 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73906/testReport)**
 for PR 17163 at commit 
[`9461741`](https://github.com/apache/spark/commit/94617414ef580bc0ce2934c1c8e7e22423eff51e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #15274: [SPARK-17699] Support for parsing JSON string columns

2017-03-04 Thread Sazpaimon

Github user Sazpaimon commented on the issue:

https://github.com/apache/spark/pull/15274
  
@gatorsmile Alternatively, one can do what brickhouse's `from_json` 
Hive UDF does ( https://gist.github.com/jeromebanks/8855408#file-gistfile1-sql )

(For the record, I actually need this in SQL)



[GitHub] spark issue #17164: [SPARK-16844][SQL][WIP] Support codegen for sort-based a...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17164
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73908/
Test FAILed.



[GitHub] spark issue #17164: [SPARK-16844][SQL][WIP] Support codegen for sort-based a...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17164
  
Merged build finished. Test FAILed.



[GitHub] spark issue #17164: [SPARK-16844][SQL][WIP] Support codegen for sort-based a...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17164
  
**[Test build #73908 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73908/testReport)**
 for PR 17164 at commit 
[`9a26a0a`](https://github.com/apache/spark/commit/9a26a0a0e9c7f9d0e90dc5257eb5038eafeb206c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class AggregateExec extends UnaryExecNode `
  * `trait CodegenAggregateSupport extends CodegenSupport `



[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16611
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73905/
Test PASSed.



[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-03-04 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16611
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...

2017-03-04 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16611
  
**[Test build #73905 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73905/testReport)**
 for PR 16611 at commit 
[`9f7e679`](https://github.com/apache/spark/commit/9f7e679586b9ede33d10ef0cd7db2fba3237c712).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


