[GitHub] spark pull request #16251: [SPARK-18826][SS]Add 'latestFirst' option to File...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16251#discussion_r92558907

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
```
@@ -1059,6 +1060,72 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
     val str = Source.fromFile(getClass.getResource(s"/structured-streaming/$file").toURI).mkString
     SerializedOffset(str.trim)
   }
+
+  test("FileStreamSource - latestFirst") {
+    withTempDir { src =>
+      // Prepare two files: 1.txt, 2.txt, and make sure they have different modified times.
+      val f1 = stringToFile(new File(src, "1.txt"), "1")
+      val f2 = stringToFile(new File(src, "2.txt"), "2")
+      eventually(timeout(streamingTimeout)) {
+        f2.setLastModified(System.currentTimeMillis())
+        assert(f1.lastModified < f2.lastModified)
+      }
+
+      // Read oldest files first, so the first batch is "1", and the second batch is "2".
+      val fileStream = createFileStream(
+        "text",
+        src.getCanonicalPath,
+        options = Map("latestFirst" -> "false", "maxFilesPerTrigger" -> "1"))
+      val clock = new StreamManualClock()
+      testStream(fileStream)(
+        StartStream(trigger = ProcessingTime(10), triggerClock = clock),
+        AssertOnQuery { _ =>
```
--- End diff --

Why do you need to wait on the manual clock? CheckLastBatch will automatically wait for the batch to complete.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16281

Parquet is the default format of Spark, so it is pretty significant to Spark. Now that Parquet is becoming stable, it might be the right time to fork it; we are just fixing the bugs. @liancheng and @rdblue are Parquet committers. They might be the right people to judge the changes we made in the forked version.
[GitHub] spark pull request #16251: [SPARK-18826][SS]Add 'latestFirst' option to File...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/16251#discussion_r92558557

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala ---
```
@@ -1059,6 +1060,72 @@ class FileStreamSourceSuite extends FileStreamSourceTest {
     val str = Source.fromFile(getClass.getResource(s"/structured-streaming/$file").toURI).mkString
     SerializedOffset(str.trim)
   }
+
+  test("FileStreamSource - latestFirst") {
+    withTempDir { src =>
+      // Prepare two files: 1.txt, 2.txt, and make sure they have different modified times.
+      val f1 = stringToFile(new File(src, "1.txt"), "1")
+      val f2 = stringToFile(new File(src, "2.txt"), "2")
+      eventually(timeout(streamingTimeout)) {
```
--- End diff --

Why use `eventually`? Why not just set `f2.setLastModified(f1.lastModified + 1000)`?
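The deterministic alternative the reviewer is suggesting can be sketched outside of Spark's test harness. This is a hypothetical standalone snippet (the suite's `withTempDir`/`stringToFile` helpers are replaced with plain `java.nio` calls):

```scala
import java.io.File
import java.nio.file.Files

// Create two files in a temp dir, mirroring 1.txt / 2.txt in the test.
val dir = Files.createTempDirectory("latest-first").toFile
val f1 = new File(dir, "1.txt")
val f2 = new File(dir, "2.txt")
Files.write(f1.toPath, "1".getBytes("UTF-8"))
Files.write(f2.toPath, "2".getBytes("UTF-8"))

// Pin the ordering deterministically instead of polling with `eventually`:
// make f2 strictly newer than f1, so the ordering assertion can never flake.
f2.setLastModified(f1.lastModified + 1000)
assert(f1.lastModified < f2.lastModified)
```

The trade-off is that `setLastModified` granularity depends on the filesystem, but a one-second offset is coarse enough on common filesystems to keep the ordering stable.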
[GitHub] spark issue #16294: [WIP][SPARK-18669][SS][DOCS] Update Apache docs for Stru...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16294

**[Test build #70182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70182/testReport)** for PR 16294 at commit [`ed8d9e0`](https://github.com/apache/spark/commit/ed8d9e0e40292979ff250ceab76ff966510f2597).
[GitHub] spark issue #16293: [SPARK-17119][Core]allow the history server to delete .i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16293

Can one of the admins verify this patch?
[GitHub] spark issue #16289: [SPARK-18870] Disallowed Distinct Aggregations on Stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16289

**[Test build #70183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70183/testReport)** for PR 16289 at commit [`9cc8d2b`](https://github.com/apache/spark/commit/9cc8d2b65e38d8cf1395ce265e5ee08d6006a19c).
[GitHub] spark pull request #16294: [WIP][SPARK-18669][SS][DOCS] Update Apache docs f...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/16294

[WIP][SPARK-18669][SS][DOCS] Update Apache docs for Structured Streaming regarding watermarking and status

## What changes were proposed in this pull request?

- Extended the Window operation section with code snippet and explanation of watermarking
- Extended the Output Mode section with a table showing the compatibility between query type and output mode
- Rewrote the Monitoring section with update jsons generated by

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-18669

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16294.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16294

commit a31c861c31a977537aa3b4c86a2fd1ea1ee544a3
Author: Tathagata Das
Date: 2016-12-14T02:52:20Z

    Added watermarking

commit ed8d9e0e40292979ff250ceab76ff966510f2597
Author: Tathagata Das
Date: 2016-12-14T20:36:42Z

    Update metrics
[GitHub] spark pull request #16293: [SPARK-17119][Core]allow the history server to de...
GitHub user cnZach opened a pull request: https://github.com/apache/spark/pull/16293

[SPARK-17119][Core] Allow the history server to delete .inprogress files (configurable)

## What changes were proposed in this pull request?

The History Server (HS) currently only considers completed applications when deleting event logs from spark.history.fs.logDirectory (since SPARK-6879). This means that over time, .inprogress files (from failed jobs, jobs where the SparkContext is not closed, spark-shell exits, etc.) can accumulate and impact the HS. Instead of having to manually delete these files, this change adds a configurable feature to let users decide whether the .inprogress files should also be deleted after a period of time:

- spark.history.fs.cleaner.deleteInProgress.enabled
- spark.history.fs.cleaner.noProgressMaxAge

## How was this patch tested?

Verified with manual tests; unit tests added in FsHistoryProviderSuite.scala. I am not able to run ./dev/run-tests for the whole project on my laptop; it failed on SparkSinkSuite and network-related tests under org.apache.spark.network.* (all due to java.io.IOException: Failed to connect to /:62343).

    [info] SparkSinkSuite:
    [info] - Success with ack *** FAILED *** (1 minute)
    [info]   java.io.IOException: Error connecting to /0.0.0.0:62298
    [info]   at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:261)

## doc

monitoring.md is also updated.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cnZach/spark SPARK-17119

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16293.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16293

commit aa45caa42a7bc1b4a06e6634f9a40c4db6b83a89
Author: Yuexin Zhang
Date: 2016-12-15T06:19:11Z

    allow the history server to delete .inprogress files and make it configurable

commit f281d92a49e54f64f157f8d2936a13a73c7284cb
Author: Yuexin Zhang
Date: 2016-12-15T06:39:12Z

    fix a typo noProgressMaxAg -> noProgressMaxAge

commit 989422d310a0addeb25217e61fda85c34e5d4c89
Author: Yuexin Zhang
Date: 2016-12-15T06:41:57Z

    fix checkstyle failures in FsHistoryProviderSuite.scala
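The age-based cleanup rule proposed above can be sketched in isolation. This is a hypothetical illustration of the described behavior, not the actual FsHistoryProvider code; the `LogEntry` type and parameter names are invented for the example:

```scala
// Hypothetical model of an event-log file; not the actual History Server code.
case class LogEntry(name: String, lastModified: Long)

// A log is eligible for cleanup once it is older than maxAgeMs; .inprogress
// files are only eligible when the new opt-in flag is enabled.
def eligibleForDeletion(log: LogEntry, now: Long,
                        deleteInProgress: Boolean, maxAgeMs: Long): Boolean = {
  val tooOld = now - log.lastModified > maxAgeMs
  if (log.name.endsWith(".inprogress")) deleteInProgress && tooOld else tooOld
}

val now = 1000000L
val stale = LogEntry("app-1.inprogress", now - 200000L)
val fresh = LogEntry("app-2.inprogress", now - 1000L)

// Feature off: .inprogress files are never cleaned up (current behavior).
assert(!eligibleForDeletion(stale, now, deleteInProgress = false, maxAgeMs = 100000L))
// Feature on: only sufficiently old .inprogress files become eligible.
assert(eligibleForDeletion(stale, now, deleteInProgress = true, maxAgeMs = 100000L))
assert(!eligibleForDeletion(fresh, now, deleteInProgress = true, maxAgeMs = 100000L))
```

Gating on both the flag and the age keeps the default behavior unchanged, which matches the opt-in design of the proposed `spark.history.fs.cleaner.deleteInProgress.enabled` setting.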
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263

Merged build finished. Test FAILed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70175/
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263

**[Test build #70175 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70175/testReport)** for PR 16263 at commit [`a2d071d`](https://github.com/apache/spark/commit/a2d071d6f5ab916f9e39b5ccb50e4fb11cba183d).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030

Merged build finished. Test PASSed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70178/
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030

**[Test build #70178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70178/testReport)** for PR 16030 at commit [`dc54b69`](https://github.com/apache/spark/commit/dc54b699c3c93f11eaa93063b3b950e04c614a56).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70176/
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263

Merged build finished. Test PASSed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263

**[Test build #70176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70176/testReport)** for PR 16263 at commit [`67882d2`](https://github.com/apache/spark/commit/67882d2d4ebfad955b07cf0020c726ea5a153864).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16285: [SPARK-18867] [SQL] Throw cause if IsolatedClientLoad ca...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16285

Is it ever possible that cause is null?
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70177/
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030

Merged build finished. Test PASSed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030

**[Test build #70177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70177/testReport)** for PR 16030 at commit [`5b23b89`](https://github.com/apache/spark/commit/5b23b89a4a0b9b16f16c56d03fc226b8eb53c92f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16290#discussion_r92553628

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
```
@@ -2165,6 +2165,14 @@ test_that("SQL error message is returned from JVM", {
   expect_equal(grepl("blah", retError), TRUE)
 })
+
+test_that("Default warehouse dir should be set to tempdir", {
+  # nothing should be written outside tempdir() without explicit user permission
+  inital_working_directory_files <- list.files()
```
--- End diff --

From my test, the `spark-warehouse` directory is created when I run `a <- createDataFrame(iris)`.
[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16290#discussion_r92553387

--- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
```
@@ -2165,6 +2165,14 @@ test_that("SQL error message is returned from JVM", {
   expect_equal(grepl("blah", retError), TRUE)
 })
+
+test_that("Default warehouse dir should be set to tempdir", {
+  # nothing should be written outside tempdir() without explicit user permission
+  inital_working_directory_files <- list.files()
```
--- End diff --

I'm referring to other tests in this test file, test_sparkSQL, which call APIs that might already initialize the warehouse dir. `sparkR.session()` is called at the top. Does this `createOrReplaceTempView` cause the warehouse dir to be created? https://github.com/shivaram/spark-1/blob/25834109588e8e545deafb1da162958766a057e2/R/pkg/inst/tests/testthat/test_sparkSQL.R#L570
[GitHub] spark pull request #16030: [SPARK-18108][SQL] Fix a bug to fail partition sc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16030#discussion_r92551213

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala ---
```
@@ -49,9 +49,12 @@ case class HadoopFsRelation(
   override def sqlContext: SQLContext = sparkSession.sqlContext
   val schema: StructType = {
-    val dataSchemaColumnNames = dataSchema.map(_.name.toLowerCase).toSet
-    StructType(dataSchema ++ partitionSchema.filterNot { column =>
-      dataSchemaColumnNames.contains(column.name.toLowerCase)
+    val equality = sparkSession.sessionState.conf.resolver
+    val overriddenDataSchema = dataSchema.map { dataField =>
```
--- End diff --

how about

```
val getColName: (StructField => String) =
  if (conf.caseSensitive) _.name else _.name.toLowerCase

val overlappedPartCols = mutable.Map.empty[String, StructField]
for {
  dataField <- dataSchema
  partitionField <- partitionSchema
  if getColName(dataField) == getColName(partitionField)
} overlappedPartCols += getColName(partitionField) -> partitionField

StructType(dataSchema.map(f => overlappedPartCols.getOrElse(getColName(f), f)) ++
  partitionSchema.filterNot(f => overlappedPartCols.contains(getColName(f))))
```
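The merge logic in this suggestion can be exercised without Spark. The following Spark-free sketch substitutes a plain `Field` case class for `StructField` and a hard-coded `caseSensitive` flag for the SQL conf, but the overlap-and-append algorithm is the same:

```scala
import scala.collection.mutable

// Minimal stand-in for StructField (illustrative only).
case class Field(name: String)

val caseSensitive = false
val getColName: Field => String =
  if (caseSensitive) f => f.name else f => f.name.toLowerCase

// A data schema whose "Part" column overlaps (case-insensitively) with a
// partition column, plus a partition-only column "date".
val dataSchema = Seq(Field("id"), Field("Part"), Field("value"))
val partitionSchema = Seq(Field("part"), Field("date"))

// Collect partition columns that collide with data columns by normalized name.
val overlappedPartCols = mutable.Map.empty[String, Field]
for {
  dataField <- dataSchema
  partitionField <- partitionSchema
  if getColName(dataField) == getColName(partitionField)
} overlappedPartCols += getColName(partitionField) -> partitionField

// Overlapping data columns are replaced in place by their partition-side
// definition; non-overlapping partition columns are appended at the end.
val merged =
  dataSchema.map(f => overlappedPartCols.getOrElse(getColName(f), f)) ++
    partitionSchema.filterNot(f => overlappedPartCols.contains(getColName(f)))

// merged.map(_.name) == Seq("id", "part", "value", "date")
```

This preserves the ordering of the data schema (the property the PR is fixing) while still deduplicating partition columns.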
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16290

**[Test build #70181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70181/testReport)** for PR 16290 at commit [`1d0d1d2`](https://github.com/apache/spark/commit/1d0d1d219f392721e9be73e21752100db0ce065f).
[GitHub] spark pull request #16276: [SPARK-18855][CORE] Add RDD flatten function
Github user linbojin commented on a diff in the pull request: https://github.com/apache/spark/pull/16276#discussion_r92550699

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
```
@@ -381,6 +381,14 @@ abstract class RDD[T: ClassTag](
   }

   /**
+   * Return a new RDD by flattening all elements from RDD with traversable elements
+   */
+  def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope {
```
--- End diff --

@srowen I think I figured out a simpler way:

```
def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope {
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => {
    var newIter: Iterator[U] = Iterator.empty
    for (x <- iter) newIter ++= asTraversable(x)
    newIter
  })
}
```
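For intuition, the per-partition transformation above is equivalent to a `flatMap` over the element-to-collection view, which also avoids building a long chain of concatenated iterators. A plain-Scala sketch of the same iterator logic, with no Spark dependency:

```scala
// Spark-free sketch of the per-partition logic behind the proposed RDD.flatten:
// flattening an iterator of traversable elements is flatMap over the
// element-to-collection conversion.
def flattenIter[T, U](iter: Iterator[T])(asTraversable: T => TraversableOnce[U]): Iterator[U] =
  iter.flatMap(asTraversable)

// One simulated partition containing collection-valued elements.
val partition = Iterator(Seq(1, 2), Seq.empty[Int], Seq(3, 4, 5))
val flat = flattenIter(partition)(identity).toList
// flat == List(1, 2, 3, 4, 5)
```

Note that repeatedly doing `newIter ++= asTraversable(x)` as in the suggestion builds a nested chain of `++` iterators, so the `flatMap` formulation is both shorter and cheaper for partitions with many elements.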
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16290

If the default database has already been created in the metastore, any subsequent change of `spark.sql.default.warehouse.dir` can trigger an issue when we create a data source table in the default database (here, we assume Hive support is enabled). Note, we will not hit any issue if we create a Hive serde table in the default database, or create a data source table in a non-default database.

The directory of managed data source tables is created by Hive. When creating a new data source table, the created directory is based on the current value of `hive.metastore.warehouse.dir`. However, the value of the table location in the metastore points to the child directory of the location of the default database. Thus, you will not hit any issue when creating such a table. However, the mismatch causes a problem (because the expected directory does not exist) when we try to select from / insert into this table. This is a bug in the Hive metastore. @dilipbiswal hit this issue very recently.

Below shows the location of these two tables. `t11` is a Hive managed data source table we created in the default database.

```
spark-sql> describe extended t11;
...
Storage(Location: file:/user/hive/warehouse/t11, InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: [serialization.format=1]))
Time taken: 0.105 seconds, Fetched 8 row(s)
```

`t1` is a Hive managed data source table we created in a non-default database.

```
spark-sql> use dilip;
Time taken: 0.028 seconds
spark-sql> describe extended t1;
...
Storage(Location: file:/home/cloudera/mygit/apache/spark/bin/spark-warehouse/dilip.db/t1, InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat, OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat, Serde: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Properties: [serialization.format=1]))
```
[GitHub] spark pull request #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehous...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16290#discussion_r92548697

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
```
@@ -55,14 +55,19 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
         s"is set. Setting ${WAREHOUSE_PATH.key} to the value of " +
         s"hive.metastore.warehouse.dir ('$hiveWarehouseDir').")
       hiveWarehouseDir
-    } else {
+    } else if (sparkContext.conf.contains(WAREHOUSE_PATH.key) &&
+        sparkContext.conf.get(WAREHOUSE_PATH).isDefined) {
```
--- End diff --

Nit: indent is not right.
[GitHub] spark pull request #16286: [SPARK-18849][ML][SPARKR][DOC] vignettes final ch...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16286
[GitHub] spark issue #16249: [SPARK-18828][SPARKR] Refactor scripts for R
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16249 Just FYI, I'm holding off on this till 2.1 - I think it's better to make this change after the release, just to be safe
[GitHub] spark issue #16286: [SPARK-18849][ML][SPARKR][DOC] vignettes final check upd...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16286 Merging this to master, branch-2.1 - to catch the next RC. @mengxr feel free to open a follow-up if you find anything
[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15996#discussion_r92547533 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala --- @@ -363,48 +364,120 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) { throw new AnalysisException("Cannot create hive serde table with saveAsTable API") } -val tableExists = df.sparkSession.sessionState.catalog.tableExists(tableIdent) - -(tableExists, mode) match { - case (true, SaveMode.Ignore) => -// Do nothing - - case (true, SaveMode.ErrorIfExists) => -throw new AnalysisException(s"Table $tableIdent already exists.") - - case _ => -val existingTable = if (tableExists) { - Some(df.sparkSession.sessionState.catalog.getTableMetadata(tableIdent)) -} else { - None -} -val storage = if (tableExists) { - existingTable.get.storage -} else { - DataSource.buildStorageFormatFromOptions(extraOptions.toMap) -} -val tableType = if (tableExists) { - existingTable.get.tableType -} else if (storage.locationUri.isDefined) { - CatalogTableType.EXTERNAL -} else { - CatalogTableType.MANAGED +val catalog = df.sparkSession.sessionState.catalog +val db = tableIdent.database.getOrElse(catalog.getCurrentDatabase) +val tableIdentWithDB = tableIdent.copy(database = Some(db)) +catalog.getTableMetadataOption(tableIdent) match { + // If the table already exists... + case Some(tableMeta) => +mode match { + case SaveMode.Ignore => // Do nothing + + case SaveMode.ErrorIfExists => +throw new AnalysisException(s"Table $tableIdent already exists. You can set SaveMode " + + "to SaveMode.Append to insert data into the table or set SaveMode to " + + "SaveMode.Overwrite to overwrite the existing data.") + + case SaveMode.Append => +// Check if the specified data source match the data source of the existing table. 
+val specifiedProvider = DataSource.lookupDataSource(source) +// TODO: Check that options from the resolved relation match the relation that we are +// inserting into (i.e. using the same compression). + +// Pass a table identifier with database part, so that `lookupRelation` won't get temp +// views unexpectedly. + EliminateSubqueryAliases(catalog.lookupRelation(tableIdentWithDB)) match { + case l @ LogicalRelation(_: InsertableRelation | _: HadoopFsRelation, _, _) => +// check if the file formats match +l.relation match { + case r: HadoopFsRelation if r.fileFormat.getClass != specifiedProvider => +throw new AnalysisException( + s"The file format of the existing table $tableIdent is " + +s"`${r.fileFormat.getClass.getName}`. It doesn't match the specified " + +s"format `$source`") + case _ => +} + case s: SimpleCatalogRelation if DDLUtils.isDatasourceTable(s.metadata) => // OK. + case c: CatalogRelation if c.catalogTable.provider == Some(DDLUtils.HIVE_PROVIDER) => +throw new AnalysisException("Saving data in the Hive serde table " + + s"${c.catalogTable.identifier} is not supported yet. Please use the " + + "insertInto() API as an alternative..") + case o => +throw new AnalysisException(s"Saving data in ${o.toString} is not supported.") +} + +val existingSchema = tableMeta.schema +if (df.logicalPlan.schema.size != existingSchema.size) { + throw new AnalysisException( +s"The column number of the existing schema[$existingSchema] " + + s"doesn't match the data schema[${df.logicalPlan.schema}]") +} + +val specifiedPartCols = partitioningColumns.getOrElse(Nil) +val existingPartCols = tableMeta.partitionColumnNames +if (specifiedPartCols.map(_.toLowerCase) != existingPartCols.map(_.toLowerCase)) { + throw new AnalysisException("The partition columns of the existing table " + +s"$tableIdent are: [${existingPartCols.mkString(", ")}]. It doesn't match the " + +s"specified partition columns: [${specifiedPartCols.mkString(", ")}]")
[GitHub] spark pull request #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16292
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16292 Thank you, @shivaram!
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16292 LGTM. Thanks @dongjoon-hyun - Merging this to master, branch-2.1 and branch-2.0
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16292 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70179/ Test PASSed.
[GitHub] spark pull request #16276: [SPARK-18855][CORE] Add RDD flatten function
Github user linbojin commented on a diff in the pull request: https://github.com/apache/spark/pull/16276#discussion_r92546374 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -381,6 +381,14 @@ abstract class RDD[T: ClassTag]( } /** +* Return a new RDD by flattening all elements from RDD with traversable elements +*/ + def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope { --- End diff -- Hi @srowen, thanks for your suggestion. One way is to reuse Scala's flatMap:

```
def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope {
  val f = (x: T) => asTraversable(x)
  val cleanF = sc.clean(f)
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => iter.flatMap(cleanF))
}
```

Or I can implement the logic myself:

```
def flatten[U: ClassTag](implicit asTraversable: T => TraversableOnce[U]): RDD[U] = withScope {
  new MapPartitionsRDD[U, T](this, (context, pid, iter) => new Iterator[U] {
    private val empty = Iterator.empty
    private var cur: Iterator[U] = empty

    private def nextCur() { cur = asTraversable(iter.next).toIterator }

    def hasNext: Boolean = {
      while (!cur.hasNext) {
        if (!iter.hasNext) return false
        nextCur()
      }
      true
    }

    def next(): U = (if (hasNext) cur else empty).next()
  })
}
```

ref: https://github.com/scala/scala/blob/v2.11.8/src/library/scala/collection/Iterator.scala#L432 Which one do you think is better?
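For intuition, the semantics the proposed `flatten` would give an RDD can be demonstrated on plain Scala collections (a sketch only, not RDD code): it is simply `flatMap` with the identity view over any element type that can be seen as a `TraversableOnce`.

```scala
// Plain-collections sketch of the flatten semantics discussed above.
object FlattenSketch {
  // Same signature shape as the proposed RDD.flatten, on Seq instead of RDD.
  def flatten[T, U](xs: Seq[T])(implicit asTraversable: T => TraversableOnce[U]): Seq[U] =
    xs.flatMap(x => asTraversable(x))

  def main(args: Array[String]): Unit = {
    val nested = Seq(Seq(1, 2), Seq(3), Seq.empty[Int])
    println(flatten(nested)) // List(1, 2, 3)
  }
}
```

The implicit-view parameter is what makes the call fail to compile for element types that are not collection-like, mirroring Scala's own `Iterator.flatten`.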
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16292 Merged build finished. Test PASSed.
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16292 **[Test build #70179 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70179/testReport)** for PR 16292 at commit [`50a5c2e`](https://github.com/apache/spark/commit/50a5c2e51d1bc99f7237ca896ce406caa33cd9bc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70174/ Test FAILed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16263 Merged build finished. Test FAILed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70174/testReport)** for PR 16263 at commit [`003da89`](https://github.com/apache/spark/commit/003da89d22f04cae62de1a3ed38d105d42fe0051). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16240: [SPARK-16792][SQL] Dataset containing a Case Clas...
Github user aray commented on a diff in the pull request: https://github.com/apache/spark/pull/16240#discussion_r92546082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLImplicits.scala --- @@ -100,31 +100,76 @@ abstract class SQLImplicits { // Seqs /** @since 1.6.1 */ - implicit def newIntSeqEncoder: Encoder[Seq[Int]] = ExpressionEncoder() + implicit def newIntSeqEncoder[T <: Seq[Int] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newLongSeqEncoder: Encoder[Seq[Long]] = ExpressionEncoder() + implicit def newLongSeqEncoder[T <: Seq[Long] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newDoubleSeqEncoder: Encoder[Seq[Double]] = ExpressionEncoder() + implicit def newDoubleSeqEncoder[T <: Seq[Double] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newFloatSeqEncoder: Encoder[Seq[Float]] = ExpressionEncoder() + implicit def newFloatSeqEncoder[T <: Seq[Float] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newByteSeqEncoder: Encoder[Seq[Byte]] = ExpressionEncoder() + implicit def newByteSeqEncoder[T <: Seq[Byte] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newShortSeqEncoder: Encoder[Seq[Short]] = ExpressionEncoder() + implicit def newShortSeqEncoder[T <: Seq[Short] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newBooleanSeqEncoder: Encoder[Seq[Boolean]] = ExpressionEncoder() + implicit def newBooleanSeqEncoder[T <: Seq[Boolean] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newStringSeqEncoder: Encoder[Seq[String]] = ExpressionEncoder() + implicit def newStringSeqEncoder[T <: Seq[String] : TypeTag]: Encoder[T] = ExpressionEncoder() /** @since 1.6.1 */ - implicit def newProductSeqEncoder[A <: Product : TypeTag]: Encoder[Seq[A]] = ExpressionEncoder() + implicit def newProductSeqEncoder[A <: Product : TypeTag, T <: Seq[A] : TypeTag]: 
Encoder[T] = +ExpressionEncoder() + + // Seqs with product (List) disambiguation + + /** @since 2.2.0 */ + implicit def newIntSeqWithProductEncoder[T <: Seq[Int] with Product : TypeTag]: Encoder[T] = +newIntSeqEncoder + + /** @since 2.2.0 */ + implicit def newLongSeqWithProductEncoder[T <: Seq[Long] with Product : TypeTag]: Encoder[T] = +newLongSeqEncoder + + /** @since 2.2.0 */ + implicit def newDoubleListEncoder[T <: Seq[Double] with Product : TypeTag]: Encoder[T] = --- End diff -- Should this be `newDoubleSeqWithProductEncoder`?
[GitHub] spark issue #16291: [SPARK-18838][WIP] Use separate executor service for eac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70180/ Test FAILed.
[GitHub] spark pull request #16289: [SPARK-18870] Disallowed Distinct Aggregations on...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/16289#discussion_r92545873 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -95,6 +96,15 @@ object UnsupportedOperationChecker { // Operations that cannot exists anywhere in a streaming plan subPlan match { +case Aggregate(_, aggregateExpressions, child) => + val distinctAggExprs = aggregateExpressions.flatMap { expr => +expr.collect { case ae: AggregateExpression if ae.isDistinct => ae } + } + throwErrorIf( +child.isStreaming && distinctAggExprs.nonEmpty, +"Distinct aggregations are not supported on streaming DataFrames/Datasets, unless" + --- End diff -- you need an extra space here. I'd also recommend pointing users to approximate distinct counts.
[GitHub] spark issue #16291: [SPARK-18838][WIP] Use separate executor service for eac...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291 **[Test build #70180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70180/testReport)** for PR 16291 at commit [`defd536`](https://github.com/apache/spark/commit/defd536bd3a1692156a3bcc82526ffbea01ca702). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class StreamingQueryListenerBus(val sparkListenerBus: LiveListenerBus)`
[GitHub] spark issue #16291: [SPARK-18838][WIP] Use separate executor service for eac...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291 Merged build finished. Test FAILed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16030 @liancheng As for `DataFrameReader.dataSchema()` and `DataFrameReader.partitionSchema()`, did you mean we should add new interfaces there for users to set user-defined data and partition schemas, respectively?
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291 **[Test build #70180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70180/testReport)** for PR 16291 at commit [`defd536`](https://github.com/apache/spark/commit/defd536bd3a1692156a3bcc82526ffbea01ca702).
[GitHub] spark pull request #16288: [SPARK-18869][SQL] Add TreeNode.p that returns Ba...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16288
[GitHub] spark issue #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation for joi...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/16228 @Tagar We can always find extreme cases to which these formulas can't apply. In my opinion, it's better to over-estimate than to under-estimate, since under-estimation can lead to OOM problems, e.g. broadcasting a very large result. If A is a big table and B is a small one, and every A.k has a match in B (a common case for PK and FK), then > cardinality(A) + cardinality(B) - inner_join_cardinality(table_A, table_B) becomes card(B), which is dramatically smaller than the real outer join cardinality. Even worse, it can be negative if all A.k and B.k have the same value, since the inner join part becomes a Cartesian product. This formula, > cardinality = MAX(card(A) + card(B), innerCard(AB)) although it over-estimates sometimes, is still obviously better than the original one in Spark: card(A) * card(B).
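A worked numeric sketch of the formulas being compared in this thread (assumption: the standard uniform-distribution estimate for the inner equi-join, i.e. card(A) * card(B) / max distinct key count):

```scala
// Sketch of the outer-join cardinality estimates from the discussion above.
object JoinCardSketch {
  // Textbook inner equi-join estimate under a uniformity assumption.
  def innerJoinCard(cardA: Long, cardB: Long, maxDistinctK: Long): Long =
    cardA * cardB / maxDistinctK

  // The proposed bound: a full outer join yields at least
  // max(card(A) + card(B), inner join estimate).
  def fullOuterJoinCard(cardA: Long, cardB: Long, inner: Long): Long =
    math.max(cardA + cardB, inner)

  def main(args: Array[String]): Unit = {
    val (cardA, cardB, distinctK) = (1000000L, 100L, 100L)
    val inner = innerJoinCard(cardA, cardB, distinctK)
    println(inner)                                  // 1000000
    // The subtraction-based formula collapses to card(B), a huge underestimate:
    println(cardA + cardB - inner)                  // 100
    println(fullOuterJoinCard(cardA, cardB, inner)) // 1000100
  }
}
```

This reproduces the PK/FK case from the comment: with a million-row A and a 100-row B where every A.k matches, the subtraction formula estimates only card(B) = 100 rows, while the MAX formula stays at or above card(A) + card(B).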
[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r92545598 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -56,33 +58,93 @@ case class CreateArray(children: Seq[Expression]) extends Expression { } override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { -val arrayClass = classOf[GenericArrayData].getName -val values = ctx.freshName("values") -ctx.addMutableState("Object[]", values, s"this.$values = null;") +val array = ctx.freshName("array") -ev.copy(code = s""" - this.$values = new Object[${children.size}];""" + +val et = dataType.elementType +val evals = children.map(e => e.genCode(ctx)) +val isPrimitiveArray = ctx.isPrimitiveType(et) +val primitiveTypeName = if (isPrimitiveArray) ctx.primitiveTypeName(et) else "" +val (preprocess, arrayData, arrayWriter) = + GenArrayData.getCodeArrayData(ctx, et, children.size, isPrimitiveArray, array) + +ev.copy(code = + preprocess + ctx.splitExpressions( ctx.INPUT_ROW, -children.zipWithIndex.map { case (e, i) => - val eval = e.genCode(ctx) - eval.code + s""" -if (${eval.isNull}) { - $values[$i] = null; +evals.zipWithIndex.map { case (eval, i) => + eval.code + +(if (isPrimitiveArray) { + (if (!children(i).nullable) { +s"\n$arrayWriter.write($i, ${eval.value});" + } else { +s""" +if (${eval.isNull}) { + $arrayWriter.setNull$primitiveTypeName($i); +} else { + $arrayWriter.write($i, ${eval.value}); +} + """ + }) } else { - $values[$i] = ${eval.value}; -} - """ + s""" + if (${eval.isNull}) { +$array[$i] = null; + } else { +$array[$i] = ${eval.value}; + } + """ +}) }) + - s""" -final ArrayData ${ev.value} = new $arrayClass($values); -this.$values = null; - """, isNull = "false") + s"\nfinal ArrayData ${ev.value} = $arrayData;\n", + isNull = "false") } override def prettyName: String = "array" } +private [sql] object GenArrayData { + // This function returns Java code pieces based on DataType 
and isPrimitive + // for allocation of ArrayData class + def getCodeArrayData( + ctx: CodegenContext, + dt: DataType, + size: Int, + isPrimitive : Boolean, + array: String): (String, String, String) = { +if (!isPrimitive) { + val arrayClass = classOf[GenericArrayData].getName + ctx.addMutableState("Object[]", array, +s"this.$array = new Object[${size}];") + ("", s"new $arrayClass($array)", null) +} else { + val holder = ctx.freshName("holder") + val arrayWriter = ctx.freshName("createArrayWriter") + val unsafeArrayClass = classOf[UnsafeArrayData].getName + val holderClass = classOf[BufferHolder].getName + val arrayWriterClass = classOf[UnsafeArrayWriter].getName + ctx.addMutableState(unsafeArrayClass, array, "") + ctx.addMutableState(holderClass, holder, "") + ctx.addMutableState(arrayWriterClass, arrayWriter, "") + val baseOffset = Platform.BYTE_ARRAY_OFFSET + val unsafeArraySizeInBytes = +UnsafeArrayData.calculateHeaderPortionInBytes(size) + +ByteArrayMethods.roundNumberOfBytesToNearestWord(dt.defaultSize * size) + + (s""" +$array = new $unsafeArrayClass(); +$holder = new $holderClass($unsafeArraySizeInBytes); +$arrayWriter = new $arrayWriterClass(); --- End diff -- @cloud-fan what do you think?
[GitHub] spark pull request #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16232#discussion_r92545564 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java --- @@ -96,13 +98,35 @@ public UnsafeKVExternalSorter( numElementsForSpillThreshold, canUseRadixSort); } else { - // The array will be used to do in-place sort, which require half of the space to be empty. - assert(map.numKeys() <= map.getArray().size() / 2); + // Because we insert the number of values in the map into `UnsafeInMemorySorter`, if + // the number of values is more than the number of keys, and the array in the map is + // not big enough to do an in-place sort, we must acquire a new array. + // Inserting a record into `UnsafeInMemorySorter` consumes two spaces in the array. + // We must keep half of the array empty. In total there are `map.numValues()` records + // to be inserted. + LongArray sortArray = null; + boolean useAllocatedArray = false; + if (map.numValues() > map.numKeys() && map.numValues() * 2 > map.getArray().size() / 2) { --- End diff -- oh. I added the comment to explain the correct number. So I keep the multiplication and division to make it clear and match the explanation. Do you prefer to simplify it?
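The capacity condition being debated can be modeled standalone (a sketch under the accounting stated in the quoted comment, not the real `UnsafeKVExternalSorter` code): each inserted record consumes two slots in the pointer array, and the in-place sort needs half of the array kept free.

```scala
// Standalone model of the capacity check in the quoted diff above.
object SortArraySketch {
  def needsNewArray(numKeys: Long, numValues: Long, arraySize: Long): Boolean =
    numValues > numKeys &&            // duplicate keys: more records than map entries
      numValues * 2 > arraySize / 2   // 2 slots per record must fit in half the array

  def main(args: Array[String]): Unit = {
    println(needsNewArray(10, 30, 100)) // true: 60 slots needed, only 50 usable
    println(needsNewArray(10, 10, 100)) // false: no duplicate keys
  }
}
```

Keeping the explicit `* 2` and `/ 2` (rather than simplifying to `numValues * 4 > arraySize`) preserves the one-to-one mapping between the code and the two-slots-per-record / half-array-free explanation.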
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16288 lgtm
[GitHub] spark pull request #16280: [SPARK-18856][SQL] non-empty partitioned table sh...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16280
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16292 cc @shivaram @felixcheung
[GitHub] spark issue #16280: [SPARK-18856][SQL] non-empty partitioned table should no...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16280 Merging in master/branch-2.1.
[GitHub] spark issue #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16292 **[Test build #70179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70179/testReport)** for PR 16292 at commit [`50a5c2e`](https://github.com/apache/spark/commit/50a5c2e51d1bc99f7237ca896ce406caa33cd9bc).
[GitHub] spark pull request #16292: [SPARK-18875][SPARKR][DOCS] Fix R API doc generat...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/16292 [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by adding `DESCRIPTION` file ## What changes were proposed in this pull request? Since Apache Spark 1.4.0, the R API documentation page has had a broken link to the `DESCRIPTION` file because the Jekyll plugin script doesn't copy the file. This PR aims to fix that. - Official Latest Website: http://spark.apache.org/docs/latest/api/R/index.html - Apache Spark 2.1.0-rc2: http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-docs/api/R/index.html ## How was this patch tested? Manual. ```bash cd docs SKIP_SCALADOC=1 jekyll build ``` You can merge this pull request into a Git repository by running: $ git pull https://github.com/dongjoon-hyun/spark SPARK-18875 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16292.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16292 commit 50a5c2e51d1bc99f7237ca896ce406caa33cd9bc Author: Dongjoon Hyun Date: 2016-12-15T04:38:21Z [SPARK-18875][SPARKR][DOCS] Fix R API doc generation by adding `DESCRIPTION` file
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #70178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70178/testReport)** for PR 16030 at commit [`dc54b69`](https://github.com/apache/spark/commit/dc54b699c3c93f11eaa93063b3b950e04c614a56).
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16030 @cloud-fan Does the latest fix address what you suggested?
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #70177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70177/testReport)** for PR 16030 at commit [`5b23b89`](https://github.com/apache/spark/commit/5b23b89a4a0b9b16f16c56d03fc226b8eb53c92f).
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70176/testReport)** for PR 16263 at commit [`67882d2`](https://github.com/apache/spark/commit/67882d2d4ebfad955b07cf0020c726ea5a153864).
[GitHub] spark pull request #16030: [SPARK-18108][SQL] Fix a bug to fail partition sc...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/16030#discussion_r92543342 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala --- @@ -969,4 +969,17 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha )) } } + + test("SPARK-18108 Partition discovery fails with explicitly written long partitions") { --- End diff -- yea, thanks. I'm now working on this and I'll update soon.
[GitHub] spark pull request #16030: [SPARK-18108][SQL] Fix a bug to fail partition sc...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16030#discussion_r92542776 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetPartitionDiscoverySuite.scala --- @@ -969,4 +969,17 @@ class ParquetPartitionDiscoverySuite extends QueryTest with ParquetTest with Sha )) } } + + test("SPARK-18108 Partition discovery fails with explicitly written long partitions") { --- End diff -- I think it's not `Partition discovery fails`, but `parquet reader fails`
[GitHub] spark issue #16280: [SPARK-18856][SQL] non-empty partitioned table should no...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16280 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70167/ Test PASSed.
[GitHub] spark issue #16280: [SPARK-18856][SQL] non-empty partitioned table should no...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16280 Merged build finished. Test PASSed.
[GitHub] spark issue #16280: [SPARK-18856][SQL] non-empty partitioned table should no...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16280 **[Test build #70167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70167/testReport)** for PR 16280 at commit [`1628b29`](https://github.com/apache/spark/commit/1628b29f90eb97f0d951c3850518ce6bd9b49d2c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16228: [WIP] [SPARK-17076] [SQL] Cardinality estimation for joi...
Github user Tagar commented on the issue: https://github.com/apache/spark/pull/16228 @wzhfy, it's easier to check the validity of these kinds of expressions when you look at extreme cases. Your formula for full outer join cardinality, > cardinality = MAX(card(A) + card(B), innerCard(AB)) in one of the extreme cases, when set(A) and set(B) are the same set, would give a cardinality twice the actual one. Whereas > full_outer_join_cardinality(table_A, table_B) = cardinality(A) + cardinality(B) - inner_join_cardinality(table_A, table_B) produces the correct result. ps. I find this visualization http://www.radacad.com/wp-content/uploads/2015/07/joins.jpg very helpful. https://en.wikipedia.org/wiki/Inclusion%E2%80%93exclusion_principle |A ∪ B| = |A| + |B| - |A ∩ B| Hope this helps. Thanks!
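The extreme case Tagar describes can be checked directly with inclusion-exclusion. The sketch below is a simplified, hypothetical model (not Spark's estimator): it treats each table as a set of distinct join keys, so the full-outer-join row count is |A| + |B| − |A ∩ B|.

```java
import java.util.HashSet;
import java.util.Set;

public class OuterJoinCardinality {
    // Simplified model of full-outer-join cardinality over distinct join
    // keys (hypothetical helper, not Spark's actual estimation code):
    // |A union B| = |A| + |B| - |A intersect B|.
    static int fullOuterCardinality(Set<Integer> a, Set<Integer> b) {
        Set<Integer> inner = new HashSet<>(a);
        inner.retainAll(b);  // inner-join keys = A intersect B
        return a.size() + b.size() - inner.size();
    }

    public static void main(String[] args) {
        // Extreme case from the comment: A and B are the same set.
        Set<Integer> a = new HashSet<>(Set.of(1, 2, 3));
        Set<Integer> b = new HashSet<>(Set.of(1, 2, 3));
        // MAX(card(A) + card(B), innerCard) would estimate 6 here,
        // twice the true cardinality of 3.
        System.out.println(fullOuterCardinality(a, b));  // 3
    }
}
```

With disjoint key sets the inner-join term is zero and the formula degrades gracefully to |A| + |B|, so it covers both extremes that the MAX formula does not.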
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16288 Merged build finished. Test PASSed.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70164/ Test PASSed.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16288 **[Test build #70164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70164/testReport)** for PR 16288 at commit [`f498b4a`](https://github.com/apache/spark/commit/f498b4a0f2cbc9d3f0c038b73053c8018a6d9984). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16289: [SPARK-18870] Disallowed Distinct Aggregations on Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16289 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70170/ Test FAILed.
[GitHub] spark issue #16289: [SPARK-18870] Disallowed Distinct Aggregations on Stream...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16289 Merged build finished. Test FAILed.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16282 **[Test build #70153 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70153/testReport)** for PR 16282 at commit [`c4e6962`](https://github.com/apache/spark/commit/c4e6962dbf22c2ec7658f95fd1be069628860855). * This patch **fails from timeout after a configured wait of `250m`**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70153/ Test FAILed.
[GitHub] spark issue #16289: [SPARK-18870] Disallowed Distinct Aggregations on Stream...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16289 **[Test build #70170 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70170/testReport)** for PR 16289 at commit [`dd2b2c8`](https://github.com/apache/spark/commit/dd2b2c8c1a11b0b9d9fa70dc146a36c65d94530c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16282: [DO_NOT_MERGE]Try to fix kafka
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16282 Merged build finished. Test FAILed.
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70175/testReport)** for PR 16263 at commit [`a2d071d`](https://github.com/apache/spark/commit/a2d071d6f5ab916f9e39b5ccb50e4fb11cba183d).
[GitHub] spark issue #16263: [SPARK-18281][SQL][PySpark] Consumes the returned local ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16263 **[Test build #70174 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70174/testReport)** for PR 16263 at commit [`003da89`](https://github.com/apache/spark/commit/003da89d22f04cae62de1a3ed38d105d42fe0051).
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16290 Merged build finished. Test FAILed.
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16290 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70171/ Test FAILed.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16288 Merged build finished. Test FAILed.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16288 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70166/ Test FAILed.
[GitHub] spark issue #16290: [SPARK-18817] [SPARKR] [SQL] Set default warehouse dir t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16290 **[Test build #70171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70171/testReport)** for PR 16290 at commit [`2583410`](https://github.com/apache/spark/commit/25834109588e8e545deafb1da162958766a057e2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16288: [SPARK-18869][SQL] Add TreeNode.p that returns BaseType
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16288 **[Test build #70166 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70166/testReport)** for PR 16288 at commit [`62acdb6`](https://github.com/apache/spark/commit/62acdb6ecbf8c645be2cecbd7202819f74438efe). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16281: [SPARK-13127][SQL] Update Parquet to 1.9.0
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16281 Thank you for the review, @rxin. Forking may give more controllability, but it soon becomes invisible (in terms of documentation). According to a recent mail on the Spark dev list, only committers seem to know the latest repository location of the Spark Hive fork. I also want to contribute some work there, but it's difficult for me to find out how. Every Apache project (including Apache Spark) has some bugs in every release. I don't claim Parquet 1.9.0 is bug-free, but the Parquet community exists to fix those, doesn't it? BTW, @rxin and @srowen, to reduce the risk: - Do you want to add more Spark-side test cases here? - Or, would you prefer to skip 1.9.0 and go to 1.10 directly?
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Merged build finished. Test FAILed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16030 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70169/ Test FAILed.
[GitHub] spark issue #16030: [SPARK-18108][SQL] Fix a bug to fail partition schema in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16030 **[Test build #70169 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70169/testReport)** for PR 16030 at commit [`248833d`](https://github.com/apache/spark/commit/248833d0b689e243467c9901d40c3c53a63b284a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291 Merged build finished. Test FAILed.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291 **[Test build #70173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70173/testReport)** for PR 16291 at commit [`ed79578`](https://github.com/apache/spark/commit/ed795783ddc413bedadcadc012b94041c82ac71f). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class StreamingQueryListenerBus(val sparkListenerBus: LiveListenerBus)`
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70173/ Test FAILed.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291

**[Test build #70173 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70173/testReport)** for PR 16291 at commit [`ed79578`](https://github.com/apache/spark/commit/ed795783ddc413bedadcadc012b94041c82ac71f).
[GitHub] spark issue #16287: [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16287

Merged build finished. Test FAILed.
[GitHub] spark issue #16287: [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListener...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16287

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70163/
[GitHub] spark issue #16287: [SPARK-18868][FLAKY-TEST] Deflake StreamingQueryListener...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16287

**[Test build #70163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70163/testReport)** for PR 16287 at commit [`cedaafd`](https://github.com/apache/spark/commit/cedaafdff23ad99e4be06077c5b5cc3bee6ebf07).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/16291

cc @zsxwing. Please note that the PR is incomplete and there are some test failures; I just wanted some initial feedback on the design before investing more time in it.
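The design under discussion, giving each event listener its own executor service so that one slow listener cannot stall event delivery to the others, can be sketched roughly as follows. This is a minimal illustration of the idea, not Spark's actual `LiveListenerBus`/`StreamingQueryListenerBus` implementation; all names here are hypothetical:

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}
import scala.collection.concurrent.TrieMap

// Hypothetical sketch: every registered listener is backed by its own
// single-thread executor, so events are delivered to each listener
// independently and a slow listener only delays its own queue.
trait EventListener {
  def onEvent(event: String): Unit
}

class PerListenerEventBus {
  private val executors = TrieMap.empty[EventListener, ExecutorService]

  def addListener(listener: EventListener): Unit =
    executors.putIfAbsent(listener, Executors.newSingleThreadExecutor())

  def post(event: String): Unit =
    // submit() returns immediately; each listener processes the event
    // on its own dedicated thread.
    executors.foreach { case (listener, exec) =>
      exec.submit(new Runnable {
        override def run(): Unit = listener.onEvent(event)
      })
    }

  def stop(): Unit =
    executors.values.foreach { exec =>
      // shutdown() still lets already-queued events drain.
      exec.shutdown()
      exec.awaitTermination(10, TimeUnit.SECONDS)
    }
}
```

With a single shared dispatch thread, a listener that sleeps in `onEvent` would delay every other listener; here it only backs up its own queue.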
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16291

Merged build finished. Test FAILed.
[GitHub] spark issue #16291: [SPARK-18838] Use separate executor service for each eve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16291

**[Test build #70172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70172/testReport)** for PR 16291 at commit [`b4af82f`](https://github.com/apache/spark/commit/b4af82f0a95487cd099432d17864b3cfac2780bb).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class StreamingQueryListenerBus(val sparkListenerBus: LiveListenerBus)`