[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18266#discussion_r137720936
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -197,7 +197,7 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
* @since 1.4.0
*/
   def jdbc(url: String, table: String, properties: Properties): DataFrame 
= {
-assertNoSpecifiedSchema("jdbc")
+assertJdbcAPISpecifiedDataFrameSchema()
--- End diff --

Users should be able to do it either way. If users specify a schema through both 
the `schema()` API and the `customSchema` option, we should throw an exception.
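
As a rough illustration of the rule described in this comment, the check could look like the sketch below. It is plain Python rather than the actual `DataFrameReader` code; the function name `resolve_read_schema` is hypothetical, and the option key `customSchema` is only the name suggested elsewhere in this review, not a confirmed final name.

```python
from typing import Mapping, Optional

# Hypothetical option key; the review thread had not settled on a final name.
CUSTOM_SCHEMA_OPTION = "customSchema"


def resolve_read_schema(
    api_schema: Optional[str], options: Mapping[str, str]
) -> Optional[str]:
    """Return the single user-supplied schema, or None if none was given."""
    # The programming guide describes JDBC options as case-insensitive,
    # so normalize the keys before looking the option up.
    lowered = {key.lower(): value for key, value in options.items()}
    option_schema = lowered.get(CUSTOM_SCHEMA_OPTION.lower())

    # Supplying the schema through both channels is ambiguous: fail loudly.
    if api_schema is not None and option_schema is not None:
        raise ValueError(
            "Schema specified through both schema() and the "
            f"'{CUSTOM_SCHEMA_OPTION}' option; please use only one."
        )
    return api_schema if api_schema is not None else option_schema
```

Either channel on its own is accepted; only the combination is rejected, which matches the "either way, but not both" behavior requested above.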


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18266#discussion_r137720118
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala ---
@@ -679,6 +679,16 @@ class DataFrameReader private[sql](sparkSession: 
SparkSession) extends Logging {
   }
 
   /**
+   * A convenient function for validate specified column types schema in 
jdbc API.
+   */
+  private def assertJdbcAPISpecifiedDataFrameSchema(): Unit = {
--- End diff --

`assertJdbcAPISpecifiedDataFrameSchema` -> `assertNoSpecifiedSchemaForJDBC`


---




[GitHub] spark issue #19152: [SPARK-21915][ML][PySpark] Model 1 and Model 2 ParamMaps...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19152
  
**[Test build #3914 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3914/testReport)**
 for PR 19152 at commit 
[`a2ccb8a`](https://github.com/apache/spark/commit/a2ccb8a83d13d39c95f0ac1cac1c74dca064).


---




[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18266#discussion_r137719100
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
 ---
@@ -185,6 +185,10 @@ object SQLDataSourceExample {
 connectionProperties.put("password", "password")
 val jdbcDF2 = spark.read
   .jdbc("jdbc:postgresql:dbserver", "schema.tablename", 
connectionProperties)
+// Specifying dataframe column data types on read
--- End diff --

> Specifying the custom data types of columns


---




[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18875
  
**[Test build #81542 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81542/testReport)**
 for PR 18875 at commit 
[`1df28ec`](https://github.com/apache/spark/commit/1df28ec200fd46a001b0fea9597f8b9659ea94f4).


---




[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18266#discussion_r137718217
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala
 ---
@@ -185,6 +185,10 @@ object SQLDataSourceExample {
 connectionProperties.put("password", "password")
 val jdbcDF2 = spark.read
   .jdbc("jdbc:postgresql:dbserver", "schema.tablename", 
connectionProperties)
+// Specifying dataframe column data types on read
+connectionProperties.put("customDataFrameColumnTypes", "id DECIMAL(38, 
0), name STRING")
--- End diff --

`customSchema `


---




[GitHub] spark issue #19145: [spark-21933][yarn] Spark Streaming request more executo...

2017-09-07 Thread klion26
Github user klion26 commented on the issue:

https://github.com/apache/spark/pull/19145
  
@HyukjinKwon @vanzin @srowen @foxish @djvulee @squito Could you please 
help review this PR?


---




[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19131
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19131
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81541/
Test FAILed.


---




[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19131
  
**[Test build #81541 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81541/testReport)**
 for PR 19131 at commit 
[`648ed11`](https://github.com/apache/spark/commit/648ed1165e3913ac919e0dc02608887c9ee6d7c1).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18266#discussion_r137717807
  
--- Diff: examples/src/main/python/sql/datasource.py ---
@@ -177,6 +177,16 @@ def jdbc_dataset_example(spark):
 .jdbc("jdbc:postgresql:dbserver", "schema.tablename",
   properties={"user": "username", "password": "password"})
 
+# Specifying dataframe column data types on read
+jdbcDF3 = spark.read \
+.format("jdbc") \
+.option("url", "jdbc:postgresql:dbserver") \
+.option("dbtable", "schema.tablename") \
+.option("user", "username") \
+.option("password", "password") \
+.option("customDataFrameColumnTypes", "id DECIMAL(38, 0), name 
STRING") \
--- End diff --

`readTableColumnTypes`


---




[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18266#discussion_r137717728
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1334,7 +1334,14 @@ the following case-insensitive options:
 
  The database column data types to use instead of the defaults, when 
creating the table. Data type information should be specified in the same 
format as CREATE TABLE columns syntax (e.g: "name CHAR(64), comments 
VARCHAR(1024)"). The specified types should be valid spark sql data 
types. This option applies only to writing.
 
-
+  
+
+  
+customDataFrameColumnTypes
+
+ The DataFrame column data types to use instead of the defaults when 
reading data from jdbc API. (e.g: "id DECIMAL(38, 0), name 
STRING"). The specified types should be valid spark sql data types. This 
option applies only to reading.
--- End diff --

This is not limited to DataFrame. 

> The customized column types to use for reading data from JDBC connectors. 
For example, "id DECIMAL(38, 0), name STRING". The specified 
types should be valid Spark SQL data types. This option applies only to reading.
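
To make the option's value format concrete, here is a toy parser for strings such as "id DECIMAL(38, 0), name STRING". It is only a sketch: the function name `split_column_types` is hypothetical, it does not validate that the types are real Spark SQL types, and it is not how Spark parses the option (Spark relies on its own SQL parser). The one point it shows is that commas inside parentheses belong to the type, not to the column list.

```python
from typing import List, Tuple


def split_column_types(spec: str) -> List[Tuple[str, str]]:
    """Split 'id DECIMAL(38, 0), name STRING' into (column, type) pairs."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0  # parenthesis nesting level
    for ch in spec:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        # Only a comma outside parentheses separates two column entries.
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))

    pairs: List[Tuple[str, str]] = []
    for part in parts:
        # The first space separates the column name from its type.
        name, _, data_type = part.strip().partition(" ")
        if not name or not data_type.strip():
            raise ValueError(f"Expected '<column> <type>', got: {part!r}")
        pairs.append((name, data_type.strip()))
    return pairs
```

For the example value from the documentation, this yields the pairs `("id", "DECIMAL(38, 0)")` and `("name", "STRING")`.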


---




[GitHub] spark pull request #18266: [SPARK-20427][SQL] Read JDBC table use custom sch...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18266#discussion_r137717379
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1334,7 +1334,14 @@ the following case-insensitive options:
 
  The database column data types to use instead of the defaults, when 
creating the table. Data type information should be specified in the same 
format as CREATE TABLE columns syntax (e.g: "name CHAR(64), comments 
VARCHAR(1024)"). The specified types should be valid spark sql data 
types. This option applies only to writing.
 
-
+  
+
+  
+customDataFrameColumnTypes
--- End diff --

`readTableColumnTypes`


---




[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19131
  
**[Test build #81541 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81541/testReport)**
 for PR 19131 at commit 
[`648ed11`](https://github.com/apache/spark/commit/648ed1165e3913ac919e0dc02608887c9ee6d7c1).


---




[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19131
  
ok to test


---




[GitHub] spark issue #13067: [SPARK-4131] [SQL] Support INSERT OVERWRITE [LOCAL] DIRE...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/13067
  
Since PR https://github.com/apache/spark/pull/18975 will be merged 
soon, could you close this? Thanks!


---




[GitHub] spark issue #19131: [MINOR][SQL]remove unuse import class

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19131
  
retest this please


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19148


---




[GitHub] spark issue #19160: [SPARK-21934][CORE] Expose Shuffle Netty memory usage to...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19160
  
**[Test build #81540 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81540/testReport)**
 for PR 19160 at commit 
[`04a7ec9`](https://github.com/apache/spark/commit/04a7ec944b3273fbe9b9bdb6e217814452a1a12c).


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19148
  
Could you send a PR to 2.2 branch?


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19148
  
Thanks! Merged to master.


---




[GitHub] spark issue #19159: [SPARK-21946][TEST]: fix flaky test: "alter table: renam...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19159
  
LGTM pending Jenkins.


---




[GitHub] spark pull request #19160: [SPARK-21934][CORE] Expose Shuffle Netty memory u...

2017-09-07 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/19160

[SPARK-21934][CORE] Expose Shuffle Netty memory usage to MetricsSystem

## What changes were proposed in this pull request?

This is follow-up work to SPARK-9104 to expose Netty memory usage to the 
MetricsSystem. Currently, the shuffle Netty memory usage of 
`NettyBlockTransferService` is exposed; if external shuffle is used, the 
Netty memory usage of `ExternalShuffleClient` and `ExternalShuffleService` 
is exposed instead. I don't currently expose the Netty memory usage of 
`YarnShuffleService`, because `YarnShuffleService` doesn't have a `MetricsSystem` 
of its own, and it would be better to connect it to Hadoop's MetricsSystem.

## How was this patch tested?

Manually verified in local cluster.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-21934

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19160.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19160


commit 04a7ec944b3273fbe9b9bdb6e217814452a1a12c
Author: jerryshao 
Date:   2017-09-07T13:25:39Z

Expose Shuffle Netty memory usage to MetricsSystem




---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81535/
Test PASSed.


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19148
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19159: [SPARK-21946][TEST]: fix flaky test: "alter table: renam...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19159
  
**[Test build #81539 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81539/testReport)**
 for PR 19159 at commit 
[`7a47891`](https://github.com/apache/spark/commit/7a478918710627f5d0df973f059b07d8cf17bd51).


---




[GitHub] spark issue #13067: [SPARK-4131] [SQL] Support INSERT OVERWRITE [LOCAL] DIRE...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/13067
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19148
  
**[Test build #81535 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81535/testReport)**
 for PR 19148 at commit 
[`62369e3`](https://github.com/apache/spark/commit/62369e3a07bc23d68068e809edf1c43de448740a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class NettyMemoryMetrics implements MetricSet `


---




[GitHub] spark pull request #18995: [SPARK-21787][SQL] Support for pushing down filte...

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/18995


---




[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18956


---




[GitHub] spark pull request #19159: [TEST]: fix flaky test: "alter table: rename cach...

2017-09-07 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/19159

[TEST]: fix flaky test: "alter table: rename cached table" in 
InMemoryCatalogedDDLSuite

## What changes were proposed in this pull request?

This PR fixes the flaky test `InMemoryCatalogedDDLSuite "alter table: rename 
cached table"`.
Since this test validates a distributed DataFrame, the result should be 
checked with `checkAnswer`. The original version used the `df.collect().Seq` 
method, which does not guarantee the order of the elements in the result.
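
The underlying idea, comparing results without depending on row order, can be sketched in plain Python. This is not Spark's `checkAnswer` implementation; `same_rows_any_order` and the tuple-based `Row` alias are hypothetical stand-ins used only to show why an order-sensitive comparison of collected rows can fail intermittently.

```python
from collections import Counter
from typing import Iterable, Tuple

Row = Tuple  # stand-in for a collected row; not Spark's Row class


def same_rows_any_order(actual: Iterable[Row], expected: Iterable[Row]) -> bool:
    """Compare two row collections as multisets, ignoring order."""
    # Counter keeps duplicate rows distinct, so [(1,), (1,)] != [(1,)].
    return Counter(actual) == Counter(expected)


collected = [(2, "b"), (1, "a")]   # rows may arrive in any order
expected = [(1, "a"), (2, "b")]

order_sensitive = collected == expected                     # False here
order_insensitive = same_rows_any_order(collected, expected)  # True
```

An order-sensitive check passes or fails depending on how partitions happen to be returned, which is what makes such a test flaky.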

## How was this patch tested?

Use existing test case

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-21946

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19159


commit 7a478918710627f5d0df973f059b07d8cf17bd51
Author: Kazuaki Ishizaki 
Date:   2017-09-08T06:06:47Z

use checkAnswer to validate results of DataFrame




---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18956
  
Thanks @rxin @gatorsmile 


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18956
  
Thanks! Merged to master.


---




[GitHub] spark pull request #19155: [SPARK-21949][TEST] Tables created in unit tests ...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19155


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19155
  
Thanks! Merged to master.


---




[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19147
  
**[Test build #81538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81538/testReport)**
 for PR 19147 at commit 
[`2f929d8`](https://github.com/apache/spark/commit/2f929d8e0ec01ca7070fc0969e5091dad4ce8350).


---




[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19147
  
Jenkins, retest this please.


---




[GitHub] spark pull request #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.test...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19158


---




[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19158
  
Thanks for reviewing! Merging to master/2.2/2.1/2.0.


---




[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18875
  
**[Test build #81537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81537/testReport)**
 for PR 18875 at commit 
[`36ce961`](https://github.com/apache/spark/commit/36ce9614c078c9c0aca62a672948d8581b43e2ea).


---




[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-09-07 Thread goldmedal
Github user goldmedal commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r137710147
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
 ---
@@ -26,20 +26,50 @@ import 
org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, 
MapData}
 import org.apache.spark.sql.types._
 
+// `JackGenerator` can only be initialized with a `StructType` or a 
`MapType`.
+// Once it is initialized with `StructType`, it can be used to write out a 
struct or an array of
+// struct. Once it is initialized with `MapType`, it can be used to write 
out a map. An exception
+// will be thrown if trying to write out a struct if it is initialized 
with a `MapType`,
+// and vice verse.
--- End diff --

ok.  I'll modify it.


---




[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19158
  
LGTM


---




[GitHub] spark pull request #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs i...

2017-09-07 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19147#discussion_r137707828
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/VectorizedPythonRunner.scala
 ---
@@ -0,0 +1,329 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.python
+
+import java.io.{BufferedInputStream, BufferedOutputStream, 
DataInputStream, DataOutputStream}
+import java.net.Socket
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+
+import org.apache.arrow.vector.VectorSchemaRoot
+import org.apache.arrow.vector.stream.{ArrowStreamReader, 
ArrowStreamWriter}
+
+import org.apache.spark.{SparkEnv, SparkFiles, TaskContext}
+import org.apache.spark.api.python.{ChainedPythonFunctions, 
PythonEvalType, PythonException, PythonRDD, SpecialLengths}
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.arrow.{ArrowUtils, ArrowWriter}
+import org.apache.spark.sql.execution.vectorized.{ArrowColumnVector, 
ColumnarBatch, ColumnVector}
+import org.apache.spark.sql.types._
+import org.apache.spark.util.Utils
+
+/**
+ * Similar to `PythonRunner`, but exchange data with Python worker via 
columnar format.
+ */
+class VectorizedPythonRunner(
+funcs: Seq[ChainedPythonFunctions],
+batchSize: Int,
+bufferSize: Int,
+reuse_worker: Boolean,
+argOffsets: Array[Array[Int]]) extends Logging {
+
+  require(funcs.length == argOffsets.length, "argOffsets should have the 
same length as funcs")
+
+  // All the Python functions should have the same exec, version and 
envvars.
+  private val envVars = funcs.head.funcs.head.envVars
+  private val pythonExec = funcs.head.funcs.head.pythonExec
+  private val pythonVer = funcs.head.funcs.head.pythonVer
+
+  // TODO: support accumulator in multiple UDF
+  private val accumulator = funcs.head.funcs.head.accumulator
+
+  // todo: return column batch?
+  def compute(
--- End diff --

Yes, there is a lot of code duplicated from `PythonRunner` that could be 
refactored.  I'm guessing you did not use the existing code because of the 
Arrow stream format?  While I would love to start using that in Spark, I think 
it would be better to do this at a later time, when the required code can be 
refactored and the Arrow stream format can replace the file format wherever 
we currently use it.

Also, the good part about using the iterator-based file format is that each 
iteration lets Python communicate back an error code and exit 
gracefully.  In my own tests with the streaming format, if an error occurred 
after the stream had started, Spark could lock up in a waiting state.  These 
are the reasons I did not use the streaming format in my implementation.  Would 
this `VectorizedPythonRunner` be able to handle these types of errors?
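
To illustrate the point about per-iteration error reporting, here is a minimal sketch of a framed exchange in which every message carries a status code, so the reader can stop cleanly instead of waiting forever. It is an assumption-laden toy: the status constants, the 8-byte header layout, and the function names are all hypothetical, and this is neither Spark's worker protocol nor the Arrow file or stream format.

```python
import io
import struct

# Hypothetical status codes for this sketch only.
STATUS_OK = 0
STATUS_ERROR = 1
STATUS_END = 2


def write_batch(out, payload: bytes) -> None:
    """Write one data frame: status, length, then the payload bytes."""
    out.write(struct.pack(">ii", STATUS_OK, len(payload)))
    out.write(payload)


def write_error(out, message: str) -> None:
    """Report a failure to the reader instead of silently stopping."""
    data = message.encode("utf-8")
    out.write(struct.pack(">ii", STATUS_ERROR, len(data)))
    out.write(data)


def write_end(out) -> None:
    """Mark the normal end of the exchange."""
    out.write(struct.pack(">ii", STATUS_END, 0))


def read_batches(inp):
    """Yield payloads until END; raise if the writer reported an error."""
    while True:
        header = inp.read(8)
        if len(header) < 8:
            raise EOFError("stream ended without an END marker")
        status, length = struct.unpack(">ii", header)
        body = inp.read(length)
        if status == STATUS_END:
            return
        if status == STATUS_ERROR:
            raise RuntimeError(body.decode("utf-8"))
        yield body
```

Because the status is checked on every frame, a failure that happens after some batches were already sent still reaches the reader as an explicit error, which is the property the comment above attributes to the iterator-based approach.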


---




[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-07 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18875
  
We should add a test suite for `JacksonGenerator`.


---




[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r137706345
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
 ---
@@ -26,20 +26,50 @@ import 
org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, 
MapData}
 import org.apache.spark.sql.types._
 
+// `JackGenerator` can only be initialized with a `StructType` or a 
`MapType`.
+// Once it is initialized with `StructType`, it can be used to write out a 
struct or an array of
+// struct. Once it is initialized with `MapType`, it can be used to write 
out a map. An exception
+// will be thrown if trying to write out a struct if it is initialized 
with a `MapType`,
+// and vice verse.
--- End diff --

For this kind of comment, we use the style like:

/**
 * Code comments...
 *
 */


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137706271
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+Utils.deleteRecursively(wareHousePath)
+Utils.deleteRecursively(tmpDataDir)
+super.afterAll()
+  }
+
+  private def downloadSpark(version: String): Unit = {
+import scala.sys.process._
+
+val url = s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"
+
+Seq("wget", url, "-q", "-P", sparkTestingDir).!
+
+val downloaded = new File(sparkTestingDir, 
s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
+val targetDir = new File(sparkTestingDir, 
s"spark-$version").getCanonicalPath
+
+Seq("mkdir", targetDir).!
+
+Seq("tar", "-xzf", downloaded, "-C", targetDir, 
"--strip-components=1").!
+
+Seq("rm", downloaded).!
+  }
+
+  private def genDataDir(name: String): String = {
+new File(tmpDataDir, name).getCanonicalPath
+  }
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+
+val tempPyFile = File.createTempFile("test", ".py")
+Files.write(tempPyFile.toPath,
+  s"""
+|from pyspark.sql import SparkSession
+|
+|spark = SparkSession.builder.enableHiveSupport().getOrCreate()
+|version_index = spark.conf.get("spark.sql.test.version.index", 
None)
+|
+|spark.sql("create table data_source_tbl_{} using json as select 1 
i".format(version_index))
+|
+|spark.sql("create table hive_compatible_data_source_tbl_" + 
version_index + \\
+|  " using parquet as select 1 i")
+|
+|json_file = "${genDataDir("json_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.json(json_file)
+|spark.sql("create table external_data_source_tbl_" + 
version_index + \\
+|  "(i int) using json options (path 
'{}')".format(json_file))
+|
+|parquet_file = "${genDataDir("parquet_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.parquet(parquet_file)
+|spark.sql("create table 
hive_compatible_external_data_source_tbl_" + version_index + \\
+|  "(i int) using parquet options (path 
'{}')".format(parquet_file))
+|
+|json_file2 = "${genDataDir("json2_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.json(json_file2)
+|spark.sql("create table external_tab

[GitHub] spark issue #18266: [SPARK-20427][SQL] Read JDBC table use custom schema

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18266
  
**[Test build #81536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81536/testReport)**
 for PR 18266 at commit 
[`b38a1a8`](https://github.com/apache/spark/commit/b38a1a8b2d9ffee250b9e8637dc579f2a8f9182d).





[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137704899
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+Utils.deleteRecursively(wareHousePath)
+Utils.deleteRecursively(tmpDataDir)
+super.afterAll()
+  }
+
+  private def downloadSpark(version: String): Unit = {
+import scala.sys.process._
+
+val url = s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"
+
+Seq("wget", url, "-q", "-P", sparkTestingDir).!
+
+val downloaded = new File(sparkTestingDir, 
s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
+val targetDir = new File(sparkTestingDir, 
s"spark-$version").getCanonicalPath
+
+Seq("mkdir", targetDir).!
+
+Seq("tar", "-xzf", downloaded, "-C", targetDir, 
"--strip-components=1").!
+
+Seq("rm", downloaded).!
+  }
+
+  private def genDataDir(name: String): String = {
+new File(tmpDataDir, name).getCanonicalPath
+  }
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+
+val tempPyFile = File.createTempFile("test", ".py")
+Files.write(tempPyFile.toPath,
+  s"""
+|from pyspark.sql import SparkSession
+|
+|spark = SparkSession.builder.enableHiveSupport().getOrCreate()
+|version_index = spark.conf.get("spark.sql.test.version.index", 
None)
+|
+|spark.sql("create table data_source_tbl_{} using json as select 1 
i".format(version_index))
+|
+|spark.sql("create table hive_compatible_data_source_tbl_" + 
version_index + \\
+|  " using parquet as select 1 i")
+|
+|json_file = "${genDataDir("json_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.json(json_file)
+|spark.sql("create table external_data_source_tbl_" + 
version_index + \\
+|  "(i int) using json options (path 
'{}')".format(json_file))
+|
+|parquet_file = "${genDataDir("parquet_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.parquet(parquet_file)
+|spark.sql("create table 
hive_compatible_external_data_source_tbl_" + version_index + \\
+|  "(i int) using parquet options (path 
'{}')".format(parquet_file))
+|
+|json_file2 = "${genDataDir("json2_")}" + str(version_index)
+|spark.range(1, 2).selectExpr("cast(id as int) as 
i").write.json(json_file2)
+|spark.sql("create table external_table_

[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137704429
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+Utils.deleteRecursively(wareHousePath)
+Utils.deleteRecursively(tmpDataDir)
+super.afterAll()
+  }
+
+  private def downloadSpark(version: String): Unit = {
+import scala.sys.process._
+
+val url = s"https://d3kbcqa49mib13.cloudfront.net/spark-$version-bin-hadoop2.7.tgz"
+
+Seq("wget", url, "-q", "-P", sparkTestingDir).!
+
+val downloaded = new File(sparkTestingDir, 
s"spark-$version-bin-hadoop2.7.tgz").getCanonicalPath
+val targetDir = new File(sparkTestingDir, 
s"spark-$version").getCanonicalPath
+
+Seq("mkdir", targetDir).!
+
+Seq("tar", "-xzf", downloaded, "-C", targetDir, 
"--strip-components=1").!
+
+Seq("rm", downloaded).!
+  }
+
+  private def genDataDir(name: String): String = {
+new File(tmpDataDir, name).getCanonicalPath
+  }
+
+  override def beforeAll(): Unit = {
+super.beforeAll()
+
+val tempPyFile = File.createTempFile("test", ".py")
+Files.write(tempPyFile.toPath,
+  s"""
+|from pyspark.sql import SparkSession
+|
+|spark = SparkSession.builder.enableHiveSupport().getOrCreate()
+|version_index = spark.conf.get("spark.sql.test.version.index", None)
+|
+|spark.sql("create table data_source_tbl_{} using json as select 1 i".format(version_index))
--- End diff --

Instead of using only lowercase column names, should we use a mixed-case Hive 
schema for those tables?
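
To make the suggestion concrete, here is a minimal sketch of what a mixed-case variant of the statement could look like. The helper `createTableSql` and the column name `intCol` are hypothetical and are not part of the PR; the table name prefix is copied from the diff above.

```scala
object MixedCaseTableSql {
  // Hypothetical helper: builds the CREATE TABLE statement from the diff,
  // but with a caller-chosen column name instead of the lowercase `i`.
  def createTableSql(versionIndex: Int, column: String = "intCol"): String =
    s"create table data_source_tbl_$versionIndex using json as select 1 $column"

  def main(args: Array[String]): Unit =
    println(createTableSql(0))
}
```

A mixed-case name such as `intCol` would exercise case preservation when a newer Spark version reads the table metadata back, which a lowercase-only name cannot do.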





[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19155
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81533/
Test PASSed.





[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19155
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19155
  
**[Test build #81533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81533/testReport)**
 for PR 19155 at commit 
[`1d38337`](https://github.com/apache/spark/commit/1d38337b22ea8926aeb1db0591285fbb34f902cc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19148
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81532/
Test PASSed.





[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19148
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19148
  
**[Test build #81532 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81532/testReport)**
 for PR 19148 at commit 
[`00cdd0a`](https://github.com/apache/spark/commit/00cdd0a63bdd4f531eb06de8d9651e934f2bb448).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137703092
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

OK. After a clean build, it works now.





[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19158
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81534/
Test PASSed.





[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19158
  
Merged build finished. Test PASSed.





[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19158
  
**[Test build #81534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81534/testReport)**
 for PR 19158 at commit 
[`134bc26`](https://github.com/apache/spark/commit/134bc267a5ef01d9dea3d08cc255facdd8dfc0c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18956
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81531/
Test PASSed.





[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18956
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18956
  
**[Test build #81531 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81531/testReport)**
 for PR 18956 at commit 
[`ecdfb7d`](https://github.com/apache/spark/commit/ecdfb7db34d0d01e357bff0d32b62137ef0ae735).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137700913
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

Let me do a clean build and try again.





[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19148
  
**[Test build #81535 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81535/testReport)**
 for PR 19148 at commit 
[`62369e3`](https://github.com/apache/spark/commit/62369e3a07bc23d68068e809edf1c43de448740a).





[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137700499
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

Did you try a clean clone? I added the Derby dependency to make the test 
work on Jenkins...





[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...

2017-09-07 Thread smurching
Github user smurching commented on the issue:

https://github.com/apache/spark/pull/19107
  
Sorry for the delay, this looks good to me -- thanks @WeichenXu123!





[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137699853
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a spark folder with
+ * expected version under this local directory, e.g. `/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

After removing the added Derby dependency, this test works.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137699802
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/SparkSubmitTestUtils.scala ---
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.sql.Timestamp
+import java.util.Date
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.scalatest.concurrent.Timeouts
+import org.scalatest.exceptions.TestFailedDueToTimeoutException
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.test.ProcessTestUtils.ProcessOutputCapturer
+import org.apache.spark.util.Utils
+
+trait SparkSubmitTestUtils extends SparkFunSuite with Timeouts {
--- End diff --

Nit: let's use `TimeLimits` instead of `Timeouts`. `Timeouts` is deprecated 
now.
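
For reference, a minimal sketch of what the switch could look like in a standalone suite, assuming ScalaTest 3.0.x on the test classpath. `TimeLimitsSketchSuite` is a made-up name, and it extends plain `FunSuite` instead of `SparkFunSuite` so that it does not need Spark to compile.

```scala
import org.scalatest.FunSuite
import org.scalatest.concurrent.{Signaler, ThreadSignaler, TimeLimits}
import org.scalatest.time.SpanSugar._

// Hypothetical suite: mixes in TimeLimits in place of the deprecated Timeouts.
class TimeLimitsSketchSuite extends FunSuite with TimeLimits {

  // TimeLimits does not interrupt the test thread by default; supplying
  // ThreadSignaler restores the interrupt-on-timeout behavior of Timeouts.
  implicit val signaler: Signaler = ThreadSignaler

  test("block completes within the limit") {
    failAfter(2.seconds) {
      Thread.sleep(10) // stands in for the real work being timed
    }
  }
}
```

Whether the suite needs `ThreadSignaler` depends on whether the code under test relies on being interrupted when the limit is exceeded.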


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTe...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19158
  
**[Test build #81534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81534/testReport)**
 for PR 19158 at commit 
[`134bc26`](https://github.com/apache/spark/commit/134bc267a5ef01d9dea3d08cc255facdd8dfc0c8).





[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137699720
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary 
packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a 
spark folder with
+ * expected version under this local directory, e.g. 
`/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

Can you print `org.apache.derby.tools.sysinfo.getVersionString` in 
`IsolatedClientLoader.createClient` to see what your actual Derby version is?


---




[GitHub] spark issue #19147: [WIP][SPARK-21190][SQL][PYTHON] Vectorized UDFs in Pytho...

2017-09-07 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/19147
  
The test failure above should be fixed by #19158.


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137699367
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,194 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary 
packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a 
spark folder with
+ * expected version under this local directory, e.g. 
`/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
--- End diff --

I ran this test locally and encountered a failure like this:

2017-09-07 19:28:07.595 - stderr> Caused by: java.sql.SQLException: 
Database at

/root/repos/spark-1/target/tmp/warehouse-66dad501-c743-4ac3-83cc-51451c6d697a/metastore_db
has an incompatible format with the current version of the software.  
The database was created by or
upgraded by version 10.12.




---




[GitHub] spark pull request #19158: [SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.test...

2017-09-07 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/19158

[SPARK-21950][SQL][PYTHON][TEST] pyspark.sql.tests.SQLTests2 should stop 
SparkContext.

## What changes were proposed in this pull request?

`pyspark.sql.tests.SQLTests2` doesn't stop the newly created Spark context 
in the test, which might affect the following tests.
This PR makes `pyspark.sql.tests.SQLTests2` stop `SparkContext`.
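
A minimal sketch of the cleanup pattern described above, using only the standard library. `FakeContext` is a hypothetical stand-in for `SparkContext` (it is not a pyspark class) and only models the fact that a context left running prevents the next one from being created. The class and test names are illustrative assumptions, not the code in this PR.

```python
import io
import unittest


class FakeContext:
    """Hypothetical stand-in for a SparkContext; tracks only whether one is active."""

    active = None

    def __init__(self):
        # Like a real context, refuse to start while another one is still running.
        if FakeContext.active is not None:
            raise RuntimeError("another context is already running")
        FakeContext.active = self

    def stop(self):
        FakeContext.active = None


class ContextOwningTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.sc = FakeContext()

    @classmethod
    def tearDownClass(cls):
        # Without this call, the context would leak into later test classes.
        cls.sc.stop()

    def test_context_is_active(self):
        self.assertIs(FakeContext.active, self.sc)


def run_suite():
    """Run the test class quietly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ContextOwningTest)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

After `run_suite()` returns, `FakeContext.active` is `None` again, so a later test class could create its own context without hitting the error.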

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark issues/SPARK-21950

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19158.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19158


commit 134bc267a5ef01d9dea3d08cc255facdd8dfc0c8
Author: Takuya UESHIN 
Date:   2017-09-08T02:34:41Z

Make pyspark.sql.tests.SQLTests2 stop SparkContext.




---




[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r137699153
  
--- Diff: 
sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/upward/StatisticsSupport.java
 ---
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources.v2.reader.upward;
+
+/**
+ * A mix in interface for `DataSourceV2Reader`. Users can implement this 
interface to report
+ * statistics to Spark.
+ */
+public interface StatisticsSupport {
--- End diff --

I'd like to put column stats in a separate interface, because we already 
separate basic stats and column stats in `ANALYZE TABLE`.
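
To illustrate the split suggested above, here is a rough sketch in Python rather than the Java of the actual interface. All names (`BasicStatisticsSupport`, `ColumnStatisticsSupport`, `describe`, the reader classes, and the fallback constant) are hypothetical and are not part of the proposed data source v2 API; the point is only that a reader can opt into basic stats without having to provide column stats.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple

# Assumed fallback used when a reader reports nothing; the value is illustrative.
DEFAULT_SIZE_IN_BYTES = 2 ** 63 - 1


class BasicStatisticsSupport(ABC):
    """Mix-in for readers that can report table-level statistics."""

    @abstractmethod
    def size_in_bytes(self) -> int:
        ...


class ColumnStatisticsSupport(ABC):
    """Separate mix-in for readers that can also report per-column statistics."""

    @abstractmethod
    def distinct_counts(self) -> Dict[str, int]:
        ...


class PlainReader:
    """Reports no statistics at all."""


class SizedReader(BasicStatisticsSupport):
    def size_in_bytes(self) -> int:
        return 1024


class AnalyzedReader(BasicStatisticsSupport, ColumnStatisticsSupport):
    def size_in_bytes(self) -> int:
        return 2048

    def distinct_counts(self) -> Dict[str, int]:
        return {"i": 3}


def describe(reader) -> Tuple[int, Optional[Dict[str, int]]]:
    """Collect whatever the reader supports, falling back to defaults otherwise."""
    if isinstance(reader, BasicStatisticsSupport):
        size = reader.size_in_bytes()
    else:
        size = DEFAULT_SIZE_IN_BYTES
    if isinstance(reader, ColumnStatisticsSupport):
        columns = reader.distinct_counts()
    else:
        columns = None
    return size, columns
```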


---




[GitHub] spark pull request #19136: [DO NOT MERGE][SPARK-15689][SQL] data source v2

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19136#discussion_r137698996
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.v2
+
+import org.apache.spark.sql.catalyst.expressions.AttributeReference
+import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, Statistics}
+import org.apache.spark.sql.sources.v2.reader.DataSourceV2Reader
+import org.apache.spark.sql.sources.v2.reader.upward.StatisticsSupport
+
+case class DataSourceV2Relation(
+output: Seq[AttributeReference],
+reader: DataSourceV2Reader) extends LeafNode {
+
+  override def computeStats(): Statistics = reader match {
+case r: StatisticsSupport => Statistics(sizeInBytes = 
r.getStatistics.sizeInBytes())
+case _ => Statistics(sizeInBytes = conf.defaultSizeInBytes)
+  }
+}
+
+object DataSourceV2Relation {
+  def apply(reader: DataSourceV2Reader): DataSourceV2Relation = {
+new DataSourceV2Relation(reader.readSchema().toAttributes, reader)
--- End diff --

In data source V2, we will delegate partition pruning to the data source, 
although we need to do some refactoring to make it happen.

> I was just looking into how the data source should provide partition 
data, or at least fields that are the same for all rows in a `ReadTask`. It 
would be nice to have a way to pass those up instead of materializing them in 
each `UnsafeRow`.

This can be achieved by the columnar reader. Consider a data source with 
a data column `i` and a partition column `j`: the returned columnar batch 
has two column vectors, one for `i` and one for `j`. Column vector `i` is a 
normal one that contains all the values of column `i` within this batch, 
while column vector `j` is a constant vector that contains only a single value.
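
A small sketch of the constant-vector idea, written in plain Python under stated assumptions: `ColumnVector`, `ConstantColumnVector`, and `rows` are hypothetical names for illustration and do not correspond to Spark's columnar classes. The partition value is stored once and served for every row, instead of being copied into each row.

```python
from typing import Any, List, Sequence, Tuple


class ColumnVector:
    """Holds one value per row in the batch."""

    def __init__(self, values: Sequence[Any]) -> None:
        self._values = list(values)

    def get(self, row: int) -> Any:
        return self._values[row]


class ConstantColumnVector:
    """Stores a single value and returns it for every row of the batch."""

    def __init__(self, value: Any, num_rows: int) -> None:
        self._value = value
        self._num_rows = num_rows

    def get(self, row: int) -> Any:
        if not 0 <= row < self._num_rows:
            raise IndexError(row)
        return self._value


def rows(batch: Sequence[Any], num_rows: int) -> List[Tuple[Any, ...]]:
    """Materialize the batch row by row, reading each column vector in turn."""
    return [tuple(col.get(r) for col in batch) for r in range(num_rows)]


i_vector = ColumnVector([10, 20, 30])                      # data column `i`
j_vector = ConstantColumnVector("2017-09-07", num_rows=3)  # partition column `j`
materialized = rows([i_vector, j_vector], 3)
```

Here `materialized` pairs each value of `i` with the same partition value, although `j_vector` holds that value only once.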


---




[GitHub] spark issue #19107: [SPARK-21799][ML] Fix `KMeans` performance regression ca...

2017-09-07 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19107
  
cc @smurching Thanks!


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-07 Thread caneGuy
Github user caneGuy commented on the issue:

https://github.com/apache/spark/pull/19132
  
@vanzin @zsxwing could you help review this? Thanks.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18956
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18956
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81529/
Test PASSed.


---




[GitHub] spark issue #19155: [SPARK-21949][TEST] Tables created in unit tests should ...

2017-09-07 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/19155
  
@dongjoon-hyun Thanks, I have created a JIRA issue.


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18956
  
**[Test build #81529 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81529/testReport)**
 for PR 18956 at commit 
[`d1db7cf`](https://github.com/apache/spark/commit/d1db7cf815d447b195c907fb159ed0a6770c537b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19155: [MINOR][TEST] Tables created in unit tests should be dro...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19155
  
**[Test build #81533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81533/testReport)**
 for PR 19155 at commit 
[`1d38337`](https://github.com/apache/spark/commit/1d38337b22ea8926aeb1db0591285fbb34f902cc).


---




[GitHub] spark issue #19157: [SPARK-20589][Core][Scheduler] Allow limiting task concu...

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19157
  
@dhruve, FYI, AppVeyor CI runs the SparkR tests on Windows only when there 
are changes in R-related code:


https://github.com/apache/spark/blob/75a6d05853fea13f88e3c941b1959b24e4640824/appveyor.yml#L29-L34

The thing is, it looks like when `git merge` is performed (not `rebase`), as in 
https://github.com/apache/spark/commit/8b3830004d69bd5f109fd9846f59583c23a910c7 
, the merge commit usually includes some changes in R and then the CI is 
triggered, which is actually quite moderate. So, I think generally we should 
rebase when there are conflicts.


---




[GitHub] spark issue #19144: [UI][Streaming]Modify the title, 'Records' instead of 'I...

2017-09-07 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/19144
  
@zsxwing Could you help review the code? Thanks.


---




[GitHub] spark pull request #19150: [SPARK-21939][TEST] Use TimeLimits instead of Tim...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19150


---




[GitHub] spark issue #19150: [SPARK-21939][TEST] Use TimeLimits instead of Timeouts

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19150
  
Thank you for reviewing and merging, @jerryshao! Also, thank you for 
reviewing and approving, @HyukjinKwon and @srowen.


---




[GitHub] spark issue #19150: [SPARK-21939][TEST] Use TimeLimits instead of Timeouts

2017-09-07 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19150
  
Merging to master, thanks @dongjoon-hyun .


---




[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19149
  
Other than that, isolating `InferFiltersFromConstraints` looks good to me.


---




[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict between ...

2017-09-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19149
  
Hi, @gatorsmile .
According to the PR description, it's about `PruneFilters`. Do we need a 
test case, given that SPARK-21652 is about `ConstantPropagation`, not 
`PruneFilters`?


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18029
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81530/
Test PASSed.


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18029
  
**[Test build #81530 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81530/testReport)**
 for PR 18029 at commit 
[`cef5cde`](https://github.com/apache/spark/commit/cef5cdece2bd2a7c95e19493c511d602c1b46461).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class KinesisInitialPosition `
  * `sealed trait InitialPosition `
  * `case class AtTimestamp(timestamp: Date) extends InitialPosition `


---




[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18029
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19148: [SPARK-21936][SQL] backward compatibility test framework...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19148
  
**[Test build #81532 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81532/testReport)**
 for PR 19148 at commit 
[`00cdd0a`](https://github.com/apache/spark/commit/00cdd0a63bdd4f531eb06de8d9651e934f2bb448).


---




[GitHub] spark pull request #19148: [SPARK-21936][SQL] backward compatibility test fr...

2017-09-07 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19148#discussion_r137686311
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.io.File
+import java.nio.file.Files
+
+import org.apache.spark.TestUtils
+import org.apache.spark.sql.{QueryTest, Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.test.SQLTestUtils
+import org.apache.spark.util.Utils
+
+/**
+ * Test HiveExternalCatalog backward compatibility.
+ *
+ * Note that, this test suite will automatically download spark binary 
packages of different
+ * versions to a local directory `/tmp/spark-test`. If there is already a 
spark folder with
+ * expected version under this local directory, e.g. 
`/tmp/spark-test/spark-2.0.3`, we will skip the
+ * downloading for this spark version.
+ */
+class HiveExternalCatalogVersionsSuite extends SparkSubmitTestUtils {
+  private val wareHousePath = Utils.createTempDir(namePrefix = "warehouse")
+  private val tmpDataDir = Utils.createTempDir(namePrefix = "test-data")
+  private val sparkTestingDir = "/tmp/spark-test"
+  private val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
+
+  override def afterAll(): Unit = {
+Utils.deleteRecursively(wareHousePath)
--- End diff --

I want to keep the `sparkTestingDir`, so we don't need to download Spark 
again if this Jenkins machine has already run this suite before.
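
The trade-off above (delete per-run data, keep the download cache) can be sketched as follows. This is an illustrative Python sketch, not the Scala code in the suite: `ensure_release`, the download callback, and the directory prefixes are hypothetical, and the "download" is a stub that only records that it was called.

```python
import os
import shutil
import tempfile


def ensure_release(cache_dir, version, download):
    """Return the folder for `version`, calling `download` only on a cache miss."""
    target = os.path.join(cache_dir, "spark-" + version)
    if not os.path.isdir(target):
        os.makedirs(target)
        download(version, target)  # hypothetical callback that would fill `target`
    return target


# Demonstration with temporary directories and a stub that records each download.
cache_dir = tempfile.mkdtemp(prefix="spark-test-")   # stands in for the long-lived folder
scratch_dir = tempfile.mkdtemp(prefix="warehouse-")  # per-run data, removed afterwards
calls = []


def record_download(version, target):
    calls.append(version)


first = ensure_release(cache_dir, "2.0.3", record_download)
second = ensure_release(cache_dir, "2.0.3", record_download)

# Clean up per-run state only; the cache stays so the next run skips the download.
shutil.rmtree(scratch_dir)
```

The second call returns the same folder without invoking the callback, which is the behaviour that keeping the cache directory is meant to preserve across runs.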


---




[GitHub] spark issue #18956: [SPARK-21726][SQL] Check for structural integrity of the...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18956
  
**[Test build #81531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81531/testReport)**
 for PR 18956 at commit 
[`ecdfb7d`](https://github.com/apache/spark/commit/ecdfb7db34d0d01e357bff0d32b62137ef0ae735).


---




[GitHub] spark pull request #17435: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17435#discussion_r137685731
  
--- Diff: python/pyspark/sql/types.py ---
@@ -438,6 +438,11 @@ def toInternal(self, obj):
 def fromInternal(self, obj):
 return self.dataType.fromInternal(obj)
 
+def typeName(self):
+raise TypeError(
+"StructField does not have typename. \
--- End diff --

Little nit: this looks like a typo, typename -> typeName.


---




[GitHub] spark pull request #17435: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17435#discussion_r137685629
  
--- Diff: python/pyspark/sql/types.py ---
@@ -438,6 +438,11 @@ def toInternal(self, obj):
 def fromInternal(self, obj):
 return self.dataType.fromInternal(obj)
 
+def typeName(self):
+raise TypeError(
+"StructField does not have typename. \
+You can use self.dataType.simpleString() instead.")
--- End diff --

I'd remove `self` here and just say something like `use typeName() on its 
type explicitly ...`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18029
  
**[Test build #81530 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81530/testReport)**
 for PR 18029 at commit 
[`cef5cde`](https://github.com/apache/spark/commit/cef5cdece2bd2a7c95e19493c511d602c1b46461).


---




[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-09-07 Thread yssharma
Github user yssharma commented on a diff in the pull request:

https://github.com/apache/spark/pull/18029#discussion_r137684968
  
--- Diff: 
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala
 ---
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.streaming.kinesis
+
+import java.util.Date
+
+import 
com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
+
+/**
+ * Trait for Kinesis's InitialPositionInStream.
+ * This will be overridden by more specific types.
+ */
+sealed trait InitialPosition {
+  val initialPositionInStream: InitialPositionInStream
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.LATEST.
+ */
+case object Latest extends InitialPosition {
+  val instance: InitialPosition = this
+  override val initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.LATEST
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.TRIM_HORIZON.
+ */
+case object TrimHorizon extends InitialPosition {
+  val instance: InitialPosition = this
+  override val initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.TRIM_HORIZON
+}
+
+/**
+ * Case object for Kinesis's InitialPositionInStream.AT_TIMESTAMP.
+ */
+case class AtTimestamp(timestamp: Date) extends InitialPosition {
+  val instance: InitialPosition = this
+  override val initialPositionInStream: InitialPositionInStream
+= InitialPositionInStream.AT_TIMESTAMP
+}
+
+/**
+ * Companion object for InitialPosition that returns
+ * appropriate version of InitialPositionInStream.
+ */
+object InitialPosition {
--- End diff --

I've implemented the functions with this Capital naming, but still feel a 
bit salty about this :)


---




[GitHub] spark pull request #17435: [SPARK-20098][PYSPARK] dataType's typeName fix

2017-09-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/17435#discussion_r137684263
  
--- Diff: python/pyspark/sql/types.py ---
@@ -438,6 +438,11 @@ def toInternal(self, obj):
 def fromInternal(self, obj):
 return self.dataType.fromInternal(obj)
 
+def typeName(self):
+raise TypeError(
--- End diff --

Could we do something like ...

```python
raise TypeError(
"..."
"...")
```
if it doesn't bother you much?
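
For context, a short sketch of why adjacent string literals are usually preferred over a backslash inside the literal. The message text below is a hypothetical wording, not the final message in this PR. With a backslash continuation, the indentation of the next source line becomes part of the string, whereas adjacent literals are joined without it.

```python
def message_with_backslash():
    # The backslash joins the lines inside the literal, so the leading
    # indentation of the second line ends up in the message.
    return "StructField does not have typeName. \
        Use typeName on its type explicitly instead."


def message_with_adjacent_literals():
    # Adjacent literals are concatenated by the compiler; source indentation
    # between them is not part of the resulting string.
    return (
        "StructField does not have typeName. "
        "Use typeName on its type explicitly instead.")
```

Printing both messages side by side shows the run of extra spaces in the first one.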


---



