[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-16 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20659
  
Thanks everyone, we can move the discussion to [SPARK-23710](https://issues.apache.org/jira/browse/SPARK-23710).


---




[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-16 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20659
  
retest this please


---




[GitHub] spark pull request #20785: [SPARK-23640][CORE] Fix hadoop config may overrid...

2018-03-15 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20785#discussion_r174980995
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2434,7 +2434,8 @@ private[spark] object Utils extends Logging {
*/
   def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
 val sparkValue = conf.get(key, default)
-if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
+if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
--- End diff --

`YarnConfiguration` can only be configured with a single `spark.shuffle.service.port` value.
If we get the `spark.shuffle.service.port` value from `SparkConf` instead, we can upgrade the shuffle service gradually, because we can set different values for different applications.


---




[GitHub] spark pull request #20785: [SPARK-23640][CORE] Fix hadoop config may overrid...

2018-03-15 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20785#discussion_r174979402
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2434,7 +2434,8 @@ private[spark] object Utils extends Logging {
*/
   def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
 val sparkValue = conf.get(key, default)
-if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
+if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
--- End diff --

Assuming `--conf spark.shuffle.service.port=7338` is configured, 7338 is displayed on the Environment tab, but 7337 is actually used.
So my idea is to get the value from `SparkConf` if the key starts with `spark.`, except for keys starting with `spark.hadoop.`.
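
For illustration, a minimal sketch of that idea (the names here are hypothetical, not the actual patch, whose final condition is truncated in the diff above):

```scala
object ConfResolutionSketch {
  // Keys in the "spark." namespace (e.g. spark.shuffle.service.port) should be
  // taken from SparkConf; only "spark.hadoop."-prefixed keys are meant to feed
  // the Hadoop/YARN configuration.
  def preferSparkConf(key: String): Boolean =
    key.startsWith("spark.") && !key.startsWith("spark.hadoop.")
}
```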




---




[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-15 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20659
  
retest this please


---




[GitHub] spark issue #20835: [HOT-FIX] Fix SparkOutOfMemoryError: Unable to acquire 2...

2018-03-15 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20835
  
cc @kiszk @hvanhovell 


---




[GitHub] spark pull request #20835: [HOT-FIX] Fix SparkOutOfMemoryError: Unable to ac...

2018-03-15 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20835

[HOT-FIX] Fix SparkOutOfMemoryError: Unable to acquire 262144 bytes of 
memory, got 224631

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23598

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20835.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20835


commit 32df7d6d7b1c1d17460fc6cdb8b17adee8c765fd
Author: Yuming Wang <yumwang@...>
Date:   2018-03-15T15:22:36Z

SparkOutOfMemoryError: Unable to acquire 262144 bytes of memory, got 224631




---




[GitHub] spark pull request #20819: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-14 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/20819


---




[GitHub] spark pull request #20819: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-14 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20819

[DO-NOT-MERGE] Try to update Hive to 2.3.2

## What changes were proposed in this pull request?

Check whether any tests fail.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark hive-2.3.2-jenkins

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20819.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20819


commit 915e68faefcbb5d39ad707937ef95883294c1825
Author: Yuming Wang <wgyumg@...>
Date:   2018-02-22T06:55:18Z

Update Hive to 2.3.2

* Update Hive to 2.3.2

commit a5bb731985488892ef9bc8ec9bbcff2a218d0130
Author: Yuming Wang <yumwang@...>
Date:   2018-02-22T09:52:10Z

replace manifest

commit 80fd8a8aa3c3e42cd99f164f80cfcc6f46e2f247
Author: Yuming Wang <yumwang@...>
Date:   2018-02-22T11:10:10Z

Fix javaunidoc error

commit 1110ede7e43d8638810e4e0f37772443fc91449b
Author: Yuming Wang <yumwang@...>
Date:   2018-03-05T13:34:28Z

Merge remote-tracking branch 'upstream/master' into hive-2.3.x

# Conflicts:
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
#   sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala

commit 566fa59125dff6df2d152290f339e304f5086bbe
Author: Yuming Wang <yumwang@...>
Date:   2018-03-11T02:20:03Z

Fix dependency

commit b35daa0593af1204e3b2833c30ec0374e8c2b530
Author: Yuming Wang <yumwang@...>
Date:   2018-03-13T13:00:16Z

Add org.apache.derby.* to shared class

commit b418909852da0222bfd96a17be7bcefce1311b75
Author: Yuming Wang <yumwang@...>
Date:   2018-03-14T01:07:58Z

ignore backward compatibility

commit f478c89a9095b88c031d5bd86135085fc81044e2
Author: Yuming Wang <yumwang@...>
Date:   2018-03-14T05:33:27Z

Try to fix hive-thriftserver/compile:compileIncremental error




---




[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-14 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20659
  
retest this please


---




[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-14 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20659
  
> [error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java:825: error: cannot find symbol
> [error] String lScratchDir = hiveConf.getVar(ConfVars.LOCALSCRATCHDIR);

But HiveSessionImpl.java#L825 is:
```
FileUtils.forceDelete(sessionLogDir);
```


---




[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2

2018-03-13 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20659
  
retest this please


---




[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI

2018-03-12 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20803
  
```bash
cat <<EOF > test.sql
select '\${a}', '\${b}';
EOF

spark-sql --hiveconf a=avalue --hivevar b=bvalue -f test.sql
```
Should the SQL text shown be `select ${a}, ${b}` or `select avalue, bvalue`?


---




[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI

2018-03-12 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20803
  
1. Double-clicking this SQL statement can show the full SQL statement: https://github.com/apache/spark/pull/6646
2. What if this SQL statement contains `--hiveconf` or `--hivevar`?


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-12 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20785
  
You are right.
In fact, our cluster has two shuffle services, one for production and one 
for development. We configure `spark.shuffle.service.port` to decide which 
shuffle service to use.
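
For context, a sketch (using the port numbers mentioned in this thread) of how one application can point at the development shuffle service while others keep the default:

```scala
import org.apache.spark.SparkConf

// Point this application at the development shuffle service (7338);
// other applications keep the production default (7337).
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.shuffle.service.port", "7338")
```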


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-10 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20785
  
retest this please


---




[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...

2018-03-09 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20785
  
retest this please


---




[GitHub] spark pull request #20785: [SPARK-23640][CORE] Fix hadoop config may overrid...

2018-03-09 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20785

[SPARK-23640][CORE] Fix hadoop config may override spark config

## What changes were proposed in this pull request?

`spark.shuffle.service.port` may be read from the Hadoop configuration here:
https://github.com/apache/spark/blob/9745ec3a61c99be59ef6a9d5eebd445e8af65b7a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L459

Therefore, the client configuration `spark.shuffle.service.port` does not work unless it is set as `spark.hadoop.spark.shuffle.service.port`.

- This configuration does not work:
```
bin/spark-sql --master yarn --conf spark.shuffle.service.port=7338
```
- This configuration works:
```
bin/spark-sql --master yarn --conf spark.hadoop.spark.shuffle.service.port=7338
```

This PR fixes this issue.
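
For reference, a simplified sketch (not the exact Spark source) of the lookup behaviour described above, based on the `Utils.getSparkOrYarnConfig` snippet quoted in the review comments: on YARN, the value found in the Hadoop/YARN configuration, which is populated from `spark.hadoop.*` entries, wins over the `SparkConf` value.

```scala
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.spark.SparkConf

// Simplified sketch of the current resolution: the YARN configuration value,
// if present, overrides whatever was set directly in SparkConf.
def getSparkOrYarnConfigSketch(
    conf: SparkConf,
    hadoopConf: YarnConfiguration,
    key: String,
    default: String): String = {
  val sparkValue = conf.get(key, default)
  if (conf.get("spark.master", null) == "yarn") {
    hadoopConf.get(key, sparkValue) // the Hadoop-side value wins if it exists
  } else {
    sparkValue
  }
}
```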

## How was this patch tested?

It's difficult to carry out unit testing. But I've tested it manually.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23640

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20785.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20785


commit 9745ec3a61c99be59ef6a9d5eebd445e8af65b7a
Author: Yuming Wang <yumwang@...>
Date:   2018-03-09T11:05:29Z

Fix hadoop config may override spark config




---




[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2

2018-03-05 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20659
  
Yes, I'm doing it


---




[GitHub] spark pull request #20659: [DNM] Try to update Hive to 2.3.2

2018-03-05 Thread wangyum
GitHub user wangyum reopened a pull request:

https://github.com/apache/spark/pull/20659

[DNM] Try to update Hive to 2.3.2

## What changes were proposed in this pull request?

Check whether any tests fail.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark hive-2.3.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20659


commit 915e68faefcbb5d39ad707937ef95883294c1825
Author: Yuming Wang <wgyumg@...>
Date:   2018-02-22T06:55:18Z

Update Hive to 2.3.2

* Update Hive to 2.3.2

commit a5bb731985488892ef9bc8ec9bbcff2a218d0130
Author: Yuming Wang <yumwang@...>
Date:   2018-02-22T09:52:10Z

replace manifest

commit 80fd8a8aa3c3e42cd99f164f80cfcc6f46e2f247
Author: Yuming Wang <yumwang@...>
Date:   2018-02-22T11:10:10Z

Fix javaunidoc error

commit 1110ede7e43d8638810e4e0f37772443fc91449b
Author: Yuming Wang <yumwang@...>
Date:   2018-03-05T13:34:28Z

Merge remote-tracking branch 'upstream/master' into hive-2.3.x

# Conflicts:
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
#   sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
#   sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala




---




[GitHub] spark pull request #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-c...

2018-03-05 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20735

[MINOR][YARN] Add disable yarn.nodemanager.vmem-check-enabled option to 
memLimitExceededLogMessage

## What changes were proposed in this pull request?
My Spark application sometimes throws `Container killed by YARN for exceeding memory limits`.
Even after I increased `spark.yarn.executor.memoryOverhead` to 10G, this error still happens. The latest config:
https://user-images.githubusercontent.com/5399861/36975716-f5c548d2-20b5-11e8-95e5-b228d50917b9.png

And the error message:
```
ExecutorLostFailure (executor 121 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 30.7 GB of 30 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
```

This is because [Linux glibc >= 2.10 (RHEL 6) malloc may show excessive virtual memory usage](https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en). So disabling `yarn.nodemanager.vmem-check-enabled` looks like a good option, as [MapR mentioned](https://mapr.com/blog/best-practices-yarn-resource-management).

This PR adds the suggestion to disable `yarn.nodemanager.vmem-check-enabled` to `memLimitExceededLogMessage`.

More details:
https://issues.apache.org/jira/browse/YARN-4714
https://stackoverflow.com/a/31450291
https://stackoverflow.com/a/42091255

After this PR:
https://user-images.githubusercontent.com/5399861/36975949-c8e7bbbe-20b6-11e8-9513-9f903b868d8d.png
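
For illustration, a sketch of the kind of message change this PR proposes (the real method lives in `YarnAllocator`; the exact signature and wording here are assumptions, not the merged patch):

```scala
import java.util.regex.Pattern

// Hypothetical sketch: extend the OOM log message with a hint about
// yarn.nodemanager.vmem-check-enabled and YARN-4714.
def memLimitExceededLogMessage(diagnostics: String, pattern: Pattern): String = {
  val matcher = pattern.matcher(diagnostics)
  val diag = if (matcher.find()) " " + matcher.group() + "." else ""
  s"Container killed by YARN for exceeding memory limits.$diag " +
    "Consider boosting spark.yarn.executor.memoryOverhead or disabling " +
    "yarn.nodemanager.vmem-check-enabled because of YARN-4714."
}
```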

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark YARN-4714

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20735.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20735


commit 3fc05b4f8599ee65e8c4f808aee238d212c22b17
Author: Yuming Wang <yumwang@...>
Date:   2018-03-05T12:38:21Z

Update memLimitExceededLogMessage




---




[GitHub] spark pull request #20734: [SPARK-23510][DOC][FOLLOW-UP] Update spark.sql.hi...

2018-03-05 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20734

[SPARK-23510][DOC][FOLLOW-UP] Update spark.sql.hive.metastore.version

## What changes were proposed in this pull request?
Update `spark.sql.hive.metastore.version` to 2.3.2, same as HiveUtils.scala:

https://github.com/apache/spark/blob/ff1480189b827af0be38605d566a4ee71b4c36f6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L63-L65

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23510-FOLLOW-UP

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20734.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20734


commit 1052c72b94d49541597b8d5561039fe223ce0ddc
Author: Yuming Wang <yumwang@...>
Date:   2018-03-05T12:24:48Z

Update spark.sql.hive.metastore.version




---




[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-28 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/20668


---




[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-27 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20668
  
Yes, if we do not add `alterPartitionsMethod`, [HiveExternalSessionCatalogSuite.alter partitions](https://github.com/apache/spark/blob/d73bb92a72fdd6c1901c070a91b70b845a034e88/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala#L951) will fail, too.


---




[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...

2018-02-26 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20668
  
Otherwise, `SessionCatalogSuite` also needs to be updated:

```scala
Index: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===
--- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala (date 1519557876000)
+++ sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala (date 1519702924000)
@@ -955,8 +955,10 @@
       val oldPart1 = catalog.getPartition(TableIdentifier("tbl2", Some("db2")), part1.spec)
       val oldPart2 = catalog.getPartition(TableIdentifier("tbl2", Some("db2")), part2.spec)
       catalog.alterPartitions(TableIdentifier("tbl2", Some("db2")), Seq(
-        oldPart1.copy(storage = storageFormat.copy(locationUri = Some(newLocation))),
-        oldPart2.copy(storage = storageFormat.copy(locationUri = Some(newLocation)))))
+        oldPart1.copy(parameters = oldPart1.parameters,
+          storage = storageFormat.copy(locationUri = Some(newLocation))),
+        oldPart2.copy(parameters = oldPart2.parameters,
+          storage = storageFormat.copy(locationUri = Some(newLocation)))))
       val newPart1 = catalog.getPartition(TableIdentifier("tbl2", Some("db2")), part1.spec)
       val newPart2 = catalog.getPartition(TableIdentifier("tbl2", Some("db2")), part2.spec)
       assert(newPart1.storage.locationUri == Some(newLocation))
@@ -965,7 +967,9 @@
       assert(oldPart2.storage.locationUri != Some(newLocation))
       // Alter partitions without explicitly specifying database
       catalog.setCurrentDatabase("db2")
-      catalog.alterPartitions(TableIdentifier("tbl2"), Seq(oldPart1, oldPart2))
+      catalog.alterPartitions(TableIdentifier("tbl2"),
+        Seq(oldPart1.copy(parameters = newPart1.parameters),
+          oldPart2.copy(parameters = newPart2.parameters)))
       val newerPart1 = catalog.getPartition(TableIdentifier("tbl2"), part1.spec)
       val newerPart2 = catalog.getPartition(TableIdentifier("tbl2"), part2.spec)
       assert(oldPart1.storage.locationUri == newerPart1.storage.locationUri)
```


---




[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170450895
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -1146,3 +1146,25 @@ private[client] class Shim_v2_1 extends Shim_v2_0 {
     alterPartitionsMethod.invoke(hive, tableName, newParts, environmentContextInAlterTable)
   }
 }
+
+private[client] class Shim_v2_2 extends Shim_v2_1 {
+
+}
+
+private[client] class Shim_v2_3 extends Shim_v2_2 {
+
+  val environmentContext = new EnvironmentContext()
+  environmentContext.putToProperties("DO_NOT_UPDATE_STATS", "true")
+
+  private lazy val alterPartitionsMethod =
+    findMethod(
+      classOf[Hive],
+      "alterPartitions",
+      classOf[String],
+      classOf[JList[Partition]],
+      classOf[EnvironmentContext])
+
+  override def alterPartitions(hive: Hive, tableName: String, newParts: JList[Partition]): Unit = {
--- End diff --

`alterPartitions`:
```
[info] - 2.3: alterPartitions *** FAILED *** (50 milliseconds)
[info]   java.lang.reflect.InvocationTargetException:
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info]   at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.lang.reflect.Method.invoke(Method.java:498)
[info]   at 
org.apache.spark.sql.hive.client.Shim_v2_1.alterPartitions(HiveShim.scala:1144)
[info]   at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterPartitions$1.apply$mcV$sp(HiveClientImpl.scala:616)
[info]   at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterPartitions$1.apply(HiveClientImpl.scala:607)
[info]   at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterPartitions$1.apply(HiveClientImpl.scala:607)
[info]   at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
[info]   at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
[info]   at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
[info]   at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
[info]   at 
org.apache.spark.sql.hive.client.HiveClientImpl.alterPartitions(HiveClientImpl.scala:607)
[info]   at 
org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$55.apply(VersionsSuite.scala:432)
[info]   at 
org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$55.apply(VersionsSuite.scala:424)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info]   at 
org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:103)
[info]   at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1560)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
[info]   at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
[info]   at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
[info]   at 
org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1147)
[info]   at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(Fun

[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170425667
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -202,7 +202,6 @@ private[spark] object HiveUtils extends Logging {
       ConfVars.METASTORE_AGGREGATE_STATS_CACHE_MAX_READER_WAIT -> TimeUnit.MILLISECONDS,
       ConfVars.HIVES_AUTO_PROGRESS_TIMEOUT -> TimeUnit.SECONDS,
       ConfVars.HIVE_LOG_INCREMENTAL_PLAN_PROGRESS_INTERVAL -> TimeUnit.MILLISECONDS,
-      ConfVars.HIVE_STATS_JDBC_TIMEOUT -> TimeUnit.SECONDS,
--- End diff --

Remove `HIVE_STATS_JDBC_TIMEOUT`; for more details see: https://issues.apache.org/jira/browse/HIVE-12164


---




[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170425631
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -202,8 +202,6 @@ private[spark] object HiveUtils extends Logging {
       ConfVars.METASTORE_AGGREGATE_STATS_CACHE_MAX_READER_WAIT -> TimeUnit.MILLISECONDS,
       ConfVars.HIVES_AUTO_PROGRESS_TIMEOUT -> TimeUnit.SECONDS,
       ConfVars.HIVE_LOG_INCREMENTAL_PLAN_PROGRESS_INTERVAL -> TimeUnit.MILLISECONDS,
-      ConfVars.HIVE_STATS_JDBC_TIMEOUT -> TimeUnit.SECONDS,
-      ConfVars.HIVE_STATS_RETRIES_WAIT -> TimeUnit.MILLISECONDS,
--- End diff --

Remove `HIVE_STATS_JDBC_TIMEOUT`; for more details see: https://issues.apache.org/jira/browse/HIVE-12164


---




[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20668#discussion_r170425408
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -1146,3 +1146,25 @@ private[client] class Shim_v2_1 extends Shim_v2_0 {
     alterPartitionsMethod.invoke(hive, tableName, newParts, environmentContextInAlterTable)
   }
 }
+
+private[client] class Shim_v2_2 extends Shim_v2_1 {
+
+}
+
+private[client] class Shim_v2_3 extends Shim_v2_2 {
+
+  val environmentContext = new EnvironmentContext()
+  environmentContext.putToProperties("DO_NOT_UPDATE_STATS", "true")
--- End diff --

Otherwise it will throw a `NumberFormatException`:
```
[info] Cause: java.lang.NumberFormatException: null
[info] at java.lang.Long.parseLong(Long.java:552)
[info] at java.lang.Long.parseLong(Long.java:631)
[info] at org.apache.hadoop.hive.metastore.MetaStoreUtils.isFastStatsSame(MetaStoreUtils.java:315)
[info] at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:605)
[info] at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3837)
```
For more details, see: https://issues.apache.org/jira/browse/HIVE-15653
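
A sketch of how the truncated `Shim_v2_3.alterPartitions` override above would complete the pattern shown for `Shim_v2_1` (the body below is an assumption based on that pattern, not the actual patch):

```scala
// Inside Shim_v2_3: invoke Hive.alterPartitions reflectively, passing the
// EnvironmentContext with DO_NOT_UPDATE_STATS=true so the metastore does not
// try to recompute "fast stats" (and hit the NumberFormatException above).
override def alterPartitions(hive: Hive, tableName: String, newParts: JList[Partition]): Unit = {
  alterPartitionsMethod.invoke(hive, tableName, newParts, environmentContext)
}
```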


---




[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...

2018-02-24 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20668

[SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metastore

## What changes were proposed in this pull request?

Support Hive 2.2 and Hive 2.3 metastore.

## How was this patch tested?

Existing tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23510

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20668


commit 5b1fc0145efbdd427e8b49bd0f840f709d4bc801
Author: Yuming Wang <yumwang@...>
Date:   2018-02-24T16:19:35Z

Support Hive 2.2 and Hive 2.3




---




[GitHub] spark pull request #20659: [DNM] Try to update Hive to 2.3.2

2018-02-22 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/20659


---




[GitHub] spark pull request #20659: [DNM] Try to update Hive to 2.3.2

2018-02-21 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20659

[DNM] Try to update Hive to 2.3.2

## What changes were proposed in this pull request?

Check whether any tests fail.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark hive-2.3.x

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20659.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20659


commit 915e68faefcbb5d39ad707937ef95883294c1825
Author: Yuming Wang <wgyumg@...>
Date:   2018-02-22T06:55:18Z

Update Hive to 2.3.2

* Update Hive to 2.3.2




---




[GitHub] spark issue #20597: [MINOR][TEST] Update from 2.2.0 to 2.2.1 in HiveExternal...

2018-02-13 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20597
  
Jenkins, retest this please.


---




[GitHub] spark issue #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to support a...

2018-02-11 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20504
  
@gatorsmile


---




[GitHub] spark pull request #20557: [SPARK-23364][SQL]'desc table' command in spark-s...

2018-02-09 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20557#discussion_r167225201
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -539,15 +539,15 @@ case class DescribeTableCommand(
         throw new AnalysisException(
           s"DESC PARTITION is not allowed on a temporary view: ${table.identifier}")
       }
-      describeSchema(catalog.lookupRelation(table).schema, result, header = false)
+      describeSchema(catalog.lookupRelation(table).schema, result, header = true)
--- End diff --

Maybe we should add a configuration like `hive.cli.print.header`.


---




[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS

2018-02-06 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20521
  
Thanks @cloud-fan It works.


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-06 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20510
  
jenkins, retest this please.


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20510
  
jenkins, retest this please.


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20510
  
The failure is due to a flaky test suite:
```
org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.NestedSuiteSelector)
```

jenkins, retest this please.



---




[GitHub] spark pull request #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to su...

2018-02-05 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20504#discussion_r166156332
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -250,11 +257,20 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   }
 
   private def listTestCases(): Seq[TestCase] = {
-    listFilesRecursively(new File(inputFilePath)).map { file =>
+    listFilesRecursively(new File(inputFilePath)).flatMap { file =>
       val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
       val absPath = file.getAbsolutePath
       val testCaseName = absPath.stripPrefix(inputFilePath).stripPrefix(File.separator)
-      TestCase(testCaseName, absPath, resultFile)
+      if (testCaseName.contains("typeCoercion")) {
+        TypeCoercionMode.values.map(_.toString).map { mode =>
+          val fileNameWithMode = mode + File.separator + file.getName
+          val newTestCaseName = testCaseName.replace(file.getName, fileNameWithMode)
+          val newResultFile = resultFile.replace(file.getName, fileNameWithMode)
--- End diff --

Thanks @dongjoon-hyun. There are 3 files that are different: `hive/binaryComparison.sql.out`, `hive/decimalPrecision.sql.out` and `hive/promoteStrings.sql.out`, something like this: https://github.com/wangyum/spark/commit/927f6e86712ec4da4d58dbde2859b48520df3194


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20510
  
Retest this please.


---




[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20510
  
Retest this please.


---




[GitHub] spark pull request #20508: [SPARK-23335][SQL] Should not convert to double w...

2018-02-05 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20508#discussion_r165968094
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -327,6 +327,14 @@ object TypeCoercion {
       // Skip nodes who's children have not been resolved yet.
       case e if !e.childrenResolved => e
 
+      // For integralType should not convert to double which will cause precision loss.
+      case a @ BinaryArithmetic(left @ StringType(), right @ IntegralType()) =>
--- End diff --

What will happen if the string value is beyond the long type range?


---




[GitHub] spark pull request #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

2018-02-05 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20510

[SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

## What changes were proposed in this pull request?

This PR upgrades snappy-java to 1.1.4. Release notes:

- Fix a 1% performance regression when snappy is used in PIE executables.
- Improve compression performance by 5%.
- Improve decompression performance by 20%.

More details:

https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-114-2017-05-22

## How was this patch tested?

manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23336

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20510.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20510


commit 1055afc107b0c2357449ae3f23bda089480579d9
Author: Yuming Wang <wgyumg@...>
Date:   2018-02-05T11:59:47Z

Upgrade snappy-java to 1.1.4




---




[GitHub] spark issue #20274: [SPARK-20120][SQL][FOLLOW-UP] Better way to support spar...

2018-02-04 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20274
  
The [pre-built Spark](https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/) contains `kubernetes-model-2.0.0.jar`, but a distribution built with the command below will not contain this jar:
```
./dev/make-distribution.sh --tgz -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn -DskipTests
```


---




[GitHub] spark issue #20274: [SPARK-20120][SQL][FOLLOW-UP] Better way to support spar...

2018-02-04 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20274
  
An interesting discovery:
if `SPARK_HOME/jars` is missing `kubernetes-model-2.0.0.jar`, the silent mode is broken.


---




[GitHub] spark issue #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to support a...

2018-02-04 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20504
  
Thanks @hvanhovell, the major change is in `SQLQueryTestSuite.scala`:
```scala
  private def listTestCases(): Seq[TestCase] = {
-    listFilesRecursively(new File(inputFilePath)).map { file =>
+    listFilesRecursively(new File(inputFilePath)).flatMap { file =>
      val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
      val absPath = file.getAbsolutePath
      val testCaseName = absPath.stripPrefix(inputFilePath).stripPrefix(File.separator)
-      TestCase(testCaseName, absPath, resultFile)
+      if (testCaseName.contains("typeCoercion")) {
+        TypeCoercionMode.values.map(_.toString).map { mode =>
+          val fileNameWithMode = mode + File.separator + file.getName
+          val newTestCaseName = testCaseName.replace(file.getName, fileNameWithMode)
+          val newResultFile = resultFile.replace(file.getName, fileNameWithMode)
+          TestCase(newTestCaseName, absPath, newResultFile, mode)
+        }.toSeq
+      } else {
+        Seq(TestCase(testCaseName, absPath, resultFile))
+      }
  }
```
For a [type coercion input](https://github.com/apache/spark/tree/v2.3.0-rc2/sql/core/src/test/resources/sql-tests/inputs/typeCoercion), two result files are generated, one per mode (`default` and `hive`).

**For example**:
_input_:
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/binaryComparison.sql
_results_:
sql/core/src/test/resources/sql-tests/results/typeCoercion/default/binaryComparison.sql.out
sql/core/src/test/resources/sql-tests/results/typeCoercion/hive/binaryComparison.sql.out




---




[GitHub] spark issue #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to support a...

2018-02-04 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20504
  
After SPARK-21646, `hive/binaryComparison.sql.out`, `hive/decimalPrecision.sql.out` and `hive/promoteStrings.sql.out` look like this: https://github.com/wangyum/spark/commit/927f6e86712ec4da4d58dbde2859b48520df3194


---




[GitHub] spark pull request #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to su...

2018-02-04 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20504

[SPARK-23332][SQL] Update SQLQueryTestSuite to support test hive mode

## What changes were proposed in this pull request?

Update `SQLQueryTestSuite` to support test hive mode.

## How was this patch tested?

unit tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23332

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20504.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20504


commit dd8531dbf55e1cc05eaa4e09d9ff278e02595a9a
Author: Yuming Wang <wgyumg@...>
Date:   2018-02-04T22:40:58Z

Update SQLQueryTestSuite to support test hive mode




---




[GitHub] spark pull request #20498: [SPARK-22036][SQL][FOLLOWUP] Fix decimalArithmeti...

2018-02-04 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20498#discussion_r165844408
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/decimalArithmeticOperations.sql ---
@@ -74,7 +75,8 @@ select 12345678901234567890.0 * 12345678901234567890.0;
 select 1e35 / 0.1;
 
 -- arithmetic operations causing a precision loss return NULL
+select 12345678912345678912345678912.1234567 + 999.12345;
--- End diff --

The result is:
```
-- !query 32
select 12345678912345678912345678912.1234567 + 999.12345
-- !query 32 schema
struct<(CAST(12345678912345678912345678912.1234567 AS DECIMAL(38,7)) + CAST(999.12345 AS DECIMAL(38,7))):decimal(38,7)>
-- !query 32 output
NULL


-- !query 33
select 123456789123456789.1234567890 * 1.123456789123456789
-- !query 33 schema
struct<(CAST(123456789123456789.1234567890 AS DECIMAL(36,18)) * CAST(1.123456789123456789 AS DECIMAL(36,18))):decimal(38,28)>
-- !query 33 output
NULL


-- !query 34
select 12345678912345.123456789123 / 0.00012345678
-- !query 34 schema
struct<(CAST(12345678912345.123456789123 AS DECIMAL(29,15)) / CAST(1.2345678E-8 AS DECIMAL(29,15))):decimal(38,18)>
-- !query 34 output
NULL
```


---




[GitHub] spark pull request #20498: [SPARK-22036][SQL][FOLLOWUP] Fix decimalArithmeti...

2018-02-04 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20498#discussion_r165844386
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/decimalArithmeticOperations.sql ---
@@ -48,8 +48,9 @@ select 12345678901234567890.0 * 12345678901234567890.0;
 select 1e35 / 0.1;
 
 -- arithmetic operations causing a precision loss are truncated
+select 12345678912345678912345678912.1234567 + 999.12345;
--- End diff --

The result is:
```
-- !query 17
select 12345678912345678912345678912.1234567 + 999.12345
-- !query 17 schema
struct<(CAST(12345678912345678912345678912.1234567 AS DECIMAL(38,6)) + CAST(999.12345 AS DECIMAL(38,6))):decimal(38,6)>
-- !query 17 output
10012345678912345678912345678911.246907


-- !query 18
select 123456789123456789.1234567890 * 1.123456789123456789
-- !query 18 schema
struct<(CAST(123456789123456789.1234567890 AS DECIMAL(36,18)) * CAST(1.123456789123456789 AS DECIMAL(36,18))):decimal(38,18)>
-- !query 18 output
138698367904130467.654320988515622621


-- !query 19
select 12345678912345.123456789123 / 0.00012345678
-- !query 19 schema
struct<(CAST(12345678912345.123456789123 AS DECIMAL(29,15)) / CAST(1.2345678E-8 AS DECIMAL(29,15))):decimal(38,9)>
-- !query 19 output
100073899961059796.725866332
```


---




[GitHub] spark issue #20498: [SPARK-22036][SQL][FOLLOWUP] Fix imperfect test

2018-02-03 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20498
  
retest this please.


---




[GitHub] spark issue #20498: [SPARK-22036][SQL][FOLLOWUP] Fix imperfect test

2018-02-03 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20498
  
retest please.


---




[GitHub] spark pull request #20498: [SPARK-22036][SQL][FOLLOWUP] Fix imperfect test

2018-02-03 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20498

[SPARK-22036][SQL][FOLLOWUP] Fix imperfect test

## What changes were proposed in this pull request?

Fix imperfect test

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22036

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20498.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20498


commit 2f532ea3316f8a3058f517b405811f8c8c080309
Author: wangyum <wgyumg@...>
Date:   2018-02-03T11:19:00Z

Fix test error.




---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2018-02-02 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/19788
  
Thanks @yucai, it's a great improvement when there are many output files. The figures below are our comparison:
**Before**:
https://user-images.githubusercontent.com/5399861/35762292-6b5f9f88-08cf-11e8-8aa5-0d10e4282599.png
**After**:
https://user-images.githubusercontent.com/5399861/35762790-9be2e468-08d8-11e8-8403-2f85993eee9d.png



---




[GitHub] spark pull request #20430: [SPARK-23263][SQL] Create table stored as parquet...

2018-01-29 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20430

[SPARK-23263][SQL] Create table stored as parquet should update table size 
if automatic update table size is enabled


## What changes were proposed in this pull request?
How to reproduce:
```sql
bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true

spark-sql> create table test_create_parquet stored as parquet as select 1;
spark-sql> desc extended test_create_parquet;
```
The table statistics will not exist. This PR fixes this issue.
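
A minimal check of the behaviour this PR targets (a sketch assuming a Hive-enabled `SparkSession` named `spark`; the assertion relies on `DESC EXTENDED` reporting a `Statistics` row once table stats exist):

```scala
// With automatic size update enabled, a CTAS table should end up with statistics.
spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", "true")
spark.sql("create table test_create_parquet stored as parquet as select 1")
val hasStats = spark.sql("desc extended test_create_parquet")
  .filter("col_name = 'Statistics'")
  .count() > 0
assert(hasStats, "expected table statistics after CTAS")
```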

## How was this patch tested?

unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-23263

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20430.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20430


commit 08d31c0823e5f6c257b0917362c8e07b04702af2
Author: Yuming Wang <yumwang@...>
Date:   2018-01-30T03:45:20Z

create table stored as parquet should update table size if automatic update 
table size is enabled




---




[GitHub] spark pull request #20303: [SPARK-23128][SQL] A new approach to do adaptive ...

2018-01-26 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20303#discussion_r164070918
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStage.scala ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import scala.concurrent.{ExecutionContext, Future}
+import scala.concurrent.duration.Duration
+
+import org.apache.spark.MapOutputStatistics
+import org.apache.spark.broadcast
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.exchange._
+import org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * In adaptive execution mode, an execution plan is divided into multiple QueryStages. Each
+ * QueryStage is a sub-tree that runs in a single stage.
+ */
+abstract class QueryStage extends UnaryExecNode {
+
+  var child: SparkPlan
+
+  // Ignore this wrapper for canonicalizing.
+  override def doCanonicalize(): SparkPlan = child.canonicalized
+
+  override def output: Seq[Attribute] = child.output
+
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  /**
+   * Execute childStages and wait until all stages are completed. Use a thread pool to avoid
+   * blocking on one child stage.
+   */
+  def executeChildStages(): Unit = {
+    // Handle broadcast stages
+    val broadcastQueryStages: Seq[BroadcastQueryStage] = child.collect {
+      case bqs: BroadcastQueryStageInput => bqs.childStage
+    }
+    val broadcastFutures = broadcastQueryStages.map { queryStage =>
+      Future { queryStage.prepareBroadcast() }(QueryStage.executionContext)
+    }
+
+    // Submit shuffle stages
+    val executionId = sqlContext.sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
+    val shuffleQueryStages: Seq[ShuffleQueryStage] = child.collect {
+      case sqs: ShuffleQueryStageInput => sqs.childStage
+    }
+    val shuffleStageFutures = shuffleQueryStages.map { queryStage =>
+      Future {
+        SQLExecution.withExecutionId(sqlContext.sparkContext, executionId) {
+          queryStage.execute()
+        }
+      }(QueryStage.executionContext)
+    }
+
+    ThreadUtils.awaitResult(
+      Future.sequence(broadcastFutures)(implicitly, QueryStage.executionContext), Duration.Inf)
+    ThreadUtils.awaitResult(
+      Future.sequence(shuffleStageFutures)(implicitly, QueryStage.executionContext), Duration.Inf)
+  }
+
+  /**
+   * Before executing the plan in this query stage, we execute all child stages, optimize the plan
+   * in this stage and determine the reducer number based on the child stages' statistics. Finally
+   * we do a codegen for this query stage and update the UI with the new plan.
+   */
+  def prepareExecuteStage(): Unit = {
+    // 1. Execute childStages
+    executeChildStages()
+    // It is possible to optimize this stage's plan here based on the child stages' statistics.
+
+    // 2. Determine reducer number
+    val queryStageInputs: Seq[ShuffleQueryStageInput] = child.collect {
+      case input: ShuffleQueryStageInput => input
+    }
+    val childMapOutputStatistics = queryStageInputs.map(_.childStage.mapOutputStatistics)
+      .filter(_ != null).toArray
+    if (childMapOutputStatistics.length > 0) {
+      val exchangeCoordinator = new ExchangeCoordinator(
+        conf.targetPostShuffleInputSize,
+        conf.minNumPostS

[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...

2018-01-16 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/18138
  
Hive will throw `ArrayIndexOutOfBoundsException` at runtime: 
https://issues.apache.org/jira/browse/HIVE-17077


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20274: [SPARK-20120][SQL][FOLLOW-UP] Better way to suppo...

2018-01-15 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20274

[SPARK-20120][SQL][FOLLOW-UP] Better way to support spark-sql silent mode.

## What changes were proposed in this pull request?

`spark-sql` silent mode is broken now. It seems `sc.setLogLevel()` is a better way.

## How was this patch tested?

manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-20120-FOLLOW-UP

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20274


commit 83a844b2c221eea4b02cb6816bd2c6017cd1e1fc
Author: Yuming Wang <yumwang@...>
Date:   2018-01-16T01:10:42Z

Better way to support spark-sql silent mode.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20268: [SPARK-19550][BUILD][FOLLOW-UP] Remove MaxPermSize for s...

2018-01-15 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20268
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20268: [SPARK-19550][BUILD][FOLLOW-UP] Remove MaxPermSiz...

2018-01-14 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20268

[SPARK-19550][BUILD][FOLLOW-UP] Remove MaxPermSize for sql module

## What changes were proposed in this pull request?

Remove `MaxPermSize` for `sql` module

## How was this patch tested?

Manually tested.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-19550-MaxPermSize

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20268


commit 67708359ff19d450a3f3e60548df778fb1588515
Author: Yuming Wang <yumwang@...>
Date:   2018-01-15T04:56:45Z

Remove MaxPermSize for sql module




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20248: [SPARK-23058][SQL] Show non printable field delim as uni...

2018-01-13 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20248
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20248: [SPARK-23058][SQL] Show non printable field delim as uni...

2018-01-13 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20248
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20248: [SPARK-23058][SQL] Show non printable field delim...

2018-01-12 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20248#discussion_r161364269
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -1023,7 +1023,12 @@ case class ShowCreateTableCommand(table: 
TableIdentifier) extends RunnableComman
 
   val serdeProps = metadata.storage.properties.map {
 case (key, value) =>
-  s"'${escapeSingleQuotedString(key)}' = 
'${escapeSingleQuotedString(value)}'"
+  val escapedValue = if (value.length == 1 && (value.head < 32 || 
value.head > 126)) {
--- End diff --

I need to copy an external table to another environment, but I lost the create table statement. So I want to recover it with `show create table ...`, but that command cannot show a non-printable field delimiter.
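
For context, a minimal sketch of the escaping idea (the length/range check mirrors the diff above; the octal rendering is only an assumption about the output format, not necessarily the patch's exact behavior):

```scala
// Render a one-character, non-printable SERDE property value in a copyable form.
def escapeNonPrintable(value: String): String = {
  if (value.length == 1 && (value.head < 32 || value.head > 126)) {
    "\\%03o".format(value.head.toInt)  // e.g. "\177" for DEL, "\003" for ETX
  } else {
    value
  }
}
```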


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20248: [SPARK-23058][SQL] Fix non printable field delim issue

2018-01-12 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20248
  
[Non printable characters](http://www.theasciicode.com.ar/):
Screenshot: https://user-images.githubusercontent.com/5399861/34880068-33152b7a-f7ea-11e7-8203-570e61c7a21c.png



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




[GitHub] spark pull request #20248: [SPARK-23058][SQL] Fix non printable field delim ...

2018-01-12 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20248

[SPARK-23058][SQL] Fix non printable field delim issue

## What changes were proposed in this pull request?

Create a table with a non-printable delim like below:
```sql
CREATE EXTERNAL TABLE `t1`(`col1` bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '\177',
  'serialization.format' = '\003'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'file:/tmp/t1';
```

When `show create table t1` :
```sql
CREATE EXTERNAL TABLE `t1`(`col1` bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '',
  'serialization.format' = ''
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'file:/tmp/t1'
TBLPROPERTIES (
  'transient_lastDdlTime' = '1515766958'
)
```

`'\177'` and `'\003'` are not shown correctly. This PR fixes this issue.

## How was this patch tested?

manual tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark non-printable-field-delim

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20248.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20248


commit d44f242955503cf6195c5a47bbf631500406027d
Author: Yuming Wang <yumwang@...>
Date:   2018-01-12T14:28:22Z

Fix non printable field delim issue




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20080: [SPARK-22870][CORE] Dynamic allocation should allow 0 id...

2018-01-09 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20080
  
@srowen @jiangxb1987  I have tested this patch on my cluster.
```
bin/spark-sql --master yarn --conf spark.dynamicAllocation.enabled=true 
--conf spark.shuffle.service.enabled=true --conf 
spark.dynamicAllocation.executorIdleTimeout=0
```

```
18/01/09 05:49:03.452 INFO DAGScheduler: Job 0 finished: processCmd at 
CliDriver.java:376, took 26.196061 s
75000
Time taken: 26.383 seconds, Fetched 1 row(s)
18/01/09 05:49:03.455 INFO SparkSQLCLIDriver: Time taken: 26.383 seconds, 
Fetched 1 row(s)
spark-sql> 18/01/09 05:49:03.479 INFO ExecutorAllocationManager: Request to 
remove executorIds: 972
```

`05:49:03.479 - 05:49:03.455 = 24 ms`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20080: [SPARK-22870][CORE] Dynamic allocation should allow 0 id...

2017-12-25 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20080
  
cc @srowen


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20080: [SPARK-22870][CORE] Dynamic allocation should all...

2017-12-25 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20080

[SPARK-22870][CORE] Dynamic allocation should allow 0 idle time

## What changes were proposed in this pull request?

This PR makes `0` a valid value for `spark.dynamicAllocation.executorIdleTimeout`. 
For details, see the JIRA description: 
https://issues.apache.org/jira/browse/SPARK-22870.
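
A hedged sketch of the kind of check involved (the actual validation lives in `ExecutorAllocationManager`; the helper name and message wording below are assumptions):

```scala
import org.apache.spark.SparkException

// After this change, only negative values are rejected, so an idle timeout of 0
// (remove idle executors immediately) becomes allowed.
def checkIdleTimeout(executorIdleTimeoutS: Long): Unit = {
  if (executorIdleTimeoutS < 0) {
    throw new SparkException(
      "spark.dynamicAllocation.executorIdleTimeout must be >= 0!")
  }
}
```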

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22870

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20080


commit 1dcec41a3c1e2c001b0f9fed92aa6f03b6c47f3a
Author: Yuming Wang <wgyumg@...>
Date:   2017-12-26T01:58:49Z

Dynamic allocation should allow 0 idle time




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20079: [SPARK-22893][SQL][HOTFIX] Fix a error message of Versio...

2017-12-25 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20079
  
LGTM, thanks @dongjoon-hyun 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20067: [SPARK-22894][SQL] DateTimeOperations should acce...

2017-12-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20067#discussion_r158648321
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2760,6 +2760,17 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("SPARK-22894: DateTimeOperations should accept SQL like string 
type") {
+val date = "2017-12-24"
+val str = sql(s"SELECT CAST('$date' as STRING) + interval 2 months 2 
seconds")
--- End diff --

But Spark originally supported this:

https://github.com/apache/spark/blob/bc0848b4c1ab84ccef047363a70fd11df240dbbf/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala#L1083


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20067: [SPARK-22894][SQL] DateTimeOperations should acce...

2017-12-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20067#discussion_r158648226
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2760,6 +2760,17 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("SPARK-22894: DateTimeOperations should accept SQL like string 
type") {
--- End diff --

Yes, I'll add it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20067: [SPARK-22894][SQL] DateTimeOperations should acce...

2017-12-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20067#discussion_r158647982
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala 
---
@@ -2760,6 +2760,17 @@ class SQLQuerySuite extends QueryTest with 
SharedSQLContext {
 }
   }
 
+  test("SPARK-22894: DateTimeOperations should accept SQL like string 
type") {
+val date = "2017-12-24"
+val str = sql(s"SELECT CAST('$date' as STRING) + interval 2 months 2 
seconds")
--- End diff --

Hive doesn't accept string type:
```
hive> SELECT cast('2017-12-24' as date) + interval 2 day;
2017-12-26 00:00:00
hive> SELECT cast('2017-12-24' as timestamp) + interval 2 day;
2017-12-26 00:00:00
hive> SELECT cast('2017-12-24' as string) + interval 2 day;
FAILED: SemanticException Line 0:-1 Wrong arguments '2': No matching method 
for class org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPDTIPlus with 
(string, interval_day_time)
hive>
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20067: [SPARK-22894][SQL] DateTimeOperations should acce...

2017-12-23 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20067

[SPARK-22894][SQL] DateTimeOperations should accept SQL like string type

## What changes were proposed in this pull request?

`DateTimeOperations` accepts [`StringType`](https://github.com/apache/spark/blob/ae998ec2b5548b7028d741da4813473dde1ad81e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L669), but:

```
spark-sql> SELECT '2017-12-24' + interval 2 months 2 seconds;
Error in query: cannot resolve '(CAST('2017-12-24' AS DOUBLE) + interval 2 
months 2 seconds)' due to data type mismatch: differing types in 
'(CAST('2017-12-24' AS DOUBLE) + interval 2 months 2 seconds)' (double and 
calendarinterval).; line 1 pos 7;
'Project [unresolvedalias((cast(2017-12-24 as double) + interval 2 months 2 
seconds), None)]
+- OneRowRelation
spark-sql> 
```

After this PR:
```
spark-sql> SELECT '2017-12-24' + interval 2 months 2 seconds;
2018-02-24 00:00:02
Time taken: 0.2 seconds, Fetched 1 row(s)

```
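
For comparison, the explicit-cast form already resolved before this change; a minimal spark-shell check (assuming a `spark` session is available):

```scala
// Casting the string to timestamp explicitly works even without the coercion change.
spark.sql("SELECT CAST('2017-12-24' AS timestamp) + interval 2 months 2 seconds").show(false)
```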

## How was this patch tested?

unit tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22894

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20067.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20067


commit ae998ec2b5548b7028d741da4813473dde1ad81e
Author: Yuming Wang <wgyumg@...>
Date:   2017-12-23T19:45:31Z

DateTimeOperations should accept SQL like string type




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20066: [SPARK-22833][Examples][FOLLOWUP] Remove whitespa...

2017-12-23 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/20066


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20018: SPARK-22833 [Improvement] in SparkHive Scala Examples

2017-12-23 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20018
  
Thanks @HyukjinKwon


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20066: [SPARK-22833][Examples][FOLLOWUP] Remove whitespace to f...

2017-12-23 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20066
  

https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85343/console


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20066: [SPARK-22833][Examples][FOLLOWUP] Remove whitespa...

2017-12-23 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20066

[SPARK-22833][Examples][FOLLOWUP]  Remove whitespace to fix scalastyle 
checks failed

## What changes were proposed in this pull request?

This is a followup PR for: https://github.com/apache/spark/pull/20018.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22833

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20066.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20066


commit df92f6ce38a14fc248d5830090dfa473371a129c
Author: Yuming Wang <wgyumg@...>
Date:   2017-12-23T15:59:29Z

Remove whitespace




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20064: [SPARK-22893][SQL] Unified the data type mismatch messag...

2017-12-23 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20064
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20064: [SPARK-22893][SQL] Unified the data type mismatch...

2017-12-23 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20064

[SPARK-22893][SQL] Unified the data type mismatch message

## What changes were proposed in this pull request?

We should use `dataType.simpleString` to unify the data type mismatch message:
Before:
```
spark-sql> select cast(1 as binary);
Error in query: cannot resolve 'CAST(1 AS BINARY)' due to data type 
mismatch: cannot cast IntegerType to BinaryType; line 1 pos 7;
```
After:
```
spark-sql> select cast(1 as binary);
Error in query: cannot resolve 'CAST(1 AS BINARY)' due to data type 
mismatch: cannot cast int to binary; line 1 pos 7;
```
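
A minimal sketch of the message construction, assuming a standalone helper rather than the actual cast type-check code:

```scala
import org.apache.spark.sql.types.{BinaryType, DataType, IntegerType}

// Hypothetical helper; the real change lives in the cast's type-check error path.
def castError(from: DataType, to: DataType): String =
  s"cannot cast ${from.simpleString} to ${to.simpleString}"

castError(IntegerType, BinaryType)  // "cannot cast int to binary"
```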

## How was this patch tested?

Exist test.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22893

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20064.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20064


commit 8540b912e8e846f9e0fb8c94a8dcc48a05be6a57
Author: Yuming Wang <wgyumg@...>
Date:   2017-12-23T11:45:45Z

Unified the data type mismatch message.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20061: [SPARK-22890][TEST] Basic tests for DateTimeOpera...

2017-12-22 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20061

[SPARK-22890][TEST] Basic tests for DateTimeOperations

## What changes were proposed in this pull request?

Test coverage for `DateTimeOperations`; this is a sub-task of 
[SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722).

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22890

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20061.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20061


commit 24b50f0c8371af258ed152363a9ba8148b23d2d2
Author: Yuming Wang <wgyumg@...>
Date:   2017-12-23T02:39:39Z

Basic tests for DateTimeOperations

commit e8e4d11a504c4169848baeabbec84af2a1b3e6a8
Author: Yuming Wang <wgyumg@...>
Date:   2017-12-23T02:53:40Z

Append a blank line




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20008
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCo...

2017-12-20 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20008#discussion_r158016336
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala
 ---
@@ -252,7 +252,7 @@ case class SpecifiedWindowFrame(
 case e: Expression if !frameType.inputType.acceptsType(e.dataType) =>
   TypeCheckFailure(
 s"The data type of the $location bound '${e.dataType} does not 
match " +
-  s"the expected data type '${frameType.inputType}'.")
+  s"the expected data type '${frameType.inputType.simpleString}'.")
--- End diff --

Otherwise the result is:
```
cannot resolve 'RANGE BETWEEN CURRENT ROW AND CAST(1 AS STRING) FOLLOWING' 
due to data type mismatch: The data type of the upper bound 'StringType does 
not match the expected data type 
'org.apache.spark.sql.types.TypeCollection@7ff36201'.; line 1 pos 21
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19804: [WIP][SPARK-22573][SQL] Shouldn't inferFilters if...

2017-12-20 Thread wangyum
GitHub user wangyum reopened a pull request:

https://github.com/apache/spark/pull/19804

[WIP][SPARK-22573][SQL] Shouldn't inferFilters if it contains 
SubqueryExpression

## What changes were proposed in this pull request?

Shouldn't inferFilters if it contains SubqueryExpression.

## How was this patch tested?

unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22573

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19804.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19804


commit c2f6a4986fd81d5f9ecacc3fc9a0a6d069a16216
Author: Yuming Wang <wgyumg@...>
Date:   2017-11-23T17:09:18Z

Shouldn't inferFilters if it contains SubqueryExpression

commit 75e6787b644e635e67804abb69025c42b91d9337
Author: Yuming Wang <wgyumg@...>
Date:   2017-12-20T06:42:33Z

Merge remote-tracking branch 'upstream/master' into SPARK-22573

# Conflicts:
#   
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
#   sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

commit edd0434b710a764c7be2ea94242dd7ea5ce6ace7
Author: Yuming Wang <wgyumg@...>
Date:   2017-12-20T08:15:16Z

RewritePredicateSubquery first




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...

2017-12-20 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/20008
  
retest this, please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for FunctionArgum...

2017-12-19 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20008#discussion_r157920327
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/functionArgumentConversion.sql
 ---
@@ -25,7 +25,7 @@ SELECT array(cast(1 as tinyint), cast(1 as float)) FROM t;
 SELECT array(cast(1 as tinyint), cast(1 as double)) FROM t;
 SELECT array(cast(1 as tinyint), cast(1 as decimal(10, 0))) FROM t;
 SELECT array(cast(1 as tinyint), cast(1 as string)) FROM t;
-SELECT array(cast(1 as tinyint), cast('1' as binary)) FROM t;
+SELECT size(array(cast(1 as tinyint), cast('1' as binary))) FROM t;
--- End diff --

Replace `array(cast(1 as tinyint), cast('1' as binary))` with 
`size(array(cast(1 as tinyint), cast('1' as binary)))` to avoid returning a binary value 
inside a collection in the result.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20006: [SPARK-22821][TEST] Basic tests for WidenSetOpera...

2017-12-19 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20006#discussion_r157691771
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/booleanEquality.sql
 ---
@@ -0,0 +1,122 @@
+--
+--   Licensed to the Apache Software Foundation (ASF) under one or more
+--   contributor license agreements.  See the NOTICE file distributed with
+--   this work for additional information regarding copyright ownership.
+--   The ASF licenses this file to You under the Apache License, Version 
2.0
+--   (the "License"); you may not use this file except in compliance with
+--   the License.  You may obtain a copy of the License at
+--
+--  http://www.apache.org/licenses/LICENSE-2.0
+--
+--   Unless required by applicable law or agreed to in writing, software
+--   distributed under the License is distributed on an "AS IS" BASIS,
+--   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+--   See the License for the specific language governing permissions and
+--   limitations under the License.
+--
+
+CREATE TEMPORARY VIEW t AS SELECT 1;
+
+SELECT true = cast(1 as tinyint) FROM t;
+SELECT true = cast(1 as smallint) FROM t;
+SELECT true = cast(1 as int) FROM t;
+SELECT true = cast(1 as bigint) FROM t;
+SELECT true = cast(1 as float) FROM t;
+SELECT true = cast(1 as double) FROM t;
+SELECT true = cast(1 as decimal(10, 0)) FROM t;
+SELECT true = cast(1 as string) FROM t;
+SELECT true = cast('1' as binary) FROM t;
+SELECT true = cast(1 as boolean) FROM t;
+SELECT true = cast('2017-12-11 09:30:00.0' as timestamp) FROM t;
+SELECT true = cast('2017-12-11 09:30:00' as date) FROM t;
--- End diff --

I think we should keep both; we have some usages like the ones below:


https://github.com/apache/spark/blob/6d7ebf2f9fbd043813738005a23c57a77eba6f47/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L486-L489


https://github.com/apache/spark/blob/6d7ebf2f9fbd043813738005a23c57a77eba6f47/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L134-L135


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for FunctionArgum...

2017-12-18 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20008#discussion_r157651436
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/decimalPrecision.sql
 ---
@@ -0,0 +1,6883 @@
+--
+--   Licensed to the Apache Software Foundation (ASF) under one or more
+--   contributor license agreements.  See the NOTICE file distributed with
+--   this work for additional information regarding copyright ownership.
+--   The ASF licenses this file to You under the Apache License, Version 
2.0
+--   (the "License"); you may not use this file except in compliance with
+--   the License.  You may obtain a copy of the License at
+--
+--  http://www.apache.org/licenses/LICENSE-2.0
+--
+--   Unless required by applicable law or agreed to in writing, software
+--   distributed under the License is distributed on an "AS IS" BASIS,
+--   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+--   See the License for the specific language governing permissions and
+--   limitations under the License.
+--
+
+CREATE TEMPORARY VIEW t AS SELECT 1;
+
+SELECT cast(1 as tinyint) + cast(1 as decimal(1, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(3, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(4, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(5, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(6, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(10, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(11, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(20, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(21, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(38, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(39, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(1, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(2, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(3, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(4, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(5, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(6, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(10, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(11, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(20, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(21, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(38, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(39, 1)) FROM t;
--- End diff --

How about using only these 4 decimal types: `DECIMAL(3, 0)`, `DECIMAL(5, 0)`, 
`DECIMAL(10, 0)` and `DECIMAL(20, 0)`?

https://github.com/apache/spark/blob/00d176d2fe7bbdf55cb3146a9cb04ca99b1858b7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala#L54-L57
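
The reasoning behind picking those four widths, as a hedged sketch mirroring the linked `DecimalPrecision` mapping (the `integralToDecimal` name is only illustrative):

```scala
import org.apache.spark.sql.types._

// Each integral type promotes to the smallest decimal that can hold all of its
// values, so these four precisions cover tinyint/smallint/int/bigint.
val integralToDecimal: Map[DataType, DecimalType] = Map(
  ByteType    -> DecimalType(3, 0),
  ShortType   -> DecimalType(5, 0),
  IntegerType -> DecimalType(10, 0),
  LongType    -> DecimalType(20, 0)
)
```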


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for FunctionArgum...

2017-12-18 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20008

[SPARK-22822][TEST] Basic tests for FunctionArgumentConversion and 
DecimalPrecision

## What changes were proposed in this pull request?

Test coverage for `FunctionArgumentConversion` and `DecimalPrecision`; this 
is a sub-task of 
[SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722).

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22822

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20008.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20008


commit 05acae313008220aaccec47c687a764f7e81bd02
Author: Yuming Wang <wgy...@gmail.com>
Date:   2017-12-18T11:02:58Z

Basic tests for FunctionArgumentConversion and DecimalPrecision




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20006: [SPARK-22821][TEST] Basic tests for WidenSetOpera...

2017-12-17 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20006

[SPARK-22821][TEST] Basic tests for WidenSetOperationTypes, 
BooleanEquality, StackCoercion and Division

## What changes were proposed in this pull request?

Test coverage for `WidenSetOperationTypes`, `BooleanEquality`, 
`StackCoercion` and `Division`; this is a sub-task of 
[SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722).

## How was this patch tested?
N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22821

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20006.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20006


commit 7ee9aeccfc5adc6107c33401ef0c5212a65d9577
Author: Yuming Wang <wgy...@gmail.com>
Date:   2017-12-18T04:43:55Z

Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and 
Division




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19714: [SPARK-22489][SQL] Shouldn't change broadcast join build...

2017-12-17 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/19714
  
Can we backport this to branch-2.2?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20001: [SPARK-22762][TEST] Basic tests for PromoteString...

2017-12-16 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/20001

[SPARK-22762][TEST] Basic tests for PromoteStrings and InConversion

## What changes were proposed in this pull request?

Test coverage for `PromoteStrings` and `InConversion`; this is a sub-task 
of [SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722).

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22816

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20001.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20001


commit 604f16f872f1cf3e008435577d4a4768711c63ed
Author: Yuming Wang <wgy...@gmail.com>
Date:   2017-12-16T12:48:44Z

Basic tests for PromoteStrings and InConversion




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19949: [SPARK-22762][TEST] Basic tests for IfCoercion and CaseW...

2017-12-15 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/19949
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19949: [SPARK-22762][TEST] Basic tests for IfCoercion an...

2017-12-14 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/19949#discussion_r157106530
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/caseWhenCoercion.sql
 ---
@@ -0,0 +1,200 @@
+--
+--   Licensed to the Apache Software Foundation (ASF) under one or more
+--   contributor license agreements.  See the NOTICE file distributed with
+--   this work for additional information regarding copyright ownership.
+--   The ASF licenses this file to You under the Apache License, Version 
2.0
+--   (the "License"); you may not use this file except in compliance with
+--   the License.  You may obtain a copy of the License at
+--
+--  http://www.apache.org/licenses/LICENSE-2.0
+--
+--   Unless required by applicable law or agreed to in writing, software
+--   distributed under the License is distributed on an "AS IS" BASIS,
+--   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+--   See the License for the specific language governing permissions and
+--   limitations under the License.
+--
+
+CREATE TEMPORARY VIEW t AS SELECT 1;
+
+SELECT CASE WHEN true THEN cast(1 as tinyint) ELSE cast(2 as tinyint) END 
FROM t;
--- End diff --

@gatorsmile Two questions:
1. Hive doesn't have the `short` type, so can we remove the `short` type 
here?
2. Hive can't execute `CREATE TEMPORARY VIEW ...`, but it can execute `CREATE 
TEMPORARY TABLE ...`. Should we add this feature to Spark?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19949: [SPARK-22762][TEST] Basic tests for IfCoercion and CaseW...

2017-12-12 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/19949
  
@HyukjinKwon see: https://issues.apache.org/jira/browse/SPARK-22722


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19949: [SPARK-22762][TEST] Basic tests for IfCoercion an...

2017-12-12 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/19949

[SPARK-22762][TEST] Basic tests for IfCoercion and CaseWhenCoercion

## What changes were proposed in this pull request?

Basic tests for IfCoercion and CaseWhenCoercion

## How was this patch tested?

N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-22762

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19949


commit f9da9103eacdad8a7d544e9d17b8a54d6b7e01c5
Author: Yuming Wang <wgy...@gmail.com>
Date:   2017-12-12T07:59:11Z

Basic tests for IfCoercion and CaseWhenCoercion




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19918: [SPARK-22726] [TEST] Basic tests for Binary Compa...

2017-12-09 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/19918#discussion_r155939348
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/binaryComparison.sql
 ---
@@ -0,0 +1,287 @@
+--
+--   Licensed to the Apache Software Foundation (ASF) under one or more
+--   contributor license agreements.  See the NOTICE file distributed with
+--   this work for additional information regarding copyright ownership.
+--   The ASF licenses this file to You under the Apache License, Version 
2.0
+--   (the "License"); you may not use this file except in compliance with
+--   the License.  You may obtain a copy of the License at
+--
+--  http://www.apache.org/licenses/LICENSE-2.0
+--
+--   Unless required by applicable law or agreed to in writing, software
+--   distributed under the License is distributed on an "AS IS" BASIS,
+--   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
+--   See the License for the specific language governing permissions and
+--   limitations under the License.
+--
+
+-- Binary Comparison
+
+CREATE TEMPORARY VIEW t AS SELECT 1;
+
+SELECT cast(1 as binary) = '1' FROM t;
--- End diff --

It seems the binary comparison tests do not cover 
[<=>](https://github.com/apache/spark/blob/ced6ccf0d6f362e299f270ed2a474f2e14f845da/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L594).
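
A possible extra case for the coverage file, if the null-safe operator should be exercised too (hedged suggestion, not part of the PR; assumes the view `t` from the file and a `spark` session):

```scala
spark.sql("SELECT cast(1 as binary) <=> '1' FROM t").show()
```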


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive

2017-12-09 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/19932#discussion_r155935430
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -213,6 +213,29 @@ class StatisticsSuite extends 
StatisticsCollectionTestBase with TestHiveSingleto
 }
   }
 
+  test("SPARK- - read Hive's statistics for partition") {
--- End diff --

SPARK- -> SPARK-22745?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19841: [SPARK-22642][SQL] the createdTempDir will not be...

2017-12-01 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/19841#discussion_r154480032
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -104,147 +105,153 @@ case class InsertIntoHiveTable(
 val partitionColumns = 
fileSinkConf.getTableInfo.getProperties.getProperty("partition_columns")
 val partitionColumnNames = 
Option(partitionColumns).map(_.split("/")).getOrElse(Array.empty)
 
-// By this time, the partition map must match the table's partition 
columns
-if (partitionColumnNames.toSet != partition.keySet) {
-  throw new SparkException(
-s"""Requested partitioning does not match the 
${table.identifier.table} table:
-   |Requested partitions: ${partition.keys.mkString(",")}
-   |Table partitions: 
${table.partitionColumnNames.mkString(",")}""".stripMargin)
-}
-
-// Validate partition spec if there exist any dynamic partitions
-if (numDynamicPartitions > 0) {
-  // Report error if dynamic partitioning is not enabled
-  if (!hadoopConf.get("hive.exec.dynamic.partition", 
"true").toBoolean) {
-throw new 
SparkException(ErrorMsg.DYNAMIC_PARTITION_DISABLED.getMsg)
+def processInsert = {
+  // By this time, the partition map must match the table's partition 
columns
+  if (partitionColumnNames.toSet != partition.keySet) {
+throw new SparkException(
+  s"""Requested partitioning does not match the 
${table.identifier.table} table:
+ |Requested partitions: ${partition.keys.mkString(",")}
+ |Table partitions: 
${table.partitionColumnNames.mkString(",")}""".stripMargin)
   }
 
-  // Report error if dynamic partition strict mode is on but no static 
partition is found
-  if (numStaticPartitions == 0 &&
-hadoopConf.get("hive.exec.dynamic.partition.mode", 
"strict").equalsIgnoreCase("strict")) {
-throw new 
SparkException(ErrorMsg.DYNAMIC_PARTITION_STRICT_MODE.getMsg)
-  }
+  // Validate partition spec if there exist any dynamic partitions
+  if (numDynamicPartitions > 0) {
+// Report error if dynamic partitioning is not enabled
+if (!hadoopConf.get("hive.exec.dynamic.partition", 
"true").toBoolean) {
+  throw new 
SparkException(ErrorMsg.DYNAMIC_PARTITION_DISABLED.getMsg)
+}
+
+// Report error if dynamic partition strict mode is on but no 
static partition is found
+if (numStaticPartitions == 0 &&
+  hadoopConf.get("hive.exec.dynamic.partition.mode", 
"strict").equalsIgnoreCase("strict")) {
+  throw new 
SparkException(ErrorMsg.DYNAMIC_PARTITION_STRICT_MODE.getMsg)
+}
 
-  // Report error if any static partition appears after a dynamic 
partition
-  val isDynamic = partitionColumnNames.map(partitionSpec(_).isEmpty)
-  if (isDynamic.init.zip(isDynamic.tail).contains((true, false))) {
-throw new 
AnalysisException(ErrorMsg.PARTITION_DYN_STA_ORDER.getMsg)
+// Report error if any static partition appears after a dynamic 
partition
+val isDynamic = partitionColumnNames.map(partitionSpec(_).isEmpty)
+if (isDynamic.init.zip(isDynamic.tail).contains((true, false))) {
+  throw new 
AnalysisException(ErrorMsg.PARTITION_DYN_STA_ORDER.getMsg)
+}
   }
-}
 
-table.bucketSpec match {
-  case Some(bucketSpec) =>
-// Writes to bucketed hive tables are allowed only if user does 
not care about maintaining
-// table's bucketing ie. both "hive.enforce.bucketing" and 
"hive.enforce.sorting" are
-// set to false
-val enforceBucketingConfig = "hive.enforce.bucketing"
-val enforceSortingConfig = "hive.enforce.sorting"
+  table.bucketSpec match {
+case Some(bucketSpec) =>
+  // Writes to bucketed hive tables are allowed only if user does 
not care about maintaining
+  // table's bucketing ie. both "hive.enforce.bucketing" and 
"hive.enforce.sorting" are
+  // set to false
+  val enforceBucketingConfig = "hive.enforce.bucketing"
+  val enforceSortingConfig = "hive.enforce.sorting"
 
-val message = s"Output Hive table ${table.identifier} is bucketed 
but Spark" +
-  "cur

[GitHub] spark issue #19858: [SPARK-22489][DOC][FOLLOWUP] Update broadcast behavior c...

2017-12-01 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/19858
  
cc @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


