[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20659

Thanks everyone, we can move to [SPARK-23710](https://issues.apache.org/jira/browse/SPARK-23710) to discuss.
[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20659

retest this please
[GitHub] spark pull request #20785: [SPARK-23640][CORE] Fix hadoop config may overrid...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20785#discussion_r174980995

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2434,7 +2434,8 @@ private[spark] object Utils extends Logging {
   */
  def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
    val sparkValue = conf.get(key, default)
-   if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
+   if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
--- End diff --

`YarnConfiguration` can only hold a single `spark.shuffle.service.port` value. If we read the `spark.shuffle.service.port` value from `SparkConf` instead, we can set different values for different applications, which lets us upgrade the shuffle service gradually.
[GitHub] spark pull request #20785: [SPARK-23640][CORE] Fix hadoop config may overrid...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20785#discussion_r174979402

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2434,7 +2434,8 @@ private[spark] object Utils extends Logging {
   */
  def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
    val sparkValue = conf.get(key, default)
-   if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn") {
+   if (conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn"
--- End diff --

Assuming `--conf spark.shuffle.service.port=7338` is configured, 7338 is displayed on the Environment tab, but 7337 is actually used. So my idea is to get the value from `SparkConf` when the key starts with `spark.`, except for `spark.hadoop.*` keys.
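A minimal sketch of that idea (using Spark's internal `SparkHadoopUtil` and Hadoop's `YarnConfiguration`; an illustration of the proposal, not the merged change):

```scala
import org.apache.hadoop.yarn.conf.YarnConfiguration
import org.apache.spark.SparkConf
import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.launcher.SparkLauncher

def getSparkOrYarnConfig(conf: SparkConf, key: String, default: String): String = {
  val sparkValue = conf.get(key, default)
  // Only fall back to the YARN configuration for keys outside the spark.*
  // namespace (or explicitly forwarded via spark.hadoop.*). Otherwise a
  // Hadoop-side value would silently override what the user set with --conf.
  val useYarnConf = conf.get(SparkLauncher.SPARK_MASTER, null) == "yarn" &&
    (!key.startsWith("spark.") || key.startsWith("spark.hadoop."))
  if (useYarnConf) {
    new YarnConfiguration(SparkHadoopUtil.get.newConfiguration(conf)).get(key, sparkValue)
  } else {
    sparkValue
  }
}
```

With this shape, `--conf spark.shuffle.service.port=7338` would win over a `spark.shuffle.service.port` entry in the Hadoop configuration.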
[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20659

retest this please
[GitHub] spark issue #20835: [HOT-FIX] Fix SparkOutOfMemoryError: Unable to acquire 2...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20835

cc @kiszk @hvanhovell
[GitHub] spark pull request #20835: [HOT-FIX] Fix SparkOutOfMemoryError: Unable to ac...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20835

[HOT-FIX] Fix SparkOutOfMemoryError: Unable to acquire 262144 bytes of memory, got 224631

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23598

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20835.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20835

commit 32df7d6d7b1c1d17460fc6cdb8b17adee8c765fd
Author: Yuming Wang <yumwang@...>
Date: 2018-03-15T15:22:36Z

    SparkOutOfMemoryError: Unable to acquire 262144 bytes of memory, got 224631
[GitHub] spark pull request #20819: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/20819
[GitHub] spark pull request #20819: [DO-NOT-MERGE] Try to update Hive to 2.3.2
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20819

[DO-NOT-MERGE] Try to update Hive to 2.3.2

## What changes were proposed in this pull request?

Check if there is any test failed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark hive-2.3.2-jenkins

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20819.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20819

commit 915e68faefcbb5d39ad707937ef95883294c1825
Author: Yuming Wang <wgyumg@...>
Date: 2018-02-22T06:55:18Z

    Update Hive to 2.3.2

    * Update Hive to 2.3.2

commit a5bb731985488892ef9bc8ec9bbcff2a218d0130
Author: Yuming Wang <yumwang@...>
Date: 2018-02-22T09:52:10Z

    replace manifest

commit 80fd8a8aa3c3e42cd99f164f80cfcc6f46e2f247
Author: Yuming Wang <yumwang@...>
Date: 2018-02-22T11:10:10Z

    Fix javaunidoc error

commit 1110ede7e43d8638810e4e0f37772443fc91449b
Author: Yuming Wang <yumwang@...>
Date: 2018-03-05T13:34:28Z

    Merge remote-tracking branch 'upstream/master' into hive-2.3.x

    # Conflicts:
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
    # sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala

commit 566fa59125dff6df2d152290f339e304f5086bbe
Author: Yuming Wang <yumwang@...>
Date: 2018-03-11T02:20:03Z

    Fix dependency

commit b35daa0593af1204e3b2833c30ec0374e8c2b530
Author: Yuming Wang <yumwang@...>
Date: 2018-03-13T13:00:16Z

    Add org.apache.derby.* to shared class

commit b418909852da0222bfd96a17be7bcefce1311b75
Author: Yuming Wang <yumwang@...>
Date: 2018-03-14T01:07:58Z

    ignore backward compatibility

commit f478c89a9095b88c031d5bd86135085fc81044e2
Author: Yuming Wang <yumwang@...>
Date: 2018-03-14T05:33:27Z

    Try to fix hive-thriftserver/compile:compileIncremental error
[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20659

retest this please
[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20659

> [error] /home/jenkins/workspace/SparkPullRequestBuilder/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/session/HiveSessionImpl.java:825: error: cannot find symbol
> [error] String lScratchDir = hiveConf.getVar(ConfVars.LOCALSCRATCHDIR);
> [error]

But HiveSessionImpl.java#L825 is:
```
FileUtils.forceDelete(sessionLogDir);
```
[GitHub] spark issue #20659: [DO-NOT-MERGE] Try to update Hive to 2.3.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20659

retest this please
[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20803

```bash
cat <<EOF > test.sql
select '\${a}', '\${b}';
EOF
spark-sql --hiveconf a=avalue --hivevar b=bvalue -f test.sql
```

Is the SQL text `select ${a}, ${b}` or `select avalue, bvalue`?
[GitHub] spark issue #20803: [SPARK-23653][SQL] Show sql statement in spark SQL UI
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20803

1. Double-clicking the SQL statement can show the full SQL statement: https://github.com/apache/spark/pull/6646
2. What if the SQL statement contains `--hiveconf` or `--hivevar` variables?
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20785

You are right. In fact, our cluster has two shuffle services, one for production and one for development. We configure `spark.shuffle.service.port` to decide which shuffle service to use.
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20785

retest this please
[GitHub] spark issue #20785: [SPARK-23640][CORE] Fix hadoop config may override spark...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20785

retest this please
[GitHub] spark pull request #20785: [SPARK-23640][CORE] Fix hadoop config may overrid...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20785

[SPARK-23640][CORE] Fix hadoop config may override spark config

## What changes were proposed in this pull request?

`spark.shuffle.service.port` may be read from the Hadoop configuration here: https://github.com/apache/spark/blob/9745ec3a61c99be59ef6a9d5eebd445e8af65b7a/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L459

Therefore, the client configuration `spark.shuffle.service.port` does not work unless it is written as `spark.hadoop.spark.shuffle.service.port`.

- This configuration does not work:
```
bin/spark-sql --master yarn --conf spark.shuffle.service.port=7338
```
- This configuration works:
```
bin/spark-sql --master yarn --conf spark.hadoop.spark.shuffle.service.port=7338
```

This PR fixes this issue.

## How was this patch tested?

It's difficult to cover with a unit test, but I've tested it manually.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23640

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20785.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20785

commit 9745ec3a61c99be59ef6a9d5eebd445e8af65b7a
Author: Yuming Wang <yumwang@...>
Date: 2018-03-09T11:05:29Z

    Fix hadoop config may override spark config
[GitHub] spark issue #20659: [DNM] Try to update Hive to 2.3.2
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20659

Yes, I'm doing it
[GitHub] spark pull request #20659: [DNM] Try to update Hive to 2.3.2
GitHub user wangyum reopened a pull request: https://github.com/apache/spark/pull/20659

[DNM] Try to update Hive to 2.3.2

## What changes were proposed in this pull request?

Check if there is any test failed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark hive-2.3.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20659.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20659

commit 915e68faefcbb5d39ad707937ef95883294c1825
Author: Yuming Wang <wgyumg@...>
Date: 2018-02-22T06:55:18Z

    Update Hive to 2.3.2

    * Update Hive to 2.3.2

commit a5bb731985488892ef9bc8ec9bbcff2a218d0130
Author: Yuming Wang <yumwang@...>
Date: 2018-02-22T09:52:10Z

    replace manifest

commit 80fd8a8aa3c3e42cd99f164f80cfcc6f46e2f247
Author: Yuming Wang <yumwang@...>
Date: 2018-02-22T11:10:10Z

    Fix javaunidoc error

commit 1110ede7e43d8638810e4e0f37772443fc91449b
Author: Yuming Wang <yumwang@...>
Date: 2018-03-05T13:34:28Z

    Merge remote-tracking branch 'upstream/master' into hive-2.3.x

    # Conflicts:
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/client/package.scala
    # sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
    # sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala
[GitHub] spark pull request #20735: [MINOR][YARN] Add disable yarn.nodemanager.vmem-c...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20735

[MINOR][YARN] Add disable yarn.nodemanager.vmem-check-enabled option to memLimitExceededLogMessage

## What changes were proposed in this pull request?

My Spark application sometimes throws `Container killed by YARN for exceeding memory limits`. Even after I increased `spark.yarn.executor.memoryOverhead` to 10G, this error still happens.

The latest config: https://user-images.githubusercontent.com/5399861/36975716-f5c548d2-20b5-11e8-95e5-b228d50917b9.png

And error message:
```
ExecutorLostFailure (executor 121 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 30.7 GB of 30 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
```

This is because [Linux glibc >= 2.10 (RHEL 6) malloc may show excessive virtual memory usage](https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en). So disabling `yarn.nodemanager.vmem-check-enabled` looks like a good option, as [MapR mentioned](https://mapr.com/blog/best-practices-yarn-resource-management). This PR adds a hint about disabling `yarn.nodemanager.vmem-check-enabled` to `memLimitExceededLogMessage`.

More details:
https://issues.apache.org/jira/browse/YARN-4714
https://stackoverflow.com/a/31450291
https://stackoverflow.com/a/42091255

After this PR: https://user-images.githubusercontent.com/5399861/36975949-c8e7bbbe-20b6-11e8-9513-9f903b868d8d.png

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark YARN-4714

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20735.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20735

commit 3fc05b4f8599ee65e8c4f808aee238d212c22b17
Author: Yuming Wang <yumwang@...>
Date: 2018-03-05T12:38:21Z

    Update memLimitExceededLogMessage
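To make the proposed message change concrete, here is a sketch (the `memLimitExceededLogMessage(diagnostics, pattern)` helper shape in `YarnAllocator` is assumed, and the exact wording may differ from the merged patch):

```scala
import java.util.regex.Pattern

def memLimitExceededLogMessage(diagnostics: String, pattern: Pattern): String = {
  // Pull the "X GB of Y GB physical memory used" fragment out of the
  // YARN diagnostics, if present.
  val matcher = pattern.matcher(diagnostics)
  val diag = if (matcher.find()) " " + matcher.group() + "." else ""
  s"Container killed by YARN for exceeding memory limits.$diag " +
    "Consider boosting spark.yarn.executor.memoryOverhead or disabling " +
    "yarn.nodemanager.vmem-check-enabled because of YARN-4714."
}
```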
[GitHub] spark pull request #20734: [SPARK-23510][DOC][FOLLOW-UP] Update spark.sql.hi...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20734

[SPARK-23510][DOC][FOLLOW-UP] Update spark.sql.hive.metastore.version

## What changes were proposed in this pull request?

Update `spark.sql.hive.metastore.version` to 2.3.2, same as HiveUtils.scala: https://github.com/apache/spark/blob/ff1480189b827af0be38605d566a4ee71b4c36f6/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala#L63-L65

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23510-FOLLOW-UP

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20734.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20734

commit 1052c72b94d49541597b8d5561039fe223ce0ddc
Author: Yuming Wang <yumwang@...>
Date: 2018-03-05T12:24:48Z

    Update spark.sql.hive.metastore.version
[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/20668
[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20668

Yes, if we do not add `alterPartitionsMethod`, [HiveExternalSessionCatalogSuite.alter partitions](https://github.com/apache/spark/blob/d73bb92a72fdd6c1901c070a91b70b845a034e88/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala#L951) will fail, too.
[GitHub] spark issue #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metasto...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20668

Otherwise, `SessionCatalogSuite` also needs to be updated:
```scala
Index: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala
===
--- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala (date 1519557876000)
+++ sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala (date 1519702924000)
@@ -955,8 +955,10 @@
     val oldPart1 = catalog.getPartition(TableIdentifier("tbl2", Some("db2")), part1.spec)
     val oldPart2 = catalog.getPartition(TableIdentifier("tbl2", Some("db2")), part2.spec)
     catalog.alterPartitions(TableIdentifier("tbl2", Some("db2")), Seq(
-      oldPart1.copy(storage = storageFormat.copy(locationUri = Some(newLocation))),
-      oldPart2.copy(storage = storageFormat.copy(locationUri = Some(newLocation)))))
+      oldPart1.copy(parameters = oldPart1.parameters,
+        storage = storageFormat.copy(locationUri = Some(newLocation))),
+      oldPart2.copy(parameters = oldPart2.parameters,
+        storage = storageFormat.copy(locationUri = Some(newLocation)))))
     val newPart1 = catalog.getPartition(TableIdentifier("tbl2", Some("db2")), part1.spec)
     val newPart2 = catalog.getPartition(TableIdentifier("tbl2", Some("db2")), part2.spec)
     assert(newPart1.storage.locationUri == Some(newLocation))
@@ -965,7 +967,9 @@
     assert(oldPart2.storage.locationUri != Some(newLocation))
     // Alter partitions without explicitly specifying database
     catalog.setCurrentDatabase("db2")
-    catalog.alterPartitions(TableIdentifier("tbl2"), Seq(oldPart1, oldPart2))
+    catalog.alterPartitions(TableIdentifier("tbl2"),
+      Seq(oldPart1.copy(parameters = newPart1.parameters),
+        oldPart2.copy(parameters = newPart2.parameters)))
     val newerPart1 = catalog.getPartition(TableIdentifier("tbl2"), part1.spec)
     val newerPart2 = catalog.getPartition(TableIdentifier("tbl2"), part2.spec)
     assert(oldPart1.storage.locationUri == newerPart1.storage.locationUri)
```
[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20668#discussion_r170450895

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -1146,3 +1146,25 @@ private[client] class Shim_v2_1 extends Shim_v2_0 {
     alterPartitionsMethod.invoke(hive, tableName, newParts, environmentContextInAlterTable)
   }
 }
+
+private[client] class Shim_v2_2 extends Shim_v2_1 {
+
+}
+
+private[client] class Shim_v2_3 extends Shim_v2_2 {
+
+  val environmentContext = new EnvironmentContext()
+  environmentContext.putToProperties("DO_NOT_UPDATE_STATS", "true")
+
+  private lazy val alterPartitionsMethod =
+    findMethod(
+      classOf[Hive],
+      "alterPartitions",
+      classOf[String],
+      classOf[JList[Partition]],
+      classOf[EnvironmentContext])
+
+  override def alterPartitions(hive: Hive, tableName: String, newParts: JList[Partition]): Unit = {
--- End diff --

`alterPartitions`:
```
[info] - 2.3: alterPartitions *** FAILED *** (50 milliseconds)
[info]   java.lang.reflect.InvocationTargetException:
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[info]   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[info]   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.lang.reflect.Method.invoke(Method.java:498)
[info]   at org.apache.spark.sql.hive.client.Shim_v2_1.alterPartitions(HiveShim.scala:1144)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterPartitions$1.apply$mcV$sp(HiveClientImpl.scala:616)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterPartitions$1.apply(HiveClientImpl.scala:607)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$alterPartitions$1.apply(HiveClientImpl.scala:607)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
[info]   at org.apache.spark.sql.hive.client.HiveClientImpl.alterPartitions(HiveClientImpl.scala:607)
[info]   at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$55.apply(VersionsSuite.scala:432)
[info]   at org.apache.spark.sql.hive.client.VersionsSuite$$anonfun$6$$anonfun$apply$55.apply(VersionsSuite.scala:424)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:103)
[info]   at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
[info]   at org.scalatest.FunSuite.runTest(FunSuite.scala:1560)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
[info]   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
[info]   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1147)
[info]   at org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info]   at org.scalatest.FunSuiteLike$$anonfun$run$1.apply(Fun
```
[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20668#discussion_r170425667

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -202,7 +202,6 @@ private[spark] object HiveUtils extends Logging {
     ConfVars.METASTORE_AGGREGATE_STATS_CACHE_MAX_READER_WAIT -> TimeUnit.MILLISECONDS,
     ConfVars.HIVES_AUTO_PROGRESS_TIMEOUT -> TimeUnit.SECONDS,
     ConfVars.HIVE_LOG_INCREMENTAL_PLAN_PROGRESS_INTERVAL -> TimeUnit.MILLISECONDS,
-    ConfVars.HIVE_STATS_JDBC_TIMEOUT -> TimeUnit.SECONDS,
--- End diff --

Removed `HIVE_STATS_JDBC_TIMEOUT`; for details see: https://issues.apache.org/jira/browse/HIVE-12164
[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20668#discussion_r170425631

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala ---
@@ -202,8 +202,6 @@ private[spark] object HiveUtils extends Logging {
     ConfVars.METASTORE_AGGREGATE_STATS_CACHE_MAX_READER_WAIT -> TimeUnit.MILLISECONDS,
     ConfVars.HIVES_AUTO_PROGRESS_TIMEOUT -> TimeUnit.SECONDS,
     ConfVars.HIVE_LOG_INCREMENTAL_PLAN_PROGRESS_INTERVAL -> TimeUnit.MILLISECONDS,
-    ConfVars.HIVE_STATS_JDBC_TIMEOUT -> TimeUnit.SECONDS,
-    ConfVars.HIVE_STATS_RETRIES_WAIT -> TimeUnit.MILLISECONDS,
--- End diff --

Removed `HIVE_STATS_JDBC_TIMEOUT`; for details see: https://issues.apache.org/jira/browse/HIVE-12164
[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20668#discussion_r170425408

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -1146,3 +1146,25 @@ private[client] class Shim_v2_1 extends Shim_v2_0 {
     alterPartitionsMethod.invoke(hive, tableName, newParts, environmentContextInAlterTable)
   }
 }
+
+private[client] class Shim_v2_2 extends Shim_v2_1 {
+
+}
+
+private[client] class Shim_v2_3 extends Shim_v2_2 {
+
+  val environmentContext = new EnvironmentContext()
+  environmentContext.putToProperties("DO_NOT_UPDATE_STATS", "true")
--- End diff --

Otherwise it will throw a `NumberFormatException`:
```
[info] Cause: java.lang.NumberFormatException: null
[info]   at java.lang.Long.parseLong(Long.java:552)
[info]   at java.lang.Long.parseLong(Long.java:631)
[info]   at org.apache.hadoop.hive.metastore.MetaStoreUtils.isFastStatsSame(MetaStoreUtils.java:315)
[info]   at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartitions(HiveAlterHandler.java:605)
[info]   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partitions_with_environment_context(HiveMetaStore.java:3837)
```
For more details, see: https://issues.apache.org/jira/browse/HIVE-15653
[GitHub] spark pull request #20668: [SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 ...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20668

[SPARK-23510][SQL] Support Hive 2.2 and Hive 2.3 metastore

## What changes were proposed in this pull request?

Support Hive 2.2 and Hive 2.3 metastore.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23510

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20668.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20668

commit 5b1fc0145efbdd427e8b49bd0f840f709d4bc801
Author: Yuming Wang <yumwang@...>
Date: 2018-02-24T16:19:35Z

    Support Hive 2.2 and Hive 2.3
[GitHub] spark pull request #20659: [DNM] Try to update Hive to 2.3.2
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/20659
[GitHub] spark pull request #20659: [DNM] Try to update Hive to 2.3.2
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20659

[DNM] Try to update Hive to 2.3.2

## What changes were proposed in this pull request?

Check if there is any test failed.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark hive-2.3.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20659.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20659

commit 915e68faefcbb5d39ad707937ef95883294c1825
Author: Yuming Wang <wgyumg@...>
Date: 2018-02-22T06:55:18Z

    Update Hive to 2.3.2

    * Update Hive to 2.3.2
[GitHub] spark issue #20597: [MINOR][TEST] Update from 2.2.0 to 2.2.1 in HiveExternal...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20597

Jenkins, retest this please.
[GitHub] spark issue #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to support a...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20504

@gatorsmile
[GitHub] spark pull request #20557: [SPARK-23364][SQL]'desc table' command in spark-s...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20557#discussion_r167225201

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -539,15 +539,15 @@ case class DescribeTableCommand(
       throw new AnalysisException(
         s"DESC PARTITION is not allowed on a temporary view: ${table.identifier}")
     }
-    describeSchema(catalog.lookupRelation(table).schema, result, header = false)
+    describeSchema(catalog.lookupRelation(table).schema, result, header = true)
--- End diff --

Maybe we should add a configuration like `hive.cli.print.header`.
[GitHub] spark issue #20521: [SPARK-22977][SQL] fix web UI SQL tab for CTAS
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20521

Thanks @cloud-fan, it works.
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20510

jenkins, retest this please.
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20510

jenkins, retest this please.
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20510

The failure is due to a flaky test suite:
```
org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.NestedSuiteSelector)
```
jenkins, retest this please.
[GitHub] spark pull request #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to su...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20504#discussion_r166156332

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -250,11 +257,20 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
   }

   private def listTestCases(): Seq[TestCase] = {
-    listFilesRecursively(new File(inputFilePath)).map { file =>
+    listFilesRecursively(new File(inputFilePath)).flatMap { file =>
       val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
       val absPath = file.getAbsolutePath
       val testCaseName = absPath.stripPrefix(inputFilePath).stripPrefix(File.separator)
-      TestCase(testCaseName, absPath, resultFile)
+      if (testCaseName.contains("typeCoercion")) {
+        TypeCoercionMode.values.map(_.toString).map { mode =>
+          val fileNameWithMode = mode + File.separator + file.getName
+          val newTestCaseName = testCaseName.replace(file.getName, fileNameWithMode)
+          val newResultFile = resultFile.replace(file.getName, fileNameWithMode)
--- End diff --

Thanks @dongjoon-hyun. There are 3 files that differ: `hive/binaryComparison.sql.out`, `hive/decimalPrecision.sql.out` and `hive/promoteStrings.sql.out`, something like this: https://github.com/wangyum/spark/commit/927f6e86712ec4da4d58dbde2859b48520df3194
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20510

Retest this please.
[GitHub] spark issue #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20510

Retest this please.
[GitHub] spark pull request #20508: [SPARK-23335][SQL] Should not convert to double w...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20508#discussion_r165968094

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -327,6 +327,14 @@ object TypeCoercion {
     // Skip nodes who's children have not been resolved yet.
     case e if !e.childrenResolved => e

+    // For integralType should not convert to double which will cause precision loss.
+    case a @ BinaryArithmetic(left @ StringType(), right @ IntegralType()) =>
--- End diff --

What will happen if the string value is beyond the long type range?
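For context, Spark's `Cast` returns null when a string does not fit the target integral type, so the overflow case should degrade to null rather than a wrong value. A quick check one could run in `spark-shell` (expected behavior from `Cast`'s null-on-failure semantics, not verified in this PR):

```scala
// Long.MaxValue is 9223372036854775807; this literal is wider than that,
// so the cast to bigint cannot succeed and should yield null.
spark.sql("SELECT CAST('12345678901234567890123' AS BIGINT) AS v").show()
// Expected: one row with v = NULL
```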
[GitHub] spark pull request #20510: [SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20510

[SPARK-23336][BUILD] Upgrade snappy-java to 1.1.4

## What changes were proposed in this pull request?

This PR upgrades snappy-java to 1.1.4. Release notes:
- Fix a 1% performance regression when snappy is used in PIE executables.
- Improve compression performance by 5%.
- Improve decompression performance by 20%.

More details: https://github.com/xerial/snappy-java/blob/master/Milestone.md#snappy-java-114-2017-05-22

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23336

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20510.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20510

commit 1055afc107b0c2357449ae3f23bda089480579d9
Author: Yuming Wang <wgyumg@...>
Date: 2018-02-05T11:59:47Z

    Upgrade snappy-java to 1.1.4
[GitHub] spark issue #20274: [SPARK-20120][SQL][FOLLOW-UP] Better way to support spar...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20274

The [pre-built Spark](https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/) contains `kubernetes-model-2.0.0.jar`, but a distribution built with the command below will not contain this jar:
```
./dev/make-distribution.sh --tgz -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn -DskipTests
```
[GitHub] spark issue #20274: [SPARK-20120][SQL][FOLLOW-UP] Better way to support spar...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20274

An interesting discovery: if `SPARK_HOME/jars` is missing `kubernetes-model-2.0.0.jar`, silent mode is broken.
[GitHub] spark issue #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to support a...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20504

Thanks @hvanhovell, the major change is in `SQLQueryTestSuite.scala`:
```scala
   private def listTestCases(): Seq[TestCase] = {
-    listFilesRecursively(new File(inputFilePath)).map { file =>
+    listFilesRecursively(new File(inputFilePath)).flatMap { file =>
       val resultFile = file.getAbsolutePath.replace(inputFilePath, goldenFilePath) + ".out"
       val absPath = file.getAbsolutePath
       val testCaseName = absPath.stripPrefix(inputFilePath).stripPrefix(File.separator)
-      TestCase(testCaseName, absPath, resultFile)
+      if (testCaseName.contains("typeCoercion")) {
+        TypeCoercionMode.values.map(_.toString).map { mode =>
+          val fileNameWithMode = mode + File.separator + file.getName
+          val newTestCaseName = testCaseName.replace(file.getName, fileNameWithMode)
+          val newResultFile = resultFile.replace(file.getName, fileNameWithMode)
+          TestCase(newTestCaseName, absPath, newResultFile, mode)
+        }.toSeq
+      } else {
+        Seq(TestCase(testCaseName, absPath, resultFile))
+      }
     }
```

For a [type coercion input](https://github.com/apache/spark/tree/v2.3.0-rc2/sql/core/src/test/resources/sql-tests/inputs/typeCoercion), two results are generated in different modes (`default` and `hive`).

**For example**:

_input_:
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/binaryComparison.sql

_results_:
sql/core/src/test/resources/sql-tests/results/typeCoercion/default/binaryComparison.sql.out
sql/core/src/test/resources/sql-tests/results/typeCoercion/hive/binaryComparison.sql.out
[GitHub] spark issue #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to support a...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20504

After SPARK-21646, `hive/binaryComparison.sql.out`, `hive/decimalPrecision.sql.out` and `hive/promoteStrings.sql.out` look like this: https://github.com/wangyum/spark/commit/927f6e86712ec4da4d58dbde2859b48520df3194
[GitHub] spark pull request #20504: [SPARK-23332][SQL] Update SQLQueryTestSuite to su...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20504

[SPARK-23332][SQL] Update SQLQueryTestSuite to support test hive mode

## What changes were proposed in this pull request?

Update `SQLQueryTestSuite` to support testing hive mode.

## How was this patch tested?

unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23332

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20504.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20504

commit dd8531dbf55e1cc05eaa4e09d9ff278e02595a9a
Author: Yuming Wang <wgyumg@...>
Date: 2018-02-04T22:40:58Z

    Update SQLQueryTestSuite to support test hive mode
[GitHub] spark pull request #20498: [SPARK-22036][SQL][FOLLOWUP] Fix decimalArithmeti...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20498#discussion_r165844408

--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/decimalArithmeticOperations.sql ---
@@ -74,7 +75,8 @@ select 12345678901234567890.0 * 12345678901234567890.0;
 select 1e35 / 0.1;

 -- arithmetic operations causing a precision loss return NULL
+select 12345678912345678912345678912.1234567 + 999.12345;
--- End diff --

The result is:
```
-- !query 32
select 12345678912345678912345678912.1234567 + 999.12345
-- !query 32 schema
struct<(CAST(12345678912345678912345678912.1234567 AS DECIMAL(38,7)) + CAST(999.12345 AS DECIMAL(38,7))):decimal(38,7)>
-- !query 32 output
NULL


-- !query 33
select 123456789123456789.1234567890 * 1.123456789123456789
-- !query 33 schema
struct<(CAST(123456789123456789.1234567890 AS DECIMAL(36,18)) * CAST(1.123456789123456789 AS DECIMAL(36,18))):decimal(38,28)>
-- !query 33 output
NULL


-- !query 34
select 12345678912345.123456789123 / 0.00012345678
-- !query 34 schema
struct<(CAST(12345678912345.123456789123 AS DECIMAL(29,15)) / CAST(1.2345678E-8 AS DECIMAL(29,15))):decimal(38,18)>
-- !query 34 output
NULL
```
[GitHub] spark pull request #20498: [SPARK-22036][SQL][FOLLOWUP] Fix decimalArithmeti...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20498#discussion_r165844386

--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/decimalArithmeticOperations.sql ---
@@ -48,8 +48,9 @@ select 12345678901234567890.0 * 12345678901234567890.0;
 select 1e35 / 0.1;

 -- arithmetic operations causing a precision loss are truncated
+select 12345678912345678912345678912.1234567 + 999.12345;
--- End diff --

The result is:
```
-- !query 17
select 12345678912345678912345678912.1234567 + 999.12345
-- !query 17 schema
struct<(CAST(12345678912345678912345678912.1234567 AS DECIMAL(38,6)) + CAST(999.12345 AS DECIMAL(38,6))):decimal(38,6)>
-- !query 17 output
10012345678912345678912345678911.246907


-- !query 18
select 123456789123456789.1234567890 * 1.123456789123456789
-- !query 18 schema
struct<(CAST(123456789123456789.1234567890 AS DECIMAL(36,18)) * CAST(1.123456789123456789 AS DECIMAL(36,18))):decimal(38,18)>
-- !query 18 output
138698367904130467.654320988515622621


-- !query 19
select 12345678912345.123456789123 / 0.00012345678
-- !query 19 schema
struct<(CAST(12345678912345.123456789123 AS DECIMAL(29,15)) / CAST(1.2345678E-8 AS DECIMAL(29,15))):decimal(38,9)>
-- !query 19 output
100073899961059796.725866332
```
[GitHub] spark issue #20498: [SPARK-22036][SQL][FOLLOWUP] Fix imperfect test
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20498

retest this please.
[GitHub] spark issue #20498: [SPARK-22036][SQL][FOLLOWUP] Fix imperfect test
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20498

retest please.
[GitHub] spark pull request #20498: [SPARK-22036][SQL][FOLLOWUP] Fix imperfect test
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20498

[SPARK-22036][SQL][FOLLOWUP] Fix imperfect test

## What changes were proposed in this pull request?

Fix imperfect test

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-22036

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20498.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20498

commit 2f532ea3316f8a3058f517b405811f8c8c080309
Author: wangyum <wgyumg@...>
Date: 2018-02-03T11:19:00Z

    Fix test error.
[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19788

Thanks @yucai, it's a great improvement for shuffles with many output files. The figure below is our comparison:

**Before**: https://user-images.githubusercontent.com/5399861/35762292-6b5f9f88-08cf-11e8-8aa5-0d10e4282599.png

**After**: https://user-images.githubusercontent.com/5399861/35762790-9be2e468-08d8-11e8-8403-2f85993eee9d.png
[GitHub] spark pull request #20430: [SPARK-23263][SQL] Create table stored as parquet...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20430

[SPARK-23263][SQL] Create table stored as parquet should update table size if automatic update table size is enabled

## What changes were proposed in this pull request?

How to reproduce:
```sql
bin/spark-sql --conf spark.sql.statistics.size.autoUpdate.enabled=true

spark-sql> create table test_create_parquet stored as parquet as select 1;
spark-sql> desc extended test_create_parquet;
```
The table statistics will not exist. This PR fixes this issue.

## How was this patch tested?

unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-23263

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20430.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20430

commit 08d31c0823e5f6c257b0917362c8e07b04702af2
Author: Yuming Wang <yumwang@...>
Date: 2018-01-30T03:45:20Z

    create table stored as parquet should update table size if automatic update table size is enabled
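One plausible shape of the fix, as a sketch only — `CommandUtils.updateTableStats` and the `autoSizeUpdateEnabled` flag exist in Spark 2.3, but exactly where the call belongs in the CTAS code path is assumed here:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.CatalogTable
import org.apache.spark.sql.execution.command.CommandUtils

// After the CTAS write finishes, refresh the stored table size so that
// `desc extended` reflects it when auto-update is enabled.
def maybeUpdateStats(spark: SparkSession, table: CatalogTable): Unit = {
  if (spark.sessionState.conf.autoSizeUpdateEnabled) {
    CommandUtils.updateTableStats(spark, table)
  }
}
```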
[GitHub] spark pull request #20303: [SPARK-23128][SQL] A new approach to do adaptive ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20303#discussion_r164070918

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/QueryStage.scala ---
@@ -0,0 +1,222 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.adaptive
+
+import scala.concurrent.{ExecutionContext, Future}
+import scala.concurrent.duration.Duration
+
+import org.apache.spark.MapOutputStatistics
+import org.apache.spark.broadcast
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.physical.Partitioning
+import org.apache.spark.sql.execution._
+import org.apache.spark.sql.execution.exchange._
+import org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * In adaptive execution mode, an execution plan is divided into multiple QueryStages. Each
+ * QueryStage is a sub-tree that runs in a single stage.
+ */
+abstract class QueryStage extends UnaryExecNode {
+
+  var child: SparkPlan
+
+  // Ignore this wrapper for canonicalizing.
+  override def doCanonicalize(): SparkPlan = child.canonicalized
+
+  override def output: Seq[Attribute] = child.output
+
+  override def outputPartitioning: Partitioning = child.outputPartitioning
+
+  override def outputOrdering: Seq[SortOrder] = child.outputOrdering
+
+  /**
+   * Execute childStages and wait until all stages are completed. Use a thread pool to avoid
+   * blocking on one child stage.
+   */
+  def executeChildStages(): Unit = {
+    // Handle broadcast stages
+    val broadcastQueryStages: Seq[BroadcastQueryStage] = child.collect {
+      case bqs: BroadcastQueryStageInput => bqs.childStage
+    }
+    val broadcastFutures = broadcastQueryStages.map { queryStage =>
+      Future { queryStage.prepareBroadcast() }(QueryStage.executionContext)
+    }
+
+    // Submit shuffle stages
+    val executionId = sqlContext.sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY)
+    val shuffleQueryStages: Seq[ShuffleQueryStage] = child.collect {
+      case sqs: ShuffleQueryStageInput => sqs.childStage
+    }
+    val shuffleStageFutures = shuffleQueryStages.map { queryStage =>
+      Future {
+        SQLExecution.withExecutionId(sqlContext.sparkContext, executionId) {
+          queryStage.execute()
+        }
+      }(QueryStage.executionContext)
+    }
+
+    ThreadUtils.awaitResult(
+      Future.sequence(broadcastFutures)(implicitly, QueryStage.executionContext), Duration.Inf)
+    ThreadUtils.awaitResult(
+      Future.sequence(shuffleStageFutures)(implicitly, QueryStage.executionContext), Duration.Inf)
+  }
+
+  /**
+   * Before executing the plan in this query stage, we execute all child stages, optimize the plan
+   * in this stage and determine the reducer number based on the child stages' statistics. Finally
+   * we do a codegen for this query stage and update the UI with the new plan.
+   */
+  def prepareExecuteStage(): Unit = {
+    // 1. Execute childStages
+    executeChildStages()
+    // It is possible to optimize this stage's plan here based on the child stages' statistics.
+
+    // 2. Determine reducer number
+    val queryStageInputs: Seq[ShuffleQueryStageInput] = child.collect {
+      case input: ShuffleQueryStageInput => input
+    }
+    val childMapOutputStatistics = queryStageInputs.map(_.childStage.mapOutputStatistics)
+      .filter(_ != null).toArray
+    if (childMapOutputStatistics.length > 0) {
+      val exchangeCoordinator = new ExchangeCoordinator(
+        conf.targetPostShuffleInputSize,
+        conf.minNumPostS
[GitHub] spark issue #18138: [SPARK-20915][SQL] Make lpad/rpad with empty pad string ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/18138

Hive will throw `ArrayIndexOutOfBoundsException` at runtime: https://issues.apache.org/jira/browse/HIVE-17077
[GitHub] spark pull request #20274: [SPARK-20120][SQL][FOLLOW-UP] Better way to suppo...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20274

[SPARK-20120][SQL][FOLLOW-UP] Better way to support spark-sql silent mode.

## What changes were proposed in this pull request?

`spark-sql` silent mode is broken now. It seems `sc.setLogLevel()` is a better way.

## How was this patch tested?

manual tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-20120-FOLLOW-UP

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20274.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20274

commit 83a844b2c221eea4b02cb6816bd2c6017cd1e1fc
Author: Yuming Wang <yumwang@...>
Date: 2018-01-16T01:10:42Z

    Better way to support spark-sql silent mode.
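A minimal sketch of the `sc.setLogLevel()` approach (placement inside `SparkSQLCLIDriver` is assumed; `getIsSilent` comes from Hive's `CliSessionState`):

```scala
// If -S / --silent was passed to spark-sql, quiet the console logger via
// the public SparkContext API instead of swapping log4j configuration
// files, which is what broke silent mode before.
if (sessionState.getIsSilent) {
  SparkSQLEnv.sparkContext.setLogLevel("WARN")
}
```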
[GitHub] spark issue #20268: [SPARK-19550][BUILD][FOLLOW-UP] Remove MaxPermSize for s...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20268

retest this please
[GitHub] spark pull request #20268: [SPARK-19550][BUILD][FOLLOW-UP] Remove MaxPermSiz...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20268

[SPARK-19550][BUILD][FOLLOW-UP] Remove MaxPermSize for sql module

## What changes were proposed in this pull request?

Remove `MaxPermSize` for the `sql` module

## How was this patch tested?

Manually tested.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-19550-MaxPermSize

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20268.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20268

commit 67708359ff19d450a3f3e60548df778fb1588515
Author: Yuming Wang <yumwang@...>
Date: 2018-01-15T04:56:45Z

    Remove MaxPermSize for sql module
[GitHub] spark issue #20248: [SPARK-23058][SQL] Show non printable field delim as uni...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20248

retest this please
[GitHub] spark issue #20248: [SPARK-23058][SQL] Show non printable field delim as uni...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20248 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20248: [SPARK-23058][SQL] Show non printable field delim...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20248#discussion_r161364269
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -1023,7 +1023,12 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman
     val serdeProps = metadata.storage.properties.map {
       case (key, value) =>
-        s"'${escapeSingleQuotedString(key)}' = '${escapeSingleQuotedString(value)}'"
+        val escapedValue = if (value.length == 1 && (value.head < 32 || value.head > 126)) {
--- End diff --
I need to copy an external table to another environment, but I lost the CREATE TABLE statement. So I want to recover it via `show create table ...`, but it can't show a non-printable field delimiter. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
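A minimal sketch of the escaping idea discussed here (hypothetical helper name; the real change lives inside `ShowCreateTableCommand`):
```scala
// Render a single non-printable delimiter character as a unicode escape so
// the SHOW CREATE TABLE output can be copied and re-executed elsewhere.
def escapeNonPrintable(value: String): String =
  if (value.length == 1 && (value.head < 32 || value.head > 126)) {
    "\\u%04X".format(value.head.toInt) // e.g. '\177' (DEL) becomes "\u007F"
  } else {
    value
  }
```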
[GitHub] spark issue #20248: [SPARK-23058][SQL] Fix non printable field delim issue
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20248 [Non printable characters](http://www.theasciicode.com.ar/): https://user-images.githubusercontent.com/5399861/34880068-33152b7a-f7ea-11e7-8203-570e61c7a21c.png --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20248: [SPARK-23058][SQL] Fix non printable field delim ...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20248 [SPARK-23058][SQL] Fix non printable field delim issue ## What changes were proposed in this pull request? Create a table with a non-printable delim like below:
```sql
CREATE EXTERNAL TABLE `t1`(`col1` bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '\177',
  'serialization.format' = '\003'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'file:/tmp/t1';
```
When running `show create table t1`:
```sql
CREATE EXTERNAL TABLE `t1`(`col1` bigint)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'field.delim' = '',
  'serialization.format' = ''
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.SequenceFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION 'file:/tmp/t1'
TBLPROPERTIES (
  'transient_lastDdlTime' = '1515766958'
)
```
`'\177'` and `'\003'` are not shown correctly. This PR fixes the issue. ## How was this patch tested? manual tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark non-printable-field-delim Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20248.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20248 commit d44f242955503cf6195c5a47bbf631500406027d Author: Yuming Wang <yumwang@...> Date: 2018-01-12T14:28:22Z Fix non printable field delim issue --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20080: [SPARK-22870][CORE] Dynamic allocation should allow 0 id...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20080 @srowen @jiangxb1987 I have tested this patch on my cluster:
```
bin/spark-sql --master yarn --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.executorIdleTimeout=0
```
```
18/01/09 05:49:03.452 INFO DAGScheduler: Job 0 finished: processCmd at CliDriver.java:376, took 26.196061 s
75000
Time taken: 26.383 seconds, Fetched 1 row(s)
18/01/09 05:49:03.455 INFO SparkSQLCLIDriver: Time taken: 26.383 seconds, Fetched 1 row(s)
spark-sql> 18/01/09 05:49:03.479 INFO ExecutorAllocationManager: Request to remove executorIds: 972
```
The idle executor was requested for removal `05:49:03.479 - 05:49:03.455 = 24 ms` after the query finished. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20080: [SPARK-22870][CORE] Dynamic allocation should allow 0 id...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20080 cc @srowen --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20080: [SPARK-22870][CORE] Dynamic allocation should all...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20080 [SPARK-22870][CORE] Dynamic allocation should allow 0 idle time ## What changes were proposed in this pull request? This PR makes `0` a valid value for `spark.dynamicAllocation.executorIdleTimeout`. For details, see the JIRA description: https://issues.apache.org/jira/browse/SPARK-22870. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22870 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20080.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20080 commit 1dcec41a3c1e2c001b0f9fed92aa6f03b6c47f3a Author: Yuming Wang <wgyumg@...> Date: 2017-12-26T01:58:49Z Dynamic allocation should allow 0 idle time --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
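A minimal sketch of the relaxed validation (assumed shape; per the JIRA, `0` means "remove idle executors immediately"):
```scala
// Previously the timeout had to be strictly positive; allowing 0 lets idle
// executors be reclaimed as soon as they have no running tasks.
if (executorIdleTimeoutS < 0) {
  throw new SparkException("spark.dynamicAllocation.executorIdleTimeout must be >= 0!")
}
```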
[GitHub] spark issue #20079: [SPARK-22893][SQL][HOTFIX] Fix a error message of Versio...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20079 LGTM, thanks @dongjoon-hyun --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20067: [SPARK-22894][SQL] DateTimeOperations should acce...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20067#discussion_r158648321
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2760,6 +2760,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-22894: DateTimeOperations should accept SQL like string type") {
+    val date = "2017-12-24"
+    val str = sql(s"SELECT CAST('$date' as STRING) + interval 2 months 2 seconds")
--- End diff --
But Spark originally supported this: https://github.com/apache/spark/blob/bc0848b4c1ab84ccef047363a70fd11df240dbbf/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercionSuite.scala#L1083 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20067: [SPARK-22894][SQL] DateTimeOperations should acce...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20067#discussion_r158648226
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2760,6 +2760,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-22894: DateTimeOperations should accept SQL like string type") {
--- End diff --
Yes, I'll add it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20067: [SPARK-22894][SQL] DateTimeOperations should acce...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20067#discussion_r158647982
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2760,6 +2760,17 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
     }
   }
 
+  test("SPARK-22894: DateTimeOperations should accept SQL like string type") {
+    val date = "2017-12-24"
+    val str = sql(s"SELECT CAST('$date' as STRING) + interval 2 months 2 seconds")
--- End diff --
Hive doesn't accept string type:
```
hive> SELECT cast('2017-12-24' as date) + interval 2 day;
2017-12-26 00:00:00
hive> SELECT cast('2017-12-24' as timestamp) + interval 2 day;
2017-12-26 00:00:00
hive> SELECT cast('2017-12-24' as string) + interval 2 day;
FAILED: SemanticException Line 0:-1 Wrong arguments '2': No matching method for class org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPDTIPlus with (string, interval_day_time)
hive>
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20067: [SPARK-22894][SQL] DateTimeOperations should acce...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20067 [SPARK-22894][SQL] DateTimeOperations should accept SQL like string type ## What changes were proposed in this pull request? `DateTimeOperations` accepts [`StringType`](https://github.com/apache/spark/blob/ae998ec2b5548b7028d741da4813473dde1ad81e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L669), but:
```
spark-sql> SELECT '2017-12-24' + interval 2 months 2 seconds;
Error in query: cannot resolve '(CAST('2017-12-24' AS DOUBLE) + interval 2 months 2 seconds)' due to data type mismatch: differing types in '(CAST('2017-12-24' AS DOUBLE) + interval 2 months 2 seconds)' (double and calendarinterval).; line 1 pos 7;
'Project [unresolvedalias((cast(2017-12-24 as double) + interval 2 months 2 seconds), None)]
+- OneRowRelation

spark-sql>
```
After this PR:
```
spark-sql> SELECT '2017-12-24' + interval 2 months 2 seconds;
2018-02-24 00:00:02
Time taken: 0.2 seconds, Fetched 1 row(s)
```
## How was this patch tested? unit tests Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22894 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20067.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20067 commit ae998ec2b5548b7028d741da4813473dde1ad81e Author: Yuming Wang <wgyumg@...> Date: 2017-12-23T19:45:31Z DateTimeOperations should accept SQL like string type --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20066: [SPARK-22833][Examples][FOLLOWUP] Remove whitespa...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/20066 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20018: SPARK-22833 [Improvement] in SparkHive Scala Examples
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20018 Thanks @HyukjinKwon --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20066: [SPARK-22833][Examples][FOLLOWUP] Remove whitespace to f...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20066 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85343/console --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20066: [SPARK-22833][Examples][FOLLOWUP] Remove whitespa...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20066 [SPARK-22833][Examples][FOLLOWUP] Remove whitespace to fix scalastyle checks failed ## What changes were proposed in this pull request? This is a followup PR for: https://github.com/apache/spark/pull/20018. ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22833 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20066.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20066 commit df92f6ce38a14fc248d5830090dfa473371a129c Author: Yuming Wang <wgyumg@...> Date: 2017-12-23T15:59:29Z Remove whitespace --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20064: [SPARK-22893][SQL] Unified the data type mismatch messag...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20064 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20064: [SPARK-22893][SQL] Unified the data type mismatch...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20064 [SPARK-22893][SQL] Unified the data type mismatch message ## What changes were proposed in this pull request? We should use `dataType.simpleString` to unify the data type mismatch message. Before:
```
spark-sql> select cast(1 as binary);
Error in query: cannot resolve 'CAST(1 AS BINARY)' due to data type mismatch: cannot cast IntegerType to BinaryType; line 1 pos 7;
```
After:
```
spark-sql> select cast(1 as binary);
Error in query: cannot resolve 'CAST(1 AS BINARY)' due to data type mismatch: cannot cast int to binary; line 1 pos 7;
```
## How was this patch tested? Existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22893 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20064.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20064 commit 8540b912e8e846f9e0fb8c94a8dcc48a05be6a57 Author: Yuming Wang <wgyumg@...> Date: 2017-12-23T11:45:45Z Unified the data type mismatch message. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
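A minimal sketch of the change in spirit (assumed form; the real edit touches Cast's type-check error): report types via `DataType.simpleString` instead of the Scala class name:
```scala
// Before: interpolating the DataType object prints "IntegerType"/"BinaryType";
// simpleString yields the SQL-facing names, e.g. "int" and "binary".
TypeCheckResult.TypeCheckFailure(
  s"cannot cast ${child.dataType.simpleString} to ${dataType.simpleString}")
```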
[GitHub] spark pull request #20061: [SPARK-22890][TEST] Basic tests for DateTimeOpera...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20061 [SPARK-22890][TEST] Basic tests for DateTimeOperations ## What changes were proposed in this pull request? Test coverage for `DateTimeOperations`; this is a sub-task of [SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722). ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22890 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20061.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20061 commit 24b50f0c8371af258ed152363a9ba8148b23d2d2 Author: Yuming Wang <wgyumg@...> Date: 2017-12-23T02:39:39Z Basic tests for DateTimeOperations commit e8e4d11a504c4169848baeabbec84af2a1b3e6a8 Author: Yuming Wang <wgyumg@...> Date: 2017-12-23T02:53:40Z Append a blank line --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20008 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCo...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20008#discussion_r158016336
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -252,7 +252,7 @@ case class SpecifiedWindowFrame(
       case e: Expression if !frameType.inputType.acceptsType(e.dataType) =>
         TypeCheckFailure(
           s"The data type of the $location bound '${e.dataType} does not match " +
-            s"the expected data type '${frameType.inputType}'.")
+            s"the expected data type '${frameType.inputType.simpleString}'.")
--- End diff --
Otherwise the result is:
```
cannot resolve 'RANGE BETWEEN CURRENT ROW AND CAST(1 AS STRING) FOLLOWING' due to data type mismatch: The data type of the upper bound 'StringType does not match the expected data type 'org.apache.spark.sql.types.TypeCollection@7ff36201'.; line 1 pos 21
```
--- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19804: [WIP][SPARK-22573][SQL] Shouldn't inferFilters if...
GitHub user wangyum reopened a pull request: https://github.com/apache/spark/pull/19804 [WIP][SPARK-22573][SQL] Shouldn't inferFilters if it contains SubqueryExpression ## What changes were proposed in this pull request? Shouldn't inferFilters if it contains SubqueryExpression. ## How was this patch tested? unit tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22573 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19804.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19804 commit c2f6a4986fd81d5f9ecacc3fc9a0a6d069a16216 Author: Yuming Wang <wgyumg@...> Date: 2017-11-23T17:09:18Z Shouldn't inferFilters if it contains SubqueryExpression commit 75e6787b644e635e67804abb69025c42b91d9337 Author: Yuming Wang <wgyumg@...> Date: 2017-12-20T06:42:33Z Merge remote-tracking branch 'upstream/master' into SPARK-22573 # Conflicts: # sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala # sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala commit edd0434b710a764c7be2ea94242dd7ea5ce6ace7 Author: Yuming Wang <wgyumg@...> Date: 2017-12-20T08:15:16Z RewritePredicateSubquery first --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/20008 retest this, please. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for FunctionArgum...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20008#discussion_r157920327
--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/functionArgumentConversion.sql ---
@@ -25,7 +25,7 @@ SELECT array(cast(1 as tinyint), cast(1 as float)) FROM t;
 SELECT array(cast(1 as tinyint), cast(1 as double)) FROM t;
 SELECT array(cast(1 as tinyint), cast(1 as decimal(10, 0))) FROM t;
 SELECT array(cast(1 as tinyint), cast(1 as string)) FROM t;
-SELECT array(cast(1 as tinyint), cast('1' as binary)) FROM t;
+SELECT size(array(cast(1 as tinyint), cast('1' as binary))) FROM t;
--- End diff --
Replace `array(cast(1 as tinyint), cast('1' as binary))` with `size(array(cast(1 as tinyint), cast('1' as binary)))` to avoid emitting a binary value inside the collection output. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20006: [SPARK-22821][TEST] Basic tests for WidenSetOpera...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20006#discussion_r157691771
--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/booleanEquality.sql ---
@@ -0,0 +1,122 @@
+--
+-- Licensed to the Apache Software Foundation (ASF) under one or more
+-- contributor license agreements. See the NOTICE file distributed with
+-- this work for additional information regarding copyright ownership.
+-- The ASF licenses this file to You under the Apache License, Version 2.0
+-- (the "License"); you may not use this file except in compliance with
+-- the License. You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+--
+
+CREATE TEMPORARY VIEW t AS SELECT 1;
+
+SELECT true = cast(1 as tinyint) FROM t;
+SELECT true = cast(1 as smallint) FROM t;
+SELECT true = cast(1 as int) FROM t;
+SELECT true = cast(1 as bigint) FROM t;
+SELECT true = cast(1 as float) FROM t;
+SELECT true = cast(1 as double) FROM t;
+SELECT true = cast(1 as decimal(10, 0)) FROM t;
+SELECT true = cast(1 as string) FROM t;
+SELECT true = cast('1' as binary) FROM t;
+SELECT true = cast(1 as boolean) FROM t;
+SELECT true = cast('2017-12-11 09:30:00.0' as timestamp) FROM t;
+SELECT true = cast('2017-12-11 09:30:00' as date) FROM t;
--- End diff --
I think we should keep both; we have usages like these: https://github.com/apache/spark/blob/6d7ebf2f9fbd043813738005a23c57a77eba6f47/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L486-L489 https://github.com/apache/spark/blob/6d7ebf2f9fbd043813738005a23c57a77eba6f47/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L134-L135 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for FunctionArgum...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/20008#discussion_r157651436
--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/decimalPrecision.sql ---
@@ -0,0 +1,6883 @@
+--
+-- Licensed to the Apache Software Foundation (ASF) under one or more
+-- contributor license agreements. See the NOTICE file distributed with
+-- this work for additional information regarding copyright ownership.
+-- The ASF licenses this file to You under the Apache License, Version 2.0
+-- (the "License"); you may not use this file except in compliance with
+-- the License. You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+--
+
+CREATE TEMPORARY VIEW t AS SELECT 1;
+
+SELECT cast(1 as tinyint) + cast(1 as decimal(1, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(3, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(4, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(5, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(6, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(10, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(11, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(20, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(21, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(38, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(39, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(1, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(2, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(3, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(4, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(5, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(6, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(10, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(11, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(20, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(21, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(38, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(39, 1)) FROM t;
--- End diff --
How about only these 4 decimals: `DECIMAL(3, 0)`, `DECIMAL(5, 0)`, `DECIMAL(10, 0)` and `DECIMAL(20, 0)`. https://github.com/apache/spark/blob/00d176d2fe7bbdf55cb3146a9cb04ca99b1858b7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala#L54-L57 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
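For reference, those four widths correspond to the minimal exact decimal representations of the integral types in `DecimalPrecision` (summarized from the linked lines):
```scala
import org.apache.spark.sql.types._

// Minimal DecimalType that holds each integral type exactly, as used by
// DecimalPrecision when mixing integral and decimal operands:
val integralToDecimal: Map[DataType, DecimalType] = Map(
  ByteType    -> DecimalType(3, 0),  // -128..127 fits in 3 digits
  ShortType   -> DecimalType(5, 0),  // -32768..32767 fits in 5 digits
  IntegerType -> DecimalType(10, 0),
  LongType    -> DecimalType(20, 0)
)
```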
[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for FunctionArgum...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20008 [SPARK-22822][TEST] Basic tests for FunctionArgumentConversion and DecimalPrecision ## What changes were proposed in this pull request? Test coverage for `FunctionArgumentConversion` and `DecimalPrecision`; this is a sub-task of [SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722). ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22822 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20008.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20008 commit 05acae313008220aaccec47c687a764f7e81bd02 Author: Yuming Wang <wgy...@gmail.com> Date: 2017-12-18T11:02:58Z Basic tests for FunctionArgumentConversion and DecimalPrecision --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20006: [SPARK-22821][TEST] Basic tests for WidenSetOpera...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20006 [SPARK-22821][TEST] Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and Division ## What changes were proposed in this pull request? Test coverage for `WidenSetOperationTypes`, `BooleanEquality`, `StackCoercion` and `Division`; this is a sub-task of [SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722). ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22821 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20006.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20006 commit 7ee9aeccfc5adc6107c33401ef0c5212a65d9577 Author: Yuming Wang <wgy...@gmail.com> Date: 2017-12-18T04:43:55Z Basic tests for WidenSetOperationTypes, BooleanEquality, StackCoercion and Division --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19714: [SPARK-22489][SQL] Shouldn't change broadcast join build...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19714 Can we backport this to branch-2.2? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20001: [SPARK-22762][TEST] Basic tests for PromoteString...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/20001 [SPARK-22762][TEST] Basic tests for PromoteStrings and InConversion ## What changes were proposed in this pull request? Test coverage for `PromoteStrings` and `InConversion`; this is a sub-task of [SPARK-22722](https://issues.apache.org/jira/browse/SPARK-22722). ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22816 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20001.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20001 commit 604f16f872f1cf3e008435577d4a4768711c63ed Author: Yuming Wang <wgy...@gmail.com> Date: 2017-12-16T12:48:44Z Basic tests for PromoteStrings and InConversion --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
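To give a flavor of the two coercion rules under test (illustrative queries only, not lifted from the patch; `t` is the usual one-row temporary view from these suites):
```sql
-- PromoteStrings: the string literal is promoted so the comparison resolves.
SELECT cast(1 as tinyint) = '1' FROM t;
-- InConversion: the IN list is coerced to a common type with the left side.
SELECT cast(1 as tinyint) IN ('1', cast(1 as smallint)) FROM t;
```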
[GitHub] spark issue #19949: [SPARK-22762][TEST] Basic tests for IfCoercion and CaseW...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19949 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19949: [SPARK-22762][TEST] Basic tests for IfCoercion an...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/19949#discussion_r157106530
--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/caseWhenCoercion.sql ---
@@ -0,0 +1,200 @@
+--
+-- Licensed to the Apache Software Foundation (ASF) under one or more
+-- contributor license agreements. See the NOTICE file distributed with
+-- this work for additional information regarding copyright ownership.
+-- The ASF licenses this file to You under the Apache License, Version 2.0
+-- (the "License"); you may not use this file except in compliance with
+-- the License. You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+--
+
+CREATE TEMPORARY VIEW t AS SELECT 1;
+
+SELECT CASE WHEN true THEN cast(1 as tinyint) ELSE cast(2 as tinyint) END FROM t;
--- End diff --
@gatorsmile Two questions: 1. Hive doesn't have the `short` type, so can we remove the `short` type here? 2. Hive can't execute `CREATE TEMPORARY VIEW ...` but can execute `CREATE TEMPORARY TABLE ...`. Should we add this feature to Spark? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19949: [SPARK-22762][TEST] Basic tests for IfCoercion and CaseW...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19949 @HyukjinKwon see: https://issues.apache.org/jira/browse/SPARK-22722 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19949: [SPARK-22762][TEST] Basic tests for IfCoercion an...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/19949 [SPARK-22762][TEST] Basic tests for IfCoercion and CaseWhenCoercion ## What changes were proposed in this pull request? Basic tests for IfCoercion and CaseWhenCoercion ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/wangyum/spark SPARK-22762 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/19949.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19949 commit f9da9103eacdad8a7d544e9d17b8a54d6b7e01c5 Author: Yuming Wang <wgy...@gmail.com> Date: 2017-12-12T07:59:11Z Basic tests for IfCoercion and CaseWhenCoercion --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19918: [SPARK-22726] [TEST] Basic tests for Binary Compa...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/19918#discussion_r155939348
--- Diff: sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/binaryComparison.sql ---
@@ -0,0 +1,287 @@
+--
+-- Licensed to the Apache Software Foundation (ASF) under one or more
+-- contributor license agreements. See the NOTICE file distributed with
+-- this work for additional information regarding copyright ownership.
+-- The ASF licenses this file to You under the Apache License, Version 2.0
+-- (the "License"); you may not use this file except in compliance with
+-- the License. You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing, software
+-- distributed under the License is distributed on an "AS IS" BASIS,
+-- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+-- See the License for the specific language governing permissions and
+-- limitations under the License.
+--
+
+-- Binary Comparison
+
+CREATE TEMPORARY VIEW t AS SELECT 1;
+
+SELECT cast(1 as binary) = '1' FROM t;
--- End diff --
It seems the binary comparison tests don't cover [`<=>`](https://github.com/apache/spark/blob/ced6ccf0d6f362e299f270ed2a474f2e14f845da/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala#L594). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
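For illustration, the null-safe variant the suite could also exercise (suggested queries, not taken from the patch):
```sql
-- <=> is null-safe equality: it returns false instead of NULL when exactly
-- one side is NULL, and otherwise behaves like =.
SELECT cast(1 as binary) <=> '1' FROM t;
SELECT cast(null as binary) <=> '1' FROM t;
```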
[GitHub] spark pull request #19932: [SPARK-22745][SQL] read partition stats from Hive
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/19932#discussion_r155935430
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -213,6 +213,29 @@ class StatisticsSuite extends StatisticsCollectionTestBase with TestHiveSingleto
     }
   }
 
+  test("SPARK- - read Hive's statistics for partition") {
--- End diff --
SPARK- -> SPARK-22745? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19841: [SPARK-22642][SQL] the createdTempDir will not be...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/19841#discussion_r154480032
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -104,147 +105,153 @@ case class InsertIntoHiveTable(
     val partitionColumns = fileSinkConf.getTableInfo.getProperties.getProperty("partition_columns")
     val partitionColumnNames = Option(partitionColumns).map(_.split("/")).getOrElse(Array.empty)
 
-    // By this time, the partition map must match the table's partition columns
-    if (partitionColumnNames.toSet != partition.keySet) {
-      throw new SparkException(
-        s"""Requested partitioning does not match the ${table.identifier.table} table:
-           |Requested partitions: ${partition.keys.mkString(",")}
-           |Table partitions: ${table.partitionColumnNames.mkString(",")}""".stripMargin)
-    }
-
-    // Validate partition spec if there exist any dynamic partitions
-    if (numDynamicPartitions > 0) {
-      // Report error if dynamic partitioning is not enabled
-      if (!hadoopConf.get("hive.exec.dynamic.partition", "true").toBoolean) {
-        throw new SparkException(ErrorMsg.DYNAMIC_PARTITION_DISABLED.getMsg)
+    def processInsert = {
+      // By this time, the partition map must match the table's partition columns
+      if (partitionColumnNames.toSet != partition.keySet) {
+        throw new SparkException(
+          s"""Requested partitioning does not match the ${table.identifier.table} table:
+             |Requested partitions: ${partition.keys.mkString(",")}
+             |Table partitions: ${table.partitionColumnNames.mkString(",")}""".stripMargin)
      }
 
-      // Report error if dynamic partition strict mode is on but no static partition is found
-      if (numStaticPartitions == 0 &&
-        hadoopConf.get("hive.exec.dynamic.partition.mode", "strict").equalsIgnoreCase("strict")) {
-        throw new SparkException(ErrorMsg.DYNAMIC_PARTITION_STRICT_MODE.getMsg)
-      }
+      // Validate partition spec if there exist any dynamic partitions
+      if (numDynamicPartitions > 0) {
+        // Report error if dynamic partitioning is not enabled
+        if (!hadoopConf.get("hive.exec.dynamic.partition", "true").toBoolean) {
+          throw new SparkException(ErrorMsg.DYNAMIC_PARTITION_DISABLED.getMsg)
+        }
+
+        // Report error if dynamic partition strict mode is on but no static partition is found
+        if (numStaticPartitions == 0 &&
+          hadoopConf.get("hive.exec.dynamic.partition.mode", "strict").equalsIgnoreCase("strict")) {
+          throw new SparkException(ErrorMsg.DYNAMIC_PARTITION_STRICT_MODE.getMsg)
+        }
 
-      // Report error if any static partition appears after a dynamic partition
-      val isDynamic = partitionColumnNames.map(partitionSpec(_).isEmpty)
-      if (isDynamic.init.zip(isDynamic.tail).contains((true, false))) {
-        throw new AnalysisException(ErrorMsg.PARTITION_DYN_STA_ORDER.getMsg)
+        // Report error if any static partition appears after a dynamic partition
+        val isDynamic = partitionColumnNames.map(partitionSpec(_).isEmpty)
+        if (isDynamic.init.zip(isDynamic.tail).contains((true, false))) {
+          throw new AnalysisException(ErrorMsg.PARTITION_DYN_STA_ORDER.getMsg)
+        }
      }
-    }
 
-    table.bucketSpec match {
-      case Some(bucketSpec) =>
-        // Writes to bucketed hive tables are allowed only if user does not care about maintaining
-        // table's bucketing ie. both "hive.enforce.bucketing" and "hive.enforce.sorting" are
-        // set to false
-        val enforceBucketingConfig = "hive.enforce.bucketing"
-        val enforceSortingConfig = "hive.enforce.sorting"
+      table.bucketSpec match {
+        case Some(bucketSpec) =>
+          // Writes to bucketed hive tables are allowed only if user does not care about maintaining
+          // table's bucketing ie. both "hive.enforce.bucketing" and "hive.enforce.sorting" are
+          // set to false
+          val enforceBucketingConfig = "hive.enforce.bucketing"
+          val enforceSortingConfig = "hive.enforce.sorting"
 
-        val message = s"Output Hive table ${table.identifier} is bucketed but Spark" +
-          "cur
[GitHub] spark issue #19858: [SPARK-22489][DOC][FOLLOWUP] Update broadcast behavior c...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19858 cc @gatorsmile --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org