[spark] branch master updated (b83304f -> d599807)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from b83304f  [SPARK-28796][DOC] Document DROP DATABASE statement in SQL Reference
   add d599807  [SPARK-28795][DOC][SQL] Document CREATE VIEW statement in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-ddl-create-view.md | 62 +-
 1 file changed, 61 insertions(+), 1 deletion(-)
[spark] branch master updated (ee63031 -> b83304f)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from ee63031  [SPARK-28828][DOC] Document REFRESH TABLE command
   add b83304f  [SPARK-28796][DOC] Document DROP DATABASE statement in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-ddl-drop-database.md | 60 +++-
 1 file changed, 59 insertions(+), 1 deletion(-)
[spark] branch master updated (5631a96 -> ee63031)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 5631a96  [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection
   add ee63031  [SPARK-28828][DOC] Document REFRESH TABLE command

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml                 | 2 ++
 docs/sql-ref-syntax-aux-refresh-table.md | 58
 2 files changed, 60 insertions(+)
 create mode 100644 docs/sql-ref-syntax-aux-refresh-table.md
[spark] branch master updated (c56a012 -> 5631a96)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from c56a012  [SPARK-29060][SQL] Add tree traversal helper for adaptive spark plans
   add 5631a96  [SPARK-29048] Improve performance on Column.isInCollection() with a large size collection

No new revisions were added by this update.

Summary of changes:
 .../main/scala/org/apache/spark/sql/Column.scala | 10 -
 .../apache/spark/sql/ColumnExpressionSuite.scala | 45 ++
 2 files changed, 37 insertions(+), 18 deletions(-)
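The change added above (SPARK-29048) targets Column.isInCollection, which filters a column against a driver-side collection and became expensive to plan when that collection was large. A minimal usage sketch, assuming only a local SparkSession; the data set and the size of the collection are illustrative:

import org.apache.spark.sql.SparkSession

object IsInCollectionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("isInCollection sketch").getOrCreate()
    import spark.implicits._

    val df = spark.range(0, 100000).toDF("id")

    // A large driver-side collection; SPARK-29048 is about keeping the
    // planning cost of this predicate reasonable as the collection grows.
    val wanted: Seq[Long] = 0L until 50000L by 3L
    df.filter($"id".isInCollection(wanted)).count()

    spark.stop()
  }
}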
[spark] branch master updated (c839d09 -> 125af78d)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from c839d09  [SPARK-28773][DOC][SQL] Handling of NULL data in Spark SQL
   add 125af78d [SPARK-28831][DOC][SQL] Document CLEAR CACHE statement in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-cache-clear-cache.md   | 18 +-
 docs/sql-ref-syntax-aux-cache-uncache-table.md | 4 ++--
 docs/sql-ref-syntax-aux-cache.md               | 15 +++
 3 files changed, 26 insertions(+), 11 deletions(-)
[spark] branch master updated (6378d4b -> c839d09)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 6378d4b  [SPARK-28980][CORE][SQL][STREAMING][MLLIB] Remove most items deprecated in Spark 2.2.0 or earlier, for Spark 3
   add c839d09  [SPARK-28773][DOC][SQL] Handling of NULL data in Spark SQL

No new revisions were added by this update.

Summary of changes:
 docs/_data/menu-sql.yaml       | 2 +
 docs/sql-ref-null-semantics.md | 703 +
 2 files changed, 705 insertions(+)
 create mode 100644 docs/sql-ref-null-semantics.md
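The null-semantics page added above covers how comparison and logical operators treat NULL, which is easy to see directly. A short sketch of the documented behaviour, assuming only a local SparkSession:

import org.apache.spark.sql.SparkSession

object NullSemanticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Ordinary comparisons with NULL yield NULL (unknown), not true or false.
    spark.sql("SELECT NULL = NULL AS eq, 5 <> NULL AS neq").show()

    // IS NULL and the null-safe equality operator <=> always yield true/false.
    spark.sql("SELECT NULL IS NULL AS is_null, NULL <=> NULL AS null_safe_eq").show()

    spark.stop()
  }
}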
[spark] branch master updated (d4eca7c -> 4a3a6b6)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from d4eca7c  [SPARK-29000][SQL] Decimal precision overflow when don't allow precision loss
   add 4a3a6b6  [SPARK-28637][SQL] Thriftserver support interval type

No new revisions were added by this update.

Summary of changes:
 .../thriftserver/SparkExecuteStatementOperation.scala   | 9 -
 .../sql/hive/thriftserver/HiveThriftServer2Suites.scala | 15 +++
 .../SparkThriftServerProtocolVersionsSuite.scala        | 4 ++--
 3 files changed, 25 insertions(+), 3 deletions(-)
[spark] branch branch-2.4 updated: Revert "[SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()"
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 0a4b356 Revert "[SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()" 0a4b356 is described below commit 0a4b35642ffa3020ec0fcae2cca59376e2095636 Author: Xiao Li AuthorDate: Fri Sep 6 23:37:36 2019 -0700 Revert "[SPARK-28912][STREAMING] Fixed MatchError in getCheckpointFiles()" This reverts commit 2654c33fd6a7a09e2b2fa9fc1c2ea6224ab292e6. --- .../scala/org/apache/spark/streaming/Checkpoint.scala | 4 ++-- .../org/apache/spark/streaming/CheckpointSuite.scala| 17 - 2 files changed, 2 insertions(+), 19 deletions(-) diff --git a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala index b081287..a882558 100644 --- a/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala +++ b/streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala @@ -128,8 +128,8 @@ object Checkpoint extends Logging { try { val statuses = fs.listStatus(path) if (statuses != null) { -val paths = statuses.filterNot(_.isDirectory).map(_.getPath) -val filtered = paths.filter(p => REGEX.findFirstIn(p.getName).nonEmpty) +val paths = statuses.map(_.getPath) +val filtered = paths.filter(p => REGEX.findFirstIn(p.toString).nonEmpty) filtered.sortWith(sortFunc) } else { logWarning(s"Listing $path returned null") diff --git a/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala b/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala index 43e3cdd..19b621f 100644 --- a/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala +++ b/streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala @@ -846,23 +846,6 @@ class CheckpointSuite extends TestSuiteBase with DStreamCheckpointTester checkpointWriter.stop() } - test("SPARK-28912: Fix MatchError in getCheckpointFiles") { -withTempDir { tempDir => - val fs = FileSystem.get(tempDir.toURI, new Configuration) - val checkpointDir = tempDir.getAbsolutePath + "/checkpoint-01" - - assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0) - - // Ignore files whose parent path match. - fs.create(new Path(checkpointDir, "this-is-matched-before-due-to-parent-path")).close() - assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0) - - // Ignore directories whose names match. - fs.mkdirs(new Path(checkpointDir, "checkpoint-10")) - assert(Checkpoint.getCheckpointFiles(checkpointDir, Some(fs)).length === 0) -} - } - test("SPARK-6847: stack overflow when updateStateByKey is followed by a checkpointed dstream") { // In this test, there are two updateStateByKey operators. The RDD DAG is as follows: // - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ff5fa58 -> 89aba69)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from ff5fa58  [SPARK-21870][SQL][FOLLOW-UP] Clean up string template formats for generated code in HashAggregateExec
   add 89aba69  [SPARK-28935][SQL][DOCS] Document SQL metrics for Details for Query Plan

No new revisions were added by this update.

Summary of changes:
 docs/web-ui.md | 35 +++
 1 file changed, 35 insertions(+)
[spark] branch master updated (67b4329 -> b2f0660)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 67b4329  [SPARK-28690][SQL] Add `date_part` function for timestamps/dates
   add b2f0660  [SPARK-29002][SQL] Avoid changing SMJ to BHJ if the build side has a high ratio of empty partitions

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/plans/logical/hints.scala   | 8 +++
 .../org/apache/spark/sql/internal/SQLConf.scala    | 12 +
 .../spark/sql/execution/SparkStrategies.scala      | 4 +-
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 4 +-
 .../adaptive/DemoteBroadcastHashJoin.scala         | 60 ++
 .../adaptive/AdaptiveQueryExecSuite.scala          | 29 ++-
 6 files changed, 114 insertions(+), 3 deletions(-)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/DemoteBroadcastHashJoin.scala
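The DemoteBroadcastHashJoin rule added above keeps adaptive execution from rewriting a sort-merge join into a broadcast hash join when the candidate build side is dominated by empty partitions and its size estimate is therefore misleading. A hedged sketch of the setup in which the rule applies; the data is artificial, and only the standard AQE switch is shown (the empty-partition ratio threshold introduced by this commit has its own SQLConf key, not reproduced here):

import org.apache.spark.sql.SparkSession

object AqeDemoteBhjSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // With AQE on, runtime statistics can turn a sort-merge join into a
    // broadcast hash join mid-query.
    spark.conf.set("spark.sql.adaptive.enabled", "true")

    val big = spark.range(0, 1000000).toDF("id")
    // A side that is small overall but consists mostly of empty partitions.
    val sparse = Seq(1L, 2L, 3L).toDF("id").repartition(200)

    // SPARK-29002: when the build side has a high ratio of empty partitions,
    // AQE now keeps the sort-merge join instead of switching to broadcast.
    big.join(sparse, "id").count()

    spark.stop()
  }
}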
[spark] branch master updated (5adaa2e -> e4f7002)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 5adaa2e  [SPARK-28979][SQL] Rename UnresovledTable to V1Table
   add e4f7002  [SPARK-28830][DOC][SQL] Document UNCACHE TABLE statement in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-cache-uncache-table.md | 24 +---
 1 file changed, 21 insertions(+), 3 deletions(-)
[spark] branch master updated (f96486b -> a7a3935)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from f96486b  [SPARK-28808][DOCS][SQL] Document SHOW FUNCTIONS in SQL Reference
   add a7a3935  [SPARK-11150][SQL] Dynamic Partition Pruning

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/unsafe/map/BytesToBytesMap.java   | 7 +
 .../sql/catalyst/expressions/DynamicPruning.scala  | 95 ++
 .../sql/catalyst/expressions/predicates.scala      | 39 +-
 .../spark/sql/catalyst/optimizer/Optimizer.scala   | 4 +-
 .../org/apache/spark/sql/internal/SQLConf.scala    | 43 +
 .../CleanupDynamicPruningFilters.scala             | 51 +
 .../sql/dynamicpruning/PartitionPruning.scala      | 259
 .../dynamicpruning/PlanDynamicPruningFilters.scala | 91 ++
 .../spark/sql/execution/DataSourceScanExec.scala   | 69 +-
 .../spark/sql/execution/QueryExecution.scala       | 3 +-
 .../spark/sql/execution/SparkOptimizer.scala       | 11 +-
 .../sql/execution/SubqueryBroadcastExec.scala      | 108 ++
 .../adaptive/InsertAdaptiveSparkPlan.scala         | 6 +-
 .../execution/datasources/DataSourceStrategy.scala | 2 +-
 .../datasources/PruneFileSourcePartitions.scala    | 6 +-
 .../spark/sql/execution/joins/HashJoin.scala       | 22 +-
 .../spark/sql/execution/joins/HashedRelation.scala | 84 +-
 .../org/apache/spark/sql/execution/subquery.scala  | 81 +-
 .../spark/sql/DynamicPartitionPruningSuite.scala   | 1293
 .../sql/execution/joins/HashedRelationSuite.scala  | 199 ++-
 20 files changed, 2447 insertions(+), 26 deletions(-)
 create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/DynamicPruning.scala
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/dynamicpruning/CleanupDynamicPruningFilters.scala
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/dynamicpruning/PartitionPruning.scala
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/dynamicpruning/PlanDynamicPruningFilters.scala
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryBroadcastExec.scala
 create mode 100644 sql/core/src/test/scala/org/apache/spark/sql/DynamicPartitionPruningSuite.scala
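Dynamic partition pruning, merged above, turns a selective filter on the dimension side of a join into a runtime subquery that skips partitions of the partitioned fact table. A hedged sketch of the query shape the feature targets; the table names are invented, and the config key shown is the one this feature uses in Spark 3.0, so treat it as an assumption here:

import org.apache.spark.sql.SparkSession

object DynamicPartitionPruningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Assumed config key; the feature is on by default in Spark 3.0.
    spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")

    spark.sql("CREATE TABLE dim_date (id BIGINT, year INT) USING parquet")
    spark.sql(
      "CREATE TABLE fact_sales (amount DOUBLE, date_id BIGINT) USING parquet PARTITIONED BY (date_id)")

    // The filter on dim_date is evaluated at runtime and reused to prune
    // fact_sales partitions that cannot satisfy the join.
    spark.sql(
      """SELECT f.date_id, SUM(f.amount)
        |FROM fact_sales f JOIN dim_date d ON f.date_id = d.id
        |WHERE d.year = 2019
        |GROUP BY f.date_id""".stripMargin).explain()

    spark.stop()
  }
}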
[spark] branch master updated: [SPARK-28808][DOCS][SQL] Document SHOW FUNCTIONS in SQL Reference
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f96486b [SPARK-28808][DOCS][SQL] Document SHOW FUNCTIONS in SQL Reference f96486b is described below commit f96486b4aaba36a0f843c8a52801c305b0fa2b16 Author: Dilip Biswal AuthorDate: Wed Sep 4 11:47:10 2019 -0700 [SPARK-28808][DOCS][SQL] Document SHOW FUNCTIONS in SQL Reference ### What changes were proposed in this pull request? Document SHOW FUNCTIONS statement in SQL Reference Guide. ### Why are the changes needed? Currently Spark lacks documentation on the supported SQL constructs causing confusion among users who sometimes have to look at the code to understand the usage. This is aimed at addressing this issue. ### Does this PR introduce any user-facing change? Yes. **Before:** There was no documentation for this. **After.** ![image](https://user-images.githubusercontent.com/11567269/64281840-e3cc0f00-cf08-11e9-9784-f01392276130.png) https://user-images.githubusercontent.com/11567269/64281911-0fe79000-cf09-11e9-955f-21b44590707c.png";> https://user-images.githubusercontent.com/11567269/64281916-12e28080-cf09-11e9-9187-688c2c751559.png";> ### How was this patch tested? Tested using jykyll build --serve Closes #25539 from dilipbiswal/ref-doc-show-functions. Lead-authored-by: Dilip Biswal Co-authored-by: Xiao Li Signed-off-by: Xiao Li --- docs/sql-ref-syntax-aux-describe-function.md | 2 +- docs/sql-ref-syntax-aux-show-functions.md| 119 ++- 2 files changed, 119 insertions(+), 2 deletions(-) diff --git a/docs/sql-ref-syntax-aux-describe-function.md b/docs/sql-ref-syntax-aux-describe-function.md index 1c50708..d3dc192 100644 --- a/docs/sql-ref-syntax-aux-describe-function.md +++ b/docs/sql-ref-syntax-aux-describe-function.md @@ -34,7 +34,7 @@ metadata information is returned along with the extended usage information. function_name -Specifies a name of an existing function in the syetem. The function name may be +Specifies a name of an existing function in the system. The function name may be optionally qualified with a database name. If `function_name` is qualified with a database then the function is resolved from the user specified database, otherwise it is resolved from the current database. diff --git a/docs/sql-ref-syntax-aux-show-functions.md b/docs/sql-ref-syntax-aux-show-functions.md index ae689fd..db02607 100644 --- a/docs/sql-ref-syntax-aux-show-functions.md +++ b/docs/sql-ref-syntax-aux-show-functions.md @@ -19,4 +19,121 @@ license: | limitations under the License. --- -**This page is under construction** +### Description +Returns the list of functions after applying an optional regex pattern. +Given number of functions supported by Spark is quite large, this statement +in conjuction with [describe function](sql-ref-syntax-aux-describe-function.html) +may be used to quickly find the function and understand its usage. The `LIKE` +clause is optional and supported only for compatibility with other systems. + +### Syntax +{% highlight sql %} +SHOW [ function_kind ] FUNCTIONS ([LIKE] function_name | regex_pattern) +{% endhighlight %} + +### Parameters + + function_kind + +Specifies the name space of the function to be searched upon. The valid name spaces are : + + USER - Looks up the function(s) among the user defined functions. + SYSTEM - Looks up the function(s) among the system defined functions. 
+ ALL - Looks up the function(s) among both user and system defined functions. + + + function_name + +Specifies a name of an existing function in the system. The function name may be +optionally qualified with a database name. If `function_name` is qualified with +a database then the function is resolved from the user specified database, otherwise +it is resolved from the current database. +Syntax: + +[database_name.]function_name + + + regex_pattern + +Specifies a regular expression pattern that is used to limit the results of the +statement. + + Only `*` and `|` are allowed as wildcard pattern. + Excluding `*` and `|` the remaining pattern follows the regex semantics. + The leading and trailing blanks are trimmed in the input pattern before processing. + + + + +### Examples +{% highlight sql %} +-- List a system function `trim` by searching both user defined and system +-- defined functions. +SHOW FUNCTIONS trim; + ++ + |function| + ++ + |trim| + ++ + +-- List a system function `concat` by searching system defined functions. +SHOW SYSTEM
[spark] branch master updated (594c9c5 -> b992160)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 594c9c5  [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
   add b992160  [SPARK-28811][DOCS][SQL] Document SHOW TBLPROPERTIES in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-show-tblproperties.md | 94 ++-
 1 file changed, 93 insertions(+), 1 deletion(-)
[spark] branch master updated (a07f795 -> 9f478a6)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from a07f795  [SPARK-28577][YARN] Resource capability requested for each executor add offHeapMemorySize
   add 9f478a6  [SPARK-28901][SQL] SparkThriftServer's Cancel SQL Operation show it in JDBC Tab UI

No new revisions were added by this update.

Summary of changes:
 .../sql/hive/thriftserver/HiveThriftServer2.scala | 48 +++-
 .../SparkExecuteStatementOperation.scala          | 89 ++
 2 files changed, 87 insertions(+), 50 deletions(-)
[spark] branch master updated (8980093 -> 56f2887)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 8980093  [SPARK-3137][CORE] Replace the global TorrentBroadcast lock with fine grained KeyLock
   add 56f2887  [SPARK-28788][DOC][SQL] Document ANALYZE TABLE statement in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-analyze-table.md | 76 +++-
 docs/sql-ref-syntax-aux-analyze.md       | 13 +++---
 2 files changed, 80 insertions(+), 9 deletions(-)
[spark] branch master updated: [SPARK-28961][HOT-FIX][BUILD] Upgrade Maven from 3.6.1 to 3.6.2
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2856398 [SPARK-28961][HOT-FIX][BUILD] Upgrade Maven from 3.6.1 to 3.6.2 2856398 is described below commit 2856398de9d35d136758bbc11afa4d1dc0c98830 Author: Xiao Li AuthorDate: Tue Sep 3 11:06:57 2019 -0700 [SPARK-28961][HOT-FIX][BUILD] Upgrade Maven from 3.6.1 to 3.6.2 ### What changes were proposed in this pull request? This PR is to upgrade the maven dependence from 3.6.1 to 3.6.2. ### Why are the changes needed? All the builds are broken because 3.6.1 is not available. http://ftp.wayne.edu/apache//maven/maven-3/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-3.2/485/ - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-compile-maven-hadoop-2.7/10536/ ![image](https://user-images.githubusercontent.com/11567269/64196667-36d69100-ce39-11e9-8f93-40eb333d595d.png) ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #25665 from gatorsmile/upgradeMVN. Authored-by: Xiao Li Signed-off-by: Xiao Li --- dev/appveyor-install-dependencies.ps1 | 2 +- docs/building-spark.md| 2 +- pom.xml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/dev/appveyor-install-dependencies.ps1 b/dev/appveyor-install-dependencies.ps1 index 85e0df6..d33a107 100644 --- a/dev/appveyor-install-dependencies.ps1 +++ b/dev/appveyor-install-dependencies.ps1 @@ -81,7 +81,7 @@ if (!(Test-Path $tools)) { # == Maven Push-Location $tools -$mavenVer = "3.6.1" +$mavenVer = "3.6.2" Start-FileDownload "https://archive.apache.org/dist/maven/maven-3/$mavenVer/binaries/apache-maven-$mavenVer-bin.zip"; "maven.zip" # extract diff --git a/docs/building-spark.md b/docs/building-spark.md index 1f8e51f..37f8986 100644 --- a/docs/building-spark.md +++ b/docs/building-spark.md @@ -27,7 +27,7 @@ license: | ## Apache Maven The Maven-based build is the build of reference for Apache Spark. -Building Spark using Maven requires Maven 3.6.1 and Java 8. +Building Spark using Maven requires Maven 3.6.2 and Java 8. Spark requires Scala 2.12; support for Scala 2.11 was removed in Spark 3.0.0. ### Setting up Maven's Memory Usage diff --git a/pom.xml b/pom.xml index 6544b50..e2a3559 100644 --- a/pom.xml +++ b/pom.xml @@ -118,7 +118,7 @@ 1.8 ${java.version} ${java.version} -3.6.1 +3.6.2 spark 1.7.16 1.2.17 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (92ae271 -> 94e6674)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 92ae271  [SPARK-28806][DOCS][SQL] Document SHOW COLUMNS in SQL Reference
   add 94e6674  [SPARK-28805][DOCS][SQL] Document DESCRIBE FUNCTION in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-describe-function.md | 92 +++-
 1 file changed, 91 insertions(+), 1 deletion(-)
[spark] branch master updated (5cf2602 -> 92ae271)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 5cf2602  [SPARK-28946][R][DOCS] Add some more information about building SparkR on Windows
   add 92ae271  [SPARK-28806][DOCS][SQL] Document SHOW COLUMNS in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-show-columns.md | 77 -
 1 file changed, 76 insertions(+), 1 deletion(-)
[spark] branch master updated (eb037a8 -> 585954d)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from eb037a8  [SPARK-28855][CORE][ML][SQL][STREAMING] Remove outdated usages of Experimental, Evolving annotations
   add 585954d  [SPARK-28790][DOC][SQL] Document CACHE TABLE statement in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-cache-cache-table.md | 63 +++-
 1 file changed, 62 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-28786][DOC][SQL][FOLLOW-UP] Change "Related Statements" to bold
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b85a554 [SPARK-28786][DOC][SQL][FOLLOW-UP] Change "Related Statements" to bold b85a554 is described below commit b85a554487540bbbae1c62ee6ddd315d6760ea6a Author: Huaxin Gao AuthorDate: Sat Aug 31 14:58:41 2019 -0700 [SPARK-28786][DOC][SQL][FOLLOW-UP] Change "Related Statements" to bold ### What changes were proposed in this pull request? Change "Related Statements" to bold ### Why are the changes needed? To make doc look nice and consistent. ### Does this PR introduce any user-facing change? Yes ### How was this patch tested? Tested using jykyll build --serve Before the change: ![image](https://user-images.githubusercontent.com/13592258/63965303-ae797a00-ca4d-11e9-8a85-71fbfdeaaccb.png) After the change: ![image](https://user-images.githubusercontent.com/13592258/63965316-b76a4b80-ca4d-11e9-9a85-48d7a909f0ef.png) Before the change: ![image](https://user-images.githubusercontent.com/13592258/63988989-7c8b0680-ca93-11e9-9352-a9ec5457b279.png) After the change: ![image](https://user-images.githubusercontent.com/13592258/63988996-87459b80-ca93-11e9-9e51-8cb36a632436.png) Closes #25623 from huaxingao/spark-28786-n. Authored-by: Huaxin Gao Signed-off-by: Xiao Li --- docs/sql-ref-syntax-dml-insert-into.md | 11 --- ...ql-ref-syntax-dml-insert-overwrite-directory-hive.md | 2 +- docs/sql-ref-syntax-dml-insert-overwrite-directory.md | 2 +- docs/sql-ref-syntax-dml-insert-overwrite-table.md | 17 ++--- 4 files changed, 12 insertions(+), 20 deletions(-) diff --git a/docs/sql-ref-syntax-dml-insert-into.md b/docs/sql-ref-syntax-dml-insert-into.md index 1bb043d..890e30b 100644 --- a/docs/sql-ref-syntax-dml-insert-into.md +++ b/docs/sql-ref-syntax-dml-insert-into.md @@ -95,9 +95,8 @@ INSERT INTO [ TABLE ] table_name {% endhighlight %} Insert Using a SELECT Statement -Assuming the `persons` table has already been created and populated. - {% highlight sql %} + -- Assuming the persons table has already been created and populated. SELECT * FROM persons; + -- + -- + -- + @@ -127,9 +126,8 @@ Assuming the `persons` table has already been created and populated. {% endhighlight %} Insert Using a TABLE Statement -Assuming the `visiting_students` table has already been created and populated. - {% highlight sql %} + -- Assuming the visiting_students table has already been created and populated. SELECT * FROM visiting_students; + -- + -- + -- + @@ -162,9 +160,8 @@ Assuming the `visiting_students` table has already been created and populated. {% endhighlight %} Insert Using a FROM Statement -Assuming the `applicants` table has already been created and populated. - {% highlight sql %} + -- Assuming the applicants table has already been created and populated. SELECT * FROM applicants; + -- + -- + -- + -- + @@ -203,7 +200,7 @@ Assuming the `applicants` table has already been created and populated. 
+ -- + -- + -- + {% endhighlight %} -Related Statements: +### Related Statements * [INSERT OVERWRITE statement](sql-ref-syntax-dml-insert-overwrite-table.html) * [INSERT OVERWRITE DIRECTORY statement](sql-ref-syntax-dml-insert-overwrite-directory.html) * [INSERT OVERWRITE DIRECTORY with Hive format statement](sql-ref-syntax-dml-insert-overwrite-directory-hive.html) \ No newline at end of file diff --git a/docs/sql-ref-syntax-dml-insert-overwrite-directory-hive.md b/docs/sql-ref-syntax-dml-insert-overwrite-directory-hive.md index f09da80..784e631 100644 --- a/docs/sql-ref-syntax-dml-insert-overwrite-directory-hive.md +++ b/docs/sql-ref-syntax-dml-insert-overwrite-directory-hive.md @@ -81,7 +81,7 @@ INSERT OVERWRITE [ LOCAL ] DIRECTORY directory_path SELECT * FROM test_table; {% endhighlight %} -Related Statements: +### Related Statements * [INSERT INTO statement](sql-ref-syntax-dml-insert-into.html) * [INSERT OVERWRITE statement](sql-ref-syntax-dml-insert-overwrite-table.html) * [INSERT OVERWRITE DIRECTORY statement](sql-ref-syntax-dml-insert-overwrite-directory.html) \ No newline at end of file diff --git a/docs/sql-ref-syntax-dml-insert-overwrite-directory.md b/docs/sql-ref-syntax-dml-insert-overwrite-directory.md index 8262d9e..89d58e7 100644 --- a/docs/sql-ref-syntax-dml-insert-overwrite-directory.md +++ b/docs/sql-ref-syntax-dml-insert-overwrite-directo
[spark] branch master updated: [SPARK-28803][DOCS][SQL] Document DESCRIBE TABLE in SQL Reference
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new b4d7b30 [SPARK-28803][DOCS][SQL] Document DESCRIBE TABLE in SQL Reference b4d7b30 is described below commit b4d7b30aa62792e7dfaa1318d0ef608286fde806 Author: Dilip Biswal AuthorDate: Sat Aug 31 14:46:55 2019 -0700 [SPARK-28803][DOCS][SQL] Document DESCRIBE TABLE in SQL Reference ### What changes were proposed in this pull request? Document DESCRIBE TABLE statement in SQL Reference Guide. ### Why are the changes needed? Currently Spark lacks documentation on the supported SQL constructs causing confusion among users who sometimes have to look at the code to understand the usage. This is aimed at addressing this issue. ### Does this PR introduce any user-facing change? Yes. **Before:** There was no documentation for this. **After.** https://user-images.githubusercontent.com/14225158/64069071-f556a380-cbf6-11e9-985d-13dd37a32bbb.png";> https://user-images.githubusercontent.com/14225158/64069073-f982c100-cbf6-11e9-925b-eb2fc85c3341.png";> https://user-images.githubusercontent.com/14225158/64069076-0ef7eb00-cbf7-11e9-8062-9a9fb8700bb3.png";> https://user-images.githubusercontent.com/14225158/64069077-0f908180-cbf7-11e9-9a31-9b7f122db2d3.png";> https://user-images.githubusercontent.com/14225158/64069078-0f908180-cbf7-11e9-96ee-438a7b64c961.png";> https://user-images.githubusercontent.com/14225158/64069079-0f908180-cbf7-11e9-9bae-734a1994f936.png";> ### How was this patch tested? Tested using jykyll build --serve Closes #25527 from dilipbiswal/ref-doc-desc-table. Lead-authored-by: Dilip Biswal Co-authored-by: Xiao Li Signed-off-by: Xiao Li --- docs/sql-ref-syntax-aux-describe-table.md | 163 +- 1 file changed, 162 insertions(+), 1 deletion(-) diff --git a/docs/sql-ref-syntax-aux-describe-table.md b/docs/sql-ref-syntax-aux-describe-table.md index 110a5e4..e2cb0e4 100644 --- a/docs/sql-ref-syntax-aux-describe-table.md +++ b/docs/sql-ref-syntax-aux-describe-table.md @@ -18,5 +18,166 @@ license: | See the License for the specific language governing permissions and limitations under the License. --- +### Description +`DESCRIBE TABLE` statement returns the basic metadata information of a +table. The metadata information includes column name, column type +and column comment. Optionally a partition spec or column name may be specified +to return the metadata pertaining to a partition or column respectively. -**This page is under construction** +### Syntax +{% highlight sql %} +{DESC | DESCRIBE} [TABLE] [format] table_identifier [partition_spec] [col_name] +{% endhighlight %} + +### Parameters + + format + +Specifies the optional format of describe output. If `EXTENDED` is specified +then additional metadata information (such as parent database, owner, and access time) +is returned. + + table_identifier + +Specifies a table name, which may be optionally qualified with a database name. +Syntax: + +[database_name.]table_name + + + partition_spec + +An optional parameter that specifies a comma separated list of key and value pairs +for paritions. When specified, additional partition metadata is returned. +Syntax: + +PARTITION (partition_col_name = partition_col_val [ , ... ]) + + + col_name + +An optional paramter that specifies the column name that needs to be described. +The supplied column name may be optionally qualified. 
Parameters `partition_spec` +and `col_name` are mutually exclusive and can not be specified together. Currently +nested columns are not allowed to be specified. + +Syntax: + +[database_name.][table_name.]column_name + + + + +### Examples +{% highlight sql %} +-- Creates a table `customer`. Assumes current database is `salesdb`. +CREATE TABLE customer( +cust_id INT, +state VARCHAR(20), +name STRING COMMENT 'Short name' + ) + USING parquet + PARTITION BY state; + ; + +-- Returns basic metadata information for unqualified table `customer` +DESCRIBE TABLE customer; + +---+-+--+ + |col_name |data_type|comment | + +---+-+--+ + |cust_id|int |null | + |name |string |Short name| + |state |string |null | + |# Partition Information| | | + |# col_name |data_type|comment | + |state |string |null | + +---+-+--+ +
[spark] branch master updated (7cc0f0e -> a08f33b)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 7cc0f0e  [SPARK-28894][SQL][TESTS] Add a clue to make it easier to debug via Jenkins's test results
   add a08f33b  [SPARK-28804][DOCS][SQL] Document DESCRIBE QUERY in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-aux-describe-query.md | 87 ++-
 1 file changed, 86 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-28807][DOCS][SQL] Document SHOW DATABASES in SQL Reference
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new fb1053d [SPARK-28807][DOCS][SQL] Document SHOW DATABASES in SQL Reference fb1053d is described below commit fb1053d14a3691695539304457b50e7342020c1f Author: Dilip Biswal AuthorDate: Thu Aug 29 09:04:27 2019 -0700 [SPARK-28807][DOCS][SQL] Document SHOW DATABASES in SQL Reference ### What changes were proposed in this pull request? Document SHOW DATABASES statement in SQL Reference Guide. ### Why are the changes needed? Currently Spark lacks documentation on the supported SQL constructs causing confusion among users who sometimes have to look at the code to understand the usage. This is aimed at addressing this issue. ### Does this PR introduce any user-facing change? Yes. **Before:** There was no documentation for this. **After.** https://user-images.githubusercontent.com/14225158/63916727-dd600380-c9ed-11e9-8372-789110c9d2dc.png";> https://user-images.githubusercontent.com/14225158/63916734-e0f38a80-c9ed-11e9-8ad4-d854febeaab8.png";> https://user-images.githubusercontent.com/14225158/63916740-e4871180-c9ed-11e9-9cfc-199cd8a64852.png";> ### How was this patch tested? Tested using jykyll build --serve Closes #25526 from dilipbiswal/ref-doc-show-db. Authored-by: Dilip Biswal Signed-off-by: Xiao Li --- docs/sql-ref-syntax-aux-show-databases.md | 63 +-- 1 file changed, 60 insertions(+), 3 deletions(-) diff --git a/docs/sql-ref-syntax-aux-show-databases.md b/docs/sql-ref-syntax-aux-show-databases.md index e7aedf8..b1c48da 100644 --- a/docs/sql-ref-syntax-aux-show-databases.md +++ b/docs/sql-ref-syntax-aux-show-databases.md @@ -1,7 +1,7 @@ --- layout: global -title: SHOW DATABASE -displayTitle: SHOW DATABASE +title: SHOW DATABASES +displayTitle: SHOW DATABASES license: | Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with @@ -19,4 +19,61 @@ license: | limitations under the License. --- -**This page is under construction** +### Description +Lists the databases that match an optionally supplied string pattern. If no +pattern is supplied then the command lists all the databases in the system. +Please note that the usage of `SCHEMAS` and `DATABASES` are interchangable +and mean the same thing. + +### Syntax +{% highlight sql %} +SHOW {DATABASES|SCHEMAS} [LIKE string_pattern] +{% endhighlight %} + +### Parameters + + LIKE string_pattern + +Specifies a string pattern that is used to match the databases in the system. In +the specified string pattern '*' matches any number of characters. + + + +### Examples +{% highlight sql %} +-- Create database. Assumes a database named `default` already exists in +-- the system. +CREATE DATABASE payroll_db; +CREATE DATABASE payments_db; + +-- Lists all the databases. +SHOW DATABASES; + ++ + |databaseName| + ++ + | default| + | payments_db| + | payroll_db| + ++ +-- Lists databases with name starting with string pattern `pay` +SHOW DATABASES LIKE 'pay*'; + ++ + |databaseName| + ++ + | payments_db| + | payroll_db| + ++ +-- Lists all databases. Keywords SCHEMAS and DATABASES are interchangeable. 
+SHOW SCHEMAS; + ++ + |databaseName| + ++ + | default| + | payments_db| + | payroll_db| + ++ +{% endhighlight %} +### Related Statements +- [DESCRIBE DATABASE](sql-ref-syntax-aux-describe-databases.html) +- [CREATE DATABASE](sql-ref-syntax-ddl-create-database.html) +- [ALTER DATABASE](sql-ref-syntax-ddl-alter-database.html) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (2465558 -> 3e09a0f)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 2465558  [SPARK-28495][SQL][FOLLOW-UP] Disallow conversions between timestamp and long in ASNI mode
   add 3e09a0f  [SPARK-28786][DOC][SQL] Document INSERT statement in SQL Reference

No new revisions were added by this update.

Summary of changes:
 docs/sql-ref-syntax-dml-insert-into.md             | 209 +
 ...f-syntax-dml-insert-overwrite-directory-hive.md | 87 +
 ...ql-ref-syntax-dml-insert-overwrite-directory.md | 85 +
 docs/sql-ref-syntax-dml-insert-overwrite-table.md  | 191 +++
 docs/sql-ref-syntax-dml-insert.md                  | 12 +-
 5 files changed, 580 insertions(+), 4 deletions(-)
 create mode 100644 docs/sql-ref-syntax-dml-insert-into.md
 create mode 100644 docs/sql-ref-syntax-dml-insert-overwrite-directory-hive.md
 create mode 100644 docs/sql-ref-syntax-dml-insert-overwrite-directory.md
 create mode 100644 docs/sql-ref-syntax-dml-insert-overwrite-table.md
[spark] branch master updated (1b404b9 -> 7452786)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 1b404b9  [SPARK-28890][SQL] Upgrade Hive Metastore Client to the 3.1.2 for Hive 3.1
   add 7452786  [SPARK-28789][DOCS][SQL] Document ALTER DATABASE command

No new revisions were added by this update.

Summary of changes:
 docs/css/main.css                         | 14 +++
 docs/sql-ref-syntax-ddl-alter-database.md | 41 ++-
 2 files changed, 54 insertions(+), 1 deletion(-)
[spark] branch master updated: [SPARK-28642][SQL][TEST][FOLLOW-UP] Test spark.sql.redaction.options.regex with and without default values
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c353a84 [SPARK-28642][SQL][TEST][FOLLOW-UP] Test spark.sql.redaction.options.regex with and without default values c353a84 is described below commit c353a84d1a991797f255ec312e5935438727536c Author: Yuming Wang AuthorDate: Sun Aug 25 23:12:16 2019 -0700 [SPARK-28642][SQL][TEST][FOLLOW-UP] Test spark.sql.redaction.options.regex with and without default values ### What changes were proposed in this pull request? Test `spark.sql.redaction.options.regex` with and without default values. ### Why are the changes needed? Normally, we do not rely on the default value of `spark.sql.redaction.options.regex`. ### Does this PR introduce any user-facing change? No ### How was this patch tested? N/A Closes #25579 from wangyum/SPARK-28642-f1. Authored-by: Yuming Wang Signed-off-by: Xiao Li --- .../scala/org/apache/spark/sql/jdbc/JDBCSuite.scala | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala index 72a5645..7fe00ae 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala @@ -1031,8 +1031,10 @@ class JDBCSuite extends QueryTest } test("Hide credentials in show create table") { +val userName = "testUser" val password = "testPass" val tableName = "tab1" +val dbTable = "TEST.PEOPLE" withTable(tableName) { sql( s""" @@ -1040,18 +1042,30 @@ class JDBCSuite extends QueryTest |USING org.apache.spark.sql.jdbc |OPTIONS ( | url '$urlWithUserAndPass', - | dbtable 'TEST.PEOPLE', - | user 'testUser', + | dbtable '$dbTable', + | user '$userName', | password '$password') """.stripMargin) val show = ShowCreateTableCommand(TableIdentifier(tableName)) spark.sessionState.executePlan(show).executedPlan.executeCollect().foreach { r => assert(!r.toString.contains(password)) +assert(r.toString.contains(dbTable)) +assert(r.toString.contains(userName)) } sql(s"SHOW CREATE TABLE $tableName").collect().foreach { r => -assert(!r.toString().contains(password)) +assert(!r.toString.contains(password)) +assert(r.toString.contains(dbTable)) +assert(r.toString.contains(userName)) + } + + withSQLConf(SQLConf.SQL_OPTIONS_REDACTION_PATTERN.key -> "(?i)dbtable|user") { + spark.sessionState.executePlan(show).executedPlan.executeCollect().foreach { r => + assert(!r.toString.contains(password)) + assert(!r.toString.contains(dbTable)) + assert(!r.toString.contains(userName)) +} } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
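The follow-up test above exercises spark.sql.redaction.options.regex, which decides which data source options are masked in output such as SHOW CREATE TABLE or EXPLAIN. A hedged sketch of the behaviour the test checks; it mirrors the suite's setup (a pre-populated H2 database on the classpath) and uses placeholder credentials:

import org.apache.spark.sql.SparkSession

object RedactionRegexSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Widen the default pattern so dbtable and user are redacted as well.
    spark.conf.set("spark.sql.redaction.options.regex", "(?i)url|password|dbtable|user")

    spark.sql(
      """CREATE TABLE tab1
        |USING org.apache.spark.sql.jdbc
        |OPTIONS (
        |  url 'jdbc:h2:mem:testdb0;user=testUser;password=testPass',
        |  dbtable 'TEST.PEOPLE',
        |  user 'testUser',
        |  password 'testPass')""".stripMargin)

    // Options matching the pattern are replaced with a redaction marker
    // instead of their literal values.
    spark.sql("SHOW CREATE TABLE tab1").show(truncate = false)

    spark.stop()
  }
}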
[spark] branch master updated: [SPARK-28852][SQL] Implement SparkGetCatalogsOperation for Thrift Server
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new adb506a [SPARK-28852][SQL] Implement SparkGetCatalogsOperation for Thrift Server adb506a is described below commit adb506afd783d24d397c73c34c7fed89563c0a6b Author: Yuming Wang AuthorDate: Sun Aug 25 22:42:50 2019 -0700 [SPARK-28852][SQL] Implement SparkGetCatalogsOperation for Thrift Server ### What changes were proposed in this pull request? This PR implements `SparkGetCatalogsOperation` for Thrift Server metadata completeness. ### Why are the changes needed? Thrift Server metadata completeness. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Unit test Closes #2 from wangyum/SPARK-28852. Authored-by: Yuming Wang Signed-off-by: Xiao Li --- .../thriftserver/SparkGetCatalogsOperation.scala | 79 ++ .../server/SparkSQLOperationManager.scala | 11 +++ .../thriftserver/SparkMetadataOperationSuite.scala | 8 +++ .../cli/operation/GetCatalogsOperation.java| 2 +- .../cli/operation/GetCatalogsOperation.java| 2 +- 5 files changed, 100 insertions(+), 2 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetCatalogsOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetCatalogsOperation.scala new file mode 100644 index 000..cde99fd --- /dev/null +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetCatalogsOperation.scala @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hive.thriftserver + +import java.util.UUID + +import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType +import org.apache.hive.service.cli.{HiveSQLException, OperationState} +import org.apache.hive.service.cli.operation.GetCatalogsOperation +import org.apache.hive.service.cli.session.HiveSession + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.SQLContext +import org.apache.spark.util.{Utils => SparkUtils} + +/** + * Spark's own GetCatalogsOperation + * + * @param sqlContext SQLContext to use + * @param parentSession a HiveSession from SessionManager + */ +private[hive] class SparkGetCatalogsOperation( +sqlContext: SQLContext, +parentSession: HiveSession) + extends GetCatalogsOperation(parentSession) with Logging { + + private var statementId: String = _ + + override def close(): Unit = { +super.close() +HiveThriftServer2.listener.onOperationClosed(statementId) + } + + override def runInternal(): Unit = { +statementId = UUID.randomUUID().toString +val logMsg = "Listing catalogs" +logInfo(s"$logMsg with $statementId") +setState(OperationState.RUNNING) +// Always use the latest class loader provided by executionHive's state. +val executionHiveClassLoader = sqlContext.sharedState.jarClassLoader +Thread.currentThread().setContextClassLoader(executionHiveClassLoader) + +HiveThriftServer2.listener.onStatementStart( + statementId, + parentSession.getSessionHandle.getSessionId.toString, + logMsg, + statementId, + parentSession.getUsername) + +try { + if (isAuthV2Enabled) { +authorizeMetaGets(HiveOperationType.GET_CATALOGS, null) + } + setState(OperationState.FINISHED) +} catch { + case e: HiveSQLException => +setState(OperationState.ERROR) +HiveThriftServer2.listener.onStatementError( + statementId, e.getMessage, SparkUtils.exceptionString(e)) +throw e +} +HiveThriftServer2.listener.onStatementFinish(statementId) + } +} diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/Spar
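The server-side operation above is what answers the standard JDBC DatabaseMetaData.getCatalogs call once the Thrift Server is running. A hedged client-side sketch; the connection URL and credentials are placeholders, and the Hive JDBC driver is assumed to be on the classpath:

import java.sql.DriverManager

object GetCatalogsClientSketch {
  def main(args: Array[String]): Unit = {
    // A Spark Thrift Server listening on the default port 10000 is assumed.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "anonymous", "")
    try {
      // This metadata call is served by SparkGetCatalogsOperation.
      val rs = conn.getMetaData.getCatalogs
      while (rs.next()) {
        println(rs.getString("TABLE_CAT"))
      }
      rs.close()
    } finally {
      conn.close()
    }
  }
}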
[spark] branch master updated (ed3ea67 -> aefb2e7)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from ed3ea67  [SPARK-28837][SQL] CTAS/RTAS should use nullable schema
   add aefb2e7  [SPARK-28739][SQL] Add a simple cost check for Adaptive Query Execution

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/trees/TreeNode.scala | 4 +
 .../spark/sql/execution/SparkStrategies.scala      | 2 -
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 180 +++--
 .../spark/sql/execution/adaptive/costing.scala}    | 16 +-
 ...eAdaptiveSubquery.scala => simpleCosting.scala} | 43 ++---
 .../adaptive/AdaptiveQueryExecSuite.scala          | 31 +++-
 6 files changed, 189 insertions(+), 87 deletions(-)
 copy sql/core/src/main/{java/org/apache/spark/sql/api/java/UDF0.java => scala/org/apache/spark/sql/execution/adaptive/costing.scala} (74%)
 copy sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/{ReuseAdaptiveSubquery.scala => simpleCosting.scala} (50%)
[spark] branch master updated (d75a11d -> a5df5ff)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d75a11d [SPARK-27330][SS] support task abort in foreach writer add a5df5ff [SPARK-28734][DOC] Initial table of content in the left hand side bar for SQL doc No new revisions were added by this update. Summary of changes: docs/_data/menu-sql.yaml | 164 +- docs/css/main.css | 5 +- docs/sql-ref-arithmetic-ops.md| 22 +++ docs/{sql-reference.md => sql-ref-datatypes.md} | 28 +--- docs/sql-ref-functions-builtin-aggregate.md | 22 +++ docs/sql-ref-functions-builtin-scalar.md | 22 +++ docs/sql-ref-functions-builtin.md | 25 docs/sql-ref-functions-udf-aggregate.md | 22 +++ docs/sql-ref-functions-udf-scalar.md | 22 +++ docs/sql-ref-functions-udf.md | 25 docs/sql-ref-functions.md | 25 docs/sql-ref-nan-semantics.md | 29 docs/sql-ref-syntax-aux-analyze-table.md | 22 +++ docs/sql-ref-syntax-aux-analyze.md| 25 docs/sql-ref-syntax-aux-cache-cache-table.md | 22 +++ docs/sql-ref-syntax-aux-cache-clear-cache.md | 22 +++ docs/sql-ref-syntax-aux-cache-uncache-table.md| 22 +++ docs/sql-ref-syntax-aux-cache.md | 25 docs/sql-ref-syntax-aux-conf-mgmt-reset.md| 22 +++ docs/sql-ref-syntax-aux-conf-mgmt-set.md | 22 +++ docs/sql-ref-syntax-aux-conf-mgmt.md | 25 docs/sql-ref-syntax-aux-describe-database.md | 22 +++ docs/sql-ref-syntax-aux-describe-function.md | 22 +++ docs/sql-ref-syntax-aux-describe-query.md | 22 +++ docs/sql-ref-syntax-aux-describe-table.md | 22 +++ docs/sql-ref-syntax-aux-describe.md | 25 docs/sql-ref-syntax-aux-resource-mgmt-add-file.md | 22 +++ docs/sql-ref-syntax-aux-resource-mgmt-add-jar.md | 22 +++ docs/sql-ref-syntax-aux-resource-mgmt.md | 25 docs/sql-ref-syntax-aux-show-columns.md | 22 +++ docs/sql-ref-syntax-aux-show-create-table.md | 22 +++ docs/sql-ref-syntax-aux-show-databases.md | 22 +++ docs/sql-ref-syntax-aux-show-functions.md | 22 +++ docs/sql-ref-syntax-aux-show-partitions.md| 22 +++ docs/sql-ref-syntax-aux-show-table.md | 22 +++ docs/sql-ref-syntax-aux-show-tables.md| 22 +++ docs/sql-ref-syntax-aux-show-tblproperties.md | 22 +++ docs/sql-ref-syntax-aux-show.md | 25 docs/sql-ref-syntax-aux.md| 25 docs/sql-ref-syntax-ddl-alter-database.md | 22 +++ docs/sql-ref-syntax-ddl-alter-table.md| 22 +++ docs/sql-ref-syntax-ddl-alter-view.md | 22 +++ docs/sql-ref-syntax-ddl-create-database.md| 22 +++ docs/sql-ref-syntax-ddl-create-function.md| 22 +++ docs/sql-ref-syntax-ddl-create-table.md | 22 +++ docs/sql-ref-syntax-ddl-create-view.md| 22 +++ docs/sql-ref-syntax-ddl-drop-database.md | 22 +++ docs/sql-ref-syntax-ddl-drop-function.md | 22 +++ docs/sql-ref-syntax-ddl-drop-table.md | 22 +++ docs/sql-ref-syntax-ddl-drop-view.md | 22 +++ docs/sql-ref-syntax-ddl-repair-table.md | 22 +++ docs/sql-ref-syntax-ddl-truncate-table.md | 22 +++ docs/sql-ref-syntax-ddl.md| 25 docs/sql-ref-syntax-dml-insert.md | 22 +++ docs/sql-ref-syntax-dml-load.md | 22 +++ docs/sql-ref-syntax-dml.md| 25 docs/sql-ref-syntax-qry-aggregation.md| 22 +++ docs/sql-ref-syntax-qry-explain.md| 22 +++ docs/sql-ref-syntax-qry-sampling.md | 22 +++ docs/sql-ref-syntax-qry-select-cte.md | 22 +++ docs/sql-ref-syntax-qry-select-distinct.md| 22 +++ docs/sql-ref-syntax-qry-select-groupby.md | 22 +++ docs/sql-ref-syntax-qry-select-having.md | 22 +++ docs/sql-ref-syntax-qry-select-hints.md | 22 +++ docs/sql-ref-syntax-qry-select-join.md| 22 +++ docs/sql-ref-syntax-qry-select-limit.md | 22 +++ docs/sql-ref-syntax-qry-select-orderby.md | 22 +++ 
docs/sql-ref-syntax-qry-select-setops.md | 22 +++ docs/sql-ref-syntax-qry-select-subqueries.md | 22 +++ docs/sql-ref-syntax-qry-select.md | 25 docs/sql-ref-syntax-qry-window.md | 22 +++ docs/sql-ref-syntax-qry.md| 25 docs/sql-ref-syntax.md| 25 docs/sql-ref.md | 25 74 files changed, 1781 inserti
[spark] branch master updated (5756a47 -> efbb035)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 5756a47  [SPARK-28766][R][DOC] Fix CRAN incoming feasibility warning on invalid URL
   add efbb035  [SPARK-28527][SQL][TEST] Re-run all the tests in SQLQueryTestSuite via Thrift Server

No new revisions were added by this update.

Summary of changes:
 project/SparkBuild.scala                           | 3 +-
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   | 67 ++--
 sql/hive-thriftserver/pom.xml                      | 7 +
 .../sql/hive/thriftserver/HiveThriftServer2.scala  | 3 +-
 .../thriftserver/ThriftServerQueryTestSuite.scala  | 362 +
 5 files changed, 409 insertions(+), 33 deletions(-)
 create mode 100644 sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala
[spark] branch master updated (469423f -> a59fdc4)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 469423f  [SPARK-28595][SQL] explain should not trigger partition listing
   add a59fdc4  [SPARK-28472][SQL][TEST] Add test for thriftserver protocol versions

No new revisions were added by this update.

Summary of changes:
 .../SparkThriftServerProtocolVersionsSuite.scala   | 316 +
 .../hive/thriftserver/ThriftserverShimUtils.scala  | 13 +
 .../hive/thriftserver/ThriftserverShimUtils.scala  | 15 +
 3 files changed, 344 insertions(+)
 create mode 100644 sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SparkThriftServerProtocolVersionsSuite.scala
[spark] branch master updated (150dbc5 -> bab88c4)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 150dbc5  [SPARK-28391][PYTHON][SQL][TESTS][FOLLOW-UP] Add UDF cases into groupby clause in 'pgSQL/select_implicit.sql'
   add bab88c4  [SPARK-28622][SQL][PYTHON] Rename PullOutPythonUDFInJoinCondition to ExtractPythonUDFFromJoinCondition and move to 'Extract Python UDFs'

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/catalyst/expressions/predicates.scala     | 2 +-
 .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala  | 6 +-
 .../main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala | 2 +-
 ...ionSuite.scala => ExtractPythonUDFFromJoinConditionSuite.scala} | 4 ++--
 .../main/scala/org/apache/spark/sql/execution/SparkOptimizer.scala | 7 ++-
 5 files changed, 11 insertions(+), 10 deletions(-)
 rename sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/{PullOutPythonUDFInJoinConditionSuite.scala => ExtractPythonUDFFromJoinConditionSuite.scala} (98%)
[spark] branch master updated (8ae032d -> 10d4ffd)
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 8ae032d  Revert "[SPARK-28582][PYSPARK] Fix flaky test DaemonTests.do_termination_test which fail on Python 3.7"
   add 10d4ffd  [SPARK-28532][SPARK-28530][SQL][FOLLOWUP] Inline doc for FixedPoint(1) batches "Subquery" and "Join Reorder"

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 4
 1 file changed, 4 insertions(+)
[spark] branch master updated: [SPARK-28510][SQL] Implement Spark's own GetFunctionsOperation
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new efd9299 [SPARK-28510][SQL] Implement Spark's own GetFunctionsOperation efd9299 is described below commit efd92993f403fe40b3abd1dac21f8d7c875f407d Author: Yuming Wang AuthorDate: Fri Aug 2 08:50:42 2019 -0700 [SPARK-28510][SQL] Implement Spark's own GetFunctionsOperation ## What changes were proposed in this pull request? This PR implements Spark's own GetFunctionsOperation which mitigates the differences between Spark SQL and Hive UDFs. But our implementation is different from Hive's implementation: - Our implementation always returns results. Hive only returns results when [(null == catalogName || "".equals(catalogName)) && (null == schemaName || "".equals(schemaName))](https://github.com/apache/hive/blob/rel/release-3.1.1/service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java#L101-L119). - Our implementation pads the `REMARKS` field with the function usage - Hive returns an empty string. - Our implementation does not support `FUNCTION_TYPE`, but Hive does. ## How was this patch tested? unit tests Closes #25252 from wangyum/SPARK-28510. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../thriftserver/SparkGetFunctionsOperation.scala | 115 + .../server/SparkSQLOperationManager.scala | 15 +++ .../thriftserver/SparkMetadataOperationSuite.scala | 43 +++- .../cli/operation/GetFunctionsOperation.java | 2 +- .../cli/operation/GetFunctionsOperation.java | 2 +- 5 files changed, 174 insertions(+), 3 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala new file mode 100644 index 000..462e5730 --- /dev/null +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetFunctionsOperation.scala @@ -0,0 +1,115 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hive.thriftserver + +import java.sql.DatabaseMetaData +import java.util.UUID + +import scala.collection.JavaConverters.seqAsJavaListConverter + +import org.apache.hadoop.hive.ql.security.authorization.plugin.{HiveOperationType, HivePrivilegeObjectUtils} +import org.apache.hive.service.cli._ +import org.apache.hive.service.cli.operation.GetFunctionsOperation +import org.apache.hive.service.cli.operation.MetadataOperation.DEFAULT_HIVE_CATALOG +import org.apache.hive.service.cli.session.HiveSession + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.SQLContext +import org.apache.spark.util.{Utils => SparkUtils} + +/** + * Spark's own GetFunctionsOperation + * + * @param sqlContext SQLContext to use + * @param parentSession a HiveSession from SessionManager + * @param catalogName catalog name. null if not applicable + * @param schemaName database name, null or a concrete database name + * @param functionName function name pattern + */ +private[hive] class SparkGetFunctionsOperation( +sqlContext: SQLContext, +parentSession: HiveSession, +catalogName: String, +schemaName: String, +functionName: String) + extends GetFunctionsOperation(parentSession, catalogName, schemaName, functionName) with Logging { + + private var statementId: String = _ + + override def close(): Unit = { +super.close() +HiveThriftServer2.listener.onOperationClosed(statementId) + } + + override def runInternal(): Unit = { +statementId = UUID.randomUUID().toString +// Do not change cmdStr. It's used for Hive auditing and authorization. +val cmdStr = s"catalog : $catalogName, schemaPattern : $schemaName" +val logMsg = s"Listing functions '$cmdStr, functionName : $functionName'" +logInfo
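For context on the GetFunctionsOperation change above, here is a minimal client-side sketch of the JDBC call it serves. The connection URL, credentials, and the assumption that a Thrift Server is running locally with the Hive JDBC driver on the classpath are illustrative, not part of the commit:

```scala
import java.sql.DriverManager

// Hypothetical Thrift Server address and credentials.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
try {
  // DatabaseMetaData.getFunctions is the call answered by SparkGetFunctionsOperation.
  val rs = conn.getMetaData.getFunctions(null, null, "%")
  while (rs.next()) {
    // Per the commit above, Spark pads REMARKS with the function usage string.
    println(s"${rs.getString("FUNCTION_NAME")} -> ${rs.getString("REMARKS")}")
  }
} finally {
  conn.close()
}
```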
[spark] branch master updated: [SPARK-28375][SQL] Make pullupCorrelatedPredicate idempotent
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ee3c1c7 [SPARK-28375][SQL] Make pullupCorrelatedPredicate idempotent ee3c1c7 is described below commit ee3c1c777ddb8034d62213a5d8e064b97cc067e5 Author: Dilip Biswal AuthorDate: Tue Jul 30 16:29:24 2019 -0700 [SPARK-28375][SQL] Make pullupCorrelatedPredicate idempotent ## What changes were proposed in this pull request? This PR makes the optimizer rule PullupCorrelatedPredicates idempotent. ## How was this patch tested? A new test PullupCorrelatedPredicatesSuite Closes #25268 from dilipbiswal/pr-25164. Authored-by: Dilip Biswal Signed-off-by: gatorsmile --- .../spark/sql/catalyst/optimizer/Optimizer.scala | 4 +- .../spark/sql/catalyst/optimizer/subquery.scala| 20 +-- .../PullupCorrelatedPredicatesSuite.scala | 62 +++--- 3 files changed, 72 insertions(+), 14 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 670fc92..346b2e6 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -48,9 +48,7 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) } override protected val blacklistedOnceBatches: Set[String] = -Set("Pullup Correlated Expressions", - "Extract Python UDFs" -) +Set("Extract Python UDFs") protected def fixedPoint = FixedPoint(SQLConf.get.optimizerMaxIterations) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala index 4f7333c..32dbd389 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala @@ -273,16 +273,28 @@ object PullupCorrelatedPredicates extends Rule[LogicalPlan] with PredicateHelper } private def rewriteSubQueries(plan: LogicalPlan, outerPlans: Seq[LogicalPlan]): LogicalPlan = { +/** + * This function is used as a aid to enforce idempotency of pullUpCorrelatedPredicate rule. + * In the first call to rewriteSubqueries, all the outer references from the subplan are + * pulled up and join predicates are recorded as children of the enclosing subquery expression. + * The subsequent call to rewriteSubqueries would simply re-records the `children` which would + * contains the pulled up correlated predicates (from the previous call) in the enclosing + * subquery expression. 
+ */ +def getJoinCondition(newCond: Seq[Expression], oldCond: Seq[Expression]): Seq[Expression] = { + if (newCond.isEmpty) oldCond else newCond +} + plan transformExpressions { case ScalarSubquery(sub, children, exprId) if children.nonEmpty => val (newPlan, newCond) = pullOutCorrelatedPredicates(sub, outerPlans) -ScalarSubquery(newPlan, newCond, exprId) +ScalarSubquery(newPlan, getJoinCondition(newCond, children), exprId) case Exists(sub, children, exprId) if children.nonEmpty => val (newPlan, newCond) = pullOutCorrelatedPredicates(sub, outerPlans) -Exists(newPlan, newCond, exprId) - case ListQuery(sub, _, exprId, childOutputs) => +Exists(newPlan, getJoinCondition(newCond, children), exprId) + case ListQuery(sub, children, exprId, childOutputs) if children.nonEmpty => val (newPlan, newCond) = pullOutCorrelatedPredicates(sub, outerPlans) -ListQuery(newPlan, newCond, exprId, childOutputs) +ListQuery(newPlan, getJoinCondition(newCond, children), exprId, childOutputs) } } diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullupCorrelatedPredicatesSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullupCorrelatedPredicatesSuite.scala index 960162a..2d86d5a 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullupCorrelatedPredicatesSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PullupCorrelatedPredicatesSuite.scala @@ -19,9 +19,9 @@ package org.apache.spark.sql.catalyst.optimizer import org.apache.spark.sql.catalyst.dsl.expressions._ import org.apache.spark.sql.catalyst.dsl.plans._ -import org.apache.spark.sql.catalyst.expressions.{InSubquery, ListQuery} +import org.apache.spark.sql.catalyst.expressions._ import org.apache
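A rough way to picture the idempotence being asserted here: applying the rule to its own output should be a no-op. The helper below is an illustrative sketch, not the test code added in PullupCorrelatedPredicatesSuite (Spark's own tests compare plans with normalized expression IDs):

```scala
import org.apache.spark.sql.catalyst.optimizer.PullupCorrelatedPredicates
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Sketch only: a second application of the rule must not change the plan further.
def assertPullupIsIdempotent(plan: LogicalPlan): Unit = {
  val once  = PullupCorrelatedPredicates(plan)
  val twice = PullupCorrelatedPredicates(once)
  assert(once == twice, "PullupCorrelatedPredicates rewrote an already-rewritten plan")
}
```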
[spark] branch master updated: [SPARK-28530][SQL] Cost-based join reorder optimizer batch should be FixedPoint(1)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d4e2466 [SPARK-28530][SQL] Cost-based join reorder optimizer batch should be FixedPoint(1) d4e2466 is described below commit d4e246658a308e2546d89e7ec9bbd9b6ca7517f7 Author: Yesheng Ma AuthorDate: Fri Jul 26 22:57:39 2019 -0700 [SPARK-28530][SQL] Cost-based join reorder optimizer batch should be FixedPoint(1) ## What changes were proposed in this pull request? Since for AQP the cost for joins can change between multiple runs, there is no reason that we have an idempotence enforcement on this optimizer batch. We thus make it `FixedPoint(1)` instead of `Once`. ## How was this patch tested? Existing UTs. Closes #25266 from yeshengm/SPARK-28530. Lead-authored-by: Yesheng Ma Co-authored-by: Xiao Li Signed-off-by: gatorsmile --- .../main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 3 +-- .../org/apache/spark/sql/catalyst/optimizer/JoinReorderSuite.scala | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 3efc41d..670fc92 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -49,7 +49,6 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) override protected val blacklistedOnceBatches: Set[String] = Set("Pullup Correlated Expressions", - "Join Reorder", "Extract Python UDFs" ) @@ -168,7 +167,7 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) RemoveLiteralFromGroupExpressions, RemoveRepetitionFromGroupExpressions) :: Nil ++ operatorOptimizationBatch) :+ -Batch("Join Reorder", Once, +Batch("Join Reorder", FixedPoint(1), CostBasedJoinReorder) :+ Batch("Remove Redundant Sorts", Once, RemoveRedundantSorts) :+ diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinReorderSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinReorderSuite.scala index 43e5bad..f775500 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinReorderSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/JoinReorderSuite.scala @@ -40,7 +40,7 @@ class JoinReorderSuite extends PlanTest with StatsEstimationTestBase { PushPredicateThroughJoin, ColumnPruning, CollapseProject) :: - Batch("Join Reorder", Once, + Batch("Join Reorder", FixedPoint(1), CostBasedJoinReorder) :: Nil } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-28532][SQL] Make optimizer batch "subquery" FixedPoint(1)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e037a11 [SPARK-28532][SQL] Make optimizer batch "subquery" FixedPoint(1) e037a11 is described below commit e037a11494b8079d9440899a9e509bb6760bcc43 Author: Yesheng Ma AuthorDate: Fri Jul 26 22:48:42 2019 -0700 [SPARK-28532][SQL] Make optimizer batch "subquery" FixedPoint(1) ## What changes were proposed in this pull request? In the Catalyst optimizer, the batch subquery actually calls the optimizer recursively. Therefore it makes no sense to enforce idempotence on it and we change this batch to `FixedPoint(1)`. ## How was this patch tested? Existing UTs. Closes #25267 from yeshengm/SPARK-28532. Authored-by: Yesheng Ma Signed-off-by: gatorsmile --- .../main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 1c36cdc..3efc41d 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -50,7 +50,6 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) override protected val blacklistedOnceBatches: Set[String] = Set("Pullup Correlated Expressions", "Join Reorder", - "Subquery", "Extract Python UDFs" ) @@ -156,7 +155,7 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) PropagateEmptyRelation) :: Batch("Pullup Correlated Expressions", Once, PullupCorrelatedPredicates) :: -Batch("Subquery", Once, +Batch("Subquery", FixedPoint(1), OptimizeSubqueries) :: Batch("Replace Operators", fixedPoint, RewriteExceptAll, - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
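To make the distinction these two commits rely on concrete: `Once` and `FixedPoint(1)` both run a batch a single time, but only `Once` batches carry the idempotence expectation enforced by SPARK-28237 (see further below), while `FixedPoint(1)` is the escape hatch for batches whose output can legitimately differ across applications. An illustrative toy executor, not the actual optimizer code:

```scala
import org.apache.spark.sql.catalyst.optimizer.CostBasedJoinReorder
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.RuleExecutor

// Illustrative only: the two strategies side by side in a toy executor.
object StrategyDemo extends RuleExecutor[LogicalPlan] {
  override protected def batches: Seq[Batch] = Seq(
    // Runs once and is expected (and checked) to be idempotent.
    Batch("Some once batch", Once, CostBasedJoinReorder),
    // Also runs only once, but without the idempotence expectation.
    Batch("Join Reorder", FixedPoint(1), CostBasedJoinReorder)
  )
}
```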
[spark] branch master updated: [SPARK-28463][SQL] Thriftserver throws BigDecimal incompatible with HiveDecimal
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 545c7ee [SPARK-28463][SQL] Thriftserver throws BigDecimal incompatible with HiveDecimal 545c7ee is described below commit 545c7ee00b5d4c5b848be6b27b3820955a0803d6 Author: Yuming Wang AuthorDate: Fri Jul 26 10:30:01 2019 -0700 [SPARK-28463][SQL] Thriftserver throws BigDecimal incompatible with HiveDecimal ## What changes were proposed in this pull request? How to reproduce this issue: ```shell build/sbt clean package -Phive -Phive-thriftserver -Phadoop-3.2 export SPARK_PREPEND_CLASSES=true sbin/start-thriftserver.sh [rootspark-3267648 spark]# bin/beeline -u jdbc:hive2://localhost:1/default -e "select cast(1 as decimal(38, 18));" Connecting to jdbc:hive2://localhost:1/default Connected to: Spark SQL (version 3.0.0-SNAPSHOT) Driver: Hive JDBC (version 2.3.5) Transaction isolation: TRANSACTION_REPEATABLE_READ Error: java.lang.ClassCastException: java.math.BigDecimal incompatible with org.apache.hadoop.hive.common.type.HiveDecimal (state=,code=0) Closing: 0: jdbc:hive2://localhost:1/default ``` This pr fix this issue. ## How was this patch tested? unit tests Closes #25217 from wangyum/SPARK-28463. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala| 8 .../main/java/org/apache/hive/service/cli/ColumnBasedSet.java| 9 + 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala index dd18add..9c53e90 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala @@ -654,6 +654,14 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { assert(resultSet.getString(1) === "4.56") } } + + test("SPARK-28463: Thriftserver throws BigDecimal incompatible with HiveDecimal") { +withJdbcStatement() { statement => + val rs = statement.executeQuery("SELECT CAST(1 AS decimal(38, 18))") + assert(rs.next()) + assert(rs.getBigDecimal(1) === new java.math.BigDecimal("1.00")) +} + } } class SingleSessionSuite extends HiveThriftJdbcTest { diff --git a/sql/hive-thriftserver/v2.3.5/src/main/java/org/apache/hive/service/cli/ColumnBasedSet.java b/sql/hive-thriftserver/v2.3.5/src/main/java/org/apache/hive/service/cli/ColumnBasedSet.java index 5546060..3ca18f0 100644 --- a/sql/hive-thriftserver/v2.3.5/src/main/java/org/apache/hive/service/cli/ColumnBasedSet.java +++ b/sql/hive-thriftserver/v2.3.5/src/main/java/org/apache/hive/service/cli/ColumnBasedSet.java @@ -23,9 +23,7 @@ import java.util.ArrayList; import java.util.Iterator; import java.util.List; -import org.apache.hadoop.hive.common.type.HiveDecimal; import org.apache.hadoop.hive.serde2.thrift.ColumnBuffer; -import org.apache.hadoop.hive.serde2.thrift.Type; import org.apache.hive.service.rpc.thrift.TColumn; import org.apache.hive.service.rpc.thrift.TRow; import org.apache.hive.service.rpc.thrift.TRowSet; @@ -105,12 +103,7 @@ public class ColumnBasedSet implements RowSet { } else { for (int i = 0; i < fields.length; i++) { TypeDescriptor descriptor = descriptors[i]; -Object field 
= fields[i]; -if (field != null && descriptor.getType() == Type.DECIMAL_TYPE) { - int scale = descriptor.getDecimalDigits(); - field = ((HiveDecimal) field).toFormatString(scale); -} -columns.get(i).addValue(descriptor.getType(), field); +columns.get(i).addValue(descriptor.getType(), fields[i]); } } return this; - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (bf41070 -> 6807a82)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from bf41070 [SPARK-28499][ML] Optimize MinMaxScaler add 6807a82 [SPARK-28524][SQL] Fix ThriftServerTab lost error message No new revisions were added by this update. Summary of changes: .../org/apache/spark/sql/hive/thriftserver/ui/ThriftServerPage.scala| 2 +- .../apache/spark/sql/hive/thriftserver/ui/ThriftServerSessionPage.scala | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ded1a74 -> c93d2dd)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ded1a74 [SPARK-28365][ML] Fallback locale to en_US in StopWordsRemover if system default locale isn't in available locales in JVM add c93d2dd [SPARK-28237][SQL] Enforce Idempotence for Once batches in RuleExecutor No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/optimizer/Optimizer.scala | 7 + .../spark/sql/catalyst/rules/RuleExecutor.scala| 30 -- .../sql/catalyst/optimizer/PruneFiltersSuite.scala | 2 +- .../PullupCorrelatedPredicatesSuite.scala | 2 ++ .../optimizer/StarJoinCostBasedReorderSuite.scala | 2 +- .../sql/catalyst/trees/RuleExecutorSuite.scala | 22 +--- 6 files changed, 58 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
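The enforcement referenced here can be pictured roughly as follows: after a `Once` batch finishes, its rules are applied one more time and the result must be unchanged, unless the batch is explicitly excluded via the optimizer's `blacklistedOnceBatches` set. This is a simplified sketch of the idea, not the actual RuleExecutor code:

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Sketch only: re-run the batch's rules on its own output and demand a fixed point.
def checkOnceBatchIdempotence(
    batchName: String,
    rules: Seq[Rule[LogicalPlan]],
    result: LogicalPlan,
    excludedBatches: Set[String]): Unit = {
  if (!excludedBatches.contains(batchName)) {
    val reRun = rules.foldLeft(result) { case (plan, rule) => rule(plan) }
    if (reRun != result) {
      throw new RuntimeException(s"Once batch '$batchName' is not idempotent")
    }
  }
}
```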
[spark] branch master updated (167fa04 -> 045191e)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 167fa04 [SPARK-28390][SQL][PYTHON][TESTS] Convert and port 'pgSQL/select_having.sql' into UDF test base add 045191e [SPARK-28293][SQL] Implement Spark's own GetTableTypesOperation No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/catalog/interface.scala | 2 ++ ...ion.scala => SparkGetTableTypesOperation.scala} | 35 +++--- .../thriftserver/SparkGetTablesOperation.scala | 9 +- .../thriftserver/SparkMetadataOperationUtils.scala | 18 ++- .../server/SparkSQLOperationManager.scala | 17 +-- .../thriftserver/SparkMetadataOperationSuite.scala | 16 ++ .../cli/operation/GetTableTypesOperation.java | 2 +- .../cli/operation/GetTableTypesOperation.java | 2 +- 8 files changed, 56 insertions(+), 45 deletions(-) copy sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/{SparkGetSchemasOperation.scala => SparkGetTableTypesOperation.scala} (64%) copy common/network-common/src/main/java/org/apache/spark/network/protocol/AbstractResponseMessage.java => sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkMetadataOperationUtils.scala (61%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
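Like the other Spark-owned metadata operations in this series, GetTableTypesOperation backs a standard JDBC metadata call; a minimal client-side sketch, with the connection details assumed rather than taken from the commit:

```scala
import java.sql.DriverManager

// Hypothetical Thrift Server address and credentials.
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
try {
  // Answered by SparkGetTableTypesOperation after this change.
  val rs = conn.getMetaData.getTableTypes
  while (rs.next()) {
    println(rs.getString("TABLE_TYPE")) // one row per table type Spark reports
  }
} finally {
  conn.close()
}
```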
[spark] branch master updated (022667c -> e04f696)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 022667c [SPARK-28469][SQL] Change CalendarIntervalType's readable string representation from calendarinterval to interval add e04f696 [SPARK-28346][SQL] clone the query plan between analyzer, optimizer and planner No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/trees/TreeNode.scala | 13 +++-- .../spark/sql/execution/QueryExecution.scala | 24 +- .../sql/execution/columnar/InMemoryRelation.scala | 7 +++ .../spark/sql/execution/command/SetCommand.scala | 2 +- .../datasources/SaveIntoDataSourceCommand.scala| 6 +++ .../spark/sql/execution/QueryExecutionSuite.scala | 55 +- .../columnar/PartitionBatchPruningSuite.scala | 4 +- 7 files changed, 93 insertions(+), 18 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: Revert "[SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully"
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 63898cb Revert "[SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully" 63898cb is described below commit 63898cbc1db46be8bcbb46d21fe01340bc883520 Author: gatorsmile AuthorDate: Tue Jul 16 06:57:14 2019 -0700 Revert "[SPARK-27485] EnsureRequirements.reorder should handle duplicate expressions gracefully" This reverts commit 72f547d4a960ba0ba9cace53a0a5553eca1b4dd6. --- .../execution/exchange/EnsureRequirements.scala| 72 ++ .../scala/org/apache/spark/sql/JoinSuite.scala | 20 -- .../apache/spark/sql/execution/PlannerSuite.scala | 26 3 files changed, 32 insertions(+), 86 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala index bdb9a31..d2d5011 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala @@ -24,7 +24,8 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.plans.physical._ import org.apache.spark.sql.catalyst.rules.Rule import org.apache.spark.sql.execution._ -import org.apache.spark.sql.execution.joins.{ShuffledHashJoinExec, SortMergeJoinExec} +import org.apache.spark.sql.execution.joins.{BroadcastHashJoinExec, ShuffledHashJoinExec, + SortMergeJoinExec} import org.apache.spark.sql.internal.SQLConf /** @@ -220,41 +221,25 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] { } private def reorder( - leftKeys: IndexedSeq[Expression], - rightKeys: IndexedSeq[Expression], + leftKeys: Seq[Expression], + rightKeys: Seq[Expression], expectedOrderOfKeys: Seq[Expression], currentOrderOfKeys: Seq[Expression]): (Seq[Expression], Seq[Expression]) = { -if (expectedOrderOfKeys.size != currentOrderOfKeys.size) { - return (leftKeys, rightKeys) -} - -// Build a lookup between an expression and the positions its holds in the current key seq. -val keyToIndexMap = mutable.Map.empty[Expression, mutable.BitSet] -currentOrderOfKeys.zipWithIndex.foreach { - case (key, index) => -keyToIndexMap.getOrElseUpdate(key.canonicalized, mutable.BitSet.empty).add(index) -} - -// Reorder the keys. -val leftKeysBuffer = new ArrayBuffer[Expression](leftKeys.size) -val rightKeysBuffer = new ArrayBuffer[Expression](rightKeys.size) -val iterator = expectedOrderOfKeys.iterator -while (iterator.hasNext) { - // Lookup the current index of this key. - keyToIndexMap.get(iterator.next().canonicalized) match { -case Some(indices) if indices.nonEmpty => - // Take the first available index from the map. - val index = indices.firstKey - indices.remove(index) +val leftKeysBuffer = ArrayBuffer[Expression]() +val rightKeysBuffer = ArrayBuffer[Expression]() +val pickedIndexes = mutable.Set[Int]() +val keysAndIndexes = currentOrderOfKeys.zipWithIndex - // Add the keys for that index to the reordered keys. - leftKeysBuffer += leftKeys(index) - rightKeysBuffer += rightKeys(index) -case _ => - // The expression cannot be found, or we have exhausted all indices for that expression. 
- return (leftKeys, rightKeys) - } -} +expectedOrderOfKeys.foreach(expression => { + val index = keysAndIndexes.find { case (e, idx) => +// As we may have the same key used many times, we need to filter out its occurrence we +// have already used. +e.semanticEquals(expression) && !pickedIndexes.contains(idx) + }.map(_._2).get + pickedIndexes += index + leftKeysBuffer.append(leftKeys(index)) + rightKeysBuffer.append(rightKeys(index)) +}) (leftKeysBuffer, rightKeysBuffer) } @@ -264,13 +249,20 @@ case class EnsureRequirements(conf: SQLConf) extends Rule[SparkPlan] { leftPartitioning: Partitioning, rightPartitioning: Partitioning): (Seq[Expression], Seq[Expression]) = { if (leftKeys.forall(_.deterministic) && rightKeys.forall(_.deterministic)) { - (leftPartitioning, rightPartitioning) match { -case (HashPartitioning(leftExpressions, _), _) => - reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, leftExpressions, leftKeys) -case (_, HashPartitioning(rightExpressions, _)) => - reorder(leftKeys.toIndexedSeq, rightKeys.toIndexedSeq, rightExpressions, righ
[spark] branch master updated (8d1e87a -> 2f3997f)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 8d1e87a [SPARK-28150][CORE][FOLLOW-UP] Don't try to log in when impersonating. add 2f3997f [SPARK-28306][SQL][FOLLOWUP] Fix NormalizeFloatingNumbers rule idempotence for equi-join with `<=>` predicates No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/planning/patterns.scala| 16 ++-- .../optimizer/NormalizeFloatingPointNumbersSuite.scala | 15 ++- 2 files changed, 24 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
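For reference, the query shape this follow-up targets is a null-safe (`<=>`) equi-join on floating-point keys, where NormalizeFloatingNumbers normalizes the extracted join keys. A self-contained toy example (data and names invented for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("nullsafe-join-demo").getOrCreate()
import spark.implicits._

// NaN and -0.0 are exactly the values the normalization rule cares about.
Seq(1.0, Double.NaN, -0.0).toDF("reading").createOrReplaceTempView("a")
Seq(1.0, Double.NaN, 0.0).toDF("reading").createOrReplaceTempView("b")

// <=> is planned as an equi-join key; this fix keeps the resulting rewrite idempotent.
spark.sql("SELECT * FROM a JOIN b ON a.reading <=> b.reading").show()
```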
[spark] branch master updated: [SPARK-28355][CORE][PYTHON] Use Spark conf for threshold at which command is compressed by broadcast
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 79e2047 [SPARK-28355][CORE][PYTHON] Use Spark conf for threshold at which command is compressed by broadcast 79e2047 is described below commit 79e204770300dab4a669b9f8e2421ef905236e7b Author: Jesse Cai AuthorDate: Sat Jul 13 08:44:16 2019 -0700 [SPARK-28355][CORE][PYTHON] Use Spark conf for threshold at which command is compressed by broadcast ## What changes were proposed in this pull request? The `_prepare_for_python_RDD` method currently broadcasts a pickled command if its length is greater than the hardcoded value `1 << 20` (1M). This change sets this value as a Spark conf instead. ## How was this patch tested? Unit tests, manual tests. Closes #25123 from jessecai/SPARK-28355. Authored-by: Jesse Cai Signed-off-by: gatorsmile --- .../scala/org/apache/spark/api/python/PythonUtils.scala | 4 .../scala/org/apache/spark/internal/config/package.scala| 8 core/src/test/scala/org/apache/spark/SparkConfSuite.scala | 13 + python/pyspark/rdd.py | 2 +- 4 files changed, 26 insertions(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala index eee6e4b..62d6047 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonUtils.scala @@ -81,4 +81,8 @@ private[spark] object PythonUtils { def isEncryptionEnabled(sc: JavaSparkContext): Boolean = { sc.conf.get(org.apache.spark.internal.config.IO_ENCRYPTION_ENABLED) } + + def getBroadcastThreshold(sc: JavaSparkContext): Long = { + sc.conf.get(org.apache.spark.internal.config.BROADCAST_FOR_UDF_COMPRESSION_THRESHOLD) + } } diff --git a/core/src/main/scala/org/apache/spark/internal/config/package.scala b/core/src/main/scala/org/apache/spark/internal/config/package.scala index 46f..76d3d6e 100644 --- a/core/src/main/scala/org/apache/spark/internal/config/package.scala +++ b/core/src/main/scala/org/apache/spark/internal/config/package.scala @@ -1246,6 +1246,14 @@ package object config { "mechanisms to guarantee data won't be corrupted during broadcast") .booleanConf.createWithDefault(true) + private[spark] val BROADCAST_FOR_UDF_COMPRESSION_THRESHOLD = +ConfigBuilder("spark.broadcast.UDFCompressionThreshold") + .doc("The threshold at which user-defined functions (UDFs) and Python RDD commands " + +"are compressed by broadcast in bytes unless otherwise specified") + .bytesConf(ByteUnit.BYTE) + .checkValue(v => v >= 0, "The threshold should be non-negative.") + .createWithDefault(1L * 1024 * 1024) + private[spark] val RDD_COMPRESS = ConfigBuilder("spark.rdd.compress") .doc("Whether to compress serialized RDD partitions " + "(e.g. 
for StorageLevel.MEMORY_ONLY_SER in Scala " + diff --git a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala index 6be1fed..202b85d 100644 --- a/core/src/test/scala/org/apache/spark/SparkConfSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkConfSuite.scala @@ -389,6 +389,19 @@ class SparkConfSuite extends SparkFunSuite with LocalSparkContext with ResetSyst """.stripMargin.trim) } + test("SPARK-28355: Use Spark conf for threshold at which UDFs are compressed by broadcast") { +val conf = new SparkConf() + +// Check the default value +assert(conf.get(BROADCAST_FOR_UDF_COMPRESSION_THRESHOLD) === 1L * 1024 * 1024) + +// Set the conf +conf.set(BROADCAST_FOR_UDF_COMPRESSION_THRESHOLD, 1L * 1024) + +// Verify that it has been set properly +assert(conf.get(BROADCAST_FOR_UDF_COMPRESSION_THRESHOLD) === 1L * 1024) + } + val defaultIllegalValue = "SomeIllegalValue" val illegalValueTests : Map[String, (SparkConf, String) => Any] = Map( "getTimeAsSeconds" -> (_.getTimeAsSeconds(_)), diff --git a/python/pyspark/rdd.py b/python/pyspark/rdd.py index 8bcc67a..96fdf5f 100644 --- a/python/pyspark/rdd.py +++ b/python/pyspark/rdd.py @@ -2490,7 +2490,7 @@ def _prepare_for_python_RDD(sc, command): # the serialized command will be compressed by broadcast ser = CloudPickleSerializer() pickled_command = ser.dumps(command) -if len(pickled_command) > (1 << 20): # 1M +if len(pickled_command) > sc._jvm.PythonUtils.getBroadcastThreshold(sc._jsc): # Default 1M
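A small usage sketch of the new configuration; the 4 MB value and application name are arbitrary choices for illustration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Raise the size at which pickled commands/UDFs are shipped via broadcast
// from the 1M default to 4M (the conf is interpreted in bytes).
val conf = new SparkConf()
  .setAppName("udf-broadcast-threshold-demo")
  .setMaster("local[*]")
  .set("spark.broadcast.UDFCompressionThreshold", (4 * 1024 * 1024).toString)

val spark = SparkSession.builder().config(conf).getOrCreate()
```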
[spark] branch master updated: [SPARK-28260][SQL] Add CLOSED state to ExecutionState
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 687dd4e [SPARK-28260][SQL] Add CLOSED state to ExecutionState 687dd4e is described below commit 687dd4eb55739f802692b3c5457618fd6558e538 Author: Yuming Wang AuthorDate: Fri Jul 12 10:31:28 2019 -0700 [SPARK-28260][SQL] Add CLOSED state to ExecutionState ## What changes were proposed in this pull request? The `ThriftServerTab` displays a FINISHED state when the operation finishes execution, but quite often it still takes a lot of time to fetch the results. OperationState has state CLOSED for when after the iterator is closed. This PR add CLOSED state to ExecutionState, and override the `close()` in SparkExecuteStatementOperation, SparkGetColumnsOperation, SparkGetSchemasOperation and SparkGetTablesOperation. ## How was this patch tested? manual tests 1. Add `Thread.sleep(1)` before [SparkExecuteStatementOperation.scala#L112](https://github.com/apache/spark/blob/b2e7677f4d3d8f47f5f148680af39d38f2b558f0/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala#L112) 2. Switch to `ThriftServerTab`: ![image](https://user-images.githubusercontent.com/5399861/60809590-9dcf2500-a1bd-11e9-826e-33729bb97daf.png) 3. After a while: ![image](https://user-images.githubusercontent.com/5399861/60809719-e850a180-a1bd-11e9-9a6a-546146e626ab.png) Closes #25062 from wangyum/SPARK-28260. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../spark/sql/hive/thriftserver/HiveThriftServer2.scala| 14 ++ .../hive/thriftserver/SparkExecuteStatementOperation.scala | 3 ++- .../sql/hive/thriftserver/SparkGetColumnsOperation.scala | 9 - .../sql/hive/thriftserver/SparkGetSchemasOperation.scala | 9 - .../sql/hive/thriftserver/SparkGetTablesOperation.scala| 9 - .../spark/sql/hive/thriftserver/ui/ThriftServerPage.scala | 8 +--- .../sql/hive/thriftserver/ui/ThriftServerSessionPage.scala | 8 +--- 7 files changed, 46 insertions(+), 14 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala index d1de9f0..b4d1d0d 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala @@ -137,7 +137,7 @@ object HiveThriftServer2 extends Logging { } private[thriftserver] object ExecutionState extends Enumeration { -val STARTED, COMPILED, FAILED, FINISHED = Value +val STARTED, COMPILED, FAILED, FINISHED, CLOSED = Value type ExecutionState = Value } @@ -147,16 +147,17 @@ object HiveThriftServer2 extends Logging { val startTimestamp: Long, val userName: String) { var finishTimestamp: Long = 0L +var closeTimestamp: Long = 0L var executePlan: String = "" var detail: String = "" var state: ExecutionState.Value = ExecutionState.STARTED val jobId: ArrayBuffer[String] = ArrayBuffer[String]() var groupId: String = "" -def totalTime: Long = { - if (finishTimestamp == 0L) { +def totalTime(endTime: Long): Long = { + if (endTime == 0L) { System.currentTimeMillis - startTimestamp } else { -finishTimestamp - startTimestamp +endTime - startTimestamp } } } @@ -254,6 +255,11 @@ object HiveThriftServer2 extends Logging { 
trimExecutionIfNecessary() } +def onOperationClosed(id: String): Unit = synchronized { + executionList(id).closeTimestamp = System.currentTimeMillis + executionList(id).state = ExecutionState.CLOSED +} + private def trimExecutionIfNecessary() = { if (executionList.size > retainedStatements) { val toRemove = math.max(retainedStatements / 10, 1) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala index 820f76d..2f011c2 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala @@ -70,11 +70,12 @@ private[hive] class SparkExecuteStatementOperation( } } - def close(): Unit = { + override def close(): Unit = { // RDDs will be cleaned automatically upon
[spark] branch master updated: [SPARK-27815][SQL] Predicate pushdown in one pass for cascading joins
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 74f1176 [SPARK-27815][SQL] Predicate pushdown in one pass for cascading joins 74f1176 is described below commit 74f1176311676145d5d8669daacf67e5308f68b5 Author: Yesheng Ma AuthorDate: Wed Jul 3 09:01:16 2019 -0700 [SPARK-27815][SQL] Predicate pushdown in one pass for cascading joins ## What changes were proposed in this pull request? This PR makes the predicate pushdown logic in catalyst optimizer more efficient by unifying two existing rules `PushdownPredicates` and `PushPredicateThroughJoin`. Previously pushing down a predicate for queries such as `Filter(Join(Join(Join)))` requires n steps. This patch essentially reduces this to a single pass. To make this actually work, we need to unify a few rules such as `CombineFilters`, `PushDownPredicate` and `PushDownPrdicateThroughJoin`. Otherwise cases such as `Filter(Join(Filter(Join)))` still requires several passes to fully push down predicates. This unification is done by composing several partial functions, which makes a minimal code change and can reuse existing UTs. Results show that this optimization can improve the catalyst optimization time by 16.5%. For queries with more joins, the performance is even better. E.g., for TPC-DS q64, the performance boost is 49.2%. ## How was this patch tested? Existing UTs + new a UT for the new rule. Closes #24956 from yeshengm/fixed-point-opt. Authored-by: Yesheng Ma Signed-off-by: gatorsmile --- .../spark/sql/catalyst/optimizer/Optimizer.scala | 30 +++- .../optimizer/PushDownLeftSemiAntiJoin.scala | 10 +- .../catalyst/optimizer/ColumnPruningSuite.scala| 2 +- .../optimizer/FilterPushdownOnePassSuite.scala | 183 + .../catalyst/optimizer/FilterPushdownSuite.scala | 2 +- .../InferFiltersFromConstraintsSuite.scala | 2 +- .../catalyst/optimizer/JoinOptimizationSuite.scala | 2 +- .../sql/catalyst/optimizer/JoinReorderSuite.scala | 2 +- .../optimizer/LeftSemiAntiJoinPushDownSuite.scala | 2 +- .../catalyst/optimizer/OptimizerLoggingSuite.scala | 12 +- .../optimizer/OptimizerRuleExclusionSuite.scala| 6 +- .../optimizer/PropagateEmptyRelationSuite.scala| 4 +- .../sql/catalyst/optimizer/PruneFiltersSuite.scala | 2 +- .../sql/catalyst/optimizer/SetOperationSuite.scala | 2 +- .../optimizer/StarJoinCostBasedReorderSuite.scala | 2 +- .../catalyst/optimizer/StarJoinReorderSuite.scala | 2 +- .../spark/sql/execution/SparkOptimizer.scala | 4 +- 17 files changed, 235 insertions(+), 34 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 17b4ff7..c99d2c0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -63,8 +63,7 @@ abstract class Optimizer(sessionCatalog: SessionCatalog) PushProjectionThroughUnion, ReorderJoin, EliminateOuterJoin, -PushPredicateThroughJoin, -PushDownPredicate, +PushDownPredicates, PushDownLeftSemiAntiJoin, PushLeftSemiLeftAntiThroughJoin, LimitPushDown, @@ -911,7 +910,9 @@ object CombineUnions extends Rule[LogicalPlan] { * one conjunctive predicate. 
*/ object CombineFilters extends Rule[LogicalPlan] with PredicateHelper { - def apply(plan: LogicalPlan): LogicalPlan = plan transform { + def apply(plan: LogicalPlan): LogicalPlan = plan transform applyLocally + + val applyLocally: PartialFunction[LogicalPlan, LogicalPlan] = { // The query execution/optimization does not guarantee the expressions are evaluated in order. // We only can combine them if and only if both are deterministic. case Filter(fc, nf @ Filter(nc, grandChild)) if fc.deterministic && nc.deterministic => @@ -997,14 +998,29 @@ object PruneFilters extends Rule[LogicalPlan] with PredicateHelper { } /** + * The unified version for predicate pushdown of normal operators and joins. + * This rule improves performance of predicate pushdown for cascading joins such as: + * Filter-Join-Join-Join. Most predicates can be pushed down in a single pass. + */ +object PushDownPredicates extends Rule[LogicalPlan] with PredicateHelper { + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +CombineFilters.applyLocally + .orElse(PushPredicateThroughNonJoin.applyLocally) + .orElse(PushPredicateThroughJoin.applyLocally) + } +} + +/** * Pushes [[Filter]] operators through many operators iff:
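The unification leans on composing each rule's `applyLocally` partial function with `orElse`. A tiny standalone illustration of that Scala idiom, using strings as stand-ins for plan nodes (this is not Spark code):

```scala
// Each "rule" only handles the plan shapes it knows about.
val combineFilters: PartialFunction[String, String] = {
  case "Filter(Filter(x))" => "Filter(x)"
}
val pushThroughJoin: PartialFunction[String, String] = {
  case "Filter(Join(x))" => "Join(Filter(x))"
}

// orElse yields one partial function that tries each case in order,
// mirroring how PushDownPredicates chains the per-rule applyLocally blocks.
val pushDown = combineFilters.orElse(pushThroughJoin)

assert(pushDown("Filter(Join(x))") == "Join(Filter(x))")
```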
[spark] branch master updated: [SPARK-28167][SQL] Show global temporary view in database tool
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new ea03030 [SPARK-28167][SQL] Show global temporary view in database tool ea03030 is described below commit ea0303063f5d0e25dbc6c126b13bbefe59a8d1f7 Author: Yuming Wang AuthorDate: Wed Jul 3 00:01:05 2019 -0700 [SPARK-28167][SQL] Show global temporary view in database tool ## What changes were proposed in this pull request? This pr add support show global temporary view and local temporary view in database tool. TODO: Database tools should support show temporary views because it's schema is null. ## How was this patch tested? unit tests and manual tests: ![image](https://user-images.githubusercontent.com/5399861/60392266-a5455d00-9b31-11e9-92c8-88a8e6c2aec3.png) Closes #24972 from wangyum/SPARK-28167. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../thriftserver/SparkGetColumnsOperation.scala| 95 ++ .../thriftserver/SparkGetSchemasOperation.scala| 8 ++ .../thriftserver/SparkGetTablesOperation.scala | 55 + .../thriftserver/SparkMetadataOperationSuite.scala | 46 +-- 4 files changed, 148 insertions(+), 56 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala index 4b78e2f..6d3c9fc 100644 --- a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala @@ -17,7 +17,6 @@ package org.apache.spark.sql.hive.thriftserver -import java.util.UUID import java.util.regex.Pattern import scala.collection.JavaConverters.seqAsJavaListConverter @@ -33,6 +32,7 @@ import org.apache.spark.sql.SQLContext import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.catalog.SessionCatalog import org.apache.spark.sql.hive.thriftserver.ThriftserverShimUtils.toJavaSQLType +import org.apache.spark.sql.types.StructType /** * Spark's own SparkGetColumnsOperation @@ -56,8 +56,6 @@ private[hive] class SparkGetColumnsOperation( val catalog: SessionCatalog = sqlContext.sessionState.catalog - private var statementId: String = _ - override def runInternal(): Unit = { val cmdStr = s"catalog : $catalogName, schemaPattern : $schemaName, tablePattern : $tableName" + s", columnName : $columnName" @@ -77,7 +75,7 @@ private[hive] class SparkGetColumnsOperation( } val db2Tabs = catalog.listDatabases(schemaPattern).map { dbName => - (dbName, catalog.listTables(dbName, tablePattern)) + (dbName, catalog.listTables(dbName, tablePattern, includeLocalTempViews = false)) }.toMap if (isAuthV2Enabled) { @@ -88,42 +86,31 @@ private[hive] class SparkGetColumnsOperation( } try { + // Tables and views db2Tabs.foreach { case (dbName, tables) => catalog.getTablesByName(tables).foreach { catalogTable => -catalogTable.schema.foreach { column => - if (columnPattern != null && !columnPattern.matcher(column.name).matches()) { - } else { -val rowData = Array[AnyRef]( - null, // TABLE_CAT - dbName, // TABLE_SCHEM - catalogTable.identifier.table, // TABLE_NAME - column.name, // COLUMN_NAME - toJavaSQLType(column.dataType.sql).asInstanceOf[AnyRef], // DATA_TYPE - column.dataType.sql, // TYPE_NAME - null, // COLUMN_SIZE - null, // 
BUFFER_LENGTH, unused - null, // DECIMAL_DIGITS - null, // NUM_PREC_RADIX - (if (column.nullable) 1 else 0).asInstanceOf[AnyRef], // NULLABLE - column.getComment().getOrElse(""), // REMARKS - null, // COLUMN_DEF - null, // SQL_DATA_TYPE - null, // SQL_DATETIME_SUB - null, // CHAR_OCTET_LENGTH - null, // ORDINAL_POSITION - "YES", // IS_NULLABLE - null, // SCOPE_CATALOG - null, // SCOPE_SCHEMA - null, // SCOPE_TABLE - null, // SOURCE_DATA_TYPE - "NO" // IS_AUTO_INCREMENT -) -rowSet.addRow(rowData) - } -} +addToRowSet(columnPattern, dbName, catalogTable.identifier.table
[spark] branch master updated: [SPARK-28196][SQL] Add a new `listTables` and `listLocalTempViews` APIs for SessionCatalog
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 24e1e41 [SPARK-28196][SQL] Add a new `listTables` and `listLocalTempViews` APIs for SessionCatalog 24e1e41 is described below commit 24e1e41648de58d3437e008b187b84828830e238 Author: Yuming Wang AuthorDate: Sat Jun 29 18:36:36 2019 -0700 [SPARK-28196][SQL] Add a new `listTables` and `listLocalTempViews` APIs for SessionCatalog ## What changes were proposed in this pull request? This pr add two API for [SessionCatalog](https://github.com/apache/spark/blob/df4cb471c9712a2fe496664028d9303caebd8777/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala): ```scala def listTables(db: String, pattern: String, includeLocalTempViews: Boolean): Seq[TableIdentifier] def listLocalTempViews(pattern: String): Seq[TableIdentifier] ``` Because in some cases `listTables` does not need local temporary view and sometimes only need list local temporary view. ## How was this patch tested? unit tests Closes #24995 from wangyum/SPARK-28196. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../sql/catalyst/catalog/SessionCatalog.scala | 29 +- .../sql/catalyst/catalog/SessionCatalogSuite.scala | 65 ++ 2 files changed, 91 insertions(+), 3 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index dcc6298..74559f5 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala @@ -784,7 +784,19 @@ class SessionCatalog( * Note that, if the specified database is global temporary view database, we will list global * temporary views. */ - def listTables(db: String, pattern: String): Seq[TableIdentifier] = { + def listTables(db: String, pattern: String): Seq[TableIdentifier] = listTables(db, pattern, true) + + /** + * List all matching tables in the specified database, including local temporary views + * if includeLocalTempViews is enabled. + * + * Note that, if the specified database is global temporary view database, we will list global + * temporary views. + */ + def listTables( + db: String, + pattern: String, + includeLocalTempViews: Boolean): Seq[TableIdentifier] = { val dbName = formatDatabaseName(db) val dbTables = if (dbName == globalTempViewManager.database) { globalTempViewManager.listViewNames(pattern).map { name => @@ -796,12 +808,23 @@ class SessionCatalog( TableIdentifier(name, Some(dbName)) } } -val localTempViews = synchronized { + +if (includeLocalTempViews) { + dbTables ++ listLocalTempViews(pattern) +} else { + dbTables +} + } + + /** + * List all matching local temporary views. 
+ */ + def listLocalTempViews(pattern: String): Seq[TableIdentifier] = { +synchronized { StringUtils.filterPattern(tempViews.keys.toSeq, pattern).map { name => TableIdentifier(name) } } -dbTables ++ localTempViews } /** diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala index 5a9e4ad..bce8553 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala @@ -717,6 +717,71 @@ abstract class SessionCatalogSuite extends AnalysisTest { } } + test("list tables with pattern and includeLocalTempViews") { +withEmptyCatalog { catalog => + catalog.createDatabase(newDb("mydb"), ignoreIfExists = false) + catalog.createTable(newTable("tbl1", "mydb"), ignoreIfExists = false) + catalog.createTable(newTable("tbl2", "mydb"), ignoreIfExists = false) + val tempTable = Range(1, 10, 2, 10) + catalog.createTempView("temp_view1", tempTable, overrideIfExists = false) + catalog.createTempView("temp_view4", tempTable, overrideIfExists = false) + + assert(catalog.listTables("mydb").toSet == catalog.listTables("mydb", "*").toSet) + assert(catalog.listTables("mydb").toSet == catalog.listTables("mydb", "*", true).toSet) + assert(catalog.listTables("mydb").toSet == +catalog.listTables("mydb&q
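A hedged usage sketch of the two new calls from application code; reaching them through `spark.sessionState.catalog` relies on an unstable internal API, so treat this as illustration only:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("catalog-demo").getOrCreate()
spark.range(5).createOrReplaceTempView("temp_view1")

val catalog = spark.sessionState.catalog

// Tables and permanent views in `default`, now excluding local temp views.
val persistentOnly = catalog.listTables("default", "*", includeLocalTempViews = false)

// Only this session's local temporary views.
val localTempViews = catalog.listLocalTempViews("*")

println(localTempViews.mkString(", ")) // e.g. `temp_view1`
```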
[spark] branch master updated (73183b3 -> e0e2144)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 73183b3 [SPARK-11412][SQL] Support merge schema for ORC add e0e2144 [SPARK-28184][SQL][TEST] Avoid creating new sessions in SparkMetadataOperationSuite No new revisions were added by this update. Summary of changes: .../thriftserver/SparkMetadataOperationSuite.scala | 271 ++--- 1 file changed, 68 insertions(+), 203 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-11412][SQL] Support merge schema for ORC
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 73183b3 [SPARK-11412][SQL] Support merge schema for ORC 73183b3 is described below commit 73183b3c8c2022846587f08e8dea5c387ed3b8d5 Author: wangguangxin.cn AuthorDate: Sat Jun 29 17:08:31 2019 -0700 [SPARK-11412][SQL] Support merge schema for ORC ## What changes were proposed in this pull request? Currently, ORC's `inferSchema` is implemented as randomly choosing one ORC file and reading its schema. This PR follows the behavior of Parquet, it implements merge schemas logic by reading all ORC files in parallel through a spark job. Users can enable merge schema by `spark.read.orc("xxx").option("mergeSchema", "true")` or by setting `spark.sql.orc.mergeSchema` to `true`, the prior one has higher priority. ## How was this patch tested? tested by UT OrcUtilsSuite.scala Closes #24043 from WangGuangxin/SPARK-11412. Lead-authored-by: wangguangxin.cn Co-authored-by: wangguangxin.cn Signed-off-by: gatorsmile --- .../org/apache/spark/sql/internal/SQLConf.scala| 8 ++ .../execution/datasources/SchemaMergeUtils.scala | 106 ++ .../execution/datasources/orc/OrcFileFormat.scala | 2 +- .../sql/execution/datasources/orc/OrcOptions.scala | 11 ++ .../sql/execution/datasources/orc/OrcUtils.scala | 26 +++- .../datasources/parquet/ParquetFileFormat.scala| 81 ++- .../execution/datasources/v2/orc/OrcTable.scala| 4 +- .../execution/datasources/ReadSchemaSuite.scala| 21 +++ .../execution/datasources/orc/OrcSourceSuite.scala | 154 - .../apache/spark/sql/hive/orc/OrcFileFormat.scala | 20 ++- .../spark/sql/hive/orc/OrcFileOperator.scala | 21 ++- .../spark/sql/hive/orc/HiveOrcSourceSuite.scala| 4 + 12 files changed, 375 insertions(+), 83 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index ff84f4e..e4c58ef 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -566,6 +566,12 @@ object SQLConf { .booleanConf .createWithDefault(true) + val ORC_SCHEMA_MERGING_ENABLED = buildConf("spark.sql.orc.mergeSchema") +.doc("When true, the Orc data source merges schemas collected from all data files, " + + "otherwise the schema is picked from a random data file.") +.booleanConf +.createWithDefault(false) + val HIVE_VERIFY_PARTITION_PATH = buildConf("spark.sql.hive.verifyPartitionPath") .doc("When true, check all the partition paths under the table\'s root directory " + "when reading data stored in HDFS. 
This configuration will be deprecated in the future " + @@ -1956,6 +1962,8 @@ class SQLConf extends Serializable with Logging { def orcFilterPushDown: Boolean = getConf(ORC_FILTER_PUSHDOWN_ENABLED) + def isOrcSchemaMergingEnabled: Boolean = getConf(ORC_SCHEMA_MERGING_ENABLED) + def verifyPartitionPath: Boolean = getConf(HIVE_VERIFY_PARTITION_PATH) def metastorePartitionPruning: Boolean = getConf(HIVE_METASTORE_PARTITION_PRUNING) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaMergeUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaMergeUtils.scala new file mode 100644 index 000..99882b0 --- /dev/null +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/SchemaMergeUtils.scala @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.datasources + +import org.apache.hadoop.conf.Configuration +import org.apache.hadoop.fs.{FileStatus, Path} + +import org.apache.spark.SparkException +import org.apach
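A usage sketch of the new ORC option and session conf described above; the paths are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("orc-merge-schema").getOrCreate()

// Per-read option (takes precedence over the session conf)...
val merged = spark.read.option("mergeSchema", "true").orc("/path/to/orc/table")

// ...or enable schema merging for all ORC reads in this session.
spark.conf.set("spark.sql.orc.mergeSchema", "true")
val mergedToo = spark.read.orc("/path/to/orc/table")

merged.printSchema()
```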
[spark] branch master updated: [SPARK-28056][.2][PYTHON][SQL] add docstring/doctest for SCALAR_ITER Pandas UDF
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 8299600 [SPARK-28056][.2][PYTHON][SQL] add docstring/doctest for SCALAR_ITER Pandas UDF 8299600 is described below commit 8299600575024ca81127b7bf8ef48ae11fdd0594 Author: Xiangrui Meng AuthorDate: Fri Jun 28 15:09:57 2019 -0700 [SPARK-28056][.2][PYTHON][SQL] add docstring/doctest for SCALAR_ITER Pandas UDF ## What changes were proposed in this pull request? Add docstring/doctest for `SCALAR_ITER` Pandas UDF. I explicitly mentioned that per-partition execution is an implementation detail, not guaranteed. I will submit another PR to add the same to user guide, just to keep this PR minimal. I didn't add "doctest: +SKIP" in the first commit so it is easy to test locally. cc: HyukjinKwon gatorsmile icexelloss BryanCutler WeichenXu123 ![Screen Shot 2019-06-28 at 9 52 41 AM](https://user-images.githubusercontent.com/829644/60358349-b0aa5400-998a-11e9-9ebf-8481dfd555b5.png) ![Screen Shot 2019-06-28 at 9 53 19 AM](https://user-images.githubusercontent.com/829644/60358355-b1db8100-998a-11e9-8f6f-00a11bdbdc4d.png) ## How was this patch tested? doctest Closes #25005 from mengxr/SPARK-28056.2. Authored-by: Xiangrui Meng Signed-off-by: gatorsmile --- python/pyspark/sql/functions.py | 104 +++- 1 file changed, 102 insertions(+), 2 deletions(-) diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index 34f6593..5d1e69e 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -2951,7 +2951,107 @@ def pandas_udf(f=None, returnType=None, functionType=None): Therefore, this can be used, for example, to ensure the length of each returned `pandas.Series`, and can not be used as the column length. -2. GROUPED_MAP +2. SCALAR_ITER + + A scalar iterator UDF is semantically the same as the scalar Pandas UDF above except that the + wrapped Python function takes an iterator of batches as input instead of a single batch and, + instead of returning a single output batch, it yields output batches or explicitly returns an + generator or an iterator of output batches. + It is useful when the UDF execution requires initializing some state, e.g., loading a machine + learning model file to apply inference to every input batch. + + .. note:: It is not guaranteed that one invocation of a scalar iterator UDF will process all + batches from one partition, although it is currently implemented this way. + Your code shall not rely on this behavior because it might change in the future for + further optimization, e.g., one invocation processes multiple partitions. + + Scalar iterator UDFs are used with :meth:`pyspark.sql.DataFrame.withColumn` and + :meth:`pyspark.sql.DataFrame.select`. + + >>> import pandas as pd # doctest: +SKIP + >>> from pyspark.sql.functions import col, pandas_udf, struct, PandasUDFType + >>> pdf = pd.DataFrame([1, 2, 3], columns=["x"]) # doctest: +SKIP + >>> df = spark.createDataFrame(pdf) # doctest: +SKIP + + When the UDF is called with a single column that is not `StructType`, the input to the + underlying function is an iterator of `pd.Series`. + + >>> @pandas_udf("long", PandasUDFType.SCALAR_ITER) # doctest: +SKIP + ... def plus_one(batch_iter): + ... for x in batch_iter: + ... yield x + 1 + ... 
+    >>> df.select(plus_one(col("x"))).show()  # doctest: +SKIP
+    +-----------+
+    |plus_one(x)|
+    +-----------+
+    |          2|
+    |          3|
+    |          4|
+    +-----------+
+
+    When the UDF is called with more than one columns, the input to the underlying function is an
+    iterator of `pd.Series` tuple.
+
+    >>> @pandas_udf("long", PandasUDFType.SCALAR_ITER)  # doctest: +SKIP
+    ... def multiply_two_cols(batch_iter):
+    ...     for a, b in batch_iter:
+    ...         yield a * b
+    ...
+    >>> df.select(multiply_two_cols(col("x"), col("x"))).show()  # doctest: +SKIP
+    +-----------------------+
+    |multiply_two_cols(x, x)|
+    +-----------------------+
+    |                      1|
+    |                      4|
+    |                      9|
+    +-----------------------+
+
+    When the UDF is called with a single column that is `StructType`, the input to the underlying
+    function is an iterator
[spark] branch master updated (a7e1619 -> cded421)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a7e1619 [SPARK-28174][BUILD][SS] Upgrade to Kafka 2.3.0 add cded421 [SPARK-27871][SQL] LambdaVariable should use per-query unique IDs instead of globally unique IDs No new revisions were added by this update. Summary of changes: .../sql/catalyst/encoders/ExpressionEncoder.scala | 30 ++- .../sql/catalyst/expressions/objects/objects.scala | 268 + .../spark/sql/catalyst/optimizer/Optimizer.scala | 3 +- .../spark/sql/catalyst/optimizer/objects.scala | 53 +++- .../org/apache/spark/sql/internal/SQLConf.scala| 7 + .../expressions/ObjectExpressionsSuite.scala | 2 +- .../optimizer/ReassignLambdaVariableIDSuite.scala | 61 + .../main/scala/org/apache/spark/sql/Dataset.scala | 35 +-- .../aggregate/TypedAggregateExpression.scala | 23 +- .../spark/sql/DatasetOptimizationSuite.scala | 48 +++- .../scala/org/apache/spark/sql/DatasetSuite.scala | 17 +- .../sql/execution/WholeStageCodegenSuite.scala | 2 +- 12 files changed, 342 insertions(+), 207 deletions(-) create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ReassignLambdaVariableIDSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-28059][SQL][TEST] Port int4.sql
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 929d313 [SPARK-28059][SQL][TEST] Port int4.sql 929d313 is described below commit 929d3135687226aa7b24edeeb0ff8e62cd087fae Author: Yuming Wang AuthorDate: Sat Jun 22 23:59:30 2019 -0700 [SPARK-28059][SQL][TEST] Port int4.sql ## What changes were proposed in this pull request? This PR is to port int4.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int4.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/int4.out When porting the test cases, found two PostgreSQL specific features that do not exist in Spark SQL: [SPARK-28023](https://issues.apache.org/jira/browse/SPARK-28023): Trim the string when cast string type to other types [SPARK-28027](https://issues.apache.org/jira/browse/SPARK-28027): Add bitwise shift left/right operators Also, found a bug: [SPARK-28024](https://issues.apache.org/jira/browse/SPARK-28024): Incorrect value when out of range Also, found four inconsistent behavior: [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Invalid input syntax for integer: "34.5" at PostgreSQL [SPARK-28027](https://issues.apache.org/jira/browse/SPARK-28027) Our operator `!` and `!!` has different meanings [SPARK-28028](https://issues.apache.org/jira/browse/SPARK-28028): Cast numeric to integral type need round [SPARK-2659](https://issues.apache.org/jira/browse/SPARK-2659): HiveQL: Division operator should always perform fractional division, for example: ```sql select 1/2; ``` ## How was this patch tested? N/A Closes #24877 from wangyum/SPARK-28059. 
Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../test/resources/sql-tests/inputs/pgSQL/int4.sql | 178 +++ .../resources/sql-tests/results/pgSQL/int4.sql.out | 530 + 2 files changed, 708 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int4.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int4.sql new file mode 100644 index 000..89cac00 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/int4.sql @@ -0,0 +1,178 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- INT4 +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int4.sql +-- + +CREATE TABLE INT4_TBL(f1 int) USING parquet; + +-- [SPARK-28023] Trim the string when cast string type to other types +INSERT INTO INT4_TBL VALUES (trim(' 0 ')); + +INSERT INTO INT4_TBL VALUES (trim('123456 ')); + +INSERT INTO INT4_TBL VALUES (trim('-123456')); + +-- [SPARK-27923] Invalid input syntax for integer: "34.5" at PostgreSQL +-- INSERT INTO INT4_TBL(f1) VALUES ('34.5'); + +-- largest and smallest values +INSERT INTO INT4_TBL VALUES ('2147483647'); + +INSERT INTO INT4_TBL VALUES ('-2147483647'); + +-- [SPARK-27923] Spark SQL insert these bad inputs to NULL +-- bad input values +-- INSERT INTO INT4_TBL(f1) VALUES ('1'); +-- INSERT INTO INT4_TBL(f1) VALUES ('asdf'); +-- INSERT INTO INT4_TBL(f1) VALUES (' '); +-- INSERT INTO INT4_TBL(f1) VALUES (' asdf '); +-- INSERT INTO INT4_TBL(f1) VALUES ('- 1234'); +-- INSERT INTO INT4_TBL(f1) VALUES ('123 5'); +-- INSERT INTO INT4_TBL(f1) VALUES (''); + + +SELECT '' AS five, * FROM INT4_TBL; + +SELECT '' AS four, i.* FROM INT4_TBL i WHERE i.f1 <> smallint('0'); + +SELECT '' AS four, i.* FROM INT4_TBL i WHERE i.f1 <> int('0'); + +SELECT '' AS one, i.* FROM INT4_TBL i WHERE i.f1 = smallint('0'); + +SELECT '' AS one, i.* FROM INT4_TBL i WHERE i.f1 = int('0'); + +SELECT '' AS two, i.* FROM INT4_TBL i WHERE i.f1 < smallint('0'); + +SELECT '' AS two, i.* FROM INT4_TBL i WHERE i.f1 < int('0'); + +SELECT '' AS three, i.* FROM INT4_TBL i WHERE i.f1 <= smallint('0'); + +SELECT '' AS three, i.* FROM INT4_TBL i WHERE i.f1 <= int('0'); + +SELECT '' AS two, i.* FROM INT4_TBL i WHERE i.f1 > smallint('0'); + +SELECT '' AS two, i.* FROM INT4_TBL i WHERE i.f1 > int('0'); + +SELECT '' AS three, i.* FROM INT4_TBL i WHERE i.f1 >= smallint('0'); + +SELECT '' AS three, i.* FROM INT4_TBL i WHERE i.f1 >= int('0'); + +-- positive odds
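The division behavior referenced by SPARK-2659 above is easy to check from a Spark shell. A minimal sketch, assuming a running `SparkSession` named `spark` (the aliases and result comments are illustrative):

```scala
// In Spark SQL, `/` always performs fractional division, even on two integer
// literals, while `div` yields the integral quotient.
spark.sql("SELECT 1 / 2 AS fractional, 1 DIV 2 AS integral").show()
// fractional = 0.5, integral = 0
```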
[spark] branch master updated (870f972 -> 0768fad)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 870f972 [SPARK-28104][SQL] Implement Spark's own GetColumnsOperation add 0768fad [SPARK-28126][SQL] Support TRIM(trimStr FROM str) syntax No new revisions were added by this update. Summary of changes: .../antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 | 2 +- .../spark/sql/catalyst/expressions/stringExpressions.scala | 12 +--- .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala| 2 +- .../apache/spark/sql/catalyst/parser/PlanParserSuite.scala | 5 + .../src/test/resources/sql-tests/inputs/string-functions.sql | 4 ++-- .../resources/sql-tests/results/string-functions.sql.out | 12 ++-- 6 files changed, 24 insertions(+), 13 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
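For readers tracking the syntax change, a small sketch of the newly supported form, assuming a `SparkSession` named `spark` (literals are illustrative):

```scala
// TRIM(trimStr FROM str) strips any character contained in trimStr from both
// ends of str.
spark.sql("SELECT TRIM('xy' FROM 'yxSpark SQLxxy') AS trimmed").show()
// trimmed = "Spark SQL"
```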
[spark] branch master updated: [SPARK-28104][SQL] Implement Spark's own GetColumnsOperation
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 870f972 [SPARK-28104][SQL] Implement Spark's own GetColumnsOperation 870f972 is described below commit 870f972dcc7b2b0e5bea2ae64f2c9598c681eddf Author: Yuming Wang AuthorDate: Sat Jun 22 09:15:07 2019 -0700 [SPARK-28104][SQL] Implement Spark's own GetColumnsOperation ## What changes were proposed in this pull request? [SPARK-24196](https://issues.apache.org/jira/browse/SPARK-24196) and [SPARK-24570](https://issues.apache.org/jira/browse/SPARK-24570) implemented Spark's own `GetSchemasOperation` and `GetTablesOperation`. This pr implements Spark's own `GetColumnsOperation`. ## How was this patch tested? unit tests and manual tests: ![image](https://user-images.githubusercontent.com/5399861/59745367-3a7d6180-92a7-11e9-862d-96bc494c5f00.png) Closes #24906 from wangyum/SPARK-28104. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../thriftserver/SparkGetColumnsOperation.scala| 142 + .../server/SparkSQLOperationManager.scala | 20 ++- .../thriftserver/HiveThriftServer2Suites.scala | 9 +- .../thriftserver/SparkMetadataOperationSuite.scala | 92 - .../service/cli/operation/GetColumnsOperation.java | 4 +- .../hive/thriftserver/ThriftserverShimUtils.scala | 5 +- .../service/cli/operation/GetColumnsOperation.java | 4 +- .../hive/thriftserver/ThriftserverShimUtils.scala | 4 + 8 files changed, 270 insertions(+), 10 deletions(-) diff --git a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala new file mode 100644 index 000..4b78e2f --- /dev/null +++ b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetColumnsOperation.scala @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.hive.thriftserver + +import java.util.UUID +import java.util.regex.Pattern + +import scala.collection.JavaConverters.seqAsJavaListConverter + +import org.apache.hadoop.hive.ql.security.authorization.plugin.{HiveOperationType, HivePrivilegeObject} +import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObject.HivePrivilegeObjectType +import org.apache.hive.service.cli._ +import org.apache.hive.service.cli.operation.GetColumnsOperation +import org.apache.hive.service.cli.session.HiveSession + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.SQLContext +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.catalog.SessionCatalog +import org.apache.spark.sql.hive.thriftserver.ThriftserverShimUtils.toJavaSQLType + +/** + * Spark's own SparkGetColumnsOperation + * + * @param sqlContext SQLContext to use + * @param parentSession a HiveSession from SessionManager + * @param catalogName catalog name. NULL if not applicable. + * @param schemaName database name, NULL or a concrete database name + * @param tableName table name + * @param columnName column name + */ +private[hive] class SparkGetColumnsOperation( +sqlContext: SQLContext, +parentSession: HiveSession, +catalogName: String, +schemaName: String, +tableName: String, +columnName: String) + extends GetColumnsOperation(parentSession, catalogName, schemaName, tableName, columnName) +with Logging { + + val catalog: SessionCatalog = sqlContext.sessionState.catalog + + private var statementId: String = _ + + override def runInternal(): Unit = { +val cmdStr = s"catalog : $catalogName, schemaPattern : $schemaName, tablePattern : $tableName" + + s", columnName : $columnName" +logInfo(s"GetColumnsOperation: $cmdStr") + +setState(OperationState.RUNNING) +// Always use the latest class
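A hedged sketch of the client path this operation serves: a plain JDBC program asking the Spark Thrift Server for column metadata. The connection URL, credentials, and name patterns are assumptions, and the Hive JDBC driver must be on the classpath:

```scala
import java.sql.DriverManager

// Hypothetical client: exercises the Thrift Server's GetColumns path through
// standard JDBC DatabaseMetaData calls.
object GetColumnsSketch {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
    try {
      // (catalog, schemaPattern, tableNamePattern, columnNamePattern)
      val rs = conn.getMetaData.getColumns(null, "default", "%", "%")
      while (rs.next()) {
        println(s"${rs.getString("TABLE_NAME")}.${rs.getString("COLUMN_NAME")}: " +
          rs.getString("TYPE_NAME"))
      }
    } finally {
      conn.close()
    }
  }
}
```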
[spark] branch master updated (47f54b1 -> 54da3bb)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 47f54b1 [SPARK-28118][CORE] Add `spark.eventLog.compression.codec` configuration add 54da3bb [SPARK-28127][SQL] Micro optimization on TreeNode's mapChildren method No new revisions were added by this update. Summary of changes: .../src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-27890][SQL] Improve SQL parser error message for character-only identifier with hyphens except those in expressions
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7b7f16f [SPARK-27890][SQL] Improve SQL parser error message for character-only identifier with hyphens except those in expressions 7b7f16f is described below commit 7b7f16f2a7a6a6685a8917a9b5ba403fff76 Author: Yesheng Ma AuthorDate: Tue Jun 18 21:51:15 2019 -0700 [SPARK-27890][SQL] Improve SQL parser error message for character-only identifier with hyphens except those in expressions ## What changes were proposed in this pull request? Current SQL parser's error message for hyphen-connected identifiers without surrounding backquotes(e.g. hyphen-table) is confusing for end users. A possible approach to tackle this is to explicitly capture these wrong usages in the SQL parser. In this way, the end users can fix these errors more quickly. For example, for a simple query such as `SELECT * FROM test-table`, the original error message is ``` Error in SQL statement: ParseException: mismatched input '-' expecting (line 1, pos 18) ``` which can be confusing in a large query. After the fix, the error message is: ``` Error in query: Possibly unquoted identifier test-table detected. Please consider quoting it with back-quotes as `test-table`(line 1, pos 14) == SQL == SELECT * FROM test-table --^^^ ``` which is easier for end users to identify the issue and fix. We safely augmented the current grammar rule to explicitly capture these error cases. The error handling logic is implemented in the SQL parsing listener `PostProcessor`. However, note that for cases such as `a - my-func(b)`, the parser can't actually tell whether this should be ``a -`my-func`(b) `` or `a - my - func(b)`. Therefore for these cases, we leave the parser as is. Also, in this patch we only provide better error messages for character-only identifiers. ## How was this patch tested? Adding new unit tests. Closes #24749 from yeshengm/hyphen-ident. Authored-by: Yesheng Ma Signed-off-by: gatorsmile --- .../apache/spark/sql/catalyst/parser/SqlBase.g4| 60 ++- .../spark/sql/catalyst/parser/AstBuilder.scala | 16 +-- .../spark/sql/catalyst/parser/ParseDriver.scala| 8 ++ .../sql/catalyst/parser/ErrorParserSuite.scala | 110 + .../spark/sql/execution/SparkSqlParser.scala | 10 +- 5 files changed, 169 insertions(+), 35 deletions(-) diff --git a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 index dcb7939..f57a659 100644 --- a/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 +++ b/sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 @@ -82,13 +82,15 @@ singleTableSchema statement : query #statementDefault | ctes? dmlStatementNoWith #dmlStatement -| USE db=identifier#use -| CREATE database (IF NOT EXISTS)? identifier +| USE db=errorCapturingIdentifier #use +| CREATE database (IF NOT EXISTS)? db=errorCapturingIdentifier ((COMMENT comment=STRING) | locationSpec | (WITH DBPROPERTIES tablePropertyList))* #createDatabase -| ALTER database identifier SET DBPROPERTIES tablePropertyList #setDatabaseProperties -| DROP database (IF EXISTS)? identifier (RESTRICT | CASCADE)? #dropDatabase +| ALTER database db=errorCapturingIdentifier +SET DBPROPERTIES tablePropertyList #setDatabaseProperties +| DROP database (IF EXISTS)? 
db=errorCapturingIdentifier +(RESTRICT | CASCADE)? #dropDatabase | SHOW DATABASES (LIKE? pattern=STRING)? #showDatabases | createTableHeader ('(' colTypeList ')')? tableProvider ((OPTIONS options=tablePropertyList) | @@ -135,7 +137,8 @@ statement (ALTER | CHANGE) COLUMN? qualifiedName (TYPE dataType)? (COMMENT comment=STRING)? colPosition? #alterTableColumn | ALTER TABLE tableIdentifier partitionSpec? -CHANGE COLUMN? identifier colType colPosition? #changeColumn +CHANGE COLUMN? +colName=errorCapturingIdentifier colType colPosition? #changeColumn | ALTER TABLE tableIdentifier (partitionSpec)? SET SERDE STRING (WITH SERDEPROPERTIES tablePropertyList)? #setTable
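The user-facing takeaway, sketched with an illustrative table name and an assumed `SparkSession` named `spark`:

```scala
// An unquoted hyphenated name now fails with the clearer ParseException shown above:
// spark.sql("SELECT * FROM test-table")

// Quoting with back-quotes, as the new message suggests, parses and resolves as
// long as a table named `test-table` actually exists.
spark.sql("SELECT * FROM `test-table`").show()
```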
[spark] branch master updated (a5dcb82 -> 15de6d0)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a5dcb82 [SPARK-27105][SQL] Optimize away exponential complexity in ORC predicate conversion add 15de6d0 [SPARK-28096][SQL] Convert defs to lazy vals to avoid expensive reference computation in QueryPlan and Expression No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/expressions/Expression.scala | 9 - .../catalyst/expressions/aggregate/interfaces.scala | 3 ++- .../spark/sql/catalyst/expressions/grouping.scala | 8 ++-- .../sql/catalyst/expressions/namedExpressions.scala | 3 ++- .../apache/spark/sql/catalyst/plans/QueryPlan.scala | 7 +-- .../sql/catalyst/plans/logical/LogicalPlan.scala | 2 +- .../catalyst/plans/logical/QueryPlanConstraints.scala | 2 +- .../catalyst/plans/logical/ScriptTransformation.scala | 3 ++- .../plans/logical/basicLogicalOperators.scala | 19 ++- .../spark/sql/catalyst/plans/logical/object.scala | 9 ++--- .../plans/logical/pythonLogicalOperators.scala| 3 ++- .../org/apache/spark/sql/execution/ExpandExec.scala | 3 ++- .../org/apache/spark/sql/execution/objects.scala | 3 ++- .../spark/sql/execution/python/EvalPythonExec.scala | 3 ++- 14 files changed, 51 insertions(+), 26 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
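As background on why the conversion helps, a generic illustration (not the Spark code itself) of how a `def` differs from a `lazy val` on an immutable tree node; the class and member names are invented:

```scala
// A `def` recomputes its body on every access; a `lazy val` computes it once per
// instance and caches the result, which is safe here because the tree is immutable.
class Node(name: String, children: Seq[Node]) {
  // Re-walks the whole subtree every time it is called.
  def referencesAsDef: Set[String] =
    children.flatMap(_.referencesAsDef).toSet + name

  // Walks the subtree at most once per Node, then reuses the cached set.
  lazy val referencesAsLazyVal: Set[String] =
    children.flatMap(_.referencesAsLazyVal).toSet + name
}
```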
[spark] branch master updated: [SPARK-28039][SQL][TEST] Port float4.sql
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2e3ae97 [SPARK-28039][SQL][TEST] Port float4.sql 2e3ae97 is described below commit 2e3ae97668f9170c820ec5564edc50dff8347915 Author: Yuming Wang AuthorDate: Tue Jun 18 16:22:30 2019 -0700 [SPARK-28039][SQL][TEST] Port float4.sql ## What changes were proposed in this pull request? This PR is to port float4.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out When porting the test cases, found three PostgreSQL specific features that do not exist in Spark SQL: [SPARK-28060](https://issues.apache.org/jira/browse/SPARK-28060): Float type can not accept some special inputs [SPARK-28027](https://issues.apache.org/jira/browse/SPARK-28027): Spark SQL does not support prefix operator `` [SPARK-28061](https://issues.apache.org/jira/browse/SPARK-28061): Support for converting float to binary format Also, found a bug: [SPARK-28024](https://issues.apache.org/jira/browse/SPARK-28024): Incorrect value when out of range Also, found three inconsistent behavior: [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Spark SQL insert there bad inputs to NULL [SPARK-28028](https://issues.apache.org/jira/browse/SPARK-28028): Cast numeric to integral type need round [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Spark SQL returns NULL when dividing by zero ## How was this patch tested? N/A Closes #24887 from wangyum/SPARK-28039. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../resources/sql-tests/inputs/pgSQL/float4.sql| 363 .../sql-tests/results/pgSQL/float4.sql.out | 379 + 2 files changed, 742 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float4.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float4.sql new file mode 100644 index 000..9e684d1 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/float4.sql @@ -0,0 +1,363 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- FLOAT4 +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql + +CREATE TABLE FLOAT4_TBL (f1 float) USING parquet; + +INSERT INTO FLOAT4_TBL VALUES ('0.0'); +INSERT INTO FLOAT4_TBL VALUES ('1004.30 '); +INSERT INTO FLOAT4_TBL VALUES (' -34.84'); +INSERT INTO FLOAT4_TBL VALUES ('1.2345678901234e+20'); +INSERT INTO FLOAT4_TBL VALUES ('1.2345678901234e-20'); + +-- [SPARK-28024] Incorrect numeric values when out of range +-- test for over and under flow +-- INSERT INTO FLOAT4_TBL VALUES ('10e70'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e70'); +-- INSERT INTO FLOAT4_TBL VALUES ('10e-70'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e-70'); + +-- INSERT INTO FLOAT4_TBL VALUES ('10e400'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e400'); +-- INSERT INTO FLOAT4_TBL VALUES ('10e-400'); +-- INSERT INTO FLOAT4_TBL VALUES ('-10e-400'); + +-- [SPARK-27923] Spark SQL insert there bad inputs to NULL +-- bad input +-- INSERT INTO FLOAT4_TBL VALUES (''); +-- INSERT INTO FLOAT4_TBL VALUES (' '); +-- INSERT INTO FLOAT4_TBL VALUES ('xyz'); +-- INSERT INTO FLOAT4_TBL VALUES ('5.0.0'); +-- INSERT INTO FLOAT4_TBL VALUES ('5 . 
0'); +-- INSERT INTO FLOAT4_TBL VALUES ('5. 0'); +-- INSERT INTO FLOAT4_TBL VALUES (' - 3.0'); +-- INSERT INTO FLOAT4_TBL VALUES ('1235'); + +-- special inputs +SELECT float('NaN'); +-- [SPARK-28060] Float type can not accept some special inputs +SELECT float('nan'); +SELECT float(' NAN '); +SELECT float('infinity'); +SELECT float(' -INFINiTY '); +-- [SPARK-27923] Spark SQL insert there bad special inputs to NULL +-- bad special inputs +SELECT float('N A N'); +SELECT float('NaN x'); +SELECT float(' INFINITYx'); + +-- [SPARK-28060] Float type can not accept some special inputs +SELECT float('Infinity') + 100.0; +SELECT float('Infinity') / float('Infinity'); +SELECT float('nan') / float('nan'); +SELECT float(decimal('nan')); + +SELECT '' AS five, * FROM FLOAT4_TBL; + +SELECT '' AS four, f.* FROM FLOAT4_TBL f WHERE f.f1 <
[spark] branch master updated (ed280c2 -> 1ada36b)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ed280c2 [SPARK-28072][SQL] Fix IncompatibleClassChangeError in `FromUnixTime` codegen on JDK9+ add 1ada36b [SPARK-27783][SQL] Add customizable hint error handler No new revisions were added by this update. Summary of changes: .../spark/sql/catalyst/analysis/Analyzer.scala | 2 +- .../sql/catalyst/analysis/HintErrorLogger.scala| 55 ++ .../spark/sql/catalyst/analysis/ResolveHints.scala | 22 - .../catalyst/optimizer/EliminateResolvedHint.scala | 16 +++ .../spark/sql/catalyst/plans/logical/hints.scala | 43 +++-- .../org/apache/spark/sql/internal/SQLConf.scala| 8 +++- .../scala/org/apache/spark/sql/JoinHintSuite.scala | 4 +- 7 files changed, 120 insertions(+), 30 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HintErrorLogger.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
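For context, a rough sketch of the situation the hint error handler covers, assuming a `SparkSession` named `spark` and illustrative names: a hint naming a relation that does not appear in the query cannot be applied, and with the default logging handler the query still runs while the problem is reported.

```scala
spark.range(10).createOrReplaceTempView("t")
// "no_such_table" is not a relation in this query, so the BROADCAST hint cannot
// be applied; the default handler reports the unapplied hint rather than failing,
// and the query itself still returns its rows.
spark.sql("SELECT /*+ BROADCAST(no_such_table) */ * FROM t").show()
```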
[spark] branch master updated (63e0711 -> 6284ac7)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 63e0711 [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API add 6284ac7 [SPARK-27934][SQL][TEST] Port case.sql No new revisions were added by this update. Summary of changes: .../test/resources/sql-tests/inputs/pgSQL/case.sql | 267 + .../resources/sql-tests/results/pgSQL/case.sql.out | 425 + .../org/apache/spark/sql/SQLQueryTestSuite.scala | 2 + 3 files changed, 694 insertions(+) create mode 100644 sql/core/src/test/resources/sql-tests/inputs/pgSQL/case.sql create mode 100644 sql/core/src/test/resources/sql-tests/results/pgSQL/case.sql.out - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 63e0711 [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API 63e0711 is described below commit 63e071152498ccaf10f038231ae7e1706b786a0b Author: LantaoJin AuthorDate: Tue Jun 11 15:32:59 2019 +0800 [SPARK-27899][SQL] Make HiveMetastoreClient.getTableObjectsByName available in ExternalCatalog/SessionCatalog API ## What changes were proposed in this pull request? The new Spark ThriftServer SparkGetTablesOperation implemented in https://github.com/apache/spark/pull/22794 does a catalog.getTableMetadata request for every table. This can get very slow for large schemas (~50ms per table with an external Hive metastore). Hive ThriftServer GetTablesOperation uses HiveMetastoreClient.getTableObjectsByName to get table information in bulk, but we don't expose that through our APIs that go through Hive -> HiveClientImpl (HiveClient) -> HiveExternalCatalog (ExternalCatalog) -> SessionCatalog. If we added and exposed getTableObjectsByName through our catalog APIs, we could resolve that performance problem in SparkGetTablesOperation. ## How was this patch tested? Add UT Closes #24774 from LantaoJin/SPARK-27899. Authored-by: LantaoJin Signed-off-by: gatorsmile --- .../sql/catalyst/catalog/ExternalCatalog.scala | 2 + .../catalog/ExternalCatalogWithListener.scala | 4 + .../sql/catalyst/catalog/InMemoryCatalog.scala | 5 ++ .../sql/catalyst/catalog/SessionCatalog.scala | 28 +++ .../catalyst/catalog/ExternalCatalogSuite.scala| 22 ++ .../sql/catalyst/catalog/SessionCatalogSuite.scala | 90 ++ .../thriftserver/SparkGetTablesOperation.scala | 3 +- .../spark/sql/hive/HiveExternalCatalog.scala | 8 ++ .../apache/spark/sql/hive/client/HiveClient.scala | 3 + .../spark/sql/hive/client/HiveClientImpl.scala | 65 +++- .../apache/spark/sql/hive/client/HiveShim.scala| 14 .../spark/sql/hive/client/VersionsSuite.scala | 27 +++ 12 files changed, 265 insertions(+), 6 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala index 1a145c2..dcc1439 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala @@ -128,6 +128,8 @@ trait ExternalCatalog { def getTable(db: String, table: String): CatalogTable + def getTablesByName(db: String, tables: Seq[String]): Seq[CatalogTable] + def tableExists(db: String, table: String): Boolean def listTables(db: String): Seq[String] diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener.scala index 2f009be..86113d3 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogWithListener.scala @@ -138,6 +138,10 @@ class ExternalCatalogWithListener(delegate: ExternalCatalog) delegate.getTable(db, table) } + override def getTablesByName(db: String, tables: Seq[String]): Seq[CatalogTable] = { +delegate.getTablesByName(db, tables) + } + 
override def tableExists(db: String, table: String): Boolean = { delegate.tableExists(db, table) } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala index 741dc46..abf6993 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala @@ -327,6 +327,11 @@ class InMemoryCatalog( catalog(db).tables(table).table } + override def getTablesByName(db: String, tables: Seq[String]): Seq[CatalogTable] = { +requireDbExists(db) +tables.flatMap(catalog(db).tables.get).map(_.table) + } + override def tableExists(db: String, table: String): Boolean = synchronized { requireDbExists(db) catalog(db).tables.contains(table) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala index c05f777..e49e54f 100644 --- a/sql/cat
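A hedged sketch of the bulk lookup this adds, based on the `getTablesByName(db, tables)` signature visible in the diff; `sharedState.externalCatalog` is internal API and the table names are illustrative:

```scala
import org.apache.spark.sql.catalyst.catalog.CatalogTable

// Fetch metadata for several tables in one call instead of one getTable round
// trip per table.
val tables: Seq[CatalogTable] =
  spark.sharedState.externalCatalog.getTablesByName("default", Seq("web_logs", "clickstream"))
tables.foreach(t => println(s"${t.identifier} -> ${t.schema.simpleString}"))
```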
[spark] branch master updated: [SPARK-27970][SQL] Support Hive 3.0 metastore
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 2926890 [SPARK-27970][SQL] Support Hive 3.0 metastore 2926890 is described below commit 2926890ffbcf3a92c7e0863c69e31c3d22191112 Author: Yuming Wang AuthorDate: Fri Jun 7 15:24:07 2019 -0700 [SPARK-27970][SQL] Support Hive 3.0 metastore ## What changes were proposed in this pull request? It seems that some users are using Hive 3.0.0. This pr makes it support Hive 3.0 metastore. ## How was this patch tested? unit tests Closes #24688 from wangyum/SPARK-26145. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- docs/sql-data-sources-hive-tables.md | 2 +- docs/sql-migration-guide-hive-compatibility.md| 2 +- .../main/scala/org/apache/spark/sql/hive/HiveUtils.scala | 2 +- .../org/apache/spark/sql/hive/client/HiveClientImpl.scala | 3 ++- .../scala/org/apache/spark/sql/hive/client/HiveShim.scala | 4 +++- .../spark/sql/hive/client/IsolatedClientLoader.scala | 1 + .../scala/org/apache/spark/sql/hive/client/package.scala | 13 - .../apache/spark/sql/hive/execution/SaveAsHiveFile.scala | 2 +- .../apache/spark/sql/hive/client/HiveClientVersions.scala | 3 ++- .../apache/spark/sql/hive/client/HiveVersionSuite.scala | 4 ++-- .../org/apache/spark/sql/hive/client/VersionsSuite.scala | 15 +++ 11 files changed, 33 insertions(+), 18 deletions(-) diff --git a/docs/sql-data-sources-hive-tables.md b/docs/sql-data-sources-hive-tables.md index 3d58e94..5688011 100644 --- a/docs/sql-data-sources-hive-tables.md +++ b/docs/sql-data-sources-hive-tables.md @@ -130,7 +130,7 @@ The following options can be used to configure the version of Hive that is used 1.2.1 Version of the Hive metastore. Available - options are 0.12.0 through 2.3.5 and 3.1.0 through 3.1.1. + options are 0.12.0 through 2.3.5 and 3.0.0 through 3.1.1. diff --git a/docs/sql-migration-guide-hive-compatibility.md b/docs/sql-migration-guide-hive-compatibility.md index 4a8076d..f955e31 100644 --- a/docs/sql-migration-guide-hive-compatibility.md +++ b/docs/sql-migration-guide-hive-compatibility.md @@ -25,7 +25,7 @@ license: | Spark SQL is designed to be compatible with the Hive Metastore, SerDes and UDFs. Currently, Hive SerDes and UDFs are based on Hive 1.2.1, and Spark SQL can be connected to different versions of Hive Metastore -(from 0.12.0 to 2.3.5 and 3.1.0 to 3.1.1. Also see [Interacting with Different Versions of Hive Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore)). +(from 0.12.0 to 2.3.5 and 3.0.0 to 3.1.1. Also see [Interacting with Different Versions of Hive Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore)). Deploying in Existing Hive Warehouses diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala index 38ad061..c3ae3d5 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala @@ -64,7 +64,7 @@ private[spark] object HiveUtils extends Logging { val HIVE_METASTORE_VERSION = buildConf("spark.sql.hive.metastore.version") .doc("Version of the Hive metastore. 
Available options are " + "0.12.0 through 2.3.5 and " + -"3.1.0 through 3.1.1.") +"3.0.0 through 3.1.1.") .stringConf .createWithDefault(builtinHiveVersion) diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala index b8d5f21..2b80165 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala @@ -107,6 +107,7 @@ private[hive] class HiveClientImpl( case hive.v2_1 => new Shim_v2_1() case hive.v2_2 => new Shim_v2_2() case hive.v2_3 => new Shim_v2_3() +case hive.v3_0 => new Shim_v3_0() case hive.v3_1 => new Shim_v3_1() } @@ -744,7 +745,7 @@ private[hive] class HiveClientImpl( // Since HIVE-18238(Hive 3.0.0), the Driver.close function's return type changed // and the CommandProcessorFactory.clean function removed. driver.getClass.getMethod("close").invoke(driver) - if (version != hive.v3_1) { + if (version != hive.v3_0 && version != hive.v3_1) { CommandProcessorFactory.clean(conf) } } diff --git a/sql/hive/src/main/sca
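A minimal configuration sketch for pointing Spark at a Hive 3.0 metastore once this lands; the application name and the choice of `maven` for `spark.sql.hive.metastore.jars` are assumptions rather than part of the commit:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-3.0-metastore-sketch")
  .config("spark.sql.hive.metastore.version", "3.0.0")
  .config("spark.sql.hive.metastore.jars", "maven") // or a curated classpath in production
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SHOW DATABASES").show()
```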
[spark] branch master updated: [SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9c4eb99 [SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline) 9c4eb99 is described below commit 9c4eb99c52803f2488ac3787672aa8d3e4d1544e Author: WeichenXu AuthorDate: Fri Jun 7 14:02:43 2019 -0700 [SPARK-27870][SQL][PYSPARK] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline) ## What changes were proposed in this pull request? Flush batches in a timely manner for pandas UDFs. This can improve performance when multiple pandas UDF plans are pipelined: when each batch is flushed promptly, downstream pandas UDFs are pipelined as soon as possible, and the pipeline helps hide the downstream UDFs' computation time. For example, when the first UDF starts computing on batch-3, the second pipelined UDF can start computing on batch-2, and the third pipelined UDF can start computing on batch-1. If batches are not flushed in time, the downstream UDF pipeline lags too far behind, which may increase the total processing time. I add a flush in two places: * The JVM process feeds data into the Python worker: on the JVM side, flush after writing each batch. * The JVM process reads data from the Python worker output: on the Python worker side, flush after writing each batch. Without flushing, the default buffer size for both is 65536. Especially in the ML case, where the batch size is made very small for real-time prediction, that buffer size is far too large and causes the downstream pandas UDF pipeline to lag too far behind. ### Note * This only applies to pandas scalar UDFs. * Batches are not flushed individually; the minimum interval between two flushes is 0.1 second, which avoids flushing too frequently when the batch size is small. It works like:
```
last_flush_time = time.time()
for batch in iterator:
    writer.write_batch(batch)
    flush_time = time.time()
    if self.flush_timely and (flush_time - last_flush_time > 0.1):
        stream.flush()
        last_flush_time = flush_time
```
## How was this patch tested? ### Benchmark to make sure the flush does not cause a performance regression Test code:
```
numRows = ...
batchSize = ...
spark.conf.set('spark.sql.execution.arrow.maxRecordsPerBatch', str(batchSize))
df = spark.range(1, numRows + 1, numPartitions=1).select(col('id').alias('a'))

@pandas_udf("int", PandasUDFType.SCALAR)
def fp1(x):
    return x + 10

beg_time = time.time()
result = df.select(sum(fp1('a'))).head()
print("result: " + str(result[0]))
print("consume time: " + str(time.time() - beg_time))
```
Test Result:

params | Consume time (Before) | Consume time (After)
--- | --- | ---
numRows=1, batchSize=1 | 23.43s | 24.64s
numRows=1, batchSize=1000 | 36.73s | 34.50s
numRows=1000, batchSize=100 | 35.67s | 32.64s
numRows=100, batchSize=10 | 33.60s | 32.11s
numRows=10, batchSize=1 | 33.36s | 31.82s

### Benchmark pipelined pandas UDF Test code:
```
spark.conf.set('spark.sql.execution.arrow.maxRecordsPerBatch', '1')
df = spark.range(1, 31, numPartitions=1).select(col('id').alias('a'))

@pandas_udf("int", PandasUDFType.SCALAR)
def fp1(x):
    print("run fp1")
    time.sleep(1)
    return x + 100

@pandas_udf("int", PandasUDFType.SCALAR)
def fp2(x, y):
    print("run fp2")
    time.sleep(1)
    return x + y

beg_time = time.time()
result = df.select(sum(fp2(fp1('a'), col('a')))).head()
print("result: " + str(result[0]))
print("consume time: " + str(time.time() - beg_time))
```
Test Result: **Before**: consume time: 63.57s **After**: consume time: 32.43s **So the PR improves performance by letting the downstream UDFs get pipelined early.** Please review https://spark.apache.org/contributing.html before opening a pull request. Closes #24734 from WeichenXu123/improve_pandas_udf_pipeline. Lead-authored-by: WeichenXu Co-authored-by: Xiangrui Meng Signed-off-by: gatorsmile --- python/pyspark/serializers.py | 18 -- python/pyspark/testing/utils.py | 3 +++ python/pyspark/tests/te
[spark] branch master updated (eee3467 -> b30655b)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from eee3467 [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS add b30655b [SPARK-27965][SQL] Add extractors for v2 catalog transforms. No new revisions were added by this update. Summary of changes: .../sql/catalog/v2/expressions/expressions.scala | 83 +++ .../v2/expressions/TransformExtractorSuite.scala | 156 + 2 files changed, 239 insertions(+) create mode 100644 sql/catalyst/src/test/scala/org/apache/spark/sql/catalog/v2/expressions/TransformExtractorSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-27918][SQL] Port boolean.sql
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new eadb538 [SPARK-27918][SQL] Port boolean.sql eadb538 is described below commit eadb53824d08480131498c7eb5bd7674f48b62c7 Author: Yuming Wang AuthorDate: Thu Jun 6 10:57:10 2019 -0700 [SPARK-27918][SQL] Port boolean.sql ## What changes were proposed in this pull request? This PR is to port boolean.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out When porting the test cases, found two PostgreSQL specific features that do not exist in Spark SQL: - [SPARK-27931](https://issues.apache.org/jira/browse/SPARK-27931): Accept 'on' and 'off' as input for boolean data type / Trim the string when cast to boolean type / Accept unique prefixes thereof - [SPARK-27924](https://issues.apache.org/jira/browse/SPARK-27924): Support E061-14: Search Conditions Also, found an inconsistent behavior: - [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Unsupported input throws an exception in PostgreSQL but Spark accepts it and sets the value to `NULL`, for example: ```sql SELECT bool 'test' AS error; -- SELECT boolean('test') AS error; ``` ## How was this patch tested? N/A Closes #24767 from wangyum/SPARK-27918. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../resources/sql-tests/inputs/pgSQL/boolean.sql | 284 .../sql-tests/results/pgSQL/boolean.sql.out| 741 + .../org/apache/spark/sql/SQLQueryTestSuite.scala | 10 + 3 files changed, 1035 insertions(+) diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/boolean.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/boolean.sql new file mode 100644 index 000..8ba6f97 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/boolean.sql @@ -0,0 +1,284 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- BOOLEAN +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql + +-- +-- sanity check - if this fails go insane! 
+-- +SELECT 1 AS one; + + +-- **testing built-in type bool + +-- check bool input syntax + +SELECT true AS true; + +SELECT false AS false; + +SELECT boolean('t') AS true; + +-- [SPARK-27931] Trim the string when cast string type to boolean type +SELECT boolean(' f ') AS false; + +SELECT boolean('true') AS true; + +-- [SPARK-27923] PostgreSQL does not accept 'test' but Spark SQL accepts it and sets it to NULL +SELECT boolean('test') AS error; + +SELECT boolean('false') AS false; + +-- [SPARK-27923] PostgreSQL does not accept 'foo' but Spark SQL accepts it and sets it to NULL +SELECT boolean('foo') AS error; + +SELECT boolean('y') AS true; + +SELECT boolean('yes') AS true; + +-- [SPARK-27923] PostgreSQL does not accept 'yeah' but Spark SQL accepts it and sets it to NULL +SELECT boolean('yeah') AS error; + +SELECT boolean('n') AS false; + +SELECT boolean('no') AS false; + +-- [SPARK-27923] PostgreSQL does not accept 'nay' but Spark SQL accepts it and sets it to NULL +SELECT boolean('nay') AS error; + +-- [SPARK-27931] Accept 'on' and 'off' as input for boolean data type +SELECT boolean('on') AS true; + +SELECT boolean('off') AS false; + +-- [SPARK-27931] Accept unique prefixes thereof +SELECT boolean('of') AS false; + +-- [SPARK-27923] PostgreSQL does not accept 'o' but Spark SQL accepts it and sets it to NULL +SELECT boolean('o') AS error; + +-- [SPARK-27923] PostgreSQL does not accept 'on_' but Spark SQL accepts it and sets it to NULL +SELECT boolean('on_') AS error; + +-- [SPARK-27923] PostgreSQL does not accept 'off_' but Spark SQL accepts it and sets it to NULL +SELECT boolean('off_') AS error; + +SELECT boolean('1') AS true; + +-- [SPARK-27923] PostgreSQL does not accept '11' but Spark SQL accepts it and sets it to NULL +SELECT boolean('11') AS error; + +SELECT boolean('0') AS false; + +-- [SPARK-27923] PostgreSQL does not accept '000' but Spark SQL accepts it and sets it to NULL +SELECT boolean('000') AS error; + +-- [SPARK-27923] PostgreSQL does not accept '' but Spark SQL accep
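The accepted inputs and the NULL-instead-of-error behavior described above can be observed directly, assuming a `SparkSession` named `spark`:

```scala
// 't', 'true', 'y', 'yes' and '1' map to true; 'f', 'false', 'n', 'no' and '0' to false.
spark.sql("SELECT boolean('t'), boolean('yes'), boolean('1')").show()
// Unrecognized input is not rejected: Spark yields NULL where PostgreSQL raises an error.
spark.sql("SELECT boolean('test')").show()
```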
[spark] branch master updated: [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 4de9649 [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2] 4de9649 is described below commit 4de96493ae1595bf6a80596c99df0e003ef0cf7d Author: Yuming Wang AuthorDate: Thu Jun 6 09:28:59 2019 -0700 [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2] ## What changes were proposed in this pull request? This PR is to port AGGREGATES.sql from PostgreSQL regression tests. https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350 The expected results can be found in the link: https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/aggregates.out#L499-L984 When porting the test cases, found four PostgreSQL specific features that do not exist in Spark SQL: - [SPARK-27877](https://issues.apache.org/jira/browse/SPARK-27877): Implement SQL-standard LATERAL subqueries - [SPARK-27878](https://issues.apache.org/jira/browse/SPARK-27878): Support ARRAY(sub-SELECT) expressions - [SPARK-27879](https://issues.apache.org/jira/browse/SPARK-27879): Implement bitwise integer aggregates(BIT_AND and BIT_OR) - [SPARK-27880](https://issues.apache.org/jira/browse/SPARK-27880): Implement boolean aggregates(BOOL_AND, BOOL_OR and EVERY) ## How was this patch tested? N/A Closes #24743 from wangyum/SPARK-27883. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- .../sql-tests/inputs/pgSQL/aggregates_part1.sql| 2 +- .../sql-tests/inputs/pgSQL/aggregates_part2.sql| 228 + .../results/pgSQL/aggregates_part2.sql.out | 162 +++ 3 files changed, 391 insertions(+), 1 deletion(-) diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql index de7bbda..a81eca2 100644 --- a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql +++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql @@ -3,7 +3,7 @@ -- -- -- AGGREGATES [Part 1] --- https://github.com/postgres/postgres/blob/02ddd499322ab6f2f0d58692955dc9633c2150fc/src/test/regress/sql/aggregates.sql#L1-L143 +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L1-L143 -- avoid bit-exact output here because operations may not be bit-exact. -- SET extra_float_digits = 0; diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql new file mode 100644 index 000..c461370 --- /dev/null +++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql @@ -0,0 +1,228 @@ +-- +-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group +-- +-- +-- AGGREGATES [Part 2] +-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350 + +create temporary view int4_tbl as select * from values + (0), + (123456), + (-123456), + (2147483647), + (-2147483647) + as int4_tbl(f1); + +-- Test handling of Params within aggregate arguments in hashed aggregation. +-- Per bug report from Jeevan Chalke. 
+-- [SPARK-27877] Implement SQL-standard LATERAL subqueries +-- explain (verbose, costs off) +-- select s1, s2, sm +-- from generate_series(1, 3) s1, +-- lateral (select s2, sum(s1 + s2) sm +-- from generate_series(1, 3) s2 group by s2) ss +-- order by 1, 2; +-- select s1, s2, sm +-- from generate_series(1, 3) s1, +-- lateral (select s2, sum(s1 + s2) sm +-- from generate_series(1, 3) s2 group by s2) ss +-- order by 1, 2; + +-- [SPARK-27878] Support ARRAY(sub-SELECT) expressions +-- explain (verbose, costs off) +-- select array(select sum(x+y) s +-- from generate_series(1,3) y group by y order by s) +-- from generate_series(1,3) x; +-- select array(select sum(x+y) s +-- from generate_series(1,3) y group by y order by s) +-- from generate_series(1,3) x; + +-- [SPARK-27879] Implement bitwise integer aggregates(BIT_AND and BIT_OR) +-- +-- test for bitwise integer aggregates +-- +-- CREATE TEMPORARY TABLE bitwise_test( +-- i2 INT2, +-- i4 INT4, +-- i8 INT8, +-- i INTEGER, +-- x INT2, +-- y BIT(4) +-- ); + +-- empty case +-- SELECT +-- BIT_AND(i2) AS "?", +-- BIT_OR(i4) AS "?" +-- FROM bitwise_test; + +-- COPY bitwise_test FROM STDIN NULL 'null'; +-- 1 1 1 1 1 B0101 +-- 3 3 3 null2 B0100 +-- 7 7 7 3 4 B1100 +-- \. + +-- SELECT +-- BIT_AND(i2) AS "1", +-- BIT_AND(i4) AS "1", +-- BIT_AND(i8)
[spark] branch master updated (6c28ef1 -> 5d6758c)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 6c28ef1 [SPARK-27933][SS] Extracting common purge behaviour to the parent StreamExecution add 5d6758c [SPARK-27857][SQL] Move ALTER TABLE parsing into Catalyst No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/SqlBase.g4| 40 +++- .../spark/sql/catalyst/parser/AstBuilder.scala | 162 +++- .../plans/logical/sql/AlterTableStatements.scala | 78 ...ewStatement.scala => AlterViewStatements.scala} | 20 +- .../plans/logical/sql/CreateTableStatement.scala | 10 +- .../plans/logical/sql/ParsedStatement.scala| 5 + .../spark/sql/catalyst/parser/DDLParserSuite.scala | 207 - .../spark/sql/execution/SparkSqlParser.scala | 60 +- .../datasources/DataSourceResolution.scala | 42 - .../sql/execution/command/DDLParserSuite.scala | 63 +-- .../execution/command/PlanResolutionSuite.scala| 76 11 files changed, 612 insertions(+), 151 deletions(-) create mode 100644 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/AlterTableStatements.scala copy sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/{DropViewStatement.scala => AlterViewStatements.scala} (69%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3f102a8 -> 8b6232b)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3f102a8 [SPARK-27749][SQL] hadoop-3.2 support hive-thriftserver add 8b6232b [SPARK-27521][SQL] Move data source v2 to catalyst module No new revisions were added by this update. Summary of changes: project/MimaExcludes.scala | 39 ++ sql/catalyst/pom.xml | 4 +++ .../spark/sql/sources/v2/SessionConfigSupport.java | 0 .../apache/spark/sql/sources/v2/SupportsRead.java | 0 .../apache/spark/sql/sources/v2/SupportsWrite.java | 0 .../apache/spark/sql/sources/v2/TableProvider.java | 9 + .../apache/spark/sql/sources/v2/reader/Batch.java | 0 .../sql/sources/v2/reader/InputPartition.java | 0 .../sql/sources/v2/reader/PartitionReader.java | 0 .../sources/v2/reader/PartitionReaderFactory.java | 0 .../apache/spark/sql/sources/v2/reader/Scan.java | 0 .../spark/sql/sources/v2/reader/ScanBuilder.java | 0 .../spark/sql/sources/v2/reader/Statistics.java| 0 .../sources/v2/reader/SupportsPushDownFilters.java | 0 .../v2/reader/SupportsPushDownRequiredColumns.java | 0 .../v2/reader/SupportsReportPartitioning.java | 0 .../v2/reader/SupportsReportStatistics.java| 0 .../reader/partitioning/ClusteredDistribution.java | 0 .../v2/reader/partitioning/Distribution.java | 0 .../v2/reader/partitioning/Partitioning.java | 0 .../streaming/ContinuousPartitionReader.java | 0 .../ContinuousPartitionReaderFactory.java | 0 .../v2/reader/streaming/ContinuousStream.java | 0 .../v2/reader/streaming/MicroBatchStream.java | 0 .../sql/sources/v2/reader/streaming/Offset.java| 0 .../v2/reader/streaming/PartitionOffset.java | 0 .../v2/reader/streaming/SparkDataStream.java | 0 .../spark/sql/sources/v2/writer/BatchWrite.java| 0 .../spark/sql/sources/v2/writer/DataWriter.java| 0 .../sql/sources/v2/writer/DataWriterFactory.java | 0 .../v2/writer/SupportsDynamicOverwrite.java| 0 .../sql/sources/v2/writer/SupportsOverwrite.java | 0 .../sql/sources/v2/writer/SupportsTruncate.java| 0 .../spark/sql/sources/v2/writer/WriteBuilder.java | 0 .../sql/sources/v2/writer/WriterCommitMessage.java | 0 .../streaming/StreamingDataWriterFactory.java | 0 .../v2/writer/streaming/StreamingWrite.java| 0 .../spark/sql/vectorized/ArrowColumnVector.java| 2 +- .../apache/spark/sql/vectorized/ColumnVector.java | 0 .../apache/spark/sql/vectorized/ColumnarArray.java | 0 .../apache/spark/sql/vectorized/ColumnarBatch.java | 0 .../apache/spark/sql/vectorized/ColumnarMap.java | 0 .../apache/spark/sql/vectorized/ColumnarRow.java | 0 .../org/apache/spark/sql/sources/filters.scala | 0 .../org/apache/spark/sql/util}/ArrowUtils.scala| 2 +- .../apache/spark/sql/util}/ArrowUtilsSuite.scala | 4 +-- sql/core/pom.xml | 4 --- .../sql/execution/arrow/ArrowConverters.scala | 1 + .../spark/sql/execution/arrow/ArrowWriter.scala| 1 + .../execution/python/AggregateInPandasExec.scala | 2 +- .../sql/execution/python/ArrowEvalPythonExec.scala | 2 +- .../sql/execution/python/ArrowPythonRunner.scala | 3 +- .../python/FlatMapGroupsInPandasExec.scala | 4 +-- .../sql/execution/python/WindowInPandasExec.scala | 2 +- .../spark/sql/execution/r/ArrowRRunner.scala | 3 +- .../sql/execution/arrow/ArrowConvertersSuite.scala | 1 + .../sources/RateStreamProviderSuite.scala | 2 +- .../streaming/sources/TextSocketStreamSuite.scala | 2 +- .../vectorized/ArrowColumnVectorSuite.scala| 2 +- .../execution/vectorized/ColumnarBatchSuite.scala | 4 +-- 60 files changed, 65 insertions(+), 28 deletions(-) rename sql/{core => 
catalyst}/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/SupportsRead.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/SupportsWrite.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java (89%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/Batch.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartition.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/PartitionReader.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/PartitionReaderFactory.java (100%) rename sql/{core => catalyst}/src/main/java/org/apache/spark/sql/sources/v2/reader/Scan.java (100%) rename sql/{core => catalyst}/src/main/jav
[spark] branch master updated (b312033 -> 18834e8)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from b312033 [SPARK-20286][CORE] Improve logic for timing out executors in dynamic allocation. add 18834e8 [SPARK-27899][SQL] Refactor getTableOption() to extract a common method No new revisions were added by this update. Summary of changes: .../spark/sql/hive/client/HiveClientImpl.scala | 206 +++-- 1 file changed, 104 insertions(+), 102 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [MINOR][DOC] Avro data source documentation change
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 9d307dd [MINOR][DOC] Avro data source documentation change 9d307dd is described below commit 9d307dd53857e3eac9509992505be5ca37f2de7a Author: Jules Damji AuthorDate: Tue Jun 4 16:17:53 2019 -0700 [MINOR][DOC] Avro data source documentation change This is a minor documentation change whereby the https://spark.apache.org/docs/latest/sql-data-sources-avro.html mentions "The date type and naming of record fields should match the input Avro data or Catalyst data," The term Catalyst data is confusing. It should instead say, Spark's internal data type such as String Type or IntegerType. (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) There are no code changes; only doc changes. Please review https://spark.apache.org/contributing.html before opening a pull request. Closes #24787 from dmatrix/br-orc-ds.doc.changes. Authored-by: Jules Damji Signed-off-by: gatorsmile (cherry picked from commit b71abd654de2886ff2b44cada81ea909a0712f7c) Signed-off-by: gatorsmile --- docs/sql-data-sources-avro.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md index d3b81f0..285b3aa 100644 --- a/docs/sql-data-sources-avro.md +++ b/docs/sql-data-sources-avro.md @@ -148,8 +148,8 @@ Data source options of Avro can be set using the `.option` method on `DataFrameR avroSchema None -Optional Avro schema provided by an user in JSON format. The date type and naming of record fields -should match the input Avro data or Catalyst data, otherwise the read/write action will fail. +Optional Avro schema provided by a user in JSON format. The data type and naming of record fields +should match the Avro data type when reading from Avro or match the Spark's internal data type (e.g., StringType, IntegerType) when writing to Avro files; otherwise, the read/write action will fail. read and write - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
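A hedged sketch of the `avroSchema` option that the corrected sentence documents; the schema JSON and file path are illustrative, and the external `spark-avro` module must be on the classpath:

```scala
// Supply a user-defined Avro schema in JSON form; field names and types must
// match the Avro data being read (or Spark's internal types when writing).
val userSchema =
  """{
    |  "type": "record",
    |  "name": "User",
    |  "fields": [
    |    {"name": "name", "type": "string"},
    |    {"name": "age",  "type": "int"}
    |  ]
    |}""".stripMargin

val users = spark.read
  .format("avro")
  .option("avroSchema", userSchema)
  .load("/tmp/users.avro")
users.printSchema()
```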
[spark] branch master updated (de73a54 -> b71abd6)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from de73a54 [SPARK-27909][SQL] Do not run analysis inside CTE substitution add b71abd6 [MINOR][DOC] Avro data source documentation change No new revisions were added by this update. Summary of changes: docs/sql-data-sources-avro.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-27909][SQL] Do not run analysis inside CTE substitution
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new de73a54 [SPARK-27909][SQL] Do not run analysis inside CTE substitution de73a54 is described below commit de73a54269cd73fba1a9735304f641e1d9c45789 Author: Ryan Blue AuthorDate: Tue Jun 4 14:46:13 2019 -0700 [SPARK-27909][SQL] Do not run analysis inside CTE substitution ## What changes were proposed in this pull request? This updates CTE substitution to avoid needing to run all resolution rules on each substituted expression. Running resolution rules was previously used to avoid infinite recursion. In the updated rule, CTE plans are substituted as sub-queries from right to left. Using this scope-based order, it is not necessary to replace multiple CTEs at the same time using `resolveOperatorsDown`. Instead, `resolveOperatorsUp` is used to replace each CTE individually. By resolving using `resolveOperatorsUp`, this no longer needs to run all analyzer rules on each substituted expression. Previously, this was done to apply `ResolveRelations`, which would throw an `AnalysisException` for all unresolved relations so that unresolved relations that may cause recursive substitutions were not left in the plan. Because this is no longer needed, `ResolveRelations` no longer needs to throw `AnalysisException` and resolution can be done in multiple rules. ## How was this patch tested? Existing tests in `SQLQueryTestSuite`, `cte.sql`. Closes #24763 from rdblue/SPARK-27909-fix-cte-substitution. Authored-by: Ryan Blue Signed-off-by: gatorsmile --- .../spark/sql/catalyst/analysis/Analyzer.scala | 21 - 1 file changed, 12 insertions(+), 9 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index 91365fc..841b858 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -215,23 +215,26 @@ class Analyzer( object CTESubstitution extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp { case With(child, relations) => -substituteCTE(child, relations.foldLeft(Seq.empty[(String, LogicalPlan)]) { - case (resolved, (name, relation)) => -resolved :+ name -> executeSameContext(substituteCTE(relation, resolved)) -}) +// substitute CTE expressions right-to-left to resolve references to previous CTEs: +// with a as (select * from t), b as (select * from a) select * from b +relations.foldRight(child) { + case ((cteName, ctePlan), currentPlan) => +substituteCTE(currentPlan, cteName, ctePlan) +} case other => other } -def substituteCTE(plan: LogicalPlan, cteRelations: Seq[(String, LogicalPlan)]): LogicalPlan = { - plan resolveOperatorsDown { +def substituteCTE(plan: LogicalPlan, cteName: String, ctePlan: LogicalPlan): LogicalPlan = { + plan resolveOperatorsUp { +case UnresolvedRelation(TableIdentifier(table, None)) if resolver(cteName, table) => + ctePlan case u: UnresolvedRelation => - cteRelations.find(x => resolver(x._1, u.tableIdentifier.table)) -.map(_._2).getOrElse(u) + u case other => // This cannot be done in ResolveSubquery because ResolveSubquery does not know the CTE. 
other transformExpressions { case e: SubqueryExpression => - e.withNewPlan(substituteCTE(e.plan, cteRelations)) + e.withNewPlan(substituteCTE(e.plan, cteName, ctePlan)) } } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
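As a small illustration of the plan shape this rule targets, here is a hedged sketch (assuming a SparkSession `spark`): `b` references `a`, so substituting right-to-left replaces the reference inside `b` before `a` itself is inlined, without re-running the full analyzer on each substituted plan.

```scala
// Chained CTEs: b refers to a; CTESubstitution now inlines them right-to-left.
spark.range(10).createOrReplaceTempView("t")
spark.sql(
  """WITH a AS (SELECT * FROM t),
    |     b AS (SELECT * FROM a)
    |SELECT * FROM b""".stripMargin).show()
```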
[spark] branch master updated: [SPARK-27926][SQL] Allow altering table add columns with CSVFileFormat/JsonFileFormat provider
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d1937c1 [SPARK-27926][SQL] Allow altering table add columns with CSVFileFormat/JsonFileFormat provider d1937c1 is described below commit d1937c14795aed4bea89065c3796a5bc1dd18c8f Author: Gengliang Wang AuthorDate: Mon Jun 3 23:51:05 2019 -0700 [SPARK-27926][SQL] Allow altering table add columns with CSVFileFormat/JsonFileFormat provider ## What changes were proposed in this pull request? In the previous work of the CSV/JSON migration, CSVFileFormat/JsonFileFormat was removed from the table provider whitelist of `AlterTableAddColumnsCommand.verifyAlterTableAddColumn`: https://github.com/apache/spark/pull/24005 https://github.com/apache/spark/pull/24058 This is a regression. If a table is created with provider `org.apache.spark.sql.execution.datasources.csv.CSVFileFormat` or `org.apache.spark.sql.execution.datasources.json.JsonFileFormat`, Spark should allow the "alter table add column" operation. ## How was this patch tested? Unit test Closes #24776 from gengliangwang/v1Table. Authored-by: Gengliang Wang Signed-off-by: gatorsmile --- .../main/scala/org/apache/spark/sql/execution/command/tables.scala | 5 - .../test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala | 5 - 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala index ea29eff..64f739f 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala @@ -37,6 +37,8 @@ import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference} import org.apache.spark.sql.catalyst.plans.logical._ import org.apache.spark.sql.catalyst.util.{escapeSingleQuotedString, quoteIdentifier} import org.apache.spark.sql.execution.datasources.{DataSource, PartitioningUtils} +import org.apache.spark.sql.execution.datasources.csv.CSVFileFormat +import org.apache.spark.sql.execution.datasources.json.JsonFileFormat import org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat import org.apache.spark.sql.execution.datasources.v2.csv.CSVDataSourceV2 import org.apache.spark.sql.execution.datasources.v2.json.JsonDataSourceV2 @@ -238,7 +240,8 @@ case class AlterTableAddColumnsCommand( // TextFileFormat only default to one column "value" // Hive type is already considered as hive serde table, so the logic will not // come in here. 
-case _: JsonDataSourceV2 | _: CSVDataSourceV2 | _: ParquetFileFormat | _: OrcDataSourceV2 => +case _: CSVFileFormat | _: JsonFileFormat | _: ParquetFileFormat => +case _: JsonDataSourceV2 | _: CSVDataSourceV2 | _: OrcDataSourceV2 => case s if s.getClass.getCanonicalName.endsWith("OrcFileFormat") => case s => throw new AnalysisException( diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala index 0124f28..b777db7 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala @@ -2566,7 +2566,10 @@ abstract class DDLSuite extends QueryTest with SQLTestUtils { } } - val supportedNativeFileFormatsForAlterTableAddColumns = Seq("parquet", "json", "csv") + val supportedNativeFileFormatsForAlterTableAddColumns = Seq("csv", "json", "parquet", +"org.apache.spark.sql.execution.datasources.csv.CSVFileFormat", +"org.apache.spark.sql.execution.datasources.json.JsonFileFormat", +"org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat") supportedNativeFileFormatsForAlterTableAddColumns.foreach { provider => test(s"alter datasource table add columns - $provider") { - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
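To make the regression concrete, here is a hedged sketch (Scala, assuming a SparkSession `spark`; the table name is hypothetical) of the operation that is allowed again after this change when the table was created with the fully qualified v1 provider class:

```scala
// Create a datasource table using the fully qualified CSVFileFormat provider name,
// then add a column; before this fix the ALTER TABLE step was rejected.
spark.sql(
  """CREATE TABLE t_csv (id INT)
    |USING org.apache.spark.sql.execution.datasources.csv.CSVFileFormat""".stripMargin)
spark.sql("ALTER TABLE t_csv ADD COLUMNS (name STRING)")
```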
[spark] branch master updated (a12de29 -> 5e3520f)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from a12de29 [SPARK-27838][SQL] Support user provided non-nullable avro schema for nullable catalyst schema without any null record add 5e3520f [SPARK-27809][SQL] Make optional clauses order insensitive for CREATE DATABASE/VIEW SQL statement No new revisions were added by this update. Summary of changes: .../apache/spark/sql/catalyst/parser/SqlBase.g4| 13 ++-- .../sql/catalyst/parser/ParserUtilsSuite.scala | 7 +- .../spark/sql/execution/SparkSqlParser.scala | 76 +- .../sql/execution/command/DDLParserSuite.scala | 38 ++- 4 files changed, 94 insertions(+), 40 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
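A hedged example of what the parser change permits, assuming a SparkSession `spark`; database names and paths are hypothetical. After SPARK-27809 the optional clauses of CREATE DATABASE (and similarly CREATE VIEW) may appear in any order:

```scala
// Both statements parse after this change; previously the clause order was fixed.
spark.sql("CREATE DATABASE IF NOT EXISTS db1 COMMENT 'demo' LOCATION '/tmp/db1'")
spark.sql("CREATE DATABASE IF NOT EXISTS db2 LOCATION '/tmp/db2' COMMENT 'demo'")
```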
[spark] branch master updated: [SPARK-27824][SQL] Make rule EliminateResolvedHint idempotent
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new de13f70 [SPARK-27824][SQL] Make rule EliminateResolvedHint idempotent de13f70 is described below commit de13f70ce1b3d5c90dc9b4f185e50dea3f1508b7 Author: maryannxue AuthorDate: Fri May 24 11:25:22 2019 -0700 [SPARK-27824][SQL] Make rule EliminateResolvedHint idempotent ## What changes were proposed in this pull request? This fix prevents the rule EliminateResolvedHint from being applied again if it's already applied. ## How was this patch tested? Added new UT. Closes #24692 from maryannxue/eliminatehint-bug. Authored-by: maryannxue Signed-off-by: gatorsmile --- .../catalyst/optimizer/EliminateResolvedHint.scala | 2 +- .../scala/org/apache/spark/sql/JoinHintSuite.scala | 23 ++ 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala index aebd660..6419f9c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala @@ -29,7 +29,7 @@ object EliminateResolvedHint extends Rule[LogicalPlan] { // is using transformUp rather than resolveOperators. def apply(plan: LogicalPlan): LogicalPlan = { val pulledUp = plan transformUp { - case j: Join => + case j: Join if j.hint == JoinHint.NONE => val (newLeft, leftHints) = extractHintsFromPlan(j.left) val (newRight, rightHints) = extractHintsFromPlan(j.right) val newJoinHint = JoinHint(mergeHints(leftHints), mergeHints(rightHints)) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala index 9c2dc0c..534560b 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala @@ -22,8 +22,10 @@ import scala.collection.mutable.ArrayBuffer import org.apache.log4j.{AppenderSkeleton, Level} import org.apache.log4j.spi.LoggingEvent +import org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint import org.apache.spark.sql.catalyst.plans.PlanTest import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.rules.RuleExecutor import org.apache.spark.sql.execution.joins._ import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.test.SharedSQLContext @@ -557,4 +559,25 @@ class JoinHintSuite extends PlanTest with SharedSQLContext { } } } + + test("Verify that the EliminatedResolvedHint rule is idempotent") { +withTempView("t1", "t2") { + Seq((1, "4"), (2, "2")).toDF("key", "value").createTempView("t1") + Seq((1, "1"), (2, "12.3"), (2, "123")).toDF("key", "value").createTempView("t2") + val df = sql("SELECT /*+ broadcast(t2) */ * from t1 join t2 ON t1.key = t2.key") + val optimize = new RuleExecutor[LogicalPlan] { +val batches = Batch("EliminateResolvedHint", FixedPoint(10), EliminateResolvedHint) :: Nil + } + val optimized = optimize.execute(df.logicalPlan) + val expectedHints = +JoinHint( + None, + Some(HintInfo(strategy = Some(BROADCAST :: Nil + val joinHints = optimized collect { +case Join(_, _, _, _, hint) => hint +case _: ResolvedHint => fail("ResolvedHint should 
not appear after optimize.") + } + assert(joinHints == expectedHints) +} + } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-26356][SQL] remove SaveMode from data source v2
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 7d318bf [SPARK-26356][SQL] remove SaveMode from data source v2 7d318bf is described below commit 7d318bfe907ab22b904d118e4ff4970af32b0e44 Author: Wenchen Fan AuthorDate: Fri May 24 10:45:46 2019 -0700 [SPARK-26356][SQL] remove SaveMode from data source v2 ## What changes were proposed in this pull request? In data source v1, save mode specified in `DataFrameWriter` is passed to data source implementation directly, and each data source can define its own behavior about save mode. This is confusing and we want to get rid of save mode in data source v2. For data source v2, we expect data source to implement the `TableCatalog` API, and end-users use SQL(or the new write API described in [this doc](https://docs.google.com/document/d/1gYm5Ji2Mge3QBdOliFV5gSPTKlX4q1DCBXIkiyMv62A/edit?ts=5ace0718#heading=h.e9v1af12g5zo)) to acess data sources. The SQL API has very clear semantic and we don't need save mode at all. However, for simple data sources that do not have table management (like a JIRA data source, a noop sink, etc.), it's not ideal to ask them to implement the `TableCatalog` API, and throw exception here and there. `TableProvider` API is created for simple data sources. It can only get tables, without any other table management methods. This means, it can only deal with existing tables. `TableProvider` fits well with `DataStreamReader` and `DataStreamWriter`, as they can only read/write existing tables. However, `TableProvider` doesn't fit `DataFrameWriter` well, as the save mode requires more than just get table. More specifically, `ErrorIfExists` mode needs to check if table exists, and create table. `Ignore` mode needs to check if table exists. When end-users specify `ErrorIfExists` or `Ignore` mode and write data to `TableProvider` via `DataFrameWriter`, Spark fa [...] The file source is in the middle of `TableProvider` and `TableCatalog`: it's simple but it can check table(path) exists and create table(path). That said, file source supports all the save modes. Currently file source implements `TableProvider`, and it's not working because `TableProvider` doesn't support `ErrorIfExists` and `Ignore` modes. Ideally we should create a new API for path-based data sources, but to unblock the work of file source v2 migration, this PR proposes to special-case file source v2 in `DataFrameWriter`, to make it work. This PR also removes `SaveMode` from data source v2, as now only the internal file source v2 needs it. ## How was this patch tested? existing tests Closes #24233 from cloud-fan/file. 
Authored-by: Wenchen Fan Signed-off-by: gatorsmile --- .../apache/spark/sql/sources/v2/TableProvider.java | 4 ++ .../sql/sources/v2/writer/SupportsSaveMode.java| 26 .../spark/sql/sources/v2/writer/WriteBuilder.java | 4 -- .../org/apache/spark/sql/DataFrameWriter.scala | 77 -- .../datasources/noop/NoopDataSource.scala | 6 +- .../datasources/v2/FileWriteBuilder.scala | 7 +- .../datasources/v2/WriteToDataSourceV2Exec.scala | 35 +++--- .../spark/sql/sources/v2/DataSourceV2Suite.scala | 44 - .../sources/v2/FileDataSourceV2FallBackSuite.scala | 4 +- .../sql/sources/v2/SimpleWritableDataSource.scala | 23 ++- .../sql/test/DataFrameReaderWriterSuite.scala | 72 ++-- 11 files changed, 147 insertions(+), 155 deletions(-) diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java index 04ad8fd..0e2eb9c 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java +++ b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java @@ -26,6 +26,10 @@ import org.apache.spark.sql.util.CaseInsensitiveStringMap; * The base interface for v2 data sources which don't have a real catalog. Implementations must * have a public, 0-arg constructor. * + * Note that, TableProvider can only apply data operations to existing tables, like read, append, + * delete, and overwrite. It does not support the operations that require metadata changes, like + * create/drop tables. + * * The major responsibility of this interface is to return a {@link Table} for read/write. * */ diff --git a/sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/SupportsSaveMode.java b/sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/SupportsSaveMode.java deleted file mode 100644 index c4295f2..000 --- a/sql/core/src/main/java/org
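For context, a minimal sketch of the v1 `DataFrameWriter` save modes discussed in the commit message (the output path is hypothetical); it is exactly this behavior, an existence check plus optional table/path creation, that a plain `TableProvider` cannot express:

```scala
import org.apache.spark.sql.SaveMode

// ErrorIfExists: fail if the target path already exists.
spark.range(5).write.mode(SaveMode.ErrorIfExists).parquet("/tmp/out")
// Ignore: silently skip the write if the target path already exists.
spark.range(5).write.mode(SaveMode.Ignore).parquet("/tmp/out")
```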
[spark] branch master updated: [SPARK-27816][SQL] make TreeNode tag type safe
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 1a68fc3 [SPARK-27816][SQL] make TreeNode tag type safe 1a68fc3 is described below commit 1a68fc38f0aafb9015c499b3f9f7fbe63739e909 Author: Wenchen Fan AuthorDate: Thu May 23 11:53:21 2019 -0700 [SPARK-27816][SQL] make TreeNode tag type safe ## What changes were proposed in this pull request? Add type parameter to `TreeNodeTag`. ## How was this patch tested? existing tests Closes #24687 from cloud-fan/tag. Authored-by: Wenchen Fan Signed-off-by: gatorsmile --- .../plans/logical/basicLogicalOperators.scala | 2 +- .../apache/spark/sql/catalyst/trees/TreeNode.scala | 21 - .../spark/sql/catalyst/trees/TreeNodeSuite.scala| 18 ++ .../org/apache/spark/sql/execution/SparkPlan.scala | 5 +++-- .../spark/sql/execution/SparkStrategies.scala | 2 +- .../execution/LogicalPlanTagInSparkPlanSuite.scala | 11 +-- 6 files changed, 36 insertions(+), 23 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index a2a7eb1..4350f91 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -1083,7 +1083,7 @@ case class OneRowRelation() extends LeafNode { /** [[org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy()]] does not support 0-arg ctor. */ override def makeCopy(newArgs: Array[AnyRef]): OneRowRelation = { val newCopy = OneRowRelation() -newCopy.tags ++= this.tags +newCopy.copyTagsFrom(this) newCopy } } diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala index a5705d0..cd5dfb7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala @@ -74,9 +74,8 @@ object CurrentOrigin { } } -// The name of the tree node tag. This is preferred over using string directly, as we can easily -// find all the defined tags. -case class TreeNodeTagName(name: String) +// A tag of a `TreeNode`, which defines name and type +case class TreeNodeTag[T](name: String) // scalastyle:off abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { @@ -89,7 +88,19 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { * A mutable map for holding auxiliary information of this tree node. It will be carried over * when this node is copied via `makeCopy`, or transformed via `transformUp`/`transformDown`. */ - val tags: mutable.Map[TreeNodeTagName, Any] = mutable.Map.empty + private val tags: mutable.Map[TreeNodeTag[_], Any] = mutable.Map.empty + + protected def copyTagsFrom(other: BaseType): Unit = { +tags ++= other.tags + } + + def setTagValue[T](tag: TreeNodeTag[T], value: T): Unit = { +tags(tag) = value + } + + def getTagValue[T](tag: TreeNodeTag[T]): Option[T] = { +tags.get(tag).map(_.asInstanceOf[T]) + } /** * Returns a Seq of the children of this node. 
@@ -418,7 +429,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { try { CurrentOrigin.withOrigin(origin) { val res = defaultCtor.newInstance(allArgs.toArray: _*).asInstanceOf[BaseType] -res.tags ++= this.tags +res.copyTagsFrom(this) res } } catch { diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala index 5cfa84d..744d522 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala @@ -622,31 +622,33 @@ class TreeNodeSuite extends SparkFunSuite with SQLHelper { } test("tags will be carried over after copy & transform") { +val tag = TreeNodeTag[String]("test") + withClue("makeCopy") { val node = Dummy(None) - node.tags += TreeNodeTagName("test") -> "a" + node.setTagValue(tag, "a") val copied = node.makeCopy(Array(Some(Literal(1 - assert(copied.tags(TreeNodeTagName("test")) == "a") + assert(copied.getTagValue(tag) == S
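A hedged sketch of the typed-tag API introduced above (method and class names are taken from the diff; the tag name is hypothetical). It assumes a SparkSession `spark` and uses a logical plan only because `LogicalPlan` is a `TreeNode`; the type parameter lets callers read the tag back without casting:

```scala
import org.apache.spark.sql.catalyst.trees.TreeNodeTag

val noteTag = TreeNodeTag[String]("demo.note")        // hypothetical tag name
val plan = spark.range(1).queryExecution.logical      // any TreeNode works
plan.setTagValue(noteTag, "hello")
val note: Option[String] = plan.getTagValue(noteTag)  // Some("hello"), no asInstanceOf needed
```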
[spark] branch master updated: [SPARK-27439][SQL] Explaining Dataset should show correct resolved plans
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new c033a3e [SPARK-27439][SQL] Explainging Dataset should show correct resolved plans c033a3e is described below commit c033a3e1e60a785a42d48c60dd5d72dcb7b0f7c9 Author: Liang-Chi Hsieh AuthorDate: Tue May 21 11:27:05 2019 -0700 [SPARK-27439][SQL] Explainging Dataset should show correct resolved plans ## What changes were proposed in this pull request? Because a temporary view is resolved during analysis when we create a dataset, the content of the view is determined when the dataset is created, not when it is evaluated. Now the explain result of a dataset is not correctly consistent with the collected result of it, because we use pre-analyzed logical plan of the dataset in explain command. The explain command will analyzed the logical plan passed in. So if a view is changed after the dataset was created, the plans shown by explain [...] ```scala scala> spark.range(10).createOrReplaceTempView("test") scala> spark.range(5).createOrReplaceTempView("test2") scala> spark.sql("select * from test").createOrReplaceTempView("tmp001") scala> val df = spark.sql("select * from tmp001") scala> spark.sql("select * from test2").createOrReplaceTempView("tmp001") scala> df.show +---+ | id| +---+ | 0| | 1| | 2| | 3| | 4| | 5| | 6| | 7| | 8| | 9| +---+ scala> df.explain(true) ``` Before: ```scala == Parsed Logical Plan == 'Project [*] +- 'UnresolvedRelation `tmp001` == Analyzed Logical Plan == id: bigint Project [id#2L] +- SubqueryAlias `tmp001` +- Project [id#2L] +- SubqueryAlias `test2` +- Range (0, 5, step=1, splits=Some(12)) == Optimized Logical Plan == Range (0, 5, step=1, splits=Some(12)) == Physical Plan == *(1) Range (0, 5, step=1, splits=12) ``` After: ```scala == Parsed Logical Plan == 'Project [*] +- 'UnresolvedRelation `tmp001` == Analyzed Logical Plan == id: bigint Project [id#0L] +- SubqueryAlias `tmp001` +- Project [id#0L] +- SubqueryAlias `test` +- Range (0, 10, step=1, splits=Some(12)) == Optimized Logical Plan == Range (0, 10, step=1, splits=Some(12)) == Physical Plan == *(1) Range (0, 10, step=1, splits=12) ``` Previous PR to this issue has a regression when to explain an explain statement, like `sql("explain select 1").explain(true)`. This new fix is following up with hvanhovell's advice at https://github.com/apache/spark/pull/24464#issuecomment-494165538. 
Explain an explain: ```scala scala> sql("explain select 1").explain(true) == Parsed Logical Plan == ExplainCommand 'Project [unresolvedalias(1, None)], false, false, false == Analyzed Logical Plan == plan: string ExplainCommand 'Project [unresolvedalias(1, None)], false, false, false == Optimized Logical Plan == ExplainCommand 'Project [unresolvedalias(1, None)], false, false, false == Physical Plan == Execute ExplainCommand +- ExplainCommand 'Project [unresolvedalias(1, None)], false, false, false ``` Btw, I found there is a regression after applying hvanhovell's advice: ```scala spark.readStream .format("org.apache.spark.sql.streaming.test") .load() .explain(true) ``` ```scala == Parsed Logical Plan == StreamingRelation DataSource(org.apache.spark.sql.test.TestSparkSession3e8c7175,org.apache.spark.sql.streaming.test,List(),None,List(),None,Map(),None ), dummySource, [a#559] == Analyzed Logical Plan == a: int StreamingRelation DataSource(org.apache.spark.sql.test.TestSparkSession3e8c7175,org.apache.spark.sql.streaming.test,List(),None,List(),None,Map(),Non$ ), dummySource, [a#559] == Optimized Logical Plan == org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();; dummySource == Physical Plan == org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();; dummySource ``` So I did a change to that to fix it too. ## How was this patch tested? Added test and manually test. Closes #24654 from viirya/SPARK-27439-3. Authored-by: Liang-Chi Hsieh Signed-off-by: gatorsmile --- .../main/scala/org/apache/spark/s
[spark] branch master updated: [SPARK-27747][SQL] add a logical plan link in the physical plan
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0e6601a [SPARK-27747][SQL] add a logical plan link in the physical plan 0e6601a is described below commit 0e6601acdf17c770f880fbc263747779739f4c92 Author: Wenchen Fan AuthorDate: Mon May 20 13:42:25 2019 -0700 [SPARK-27747][SQL] add a logical plan link in the physical plan ## What changes were proposed in this pull request? It's pretty useful if we can convert a physical plan back to a logical plan, e.g., in https://github.com/apache/spark/pull/24389 This PR introduces a new feature to `TreeNode`, which allows `TreeNode` to carry some extra information via a mutable map, and keep the information when it's copied. The planner leverages this feature to put the logical plan into the physical plan. ## How was this patch tested? a test suite that runs all TPCDS queries and checks that some common physical plans contain the corresponding logical plans. Closes #24626 from cloud-fan/link. Lead-authored-by: Wenchen Fan Co-authored-by: Peng Bo Signed-off-by: gatorsmile --- .../plans/logical/basicLogicalOperators.scala | 6 +- .../apache/spark/sql/catalyst/trees/TreeNode.scala | 25 +++- .../spark/sql/catalyst/trees/TreeNodeSuite.scala | 51 .../org/apache/spark/sql/execution/SparkPlan.scala | 9 +- .../spark/sql/execution/SparkStrategies.scala | 11 ++ .../execution/LogicalPlanTagInSparkPlanSuite.scala | 133 + 6 files changed, 228 insertions(+), 7 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala index 1925d45..a2a7eb1 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala @@ -1081,7 +1081,11 @@ case class OneRowRelation() extends LeafNode { override def computeStats(): Statistics = Statistics(sizeInBytes = 1) /** [[org.apache.spark.sql.catalyst.trees.TreeNode.makeCopy()]] does not support 0-arg ctor. */ - override def makeCopy(newArgs: Array[AnyRef]): OneRowRelation = OneRowRelation() + override def makeCopy(newArgs: Array[AnyRef]): OneRowRelation = { +val newCopy = OneRowRelation() +newCopy.tags ++= this.tags +newCopy + } } /** A logical plan for `dropDuplicates`. 
*/ diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala index 84ca066..a5705d0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala @@ -19,7 +19,7 @@ package org.apache.spark.sql.catalyst.trees import java.util.UUID -import scala.collection.Map +import scala.collection.{mutable, Map} import scala.reflect.ClassTag import org.apache.commons.lang3.ClassUtils @@ -35,7 +35,7 @@ import org.apache.spark.sql.catalyst.errors._ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.plans.JoinType import org.apache.spark.sql.catalyst.plans.physical.{BroadcastMode, Partitioning} -import org.apache.spark.sql.catalyst.util.StringUtils.{PlanStringConcat, StringConcat} +import org.apache.spark.sql.catalyst.util.StringUtils.PlanStringConcat import org.apache.spark.sql.catalyst.util.truncatedString import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types._ @@ -74,6 +74,10 @@ object CurrentOrigin { } } +// The name of the tree node tag. This is preferred over using string directly, as we can easily +// find all the defined tags. +case class TreeNodeTagName(name: String) + // scalastyle:off abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { // scalastyle:on @@ -82,6 +86,12 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { val origin: Origin = CurrentOrigin.get /** + * A mutable map for holding auxiliary information of this tree node. It will be carried over + * when this node is copied via `makeCopy`, or transformed via `transformUp`/`transformDown`. + */ + val tags: mutable.Map[TreeNodeTagName, Any] = mutable.Map.empty + + /** * Returns a Seq of the children of this node. * Children should not change. Immutability required for containsChild optimization */ @@ -262,6 +272,8 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product { if (this fastEquals afterRule) {
[spark] branch master updated: [SPARK-27674][SQL] the hint should not be dropped after cache lookup
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 3e30a98 [SPARK-27674][SQL] the hint should not be dropped after cache lookup 3e30a98 is described below commit 3e30a988102e162f2702ae223312763a0bdc15eb Author: Wenchen Fan AuthorDate: Wed May 15 15:47:52 2019 -0700 [SPARK-27674][SQL] the hint should not be dropped after cache lookup ## What changes were proposed in this pull request? This is a followup of https://github.com/apache/spark/pull/20365 . #20365 fixed this problem when the hint node is a root node. This PR fixes this problem for all the cases. ## How was this patch tested? a new test Closes #24580 from cloud-fan/bug. Authored-by: Wenchen Fan Signed-off-by: gatorsmile --- .../catalyst/optimizer/EliminateResolvedHint.scala | 2 +- .../apache/spark/sql/execution/CacheManager.scala | 22 + .../org/apache/spark/sql/CachedTableSuite.scala| 56 -- 3 files changed, 54 insertions(+), 26 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala index 5586690..aebd660 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala @@ -56,7 +56,7 @@ object EliminateResolvedHint extends Rule[LogicalPlan] { * in this method will be cleaned up later by this rule, and may emit warnings depending on the * configurations. */ - private def extractHintsFromPlan(plan: LogicalPlan): (LogicalPlan, Seq[HintInfo]) = { + private[sql] def extractHintsFromPlan(plan: LogicalPlan): (LogicalPlan, Seq[HintInfo]) = { plan match { case h: ResolvedHint => val (plan, hints) = extractHintsFromPlan(h.child) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala index b3c253b..a13e6ad 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala @@ -23,7 +23,8 @@ import org.apache.hadoop.fs.{FileSystem, Path} import org.apache.spark.internal.Logging import org.apache.spark.sql.{Dataset, SparkSession} -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, SubqueryExpression} +import org.apache.spark.sql.catalyst.expressions.{Attribute, SubqueryExpression} +import org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint import org.apache.spark.sql.catalyst.plans.logical.{IgnoreCachedData, LogicalPlan, ResolvedHint} import org.apache.spark.sql.execution.columnar.InMemoryRelation import org.apache.spark.sql.execution.command.CommandUtils @@ -212,17 +213,18 @@ class CacheManager extends Logging { def useCachedData(plan: LogicalPlan): LogicalPlan = { val newPlan = plan transformDown { case command: IgnoreCachedData => command - // Do not lookup the cache by hint node. Hint node is special, we should ignore it when - // canonicalizing plans, so that plans which are same except hint can hit the same cache. - // However, we also want to keep the hint info after cache lookup. 
Here we skip the hint - // node, so that the returned caching plan won't replace the hint node and drop the hint info - // from the original plan. - case hint: ResolvedHint => hint case currentFragment => -lookupCachedData(currentFragment) - .map(_.cachedRepresentation.withOutput(currentFragment.output)) - .getOrElse(currentFragment) +lookupCachedData(currentFragment).map { cached => + // After cache lookup, we should still keep the hints from the input plan. + val hints = EliminateResolvedHint.extractHintsFromPlan(currentFragment)._2 + val cachedPlan = cached.cachedRepresentation.withOutput(currentFragment.output) + // The returned hint list is in top-down order, we should create the hint nodes from + // right to left. + hints.foldRight[LogicalPlan](cachedPlan) { case (hint, p) => +ResolvedHint(p, hint) + } +}.getOrElse(currentFragment) } newPlan transformAllExpressions { diff --git a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala index 76350ad..62e77bf 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala +++ b/sql/core/src/test/sca
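A hedged illustration (Scala, assuming a SparkSession `spark`) of the scenario this fix addresses: a broadcast hint placed above a plan fragment that is later replaced by its cached representation should still influence the join strategy.

```scala
import org.apache.spark.sql.functions.broadcast

val t1 = spark.range(100).toDF("id")
val t2 = spark.range(100).toDF("id")
t1.cache()
// The hint sits on top of t1; when cache lookup substitutes t1's InMemoryRelation,
// the extracted hint is re-attached instead of being dropped.
val joined = t2.join(broadcast(t1), "id")
joined.explain()
```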
[spark] branch master updated: [SPARK-27354][SQL] Move incompatible code from the hive-thriftserver module to sql/hive-thriftserver/v1.2.1
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 02c3369 [SPARK-27354][SQL] Move incompatible code from the hive-thriftserver module to sql/hive-thriftserver/v1.2.1 02c3369 is described below commit 02c33694c8254f69cb36c71c0876194dccdbc014 Author: Yuming Wang AuthorDate: Wed May 15 14:52:08 2019 -0700 [SPARK-27354][SQL] Move incompatible code from the hive-thriftserver module to sql/hive-thriftserver/v1.2.1 ## What changes were proposed in this pull request? When we upgraded the built-in Hive to 2.3.4, the current `hive-thriftserver` module is not compatible, such as these Hive changes: 1. [HIVE-12442](https://issues.apache.org/jira/browse/HIVE-12442) HiveServer2: Refactor/repackage HiveServer2's Thrift code so that it can be used in the tasks 2. [HIVE-12237](https://issues.apache.org/jira/browse/HIVE-12237) Use slf4j as logging facade 3. [HIVE-13169](https://issues.apache.org/jira/browse/HIVE-13169) HiveServer2: Support delegation token based connection when using http transport So this PR moves the incompatible code to `sql/hive-thriftserver/v1.2.1` and copies it to `sql/hive-thriftserver/v2.3.4` for the next code review. ## How was this patch tested? manual tests: ``` diff -urNa sql/hive-thriftserver/v1.2.1 sql/hive-thriftserver/v2.3.4 ``` Closes #24282 from wangyum/SPARK-27354. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- sql/hive-thriftserver/pom.xml | 4 +++- sql/hive-thriftserver/{ => v1.2.1}/if/TCLIService.thrift | 0 .../gen/java/org/apache/hive/service/cli/thrift/TArrayTypeEntry.java | 0 .../gen/java/org/apache/hive/service/cli/thrift/TBinaryColumn.java| 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TBoolColumn.java | 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TBoolValue.java | 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TByteColumn.java | 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TByteValue.java | 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TCLIService.java | 0 .../java/org/apache/hive/service/cli/thrift/TCLIServiceConstants.java | 0 .../org/apache/hive/service/cli/thrift/TCancelDelegationTokenReq.java | 0 .../apache/hive/service/cli/thrift/TCancelDelegationTokenResp.java| 0 .../java/org/apache/hive/service/cli/thrift/TCancelOperationReq.java | 0 .../java/org/apache/hive/service/cli/thrift/TCancelOperationResp.java | 0 .../java/org/apache/hive/service/cli/thrift/TCloseOperationReq.java | 0 .../java/org/apache/hive/service/cli/thrift/TCloseOperationResp.java | 0 .../gen/java/org/apache/hive/service/cli/thrift/TCloseSessionReq.java | 0 .../java/org/apache/hive/service/cli/thrift/TCloseSessionResp.java| 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TColumn.java | 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TColumnDesc.java | 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TColumnValue.java | 0 .../gen/java/org/apache/hive/service/cli/thrift/TDoubleColumn.java| 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TDoubleValue.java | 0 .../java/org/apache/hive/service/cli/thrift/TExecuteStatementReq.java | 0 .../org/apache/hive/service/cli/thrift/TExecuteStatementResp.java | 0 .../java/org/apache/hive/service/cli/thrift/TFetchOrientation.java| 0 .../gen/java/org/apache/hive/service/cli/thrift/TFetchResultsReq.java | 0 .../java/org/apache/hive/service/cli/thrift/TFetchResultsResp.java| 0 
.../gen/java/org/apache/hive/service/cli/thrift/TGetCatalogsReq.java | 0 .../gen/java/org/apache/hive/service/cli/thrift/TGetCatalogsResp.java | 0 .../gen/java/org/apache/hive/service/cli/thrift/TGetColumnsReq.java | 0 .../gen/java/org/apache/hive/service/cli/thrift/TGetColumnsResp.java | 0 .../org/apache/hive/service/cli/thrift/TGetDelegationTokenReq.java| 0 .../org/apache/hive/service/cli/thrift/TGetDelegationTokenResp.java | 0 .../gen/java/org/apache/hive/service/cli/thrift/TGetFunctionsReq.java | 0 .../java/org/apache/hive/service/cli/thrift/TGetFunctionsResp.java| 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TGetInfoReq.java | 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TGetInfoResp.java | 0 .../src/gen/java/org/apache/hive/service/cli/thrift/TGetInfoType.java | 0 .../gen/java/org/apache/hive/service/cli/thrift/TGetInfoValue.java| 0 .../org/apache/hive/service/cli/thrift/TGetOperationStatusReq.java| 0 .../org/apache/hive/service/cli/thrift/TGetOperationStatusResp.java | 0 .../org/apache/hive/service/cli/thrift/TGetResultSetMetadataReq.java | 0 .../org/apache/hive/service/cli/thrift/TGetResultSetMetadataResp.java | 0 .../gen/java/or
[spark] branch master updated: [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 0bba5cf [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout 0bba5cf is described below commit 0bba5cf56832f0690a4ebd733d01a0416e4c7252 Author: Xingbo Jiang AuthorDate: Wed May 15 14:47:15 2019 -0700 [SPARK-20774][SPARK-27036][SQL] Cancel the running broadcast execution on BroadcastTimeout ## What changes were proposed in this pull request? In the existing code, a broadcast execution timeout for the Future only causes a query failure, but the job running with the broadcast and the computation in the Future are not canceled. This wastes resources and slows down the other jobs. This PR tries to cancel both the running job and the running hashed relation construction thread. ## How was this patch tested? Add new test suite `BroadcastExchangeExec` Closes #24595 from jiangxb1987/SPARK-20774. Authored-by: Xingbo Jiang Signed-off-by: gatorsmile --- .../org/apache/spark/sql/internal/SQLConf.scala| 5 +- .../execution/exchange/BroadcastExchangeExec.scala | 151 +++-- .../sql/execution/BroadcastExchangeSuite.scala | 93 + 3 files changed, 176 insertions(+), 73 deletions(-) diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala index b5e40dd..b4c68a7 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala @@ -2026,7 +2026,10 @@ class SQLConf extends Serializable with Logging { def columnNameOfCorruptRecord: String = getConf(COLUMN_NAME_OF_CORRUPT_RECORD) - def broadcastTimeout: Long = getConf(BROADCAST_TIMEOUT) + def broadcastTimeout: Long = { +val timeoutValue = getConf(BROADCAST_TIMEOUT) +if (timeoutValue < 0) Long.MaxValue else timeoutValue + } def defaultDataSourceName: String = getConf(DEFAULT_DATA_SOURCE_NAME) diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala index aa0dd1d..8017188 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala @@ -17,10 +17,11 @@ package org.apache.spark.sql.execution.exchange -import java.util.concurrent.TimeoutException +import java.util.UUID +import java.util.concurrent._ -import scala.concurrent.{ExecutionContext, Future} -import scala.concurrent.duration._ +import scala.concurrent.ExecutionContext +import scala.concurrent.duration.NANOSECONDS import scala.util.control.NonFatal import org.apache.spark.{broadcast, SparkException} @@ -43,6 +44,8 @@ case class BroadcastExchangeExec( mode: BroadcastMode, child: SparkPlan) extends Exchange { + private val runId: UUID = UUID.randomUUID + override lazy val metrics = Map( "dataSize" -> SQLMetrics.createSizeMetric(sparkContext, "data size"), "collectTime" -> SQLMetrics.createTimingMetric(sparkContext, "time to collect"), @@ -56,79 +59,79 @@ case class BroadcastExchangeExec( } @transient - private val timeout: Duration = { -val timeoutValue = sqlContext.conf.broadcastTimeout -if (timeoutValue < 0) { - Duration.Inf -} else { - timeoutValue.seconds -} - } + 
private val timeout: Long = SQLConf.get.broadcastTimeout @transient private lazy val relationFuture: Future[broadcast.Broadcast[Any]] = { -// broadcastFuture is used in "doExecute". Therefore we can get the execution id correctly here. +// relationFuture is used in "doExecute". Therefore we can get the execution id correctly here. val executionId = sparkContext.getLocalProperty(SQLExecution.EXECUTION_ID_KEY) -Future { - // This will run in another thread. Set the execution id so that we can connect these jobs - // with the correct execution. - SQLExecution.withExecutionId(sqlContext.sparkSession, executionId) { -try { - val beforeCollect = System.nanoTime() - // Use executeCollect/executeCollectIterator to avoid conversion to Scala types - val (numRows, input) = child.executeCollectIterator() - if (numRows >= 51200) { -throw new SparkException( - s"Cannot broadcast the table with 512 million or more rows: $numRows rows") - } - - val beforeBuild = System.nanoTime() -
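As a small usage note on the configuration this patch touches (assuming a SparkSession `spark`): `spark.sql.broadcastTimeout` is the number of seconds the driver waits for a broadcast to complete, and after this change a negative value is interpreted as waiting indefinitely.

```scala
// Hedged example of tuning the broadcast timeout.
spark.conf.set("spark.sql.broadcastTimeout", "300")  // wait up to 300 seconds (the default)
spark.conf.set("spark.sql.broadcastTimeout", "-1")   // effectively no timeout
```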
[spark] branch master updated: [SPARK-27402][SQL][TEST-HADOOP3.2][TEST-MAVEN] Fix hadoop-3.2 test issue(except the hive-thriftserver module)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new f3ddd6f [SPARK-27402][SQL][TEST-HADOOP3.2][TEST-MAVEN] Fix hadoop-3.2 test issue(except the hive-thriftserver module) f3ddd6f is described below commit f3ddd6f9da27925607c06e55cdfb9a809633238b Author: Yuming Wang AuthorDate: Mon May 13 10:35:26 2019 -0700 [SPARK-27402][SQL][TEST-HADOOP3.2][TEST-MAVEN] Fix hadoop-3.2 test issue(except the hive-thriftserver module) ## What changes were proposed in this pull request? This pr fix hadoop-3.2 test issues(except the `hive-thriftserver` module): 1. Add `hive.metastore.schema.verification` and `datanucleus.schema.autoCreateAll` to HiveConf. 2. hadoop-3.2 support access the Hive metastore from 0.12 to 2.2 After [SPARK-27176](https://issues.apache.org/jira/browse/SPARK-27176) and this PR, we upgraded the built-in Hive to 2.3 when enabling the Hadoop 3.2+ profile. This upgrade fixes the following issues: - [HIVE-6727](https://issues.apache.org/jira/browse/HIVE-6727): Table level stats for external tables are set incorrectly. - [HIVE-15653](https://issues.apache.org/jira/browse/HIVE-15653): Some ALTER TABLE commands drop table stats. - [SPARK-12014](https://issues.apache.org/jira/browse/SPARK-12014): Spark SQL query containing semicolon is broken in Beeline. - [SPARK-25193](https://issues.apache.org/jira/browse/SPARK-25193): insert overwrite doesn't throw exception when drop old data fails. - [SPARK-25919](https://issues.apache.org/jira/browse/SPARK-25919): Date value corrupts when tables are "ParquetHiveSerDe" formatted and target table is Partitioned. - [SPARK-26332](https://issues.apache.org/jira/browse/SPARK-26332): Spark sql write orc table on viewFS throws exception. - [SPARK-26437](https://issues.apache.org/jira/browse/SPARK-26437): Decimal data becomes bigint to query, unable to query. ## How was this patch tested? This pr test Spark’s Hadoop 3.2 profile on jenkins and #24591 test Spark’s Hadoop 2.7 profile on jenkins This PR close #24591 Closes #24391 from wangyum/SPARK-27402. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- dev/sparktestsupport/modules.py| 14 ++- .../spark/sql/execution/command/DDLSuite.scala | 2 +- .../execution/datasources/orc/OrcSourceSuite.scala | 4 +- .../spark/sql/hive/client/HiveClientImpl.scala | 25 +++- .../sql/hive/client/IsolatedClientLoader.scala | 2 + .../org/apache/spark/sql/hive/test/TestHive.scala | 9 - sql/hive/src/test/resources/hive-contrib-2.3.4.jar | Bin 0 -> 125719 bytes .../test/resources/hive-hcatalog-core-2.3.4.jar| Bin 0 -> 263795 bytes .../sql/hive/ClasspathDependenciesSuite.scala | 21 -- .../hive/HiveExternalCatalogVersionsSuite.scala| 4 ++ .../spark/sql/hive/HiveMetastoreCatalogSuite.scala | 17 ++-- .../org/apache/spark/sql/hive/HiveShimSuite.scala | 12 +- .../apache/spark/sql/hive/StatisticsSuite.scala| 34 +++- .../spark/sql/hive/orc/HiveOrcFilterSuite.scala| 43 ++--- .../spark/sql/hive/orc/HiveOrcSourceSuite.scala| 9 - 15 files changed, 154 insertions(+), 42 deletions(-) diff --git a/dev/sparktestsupport/modules.py b/dev/sparktestsupport/modules.py index d496eec..812c2ef 100644 --- a/dev/sparktestsupport/modules.py +++ b/dev/sparktestsupport/modules.py @@ -15,9 +15,17 @@ # limitations under the License. 
# +from __future__ import print_function from functools import total_ordering import itertools import re +import os + +if os.environ.get("AMPLAB_JENKINS"): +hadoop_version = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE", "hadoop2.7") +else: +hadoop_version = os.environ.get("HADOOP_PROFILE", "hadoop2.7") +print("[info] Choosing supported modules with Hadoop profile", hadoop_version) all_modules = [] @@ -72,7 +80,11 @@ class Module(object): self.dependent_modules = set() for dep in dependencies: dep.dependent_modules.add(self) -all_modules.append(self) +# TODO: Skip hive-thriftserver module for hadoop-3.2. remove this once hadoop-3.2 support it +if name == "hive-thriftserver" and hadoop_version == "hadoop3.2": +print("[info] Skip unsupported module:", name) +else: +all_modules.append(self) def contains_file(self, filename): return any(re.match(p, filename) for p in self.source_file_prefixes) diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/com
[spark] branch master updated: [SPARK-27642][SS] make v1 offset extends v2 offset
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new bae5baa [SPARK-27642][SS] make v1 offset extends v2 offset bae5baa is described below commit bae5baae5281d01dc8c67077b90592be857329bd Author: Wenchen Fan AuthorDate: Tue May 7 23:03:15 2019 -0700 [SPARK-27642][SS] make v1 offset extends v2 offset ## What changes were proposed in this pull request? To move DS v2 to the catalyst module, we can't make v2 offset rely on v1 offset, as v1 offset is in sql/core. ## How was this patch tested? existing tests Closes #24538 from cloud-fan/offset. Authored-by: Wenchen Fan Signed-off-by: gatorsmile --- .../spark/sql/kafka010/KafkaContinuousStream.scala | 2 +- .../spark/sql/kafka010/KafkaSourceOffset.scala | 4 +-- .../spark/sql/execution/streaming/Offset.java | 42 +++--- .../sql/sources/v2/reader/streaming/Offset.java| 11 ++ .../spark/sql/execution/streaming/LongOffset.scala | 14 +--- .../execution/streaming/MicroBatchExecution.scala | 10 +++--- .../spark/sql/execution/streaming/OffsetSeq.scala | 9 ++--- .../sql/execution/streaming/OffsetSeqLog.scala | 3 +- .../sql/execution/streaming/StreamExecution.scala | 4 +-- .../sql/execution/streaming/StreamProgress.scala | 19 +- .../spark/sql/execution/streaming/memory.scala | 25 + .../sources/TextSocketMicroBatchStream.scala | 5 +-- .../apache/spark/sql/streaming/StreamTest.scala| 8 ++--- 13 files changed, 49 insertions(+), 107 deletions(-) diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala index d60ee1c..92686d2 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaContinuousStream.scala @@ -76,7 +76,7 @@ class KafkaContinuousStream( } override def planInputPartitions(start: Offset): Array[InputPartition] = { -val oldStartPartitionOffsets = KafkaSourceOffset.getPartitionOffsets(start) +val oldStartPartitionOffsets = start.asInstanceOf[KafkaSourceOffset].partitionToOffsets val currentPartitionSet = offsetReader.fetchEarliestOffsets().keySet val newPartitions = currentPartitionSet.diff(oldStartPartitionOffsets.keySet) diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala index 8d41c0d..90d7043 100644 --- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala +++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceOffset.scala @@ -20,14 +20,14 @@ package org.apache.spark.sql.kafka010 import org.apache.kafka.common.TopicPartition import org.apache.spark.sql.execution.streaming.{Offset, SerializedOffset} -import org.apache.spark.sql.sources.v2.reader.streaming.{Offset => OffsetV2, PartitionOffset} +import org.apache.spark.sql.sources.v2.reader.streaming.PartitionOffset /** * An [[Offset]] for the [[KafkaSource]]. This one tracks all partitions of subscribed topics and * their offsets. 
*/ private[kafka010] -case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) extends OffsetV2 { +case class KafkaSourceOffset(partitionToOffsets: Map[TopicPartition, Long]) extends Offset { override val json = JsonUtils.partitionOffsets(partitionToOffsets) } diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java index 43ad4b3..7c167dc 100644 --- a/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java +++ b/sql/core/src/main/java/org/apache/spark/sql/execution/streaming/Offset.java @@ -18,44 +18,10 @@ package org.apache.spark.sql.execution.streaming; /** - * This is an internal, deprecated interface. New source implementations should use the - * org.apache.spark.sql.sources.v2.reader.streaming.Offset class, which is the one that will be - * supported in the long term. + * This class is an alias of {@link org.apache.spark.sql.sources.v2.reader.streaming.Offset}. It's + * internal and deprecated. New streaming data source implementations should use data source v2 API, + * which will be supported in the long term. * * This class will be removed in a future release. */ -public abstract class Offset { -/** - * A JSON-serialized representation of an Offset that is - * used for saving o
svn commit: r33930 - /release/spark/spark-2.4.2/
Author: lixiao Date: Tue May 7 07:21:45 2019 New Revision: 33930 Log: Remove Spark 2.4.2 Removed: release/spark/spark-2.4.2/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r33928 - /dev/spark/v2.4.3-rc1-bin/ /release/spark/spark-/
Author: lixiao Date: Tue May 7 07:17:26 2019 New Revision: 33928 Log: Apache Spark 2.4.3 Added: release/spark/spark-/ - copied from r33927, dev/spark/v2.4.3-rc1-bin/ Removed: dev/spark/v2.4.3-rc1-bin/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r33929 - in /release/spark: spark-/ spark-2.4.3/
Author: lixiao Date: Tue May 7 07:17:56 2019 New Revision: 33929 Log: Apache Spark 2.4.3 Added: release/spark/spark-2.4.3/ - copied from r33928, release/spark/spark-/ Removed: release/spark/spark-/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r33927 - /dev/spark/v2.4.3-rc1-docs/
Author: lixiao Date: Tue May 7 07:15:54 2019 New Revision: 33927 Log: Remove RC artifacts Removed: dev/spark/v2.4.3-rc1-docs/ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] branch asf-git created (now 7520ea9)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch asf-git in repository https://gitbox.apache.org/repos/asf/spark-website.git. at 7520ea9 Add docs for Apache Spark 2.4.3 This branch includes the following new commits: new 7520ea9 Add docs for Apache Spark 2.4.3 The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark-website] 01/01: Add docs for Apache Spark 2.4.3
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch asf-git in repository https://gitbox.apache.org/repos/asf/spark-website.git commit 7520ea9da64ae87051349dca99e2f26d171f625e Author: Wenchen Fan AuthorDate: Tue May 7 07:14:07 2019 + Add docs for Apache Spark 2.4.3 --- site/docs/latest | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/site/docs/latest b/site/docs/latest index acdc3f1..630e6ef 12 --- a/site/docs/latest +++ b/site/docs/latest @@ -1 +1 @@ -2.4.2 \ No newline at end of file +site/docs/2.4.3 \ No newline at end of file - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] tag v2.4.3 created (now c3e32bf)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to tag v2.4.3 in repository https://gitbox.apache.org/repos/asf/spark.git. at c3e32bf (commit) No new revisions were added by this update. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-27596][SQL] The JDBC 'query' option doesn't work for Oracle database
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 771da83 [SPARK-27596][SQL] The JDBC 'query' option doesn't work for Oracle database 771da83 is described below commit 771da83c34247cb1f75ee13b939ee51baa3a11bb Author: Dilip Biswal AuthorDate: Sun May 5 21:52:23 2019 -0700 [SPARK-27596][SQL] The JDBC 'query' option doesn't work for Oracle database ## What changes were proposed in this pull request? **Description from JIRA** For the JDBC option `query`, we use the identifier name to start with underscore: s"(${subquery}) _SPARK_GEN_JDBC_SUBQUERY_NAME${curId.getAndIncrement()}". This is not supported by Oracle. The Oracle doesn't seem to support identifier name to start with non-alphabet character (unless it is quoted) and has length restrictions as well. [link](https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements008.htm) In this PR, the generated alias name 'SPARK_GEN_JDBC_SUBQUERY_NAME' is fixed to remove "_" prefix and also the alias name is shortened to not exceed the identifier length limit. ## How was this patch tested? Tests are added for MySql, Postgress, Oracle and DB2 to ensure enough coverage. Closes #24532 from dilipbiswal/SPARK-27596. Authored-by: Dilip Biswal Signed-off-by: gatorsmile (cherry picked from commit 6001d476ce663ee6a458535431d30e8213181fcf) Signed-off-by: gatorsmile --- .../spark/sql/jdbc/DB2IntegrationSuite.scala | 26 .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 27 + .../spark/sql/jdbc/OracleIntegrationSuite.scala| 28 ++ .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 26 .../execution/datasources/jdbc/JDBCOptions.scala | 2 +- 5 files changed, 108 insertions(+), 1 deletion(-) diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala index f5930bc28..32e56f0 100644 --- a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala +++ b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/DB2IntegrationSuite.scala @@ -158,4 +158,30 @@ class DB2IntegrationSuite extends DockerJDBCIntegrationSuite { assert(rows(0).getInt(1) == 20) assert(rows(0).getString(2) == "1") } + + test("query JDBC option") { +val expectedResult = Set( + (42, "fred"), + (17, "dave") +).map { case (x, y) => + Row(Integer.valueOf(x), String.valueOf(y)) +} + +val query = "SELECT x, y FROM tbl WHERE x > 10" +// query option to pass on the query string. +val df = spark.read.format("jdbc") + .option("url", jdbcUrl) + .option("query", query) + .load() +assert(df.collect.toSet === expectedResult) + +// query option in the create table path. 
+sql( + s""" + |CREATE OR REPLACE TEMPORARY VIEW queryOption + |USING org.apache.spark.sql.jdbc + |OPTIONS (url '$jdbcUrl', query '$query') + """.stripMargin.replaceAll("\n", " ")) +assert(sql("select x, y from queryOption").collect.toSet == expectedResult) + } } diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala index a70ed98..9cd5c4e 100644 --- a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala +++ b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MySQLIntegrationSuite.scala @@ -21,6 +21,7 @@ import java.math.BigDecimal import java.sql.{Connection, Date, Timestamp} import java.util.Properties +import org.apache.spark.sql.{Row, SaveMode} import org.apache.spark.tags.DockerTest @DockerTest @@ -152,4 +153,30 @@ class MySQLIntegrationSuite extends DockerJDBCIntegrationSuite { df2.write.jdbc(jdbcUrl, "datescopy", new Properties) df3.write.jdbc(jdbcUrl, "stringscopy", new Properties) } + + test("query JDBC option") { +val expectedResult = Set( + (42, "fred"), + (17, "dave") +).map { case (x, y) => + Row(Integer.valueOf(x), String.valueOf(y)) +} + +val query = "SELECT x, y FROM tbl WHERE x > 10" +// query option to pass on t
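As context for the SPARK-27596 fix above: the JDBC `query` option wraps the user's statement in a subquery with a Spark-generated alias, and the patch drops the alias's leading underscore (and shortens it) so Oracle accepts it as an unquoted identifier. The sketch below shows typical use of the option from Scala; the connection URL, credentials, and query are placeholders, not values taken from the commit.

```scala
// Minimal usage sketch of the JDBC `query` option exercised by the tests in this commit.
// The URL, credentials, and query below are placeholders, not values from the patch.
import org.apache.spark.sql.SparkSession

object JdbcQueryOptionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-query-option")
      .master("local[*]")
      .getOrCreate()

    // Spark wraps this statement as "(query) <generated alias>"; before the fix the
    // generated alias began with "_", which Oracle rejects for unquoted identifiers.
    val query = "SELECT x, y FROM tbl WHERE x > 10"

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")  // placeholder URL
      .option("query", query)                                    // push a query instead of a table name
      .option("user", "scott")                                   // placeholder credentials
      .option("password", "tiger")
      .load()

    df.show()
    spark.stop()
  }
}
```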
[spark] branch master updated (d9bcacf -> 6001d47)
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d9bcacf [SPARK-27629][PYSPARK] Prevent Unpickler from intervening each unpickling add 6001d47 [SPARK-27596][SQL] The JDBC 'query' option doesn't work for Oracle database No new revisions were added by this update. Summary of changes: .../spark/sql/jdbc/DB2IntegrationSuite.scala | 26 .../spark/sql/jdbc/MySQLIntegrationSuite.scala | 27 + .../spark/sql/jdbc/OracleIntegrationSuite.scala| 28 ++ .../spark/sql/jdbc/PostgresIntegrationSuite.scala | 26 .../execution/datasources/jdbc/JDBCOptions.scala | 2 +- 5 files changed, 108 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-24360][FOLLOW-UP][SQL] Add missing options for sql-migration-guide-hive-compatibility.md
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 9a419c3 [SPARK-24360][FOLLOW-UP][SQL] Add missing options for sql-migration-guide-hive-compatibility.md 9a419c3 is described below commit 9a419c37ecd3e7638d35043156ac2e60dabe97df Author: Yuming Wang AuthorDate: Fri May 3 08:42:11 2019 -0700 [SPARK-24360][FOLLOW-UP][SQL] Add missing options for sql-migration-guide-hive-compatibility.md ## What changes were proposed in this pull request? This PR adds missing options for `sql-migration-guide-hive-compatibility.md`. ## How was this patch tested? N/A Closes #24520 from wangyum/SPARK-24360. Authored-by: Yuming Wang Signed-off-by: gatorsmile --- docs/sql-migration-guide-hive-compatibility.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sql-migration-guide-hive-compatibility.md b/docs/sql-migration-guide-hive-compatibility.md index 857e974..5602f53 100644 --- a/docs/sql-migration-guide-hive-compatibility.md +++ b/docs/sql-migration-guide-hive-compatibility.md @@ -25,7 +25,7 @@ license: | Spark SQL is designed to be compatible with the Hive Metastore, SerDes and UDFs. Currently, Hive SerDes and UDFs are based on Hive 1.2.1, and Spark SQL can be connected to different versions of Hive Metastore -(from 0.12.0 to 2.3.4. Also see [Interacting with Different Versions of Hive Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore)). +(from 0.12.0 to 2.3.4 and 3.1.0 to 3.1.1. Also see [Interacting with Different Versions of Hive Metastore](sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore)). Deploying in Existing Hive Warehouses - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
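For readers wondering how the newly documented metastore versions are selected at runtime: Spark exposes the `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` configurations. The sketch below is an illustrative, non-authoritative example of pointing a Hive-enabled session at one of the versions added by this doc change; the application name and jar-resolution mode are assumptions.

```scala
// Illustrative sketch (assumed app name and jar resolution) of selecting one of the
// newly documented Hive metastore versions via Spark SQL configuration.
import org.apache.spark.sql.SparkSession

object HiveMetastoreVersionExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-metastore-3.1.1")
      // One of the versions added to the compatibility range in this doc change.
      .config("spark.sql.hive.metastore.version", "3.1.1")
      // Resolve matching Hive client jars from Maven; a local classpath can be given instead.
      .config("spark.sql.hive.metastore.jars", "maven")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()
    spark.stop()
  }
}
```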
[spark] branch branch-2.4 updated: [SPARK-26048][SPARK-24530][2.4] Cherrypick all the missing commits to 2.4 release script
This is an automated email from the ASF dual-hosted git repository. lixiao pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new e417168 [SPARK-26048][SPARK-24530][2.4] Cherrypick all the missing commits to 2.4 release script e417168 is described below commit e417168ed012190db66a21e626b2b8d2332d6c01 Author: Wenchen Fan AuthorDate: Wed May 1 07:36:00 2019 -0700 [SPARK-26048][SPARK-24530][2.4] Cherrypick all the missing commits to 2.4 release script ## What changes were proposed in this pull request? This PR is to cherry-pick all the missing and relevant commits that were merged to master but not to branch-2.4. Previously, dbtsai used the release script in the branch 2.4 to release 2.4.1. After more investigation, I found it is risky to make a 2.4 release by using the release script in the master branch since the release script has various changes. It could easily introduce unnoticeable issues, like what we did for 2.4.2. Thus, I would cherry-pick all the missing fixes and use the updated release script to release 2.4.3 ## How was this patch tested? N/A Closes #24503 from gatorsmile/upgradeReleaseScript. Lead-authored-by: Wenchen Fan Co-authored-by: gatorsmile Co-authored-by: wright Signed-off-by: gatorsmile --- dev/create-release/do-release-docker.sh | 3 +++ dev/create-release/release-build.sh | 7 +-- dev/create-release/releaseutils.py | 2 +- dev/create-release/spark-rm/Dockerfile | 4 ++-- 4 files changed, 11 insertions(+), 5 deletions(-) diff --git a/dev/create-release/do-release-docker.sh b/dev/create-release/do-release-docker.sh index fa7b73c..c1a122e 100755 --- a/dev/create-release/do-release-docker.sh +++ b/dev/create-release/do-release-docker.sh @@ -135,6 +135,9 @@ if [ -n "$JAVA" ]; then JAVA_VOL="--volume $JAVA:/opt/spark-java" fi +# SPARK-24530: Sphinx must work with python 3 to generate doc correctly. +echo "SPHINXPYTHON=/opt/p35/bin/python" >> $ENVFILE + echo "Building $RELEASE_TAG; output will be at $WORKDIR/output" docker run -ti \ --env-file "$ENVFILE" \ diff --git a/dev/create-release/release-build.sh b/dev/create-release/release-build.sh index 5e65d99..affb4dc 100755 --- a/dev/create-release/release-build.sh +++ b/dev/create-release/release-build.sh @@ -122,6 +122,9 @@ fi PUBLISH_SCALA_2_12=0 SCALA_2_12_PROFILES="-Pscala-2.12" +if [[ $SPARK_VERSION < "3.0." ]]; then + SCALA_2_12_PROFILES="-Pscala-2.12 -Pflume" +fi if [[ $SPARK_VERSION > "2.4" ]]; then PUBLISH_SCALA_2_12=1 fi @@ -327,7 +330,7 @@ if [[ "$1" == "package" ]]; then svn add "svn-spark/${DEST_DIR_NAME}-bin" cd svn-spark -svn ci --username $ASF_USERNAME --password "$ASF_PASSWORD" -m"Apache Spark $SPARK_PACKAGE_VERSION" +svn ci --username $ASF_USERNAME --password "$ASF_PASSWORD" -m"Apache Spark $SPARK_PACKAGE_VERSION" --no-auth-cache cd .. rm -rf svn-spark fi @@ -355,7 +358,7 @@ if [[ "$1" == "docs" ]]; then svn add "svn-spark/${DEST_DIR_NAME}-docs" cd svn-spark -svn ci --username $ASF_USERNAME --password "$ASF_PASSWORD" -m"Apache Spark $SPARK_PACKAGE_VERSION docs" +svn ci --username $ASF_USERNAME --password "$ASF_PASSWORD" -m"Apache Spark $SPARK_PACKAGE_VERSION docs" --no-auth-cache cd .. 
rm -rf svn-spark fi diff --git a/dev/create-release/releaseutils.py b/dev/create-release/releaseutils.py index f273b33..a5a26ae 100755 --- a/dev/create-release/releaseutils.py +++ b/dev/create-release/releaseutils.py @@ -236,7 +236,7 @@ def translate_component(component, commit_hash, warnings): # The returned components are already filtered and translated def find_components(commit, commit_hash): components = re.findall(r"\[\w*\]", commit.lower()) -components = [translate_component(c, commit_hash) +components = [translate_component(c, commit_hash, []) for c in components if c in known_components] return components diff --git a/dev/create-release/spark-rm/Dockerfile b/dev/create-release/spark-rm/Dockerfile index 15f831c..4231544 100644 --- a/dev/create-release/spark-rm/Dockerfile +++ b/dev/create-release/spark-rm/Dockerfile @@ -62,8 +62,8 @@ RUN echo 'deb http://cran.cnr.Berkeley.edu/bin/linux/ubuntu xenial/' >> /etc/apt pip install $BASE_PIP_PKGS && \ pip install $PIP_PKGS && \ cd && \ - virtualenv -p python3 p35 && \ - . p35/bin/activate && \ + virtualenv -p python3 /opt/p35 && \ + . /opt/p35/bin/activate && \ pip install $BASE_PIP_PKGS && \ pip install $PIP_PKGS && \ # Install R packages and dependencies used when building. - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r33851 - in /dev/spark/v2.4.3-rc1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _site/api/java/org/apache/spark
Author: lixiao Date: Wed May 1 06:16:53 2019 New Revision: 33851 Log: Apache Spark v2.4.3-rc1 docs [This commit notification would consist of 1479 parts, which exceeds the limit of 50 ones, so it was shortened to the summary.] - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
svn commit: r33850 - /dev/spark/v2.4.3-rc1-bin/
Author: lixiao Date: Wed May 1 05:57:50 2019 New Revision: 33850 Log: Apache Spark v2.4.3-rc1 Added: dev/spark/v2.4.3-rc1-bin/ dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz (with props) dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz.asc dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz.sha512 dev/spark/v2.4.3-rc1-bin/pyspark-2.4.3.tar.gz (with props) dev/spark/v2.4.3-rc1-bin/pyspark-2.4.3.tar.gz.asc dev/spark/v2.4.3-rc1-bin/pyspark-2.4.3.tar.gz.sha512 dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-hadoop2.6.tgz (with props) dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-hadoop2.6.tgz.asc dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-hadoop2.6.tgz.sha512 dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-hadoop2.7.tgz (with props) dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-hadoop2.7.tgz.asc dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-hadoop2.7.tgz.sha512 dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz (with props) dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz.asc dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-without-hadoop-scala-2.12.tgz.sha512 dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-without-hadoop.tgz (with props) dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-without-hadoop.tgz.asc dev/spark/v2.4.3-rc1-bin/spark-2.4.3-bin-without-hadoop.tgz.sha512 dev/spark/v2.4.3-rc1-bin/spark-2.4.3.tgz (with props) dev/spark/v2.4.3-rc1-bin/spark-2.4.3.tgz.asc dev/spark/v2.4.3-rc1-bin/spark-2.4.3.tgz.sha512 Added: dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz == Binary file - no diff available. Propchange: dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz.asc == --- dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz.asc (added) +++ dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz.asc Wed May 1 05:57:50 2019 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJcyS/nAAoJEJb3L3aDDA0beSoP/23koj6zI9FhzSqjUfYGTWaN ++QM672Wvjo0Q3P0imM9FLfNqjK13JLskulsR3slB7NlWg+7uAU1x6GgzWvrD8+qt +RH1wuqPeVLmE9MDYyb655scQWDO/f+Yq1TFfDgnXarUsvStw81HSuSqTx+tf4Gsk +D+Gdop0Z7w0rVSiJgXktl1y51FZNU5QJxi+e6whh+fVhuSZDIN/JweHY2HfwYxN1 +rbRXBC8LwPiFDlZIlnulZjpJw+07wpQPCjUIr2IwGlv1D4E6sES7YhXsWkVo0kl0 +s4npI+TkCV39QXFEqSiZtcFACohdLA5AWAsKUIHpaXahlouxHhmQEt1v2IX1tdQ3 +Yv9If3KV/NDFnf1e3E59jhHrwA5cST8ufKyShJBNb9p6qhXXqNUmBOPgYOZhEe+n +PeJdIU5fY3hz16eXePPf110pExN+csYynR4uzUAO+7TiRJNA2HdBcGoPauaMWlsf +H28uvSVeg5Zp8GSKw/vD+OBnxmfVXWPkSZAmrUsCBvWo5KTvftMTWgx7oS2vCozI +x7AgEmykUdiXOaUh6NOjMH70DovoaCTdx1HD+T1MJwAqwx05kvEhXAnwgcyXEd3B +v/Q1RTyyYfHCgR5rdyZiQRv/34O7tO5d3rL669TyKSSSLX9bg3VoBoSQLUhuKJMA +l9QcWqDnWM6Iy6WEkLkj +=mfID +-END PGP SIGNATURE- Added: dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz.sha512 == --- dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz.sha512 (added) +++ dev/spark/v2.4.3-rc1-bin/SparkR_2.4.3.tar.gz.sha512 Wed May 1 05:57:50 2019 @@ -0,0 +1,3 @@ +SparkR_2.4.3.tar.gz: 662AC057 E12A036E 4EFC7B57 F1067BE1 0618C123 1C01FF5E + 972D4E5D B58DE948 BDC01CE5 476E1553 3C4E768C F3128E47 + 1284689D D0F1202A 4A6CC3EA 823F8F87 Added: dev/spark/v2.4.3-rc1-bin/pyspark-2.4.3.tar.gz == Binary file - no diff available. 
Propchange: dev/spark/v2.4.3-rc1-bin/pyspark-2.4.3.tar.gz -- svn:mime-type = application/octet-stream Added: dev/spark/v2.4.3-rc1-bin/pyspark-2.4.3.tar.gz.asc == --- dev/spark/v2.4.3-rc1-bin/pyspark-2.4.3.tar.gz.asc (added) +++ dev/spark/v2.4.3-rc1-bin/pyspark-2.4.3.tar.gz.asc Wed May 1 05:57:50 2019 @@ -0,0 +1,17 @@ +-BEGIN PGP SIGNATURE- +Version: GnuPG v1 + +iQIcBAABAgAGBQJcySxrAAoJEJb3L3aDDA0bht0P/3Q9RCpLdy7Dfcyi4zCCEper +xrX6jTJT5MHGhOYHgUoJxbotWkbDjTk81xrE+I+ccDYS7TW0SeC/ngTH85O+sD/a +DVEQ0//dPhHVQNGcWaoojOtI3n80bFEE0XmWj6hoMmT7KbY9GA21iXNM2kjaaETb +GQEDAyeA8S5GSyDd9ez+OvofFo6uJfuorfvQgKF8mNRTrbhNhixGxby7WE+pdlh4 +L1oaohU+dp5ubEhDCVNAi6xXemnhWWYi1zpgxubxYSvQsLoHumBEiiLNZezddvRY +mFuH9dQQX3w5gd5znLGAzp5UPLJ341Fe/jMNAgf8TPPvJzROIw5lYmHtKwkNDCHr ++BSSOFuMejttwLGz8Kd4yJ2gxRtpzZxqzdGJ+fSOJU0mElK3e7SelW44oLEWloAB +pf0IB84eThEQvhL02JM2sl5LIl5HehZTx+EHB3ONwXMcKStoz6RSjl/IqKNjIg/K +0cEgfrdZdYy6UauP0RHMosQ83myQlQrsk04fLd+rlPmPLhAPk6ZqnP9Pe6V9X6xq +i2wn8DSiwTK9NAqGacKLEjNyVl0mTcgE/8cmdqd4e0L9gG+f3XUIPWqKYY1p0q65 +1ypOlYe+KowbV0GHvwX467RQTFjiAwMRGdcFinhbfSUk0d6iLkf29rc3JfhqrroZ +G9iULvEnsvwEmiZoK2Cs +=f/+f +-END PGP SIGNATURE- Added: dev