[spark] branch master updated: [SPARK-24570][SQL] Implement Spark own GetTablesOperation to fix SQL client tools cannot show tables

2019-02-17 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 7f53116  [SPARK-24570][SQL] Implement Spark own GetTablesOperation to 
fix SQL client tools cannot show tables
7f53116 is described below

commit 7f53116f77bac6302bb727769b4a4c684b6b0b5b
Author: Yuming Wang 
AuthorDate: Sun Feb 17 23:35:45 2019 -0800

[SPARK-24570][SQL] Implement Spark own GetTablesOperation to fix SQL client 
tools cannot show tables

## What changes were proposed in this pull request?

SQL client tools (e.g. [DBeaver](https://dbeaver.io/)) use 
[`GetTablesOperation`](https://github.com/apache/spark/blob/a7444570764b0a08b7e908dc7931744f9dbdf3c6/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetTablesOperation.java)
 in their navigator to obtain table names.

It should use 
[`metadataHive`](https://github.com/apache/spark/blob/95d172da2b370ff6257bfd6fcd102ac553f6f6af/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala#L52-L53),
 but it currently uses 
[`executionHive`](https://github.com/apache/spark/blob/24f5bbd770033dacdea62555488bfffb61665279/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala#L93-L95).

This PR implements Spark's own `GetTablesOperation`, which uses `metadataHive`.
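
A minimal sketch of the underlying idea (not the committed `SparkGetTablesOperation`): the table list that a JDBC `GetTables` call should surface can be produced from Spark's own catalog. The sketch below uses the public `spark.catalog` API; the object name and output shape are illustrative assumptions.

```scala
// A minimal sketch, NOT the committed SparkGetTablesOperation: list the tables that a
// JDBC GetTables call should surface, using Spark's own public catalog API.
import org.apache.spark.sql.SparkSession

object ListTablesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("list-tables-sketch")
      .master("local[*]")
      .getOrCreate()

    // For every database Spark knows about, emit (database, table, type) rows,
    // roughly the shape a GetTables result set exposes to tools such as DBeaver.
    val rows = spark.catalog.listDatabases().collect().flatMap { db =>
      spark.catalog.listTables(db.name).collect().map { t =>
        (t.database, t.name, t.tableType) // tableType is e.g. "MANAGED", "EXTERNAL", "VIEW"
      }
    }
    rows.foreach { case (db, table, tpe) => println(s"$db\t$table\t$tpe") }

    spark.stop()
  }
}
```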

## How was this patch tested?

Unit tests and manual tests.


![image](https://user-images.githubusercontent.com/5399861/47430696-acf77980-d7cc-11e8-824d-f28d78f60a00.png)

![image](https://user-images.githubusercontent.com/5399861/47440576-09649400-d7e1-11e8-97a8-a96f73f70361.png)

Closes #22794 from wangyum/SPARK-24570.

Authored-by: Yuming Wang 
Signed-off-by: gatorsmile 
---
 .../service/cli/operation/GetTablesOperation.java  |  2 +-
 .../thriftserver/SparkGetTablesOperation.scala | 99 ++
 .../server/SparkSQLOperationManager.scala  | 22 -
 .../thriftserver/HiveThriftServer2Suites.scala |  2 +-
 .../thriftserver/SparkMetadataOperationSuite.scala | 87 ++-
 5 files changed, 206 insertions(+), 6 deletions(-)

diff --git 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
index 1a7ca79..2af17a6 100644
--- 
a/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
+++ 
b/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/GetTablesOperation.java
@@ -46,7 +46,7 @@ public class GetTablesOperation extends MetadataOperation {
   private final String schemaName;
   private final String tableName;
  private final List<String> tableTypes = new ArrayList<String>();
-  private final RowSet rowSet;
+  protected final RowSet rowSet;
   private final TableTypeMapping tableTypeMapping;
 
 
diff --git 
a/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetTablesOperation.scala
 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetTablesOperation.scala
new file mode 100644
index 000..3696500
--- /dev/null
+++ 
b/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkGetTablesOperation.scala
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver
+
+import java.util.{List => JList}
+
+import scala.collection.JavaConverters.seqAsJavaListConverter
+
+import org.apache.hadoop.hive.ql.security.authorization.plugin.HiveOperationType
+import org.apache.hadoop.hive.ql.security.authorization.plugin.HivePrivilegeObjectUtils
+import org.apache.hive.service.cli._
+import org.apache.hive.service.cli.operation.GetTablesOperation
+import org.apache.hive.service.cli.session.HiveSession
+
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType
+import org.apache.spark.sql.catalyst.catalog.CatalogTableType._
+
+/**
+ * Spar

[spark] branch master updated: [SPARK-26666][SQL] Support DSv2 overwrite and dynamic partition overwrite.

2019-02-17 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 60caa92  [SPARK-26666][SQL] Support DSv2 overwrite and dynamic 
partition overwrite.
60caa92 is described below

commit 60caa92deaf6941f58da82dcc0962ebf3a598ced
Author: Ryan Blue 
AuthorDate: Mon Feb 18 13:16:28 2019 +0800

[SPARK-26666][SQL] Support DSv2 overwrite and dynamic partition overwrite.

## What changes were proposed in this pull request?

This adds two logical plans that implement the ReplaceData operation from 
the [logical plans 
SPIP](https://docs.google.com/document/d/1gYm5Ji2Mge3QBdOliFV5gSPTKlX4q1DCBXIkiyMv62A/edit?ts=5a987801#heading=h.m45webtwxf2d).
 These two plans will be used to implement Spark's `INSERT OVERWRITE` behavior 
for v2.

Specific changes:
* Add `SupportsTruncate`, `SupportsOverwrite`, and 
`SupportsDynamicOverwrite` to DSv2 write API
* Add `OverwriteByExpression` and `OverwritePartitionsDynamic` plans 
(logical and physical)
* Add new plans to DSv2 write validation rule `ResolveOutputRelation`
* Refactor `WriteToDataSourceV2Exec` into trait used by all DSv2 write exec 
nodes
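
As a rough illustration of how the overwrite modes map onto the new capabilities (the trait and method shapes below are simplified assumptions, not the committed DSv2 interfaces): dynamic partition overwrite replaces only the partitions present in the incoming data, overwriting with an always-true filter degenerates to truncate-then-append, and the general case deletes rows matching the filter before appending.

```scala
// Illustrative only: simplified stand-ins for the write capabilities named above.
// The real DSv2 interfaces live under org.apache.spark.sql.sources.v2.writer and differ in detail.
object OverwriteDispatchSketch {
  sealed trait Filter
  case class EqualTo(attribute: String, value: Any) extends Filter
  case object AlwaysTrue extends Filter

  trait WriteBuilderSketch
  trait SupportsTruncateSketch extends WriteBuilderSketch {
    def truncate(): WriteBuilderSketch
  }
  trait SupportsOverwriteSketch extends WriteBuilderSketch {
    def overwrite(filters: Array[Filter]): WriteBuilderSketch
  }
  trait SupportsDynamicOverwriteSketch extends WriteBuilderSketch {
    def overwriteDynamicPartitions(): WriteBuilderSketch
  }

  /** Pick the capability each overwrite mode needs, roughly how the physical plans choose. */
  def configure(
      builder: WriteBuilderSketch,
      deleteExpr: Filter,
      dynamicPartitionOverwrite: Boolean): WriteBuilderSketch = {
    (builder, deleteExpr, dynamicPartitionOverwrite) match {
      // Dynamic partition overwrite: replace only the partitions present in the new data.
      case (b: SupportsDynamicOverwriteSketch, _, true) => b.overwriteDynamicPartitions()
      // Overwrite-by-expression with a true filter degenerates to truncate-then-append.
      case (b: SupportsTruncateSketch, AlwaysTrue, false) => b.truncate()
      // General case: delete the rows matching the expression, then append the new data.
      case (b: SupportsOverwriteSketch, expr, false) => b.overwrite(Array(expr))
      case _ =>
        throw new IllegalArgumentException("table does not support the requested overwrite mode")
    }
  }
}
```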

## How was this patch tested?

* The v2 analysis suite has been updated to validate the new overwrite plans
* The analysis suite for `OverwriteByExpression` checks that the delete 
expression is resolved using the table's columns
* Existing tests validate that overwrite exec plan works
* Updated existing v2 test because schema is used to validate overwrite

Closes #23606 from rdblue/SPARK-26666-add-overwrite.

Authored-by: Ryan Blue 
Signed-off-by: Wenchen Fan 
---
 .../spark/sql/catalyst/analysis/Analyzer.scala |  27 ++-
 .../plans/logical/basicLogicalOperators.scala  |  69 +++-
 .../org/apache/spark/sql/internal/SQLConf.scala|   2 +-
 .../analysis/DataSourceV2AnalysisSuite.scala   | 191 +++--
 .../sources/v2/reader/SupportsPushDownFilters.java |   3 +
 .../v2/writer/SupportsDynamicOverwrite.java|  37 
 .../sql/sources/v2/writer/SupportsOverwrite.java   |  45 +
 .../sql/sources/v2/writer/SupportsTruncate.java|  32 
 .../org/apache/spark/sql/DataFrameWriter.scala |  54 +++---
 .../execution/datasources/DataSourceStrategy.scala |   6 +
 .../datasources/v2/DataSourceV2Implicits.scala |  49 ++
 .../datasources/v2/DataSourceV2Relation.scala  |  24 +--
 .../datasources/v2/DataSourceV2Strategy.scala  |  35 ++--
 .../datasources/v2/WriteToDataSourceV2Exec.scala   | 135 ++-
 .../org/apache/spark/sql/sources/filters.scala |  26 ++-
 .../spark/sql/sources/v2/DataSourceV2Suite.scala   |   8 +-
 16 files changed, 613 insertions(+), 130 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 793c337..42904c5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -978,6 +978,11 @@ class Analyzer(
   case a @ Aggregate(groupingExprs, aggExprs, appendColumns: 
AppendColumns) =>
 a.mapExpressions(resolveExpressionTopDown(_, appendColumns))
 
+  case o: OverwriteByExpression if !o.outputResolved =>
+// do not resolve expression attributes until the query attributes are 
resolved against the
+// table by ResolveOutputRelation. that rule will alias the attributes 
to the table's names.
+o
+
   case q: LogicalPlan =>
 logTrace(s"Attempting to resolve 
${q.simpleString(SQLConf.get.maxToStringFields)}")
 q.mapExpressions(resolveExpressionTopDown(_, q))
@@ -2246,7 +2251,7 @@ class Analyzer(
   object ResolveOutputRelation extends Rule[LogicalPlan] {
 override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators 
{
   case append @ AppendData(table, query, isByName)
-  if table.resolved && query.resolved && !append.resolved =>
+  if table.resolved && query.resolved && !append.outputResolved =>
 val projection = resolveOutputColumns(table.name, table.output, query, 
isByName)
 
 if (projection != query) {
@@ -2254,6 +2259,26 @@ class Analyzer(
 } else {
   append
 }
+
+  case overwrite @ OverwriteByExpression(table, _, query, isByName)
+  if table.resolved && query.resolved && !overwrite.outputResolved =>
+val projection = resolveOutputColumns(table.name, table.output, query, 
isByName)
+
+if (projection != query) {
+  overwrite.copy(query = projection)
+} else {
+  overwrite
+}
+
+  case overwrite @ OverwritePartitionsDyna

[spark] branch master updated: [SPARK-26887][SQL][PYTHON][NS] Create datetime.date directly instead of creating datetime64 as intermediate data.

2019-02-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 4a4e7ae  [SPARK-26887][SQL][PYTHON][NS] Create datetime.date directly 
instead of creating datetime64 as intermediate data.
4a4e7ae is described below

commit 4a4e7aeca79738d5788628d67d97d704f067e8d7
Author: Takuya UESHIN 
AuthorDate: Mon Feb 18 11:48:10 2019 +0800

[SPARK-26887][SQL][PYTHON][NS] Create datetime.date directly instead of 
creating datetime64 as intermediate data.

## What changes were proposed in this pull request?

Currently, `DataFrame.toPandas()` with Arrow enabled, or `ArrowStreamPandasSerializer` 
for pandas UDFs with pyarrow<0.12, creates a `datetime64[ns]` series as intermediate 
data and then converts it to a `datetime.date` series, but the intermediate 
`datetime64[ns]` might overflow even if the date is valid.

```
>>> import datetime
>>>
>>> t = [datetime.date(2262, 4, 12), datetime.date(2263, 4, 12)]
>>>
>>> df = spark.createDataFrame(t, 'date')
>>> df.show()
+----------+
|     value|
+----------+
|2262-04-12|
|2263-04-12|
+----------+

>>>
>>> spark.conf.set("spark.sql.execution.arrow.enabled", "true")
>>>
>>> df.toPandas()
        value
0  1677-09-21
1  1678-09-21
```

We should avoid creating such intermediate data and create `datetime.date` 
series directly instead.

## How was this patch tested?

Modified some tests to include dates that overflow when converted through the 
intermediate `datetime64[ns]`.
Ran the tests with pyarrow 0.8, 0.10, 0.11, and 0.12 in my local environment.

Closes #23795 from ueshin/issues/SPARK-26887/date_as_object.

Authored-by: Takuya UESHIN 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/serializers.py  |  5 +-
 python/pyspark/sql/dataframe.py|  5 +-
 python/pyspark/sql/tests/test_arrow.py |  5 +-
 python/pyspark/sql/tests/test_pandas_udf_scalar.py |  3 +-
 python/pyspark/sql/types.py| 54 ++
 5 files changed, 44 insertions(+), 28 deletions(-)

diff --git a/python/pyspark/serializers.py b/python/pyspark/serializers.py
index 3db2595..a2c59fe 100644
--- a/python/pyspark/serializers.py
+++ b/python/pyspark/serializers.py
@@ -311,10 +311,9 @@ class ArrowStreamPandasSerializer(Serializer):
 
 def arrow_to_pandas(self, arrow_column):
 from pyspark.sql.types import from_arrow_type, \
-_check_series_convert_date, _check_series_localize_timestamps
+_arrow_column_to_pandas, _check_series_localize_timestamps
 
-s = arrow_column.to_pandas()
-s = _check_series_convert_date(s, from_arrow_type(arrow_column.type))
+s = _arrow_column_to_pandas(arrow_column, 
from_arrow_type(arrow_column.type))
 s = _check_series_localize_timestamps(s, self._timezone)
 return s
 
diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index a1056d0..472d296 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -2107,14 +2107,13 @@ class DataFrame(object):
 # of PyArrow is found, if 'spark.sql.execution.arrow.enabled' is 
enabled.
 if use_arrow:
 try:
-from pyspark.sql.types import 
_check_dataframe_convert_date, \
+from pyspark.sql.types import _arrow_table_to_pandas, \
 _check_dataframe_localize_timestamps
 import pyarrow
 batches = self._collectAsArrow()
 if len(batches) > 0:
 table = pyarrow.Table.from_batches(batches)
-pdf = table.to_pandas()
-pdf = _check_dataframe_convert_date(pdf, self.schema)
+pdf = _arrow_table_to_pandas(table, self.schema)
 return _check_dataframe_localize_timestamps(pdf, 
timezone)
 else:
 return pd.DataFrame.from_records([], 
columns=self.columns)
diff --git a/python/pyspark/sql/tests/test_arrow.py 
b/python/pyspark/sql/tests/test_arrow.py
index 8a62500..38a6402 100644
--- a/python/pyspark/sql/tests/test_arrow.py
+++ b/python/pyspark/sql/tests/test_arrow.py
@@ -68,7 +68,9 @@ class ArrowTests(ReusedSQLTestCase):
 (u"b", 2, 20, 0.4, 4.0, Decimal("4.0"),
  date(2012, 2, 2), datetime(2012, 2, 2, 2, 2, 2)),
 (u"c", 3, 30, 0.8, 6.0, Decimal("6.0"),
- date(2100, 3, 3), datetime(2100, 3, 3, 3, 3, 3))]
+ date(2100, 3, 3), datetime(2100, 3, 3, 3, 3, 3)),
+(u"d", 4, 40, 1.0, 8.0, Decimal("8.0"),
+ 

[spark] branch branch-2.3 updated: [SPARK-26897][SQL][TEST][FOLLOW-UP] Remove workaround for 2.2.0 and 2.1.x in HiveExternalCatalogVersionsSuite

2019-02-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new d38a113  [SPARK-26897][SQL][TEST][FOLLOW-UP] Remove workaround for 
2.2.0 and 2.1.x in HiveExternalCatalogVersionsSuite
d38a113 is described below

commit d38a113ef1f247696a945c57a744a831ef57b8e4
Author: Takeshi Yamamuro 
AuthorDate: Mon Feb 18 11:24:36 2019 +0800

[SPARK-26897][SQL][TEST][FOLLOW-UP] Remove workaround for 2.2.0 and 2.1.x 
in HiveExternalCatalogVersionsSuite

## What changes were proposed in this pull request?
This PR just removes the workaround for 2.2.0 and 2.1.x in 
HiveExternalCatalogVersionsSuite.

## How was this patch tested?
Pass the Jenkins.

Closes #23817 from maropu/SPARK-26607-FOLLOWUP.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit e2b8cc65cd579374ddbd70b93c9fcefe9b8873d9)
Signed-off-by: Hyukjin Kwon 
---
 .../sql/hive/HiveExternalCatalogVersionsSuite.scala | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
index 6522f77..0947291 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
@@ -257,19 +257,10 @@ object PROCESS_TABLES extends QueryTest with SQLTestUtils 
{
 
   // SPARK-22356: overlapped columns between data and partition schema in 
data source tables
   val tbl_with_col_overlap = s"tbl_with_col_overlap_$index"
-  // For Spark 2.2.0 and 2.1.x, the behavior is different from Spark 2.0, 
2.2.1, 2.3+
-  if (testingVersions(index).startsWith("2.1") || testingVersions(index) 
== "2.2.0") {
-spark.sql("msck repair table " + tbl_with_col_overlap)
-assert(spark.table(tbl_with_col_overlap).columns === Array("i", "j", 
"p"))
-checkAnswer(spark.table(tbl_with_col_overlap), Row(1, 1, 1) :: Row(1, 
1, 1) :: Nil)
-assert(sql("desc " + tbl_with_col_overlap).select("col_name")
-  .as[String].collect().mkString(",").contains("i,j,p"))
-  } else {
-assert(spark.table(tbl_with_col_overlap).columns === Array("i", "p", 
"j"))
-checkAnswer(spark.table(tbl_with_col_overlap), Row(1, 1, 1) :: Row(1, 
1, 1) :: Nil)
-assert(sql("desc " + tbl_with_col_overlap).select("col_name")
-  .as[String].collect().mkString(",").contains("i,p,j"))
-  }
+  assert(spark.table(tbl_with_col_overlap).columns === Array("i", "p", 
"j"))
+  checkAnswer(spark.table(tbl_with_col_overlap), Row(1, 1, 1) :: Row(1, 1, 
1) :: Nil)
+  assert(sql("desc " + tbl_with_col_overlap).select("col_name")
+.as[String].collect().mkString(",").contains("i,p,j"))
 }
   }
 }





[spark] branch master updated: [SPARK-26897][SQL][TEST][FOLLOW-UP] Remove workaround for 2.2.0 and 2.1.x in HiveExternalCatalogVersionsSuite

2019-02-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e2b8cc6  [SPARK-26897][SQL][TEST][FOLLOW-UP] Remove workaround for 
2.2.0 and 2.1.x in HiveExternalCatalogVersionsSuite
e2b8cc6 is described below

commit e2b8cc65cd579374ddbd70b93c9fcefe9b8873d9
Author: Takeshi Yamamuro 
AuthorDate: Mon Feb 18 11:24:36 2019 +0800

[SPARK-26897][SQL][TEST][FOLLOW-UP] Remove workaround for 2.2.0 and 2.1.x 
in HiveExternalCatalogVersionsSuite

## What changes were proposed in this pull request?
This PR just removes the workaround for 2.2.0 and 2.1.x in 
HiveExternalCatalogVersionsSuite.

## How was this patch tested?
Pass the Jenkins.

Closes #23817 from maropu/SPARK-26607-FOLLOWUP.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Hyukjin Kwon 
---
 .../sql/hive/HiveExternalCatalogVersionsSuite.scala | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
index 8086f75..1dd60c6 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
@@ -260,19 +260,10 @@ object PROCESS_TABLES extends QueryTest with SQLTestUtils 
{
 
   // SPARK-22356: overlapped columns between data and partition schema in 
data source tables
   val tbl_with_col_overlap = s"tbl_with_col_overlap_$index"
-  // For Spark 2.2.0 and 2.1.x, the behavior is different from Spark 2.0, 
2.2.1, 2.3+
-  if (testingVersions(index).startsWith("2.1") || testingVersions(index) 
== "2.2.0") {
-spark.sql("msck repair table " + tbl_with_col_overlap)
-assert(spark.table(tbl_with_col_overlap).columns === Array("i", "j", 
"p"))
-checkAnswer(spark.table(tbl_with_col_overlap), Row(1, 1, 1) :: Row(1, 
1, 1) :: Nil)
-assert(sql("desc " + tbl_with_col_overlap).select("col_name")
-  .as[String].collect().mkString(",").contains("i,j,p"))
-  } else {
-assert(spark.table(tbl_with_col_overlap).columns === Array("i", "p", 
"j"))
-checkAnswer(spark.table(tbl_with_col_overlap), Row(1, 1, 1) :: Row(1, 
1, 1) :: Nil)
-assert(sql("desc " + tbl_with_col_overlap).select("col_name")
-  .as[String].collect().mkString(",").contains("i,p,j"))
-  }
+  assert(spark.table(tbl_with_col_overlap).columns === Array("i", "p", 
"j"))
+  checkAnswer(spark.table(tbl_with_col_overlap), Row(1, 1, 1) :: Row(1, 1, 
1) :: Nil)
+  assert(sql("desc " + tbl_with_col_overlap).select("col_name")
+.as[String].collect().mkString(",").contains("i,p,j"))
 }
   }
 }





[spark] branch branch-2.4 updated: [SPARK-26897][SQL][TEST][FOLLOW-UP] Remove workaround for 2.2.0 and 2.1.x in HiveExternalCatalogVersionsSuite

2019-02-17 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 094cabc  [SPARK-26897][SQL][TEST][FOLLOW-UP] Remove workaround for 
2.2.0 and 2.1.x in HiveExternalCatalogVersionsSuite
094cabc is described below

commit 094cabc3f72da765cf2b4adab9bae61d05aaef45
Author: Takeshi Yamamuro 
AuthorDate: Mon Feb 18 11:24:36 2019 +0800

[SPARK-26897][SQL][TEST][FOLLOW-UP] Remove workaround for 2.2.0 and 2.1.x 
in HiveExternalCatalogVersionsSuite

## What changes were proposed in this pull request?
This PR just removes the workaround for 2.2.0 and 2.1.x in 
HiveExternalCatalogVersionsSuite.

## How was this patch tested?
Pass the Jenkins.

Closes #23817 from maropu/SPARK-26607-FOLLOWUP.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Hyukjin Kwon 
(cherry picked from commit e2b8cc65cd579374ddbd70b93c9fcefe9b8873d9)
Signed-off-by: Hyukjin Kwon 
---
 .../sql/hive/HiveExternalCatalogVersionsSuite.scala | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
index 598b08b..0ede33d 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
@@ -257,19 +257,10 @@ object PROCESS_TABLES extends QueryTest with SQLTestUtils 
{
 
   // SPARK-22356: overlapped columns between data and partition schema in 
data source tables
   val tbl_with_col_overlap = s"tbl_with_col_overlap_$index"
-  // For Spark 2.2.0 and 2.1.x, the behavior is different from Spark 2.0, 
2.2.1, 2.3+
-  if (testingVersions(index).startsWith("2.1") || testingVersions(index) 
== "2.2.0") {
-spark.sql("msck repair table " + tbl_with_col_overlap)
-assert(spark.table(tbl_with_col_overlap).columns === Array("i", "j", 
"p"))
-checkAnswer(spark.table(tbl_with_col_overlap), Row(1, 1, 1) :: Row(1, 
1, 1) :: Nil)
-assert(sql("desc " + tbl_with_col_overlap).select("col_name")
-  .as[String].collect().mkString(",").contains("i,j,p"))
-  } else {
-assert(spark.table(tbl_with_col_overlap).columns === Array("i", "p", 
"j"))
-checkAnswer(spark.table(tbl_with_col_overlap), Row(1, 1, 1) :: Row(1, 
1, 1) :: Nil)
-assert(sql("desc " + tbl_with_col_overlap).select("col_name")
-  .as[String].collect().mkString(",").contains("i,p,j"))
-  }
+  assert(spark.table(tbl_with_col_overlap).columns === Array("i", "p", 
"j"))
+  checkAnswer(spark.table(tbl_with_col_overlap), Row(1, 1, 1) :: Row(1, 1, 
1) :: Nil)
+  assert(sql("desc " + tbl_with_col_overlap).select("col_name")
+.as[String].collect().mkString(",").contains("i,p,j"))
 }
   }
 }





[spark] branch master updated: [SPARK-26878] QueryTest.compare() does not handle maps with array keys correctly

2019-02-17 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 36902e1  [SPARK-26878] QueryTest.compare() does not handle maps with 
array keys correctly
36902e1 is described below

commit 36902e10c6395cb378eb8743fe94ccd0aa33e616
Author: Ala Luszczak 
AuthorDate: Mon Feb 18 10:39:31 2019 +0800

[SPARK-26878] QueryTest.compare() does not handle maps with array keys 
correctly

## What changes were proposed in this pull request?

The previous strategy for comparing Maps leveraged sorting the (key, value) 
tuples by their _.toString. However, the _.toString representation of an array 
has nothing to do with its content, so if a map has array keys, its (key, value) 
pairs would be compared with other maps essentially at random. This could 
result in false negatives in tests.

This change first compares the keys to find the matching ones, and then 
compares the associated values.
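
A standalone sketch of that comparison idea, written against plain Scala collections; it mirrors the `QueryTest.compare` change shown in the diff below, but the object and helper names are hypothetical.

```scala
// Sketch of order-insensitive map comparison with deep (element-wise) key equality.
object MapCompareSketch {
  def compare(a: Any, b: Any): Boolean = (a, b) match {
    case (x: Array[_], y: Array[_]) =>
      x.length == y.length && x.zip(y).forall { case (l, r) => compare(l, r) }
    case (x: Map[_, _], y: Map[_, _]) =>
      // For every key of x, find a deeply-equal key of y and compare the associated values.
      x.size == y.size && x.keys.forall { xk =>
        y.keys.find(yk => compare(xk, yk)).exists(yk => compare(x(xk), y(yk)))
      }
    case _ => a == b
  }

  def main(args: Array[String]): Unit = {
    val k1 = Array[Byte](0, 1)
    val k2 = Array[Byte](0, 1) // equal content, different object identity
    println(compare(Map(k1 -> "one"), Map(k2 -> "one"))) // true
    println(compare(Map(k1 -> "one"), Map(k2 -> "two"))) // false
  }
}
```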

## How was this patch tested?

New unit test added.

Closes #23789 from ala/compare-map.

Authored-by: Ala Luszczak 
Signed-off-by: Wenchen Fan 
---
 .../scala/org/apache/spark/sql/DatasetSuite.scala  | 37 ++
 .../scala/org/apache/spark/sql/QueryTest.scala |  6 ++--
 2 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
index 8c34e47..64c4aab 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
@@ -20,6 +20,8 @@ package org.apache.spark.sql
 import java.io.{Externalizable, ObjectInput, ObjectOutput}
 import java.sql.{Date, Timestamp}
 
+import org.scalatest.exceptions.TestFailedException
+
 import org.apache.spark.{SparkException, TaskContext}
 import org.apache.spark.sql.catalyst.ScroogeLikeExample
 import org.apache.spark.sql.catalyst.encoders.{OuterScopes, RowEncoder}
@@ -67,6 +69,41 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
   data: _*)
   }
 
+  test("toDS should compare map with byte array keys correctly") {
+// Choose the order of arrays in such way, that sorting keys of different 
maps by _.toString
+// will not incidentally put equal keys together.
+val arrays = (1 to 5).map(_ => Array[Byte](0.toByte, 
0.toByte)).sortBy(_.toString).toArray
+arrays(0)(1) = 1.toByte
+arrays(1)(1) = 2.toByte
+arrays(2)(1) = 2.toByte
+arrays(3)(1) = 1.toByte
+
+val mapA = Map(arrays(0) -> "one", arrays(2) -> "two")
+val subsetOfA = Map(arrays(0) -> "one")
+val equalToA = Map(arrays(1) -> "two", arrays(3) -> "one")
+val notEqualToA1 = Map(arrays(1) -> "two", arrays(3) -> "not one")
+val notEqualToA2 = Map(arrays(1) -> "two", arrays(4) -> "one")
+
+// Comparing map with itself
+checkDataset(Seq(mapA).toDS(), mapA)
+
+// Comparing map with equivalent map
+checkDataset(Seq(equalToA).toDS(), mapA)
+checkDataset(Seq(mapA).toDS(), equalToA)
+
+// Comparing map with it's subset
+intercept[TestFailedException](checkDataset(Seq(subsetOfA).toDS(), mapA))
+intercept[TestFailedException](checkDataset(Seq(mapA).toDS(), subsetOfA))
+
+// Comparing map with another map differing by single value
+intercept[TestFailedException](checkDataset(Seq(notEqualToA1).toDS(), 
mapA))
+intercept[TestFailedException](checkDataset(Seq(mapA).toDS(), 
notEqualToA1))
+
+// Comparing map with another map differing by single key
+intercept[TestFailedException](checkDataset(Seq(notEqualToA2).toDS(), 
mapA))
+intercept[TestFailedException](checkDataset(Seq(mapA).toDS(), 
notEqualToA2))
+  }
+
   test("toDS with RDD") {
 val ds = sparkContext.makeRDD(Seq("a", "b", "c"), 3).toDS()
 checkDataset(
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala 
b/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
index d83deb1..f8298c9 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala
@@ -341,9 +341,9 @@ object QueryTest {
 case (a: Array[_], b: Array[_]) =>
   a.length == b.length && a.zip(b).forall { case (l, r) => compare(l, r)}
 case (a: Map[_, _], b: Map[_, _]) =>
-  val entries1 = a.iterator.toSeq.sortBy(_.toString())
-  val entries2 = b.iterator.toSeq.sortBy(_.toString())
-  compare(entries1, entries2)
+  a.size == b.size && a.keys.forall { aKey =>
+b.keys.find(bKey => compare(aKey, bKey)).exists(bKey => 
compare(a(aKey), b(bKey)))
+  }
 case (a: Iterable[_], b: Iterable[_]) =>
   a.size == b.size && a.zip(b).forall { case (l, r) => compare(l, r)}
 case (a: Product, b: Product) =>



[spark] branch branch-2.4 updated: [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite

2019-02-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new dfda97a  [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from 
HiveExternalCatalogVersionsSuite
dfda97a is described below

commit dfda97a29f1211384503343d27afd752cc98f578
Author: Takeshi Yamamuro 
AuthorDate: Mon Feb 18 08:05:49 2019 +0900

[SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from 
HiveExternalCatalogVersionsSuite

## What changes were proposed in this pull request?
The vote for the `branch-2.3` maintenance release (v2.3.3) passed, so this 
updates PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite.

## How was this patch tested?
Pass the Jenkins.

Closes #23807 from maropu/SPARK-26897.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Takeshi Yamamuro 
(cherry picked from commit dcdbd06b687fafbf29df504949db0a5f77608c8e)
Signed-off-by: Takeshi Yamamuro 
---
 .../org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
index 632a21a..598b08b 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
@@ -203,7 +203,7 @@ class HiveExternalCatalogVersionsSuite extends 
SparkSubmitTestUtils {
 
 object PROCESS_TABLES extends QueryTest with SQLTestUtils {
   // Tests the latest version of every release line.
-  val testingVersions = Seq("2.3.2", "2.4.0")
+  val testingVersions = Seq("2.3.3", "2.4.0")
 
   protected var spark: SparkSession = _
 





[spark] branch master updated: [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from HiveExternalCatalogVersionsSuite

2019-02-17 Thread yamamuro
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new dcdbd06  [SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from 
HiveExternalCatalogVersionsSuite
dcdbd06 is described below

commit dcdbd06b687fafbf29df504949db0a5f77608c8e
Author: Takeshi Yamamuro 
AuthorDate: Mon Feb 18 08:05:49 2019 +0900

[SPARK-26897][SQL][TEST] Update Spark 2.3.x testing from 
HiveExternalCatalogVersionsSuite

## What changes were proposed in this pull request?
The vote for the `branch-2.3` maintenance release (v2.3.3) passed, so this 
updates PROCESS_TABLES.testingVersions in HiveExternalCatalogVersionsSuite.

## How was this patch tested?
Pass the Jenkins.

Closes #23807 from maropu/SPARK-26897.

Authored-by: Takeshi Yamamuro 
Signed-off-by: Takeshi Yamamuro 
---
 .../org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
index dd0e1bd..8086f75 100644
--- 
a/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
+++ 
b/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala
@@ -206,7 +206,7 @@ class HiveExternalCatalogVersionsSuite extends 
SparkSubmitTestUtils {
 
 object PROCESS_TABLES extends QueryTest with SQLTestUtils {
   // Tests the latest version of every release line.
-  val testingVersions = Seq("2.3.2", "2.4.0")
+  val testingVersions = Seq("2.3.3", "2.4.0")
 
   protected var spark: SparkSession = _
 

