[spark] branch branch-2.4 updated: [SPARK-17398][SQL] Fix ClassCastException when querying partitioned JSON table

2019-12-20 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 07caebf  [SPARK-17398][SQL] Fix ClassCastException when querying 
partitioned JSON table
07caebf is described below

commit 07caebf2194fc34f1b4aacf2aa6c2d6961587482
Author: Wing Yew Poon 
AuthorDate: Fri Dec 20 10:39:26 2019 -0800

[SPARK-17398][SQL] Fix ClassCastException when querying partitioned JSON 
table

### What changes were proposed in this pull request?

When querying a partitioned table with format
`org.apache.hive.hcatalog.data.JsonSerDe`, with more than one task running
concurrently in each executor, the following exception is encountered:

`java.lang.ClassCastException: java.util.ArrayList cannot be cast to 
org.apache.hive.hcatalog.data.HCatRecord`

The exception occurs in `HadoopTableReader.fillObject`.

`org.apache.hive.hcatalog.data.JsonSerDe#initialize` populates a 
`cachedObjectInspector` field by calling 
`HCatRecordObjectInspectorFactory.getHCatRecordObjectInspector`, which is not 
thread-safe; this `cachedObjectInspector` is returned by 
`JsonSerDe#getObjectInspector`.

We protect against this Hive bug by synchronizing on an object whenever we need
to call `initialize` on `org.apache.hadoop.hive.serde2.Deserializer` instances
(which may be `JsonSerDe` instances). By doing so, the `ObjectInspector` for
the `Deserializer` of the partitions of the JSON table and that of the table
`SerDe` are the same cached `ObjectInspector`, and
`HadoopTableReader.fillObject` then works correctly. (If the `ObjectInspector`s
are different, then a bug in `HCatRecordObjectInsp [...]
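
For illustration, a minimal standalone sketch of the locking pattern this
patch introduces (the `SerDeInit` helper is hypothetical; only the
`DeserializerLock` object and the `synchronized` block around `initialize`
mirror the patch):

```scala
import java.util.Properties

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hive.serde2.Deserializer

// JVM-wide lock object, as added by this patch.
object DeserializerLock

object SerDeInit {
  // Funnel every Deserializer#initialize call through the lock, so that
  // JsonSerDe's shared, non-thread-safe ObjectInspector cache is populated
  // by only one thread at a time within the executor JVM.
  def initialize(deserializer: Deserializer, hconf: Configuration, props: Properties): Unit = {
    DeserializerLock.synchronized {
      deserializer.initialize(hconf, props)
    }
  }
}
```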

### Why are the changes needed?

To avoid HIVE-15773 / HIVE-21752.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Tested manually on a cluster with a partitioned JSON table, running a query
using more than one core per executor. Before this change, the
ClassCastException happens consistently; with this change, it does not.
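
As a sketch of the manual test (the table name, columns, and partition filter
are invented; assumes a `spark` session with Hive support enabled and the
hive-hcatalog jar on the classpath):

```scala
// Hypothetical repro: a partitioned Hive table backed by JsonSerDe, queried
// with more than one task per executor (e.g. --executor-cores 2) so that two
// tasks call Deserializer#initialize concurrently.
spark.sql("""
  CREATE TABLE events (payload STRING)
  PARTITIONED BY (dt STRING)
  ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
""")
spark.sql("SELECT payload FROM events WHERE dt >= '2019-01-01'").show()
// Before the fix: intermittent java.lang.ClassCastException in fillObject.
```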

Closes #26895 from wypoon/SPARK-17398.

Authored-by: Wing Yew Poon 
Signed-off-by: Marcelo Vanzin 
(cherry picked from commit c72f88b0ba20727e831ba9755d9628d0347ee3cb)
Signed-off-by: Marcelo Vanzin 
---
 .../org/apache/spark/sql/hive/TableReader.scala| 23 +++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
index 7d57389..6631073 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
@@ -133,7 +133,9 @@ class HadoopTableReader(
 val deserializedHadoopRDD = hadoopRDD.mapPartitions { iter =>
   val hconf = broadcastedHadoopConf.value.value
   val deserializer = deserializerClass.newInstance()
-  deserializer.initialize(hconf, localTableDesc.getProperties)
+  DeserializerLock.synchronized {
+    deserializer.initialize(hconf, localTableDesc.getProperties)
+  }
  HadoopTableReader.fillObject(iter, deserializer, attrsWithIndex, mutableRow, deserializer)
 }
 
@@ -255,10 +257,14 @@ class HadoopTableReader(
 partProps.asScala.foreach {
   case (key, value) => props.setProperty(key, value)
 }
-deserializer.initialize(hconf, props)
+DeserializerLock.synchronized {
+  deserializer.initialize(hconf, props)
+}
 // get the table deserializer
 val tableSerDe = localTableDesc.getDeserializerClass.newInstance()
-tableSerDe.initialize(hconf, localTableDesc.getProperties)
+DeserializerLock.synchronized {
+  tableSerDe.initialize(hconf, localTableDesc.getProperties)
+}
 
 // fill the non partition key attributes
 HadoopTableReader.fillObject(iter, deserializer, nonPartitionKeyAttrs,
@@ -337,6 +343,17 @@ private[hive] object HiveTableUtil {
   }
 }
 
+/**
+ * Object to synchronize on when calling org.apache.hadoop.hive.serde2.Deserializer#initialize.
+ *
+ * [SPARK-17398] org.apache.hive.hcatalog.data.JsonSerDe#initialize calls the non-thread-safe
+ * HCatRecordObjectInspectorFactory.getHCatRecordObjectInspector, the results of which are
+ * returned by JsonSerDe#getObjectInspector.
+ * To protect against this bug in Hive (HIVE-15773/HIVE-21752), we synchronize on this object
+ * when calling initialize on Deserializer instances that could be JsonSerDe instances.
+ */
+private[hive] object DeserializerLock
+
 private[hive] object HadoopTableReader extends HiveInspectors with Logging {
   /**
* Curried. After given an argument for 'path', the resulting JobConf => Unit closure is used to


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated (7dff3b1 -> c72f88b)

2019-12-20 Thread vanzin
This is an automated email from the ASF dual-hosted git repository.

vanzin pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 7dff3b1  [SPARK-30272][SQL][CORE] Remove usage of Guava that breaks in 
27; replace with workalikes
 add c72f88b  [SPARK-17398][SQL] Fix ClassCastException when querying 
partitioned JSON table

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/hive/TableReader.scala| 23 +++---
 1 file changed, 20 insertions(+), 3 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (07b04c4 -> 7dff3b1)

2019-12-20 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 07b04c4  [SPARK-29938][SQL] Add batching support in Alter table add 
partition flow
 add 7dff3b1  [SPARK-30272][SQL][CORE] Remove usage of Guava that breaks in 
27; replace with workalikes

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/util/kvstore/CustomType1.java |  9 +--
 .../network/buffer/FileSegmentManagedBuffer.java   | 11 +--
 .../spark/network/buffer/NettyManagedBuffer.java   |  7 +-
 .../spark/network/buffer/NioManagedBuffer.java |  7 +-
 .../spark/network/client/TransportClient.java  | 11 +--
 .../spark/network/protocol/ChunkFetchFailure.java  | 13 ++--
 .../spark/network/protocol/ChunkFetchRequest.java  |  7 +-
 .../spark/network/protocol/ChunkFetchSuccess.java  | 13 ++--
 .../spark/network/protocol/OneWayMessage.java  |  9 ++-
 .../apache/spark/network/protocol/RpcFailure.java  | 13 ++--
 .../apache/spark/network/protocol/RpcRequest.java  | 13 ++--
 .../apache/spark/network/protocol/RpcResponse.java | 13 ++--
 .../spark/network/protocol/StreamChunkId.java  | 13 ++--
 .../spark/network/protocol/StreamFailure.java  | 13 ++--
 .../spark/network/protocol/StreamRequest.java  |  9 ++-
 .../spark/network/protocol/StreamResponse.java | 15 ++--
 .../spark/network/protocol/UploadStream.java   |  9 ++-
 .../shuffle/ExternalShuffleBlockResolver.java  | 13 ++--
 .../network/shuffle/protocol/BlocksRemoved.java|  9 ++-
 .../shuffle/protocol/ExecutorShuffleInfo.java  | 16 ++--
 .../shuffle/protocol/FetchShuffleBlocks.java   | 17 +++--
 .../shuffle/protocol/GetLocalDirsForExecutors.java | 10 ++-
 .../shuffle/protocol/LocalDirsForExecutors.java| 11 +--
 .../spark/network/shuffle/protocol/OpenBlocks.java | 18 +++--
 .../network/shuffle/protocol/RegisterExecutor.java | 21 +++---
 .../network/shuffle/protocol/RemoveBlocks.java | 23 +++---
 .../network/shuffle/protocol/StreamHandle.java | 17 +++--
 .../network/shuffle/protocol/UploadBlock.java  | 24 +++---
 .../shuffle/protocol/UploadBlockStream.java| 12 +--
 .../CleanupNonShuffleServiceServedFilesSuite.java  |  3 +-
 .../shuffle/ExternalShuffleCleanupSuite.java   |  3 +-
 .../spark/network/yarn/YarnShuffleService.java | 10 ++-
 .../src/main/scala/org/apache/spark/SparkEnv.scala |  5 +-
 .../spark/deploy/history/FsHistoryProvider.scala   |  3 +-
 .../scala/org/apache/spark/rdd/HadoopRDD.scala |  4 +-
 .../apache/spark/status/ElementTrackingStore.scala |  8 +-
 .../scala/org/apache/spark/util/ThreadUtils.scala  | 85 --
 .../main/scala/org/apache/spark/util/Utils.scala   |  6 +-
 .../shuffle/sort/UnsafeShuffleWriterSuite.java |  3 +-
 .../spark/scheduler/TaskResultGetterSuite.scala|  5 +-
 .../kinesis/KPLBasedKinesisTestUtils.scala |  7 +-
 .../spark/sql/catalyst/JavaTypeInference.scala | 23 --
 .../spark/sql/JavaBeanDeserializationSuite.java| 36 +
 43 files changed, 360 insertions(+), 217 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (0d2ef3a -> 07b04c4)

2019-12-20 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 0d2ef3a  [SPARK-30300][SQL][WEB-UI] Fix updating the UI max value 
string when driver updates the same metric id as the tasks
 add 07b04c4  [SPARK-29938][SQL] Add batching support in Alter table add 
partition flow

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/execution/command/ddl.scala   | 22 +-
 .../org/apache/spark/sql/sources/InsertSuite.scala | 14 ++
 2 files changed, 31 insertions(+), 5 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-30300][SQL][WEB-UI] Fix updating the UI max value string when driver updates the same metric id as the tasks

2019-12-20 Thread tgraves
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0d2ef3a  [SPARK-30300][SQL][WEB-UI] Fix updating the UI max value 
string when driver updates the same metric id as the tasks
0d2ef3a is described below

commit 0d2ef3ae2b2d0b66f763d6bb2e490a667c83f9f2
Author: Niranjan Artal 
AuthorDate: Fri Dec 20 07:29:28 2019 -0600

[SPARK-30300][SQL][WEB-UI] Fix updating the UI max value string when driver 
updates the same metric id as the tasks

### What changes were proposed in this pull request?

In this PR, for a given metric id, we check whether the driver-side
accumulator's value is greater than the max across all stages. If it is, we
remove that entry from the HashMap. As a result, "driver" is displayed on the
UI for this metric (since the driver holds the maximum value).
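
To make the behavior concrete, a hypothetical standalone sketch: the
`maxMetricsFromAllStages` name and the comparison against index 0 come from
the patch; the `Array(maxValue, stageId, attemptId, taskId)` layout and the
harness around the check are assumptions.

```scala
import scala.collection.mutable

object DriverMaxMetricSketch extends App {
  // metric id -> Array(maxValue, stageId, attemptId, taskId)
  val maxMetricsFromAllStages = mutable.HashMap[Long, Array[Long]](
    1L -> Array(42L, 0L, 1L, 2L)  // tasks' max for metric id 1 is 42
  )

  // The check this PR adds: if the driver's accumulator value beats the max
  // across all stages, drop the entry so the UI attributes the max to
  // "driver" instead of a (stageId, attemptId, taskId) triple.
  def onDriverMetricUpdate(id: Long, value: Long): Unit = {
    if (maxMetricsFromAllStages.contains(id) && value > maxMetricsFromAllStages(id)(0)) {
      maxMetricsFromAllStages.remove(id)
    }
  }

  onDriverMetricUpdate(1L, 100L)  // 100 > 42: entry removed
  assert(!maxMetricsFromAllStages.contains(1L))
}
```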

### Why are the changes needed?

This PR fixes https://issues.apache.org/jira/browse/SPARK-30300. Currently the
driver's metric value is not compared when calculating the max.

### Does this PR introduce any user-facing change?

For metrics where the driver's value is greater than the max across all
stages, the displayed string changes:
Previous: (min, median, max (stageId 0 (attemptId 1): taskId 2))
Now:   (min, median, max (driver))

### How was this patch tested?

Ran unit tests.

Closes #26941 from nartal1/SPARK-30300.

Authored-by: Niranjan Artal 
Signed-off-by: Thomas Graves 
---
 .../org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala| 6 ++
 1 file changed, 6 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
index 64d2f33..d5bb36e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
@@ -237,6 +237,12 @@ class SQLAppStatusListener(
   if (metricTypes.contains(id)) {
 val prev = allMetrics.getOrElse(id, null)
 val updated = if (prev != null) {
+  // If the driver updates the same metric as the tasks and has a higher value, remove
+  // that entry from maxMetricsFromAllStages. This makes the stringValue function default
+  // to "driver", which is then displayed on the UI.
+  if (maxMetricsFromAllStages.contains(id) && value > maxMetricsFromAllStages(id)(0)) {
+    maxMetricsFromAllStages.remove(id)
+  }
   val _copy = Arrays.copyOf(prev, prev.length + 1)
   _copy(prev.length) = value
   _copy


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (a296d15 -> 12249fc)

2019-12-20 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from a296d15  [SPARK-30291] catch the exception when doing materialize in 
AQE
 add 12249fc  [SPARK-30301][SQL] Fix wrong results when datetimes as fields 
of complex types

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/expressions/csvExpressions.scala  |  2 +-
 .../sql/catalyst/expressions/jsonExpressions.scala |  2 +-
 .../apache/spark/sql/execution/HiveResult.scala| 75 ++
 .../test/resources/sql-tests/results/array.sql.out |  4 +-
 .../sql-tests/results/csv-functions.sql.out|  2 +-
 .../sql-tests/results/inline-table.sql.out |  2 +-
 .../sql-tests/results/json-functions.sql.out   |  2 +-
 .../results/typeCoercion/native/concat.sql.out |  2 +-
 .../results/typeCoercion/native/mapZipWith.sql.out |  2 +-
 .../results/typeCoercion/native/mapconcat.sql.out  |  2 +-
 .../sql-tests/results/udf/udf-inline-table.sql.out |  2 +-
 .../spark/sql/execution/HiveResultSuite.scala  | 30 ++---
 12 files changed, 50 insertions(+), 77 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (18e8d1d -> a296d15)

2019-12-20 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 18e8d1d  [SPARK-30307][SQL] remove ReusedQueryStageExec
 add a296d15  [SPARK-30291] catch the exception when doing materialize in 
AQE

No new revisions were added by this update.

Summary of changes:
 .../execution/adaptive/AdaptiveSparkPlanExec.scala | 18 ++--
 .../adaptive/AdaptiveQueryExecSuite.scala  | 25 ++
 2 files changed, 36 insertions(+), 7 deletions(-)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org


