GitHub user weiqingy opened a pull request:
https://github.com/apache/spark/pull/18127
[SPARK-6628][SQL][Branch-2.1] Fix ClassCastException when executing sql
statement 'insert into' on hbase table
## What changes were proposed in this pull request?
The issue in SPARK-6628 is that `org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat` cannot be cast to `org.apache.hadoop.hive.ql.io.HiveOutputFormat`.
The reason is:
```
public interface HiveOutputFormat extends OutputFormat {...}
public class HiveHBaseTableOutputFormat extends TableOutputFormat
    implements OutputFormat {...}
```
From the two snippets above, we can see that `HiveHBaseTableOutputFormat` and `HiveOutputFormat` both `extends`/`implements` `OutputFormat`; they are sibling types, so neither can be cast to the other.
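The sibling-type cast failure can be reproduced with a minimal, self-contained sketch; the `OutputFormat`, `HiveOutputFormat`, and `HiveHBaseTableOutputFormat` names below are local stand-ins for illustration, not the real Hadoop/Hive classes:

```scala
// Local stand-in hierarchy mirroring the shape of the real one.
trait OutputFormat
trait HiveOutputFormat extends OutputFormat
class HiveHBaseTableOutputFormat extends OutputFormat // sibling, not a subtype

object CastDemo {
  // True only if the runtime class actually implements HiveOutputFormat.
  def isHiveFormat(of: OutputFormat): Boolean = of.isInstanceOf[HiveOutputFormat]

  def main(args: Array[String]): Unit = {
    val of: OutputFormat = new HiveHBaseTableOutputFormat
    println(isHiveFormat(of)) // prints "false"
    // An unconditional cast, like the one Spark performed, fails at runtime:
    try of.asInstanceOf[HiveOutputFormat]
    catch { case _: ClassCastException => println("ClassCastException") }
  }
}
```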
In Spark 1.6, 2.0, and 2.1, Spark initializes `outputFormat` in `SparkHiveWriterContainer`; in Spark 2.2+, it is initialized in `HiveFileFormat`:
```
@transient private lazy val outputFormat =
  jobConf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef, Writable]]
```
The `outputFormat` above has to be a `HiveOutputFormat`. However, when users insert data into an HBase table, the output format is `HiveHBaseTableOutputFormat`, which is not an instance of `HiveOutputFormat`.
This PR sets `outputFormat` to `null` when the `OutputFormat` is not an instance of `HiveOutputFormat`. This change should be safe, since `outputFormat` is only used to get the file extension in [`getFileExtension()`](https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala#L101).
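A hedged sketch of the guarded initialization described above, again with local stand-in types rather than the real Hadoop/Hive classes (`toHiveOutputFormat` and `fileExtension` are hypothetical names for illustration):

```scala
// Stand-in hierarchy for illustration only.
trait OutputFormat
trait HiveOutputFormat extends OutputFormat { def fileExtension: String = ".hive" }
class HiveHBaseTableOutputFormat extends OutputFormat

object NullFallbackSketch {
  // Guarded initialization replacing the unconditional asInstanceOf cast:
  // non-Hive formats (e.g. the HBase one) yield null instead of throwing.
  def toHiveOutputFormat(of: OutputFormat): HiveOutputFormat = of match {
    case h: HiveOutputFormat => h
    case _                   => null
  }

  // Mirrors the getFileExtension() usage: empty extension when null.
  def getFileExtension(of: HiveOutputFormat): String =
    Option(of).map(_.fileExtension).getOrElse("")
}
```

Because the only consumer of `outputFormat` tolerates `null` by returning an empty extension, the fallback does not change behavior for regular Hive tables.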
We can also submit this PR against the master branch.
## How was this patch tested?
Manually tested.
(1) Create an HBase table with Hive:
```
CREATE TABLE testwq100 (
  row_key string COMMENT 'from deserializer',
  application string COMMENT 'from deserializer',
  starttime timestamp COMMENT 'from deserializer',
  endtime timestamp COMMENT 'from deserializer',
  status string COMMENT 'from deserializer',
  statusid smallint COMMENT 'from deserializer',
  insertdate timestamp COMMENT 'from deserializer',
  count int COMMENT 'from deserializer',
  errordesc string COMMENT 'from deserializer')
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping'='cf1:application,cf1:starttime,cf1:endtime,cf1:Status,cf1:StatusId,cf1:InsertDate,cf1:count,cf1:ErrorDesc',
  'line.delim'='\\n',
  'mapkey.delim'='\\u0003',
  'serialization.format'='\\u0001')
TBLPROPERTIES (
  'transient_lastDdlTime'='1489696241',
  'hbase.table.name' = 'xyz',
  'hbase.mapred.output.outputtable' = 'xyz')
```
(2) Verify:
**Before:**
Insert data into the HBase table `testwq100` from Spark SQL:
```
scala> sql(s"INSERT INTO testwq100 VALUES ('AA1M22','AA1M122','2011722','201156','Starte1d6',45,20,1,'ad1')")
17/05/26 00:09:10 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
	at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:82)
	at org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:81)
	at org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:101)
	at org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:125)
	at org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:94)
	at org.apache.spark.sql.hive.SparkHiveWriterContainer.writeToFile(hiveWriterContainers.scala:182)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
	at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
17/05/26 00:09:10 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 3, localhost, executor driver): java.lang.ClassCastException: org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.HiveOutputFormat
	at org.apache.spa