Yunjian Zhang created SPARK-20973:
-------------------------------------

             Summary: insert table fail caused by unable to fetch data 
definition file from remote hdfs 
                 Key: SPARK-20973
                 URL: https://issues.apache.org/jira/browse/SPARK-20973
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Yunjian Zhang


I implemented my own Hive SerDe to handle special data files. The SerDe needs
to read a data definition file during initialization.
The process includes:
1. Read the definition file location from TBLPROPERTIES.
2. Read the file content from the location found in step 1.
3. Initialize the SerDe based on the content read in step 2.
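
The first two steps can be sketched roughly as follows. This is not the actual AbvroSerDe code, only a minimal illustration under stated assumptions: `DmlFetcher` and `DmlLoader` are hypothetical names, and the fetcher stands in for whatever reads the file from the remote HDFS. The property key is the one used in the DDL in this report. The sketch also passes the underlying I/O error along as the cause of the RuntimeException, which may make a failure like the one reported here easier to diagnose.

```java
import java.io.IOException;
import java.util.Properties;

// Hypothetical stand-in for the code that reads the file (for example from a remote HDFS).
interface DmlFetcher {
    String fetch(String location) throws IOException;
}

final class DmlLoader {
    // Property key taken from the TBLPROPERTIES clause of the table DDL.
    static final String DML_FILE_KEY = "com.ebay.dss.dml.file";

    static String load(Properties tableProperties, DmlFetcher fetcher) {
        // Step 1: read the definition file location from the table properties.
        String location = tableProperties.getProperty(DML_FILE_KEY);
        if (location == null) {
            throw new IllegalArgumentException("Missing table property: " + DML_FILE_KEY);
        }
        try {
            // Step 2: read the file content from that location.
            return fetcher.fetch(location);
        } catch (IOException e) {
            // Keep the cause so the underlying I/O error appears in the stack trace.
            throw new RuntimeException("FAILED to get dml file from: " + location, e);
        }
    }
}
```

The returned content would then be used for step 3, initializing the SerDe.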
// DDL of the table:
---------------------------------------------
CREATE EXTERNAL TABLE dw_user_stg_txt_out
ROW FORMAT SERDE 'com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe'
STORED AS
  INPUTFORMAT 'com.ebay.dss.gdr.mapred.AbAsAvroInputFormat'
  OUTPUTFORMAT 'com.ebay.dss.gdr.hive.ql.io.ab.AvroAsAbOutputFormat'
LOCATION 'hdfs://${remote_hdfs}/user/data'
TBLPROPERTIES (
  'com.ebay.dss.dml.file' = 'hdfs://${remote_hdfs}/dml/user.dml'
);
// Insert statement:
insert overwrite table dw_user_stg_txt_out select * from dw_user_stg_txt_avro;
// The insert fails with this error:
17/06/02 15:46:34 ERROR SparkSQLDriver: Failed in [insert overwrite table dw_user_stg_txt_out select * from dw_user_stg_txt_avro]
java.lang.RuntimeException: FAILED to get dml file from: hdfs://${remote-hdfs}/dml/user.dml
        at com.ebay.dss.gdr.hive.serde.abvro.AbvroSerDe.initialize(AbvroSerDe.java:109)
        at org.apache.spark.sql.hive.SparkHiveWriterContainer.newSerializer(hiveWriterContainers.scala:160)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:258)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)


