[GitHub] [incubator-seatunnel] someorz opened a new issue, #1861: [Bug] [spark] write to hive error

GitBox Wed, 11 May 2022 22:48:14 -0700


someorz opened a new issue, #1861:
URL: https://github.com/apache/incubator-seatunnel/issues/1861


   ### Search before asking
   
   - [X] I had searched in the 
[issues](https://github.com/apache/incubator-seatunnel/issues?q=is%3Aissue+label%3A%22bug%22)
 and found no similar issues.
   
   
   ### What happened
   
   write to hive error
   
   ### SeaTunnel Version
   
   v2.1.1
   
   ### SeaTunnel Config
   
   ```conf
   env {
     spark.sql.catalogImplementation = "hive"
     spark.hadoop.hive.exec.dynamic.partition = "true"
     spark.hadoop.hive.exec.dynamic.partition.mode = "nonstrict"
     spark.hadoop.metastore.catalog.default="hive"
     spark.sql.catalogImplementation="hive"
     spark.app.name = "SeaTunnel"
     spark.executor.instances = 1
     spark.executor.cores = 1
     spark.executor.memory = "1g"
   }
   
   source {
     hive {
       pre_sql = "select * from default.dwd_fault_info_pdi_2 where 
pt='20220510'"
       result_table_name = "dwd_fault_info_pdi_2"
     }
   }
   
   transform {
   
   }
   
   sink {
     hive {
       sql = "insert overwrite table default.dwd_fault_info_pdi_2_sink 
partition(pt,htid) select * from dwd_fault_info_pdi_2"
       source_table_name = "dwd_fault_info_pdi_2"
       result_table_name = "default.dwd_fault_info_pdi_2_sink"
       save_mode = "overwrite"
       sink_columns = 
"tid,begintime,endtime,beginlongitude,beginlatitude,endlongitude,endlatitude,duration,faultaddr,spn,fmi,pt,htid"
       partition_by = ["pt","htid"]
     }
   }
   ```
   
   
   ### Running Command
   
   ```shell
   ./bin/start-seatunnel-spark.sh --master local[1] --deploy-mode client 
--config ./hive_to_hive.conf
   ```
   
   
   ### Error Exception
   
   ```log
   22/05/12 10:52:38 ERROR seatunnel.Seatunnel: Fatal Error,
   
   22/05/12 10:52:38 ERROR seatunnel.Seatunnel: Please submit bug report in 
https://github.com/apache/incubator-seatunnel/issues
   
   22/05/12 10:52:38 ERROR seatunnel.Seatunnel: Reason:Execute Spark task error
   
   22/05/12 10:52:38 ERROR seatunnel.Seatunnel: Exception 
StackTrace:java.lang.RuntimeException: Execute Spark task error
           at 
org.apache.seatunnel.command.spark.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:61)
           at 
org.apache.seatunnel.command.spark.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:36)
           at org.apache.seatunnel.Seatunnel.run(Seatunnel.java:48)
           at org.apache.seatunnel.SeatunnelSpark.main(SeatunnelSpark.java:27)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
           at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 3 in 
table dwd_fault_info_pdi_2_sink with 
loadPath=hdfs://ha-hdfs/user/hive/warehouse/dwd_fault_info_pdi_2_sink/.hive-staging_hive_2022-05-12_10-52-32_753_5734183484472760049-1/-ext-10000;
           at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:924)
           at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:189)
           at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:205)
           at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99)
           at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
           at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
           at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
           at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
           at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
           at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
           at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
           at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
           at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
           at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
           at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
           at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
           at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651)
           at org.apache.seatunnel.spark.hive.sink.Hive.output(Hive.scala:43)
           at org.apache.seatunnel.spark.hive.sink.Hive.output(Hive.scala:29)
           at 
org.apache.seatunnel.spark.SparkEnvironment.sinkProcess(SparkEnvironment.java:164)
           at 
org.apache.seatunnel.spark.batch.SparkBatchExecution.start(SparkBatchExecution.java:54)
           at 
org.apache.seatunnel.command.spark.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:58)
           ... 15 more
   Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when 
loading 3 in table dwd_fault_info_pdi_2_sink with 
loadPath=hdfs://ha-hdfs/user/hive/warehouse/dwd_fault_info_pdi_2_sink/.hive-staging_hive_2022-05-12_10-52-32_753_5734183484472760049-1/-ext-10000
           at 
org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:2030)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.sql.hive.client.Shim_v2_1.loadDynamicPartitions(HiveShim.scala:1169)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:804)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:802)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:936)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:924)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:924)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
           ... 37 more
   
   22/05/12 10:52:38 ERROR seatunnel.Seatunnel:
   
===============================================================================
   
   
   
   Exception in thread "main" java.lang.RuntimeException: Execute Spark task 
error
           at 
org.apache.seatunnel.command.spark.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:61)
           at 
org.apache.seatunnel.command.spark.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:36)
           at org.apache.seatunnel.Seatunnel.run(Seatunnel.java:48)
           at org.apache.seatunnel.SeatunnelSpark.main(SeatunnelSpark.java:27)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
           at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Exception when loading 3 in 
table dwd_fault_info_pdi_2_sink with 
loadPath=hdfs://ha-hdfs/user/hive/warehouse/dwd_fault_info_pdi_2_sink/.hive-staging_hive_2022-05-12_10-52-32_753_5734183484472760049-1/-ext-10000;
           at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:108)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog.loadDynamicPartitions(HiveExternalCatalog.scala:924)
           at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.loadDynamicPartitions(ExternalCatalogWithListener.scala:189)
           at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.processInsert(InsertIntoHiveTable.scala:205)
           at 
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.run(InsertIntoHiveTable.scala:99)
           at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
           at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
           at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:115)
           at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
           at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
           at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
           at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
           at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
           at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
           at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
           at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
           at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
           at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:651)
           at org.apache.seatunnel.spark.hive.sink.Hive.output(Hive.scala:43)
           at org.apache.seatunnel.spark.hive.sink.Hive.output(Hive.scala:29)
           at 
org.apache.seatunnel.spark.SparkEnvironment.sinkProcess(SparkEnvironment.java:164)
           at 
org.apache.seatunnel.spark.batch.SparkBatchExecution.start(SparkBatchExecution.java:54)
           at 
org.apache.seatunnel.command.spark.SparkTaskExecuteCommand.execute(SparkTaskExecuteCommand.java:58)
           ... 15 more
   Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Exception when 
loading 3 in table dwd_fault_info_pdi_2_sink with 
loadPath=hdfs://ha-hdfs/user/hive/warehouse/dwd_fault_info_pdi_2_sink/.hive-staging_hive_2022-05-12_10-52-32_753_5734183484472760049-1/-ext-10000
           at 
org.apache.hadoop.hive.ql.metadata.Hive.loadDynamicPartitions(Hive.java:2030)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at 
org.apache.spark.sql.hive.client.Shim_v2_1.loadDynamicPartitions(HiveShim.scala:1169)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveClientImpl.scala:804)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadDynamicPartitions$1.apply(HiveClientImpl.scala:802)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
           at 
org.apache.spark.sql.hive.client.HiveClientImpl.loadDynamicPartitions(HiveClientImpl.scala:802)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply$mcV$sp(HiveExternalCatalog.scala:936)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:924)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadDynamicPartitions$1.apply(HiveExternalCatalog.scala:924)
           at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
   ```
   
   
   ### Flink or Spark Version
   
   spark2.4+hadoop3.0+cdh6.3.2
   
   ### Java or Scala Version
   
   java 8
   
   ### Screenshots
   
   
![image](https://user-images.githubusercontent.com/5456361/168000441-a3277d9c-29e1-40a5-9ec9-1ccef0841f44.png)
   
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [incubator-seatunnel] someorz opened a new issue, #1861: [Bug] [spark] write to hive error

Reply via email to