[ https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213579#comment-17213579 ]
Lsw_aka_laplace commented on FLINK-19630: ----------------------------------------- [~jark] [~lzljs3620320] Would u guys mind taking a glimpse~ > Sink data in [ORC] format into Hive By using Legacy Table API caused > unexpected Exception > ------------------------------------------------------------------------------------------ > > Key: FLINK-19630 > URL: https://issues.apache.org/jira/browse/FLINK-19630 > Project: Flink > Issue Type: Bug > Components: Connectors / Hive, Table SQL / Ecosystem > Affects Versions: 1.11.2 > Reporter: Lsw_aka_laplace > Priority: Critical > Attachments: image-2020-10-14-11-36-48-086.png, > image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, > image-2020-10-14-11-48-51-310.png > > > *ENV:* > *Flink version 1.11.2* > *Hive exec version: 2.0.1* > *Hive file storing type :ORC* > *SQL or Datastream: SQL API* > *Kafka Connector : custom Kafka connector which is based on Legacy API > (TableSource/`org.apache.flink.types.Row`)* > *Hive Connector : totally follows the Flink-Hive-connector (we only made some > encapsulation upon it)* > *Using StreamingFileCommitter:YES* > > > *Description:* > try to execute the following SQL: > """ > insert into hive_table (select * from kafka_table) > """ > HIVE Table SQL seems like: > """ > CREATE TABLE `hive_table`( > // some fields > PARTITIONED BY ( > `dt` string, > `hour` string) > STORED AS orc > TBLPROPERTIES ( > 'orc.compress'='SNAPPY', > 'type'='HIVE', > 'sink.partition-commit.trigger'='process-time', > 'sink.partition-commit.delay' = '1 h', > 'sink.partition-commit.policy.kind' = 'metastore,success-file', > ) > """ > When this job starts to process snapshot, here comes the weird exception: > !image-2020-10-14-11-36-48-086.png|width=882,height=395! > As we can see from the message:Owner thread shall be the [Legacy Source > Thread], but actually the streamTaskThread which represents the whole first > stage is found. > So I checked the Thread dump at once. > !image-2020-10-14-11-41-53-379.png|width=801,height=244! > The > legacy Source Thread > > !image-2020-10-14-11-42-57-353.png|width=846,height=226! > The StreamTask > Thread > > According to the thread dump info and the Exception Message, I searched > and read certain source code and then *{color:#ffab00}DID A TEST{color}* > > {color:#172b4d} Since the Kafka connector is customed, I tried to make the > KafkaSource a serpated stage by changing the TaskChainStrategy to Never. The > task topology as follows:{color} > {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color} > > {color:#505f79}*Fortunately, it did work! No Exception is throwed and > Checkpoint could be snapshot successfully!*{color} > > > So, from my perspective, there shall be something wrong when HiveWritingTask > and LegacySourceTask chained together. the Legacy source task is a seperated > thread, which may be the cause of the exception mentioned above. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)