[ 
https://issues.apache.org/jira/browse/FLINK-19630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213579#comment-17213579
 ] 

Lsw_aka_laplace commented on FLINK-19630:
-----------------------------------------

[~jark]

[~lzljs3620320]

 

Would you guys mind taking a look~

> Sinking data in ORC format into Hive via the legacy Table API causes an 
> unexpected exception
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-19630
>                 URL: https://issues.apache.org/jira/browse/FLINK-19630
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive, Table SQL / Ecosystem
>    Affects Versions: 1.11.2
>            Reporter: Lsw_aka_laplace
>            Priority: Critical
>         Attachments: image-2020-10-14-11-36-48-086.png, 
> image-2020-10-14-11-41-53-379.png, image-2020-10-14-11-42-57-353.png, 
> image-2020-10-14-11-48-51-310.png
>
>
> *ENV:*
> *Flink version: 1.11.2*
> *Hive exec version: 2.0.1*
> *Hive file storage type: ORC*
> *SQL or DataStream: SQL API*
> *Kafka connector: a custom Kafka connector based on the legacy API 
> (TableSource/`org.apache.flink.types.Row`)*
> *Hive connector: follows the Flink Hive connector (we only added some 
> encapsulation on top of it)*
> *Using StreamingFileCommitter: YES*
>  
>  
> *Description:*
>    Try to execute the following SQL:
>     """
>       INSERT INTO hive_table (SELECT * FROM kafka_table)
>     """
>    The Hive table DDL looks like:
>     """
> CREATE TABLE `hive_table` (
>   -- some fields
> )
> PARTITIONED BY (
>   `dt` string,
>   `hour` string)
> STORED AS orc
> TBLPROPERTIES (
>   'orc.compress' = 'SNAPPY',
>   'type' = 'HIVE',
>   'sink.partition-commit.trigger' = 'process-time',
>   'sink.partition-commit.delay' = '1 h',
>   'sink.partition-commit.policy.kind' = 'metastore,success-file'
> )
>    """
> When this job starts to take a checkpoint snapshot, the following 
> unexpected exception is thrown:
> !image-2020-10-14-11-36-48-086.png|width=882,height=395!
> As we can see from the message, the owner thread should be the [Legacy 
> Source Thread], but the StreamTask thread, which represents the whole first 
> stage, is found instead.
> So I checked the Thread dump at once.
> !image-2020-10-14-11-41-53-379.png|width=801,height=244!
> The Legacy Source Thread
>  
> !image-2020-10-14-11-42-57-353.png|width=846,height=226!
> The StreamTask Thread
>  
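> For intuition, the check that fires here follows the common 
> thread-ownership-guard pattern sketched below. This is only an 
> illustration with a hypothetical class name, not the actual Flink or ORC 
> code: the guard remembers the thread that created it and rejects calls 
> from any other thread, which matches a writer created by one of the two 
> threads above but later driven by the other.
> """
> // Illustrative thread-ownership guard (hypothetical, for explanation only).
> public final class ThreadOwnershipGuard {
>     // Captured once, at construction time.
>     private final Thread owner = Thread.currentThread();
>
>     // Every later call must come from the same thread that created the guard.
>     public void checkOwner() {
>         if (Thread.currentThread() != owner) {
>             throw new IllegalStateException(
>                 "Owner thread expected " + owner
>                     + ", got " + Thread.currentThread());
>         }
>     }
> }
> """
>  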
>    Based on the thread-dump info and the exception message, I searched and 
> read the relevant source code and then *{color:#ffab00}DID A TEST{color}*
>  
> {color:#172b4d}   Since the Kafka connector is custom, I tried to make the 
> KafkaSource a separate stage by setting the source operator's 
> ChainingStrategy to NEVER. The resulting task topology is as follows, and a 
> sketch of the change is shown after the screenshot:{color}
> {color:#172b4d}!image-2020-10-14-11-48-51-310.png|width=753,height=208!{color}
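>  
> A minimal sketch of that change, assuming the stream is built inside the 
> custom legacy TableSource (the class, method, and topic names here are 
> hypothetical; `disableChaining()` is the public DataStream API way of 
> setting ChainingStrategy.NEVER on the operator):
> """
> import java.util.Properties;
>
> import org.apache.flink.api.common.serialization.DeserializationSchema;
> import org.apache.flink.streaming.api.datastream.DataStream;
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
> import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
> import org.apache.flink.types.Row;
>
> // Hypothetical helper standing in for the custom connector's stream setup.
> public final class DisableSourceChaining {
>
>     public static DataStream<Row> buildKafkaStream(
>             StreamExecutionEnvironment env,
>             DeserializationSchema<Row> schema,
>             Properties kafkaProps) {
>         return env
>             .addSource(new FlinkKafkaConsumer<>("some_topic", schema, kafkaProps))
>             // Break the operator chain here (ChainingStrategy.NEVER), so the
>             // legacy Kafka source runs in its own task and the downstream
>             // Hive/ORC writer is driven by a single StreamTask thread.
>             .disableChaining();
>     }
> }
> """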
>  
> {color:#505f79}*Fortunately, it worked! No exception is thrown and the 
> checkpoint can be taken successfully!*{color}
>  
>  
> So, from my perspective, something goes wrong when the Hive writing task 
> and the legacy source task are chained together: the legacy source runs in 
> a separate thread, which may be the cause of the exception mentioned above.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
