[ 
https://issues.apache.org/jira/browse/FLINK-21461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289623#comment-17289623
 ] 

zhuxiaoshang commented on FLINK-21461:
--------------------------------------

Our legacy Spark compaction program caused this. Not a Flink problem; closing.
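
For anyone hitting the same trace: a minimal sketch of how an external compaction job could avoid it, assuming (this is not confirmed in the issue) that the job was picking up Flink's hidden in-progress part files, such as the `.part-...inprogress...` file named in the quoted exception. The class and method names here are hypothetical, and the second file name in `main` is made up for illustration. The check works on plain file names, so it does not depend on any Spark or Hadoop API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for a compaction job; not part of Flink, Spark, or Hadoop.
final class CompactionFileFilter {

    /**
     * Returns true only for visible, finished data files.
     * Names starting with '.' or '_' are treated as hidden (Hadoop convention),
     * and anything containing ".inprogress" is assumed to be still owned by a writer.
     */
    static boolean isCompactable(String fileName) {
        return !fileName.startsWith(".")
                && !fileName.startsWith("_")
                && !fileName.contains(".inprogress");
    }

    /** Keeps only the file names that are safe to rewrite or delete. */
    static List<String> selectCompactable(List<String> fileNames) {
        return fileNames.stream()
                .filter(CompactionFileFilter::isCompactable)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> listing = List.of(
                // In-progress file taken from the quoted exception.
                ".part-fa0b33ca-d27c-44ad-bcd7-564dc1892791-4-8.inprogress.7ed34f7f-0ec6-421e-b8d0-7cccf429c78f",
                // Made-up name for a finished part file.
                "part-fa0b33ca-d27c-44ad-bcd7-564dc1892791-4-7",
                // Success marker written by the partition commit policy.
                "_SUCCESS");
        System.out.println(selectCompactable(listing));
    }
}
```

A filter like this only narrows the file list. Compacting a partition that the streaming job may still write to can race in other ways, so waiting for the `_SUCCESS` file before compacting is probably the safer trigger.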

> FileNotFoundException when sink to  hive
> ----------------------------------------
>
>                 Key: FLINK-21461
>                 URL: https://issues.apache.org/jira/browse/FLINK-21461
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>            Reporter: zhuxiaoshang
>            Priority: Major
>
> A FileNotFoundException occasionally appears when reading from Kafka and
> sinking to Hive.
> The complete exception is as follows:
>  
> {code:java}
> 2021-02-23 16:08:09
> org.apache.flink.streaming.runtime.tasks.AsynchronousException: Caught exception while processing timer.
>     at org.apache.flink.streaming.runtime.tasks.StreamTask$StreamTaskAsyncExceptionHandler.handleAsyncException(StreamTask.java:1088)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.handleAsyncException(StreamTask.java:1062)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1183)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$null$13(StreamTask.java:1172)
>     at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
>     at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:282)
>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:190)
>     at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530)
>     at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
>     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: TimerException{java.io.UncheckedIOException: java.io.FileNotFoundException: File does not exist: hdfs://xxx/dt=2021-02-23/hh=15/.part-fa0b33ca-d27c-44ad-bcd7-564dc1892791-4-8.inprogress.7ed34f7f-0ec6-421e-b8d0-7cccf429c78f}
>     ... 12 more
> Caused by: java.io.UncheckedIOException: java.io.FileNotFoundException: File does not exist: hdfs://data2/data/dw/qttods.db/age_fusion_log_hi/dt=2021-02-23/hh=15/.part-fa0b33ca-d27c-44ad-bcd7-564dc1892791-4-8.inprogress.7ed34f7f-0ec6-421e-b8d0-7cccf429c78f
>     at org.apache.flink.connectors.hive.HiveTableSink$HiveRollingPolicy.shouldRollOnProcessingTime(HiveTableSink.java:556)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onProcessingTime(Bucket.java:320)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.onProcessingTime(Buckets.java:324)
>     at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSinkHelper.onProcessingTime(StreamingFileSinkHelper.java:95)
>     at org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1181)
>     ... 11 more
> Caused by: java.io.FileNotFoundException: File does not exist: hdfs://xxx/dt=2021-02-23/hh=15/.part-fa0b33ca-d27c-44ad-bcd7-564dc1892791-4-8.inprogress.7ed34f7f-0ec6-421e-b8d0-7cccf429c78f
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
>     at org.apache.flink.connectors.hive.write.HiveBulkWriterFactory$1.getSize(HiveBulkWriterFactory.java:54)
>     at org.apache.flink.formats.hadoop.bulk.HadoopPathBasedPartFileWriter.getSize(HadoopPathBasedPartFileWriter.java:84)
>     at org.apache.flink.connectors.hive.HiveTableSink$HiveRollingPolicy.shouldRollOnProcessingTime(HiveTableSink.java:554)
>     ... 15 more
> {code}
> The sink SQL looks like this:
>  
> insert into hive_catalog.my_db.sink_table
> /*+ OPTIONS('is_generic'='false',
> 'format'='parquet',
> 'sink.partition-commit.delay'='60s',
> 'sink.partition-commit.policy.kind'='metastore,success-file',
> 'sink.partition-commit.success-file.name'='_SUCCESS',
> 'table.exec.hive.fallback-mapred-writer'='false') */
> select
> log_timestamp,
> ip,
> field,
> from_unixtime(log_timestamp/1000,'yyyy-MM-dd') as `dt`,
> from_unixtime(log_timestamp/1000,'HH') as `hh`
> from source_table;
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
