[ 
https://issues.apache.org/jira/browse/FLINK-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108008#comment-17108008
 ] 

Lijie Wang commented on FLINK-17645:
------------------------------------

Hi [~sewen], as [~zhuzh] commented, OOM is not the only trigger: any other 
error or exception thrown while starting the REAPER_THREAD leads to the same 
problem. Once it has occurred, the only visible cause of the job failure is 
the "java.lang.IllegalStateException", which is confusing for users. To find 
the real reason, users have to locate the first exception and then realize 
that it caused all the subsequent ones.
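
To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the actual SafetyNetCloseableRegistry code: the class ReaperStartSketch, its methods, and the simulated failing thread are all hypothetical, and the "assign before start" ordering is only inferred from the stack traces in the issue (checkState at line 71, start() at line 73). The second method shows one possible ordering that would keep the original error visible on every attempt; it is not necessarily the fix that will be applied.

```java
// Hypothetical sketch; names are illustrative and not taken from Flink.
final class ReaperStartSketch {

    /** Stands in for the static REAPER_THREAD field. */
    private static Thread reaperThread = null;

    /** Inferred problematic ordering: the field is assigned before start() is attempted. */
    static void assignThenStart(Thread candidate) {
        if (reaperThread != null) {
            throw new IllegalStateException(); // mirrors checkState(null == REAPER_THREAD)
        }
        reaperThread = candidate;
        reaperThread.start(); // if this throws, the field stays non-null
    }

    /** Possible alternative: publish the thread only after start() has succeeded. */
    static void startThenAssign(Thread candidate) {
        if (reaperThread != null) {
            throw new IllegalStateException();
        }
        candidate.start(); // if this throws, the field is still null
        reaperThread = candidate;
    }

    /** Clears the static field so both orderings can be tried independently. */
    static void reset() {
        reaperThread = null;
    }

    /** A thread whose start() always fails, simulating "unable to create new native thread". */
    static Thread failingThread() {
        return new Thread() {
            @Override
            public void start() {
                throw new OutOfMemoryError("simulated: unable to create new native thread");
            }
        };
    }

    /** Runs the action and returns "ok" or the simple name of whatever was thrown. */
    static String attempt(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // First attempt shows the real cause, the second only the follow-up error.
        System.out.println(attempt(() -> assignThenStart(failingThread())));
        System.out.println(attempt(() -> assignThenStart(failingThread())));
        reset();
        // With the alternative ordering, every attempt reports the real cause.
        System.out.println(attempt(() -> startThenAssign(failingThread())));
        System.out.println(attempt(() -> startThenAssign(failingThread())));
    }
}
```

With the first ordering, only the first attempt reports the OutOfMemoryError and every later attempt reports IllegalStateException, which matches what users see in the logs. With the second ordering, each attempt reports the original error.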

> REAPER_THREAD.start() in SafetyNetCloseableRegistry failed, causing the 
> repeated failover.
> ------------------------------------------------------------------------------------------
>
>                 Key: FLINK-17645
>                 URL: https://issues.apache.org/jira/browse/FLINK-17645
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.10.1, 1.11.0
>            Reporter: Zakelly Lan
>            Assignee: Lijie Wang
>            Priority: Major
>             Fix For: 1.11.0
>
>
> I'm running a modified version of Flink and encountered the exception below 
> when a task starts:
> {code:java}
> 2020-05-12 00:46:19,037 ERROR [***] org.apache.flink.runtime.taskmanager.Task - Encountered an unexpected exception
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:802)
>         at org.apache.flink.core.fs.SafetyNetCloseableRegistry.<init>(SafetyNetCloseableRegistry.java:73)
>         at org.apache.flink.core.fs.FileSystemSafetyNet.initializeSafetyNetForThread(FileSystemSafetyNet.java:89)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:586)
>         at java.lang.Thread.run(Thread.java:834)
> 2020-05-12 00:46:19,038 INFO  [***] org.apache.flink.runtime.taskmanager.Task 
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:802)
>         at org.apache.flink.core.fs.SafetyNetCloseableRegistry.<init>(SafetyNetCloseableRegistry.java:73)
>         at org.apache.flink.core.fs.FileSystemSafetyNet.initializeSafetyNetForThread(FileSystemSafetyNet.java:89)
>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:586)
>         at java.lang.Thread.run(Thread.java:834)
> {code}
> REAPER_THREAD.start() fails because of the OOM, but REAPER_THREAD remains 
> non-null afterwards and is never reset. From then on, every 
> SafetyNetCloseableRegistry initialization in this JVM fails with an 
> IllegalStateException:
> {code:java}
> java.lang.IllegalStateException
>       at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
>       at org.apache.flink.core.fs.SafetyNetCloseableRegistry.<init>(SafetyNetCloseableRegistry.java:71)
>       at org.apache.flink.core.fs.FileSystemSafetyNet.initializeSafetyNetForThread(FileSystemSafetyNet.java:89)
>       at org.apache.flink.runtime.taskmanager.Task.run(Task.java:586)
>       at java.lang.Thread.run(Thread.java:834)
> {code}
> This may happen in very old versions of Flink as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
