[ 
https://issues.apache.org/jira/browse/FLINK-12926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878302#comment-16878302
 ] 

Zhu Zhu commented on FLINK-12926:
---------------------------------

From my observation, the issue does occur, although it does not break the 
current tests.

Below are some cases where it happens or may happen:

1. Even though the tests do not trigger actions from other threads, the 
production logic might, e.g. *Execution#deploy()*. As shown in the attached 
picture, this happens but does not break tests, since it is not on the critical 
path and the failed main thread check does not cause failovers.

2. Besides, the *TestingComponentMainThreadExecutorServiceAdapter* uses 
*DirectScheduledExecutorService* as the underlying ScheduledExecutorService. 
However, DirectScheduledExecutorService runs scheduled tasks in another 
thread. So if any mainThreadExecutor.schedule* action is invoked in tests or 
in the production code, it may also violate the main thread check. No test 
breaks because of this yet, but I think we have just been lucky so far (or is 
it intentional?). For example:

    -  FixedDelayRestartStrategy. No test breaks because no test uses 
FixedDelayRestartStrategy to do failover yet.

    -  HeartbeatMonitor. No test breaks because it does not check the main 
thread; HeartbeatManagerTest#testHeartbeatTimeout actually does the timeout 
handling in another pool thread.
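
To illustrate the failure mode in the cases above, here is a minimal sketch 
that uses only java.util.concurrent. It is not Flink code: 
DirectMainThreadExecutor and its assertRunningInMainThread() method are 
hypothetical stand-ins for a direct main thread executor and its check. The 
executor runs every action in the caller's thread, so the check only passes 
when the caller happens to be the designated main thread.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

public class DirectExecutorMainThreadDemo {

    /** Hypothetical direct executor: runs each action in the calling thread. */
    static final class DirectMainThreadExecutor implements Executor {
        private final Thread mainThread;

        DirectMainThreadExecutor(Thread mainThread) {
            this.mainThread = mainThread;
        }

        @Override
        public void execute(Runnable action) {
            action.run(); // no thread hand-over: the caller's thread runs the action
        }

        /** Hypothetical main thread check: throws if called from any other thread. */
        void assertRunningInMainThread() {
            if (Thread.currentThread() != mainThread) {
                throw new IllegalStateException(
                        "Not running in main thread: " + Thread.currentThread().getName());
            }
        }
    }

    /** Submits a checked action from the current thread and reports whether the check passed. */
    static boolean checkPasses(DirectMainThreadExecutor executor) {
        try {
            executor.execute(executor::assertRunningInMainThread);
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    /** Does the same from a freshly started thread, as a pool thread or timer thread would. */
    static boolean checkPassesFromOtherThread(DirectMainThreadExecutor executor) {
        AtomicBoolean passed = new AtomicBoolean();
        Thread other = new Thread(() -> passed.set(checkPasses(executor)), "other-thread");
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return passed.get();
    }

    public static void main(String[] args) {
        DirectMainThreadExecutor executor =
                new DirectMainThreadExecutor(Thread.currentThread());
        System.out.println("from main thread:  " + checkPasses(executor));
        System.out.println("from other thread: " + checkPassesFromOtherThread(executor));
    }
}
```

In this sketch the check passes for the submission from the main thread and 
fails for the one from the other thread, which mirrors what happens when a 
scheduled or asynchronous action reaches a direct executor.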

 

!Execution#deploy.jpg!

> Main thread checking in some tests fails
> ----------------------------------------
>
>                 Key: FLINK-12926
>                 URL: https://issues.apache.org/jira/browse/FLINK-12926
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination, Tests
>    Affects Versions: 1.9.0
>            Reporter: Zhu Zhu
>            Priority: Major
>         Attachments: Execution#deploy.jpg, mainThreadCheckFailure.log
>
>
> Currently, all JM-side actions that change a job are expected to run in the 
> JobMaster main thread.
> In current Flink tests, many cases tend to use the test main thread as the JM 
> main thread. This can lead to 2 issues:
> 1. TestingComponentMainThreadExecutorServiceAdapter is a direct executor, so 
> if it is invoked from any other thread, it will break the main thread 
> check and fail the submitted action (as in the attached log 
> [^mainThreadCheckFailure.log]).
> 2. The test main thread does not support other actions queued in its 
> executor, as the test will end once the current test thread action (the 
> currently running test body) is done.
>  
> In my observation, most cases which start 
> ExecutionGraph.scheduleForExecution() encounter this issue. Cases 
> include ExecutionGraphRestartTest, FailoverRegionTest, 
> ConcurrentFailoverStrategyExecutionGraphTest, GlobalModVersionTest, 
> ExecutionGraphDeploymentTest, etc.
>  
> One solution I have in mind is to create a ScheduledExecutorService for those 
> tests, use it as the main thread, and run the test body in that thread.
>  
>  
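
The solution proposed in the description above could look roughly like the 
following sketch. It uses only java.util.concurrent; the class and method 
names (DedicatedMainThreadDemo, runInMainThread, 
scheduledActionRunsInMainThread) are hypothetical and not part of Flink. A 
single-threaded ScheduledExecutorService plays the role of the main thread, 
the test body is submitted to it, and a follow-up action scheduled from the 
test body runs in that same thread even after the test body has returned.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class DedicatedMainThreadDemo {

    /** Runs the test body in the executor's single thread and waits for its result. */
    static <T> T runInMainThread(
            ScheduledExecutorService mainThreadExecutor, Callable<T> testBody) throws Exception {
        return mainThreadExecutor.submit(testBody).get();
    }

    /** Returns true if an action scheduled by the test body runs in the same main thread. */
    static boolean scheduledActionRunsInMainThread() throws Exception {
        ScheduledExecutorService mainThreadExecutor =
                Executors.newSingleThreadScheduledExecutor();
        try {
            ScheduledFuture<Boolean> delayed = runInMainThread(mainThreadExecutor, () -> {
                // Inside the test body: this is the dedicated main thread.
                Thread mainThread = Thread.currentThread();
                // Queue a delayed follow-up action, as a restart strategy might.
                return mainThreadExecutor.schedule(
                        () -> Thread.currentThread() == mainThread,
                        10, TimeUnit.MILLISECONDS);
            });
            // The executor outlives the test body, so the queued action still runs.
            return delayed.get(5, TimeUnit.SECONDS);
        } finally {
            mainThreadExecutor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("scheduled action ran in main thread: "
                + scheduledActionRunsInMainThread());
    }
}
```

Because the executor has exactly one worker thread, both the test body and 
the delayed action observe the same thread, so a main thread check based on 
thread identity would hold for both. This addresses both issues listed in the 
description: actions are handed over to the main thread instead of running in 
the caller's thread, and queued actions are not lost when the test body ends.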



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
