Till Rohrmann created FLINK-10866:
-------------------------------------

             Summary: Queryable state can prevent cluster from starting
                 Key: FLINK-10866
                 URL: https://issues.apache.org/jira/browse/FLINK-10866
             Project: Flink
          Issue Type: Improvement
          Components: Local Runtime
    Affects Versions: 1.6.2, 1.5.5, 1.7.0
            Reporter: Till Rohrmann


The queryable state (QS) server, {{KvStateServerImpl}}, can currently prevent 
the {{TaskExecutor}} from starting. 

By default, the QS server starts on port {{9067}}. If this port is already in 
use, the server fails to bind and aborts the entire initialization of the 
{{TaskExecutor}}. I think the QS server should not prevent the 
{{TaskExecutor}} from starting.
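To illustrate the failure mode, here is a minimal sketch. It assumes the QS server behaves like any TCP server that binds a fixed port at startup, and it uses a plain {{java.net.ServerSocket}} as a stand-in (the real {{KvStateServerImpl}} is Netty-based). The class name {{PortConflictDemo}} is hypothetical. Once something else holds the port, the second bind throws a {{BindException}}. If that exception propagates out of the startup path, the {{TaskExecutor}} never comes up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical demo: a plain ServerSocket stands in for the QS server
// (Flink's KvStateServerImpl is Netty-based; this only shows the bind semantics).
class PortConflictDemo {

    /** Returns true if a second bind to an already occupied port fails with BindException. */
    static boolean secondBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // "Some other process" holds a port; port 0 lets the OS pick a free one for the demo.
        try (ServerSocket occupant = new ServerSocket(0, 50, loopback)) {
            int takenPort = occupant.getLocalPort();
            // The "QS server" now tries that same fixed port, like a hard-coded default.
            try (ServerSocket qsServer = new ServerSocket(takenPort, 50, loopback)) {
                return false; // bind unexpectedly succeeded
            } catch (BindException e) {
                return true; // "Address already in use": left unhandled, this aborts startup
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind fails: " + secondBindFails());
    }
}
```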

We should at least change the default port to {{0}}, which lets the operating 
system pick a free port, to avoid port conflicts. However, this will break all 
setups that don't explicitly set the QS port, because the port then either has 
to be configured or looked up in the logs.
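For reference, a minimal sketch of what port {{0}} means at the socket level, again with a plain {{ServerSocket}} rather than the actual QS server. The class name {{EphemeralPortDemo}} is hypothetical. It shows both sides of the trade-off. Two servers bound to port {{0}} at the same time never collide. However, the actual port is only known after binding (via {{getLocalPort()}}), so it has to be published somewhere, e.g. in the logs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical demo of "port 0" semantics with a plain ServerSocket (not Flink's QS server).
class EphemeralPortDemo {

    /** Binds two servers to port 0 at the same time and returns the ports the OS assigned. */
    static int[] bindTwoEphemeral() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket a = new ServerSocket(0, 50, loopback);
             ServerSocket b = new ServerSocket(0, 50, loopback)) {
            // Both requested port 0, yet each gets its own free port, so they cannot conflict.
            // The real port is only known after binding, so a client has to learn it
            // from somewhere else (explicit configuration, log output, ...).
            int[] ports = {a.getLocalPort(), b.getLocalPort()};
            System.out.println("servers listening on ports " + ports[0] + " and " + ports[1]);
            return ports;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        bindTwoEphemeral();
    }
}
```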

Additionally, we should decide whether a QS server startup failure should 
cause the {{TaskExecutor}} to fail or merely be logged. Both approaches have 
pros and cons. Currently, a failing QS server also affects users who don't use 
QS at all. If we tolerate QS server failures, however, a user who relies on QS 
might find that their state is not reachable.
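A rough sketch of the two options, assuming a single boolean switch decides between them. {{QsStartupPolicy}}, {{startQsServer}} and {{failOnError}} are hypothetical names, not existing Flink API or configuration, and a plain {{ServerSocket}} again stands in for the QS server. With {{failOnError = true}}, a bind failure aborts startup as it does today. With {{false}}, the failure is logged and the {{TaskExecutor}} continues without queryable state.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.util.Optional;
import java.util.logging.Logger;

// Hypothetical sketch of the two failure policies; names are illustrative, not Flink API.
class QsStartupPolicy {

    private static final Logger LOG = Logger.getLogger(QsStartupPolicy.class.getName());

    /**
     * Tries to start a QS-like server on the given port (loopback only, for the demo).
     * failOnError = true  -> a bind failure is rethrown and startup aborts.
     * failOnError = false -> the failure is logged and startup continues without QS.
     */
    static Optional<ServerSocket> startQsServer(int port, boolean failOnError) {
        try {
            // On success the caller owns the socket and must close it.
            return Optional.of(new ServerSocket(port, 50, InetAddress.getLoopbackAddress()));
        } catch (IOException e) {
            if (failOnError) {
                // Strict: the failure propagates and the TaskExecutor does not start.
                throw new IllegalStateException("Could not start QS server on port " + port, e);
            }
            // Tolerant: log and carry on; state on this TaskExecutor will not be queryable.
            LOG.warning("QS server could not bind port " + port + ", continuing without it: " + e);
            return Optional.empty();
        }
    }
}
```

The flag only makes the trade-off explicit. Which behavior should be the default is exactly the open question above.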



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
