[ 
https://issues.apache.org/jira/browse/FLINK-32191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726494#comment-17726494
 ] 

Weihua Hu commented on FLINK-32191:
-----------------------------------

There are two transport types: NIO and Epoll.

For Epoll, keepalive options are already available, such as
"EpollChannelOption.TCP_KEEPIDLE".
For NIO, however, the keepalive options were only introduced in JDK 11, such as
"ExtendedSocketOptions.TCP_KEEPIDLE".

Flink is still required to be compatible with JDK 8, even though JDK 8 support
has been deprecated. Hence, we need to inform users that these configurations
will not take effect when NIO is used together with JDK 8.
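
To illustrate the NIO side, here is a minimal sketch of how the JDK 11+ options
could be applied only when the runtime supports them. It uses a plain
java.net.Socket rather than a Netty channel to stay self-contained. The class
name NioKeepAliveSketch and the method applyKeepAlive are hypothetical, not
Flink or Netty APIs. Because it references jdk.net.ExtendedSocketOptions
directly, it only compiles on JDK 11 or later; code that must also build on
JDK 8 would need a different lookup, for example reflection.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketOption;
import java.net.StandardSocketOptions;
import java.util.Set;
import jdk.net.ExtendedSocketOptions;

// Hypothetical helper, for illustration only.
class NioKeepAliveSketch {

    // Enables SO_KEEPALIVE and, if the platform supports them, the three
    // tuning options. Returns false when the tuning options are unavailable,
    // so the caller can warn the user that the settings were ignored.
    static boolean applyKeepAlive(Socket socket, int idleSeconds,
                                  int intervalSeconds, int probeCount) throws IOException {
        socket.setOption(StandardSocketOptions.SO_KEEPALIVE, true);

        Set<SocketOption<?>> supported = socket.supportedOptions();
        boolean tunable = supported.contains(ExtendedSocketOptions.TCP_KEEPIDLE)
                && supported.contains(ExtendedSocketOptions.TCP_KEEPINTERVAL)
                && supported.contains(ExtendedSocketOptions.TCP_KEEPCOUNT);
        if (!tunable) {
            return false;
        }

        socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, idleSeconds);
        socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, intervalSeconds);
        socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, probeCount);
        return true;
    }

    public static void main(String[] args) throws IOException {
        // An unconnected socket is enough to set and inspect the options.
        try (Socket socket = new Socket()) {
            boolean applied = applyKeepAlive(socket, 60, 10, 3);
            System.out.println("keepalive tuning applied: " + applied);
        }
    }
}
```

The boolean return value is one possible way to surface the "not taken into
account" case to the caller; a log message would work just as well.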

Would you mind taking a look at this ticket when you are free?
[~Weijie Guo] [~wanglijie]


> Support for configuring tcp keepalive related parameters.
> ---------------------------------------------------------
>
>                 Key: FLINK-32191
>                 URL: https://issues.apache.org/jira/browse/FLINK-32191
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Network
>            Reporter: dizhou cao
>            Priority: Minor
>
> We encountered a case in our production environment where the netty client 
> was unable to send data to the server due to an abnormality in the switch 
> link. However, the client can only detect the abnormality after the RTO 
> retransmissions have failed, which takes about 15 minutes in our production 
> environment. This may result in 15 minutes of job unavailability. We hope to 
> perform failover and reschedule the job more quickly. Flink has already enabled 
> keepalive, but the default keepalive idle time is 2 hours. We can adjust the 
> timeout of TCP keepalive by configuring TCP_KEEPIDLE, TCP_KEEPINTERVAL, and 
> TCP_KEEPCOUNT. These configurations are already supported by Netty.
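
As a rough illustration of how the three parameters interact: for an idle
connection, the time until keepalive declares the peer dead is approximately
the idle time plus the probe interval multiplied by the probe count. The sketch
below works this out for the common Linux defaults (7200 s idle, 75 s interval,
9 probes) and for a tuned setting. The class name KeepAliveDetectionTime and
the tuned values 60/10/3 are assumptions chosen for the example, not values
proposed in this ticket.

```java
import java.time.Duration;

// Hypothetical helper, for illustration only.
class KeepAliveDetectionTime {

    // Approximate time for keepalive to give up on an idle, unreachable peer:
    // wait idleSeconds before the first probe, then send probeCount probes
    // spaced intervalSeconds apart.
    static Duration worstCase(int idleSeconds, int intervalSeconds, int probeCount) {
        return Duration.ofSeconds(idleSeconds + (long) intervalSeconds * probeCount);
    }

    public static void main(String[] args) {
        // Common Linux defaults: 7200 + 75 * 9 = 7875 seconds.
        System.out.println(worstCase(7200, 75, 9)); // prints PT2H11M15S
        // Example tuned values: 60 + 10 * 3 = 90 seconds.
        System.out.println(worstCase(60, 10, 3));   // prints PT1M30S
    }
}
```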



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
