[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17162925#comment-17162925 ]
Gabor Arki commented on KYLIN-4500:
-----------------------------------

For now, I will keep monitoring our server with netstat and try to determine whether there is any correlation with the S3 pool exhaustion. We will also try to upgrade to 3.1.0, but it will probably take some time to tell whether the issue is still reproducible with that version. I will post an update with our findings once I have them.

> Timeout waiting for connection from pool
> ----------------------------------------
>
>                 Key: KYLIN-4500
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4500
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: Gabor Arki
>            Priority: Major
>         Attachments: kylin-connection-timeout.txt
>
>
> h4. Environment
>  * Kylin server 3.0.0
>  * EMR 5.28
> h4. Issue
> After an extended uptime, both the Kylin query server and the jobs running on EMR stop working. The root cause in both cases is:
> {noformat}
> Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
>         at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat}
> Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/], increasing the fs.s3.maxConnections setting to 10000 only delays the issue, so the underlying cause is likely a connection leak. The fact that restarting the Kylin service solves the problem also points to a leak.
> A full stack trace from the QueryService is attached.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)