[
https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gabor Arki updated KYLIN-4500:
------------------------------
Description:
h4. Environment
* Kylin server 3.0.0
* EMR 5.28
h4. Issue
After an extended uptime, both the Kylin query server and the jobs running on
EMR stop working. The root cause in both cases is:
{noformat}
Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
    at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]
{noformat}
{{Based on
[https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/],
increasing the *fs.s3.maxConnections* setting to 10000 only delays the
issue, so the underlying problem is likely a connection leak. The fact that
restarting the Kylin service resolves the problem also points to a leak.}}
{{A full stack trace from the QueryService is attached.}}
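To illustrate the reasoning above, here is a minimal sketch of a bounded pool whose leases are never returned. It is not the EMRFS or Kylin code, and the actual leak site has not been identified; {{Pool}}, {{Lease}} and {{run}} are hypothetical names made up for this simulation. It only shows why a larger connection limit postpones the timeout instead of removing it.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolLeakDemo {

    /** Hypothetical stand-in for a bounded HTTP connection pool. */
    static final class Pool {
        private final Semaphore permits;

        Pool(int maxConnections) {
            this.permits = new Semaphore(maxConnections);
        }

        /** Leases a connection, or fails if none is returned within the timeout. */
        Lease acquire(long timeoutMillis) {
            try {
                if (!permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS)) {
                    throw new IllegalStateException("Timeout waiting for connection from pool");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for connection", e);
            }
            return new Lease(permits);
        }
    }

    /** A leased connection; close() hands it back to the pool exactly once. */
    static final class Lease implements AutoCloseable {
        private final Semaphore permits;
        private boolean closed;

        Lease(Semaphore permits) {
            this.permits = permits;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                permits.release();
            }
        }
    }

    /**
     * Issues up to {@code requests} sequential requests against a fresh pool and
     * returns how many succeeded before the pool ran dry.
     */
    static int run(int maxConnections, int requests, boolean closeLeases) {
        Pool pool = new Pool(maxConnections);
        int completed = 0;
        try {
            for (int i = 0; i < requests; i++) {
                if (closeLeases) {
                    // try-with-resources returns the connection to the pool
                    try (Lease lease = pool.acquire(10)) {
                        completed++;
                    }
                } else {
                    // leak: the lease is dropped without being closed
                    pool.acquire(10);
                    completed++;
                }
            }
        } catch (IllegalStateException poolExhausted) {
            // the pool ran dry (or the thread was interrupted); stop issuing requests
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println(run(10, 1000, false));  // prints 10
        System.out.println(run(100, 1000, false)); // prints 100
        System.out.println(run(10, 1000, true));   // prints 1000
    }
}
```

In this simulation, raising the limit from 10 to 100 lets ten times as many requests through before the same timeout appears, while closing every lease keeps the pool usable indefinitely. That matches the observed behaviour, where a higher *fs.s3.maxConnections* delays the failure and a restart (which resets the pool) clears it.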
was:
h4. Environment
* Kylin server 3.0.0
* EMR 5.28
h4. Issue
After an extended uptime, both the Kylin query server and the jobs running on
EMR stop working. The root cause in both cases is:
{{Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool}}
{{ at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]}}
{{Based on
[https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/],
increasing the *fs.s3.maxConnections* setting to 10000 only delays the
issue, so the underlying problem is likely a connection leak. The fact that
restarting the Kylin service resolves the problem also points to a leak.}}
{{A full stack trace from the QueryService is attached.}}
> Timeout waiting for connection from pool
> ----------------------------------------
>
> Key: KYLIN-4500
> URL: https://issues.apache.org/jira/browse/KYLIN-4500
> Project: Kylin
> Issue Type: Bug
> Reporter: Gabor Arki
> Priority: Major
> Attachments: kylin-connection-timeout.txt
>
>
> h4. Environment
> * Kylin server 3.0.0
> * EMR 5.28
> h4. Issue
> After an extended uptime, both the Kylin query server and the jobs running
> on EMR stop working. The root cause in both cases is:
> {noformat}
> Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
>     at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]
> {noformat}
> {{Based on
> [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/],
> increasing the *fs.s3.maxConnections* setting to 10000 only delays the
> issue, so the underlying problem is likely a connection leak. The fact that
> restarting the Kylin service resolves the problem also points to a leak.}}
> {{A full stack trace from the QueryService is attached.}}
>