[
https://issues.apache.org/jira/browse/HIVE-29009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003525#comment-18003525
]
Stamatis Zampetakis commented on HIVE-29009:
--------------------------------------------
I was able to reproduce the JVM hang completely outside Hive by using
testcontainers 1.15.2, a Docker-in-Docker setup with TLS enabled, and Oracle
JDK 17.0.4.1. It turns out that we have been hitting a JDK bug, namely
[JDK-8315422|https://bugs.openjdk.org/browse/JDK-8315422].
Full instructions to reproduce the problem and a detailed root cause analysis
can be found in the following links:
* https://github.com/testcontainers/testcontainers-java/issues/10454
* https://github.com/zabetak/testcontainers-tls-hang-issue-10454
Since this is mainly a JDK bug, it seems that the issue was triggered by the
upgrade to JDK 17 (HIVE-26473). However, very recently the project and the CI
have moved to JDK 21 (HIVE-29027). The Docker image that is used to run the
tests in CI (https://hub.docker.com/r/ayushtkn/hive-dev-box) uses JDK
21.0.7+6-LTS.
{noformat}
openjdk version "21.0.7" 2025-04-15 LTS
OpenJDK Runtime Environment Zulu21.42+19-CA (build 21.0.7+6-LTS)
OpenJDK 64-Bit Server VM Zulu21.42+19-CA (build 21.0.7+6-LTS, mixed mode, sharing)
{noformat}
JDK-8315422 is fixed in JDK 21.0.5, and the CI image runs 21.0.7, so the hang
in SSLSocketImpl is resolved there and we can safely re-enable
TestHttpSamlAuthentication.
I downloaded Zulu21.42+19-CA and confirmed that the repro scenario in
https://github.com/zabetak/testcontainers-tls-hang-issue-10454 always passes
using this JDK.
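For anyone who wants to check a local or CI JVM against the fixed version, the
comparison can be sketched with the standard Runtime.Version API. This is only
an illustration: the class name JdkFixCheck and the method hasFix are
hypothetical, and the 21.0.5 threshold is taken from the statement above. It
does not model any backport of the fix to older release lines, so every
version below 21.0.5 is conservatively reported as not fixed.

```java
public class JdkFixCheck {
    // Assumption taken from the comment above: JDK-8315422 is fixed as of 21.0.5.
    // Backports to older release lines (e.g. 17.x) are deliberately not modeled.
    static final Runtime.Version FIRST_FIXED = Runtime.Version.parse("21.0.5");

    /** Returns true if the given version is 21.0.5 or later (vendor suffix ignored). */
    public static boolean hasFix(Runtime.Version version) {
        // compareToIgnoreOptional skips the trailing optional part such as "LTS".
        return version.compareToIgnoreOptional(FIRST_FIXED) >= 0;
    }

    public static void main(String[] args) {
        // Report the result for the JVM that is running this class.
        Runtime.Version current = Runtime.version();
        System.out.println(current + " -> fix present: " + hasFix(current));
    }
}
```

Running the class with the JDK under test prints that JDK's version and whether
it is at or beyond the threshold. A test could use the same comparison as a
precondition instead of being disabled unconditionally.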
> Intermittent CI timeouts while running tests
> --------------------------------------------
>
> Key: HIVE-29009
> URL: https://issues.apache.org/jira/browse/HIVE-29009
> Project: Hive
> Issue Type: Bug
> Components: Build Infrastructure, Testing Infrastructure
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Attachments: jstack.txt, jstack2.txt
>
>
> Recently various CI runs in master and PRs are timing out while executing
> tests. The problem is intermittent but rather frequent. The first and last
> (at the time of logging this ticket) timeout failure in master are outlined
> below:
> First: https://ci.hive.apache.org/job/hive-precommit/job/master/2532/
> Last: https://ci.hive.apache.org/job/hive-precommit/job/master/2546/
> Unfortunately, due to HIVE-29008 the CI logs do not contain enough
> information to easily determine which test is hanging and whether it is the
> same one every time.