[https://issues.apache.org/jira/browse/HIVE-29009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003525#comment-18003525]

Stamatis Zampetakis commented on HIVE-29009:
--------------------------------------------

I was able to reproduce the JVM hang completely outside Hive by using 
testcontainers 1.15.2, Docker-in-Docker setup with TLS enabled, and Oracle JDK 
17.0.4.1. It turns out that we have been hitting a JDK bug, namely 
[JDK-8315422|https://bugs.openjdk.org/browse/JDK-8315422].

Full instructions to reproduce the problem and detailed root cause analysis can 
be found in the following links:
* https://github.com/testcontainers/testcontainers-java/issues/10454
* https://github.com/zabetak/testcontainers-tls-hang-issue-10454

Since this is essentially a JDK bug, the issue was most likely triggered by the 
upgrade to JDK 17 (HIVE-26473). However, the project and the CI have very 
recently moved to JDK 21 (HIVE-29027). The Docker image used to run the tests 
in CI (https://hub.docker.com/r/ayushtkn/hive-dev-box) uses JDK 21.0.7+6-LTS.

{noformat}
openjdk version "21.0.7" 2025-04-15 LTS
OpenJDK Runtime Environment Zulu21.42+19-CA (build 21.0.7+6-LTS)
OpenJDK 64-Bit Server VM Zulu21.42+19-CA (build 21.0.7+6-LTS, mixed mode, sharing)
{noformat}

JDK-8315422 is fixed in JDK 21.0.5, so we can safely re-enable 
TestHttpSamlAuthentication now that the hang in SSLSocketImpl is resolved.

I downloaded Zulu21.42+19-CA and confirmed that the repro scenario in 
https://github.com/zabetak/testcontainers-tls-hang-issue-10454 consistently 
passes with this JDK.
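As a rough illustration of the version reasoning above, a hypothetical guard (not part of Hive, the CI scripts, or the linked repro) could inspect the running JDK with {{Runtime.version()}}. The class name {{Jdk8315422Check}} is invented for this sketch, and it assumes that the fix is present from 21.0.5 on the 21 line and in any later feature release; older lines such as 17 are conservatively treated as unfixed because this comment does not say whether they received a backport.

```java
/**
 * Hypothetical helper (not part of Hive): reports whether the running JDK is
 * at or above the update that, per this comment, fixes JDK-8315422.
 */
public final class Jdk8315422Check {

    private Jdk8315422Check() {
    }

    /**
     * Assumptions: on the 21 line the fix ships in 21.0.5 and later; feature
     * releases after 21 are assumed to include it; older lines (e.g. 17) are
     * treated as unfixed because no backport is confirmed here.
     */
    static boolean hasFix(Runtime.Version v) {
        if (v.feature() > 21) {
            return true; // assumption: later feature releases carry the fix
        }
        // update() is the third version component, e.g. 7 in "21.0.7+6-LTS"
        return v.feature() == 21 && v.update() >= 5;
    }

    public static void main(String[] args) {
        Runtime.Version current = Runtime.version();
        System.out.println("Running on " + current
                + ", JDK-8315422 fix assumed present: " + hasFix(current));
    }
}
```

With the CI image's {{21.0.7+6-LTS}} the check returns true, while the {{17.0.4.1}} build used in the standalone repro returns false.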

> Intermittent CI timeouts while running tests
> --------------------------------------------
>
>                 Key: HIVE-29009
>                 URL: https://issues.apache.org/jira/browse/HIVE-29009
>             Project: Hive
>          Issue Type: Bug
>          Components: Build Infrastructure, Testing Infrastructure
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: jstack.txt, jstack2.txt
>
>
> Recently various CI runs in master and PRs have been timing out while 
> executing tests. The problem is intermittent but rather frequent. The first 
> and the last (at the time of logging this ticket) timeout failures in master 
> are listed below:
> First: https://ci.hive.apache.org/job/hive-precommit/job/master/2532/
> Last: https://ci.hive.apache.org/job/hive-precommit/job/master/2546/
> Unfortunately, due to HIVE-29008 the CI logs do not contain enough information 
> to easily determine which test is hanging and whether it is the same test every time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
