[ https://issues.apache.org/jira/browse/HIVE-29009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17966268#comment-17966268 ]
Stamatis Zampetakis commented on HIVE-29009:
--------------------------------------------
The JVM dump file that is generated by Surefire for split-20 has some
interesting information.
{noformat}
$ head ./split-20/itests/hive-unit/target/surefire-reports/2025-05-27T12-13-51_509-jvmRun1.dump
# Created at 2025-05-27T17:05:56.345
Exiting self fork JVM. Received SHUTDOWN command from Maven shutdown hook.
Thread dump before exiting the process (154044@hive-precommit-master-2532-s662z-wcvmv-rj1tn):
"shutdown-hook-0"
java.lang.Thread.State: TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at org.apache.hive.service.server.HiveServer2.graceful_stop(HiveServer2.java:1045)
at org.apache.hive.service.server.HiveServer2.lambda$init$2(HiveServer2.java:459)
at org.apache.hive.service.server.HiveServer2$$Lambda$288/1106392217.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
{noformat}
The stack of the "main" thread shows that, when the JVM was stopped, the split
was running the TestHS2SessionHive.testSessionHive test case.
{noformat}
$ grep '"main"' -A 20 ./split-20/itests/hive-unit/target/surefire-reports/2025-05-27T12-13-51_509-jvmRun1.dump
"main"
java.lang.Thread.State: WAITING
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.hive.service.server.TestHS2SessionHive.testSessionHive(TestHS2SessionHive.java:102)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
{noformat}
In run 2546, the test that was running in split-13 before the timeout was
TestHttpSamlAuthentication.testGroupNameFiltering2, so the failure is not
necessarily caused by one specific test but possibly by something else. We may
need to collect more data points and stack traces from other failing CI runs.
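
To make collecting those data points less manual, something along the lines of
the following Python sketch could be run over the Surefire dump files of each
failing split. It is only an illustration: the function name running_test, the
assumption that every thread section starts with a quoted thread name, and the
heuristic that test classes have a simple name starting with "Test" are mine
and are not taken from the Hive build scripts.

```python
import re
from typing import Optional

# Matches a Java stack frame such as
#   org.apache.hive.service.server.TestHS2SessionHive.testSessionHive(TestHS2SessionHive.java:102)
# and captures the fully qualified class name and the method name.
FRAME = re.compile(r"([\w.$]+)\.([\w$<>]+)\(([\w$]+)\.java:\d+\)")


def running_test(dump_text: str, thread: str = "main") -> Optional[str]:
    """Return 'Class.method' for the first test frame of the given thread.

    Assumptions (hypothetical, not verified against every dump layout):
    each thread section starts with a line beginning with a double quote,
    and test classes have a simple name that starts with 'Test'.
    """
    in_thread = False
    for line in dump_text.splitlines():
        stripped = line.strip()
        if stripped.startswith('"'):
            # A new thread section begins; track whether it is the one we want.
            in_thread = stripped.startswith(f'"{thread}"')
            continue
        if not in_thread:
            continue
        match = FRAME.search(stripped)
        if match:
            simple_name = match.group(1).rsplit(".", 1)[-1]
            if simple_name.startswith("Test"):
                return f"{simple_name}.{match.group(2)}"
    return None
```

Reading each *.dump file under the surefire-reports directories and passing its
text to running_test would give one line per split, which should make it easier
to see whether the same test keeps showing up. Because the function only looks
for frames anywhere on a line, it should also cope with traces where "at" and
the frame were wrapped onto separate lines.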
> Intermittent CI timeouts while running tests
> --------------------------------------------
>
> Key: HIVE-29009
> URL: https://issues.apache.org/jira/browse/HIVE-29009
> Project: Hive
> Issue Type: Bug
> Components: Build Infrastructure, Testing Infrastructure
> Reporter: Stamatis Zampetakis
> Priority: Major
>
> Recently, various CI runs on master and in PRs have been timing out while
> executing tests. The problem is intermittent but rather frequent. The first
> and last (at the time of logging this ticket) timeout failures on master are
> outlined below:
> First: https://ci.hive.apache.org/job/hive-precommit/job/master/2532/
> Last: https://ci.hive.apache.org/job/hive-precommit/job/master/2546/
> Unfortunately, due to HIVE-29008, the CI logs do not contain enough
> information to easily determine which test is hanging and whether it is the
> same every time.