[ https://issues.apache.org/jira/browse/HIVE-29009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17966268#comment-17966268 ]

Stamatis Zampetakis commented on HIVE-29009:
--------------------------------------------

The JVM dump file that is generated by Surefire for split-20 has some 
interesting information.

{noformat}
$ head ./split-20/itests/hive-unit/target/surefire-reports/2025-05-27T12-13-51_509-jvmRun1.dump
# Created at 2025-05-27T17:05:56.345
Exiting self fork JVM. Received SHUTDOWN command from Maven shutdown hook.
Thread dump before exiting the process (154044@hive-precommit-master-2532-s662z-wcvmv-rj1tn):
"shutdown-hook-0"
   java.lang.Thread.State: TIMED_WAITING
        at java.lang.Thread.sleep(Native Method)
        at org.apache.hive.service.server.HiveServer2.graceful_stop(HiveServer2.java:1045)
        at org.apache.hive.service.server.HiveServer2.lambda$init$2(HiveServer2.java:459)
        at org.apache.hive.service.server.HiveServer2$$Lambda$288/1106392217.run(Unknown Source)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

{noformat}

The "main" thread in the same dump shows that, when the JVM was stopped, split-20 
was running the TestHS2SessionHive.testSessionHive test case.
{noformat}
$ grep '"main"' -A 20 ./split-20/itests/hive-unit/target/surefire-reports/2025-05-27T12-13-51_509-jvmRun1.dump
"main"
   java.lang.Thread.State: WAITING
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.hive.service.server.TestHS2SessionHive.testSessionHive(TestHS2SessionHive.java:102)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
        at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
        at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
        at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
        at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
        at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
        at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
{noformat}
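The grep on the "main" thread is a manual way to find which test a forked JVM was stuck in, and the same lookup could be scripted when more dumps come in. Below is a minimal Python sketch. It assumes the raw dump file keeps each "at ..." frame on a single line. It also assumes tests run under JUnit 4, where FrameworkMethod$1.runReflectiveCall invokes the test method through reflection, as in the trace above. The helpers thread_frames and running_test are hypothetical and are not part of Surefire or Hive.

```python
import re

# A thread header line starts with the quoted thread name, e.g. "main" or "shutdown-hook-0".
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# A stack frame line, e.g. "at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)".
# "/" is allowed in the class part so lambda frames like HiveServer2$$Lambda$288/1106392217 parse.
FRAME = re.compile(r'^\s*at\s+(?P<cls>[\w.$/]+)\.(?P<method>[\w$<>]+)\(')
# Frames from these packages only carry the reflective call, not the test itself.
REFLECTION_PREFIXES = ("sun.reflect.", "jdk.internal.reflect.", "java.lang.reflect.")


def thread_frames(dump_text, thread_name="main"):
    """Return the (class, method) frames of one thread, innermost frame first."""
    frames, inside = [], False
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # Collect frames only while inside the requested thread's section.
            inside = header.group("name") == thread_name
            continue
        frame = FRAME.match(line) if inside else None
        if frame:
            frames.append((frame.group("cls"), frame.group("method")))
    return frames


def running_test(dump_text):
    """Return 'SimpleClass.method' that JUnit 4 was invoking on "main", or None."""
    frames = thread_frames(dump_text)
    for i, (cls, _method) in enumerate(frames):
        # Assumption: JUnit 4 calls test (and @Before/@After) methods from FrameworkMethod.
        if cls.startswith("org.junit.runners.model.FrameworkMethod"):
            # Walk from just above this frame towards the innermost one, skipping reflection.
            for caller_cls, caller_method in reversed(frames[:i]):
                if not caller_cls.startswith(REFLECTION_PREFIXES):
                    return f"{caller_cls.rsplit('.', 1)[-1]}.{caller_method}"
            return None
    return None
```

For the split-20 dump, running_test should yield TestHS2SessionHive.testSessionHive. Anchoring on the JUnit frame rather than on a Test* naming convention means that a hang inside a @Before method would report that setup method, since that is what JUnit was invoking at the time.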

In build 2546, the test that was running in split-13 before the timeout was 
TestHttpSamlAuthentication.testGroupNameFiltering2, so the timeouts are not 
necessarily caused by one specific test; the root cause may lie elsewhere. We 
may need to collect more data points and stack traces from other failing CI runs.
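Collecting those data points could be partly automated. Below is a minimal Python sketch. It assumes every split workspace keeps the split-N/itests/MODULE/target/surefire-reports/ layout seen in the split-20 path, with dump files named TIMESTAMP-jvmRunK.dump. The helpers find_dumps and recurring_tests, and the (build, split, test) tuple shape, are hypothetical and used only for illustration.

```python
from collections import defaultdict
from pathlib import Path

# Assumed layout: <root>/split-<N>/itests/<module>/target/surefire-reports/<timestamp>-jvmRun<K>.dump
DUMP_GLOB = "split-*/itests/*/target/surefire-reports/*-jvmRun*.dump"


def find_dumps(root):
    """Group Surefire JVM dump files by split directory name (e.g. 'split-20')."""
    by_split = defaultdict(list)
    for path in sorted(Path(root).glob(DUMP_GLOB)):
        # parents: surefire-reports, target, <module>, itests, split-<N>
        by_split[path.parents[4].name].append(path)
    return dict(by_split)


def recurring_tests(datapoints, min_builds=2):
    """Return {test: sorted builds} for tests seen hanging in >= min_builds distinct builds.

    datapoints is an iterable of (build, split, test) tuples, gathered by hand
    or by whatever dump parser is used.
    """
    builds_per_test = defaultdict(set)
    for build, _split, test in datapoints:
        builds_per_test[test].add(build)
    return {
        test: sorted(builds)
        for test, builds in builds_per_test.items()
        if len(builds) >= min_builds
    }
```

With the two data points known so far (build 2532/split-20 running TestHS2SessionHive.testSessionHive and build 2546/split-13 running TestHttpSamlAuthentication.testGroupNameFiltering2), recurring_tests returns an empty result. That matches the observation that the hang does not point at one particular test yet.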


> Intermittent CI timeouts while running tests
> --------------------------------------------
>
>                 Key: HIVE-29009
>                 URL: https://issues.apache.org/jira/browse/HIVE-29009
>             Project: Hive
>          Issue Type: Bug
>          Components: Build Infrastructure, Testing Infrastructure
>            Reporter: Stamatis Zampetakis
>            Priority: Major
>
> Recently, various CI runs in master and PRs have been timing out while 
> executing tests. The problem is intermittent but rather frequent. The first 
> and last timeout failures in master (at the time of logging this ticket) are 
> listed below:
> First: https://ci.hive.apache.org/job/hive-precommit/job/master/2532/
> Last: https://ci.hive.apache.org/job/hive-precommit/job/master/2546/
> Unfortunately, due to HIVE-29008, the CI logs do not contain enough information 
> to easily determine which test is hanging and whether it is the same every time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)