[ https://issues.apache.org/jira/browse/HIVE-29689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

László Bodor updated HIVE-29689:
--------------------------------
    Description: 
https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-6566/3/tests/
{code}
Error
Already running future in not supposed to be cancelled with the current implementation
Stacktrace
java.lang.AssertionError: Already running future in not supposed to be cancelled with the current implementation
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.assertTrue(Assert.java:42)
        at org.apache.hadoop.hive.ql.exec.tez.TestHiveSplitGenerator.testExceptionIsPropagatedFromSplitSerializer(TestHiveSplitGenerator.java:148)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
Standard Error
2026-06-27T01:24:09,955  WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.override does not exist
2026-06-27T01:24:09,956  WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.hivesite does not exist
2026-06-27T01:24:09,957  WARN [main] conf.HiveConf: HiveConf of name hive.dummyparam.test.server.specific.config.metastoresite does not exist
2026-06-27T01:24:09,980  INFO [main] tez.HiveSplitGenerator: SplitGenerator using llap affinitized locations: false locationProviderClass: org.apache.hadoop.hive.ql.exec.tez.HostAffinitySplitLocationProvider
2026-06-27T01:24:09,980  INFO [main] tez.HiveSplitGenerator: SplitLocationProvider: org.apache.hadoop.hive.ql.exec.tez.Utils$1@10c07b8d
2026-06-27T01:24:09,984  INFO [HiveSplitGenerator.SplitSerializer Thread - #1] tez.TestHiveSplitGenerator: Write split #1
2026-06-27T01:24:09,984  INFO [HiveSplitGenerator.SplitSerializer Thread - #1] tez.TestHiveSplitGenerator: Split #1 is about to throw exception
2026-06-27T01:24:10,986 ERROR [main] tez.HiveSplitGenerator: Exception while generating splits
java.lang.RuntimeException: java.io.IOException: Cannot write file to path: file:/tmp/jenkins/tez/staging/.tez/application_1000_0200/events/hive_1782548648445/0_MRInput_InputDataInformationEvent_1
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator$SplitSerializer.lambda$write$0(HiveSplitGenerator.java:229)
        at java.base/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.io.IOException: Cannot write file to path: file:/tmp/jenkins/tez/staging/.tez/application_1000_0200/events/hive_1782548648445/0_MRInput_InputDataInformationEvent_1
        at org.apache.hadoop.hive.ql.exec.tez.TestHiveSplitGenerator$HiveSplitGeneratorSerializerException$SplitSerializerWithException.writeSplit(TestHiveSplitGenerator.java:244)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator$SplitSerializer.lambda$write$0(HiveSplitGenerator.java:226)
        ... 4 more
{code}


The test asserts the contract documented in HiveSplitGenerator.SplitSerializer:
a write task that is already running when another task fails must not be
cancelled. Concretely, the test expects split0Finished == true and
split2Finished == false.

It tries to set this up with three splits on an 8-thread executor:

- Split #0: sleeps 1s, sets split0Finished = true on completion.
- Split #1: throws IOException, which sets anyTaskFailed = true.
- Split #2: enters write() after a 1s delay, so the runnable's 
!anyTaskFailed.get() guard short-circuits it and split2Finished stays false.

The problem: every task body in SplitSerializer.write() is wrapped in
if (!anyTaskFailed.get()) { writeSplit(...) }. Nothing establishes a
happens-before relation guaranteeing that split #0's runnable evaluates that
guard before split #1's runnable runs to completion. With 8 threads available,
the executor can run split #1 first, so anyTaskFailed is already true by the
time split #0's thread reaches the guard. Split #0 then short-circuits:
writeSplit is never called, so neither Thread.sleep nor
split0Finished.set(true) runs, and the assertion fails.

The CI log confirms this ordering: only Write split #1 is logged before the 
failure; the expected Write split #0 line is absent.
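
The ordering problem described above can be reproduced outside Hive with a
minimal sketch. Everything named here (GuardRaceSketch, SplitWrite, the two
static methods) is hypothetical; only the if (!anyTaskFailed.get()) guard is
taken from the description. The CountDownLatch gate is just one possible way
to make such a test deterministic and is not necessarily the change made in
the linked pull request.

```java
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class GuardRaceSketch {

  /** Hypothetical stand-in for a split write; not the Hive API. */
  interface SplitWrite {
    void run() throws Exception;
  }

  private final AtomicBoolean anyTaskFailed = new AtomicBoolean(false);
  private final ExecutorService executor = Executors.newFixedThreadPool(8);

  /** Same shape as the guard in the description: skip the body once any task failed. */
  CompletableFuture<Void> submit(SplitWrite body) {
    return CompletableFuture.runAsync(() -> {
      if (!anyTaskFailed.get()) {
        try {
          body.run();
        } catch (Exception e) {
          anyTaskFailed.set(true);
          throw new RuntimeException(e);
        }
      }
    }, executor);
  }

  private static void joinExpectingFailure(CompletableFuture<Void> future) {
    try {
      future.join();
    } catch (CompletionException expected) {
      // The simulated write failure surfaces here.
    }
  }

  /** Unlucky ordering: split #1 has already failed when split #0 reaches the guard. */
  static boolean split0FinishedWhenFailureComesFirst() {
    GuardRaceSketch sketch = new GuardRaceSketch();
    AtomicBoolean split0Finished = new AtomicBoolean(false);
    try {
      joinExpectingFailure(sketch.submit(() -> {
        throw new IOException("simulated write failure");
      }));
      // The guard now sees anyTaskFailed == true, so this body is skipped.
      sketch.submit(() -> split0Finished.set(true)).join();
      return split0Finished.get();
    } finally {
      sketch.executor.shutdownNow();
    }
  }

  /** Gated ordering: split #1 fails only after split #0 is past the guard. */
  static boolean split0FinishedWhenGated() {
    GuardRaceSketch sketch = new GuardRaceSketch();
    AtomicBoolean split0Finished = new AtomicBoolean(false);
    CountDownLatch split0Running = new CountDownLatch(1);
    try {
      CompletableFuture<Void> split0 = sketch.submit(() -> {
        split0Running.countDown(); // signals that split #0 passed the guard
        Thread.sleep(100);
        split0Finished.set(true);
      });
      CompletableFuture<Void> split1 = sketch.submit(() -> {
        if (!split0Running.await(5, TimeUnit.SECONDS)) {
          throw new IllegalStateException("split #0 never started");
        }
        throw new IOException("simulated write failure");
      });
      split0.join();
      joinExpectingFailure(split1);
      return split0Finished.get();
    } finally {
      sketch.executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("failure first: split0Finished=" + split0FinishedWhenFailureComesFirst());
    System.out.println("gated:         split0Finished=" + split0FinishedWhenGated());
  }
}
```

In this sketch the first method models the failing CI run (split #0 is
skipped), while the second waits on the latch before throwing, so split #0 is
always past the guard when the failure is recorded.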


> TestHiveSplitGenerator.testExceptionIsPropagatedFromSplitSerializer is flaky
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-29689
>                 URL: https://issues.apache.org/jira/browse/HIVE-29689
>             Project: Hive
>          Issue Type: Bug
>            Reporter: László Bodor
>            Assignee: László Bodor
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screenshot 2026-06-28 at 20.44.20.png
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
