[
https://issues.apache.org/jira/browse/KAFKA-19286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uladzislau Blok updated KAFKA-19286:
------------------------------------
Description:
[https://github.com/apache/kafka/actions/runs/15046629450/job/42291644617?pr=19493]
build failure for this test.
Over the past 7 days, 463 runs gave 25 flakes and 4 failures.
[https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Europe%2FLondon&tests.container=org.apache.kafka.storage.internals.log.RemoteIndexCacheTest]
I was able to re-produce error with repeated test:
{code:java}
100000 tests completed, 3 failed
> Task :storage:test FAILED{code}
Looks like cache wasn't propagated:
{code:java}
org.opentest4j.AssertionFailedError: Condition not met within timeout 15000.
cache did not adhere to expected size of 1; was 0{code}
Error message is similar for all runs.
Did deeper analysis:
Logs from success run:
{code:java}
2025-11-15T21:42:23.715+0100 [DEBUG] [TestEventLogger] [2025-11-15
21:42:23,715] ERROR Inside markForCleanup
(org.apache.kafka.storage.internals.log.RemoteIndexCache:555)
2025-11-15T21:42:23.715+0100 [DEBUG] [TestEventLogger] [2025-11-15
21:42:23,715] ERROR Inside createCacheEntry
(org.apache.kafka.storage.internals.log.RemoteIndexCache:362)
2025-11-15T21:42:23.716+0100 [DEBUG] [TestEventLogger] [2025-11-15
21:42:23,716] ERROR Inside cleanup
(org.apache.kafka.storage.internals.log.RemoteIndexCache:570) {code}
Logs from failed run:
{code:java}
2025-11-15T21:42:23.720+0100 [DEBUG] [TestEventLogger] [2025-11-15
21:42:23,719] ERROR Inside markForCleanup
(org.apache.kafka.storage.internals.log.RemoteIndexCache:555)
2025-11-15T21:42:23.722+0100 [DEBUG] [TestEventLogger] [2025-11-15
21:42:23,721] ERROR Inside cleanup
(org.apache.kafka.storage.internals.log.RemoteIndexCache:570) {code}
This means that entry still was in cache. The root cause is too low delay for
scheduler:
{code:java}
cache.setFileDeleteDelayMs(0); {code}
Before scheduler starts clean up, he's triggering read thread. If it's
happening before RemoteIndexCache#remove does remove entity from cache, reader
thread will successfully read entity, and won't trigger fetching from remote log
was:
[https://github.com/apache/kafka/actions/runs/15046629450/job/42291644617?pr=19493]
build failure for this test.
Over the past 7 days, 463 runs gave 25 flakes and 4 failures.
[https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Europe%2FLondon&tests.container=org.apache.kafka.storage.internals.log.RemoteIndexCacheTest]
I was able to re-produce error with repeated test:
{code:java}
100000 tests completed, 3 failed
> Task :storage:test FAILED{code}
Looks like cache wasn't propagated:
{code:java}
org.opentest4j.AssertionFailedError: Condition not met within timeout 15000.
cache did not adhere to expected size of 1; was 0{code}
Error message is similar for all runs.
> RemoteIndexCacheTest.testConcurrentRemoveReadForCache1 is flaky
> ---------------------------------------------------------------
>
> Key: KAFKA-19286
> URL: https://issues.apache.org/jira/browse/KAFKA-19286
> Project: Kafka
> Issue Type: Bug
> Reporter: Andrew Schofield
> Assignee: Uladzislau Blok
> Priority: Major
>
> [https://github.com/apache/kafka/actions/runs/15046629450/job/42291644617?pr=19493]
> build failure for this test.
> Over the past 7 days, 463 runs gave 25 flakes and 4 failures.
> [https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Europe%2FLondon&tests.container=org.apache.kafka.storage.internals.log.RemoteIndexCacheTest]
> I was able to re-produce error with repeated test:
> {code:java}
> 100000 tests completed, 3 failed
> > Task :storage:test FAILED{code}
>
> Looks like cache wasn't propagated:
> {code:java}
> org.opentest4j.AssertionFailedError: Condition not met within timeout 15000.
> cache did not adhere to expected size of 1; was 0{code}
> Error message is similar for all runs.
>
> Did deeper analysis:
> Logs from success run:
>
> {code:java}
> 2025-11-15T21:42:23.715+0100 [DEBUG] [TestEventLogger] [2025-11-15
> 21:42:23,715] ERROR Inside markForCleanup
> (org.apache.kafka.storage.internals.log.RemoteIndexCache:555)
> 2025-11-15T21:42:23.715+0100 [DEBUG] [TestEventLogger] [2025-11-15
> 21:42:23,715] ERROR Inside createCacheEntry
> (org.apache.kafka.storage.internals.log.RemoteIndexCache:362)
> 2025-11-15T21:42:23.716+0100 [DEBUG] [TestEventLogger] [2025-11-15
> 21:42:23,716] ERROR Inside cleanup
> (org.apache.kafka.storage.internals.log.RemoteIndexCache:570) {code}
> Logs from failed run:
>
>
> {code:java}
> 2025-11-15T21:42:23.720+0100 [DEBUG] [TestEventLogger] [2025-11-15
> 21:42:23,719] ERROR Inside markForCleanup
> (org.apache.kafka.storage.internals.log.RemoteIndexCache:555)
> 2025-11-15T21:42:23.722+0100 [DEBUG] [TestEventLogger] [2025-11-15
> 21:42:23,721] ERROR Inside cleanup
> (org.apache.kafka.storage.internals.log.RemoteIndexCache:570) {code}
> This means that entry still was in cache. The root cause is too low delay for
> scheduler:
> {code:java}
> cache.setFileDeleteDelayMs(0); {code}
> Before scheduler starts clean up, he's triggering read thread. If it's
> happening before RemoteIndexCache#remove does remove entity from cache,
> reader thread will successfully read entity, and won't trigger fetching from
> remote log
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)