shangxinli opened a new pull request, #18766: URL: https://github.com/apache/hudi/pull/18766
### Describe the issue this Pull Request addresses

De-flakes the three sibling `TestHudi*FileOperations` test classes in `hudi-trino-plugin`. They use a fixed `Thread.sleep(1000L)` to wait for async table-stats computation before reading and asserting on OpenTelemetry spans. The sleep races with variable stats latency: when the stats computation from query N is still running while query N+1 is being measured, the still-running async work emits spans that land in the (just-reset) exporter for query N+1, scrambling the expected counts.

**Symptom**: paired tests in `TestHudiNoCacheFileOperations` fail with a symmetric off-by-N pattern, with `testJoin` missing exactly the same metadata-table operations that `testSelectWithFilter` has extra, depending on test execution order and stats-computation timing. The same pattern can hit the other two sibling classes. Observed on https://github.com/apache/hudi/pull/18765, where the tests passed on rerun without a code change, the classic flake signature.

### Summary and Changelog

Replaced the fixed sleep with a poll loop that returns once two consecutive span snapshots (200ms apart) agree, bounded by a 30-second ceiling. This is the deterministic signal that async work has settled: all stats spans for the current query are accounted for and nothing new is landing, so the assertion sees a stable, complete count.

- `TestHudiNoCacheFileOperations.assertFileSystemAccesses`: replaced `Thread.sleep(1000L)` with `waitForStableSpans()`.
- `TestHudiMemoryCacheFileOperations.assertFileSystemAccesses`: same change.
- `TestHudiAlluxioCacheFileOperations.assertFileSystemAccesses`: same change.

### Impact

Test-only. No production-code changes. The poll loop returns in ~400ms on the fast path (one extra read after the first stable snapshot), compared to the previous unconditional 1s sleep, so the typical case is *faster*, not slower.
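
For readers who want a concrete picture of the poll loop described above, here is a minimal sketch. It is not the code from this PR: the class name `StableSpanPoller`, the method `waitForStableSnapshot`, and its parameters are hypothetical, and the supplier stands in for whatever call the tests use to read the exported spans. The exact fast-path timing of the real `waitForStableSpans()` may differ from this sketch.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only; not the PR's actual implementation.
final class StableSpanPoller {
    private StableSpanPoller() {}

    /**
     * Polls snapshotSupplier until two consecutive snapshots are equal,
     * or until the timeout elapses. On timeout the last snapshot read is
     * returned, so the caller's assertion fails instead of hanging.
     */
    static <T> List<T> waitForStableSnapshot(
            Supplier<List<T>> snapshotSupplier,
            Duration pollInterval,
            Duration timeout) throws InterruptedException {
        // nanoTime is monotonic, so wall-clock adjustments cannot distort the bound.
        long deadline = System.nanoTime() + timeout.toNanos();
        // Copy each snapshot so later mutation of the source list cannot affect the comparison.
        List<T> previous = List.copyOf(snapshotSupplier.get());
        while (System.nanoTime() - deadline < 0) {
            Thread.sleep(pollInterval.toMillis());
            List<T> current = List.copyOf(snapshotSupplier.get());
            if (current.equals(previous)) {
                // Nothing new landed during one full interval: treat as settled.
                return current;
            }
            previous = current;
        }
        return previous;
    }
}
```

With the values quoted in this PR, a caller would pass `Duration.ofMillis(200)` as the poll interval and `Duration.ofSeconds(30)` as the timeout, then assert on the returned list.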
The 30s ceiling is the worst-case bound when stats genuinely take that long, in which case the assertion will fail with whatever was last read (rather than hanging).

### Risk Level

**low**: test-only change, identical fix applied to three sibling classes, deterministic root-cause fix rather than a workaround. Verified: `mvn test-compile` in `hudi-trino-plugin` → BUILD SUCCESS.

### Documentation Update

None.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
