wombatu-kun opened a new pull request, #18995:
URL: https://github.com/apache/hudi/pull/18995

   ### Describe the issue this Pull Request addresses
   
   The Trino-plugin tests `TestHudiMemoryCacheFileOperations`, 
`TestHudiNoCacheFileOperations` and `TestHudiAlluxioCacheFileOperations` are 
flaky. They intermittently fail in CI with a symmetric off-by-N mismatch in the 
metadata-table file-operation counts between the two paired measurements (for 
example, `testJoin` is short by 18 cacheLength, 24 cacheStream and 18 lastModified 
operations while `testSelectWithFilter` is over by exactly the same amounts). This was 
last observed on the CI run for PR #18988, a change confined to 
`hudi-client-common` that cannot affect the Trino read path. The earlier 
de-flake attempt in #18766 (a span-stability poll) reduced the failure rate but 
did not eliminate it.
   
   ### Summary and Changelog
   
   These tests assert the exact multiset of low-level filesystem-access spans 
that a single query emits against the Hudi metadata table. The root cause of 
the flake is that `HudiMetadata.getTableStatistics` submits an asynchronous 
table-statistics refresh on a shared background executor for every query during 
planning. That task reads the metadata-table column-stats partition and emits 
`METADATA_TABLE` spans that can outlive the synchronous query and arrive in the 
next test's measurement window, scrambling the counts (the symmetric off-by-N 
signature).
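To make the symmetric signature concrete, here is a toy Java model. It is not Hudi or Trino code. The shared span sink, the `LeakedSpanDemo` class and the synchronous `lastModified` count of 10 are hypothetical, and only the 18 mirrors the observed failure. A latch forces the unlucky ordering in which the background task emits its spans after test A's measurement window has closed. Suppose the expectations had been calibrated on a run where the refresh finished inside A's window (A = 28, B = 10). Under that assumption, this ordering reads as A short by 18 and B over by 18.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Toy model, not Hudi or Trino code: a background task submitted by one query
 * emits spans after that query's measurement window has closed.
 */
public class LeakedSpanDemo {
    // Hypothetical shared span sink standing in for the tracer the tests read.
    private static final Map<String, Integer> spans = new TreeMap<>();

    static synchronized void emit(String operation, int count) {
        spans.merge(operation, count, Integer::sum);
    }

    // Snapshot and clear: closes one measurement window and opens the next.
    static synchronized Map<String, Integer> drain() {
        Map<String, Integer> snapshot = new TreeMap<>(spans);
        spans.clear();
        return snapshot;
    }

    static String simulate(boolean asyncStatsRefresh) {
        drain(); // start from an empty window
        ExecutorService background = Executors.newSingleThreadExecutor();
        CountDownLatch windowAClosed = new CountDownLatch(1);
        try {
            // Test A: synchronous metadata-table reads (hypothetical count).
            emit("lastModified", 10);
            if (asyncStatsRefresh) {
                // Models the refresh submitted during planning on a shared executor.
                background.submit(() -> {
                    windowAClosed.await();    // force the unlucky ordering
                    emit("lastModified", 18); // late column-stats reads
                    return null;
                });
            }
            Map<String, Integer> testA = drain();
            windowAClosed.countDown();

            // Let the leaked task finish before test B's query so the demo is deterministic.
            background.shutdown();
            if (!background.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("background task did not finish");
            }

            // Test B: the same synchronous reads as test A.
            emit("lastModified", 10);
            Map<String, Integer> testB = drain();
            return "A = " + testA + ", B = " + testB;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            background.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(true));  // A = {lastModified=10}, B = {lastModified=28}
        System.out.println(simulate(false)); // A = {lastModified=10}, B = {lastModified=10}
    }
}
```

Passing `false` models the refresh being disabled. In that case, both windows see only the synchronous reads, and no waiting is needed for the counts to be stable.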
   
   This PR disables the async refresh in the three test query runners by 
setting `hudi.table-statistics-enabled=false`. With the only asynchronous 
metadata reader gone, the file-operation counts are deterministic as soon as 
the query returns, so the `waitForStableSpans` polling helper added in #18766 
is removed and the expected multisets are recalibrated to the synchronous query 
I/O. Only `testJoin`'s first-query expectations change: the first query no 
longer performs the extra column-stats read, so its counts now equal those of the 
already-warm second query. `testSelectWithFilter` is unchanged. The `throws 
InterruptedException` declarations that existed only for the removed 
`Thread.sleep` are dropped.
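Once the counts are deterministic, the check reduces to comparing the expected multiset with the spans collected right after the query returns, with no polling. The following is a minimal sketch of such an exact multiset comparison. It assumes spans are identified by operation-name strings. `SpanMultisets` and its methods are hypothetical and are not the helpers these tests actually use.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of an exact multiset check over span operation names. */
final class SpanMultisets {
    // Counts how many times each operation name occurs.
    static Map<String, Long> countOf(List<String> spanNames) {
        return spanNames.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // Per-operation (actual - expected); operations whose counts match are omitted.
    static Map<String, Long> difference(Map<String, Long> expected, Map<String, Long> actual) {
        TreeSet<String> operations = new TreeSet<>(expected.keySet());
        operations.addAll(actual.keySet());
        Map<String, Long> diff = new TreeMap<>();
        for (String operation : operations) {
            long delta = actual.getOrDefault(operation, 0L) - expected.getOrDefault(operation, 0L);
            if (delta != 0) {
                diff.put(operation, delta);
            }
        }
        return diff;
    }

    // Fails unless the collected spans match the expected multiset exactly.
    static void assertExactMultiset(Map<String, Long> expected, List<String> actualSpans) {
        Map<String, Long> diff = difference(expected, countOf(actualSpans));
        if (!diff.isEmpty()) {
            throw new AssertionError("span counts differ (actual - expected): " + diff);
        }
    }
}
```

Reporting the per-operation difference is what makes an off-by-N failure easy to read. A leaked refresh shows up as negative entries in one test and matching positive entries in the next.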
   
   No production code is changed, and no code was copied from third-party 
sources.
   
   ### Impact
   
   Test-only change, scoped to three test classes in `hudi-trino-plugin`. No 
production code, public API, configuration default, or runtime behavior 
changes. The `hudi.table-statistics-enabled` setting is toggled only inside 
these tests' query runners.
   
   ### Risk Level
   
   none
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable
   
