wombatu-kun commented on PR #18995: URL: https://github.com/apache/hudi/pull/18995#issuecomment-4701501486
@voonhous dug into the master failure (run 27489081724). #18995 removed only one of several asynchronous sources of these spans, so the race stayed open.

The tests assert the exact per-query filesystem-span multiset, but Trino resets the span exporter at the start of each `executeWithPlan`. Any span a background thread emits after the synchronous query returns therefore lands in the next test's measurement window, which is the symmetric off-by-N you see.

Besides the stats refresh, the metadata table is read by the background split-loader / partition-listing / index-support pools, and the Alluxio variant additionally depends on async cache warmth (a read is `Alluxio.readCached` vs `readExternal` depending on whether an earlier cache write finished). Every observed flake delta was a `METADATA_TABLE` op (plus `Alluxio.readCached` on the Alluxio test).

Fix in #19004: stop asserting the racy quantities and keep only the synchronous foreground reads, i.e. filter out `METADATA_TABLE` ops in all three classes and all `Alluxio.*` ops in the Alluxio class. `hudi.table-statistics-enabled=false` stays, because the stats executor also reads `index.json` and the `hoodie.properties` files on a background thread, which are part of the surviving asserted set (dropping the flag reproduced an off-by-one on exactly those files).

Verified locally: build with zero checkstyle violations and 20 consecutive green runs of the three classes.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
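
For illustration, the filtering idea described in the comment can be sketched as below. This is not the code from #19004: the `FsSpan` record, the `foregroundOnly` and `countSurviving` helpers, and the `DATA` / `HOODIE_PROPERTIES` labels are hypothetical names chosen for the example, and matching on a file-type string is an assumption about how the spans are classified. It only shows why dropping `METADATA_TABLE` ops (and `Alluxio.*` ops for the Alluxio class) makes the asserted multiset independent of when background spans arrive.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class SpanFilterSketch {
    // Hypothetical span shape: an operation name plus a label for the file it touched.
    record FsSpan(String operation, String fileType) {}

    // Keep only spans that are assumed to come from synchronous foreground reads.
    static Predicate<FsSpan> foregroundOnly(boolean alluxioTest) {
        Predicate<FsSpan> keep = span -> !span.fileType().equals("METADATA_TABLE");
        if (alluxioTest) {
            // Cached vs. external reads depend on async cache warmth, so drop both.
            keep = keep.and(span -> !span.operation().startsWith("Alluxio."));
        }
        return keep;
    }

    // Build the multiset (operation + file type -> count) of the surviving spans.
    static Map<String, Long> countSurviving(List<FsSpan> spans, boolean alluxioTest) {
        Map<String, Long> counts = new TreeMap<>();
        spans.stream()
                .filter(foregroundOnly(alluxioTest))
                .forEach(span -> counts.merge(span.operation() + " " + span.fileType(), 1L, Long::sum));
        return counts;
    }

    public static void main(String[] args) {
        // Same foreground reads in both windows; the second window also caught a
        // late METADATA_TABLE span emitted by a background thread of the previous query.
        List<FsSpan> cleanWindow = List.of(
                new FsSpan("InputFile.newStream", "DATA"),
                new FsSpan("InputFile.length", "HOODIE_PROPERTIES"));
        List<FsSpan> pollutedWindow = List.of(
                new FsSpan("InputFile.newStream", "DATA"),
                new FsSpan("InputFile.length", "HOODIE_PROPERTIES"),
                new FsSpan("InputFile.newStream", "METADATA_TABLE"));

        boolean stable = countSurviving(cleanWindow, false).equals(countSurviving(pollutedWindow, false));
        System.out.println("filtered multisets match: " + stable);
    }
}
```

In this sketch the unfiltered multisets of the two windows differ by one `METADATA_TABLE` entry, while the filtered ones are identical, which is the property the tests need.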
