wombatu-kun opened a new issue, #19037: URL: https://github.com/apache/hudi/issues/19037
## Task Description

**What needs to be done:**

PR #19004 de-flaked the three Trino-plugin file-operation tests (`TestHudiNoCacheFileOperations`, `TestHudiMemoryCacheFileOperations`, `TestHudiAlluxioCacheFileOperations`) by dropping all `METADATA_TABLE` operations from `getFileOperations` (and all `Alluxio.*` operations in the Alluxio class) before asserting the per-query multiset of filesystem-access spans. That removed the per-query flakiness, but it also removed the assertions' ability to detect metadata-table read amplification. A future change that, for example, doubles the number of metadata-table reads per query would now pass silently, because no test counts those reads anymore.

Find a way to restore a regression signal on metadata-table read volume for these tests without reintroducing the span-leak flakiness that #19004 (and the earlier #18766 / #18995) fought.

**Why this task is needed:**

The metadata-table read counts were the main thing these `FileOperations` tests pinned down: how many low-level reads each query issues against the metadata table. After #19004 the metadata-table dimension is no longer asserted at all, so read-amplification regressions on the Trino read path are now invisible to CI. (The Alluxio cache-hit dimension is separately re-covered by the count-independent `testReadsServedFromAlluxioCache` added in the same PR, so only the metadata-table dimension is uncovered.)

## Background: why the obvious fixes do not work

Trino resets the OpenTelemetry span exporter at the start of each `executeWithPlan`. Any span emitted by a Hudi background thread after the synchronous query returns therefore lands in the *next* measurement window. These threads belong to the shared split-loader / split-manager / `ForkJoinPool.commonPool` pools that read the metadata table. The result is a symmetric off-by-N: one query is counted long and the paired query short by almost the same amount.

- An **exact-count** assertion on metadata-table spans flakes; this is the original failure.
- A **tolerance / lower-bound** assertion on metadata-table spans still flakes, because the leak is bidirectional. A query can be counted short (its own spans leaked out) as well as long, and the short case violates a lower bound. This is the key difference from the Alluxio cache-hit check, where leaked spans only ever *add* hits (monotonic), so a lower bound there is safe.

## Candidate directions (to validate, not decided)

These are hypotheses for the follow-up, not a committed design:

1. **Aggregate / conservation assertion.** The leak shifts spans between adjacent windows but does not create or destroy them. The *total* metadata-table read count across the paired measurements (or across the whole test class) should therefore be conserved, even though the per-query split is not. Asserting that aggregate would still catch a 2x amplification (which doubles the total) while tolerating the attribution jitter. This needs validation that nothing leaks past the chosen aggregation boundary (for example, the last query's late spans).
2. **Deterministic drain / quiesce** of the background metadata-table reader pools before the measurement window closes. The metadata-table spans would then be captured inside the synchronous query window, and exact counts would become deterministic again. The obstacle is that those pools are shared / global, with no clean await hook exposed to the Trino test harness.
3. (Recorded as rejected) A span-stability poll (the #18766 approach) did not bound the race and is not a path to revisit.

## Task Type

Test enhancement

## Related Issues

- Originating PR: #19004 (prior attempts: #18766, #18995)
- Reviewer call-out: https://github.com/apache/hudi/pull/19004#pullrequestreview-4522612971
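Candidate direction 1 can be illustrated with a small model of the leak. This sketch is not Hudi or Trino code. `ConservationSketch`, `observedWindows`, the other helpers, the read counts, and the extra trailing window are all hypothetical. The model assumes, as the direction itself hypothesizes, that late spans are always shifted into the next window and never dropped.

The model shows two things. A per-query lower bound fails for the query that is counted short. The total across all windows is conserved, and that total still doubles under 2x amplification.

```java
import java.util.Arrays;

/**
 * Hypothetical model of the span leak (not Hudi/Trino code). Assumes late spans
 * are always shifted into the next measurement window and never dropped, which
 * is the hypothesis candidate direction 1 still has to validate.
 */
final class ConservationSketch {

    /**
     * Per-window metadata-table span counts a test would observe.
     * trueReads[i]: reads query i really issues; leaked[i]: how many of those
     * finish after query i returns. The extra last window stands for a
     * hypothetical trailing measurement that catches the last query's late spans.
     */
    static int[] observedWindows(int[] trueReads, int[] leaked) {
        int n = trueReads.length;
        int[] windows = new int[n + 1];
        for (int i = 0; i < n; i++) {
            windows[i] += trueReads[i] - leaked[i]; // spans emitted before the query returns
            windows[i + 1] += leaked[i];            // late spans land in the next window
        }
        return windows;
    }

    /** Per-query lower bound: the tolerance assertion that still flakes. */
    static boolean perQueryLowerBoundHolds(int[] windows, int[] expectedPerQuery) {
        for (int i = 0; i < expectedPerQuery.length; i++) {
            if (windows[i] < expectedPerQuery[i]) {
                return false;
            }
        }
        return true;
    }

    /** Aggregate assertion: the sum over all windows must equal the expected total. */
    static boolean totalMatches(int[] windows, int expectedTotal) {
        return Arrays.stream(windows).sum() == expectedTotal;
    }

    public static void main(String[] args) {
        int[] expected = {10, 10}; // hypothetical baseline reads per query
        int[] leaked = {3, 2};     // hypothetical late spans per query

        int[] baseline = observedWindows(expected, leaked);
        System.out.println(Arrays.toString(baseline));                   // [7, 11, 2]
        System.out.println(perQueryLowerBoundHolds(baseline, expected)); // false: query 0 counted short
        System.out.println(totalMatches(baseline, 20));                  // true: total conserved

        // A 2x read amplification doubles the total, so the aggregate check catches it.
        int[] amplified = observedWindows(new int[] {20, 20}, leaked);
        System.out.println(totalMatches(amplified, 20));                 // false
    }
}
```

Summing only the two query windows (7 + 11 = 18) misses the last query's late spans. That is the aggregation-boundary risk direction 1 flags. In a real test the trailing window might be an extra measured no-op query (a hypothetical choice), and whether it reliably captures every late span is exactly what remains to be validated.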
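Candidate direction 2 depends on an await hook that, per the issue, the shared pools do not expose. This is a minimal sketch of what such a hook could look like, assuming the pools' submissions could be routed through it. `InFlightTracker`, `track`, and `awaitIdle` are hypothetical names, not existing Hudi or Trino APIs. A fixed thread pool stands in for a split-loader pool, and an `AtomicInteger` stands in for the span exporter.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical drain hook (not an existing Hudi or Trino API). Wraps tasks
 * submitted to a background pool so a test can wait until all of them, and
 * therefore the spans they emit, have finished before the window closes.
 */
final class InFlightTracker {
    private final Object lock = new Object();
    private int inFlight; // guarded by lock

    /** Wraps a task; it is counted from wrap (submission) time until it completes. */
    Runnable track(Runnable task) {
        synchronized (lock) {
            inFlight++; // counted before the pool can run or even dequeue it
        }
        return () -> {
            try {
                task.run();
            } finally {
                synchronized (lock) {
                    if (--inFlight == 0) {
                        lock.notifyAll(); // wake any thread blocked in awaitIdle
                    }
                }
            }
        };
    }

    /** Blocks until no tracked task is queued or running; returns false on timeout. */
    boolean awaitIdle(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        synchronized (lock) {
            while (inFlight > 0) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false;
                }
                TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // stands in for a shared reader pool
        InFlightTracker tracker = new InFlightTracker();
        AtomicInteger metadataReads = new AtomicInteger();      // stands in for exported spans

        for (int i = 0; i < 8; i++) {
            pool.execute(tracker.track(() -> {
                try {
                    Thread.sleep(20); // simulated metadata-table read that finishes "late"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                metadataReads.incrementAndGet();
            }));
        }

        // Drain before closing the measurement window: every read is counted in this window.
        boolean idle = tracker.awaitIdle(5, TimeUnit.SECONDS);
        System.out.println(idle + " " + metadataReads.get()); // true 8
        pool.shutdown();
    }
}
```

The count is taken when a task is wrapped rather than when it starts, so queued-but-not-yet-running work also keeps `awaitIdle` blocked. Unlike a span-stability poll, this does not depend on how long the span count happens to stay unchanged.

For `ForkJoinPool.commonPool` specifically, the JDK already provides `ForkJoinPool.awaitQuiescence(long, TimeUnit)`. Because that pool is shared JVM-wide, however, it would also wait on unrelated work.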
