armitage420 commented on PR #6075: URL: https://github.com/apache/hive/pull/6075#issuecomment-3330074818
@deniskuzZ I've only seen this flakiness once in one of the PRs. While investigating, I found this JIRA that addressed a similar issue across Iceberg q-tests: [HIVE-25607](https://issues.apache.org/jira/browse/HIVE-25607)

I also encountered other flaky tests later (which seem fairly common) and created JIRAs in case anyone wants to address them: [HIVE-29158](https://issues.apache.org/jira/browse/HIVE-29158), [HIVE-29157](https://issues.apache.org/jira/browse/HIVE-29157)

Regarding the approach of masking before sorting, I have some concerns:

1. Sorting using SORT_QUERY_RESULTS is executed per query, whereas masking currently executes once per file. Changing the architecture to mask per query would increase the number of operations (and likely execution time for the pipeline).
2. The sorting discussed above is only applied to a few qfiles, so according to point 1, masking would still operate per query in those cases.
3. An approach where masking per query is done only when SORT_QUERY_RESULTS is used (but otherwise operates per file) might work better, though I'm not sure if this makes architectural sense.
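
To make option 3 concrete, here is a minimal, hypothetical sketch of the ordering question. It is not Hive's actual q-test code: the class, the method names, the `totalSize=` pattern, and the `#Masked#` token are all assumptions chosen for illustration. It assumes the flakiness arises when a volatile value influences the sort order before it is masked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names are taken from Hive's test framework.
class MaskSortSketch {
    // Assumed volatile field: a numeric value that can differ between runs.
    private static final Pattern VOLATILE = Pattern.compile("totalSize=\\d+");

    static String mask(String line) {
        return VOLATILE.matcher(line).replaceAll("totalSize=#Masked#");
    }

    // Sort the query's rows first, mask afterwards (as a later per-file pass would).
    static List<String> sortThenMask(List<String> rows) {
        List<String> out = new ArrayList<>(rows);
        Collections.sort(out);
        out.replaceAll(MaskSortSketch::mask);
        return out;
    }

    // Mask the query's rows first, then sort the already-masked lines.
    static List<String> maskThenSort(List<String> rows) {
        List<String> out = new ArrayList<>(rows);
        out.replaceAll(MaskSortSketch::mask);
        Collections.sort(out);
        return out;
    }

    // Option 3: pay the per-query masking cost only when sorting is requested;
    // otherwise leave the rows untouched for the single per-file masking pass.
    static List<String> processQuery(List<String> rows, boolean sortQueryResults) {
        return sortQueryResults ? maskThenSort(rows) : new ArrayList<>(rows);
    }

    public static void main(String[] args) {
        // Same logical rows from two runs; only the volatile numbers differ.
        List<String> runA = Arrays.asList("totalSize=100 a", "totalSize=990 b");
        List<String> runB = Arrays.asList("totalSize=990 a", "totalSize=100 b");

        System.out.println(sortThenMask(runA).equals(sortThenMask(runB)));
        System.out.println(processQuery(runA, true).equals(processQuery(runB, true)));
    }
}
```

In this sketch, queries without SORT_QUERY_RESULTS skip the extra masking pass, so the added cost from point 1 would apply only to the few qfiles mentioned in point 2. Whether that split fits the existing architecture is still the open question.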
