armitage420 commented on PR #6075:
URL: https://github.com/apache/hive/pull/6075#issuecomment-3330074818

@deniskuzZ I've only seen this flakiness once, in one of the PRs. While investigating, I found a JIRA that addressed a similar issue across the Iceberg q-tests: [HIVE-25607](https://issues.apache.org/jira/browse/HIVE-25607)
   
I later ran into other flaky tests as well (these seem fairly common) and created JIRAs in case anyone wants to address them: [HIVE-29158](https://issues.apache.org/jira/browse/HIVE-29158), [HIVE-29157](https://issues.apache.org/jira/browse/HIVE-29157)
   
   Regarding the approach of masking before sorting, I have some concerns:
1. Sorting via SORT_QUERY_RESULTS is executed per query, whereas masking currently runs once per file. Changing the architecture to mask per query would increase the number of masking operations (and likely the pipeline's execution time).
   
2. The sorting discussed above is only applied in a few qfiles, yet, per point 1, masking would still have to operate per query in those cases.
   
3. An approach that masks per query only when SORT_QUERY_RESULTS is used (and otherwise masks once per file) might work better, though I'm not sure it makes architectural sense.
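
A minimal Java sketch of option 3 follows. It assumes that the flakiness comes from sorting on values that are only masked afterwards, such as random IDs in generated file names. That assumption is not confirmed in this thread. The class and its methods (`MaskThenSortSketch`, `mask`, `maskFile`, `sortThenMask`, `maskThenSort`, `process`) are hypothetical and are not Hive's QTestUtil/QOutProcessor API. The UUID regex is a stand-in for the real masking patterns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of option 3; these names are NOT Hive's qtest API.
public class MaskThenSortSketch {

    // Assumed volatile token: a random UUID, e.g. inside a generated data file name.
    private static final Pattern VOLATILE_ID =
        Pattern.compile("[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}");

    static String mask(String line) {
        return VOLATILE_ID.matcher(line).replaceAll("#Masked#");
    }

    // Default path: mask every line without reordering (can run once per file).
    static List<String> maskFile(List<String> lines) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            out.add(mask(line));
        }
        return out;
    }

    // Order implied by point 1: sort raw query output, then mask later.
    // The final order can depend on the random IDs, so it may differ between runs.
    static List<String> sortThenMask(List<String> queryResult) {
        List<String> sorted = new ArrayList<>(queryResult);
        Collections.sort(sorted);
        return maskFile(sorted);
    }

    // Option 3, only for SORT_QUERY_RESULTS queries: mask first, then sort,
    // so the order depends only on the stable, masked text.
    static List<String> maskThenSort(List<String> queryResult) {
        List<String> masked = maskFile(queryResult);
        Collections.sort(masked);
        return masked;
    }

    static List<String> process(List<String> queryResult, boolean sortQueryResults) {
        return sortQueryResults ? maskThenSort(queryResult) : maskFile(queryResult);
    }

    public static void main(String[] args) {
        // Two runs producing the same rows, but with different random file names.
        List<String> run1 = List.of(
            "file-11111111-1111-1111-1111-111111111111 row=b",
            "file-22222222-2222-2222-2222-222222222222 row=a");
        List<String> run2 = List.of(
            "file-33333333-3333-3333-3333-333333333333 row=b",
            "file-00000000-0000-0000-0000-000000000000 row=a");

        System.out.println(sortThenMask(run1).equals(sortThenMask(run2)));   // false
        System.out.println(process(run1, true).equals(process(run2, true))); // true
    }
}
```

In this sketch the per-query masking from point 1 only happens for queries that request sorting. Every other query keeps the existing mask-once path.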

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

