InvisibleProgrammer opened a new pull request, #6487:
URL: https://github.com/apache/hive/pull/6487

   The rebalance compaction tests are brittle: their hard-coded assertions 
need to be modified regularly. 
   Some examples: 
   - 
https://github.com/apache/hive/commit/388e80690373a064d1fa464579a8b22173395ef2
   - 
https://github.com/apache/hive/commit/413069ea0faeb893fe78faa215b08a70ece80595#diff-dedd154465fd42855d9d6710d54553660dae87405ce2e4ea931475de1d5bb816
   - 
https://github.com/apache/hive/commit/6e1f1ccf2aca58af4be06d6ceea1aa2b877a5d08#diff-dedd154465fd42855d9d6710d54553660dae87405ce2e4ea931475de1d5bb816
   ... 
   
   Two causes have been identified: 
   - Firstly, the number of buckets, and even the order of the elements inside 
a bucket, depend on the ORC version string: 
https://issues.apache.org/jira/browse/HIVE-29536?focusedCommentId=18080335&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-18080335
 (Thanks @thomasrebele for digging into it)
   - Secondly, the base directory can change as well (like here: 
https://github.com/apache/hive/commit/1a90d278be9af1dc5b8aeb4c4cca24febac91936#diff-dedd154465fd42855d9d6710d54553660dae87405ce2e4ea931475de1d5bb816L199)
 
   
   ### What changes were proposed in this pull request?
   The goal of the change is to stabilize those tests by doing two things: 
   - Rebalance assertions are no longer hard-coded. Instead, the tests check 
whether the buckets are balanced and whether all the data is present.
   - The base folder is looked up dynamically.
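
   As a rough illustration of the two ideas above, here is a minimal sketch. 
It is not the code in this PR: the class `RebalanceChecks`, its method names, 
the "sizes differ by at most one row" balance criterion, and the use of plain 
string lists in place of real bucket files and file system listings are all 
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of the Hive code base.
final class RebalanceChecks {

  // Assumed criterion: buckets are balanced when the largest and the
  // smallest bucket differ by at most one row.
  static boolean isBalanced(List<? extends List<String>> buckets) {
    if (buckets.isEmpty()) {
      return true;
    }
    int min = Integer.MAX_VALUE;
    int max = 0;
    for (List<String> bucket : buckets) {
      min = Math.min(min, bucket.size());
      max = Math.max(max, bucket.size());
    }
    return max - min <= 1;
  }

  // All data is present when the buckets, taken together and ignoring
  // row order, hold exactly the expected rows.
  static boolean containsExactly(List<? extends List<String>> buckets,
                                 List<String> expectedRows) {
    List<String> actual = new ArrayList<>();
    for (List<String> bucket : buckets) {
      actual.addAll(bucket);
    }
    List<String> expected = new ArrayList<>(expectedRows);
    Collections.sort(actual);
    Collections.sort(expected);
    return actual.equals(expected);
  }

  // Finds the base directory by its "base_" prefix instead of relying on a
  // hard-coded name. If several match, the lexicographically greatest name
  // is returned (an assumption that suits zero-padded IDs).
  static Optional<String> findBaseDir(List<String> dirNames) {
    return dirNames.stream()
        .filter(name -> name.startsWith("base_"))
        .max(Comparator.naturalOrder());
  }
}
```

   Because none of these checks mention a specific bucket count, row order, or 
directory name, they should keep passing when those details shift.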
   
   Please note: I also refactored the code a little bit and extracted the 
rebalance compaction tests into a new class. 
   
   ### Why are the changes needed?
   We experienced regular and serious regression issues caused by the ORC 
version string.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   With the existing tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

