InvisibleProgrammer opened a new pull request, #6487: URL: https://github.com/apache/hive/pull/6487
Rebalance tests are sensitive, and their hard-coded assertions need to be modified regularly. Some examples:
- https://github.com/apache/hive/commit/388e80690373a064d1fa464579a8b22173395ef2
- https://github.com/apache/hive/commit/413069ea0faeb893fe78faa215b08a70ece80595#diff-dedd154465fd42855d9d6710d54553660dae87405ce2e4ea931475de1d5bb816
- https://github.com/apache/hive/commit/6e1f1ccf2aca58af4be06d6ceea1aa2b877a5d08#diff-dedd154465fd42855d9d6710d54553660dae87405ce2e4ea931475de1d5bb816
...

Two causes have been identified:
- Firstly, the number of buckets and even the order of the elements inside a bucket depend on the version string of ORC: https://issues.apache.org/jira/browse/HIVE-29536?focusedCommentId=18080335&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-18080335 (Thanks @thomasrebele for digging into it)
- Secondly, the base directory can change as well (like here: https://github.com/apache/hive/commit/1a90d278be9af1dc5b8aeb4c4cca24febac91936#diff-dedd154465fd42855d9d6710d54553660dae87405ce2e4ea931475de1d5bb816L199)

### What changes were proposed in this pull request?

The goal of the change is to stabilize those tests by doing two things:
- Rebalance assertions are no longer hard-coded. Instead, the tests check whether the buckets are balanced and whether all the data is present.
- The base folder is looked up dynamically.

Please note: I also refactored the code a little and extracted the rebalance compaction tests into a new class.

### Why are the changes needed?

We regularly experienced serious regressions caused by the ORC version number.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

With the existing tests.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
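As a rough illustration of the approach described in this pull request, the Java sketch below shows what assertions that do not depend on bucket layout could look like. It is not taken from the patch: the class and method names (`RebalanceAssertions`, `isBalanced`, `containsAllRows`, `findBaseDir`) are hypothetical, the "bucket sizes differ by at most one row" criterion is an assumption about what "balanced" means, and picking the first matching directory is likewise an assumption. The only Hive-specific detail relied on is that base directory names start with `base_`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of the Hive code base.
final class RebalanceAssertions {

    // Assumed criterion: the largest and smallest bucket differ by at most one row.
    static boolean isBalanced(List<List<String>> buckets) {
        if (buckets.isEmpty()) {
            return true;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (List<String> bucket : buckets) {
            min = Math.min(min, bucket.size());
            max = Math.max(max, bucket.size());
        }
        return max - min <= 1;
    }

    // Checks that the buckets together hold exactly the expected rows,
    // regardless of which bucket a row is in or in which order it appears.
    static boolean containsAllRows(List<List<String>> buckets, List<String> expectedRows) {
        List<String> actual = new ArrayList<>();
        for (List<String> bucket : buckets) {
            actual.addAll(bucket);
        }
        List<String> expected = new ArrayList<>(expectedRows);
        Collections.sort(actual);
        Collections.sort(expected);
        return actual.equals(expected);
    }

    // Returns the first directory name that starts with "base_" instead of
    // relying on a hard-coded name (assumption: any such directory will do).
    static Optional<String> findBaseDir(List<String> directoryNames) {
        for (String name : directoryNames) {
            if (name.startsWith("base_")) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up sample data: two buckets holding five rows in total.
        List<List<String>> buckets = Arrays.asList(
                Arrays.asList("3", "1", "5"),
                Arrays.asList("4", "2"));
        List<String> expectedRows = Arrays.asList("1", "2", "3", "4", "5");

        System.out.println("balanced: " + isBalanced(buckets));
        System.out.println("all rows present: " + containsAllRows(buckets, expectedRows));
        System.out.println("base dir: "
                + findBaseDir(Arrays.asList("delta_0000001_0000001", "base_0000007")));
    }
}
```

Checks of this shape stay valid when the bucket count, the row order inside a bucket, or the base directory name changes, which are the sources of instability listed in the description.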
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
For additional commands, e-mail: [email protected]
