Hello Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit
    http://gerrit.cloudera.org:8080/13545

to look at the new patch set (#2).

Change subject: [WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement
......................................................................

[WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement

Consistent remote placement currently uses the relative filename
within a partition for the consistent hash. If the relative
filenames for different partitions have a simple naming scheme,
then multiple partitions may have files of the same name.
This is true for some tables written by Hive (e.g. in our
minicluster the tpcds.store_sales has this problem).
This can lead to unbalanced placement of remote ranges.

This adds a full_path_hash to the FbFileDesc and computes
it in the catalog. The THdfsFileSplit is based on the
FbFileDesc, and this gets passed through. The scheduler
hashes this rather than the relative path.

This is a proof of concept and is not ready to go.

Work remaining (if this is the approach we take):
 - Add some scheduler tests to simulate this approach
 - Look into an end-to-end test based on the Docker
   tests (which do remote IO)

The alternative is to construct the DescriptorTbl in the
scheduler and use that to reconstruct the full path for
a file and hash it.
Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
8 files changed, 86 insertions(+), 24 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13545/2
--
To view, visit http://gerrit.cloudera.org:8080/13545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
Gerrit-Change-Number: 13545
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
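
Illustrative note on the placement problem described in the commit message: the sketch below is not Impala code. It assumes a hypothetical table layout (50 partitions, each holding one file with the same name, "000000_0") and 10 executors, and it uses a plain hash-modulo placement in place of the scheduler's actual consistent hash. The names stable_hash and place are made up for this example. It only shows why hashing the relative filename can pile every range onto one executor, while hashing the full path spreads them out.

```python
import hashlib
from collections import Counter


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def place(keys, num_executors):
    """Count how many scan ranges land on each executor for the given hash keys.

    Assumption: simple modulo placement stands in for a real consistent hash.
    """
    return Counter(stable_hash(k) % num_executors for k in keys)


# Hypothetical layout: every partition directory contains a file with the same name.
partitions = [
    f"/warehouse/store_sales/ss_sold_date_sk={d}" for d in range(2450000, 2450050)
]
files = [(p, "000000_0") for p in partitions]

num_executors = 10

# Hashing only the relative filename: every file produces the same key.
by_relative = place([name for _, name in files], num_executors)

# Hashing the full path: each file produces a distinct key.
by_full_path = place([f"{p}/{name}" for p, name in files], num_executors)

print("relative name:", dict(by_relative))
print("full path:    ", dict(by_full_path))
```

With the relative name as the key, all 50 ranges map to a single executor because the key is identical for every file. With the full path as the key, the ranges are distributed across several executors.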