Hello Tim Armstrong, Todd Lipcon, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13545

to look at the new patch set (#3).

Change subject: [WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement
......................................................................

[WIP] IMPALA-8630: Hash the full path when calculating consistent remote placement

Consistent remote placement currently uses the relative filename
within a partition for the consistent hash. If the relative filenames
for different partitions have a simple naming scheme, then multiple
partitions may have files of the same name. This is true for some
tables written by Hive (e.g. in our minicluster the tpcds.store_sales
has this problem). This can lead to unbalanced placement of remote
ranges.

This adds a partition_path_hash to the THdfsFileSplit and
THdfsFileSplitGeneratorSpec, calculated in the frontend (which has
all of the partition information). The scheduler hashes this in
addition to the relative path.

This is a proof of concept and is not ready to go.

Work remaining (if this is the approach we take):
- Add some scheduler tests to simulate this approach
- Look into an end-to-end test based on the Docker tests (which do
  remote IO)

Alternatives:
- Hash the full path when reading the file metadata in the catalog
  and store it in FbFileDesc.
- Construct the DescriptorTbl in the scheduler and use that to
  reconstruct the full path or partition path for a file and hash it.

Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
6 files changed, 87 insertions(+), 25 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13545/3
--
To view, visit http://gerrit.cloudera.org:8080/13545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
Gerrit-Change-Number: 13545
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
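
Illustration of the placement problem described in the commit message.
This is a rough sketch only, not the code in the patch: the function
names (Fnv1a, HashRelativeOnly, HashWithPartition, PickExecutor) are
hypothetical, FNV-1a stands in for whatever hash the scheduler really
uses, and a plain modulo stands in for the scheduler's consistent-hash
lookup. It only shows why hashing the relative filename alone maps
same-named files from different partitions to the same place, and how
mixing in a per-partition hash separates them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// 64-bit FNV-1a. Assumption: a stand-in hash, not the one Impala uses.
// The seed parameter lets a caller chain a previously computed hash in.
uint64_t Fnv1a(const std::string& s,
               uint64_t seed = 14695981039346656037ULL) {
  uint64_t h = seed;
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ULL;  // FNV 64-bit prime
  }
  return h;
}

// Behaviour described as current: only the filename relative to the
// partition directory feeds the hash, so equal names collide.
uint64_t HashRelativeOnly(const std::string& relative_path) {
  return Fnv1a(relative_path);
}

// Behaviour described as proposed: a hash of the partition path
// (computed elsewhere, e.g. by the frontend) is combined with the
// relative filename. Here it is used as the seed.
uint64_t HashWithPartition(uint64_t partition_path_hash,
                           const std::string& relative_path) {
  return Fnv1a(relative_path, partition_path_hash);
}

// Simplified placement: modulo over the executor count. A real
// consistent-hash ring would be used instead; this is only a sketch.
std::size_t PickExecutor(uint64_t hash, std::size_t num_executors) {
  assert(num_executors > 0);
  return static_cast<std::size_t>(hash % num_executors);
}
```

With two hypothetical partition paths that each contain a file named
000000_0, HashRelativeOnly returns the same value for both files, so
both always land on the same executor. HashWithPartition seeds the hash
with each partition's own hash, so the two files get different hash
values and can be spread across executors.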