Joe McDonnell has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13545


Change subject: IMPALA-8630: Include partition id when calculating consistent 
remote placement
......................................................................

IMPALA-8630: Include partition id when calculating consistent remote placement

Consistent remote placement currently uses the relative filename within
a partition for the consistent hash. If the relative filenames for
different partitions have a simple naming scheme, then multiple
partitions may have files of the same name. This is true for some
tables written by Hive (e.g. in our minicluster the tpcds.store_sales
has this problem). This can lead to unbalanced placement of remote
ranges.

This adds the partition_id in the hash, so files with the same
name from different partitions will have different hashes.

Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
---
M be/src/scheduling/scheduler.cc
1 file changed, 5 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13545/1
--
To view, visit http://gerrit.cloudera.org:8080/13545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
Gerrit-Change-Number: 13545
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>

Reply via email to