Hello Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13545

to look at the new patch set (#2).

Change subject: [WIP] IMPALA-8630: Hash the full path when calculating 
consistent remote placement
......................................................................

[WIP] IMPALA-8630: Hash the full path when calculating consistent remote 
placement

Consistent remote placement currently uses the relative filename within
a partition for the consistent hash. If the relative filenames for
different partitions have a simple naming scheme, then multiple
partitions may have files of the same name. This is true for some
tables written by Hive (e.g. in our minicluster the tpcds.store_sales
has this problem). This can lead to unbalanced placement of remote
ranges.

This adds a full_path_hash to the FbFileDesc and computes it in the catalog.
The THdfsFileSplit is based on the FbFileDesc, and this gets passed through.
The scheduler hashes this rather than the relative path.

This is a proof of concept and is not ready to go.

Work remaining (if this is the approach we take):
 - Add some scheduler tests to simulate this approach
 - Look into an end-to-end test based on the Docker tests (which do remote IO)

The alternative is to construct the DescriptorTbl in the scheduler
and use that to reconstruct the full path for a file and hash it.

Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
---
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M common/fbs/CatalogObjects.fbs
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
8 files changed, 86 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/45/13545/2
--
To view, visit http://gerrit.cloudera.org:8080/13545
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I46c739fc31af539af2b3509e2a161f4e29f44d7b
Gerrit-Change-Number: 13545
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>

Reply via email to