Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG@24
PS4, Line 24: datastructures
> nit: data structures
Done


http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h
File be/src/scheduling/query-schedule.h:

http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h@260
PS4, Line 260: // Map of plan node to scan ranges that are computed from 
request_. If a plan node is
             :   // not present, then it has no associated additional scan 
ranges.
             :   // Populated by the ComputeScanRanges.
             :   std::map<PlanNodeId, vector<TScanRangeLocationList>> 
computed_ranges_;
> How is this related to PerNodeScanRanges (L41)?
backing out this refactor.


http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc@258
PS4, Line 258: // Combine actual and computed ranges from the schedule.
             :       const vector<TScanRangeLocationList>* scan_ranges = 
&entry.second;
             :       const vector<TScanRangeLocationList>* computed_ranges =
             :           schedule->GetComputedRanges(entry.first);
             :       vector<TScanRangeLocationList> combined_ranges;
             :       if (computed_ranges != nullptr) {
             :         for (const auto& range : entry.second) {
             :           if (!range.scan_range.__isset.hdfs_file_split_spec) {
             :             combined_ranges.push_back(range);
             :           }
             :         }
             :         for (const auto& range : 
(*schedule->GetComputedRanges(entry.first))) {
             :           combined_ranges.push_back(range);
             :         }
             :         scan_ranges = &combined_ranges;
             :       }
> I think my previous comment made things worse, sorry about that. I think it
no problems. added back here. expansion is done per spec in a separate helper 
function below.


http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@199
PS4, Line 199: 3: required i64 partition_id
> Comment what this is referencing.
Done


http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@209
PS4, Line 209: 4: optional THdfsFileSplitSpec hdfs_file_split_spec
> Add a comment either here or at the top to describe when THdfsFileSplit vs
Lets go with THdfsFileSplitGeneratorSpec


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@126
PS4, Line 126: fileFormat'
> nit: missing quote
Done


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@712
PS4, Line 712: fileDesc.getNumFileBlocks()
> What will happen if this is a zero-byte file (no blocks). Can't we prune th
changed it to be more explicit (based on the fs).


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@743
PS4, Line 743: partitionFs.getDefaultBlockSize(new Path(fileDesc.getFileName())
> Do we need to do this for every file? Can't we do this once per partition?
good point, lifted that to be per partition.


http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@761
PS4, Line 761: disk ids are expected in blocks,
             :    * checkMissingDiskIds should be set to true. If disk ids are 
checked,
> "checkMissingDiskIds is true, "
Done



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 4
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Jan 2018 05:47:30 +0000
Gerrit-HasComments: Yes

Reply via email to