Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 )
Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls ...................................................................... Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8523/4//COMMIT_MSG@24 PS4, Line 24: datastructures > nit: data structures Done http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h File be/src/scheduling/query-schedule.h: http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/query-schedule.h@260 PS4, Line 260: // Map of plan node to scan ranges that are computed from request_. If a plan node is : // not present, then it has no associated additional scan ranges. : // Populated by the ComputeScanRanges. : std::map<PlanNodeId, vector<TScanRangeLocationList>> computed_ranges_; > How is this related to PerNodeScanRanges (L41)? backing out this refactor. http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/4/be/src/scheduling/scheduler.cc@258 PS4, Line 258: // Combine actual and computed ranges from the schedule. : const vector<TScanRangeLocationList>* scan_ranges = &entry.second; : const vector<TScanRangeLocationList>* computed_ranges = : schedule->GetComputedRanges(entry.first); : vector<TScanRangeLocationList> combined_ranges; : if (computed_ranges != nullptr) { : for (const auto& range : entry.second) { : if (!range.scan_range.__isset.hdfs_file_split_spec) { : combined_ranges.push_back(range); : } : } : for (const auto& range : (*schedule->GetComputedRanges(entry.first))) { : combined_ranges.push_back(range); : } : scan_ranges = &combined_ranges; : } > I think my previous comment made things worse, sorry about that. I think it no problems. added back here. expansion is done per spec in a separate helper function below. http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@199 PS4, Line 199: 3: required i64 partition_id > Comment what this is referencing. Done http://gerrit.cloudera.org:8080/#/c/8523/4/common/thrift/PlanNodes.thrift@209 PS4, Line 209: 4: optional THdfsFileSplitSpec hdfs_file_split_spec > Add a comment either here or at the top to describe when THdfsFileSplit vs Lets go with THdfsFileSplitGeneratorSpec http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@126 PS4, Line 126: fileFormat' > nit: missing quote Done http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@712 PS4, Line 712: fileDesc.getNumFileBlocks() > What will happen if this is a zero-byte file (no blocks). Can't we prune th changed it to be more explicit (based on the fs). http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@743 PS4, Line 743: partitionFs.getDefaultBlockSize(new Path(fileDesc.getFileName()) > Do we need to do this for every file? Can't we do this once per partition? good point, lifted that to be per partition. http://gerrit.cloudera.org:8080/#/c/8523/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@761 PS4, Line 761: disk ids are expected in blocks, : * checkMissingDiskIds should be set to true. If disk ids are checked, > "checkMissingDiskIds is true, " Done -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 4 Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Thu, 18 Jan 2018 05:47:30 +0000 Gerrit-HasComments: Yes