Lars Volker has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 )
Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls ...................................................................... Patch Set 9: (14 comments) http://gerrit.cloudera.org:8080/#/c/8523/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8523/9//COMMIT_MSG@30 PS9, Line 30: Testing: Do we have a test that covers mixed queries? Have you considered adding a test to the scheduler tests? http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.h File be/src/scheduling/scheduler.h: http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.h@450 PS9, Line 450: instances nit: instance http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@257 PS9, Line 257: int num_ranges = 0; Have you considered generating the scan ranges here and passing them into ComputeScanRangeAssignment? They seem easy to detect there since they have an empty locations list. That way we would not increase the complexity of ComputeScanRangeAssignment further. http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@504 PS9, Line 504: 'spec' is assumed to *not* have block location information. It doesn't have a member to do so. Is this a stale comment? http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@504 PS9, Line 504: // expanded_locations. 'spec' is assumed to *not* have block location information. missing rename? http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@505 PS9, Line 505: HDFS This comment seems a bit misleading, since S3 etc are not HDFS filesystems. I think it means that these are still handled by the HDFS scanners. Can you clarify the comment? http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@516 PS9, Line 516: long scan_range_offset = 0; Any reason to not use int64_t here? http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@518 PS9, Line 518: long scan_range_length = std::min(spec.max_block_size, fb_desc->length()); I'm not sure it can happen, but if spec.max_block_size == 0, the loop below won't terminate. http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@582 PS9, Line 582: Defer "Defer" read to me like if something is being done immediately. Can you clarify the comment, e.g. like "Holds scan ranges that will not be immediately assigned..."? http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@603 PS9, Line 603: *num_ranges += 1; If you move this count to L676 right after the call to RecordScanRangeAssignment, you can remove the parameter in GenerateScanRanges(). (you'll also have to add remote_scan_range_locations.size() after all of them have been collected.) http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@681 PS9, Line 681: remote_scan_range_locations.push_back(&location); nit: use vector<>::insert() instead of a loop here. http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@686 PS9, Line 686: // Allow exec_at_coord to take precedence over selecting a remote host. This should only happen for generated scan ranges. Can you add a DCHECK to ensure that? http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift@204 PS9, Line 204: // -1 means not-splittable nit: add newlines above comments. http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift@205 PS9, Line 205: 2: required i64 max_block_size Could this ever be infinity? I feel it might be clearer to have a boolean flag for 'splittable', unless you're concerned about size. -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 9 Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Thu, 01 Feb 2018 04:07:10 +0000 Gerrit-HasComments: Yes