Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/8523 )
Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls ...................................................................... Patch Set 9: (14 comments) main changes: (1) minimized changes to the main scheduler method with an iterator, (2) added backend tests http://gerrit.cloudera.org:8080/#/c/8523/9//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8523/9//COMMIT_MSG@30 PS9, Line 30: Testing: > Do we have a test that covers mixed queries? Have you considered adding a t thx for the suggestions. added tests to scheduler-test.cc. there are end-to-end tests for testing multiple file systems in the same table/query (see testdata/workloads/functional-query/queries/QueryTest/multiple-filesystems.test) http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.h File be/src/scheduling/scheduler.h: http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.h@450 PS9, Line 450: instances > nit: instance Done http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@257 PS9, Line 257: int num_ranges = 0; > Have you considered generating the scan ranges here and passing them into C I tried something like this, but it created two pases so I backed it out. After discussing, so that we avoid additional complexity to the main ComputeRangeAssignment method, I wrapped the original scan ranges in an iterator that generates new scan ranges or returns the original ones. http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@504 PS9, Line 504: // expanded_locations. 'spec' is assumed to *not* have block location information. > missing rename? removed http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@504 PS9, Line 504: 'spec' is assumed to *not* have block location information. > It doesn't have a member to do so. Is this a stale comment? removed. this refers to longer range plan where hdfs files (with block info) will also be handled. the todo in the thrift spec covers this. http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@505 PS9, Line 505: HDFS > This comment seems a bit misleading, since S3 etc are not HDFS filesystems. remvoed http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@516 PS9, Line 516: long scan_range_offset = 0; > Any reason to not use int64_t here? Done http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@518 PS9, Line 518: long scan_range_length = std::min(spec.max_block_size, fb_desc->length()); > I'm not sure it can happen, but if spec.max_block_size == 0, the loop below good point. added a check for this in FE as well as here. http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@582 PS9, Line 582: Defer > "Defer" read to me like if something is being done immediately. Can you cla Done http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@603 PS9, Line 603: *num_ranges += 1; > If you move this count to L676 right after the call to RecordScanRangeAssig got rid of this part of the api. the number of scan ranges is now accessed via the iterator. http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@681 PS9, Line 681: remote_scan_range_locations.push_back(&location); > nit: use vector<>::insert() instead of a loop here. removed http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@686 PS9, Line 686: // Allow exec_at_coord to take precedence over selecting a remote host. > This should only happen for generated scan ranges. Can you add a DCHECK to removed http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift@204 PS9, Line 204: // -1 means not-splittable > nit: add newlines above comments. Done http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift@205 PS9, Line 205: 2: required i64 max_block_size > Could this ever be infinity? I feel it might be clearer to have a boolean f Done -- To view, visit http://gerrit.cloudera.org:8080/8523 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5 Gerrit-Change-Number: 8523 Gerrit-PatchSet: 9 Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com> Gerrit-Comment-Date: Mon, 12 Feb 2018 08:40:29 +0000 Gerrit-HasComments: Yes