Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
......................................................................


Patch Set 9:

(14 comments)

main changes: (1) minimized changes to the main scheduler method with an 
iterator, (2) added backend tests

http://gerrit.cloudera.org:8080/#/c/8523/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8523/9//COMMIT_MSG@30
PS9, Line 30: Testing:
> Do we have a test that covers mixed queries? Have you considered adding a t
thx for the suggestions. added tests to scheduler-test.cc. there are end-to-end 
tests for testing multiple file systems in the same table/query (see 
testdata/workloads/functional-query/queries/QueryTest/multiple-filesystems.test)


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.h@450
PS9, Line 450: instances
> nit: instance
Done


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@257
PS9, Line 257:       int num_ranges = 0;
> Have you considered generating the scan ranges here and passing them into C
I tried something like this, but it created two pases so I backed it out. After 
discussing, so that we avoid additional complexity to the main 
ComputeRangeAssignment method, I wrapped the original scan ranges in an 
iterator that generates new scan ranges or returns the original ones.


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@504
PS9, Line 504: // expanded_locations. 'spec' is assumed to *not* have block 
location information.
> missing rename?
removed


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@504
PS9, Line 504: 'spec' is assumed to *not* have block location information.
> It doesn't have a member to do so. Is this a stale comment?
removed. this refers to longer range plan where hdfs files (with block info) 
will also be handled. the todo in the thrift spec covers this.


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@505
PS9, Line 505: HDFS
> This comment seems a bit misleading, since S3 etc are not HDFS filesystems.
remvoed


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@516
PS9, Line 516:   long scan_range_offset = 0;
> Any reason to not use int64_t here?
Done


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@518
PS9, Line 518:   long scan_range_length = std::min(spec.max_block_size, 
fb_desc->length());
> I'm not sure it can happen, but if spec.max_block_size == 0, the loop below
good point. added a check for this in FE as well as here.


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@582
PS9, Line 582: Defer
> "Defer" read to me like if something is being done immediately. Can you cla
Done


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@603
PS9, Line 603:     *num_ranges += 1;
> If you move this count to L676 right after the call to RecordScanRangeAssig
got rid of this part of the api. the number of scan ranges is now accessed via 
the iterator.


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@681
PS9, Line 681:     remote_scan_range_locations.push_back(&location);
> nit: use vector<>::insert() instead of a loop here.
removed


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@686
PS9, Line 686:     // Allow exec_at_coord to take precedence over selecting a 
remote host.
> This should only happen for generated scan ranges. Can you add a DCHECK to 
removed


http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift@204
PS9, Line 204:   // -1 means not-splittable
> nit: add newlines above comments.
Done


http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift@205
PS9, Line 205:   2: required i64 max_block_size
> Could this ever be infinity? I feel it might be clearer to have a boolean f
Done



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Feb 2018 08:40:29 +0000
Gerrit-HasComments: Yes

Reply via email to