Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8523 )

Change subject: IMPALA-5931: Generates scan ranges in planner for s3/adls
......................................................................


Patch Set 9:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/8523/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8523/9//COMMIT_MSG@30
PS9, Line 30: Testing:
Do we have a test that covers mixed queries? Have you considered adding a test 
to the scheduler tests?


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.h@450
PS9, Line 450: instances
nit: instance


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@257
PS9, Line 257:       int num_ranges = 0;
Have you considered generating the scan ranges here and passing them into 
ComputeScanRangeAssignment? They seem easy to detect there since they have an 
empty locations list. That way we would not increase the complexity of 
ComputeScanRangeAssignment further.


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@504
PS9, Line 504: 'spec' is assumed to *not* have block location information.
It doesn't have a member to do so. Is this a stale comment?


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@504
PS9, Line 504: // expanded_locations. 'spec' is assumed to *not* have block 
location information.
missing rename?


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@505
PS9, Line 505: HDFS
This comment seems a bit misleading, since S3 etc are not HDFS filesystems. I 
think it means that these are still handled by the HDFS scanners. Can you 
clarify the comment?


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@516
PS9, Line 516:   long scan_range_offset = 0;
Any reason to not use int64_t here?


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@518
PS9, Line 518:   long scan_range_length = std::min(spec.max_block_size, 
fb_desc->length());
I'm not sure it can happen, but if spec.max_block_size == 0, the loop below 
won't terminate.


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@582
PS9, Line 582: Defer
"Defer" read to me like if something is being done immediately. Can you clarify 
the comment, e.g. like "Holds scan ranges that will not be immediately 
assigned..."?


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@603
PS9, Line 603:     *num_ranges += 1;
If you move this count to L676 right after the call to 
RecordScanRangeAssignment, you can remove the parameter in 
GenerateScanRanges(). (you'll also have to add 
remote_scan_range_locations.size() after all of them have been collected.)


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@681
PS9, Line 681:     remote_scan_range_locations.push_back(&location);
nit: use vector<>::insert() instead of a loop here.


http://gerrit.cloudera.org:8080/#/c/8523/9/be/src/scheduling/scheduler.cc@686
PS9, Line 686:     // Allow exec_at_coord to take precedence over selecting a 
remote host.
This should only happen for generated scan ranges. Can you add a DCHECK to 
ensure that?


http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift@204
PS9, Line 204:   // -1 means not-splittable
nit: add newlines above comments.


http://gerrit.cloudera.org:8080/#/c/8523/9/common/thrift/PlanNodes.thrift@205
PS9, Line 205:   2: required i64 max_block_size
Could this ever be infinity? I feel it might be clearer to have a boolean flag 
for 'splittable', unless you're concerned about size.



--
To view, visit http://gerrit.cloudera.org:8080/8523
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326065adbb2f7e632814113aae85cb51ca4779a5
Gerrit-Change-Number: 8523
Gerrit-PatchSet: 9
Gerrit-Owner: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dhe...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Reviewer: Vuk Ercegovac <vercego...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Feb 2018 04:07:10 +0000
Gerrit-HasComments: Yes

Reply via email to