Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/13307 )

Change subject: IMPALA-8536: Add Scalable Pool Configuration to Admission Controller.
......................................................................


Patch Set 3:

(12 comments)

Looks good. I've only looked at the implementation so far; I'll look at the tests next.

http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@14
PS3, Line 14: The new configuration parameters are:
Mention that a value of zero means it won't be used and the non-scalable parameter will apply instead.
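
For example, something along these lines (just a sketch to illustrate the semantics; the names are hypothetical and may not match the patch):

  #include <cmath>
  #include <cstdint>

  // Sketch: resolve the effective max-requests limit for a pool. A multiple of
  // zero means the scalable parameter is unused and the fixed (non-scalable)
  // max_requests value applies instead.
  int64_t EffectiveMaxRequests(double max_running_queries_multiple,
                               int64_t max_requests, int64_t cluster_size) {
    if (max_running_queries_multiple > 0) {
      return static_cast<int64_t>(
          std::ceil(max_running_queries_multiple * cluster_size));
    }
    return max_requests;  // Fall back to the non-scalable parameter.
  }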


http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@15
PS3, Line 15: Max Running
nit: use "Max Requests" to be consistent


http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@26
PS3, Line 26: the Multiple of the current total number of
            :   running Impalads which gives the maximum memory available across the
            :   cluster for the pool.
Can you elaborate on what this number is, like you did for the ones above? Also mention the unit (bytes), and make the same update in the Thrift definition.
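
I.e. the doc comment could make explicit that this is a number of bytes per running impalad. Roughly (a sketch; names hypothetical):

  #include <cstdint>

  // Sketch: max_memory_multiple is a number of bytes per running impalad.
  // Multiplying it by the current cluster size gives the maximum aggregate
  // memory available across the cluster for the pool.
  int64_t PoolMaxMemBytes(int64_t max_memory_multiple_bytes,
                          int64_t cluster_size) {
    return max_memory_multiple_bytes * cluster_size;
  }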


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h@603
PS3, Line 603: cluster_size_snapshot
nit: maybe use something like latest_cluster_size. The "snapshot" in the name threw me off a bit, since I expected a struct or something instead of just the size. Feel free to ignore this comment.


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1000
PS3, Line 1000:       // Use a heuristic to limit the number of requests we dequeue locally to avoid all
              :       // impalads dequeuing too many requests at the same time. This is based on the
              :       // max_requests_estimate limit and the current queue size. We will attempt to
              :       // dequeue up to this number of requests until reaching the per-pool memory limit.
nit: if this functionality is completely moved into a method, maybe we can put all of this in the method comment and remove it here.
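
For reference, my mental model of the heuristic (a sketch with hypothetical names, not the exact patch code):

  #include <algorithm>
  #include <cstdint>

  // Sketch: each impalad dequeues at most its proportional share of the
  // cluster-wide headroom, based on how much of the aggregate queue is local,
  // so that all impalads dequeuing at the same time don't collectively
  // overshoot the max-requests limit.
  int64_t MaxToDequeue(int64_t max_requests_estimate, int64_t agg_num_running,
                       int64_t local_num_queued, int64_t agg_num_queued) {
    const int64_t total_available = max_requests_estimate - agg_num_running;
    if (total_available <= 0) return 0;  // No headroom, e.g. over-admission.
    const double queue_ratio =
        static_cast<double>(local_num_queued) /
        static_cast<double>(std::max<int64_t>(1, agg_num_queued));
    // Always allow at least one dequeue so the queue makes progress.
    return std::min(local_num_queued, std::max<int64_t>(
        1, static_cast<int64_t>(queue_ratio * total_available)));
  }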


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1007
PS3, Line 1007:       if (!status.ok()) continue; // to next pool
> I think the old behaviour had the advantage of logging something if nothing
I agree; I'd prefer returning max_to_dequeue itself and handling the <= 0 case at the call site.
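
I.e. roughly (hypothetical names):

  #include <cstdint>
  #include <iostream>
  #include <string>

  // Sketch: the helper returns max_to_dequeue directly and the caller handles
  // the <= 0 case, keeping the old behaviour of logging why nothing was
  // dequeued from this pool before moving on.
  void ProcessPool(const std::string& pool_name, int64_t max_to_dequeue) {
    if (max_to_dequeue <= 0) {
      std::cout << "Skipping dequeue for pool " << pool_name
                << ": no available slots under the max-requests limit\n";
      return;  // Continue to the next pool.
    }
    // ... dequeue up to max_to_dequeue requests ...
  }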


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1071
PS3, Line 1071: total_available
This can still be less than or equal to zero, right? For example, in the case of over-admission. If so, we can get rid of the branching based on PoolHasFixedRunningQueriesCountLimit.


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1121
PS3, Line 1121: "Invalid pool config: the min_query_mem_limit $0 is greater than
Seems like this case can now happen if the cluster size changes (an external reason), so calling this an invalid config might not be right. Do you think it makes sense to mention both cases (invalid config and low cluster size) and offer ways to fix it, like we do in REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION?
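
Something along these lines, maybe (a sketch of the wording only, modeled on REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION; the exact text is up to you):

  // Sketch: cover both possible causes and suggest a fix for each.
  const char* const REASON_MIN_MEM_LIMIT_TOO_HIGH =
      "The min_query_mem_limit $0 is greater than the current max memory for "
      "the pool ($1). This can be caused by an invalid pool configuration or "
      "by a currently small cluster size; update the pool configuration or "
      "wait for more executors to start.";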


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1396
PS3, Line 1396: GetKnownBackends
This might be a problem if some backends are shutting down: the query won't be scheduled on them, but the max memory used is still high, causing over-admission. If instead we only look at the size of the active backends, that would be a problem for queries already scheduled to run on the backends that are shutting down.
@all Do you think it makes sense to handle this transient period right now? It might be a problem if a group of executors has started shutting down at the same time.

Another solution could be to use the cluster size from the QuerySchedule itself, but that has its own set of problems. Any thoughts?


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1403
PS3, Line 1403: max_memory_multiple
Just thinking out loud: should we check whether this multiple is greater than executor.mem_to_admit for each executor, since in that case this would be an invalid config?
Maybe not, since there is currently nothing preventing us from setting a higher pool_config.max_mem_resources as well.
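
Concretely, the check I'm picturing is something like this (a sketch; the names are hypothetical and the per-executor comparison is my assumption, not something in the patch):

  #include <cstdint>

  // Sketch of the check being floated: per executor, flag a pool config whose
  // scalable per-impalad memory share exceeds what that executor could admit.
  bool MultipleExceedsExecutorMem(int64_t max_memory_multiple_bytes,
                                  int64_t executor_mem_to_admit) {
    return max_memory_multiple_bytes > executor_mem_to_admit;
  }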


http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1444
PS3, Line 1444: &&
should be OR (||)
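
For example, assuming the condition here is checking whether the pool limits the running-queries count at all (a sketch; names hypothetical, and I'm guessing at the exact condition):

  #include <cstdint>

  // Sketch: a pool limits the running-queries count if EITHER the fixed limit
  // or the scalable multiple is configured, hence || rather than &&; with &&
  // the limit would only apply when both are set.
  bool PoolLimitsRunningQueries(int64_t max_requests,
                                double max_running_queries_multiple) {
    return max_requests > 0 || max_running_queries_multiple > 0;
  }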


http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/ImpalaInternalService.thrift@685
PS3, Line 685:
nit: mention what a value of 0 means for each of these.



--
To view, visit http://gerrit.cloudera.org:8080/13307
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If47508728124076f3b9200c27cffc989f7a4f188
Gerrit-Change-Number: 13307
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 May 2019 18:22:03 +0000
Gerrit-HasComments: Yes
