Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/13307 )
Change subject: IMPALA-8536: Add Scalable Pool Configuration to Admission Controller.
......................................................................

Patch Set 3:

(12 comments)

Looks good. I've only looked at the implementation for now; I'll look at the tests next.

http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@14
PS3, Line 14: The new configuration parameters are:
Mention that a value of zero means it won't be used and will default to the non-scalable param.

http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@15
PS3, Line 15: Max Running
nit: use "Max Requests" to be consistent.

http://gerrit.cloudera.org:8080/#/c/13307/3//COMMIT_MSG@26
PS3, Line 26: the Multiple of the current total number of
           : running Impalads which gives the maximum memory available across the
           : cluster for the pool.
Can you elaborate on what this number is, like you did for the ones above? Also mention the unit (bytes), and update the same in the Thrift definition.

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h
File be/src/scheduling/admission-controller.h:

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.h@603
PS3, Line 603: cluster_size_snapshot
nit: maybe use something like latest_cluster_size. The "snapshot" in the name threw me off a bit, since I expected a struct or something instead of just the size. Feel free to ignore this comment.

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc
File be/src/scheduling/admission-controller.cc:

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1000
PS3, Line 1000: // Use a heuristic to limit the number of requests we dequeue locally to avoid all
             : // impalads dequeuing too many requests at the same time. This is based on the
             : // max_requests_estimate limit and the current queue size.
             : We will attempt to
             : // dequeue up to this number of requests until reaching the per-pool memory limit.
nit: if this functionality is completely moved over to a method, maybe we can mention all of this in the method comment and remove it here.

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1007
PS3, Line 1007: if (!status.ok()) continue; // to next pool
> I think the old behaviour had the advantage of logging something if nothing
I agree; I would prefer returning max_to_dequeue itself and handling the <= 0 case.

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1071
PS3, Line 1071: total_available
This can still be less than or equal to zero, right? For example, in the case of over-admission. If so, then we can get rid of the branching based on PoolHasFixedRunningQueriesCountLimit.

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1121
PS3, Line 1121: "Invalid pool config: the min_query_mem_limit $0 is greater than
It seems like this case can now happen if the cluster size changes (an external reason), so calling this an invalid config might not be right. Do you think it makes sense to mention both causes (invalid config and low cluster size) and offer ways to fix it, like we do in REASON_MEM_LIMIT_TOO_LOW_FOR_RESERVATION?

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1396
PS3, Line 1396: GetKnownBackends
This might be a problem if some backends are shutting down: the query won't be scheduled on them, but the max memory used is still high, causing over-admission. If instead we only look at the size of the active backends, that would be a problem for queries already scheduled to run on the backends that are shutting down. @all: do you think it makes sense to handle this transient period right now? It could be a problem if a group of executors have started shutting down at the same time.
Another solution could be to use the cluster size from the QuerySchedule itself, but that has its own set of problems. Any thoughts?

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1403
PS3, Line 1403: max_memory_multiple
Just thinking out loud: should we check whether this multiple is greater than executor.mem_to_admit for each executor? In that case this would be an invalid config. Maybe not, since there is currently nothing preventing us from setting a higher pool_config.max_mem_resources either.

http://gerrit.cloudera.org:8080/#/c/13307/3/be/src/scheduling/admission-controller.cc@1444
PS3, Line 1444: &&
Should be OR (||).

http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/ImpalaInternalService.thrift
File common/thrift/ImpalaInternalService.thrift:

http://gerrit.cloudera.org:8080/#/c/13307/3/common/thrift/ImpalaInternalService.thrift@685
PS3, Line 685:
nit: mention what a value of 0 means for all of these.

--
To view, visit http://gerrit.cloudera.org:8080/13307
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If47508728124076f3b9200c27cffc989f7a4f188
Gerrit-Change-Number: 13307
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Andrew Sherman <asher...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bikramjeet....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 May 2019 18:22:03 +0000
Gerrit-HasComments: Yes
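P.S. For reference, the "zero means unused, fall back to the non-scalable param" semantics requested in the commit-message comments could be summarized by a sketch like the following. This is illustrative only: PoolMaxMemBytes and its parameter names are hypothetical and are not the actual code in the patch.

```cpp
#include <cstdint>

// Hypothetical sketch (not the patch's actual code) of the scalable pool
// memory limit discussed above.
//
// max_mem_resources:   fixed (non-scalable) per-pool memory limit, in bytes.
// max_memory_multiple: bytes of pool memory allowed per running impalad;
//                      a value of 0 means "not used", so the pool falls back
//                      to the non-scalable max_mem_resources limit.
// cluster_size:        current total number of running impalads.
int64_t PoolMaxMemBytes(int64_t max_mem_resources, int64_t max_memory_multiple,
                        int64_t cluster_size) {
  if (max_memory_multiple <= 0) return max_mem_resources;
  // Scalable case: the limit grows linearly with the cluster size.
  return max_memory_multiple * cluster_size;
}
```

The point of the scalable form is that the pool's aggregate limit tracks the number of live impalads, which is also why the choice of cluster-size source (GetKnownBackends vs. active backends vs. QuerySchedule) matters during shutdown transients.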