[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 10: (9 comments) Thanks Michael. Uploading PS 11 http://gerrit.cloudera.org:8080/#/c/12579/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12579/10//COMMIT_MSG@7 PS10, Line 7: Reject new connections after --fe_service_threads > Does this need to be updated too ? Done http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h File be/src/rpc/TAcceptQueueServer.h: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h@65 PS10, Line 65: timeout > nit: timeout_ms Done http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h@116 PS10, Line 116: /// Amount of time after which a connection request will be timed out. > in milliseconds Done http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.cpp@263 PS10, Line 263: if (queue_timeout_ms_ != 0) : entry->expiration_time_ = MonotonicMillis() + queue_timeout_ms_; > May need to guard against negative value although it's user's fault in that Done http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h File be/src/rpc/thrift-server.h: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@149 PS10, Line 149: int64_t queue_timeout_ms = 0); > May make sense to document the meaning of this argument too. Done http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@197 PS10, Line 197: /// Amount of time an accepted client connection will be kept in the accept : /// queue before it is timed out. Used in TAcceptQueueServer. > May want to document that if it's 0, it means there is no timeout (similar Done http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@277 PS10, Line 277: } > nit: missing blank line after this function. Done http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/service/impala-server.cc@239 PS10, Line 239: accepted_cnxn_timeout > Should this have a more specific name such as "accepted_client_cnxn_timeout Changed. http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py@90 PS10, Line 90: except Exception as e: > How do we catch cases in which the test passes without raising the exceptio The test will be marked as failed in that case (XFAIL). Added an else block with client.close() to not make test verify step wait. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 10 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 26 Mar 2019 22:58:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 10: (9 comments) LGTM. Mostly nits. http://gerrit.cloudera.org:8080/#/c/12579/10//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12579/10//COMMIT_MSG@7 PS10, Line 7: Reject new connections after --fe_service_threads Does this need to be updated too ? http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h File be/src/rpc/TAcceptQueueServer.h: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h@65 PS10, Line 65: timeout nit: timeout_ms http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.h@116 PS10, Line 116: /// Amount of time after which a connection request will be timed out. in milliseconds http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/TAcceptQueueServer.cpp@263 PS10, Line 263: if (queue_timeout_ms_ != 0) : entry->expiration_time_ = MonotonicMillis() + queue_timeout_ms_; May need to guard against negative value although it's user's fault in that case. if (queue_timeout_ms > 0) { } http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h File be/src/rpc/thrift-server.h: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@149 PS10, Line 149: int64_t queue_timeout_ms = 0); May make sense to document the meaning of this argument too. nit: this line seems to fit into the previous line too. http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@197 PS10, Line 197: /// Amount of time an accepted client connection will be kept in the accept : /// queue before it is timed out. Used in TAcceptQueueServer. May want to document that if it's 0, it means there is no timeout (similar to max_concurrent_connections_ above). http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/rpc/thrift-server.h@277 PS10, Line 277: } nit: missing blank line after this function. http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12579/10/be/src/service/impala-server.cc@239 PS10, Line 239: accepted_cnxn_timeout Should this have a more specific name such as "accepted_client_cnxn_timeout" as we use Thrift server for various internal services too ? http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/10/tests/custom_cluster/test_frontend_connection_limit.py@90 PS10, Line 90: except Exception as e: How do we catch cases in which the test passes without raising the exception ? -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 10 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 26 Mar 2019 22:05:22 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 10: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2551/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 10 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 26 Mar 2019 20:59:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 10: Code-Review+1 Carrying Andrew's +1 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 10 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 26 Mar 2019 20:20:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Hello Michael Ho, Andrew Sherman, Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12579 to look at the new patch set (#10). Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. IMPALA-7800: Reject new connections after --fe_service_threads The current implementation of the FE thrift server waits indefinitely to open the new session, if the maximum number of FE service threads specified by --fe_service_threads has been allocated. This patch introduces a startup flag to control how the server should treat new connection requests if we have run out of the configured number of server threads. If --accepted_cnxn_timeout > 0, new connection requests are rejected by the server if we can't get a server thread within the specified timeout. We set the default timeout to be 5 minutes. The old behavior can be restored by setting --accepted_cnxn_timeout=0, i.e., no timeout. The timeout applies only to client facing thrift servers, i.e., HS2 and Beeswax servers. Testing: Added a new custom cluster test suite to exercise the new code. Ran core and exhaustive tests. Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/service/impala-server.cc M common/thrift/metrics.json A tests/custom_cluster/test_frontend_connection_limit.py 7 files changed, 248 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12579/10 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 10 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 8: (5 comments) Thanks for your comments, Andrew. Please see PS 9. http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.h File be/src/rpc/TAcceptQueueServer.h: http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.h@97 PS8, Line 97: /// Name of the thrift server > Add terminating period (nit). Done http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.h@113 PS8, Line 113: /// Number of connections rejected due to timeout > Add terminating period (nit). Done http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.cpp@220 PS8, Line 220: } catch (string s) { > I know this isn't part of this change, but can we really ever get string ex Good question. I am not certain about string exceptions, but the intent here seems to be to catch non-thrift exceptions. http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/service/impala-server.cc@241 PS8, Line 241: "the queue before we time it out and reject the connection request. A value of 0 " > In the flags that configure the accept queue we call it "the post-accept, p Changed the wording. Please let me know if this looks better. http://gerrit.cloudera.org:8080/#/c/12579/8/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/8/tests/custom_cluster/test_frontend_connection_limit.py@89 PS8, Line 89: raise ImpalaBeeswaxException(e.message, e) > I am confused about this. Why do we raise an exception here? I see that th We need to catch the exception to close() the session so that the query gets unregistered, and re-raise the exception. Otherwise, the async query hangs around and makes the test fail. The comment in the above line tries to allude to that. Please let me know if I should expand the comment. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 8 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 25 Mar 2019 21:34:14 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Andrew Sherman has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 8: (5 comments) All I have are nits and questions http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.h File be/src/rpc/TAcceptQueueServer.h: http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.h@97 PS8, Line 97: /// Name of the thrift server Add terminating period (nit). http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.h@113 PS8, Line 113: /// Number of connections rejected due to timeout Add terminating period (nit). http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/rpc/TAcceptQueueServer.cpp@220 PS8, Line 220: } catch (string s) { I know this isn't part of this change, but can we really ever get string exceptions? http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12579/8/be/src/service/impala-server.cc@241 PS8, Line 241: "the queue before we time it out and reject the connection request. A value of 0 " In the flags that configure the accept queue we call it "the post-accept, pre-setup connection queue", which is rather verbose, but do we want to use the same wording for consistency? Also the description might be clearer if it was rewritten to not use "we" (this is a definite nit) http://gerrit.cloudera.org:8080/#/c/12579/8/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/8/tests/custom_cluster/test_frontend_connection_limit.py@89 PS8, Line 89: raise ImpalaBeeswaxException(e.message, e) I am confused about this. Why do we raise an exception here? I see that the test is marked xfail(raises=ImpalaBeeswaxException)but doesn't make the test fragile in the case where some other bit of code raises a ImpalaBeeswaxException? Maybe this is a python testing pattern I am unfamiliar with. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 8 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Mon, 25 Mar 2019 18:00:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 7: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2461/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 7 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 19 Mar 2019 03:17:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 8: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2462/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 8 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 19 Mar 2019 03:21:32 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Hello Michael Ho, Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12579 to look at the new patch set (#8). Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. IMPALA-7800: Reject new connections after --fe_service_threads The current implementation of the FE thrift server waits indefinitely to open the new session, if the maximum number of FE service threads specified by --fe_service_threads has been allocated. This patch introduces a startup flag to control how the server should treat new connection requests if we have run out of the configured number of server threads. If --accepted_cnxn_timeout > 0, new connection requests are rejected by the server if we can't get a server thread within the specified timeout. We set the default timeout to be 5 minutes. The old behavior can be restored by setting --accepted_cnxn_timeout=0, i.e., no timeout. The timeout applies only to client facing thrift servers, i.e., HS2 and Beeswax servers. Testing: Added a new custom cluster test suite to exercise the new code. Ran core and exhaustive tests. Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/service/impala-server.cc M common/thrift/metrics.json A tests/custom_cluster/test_frontend_connection_limit.py 7 files changed, 245 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12579/8 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 8 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/12579/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12579/7/be/src/service/impala-server.cc@240 PS7, Line 240: "(Advanced) The amount of time in milliseconds an accepted connection will wait in the " line too long (92 > 90) -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 7 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 19 Mar 2019 02:37:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 7: (1 comment) Fixed http://gerrit.cloudera.org:8080/#/c/12579/7/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12579/7/be/src/service/impala-server.cc@240 PS7, Line 240: "(Advanced) The amount of time in milliseconds an accepted connection will wait in the " > line too long (92 > 90) Done -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 7 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 19 Mar 2019 02:39:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Hello Michael Ho, Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12579 to look at the new patch set (#7). Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. IMPALA-7800: Reject new connections after --fe_service_threads The current implementation of the FE thrift server waits indefinitely to open the new session, if the maximum number of FE service threads specified by --fe_service_threads has been allocated. This patch introduces a startup flag to control how the server should treat new connection requests if we have run out of the configured number of server threads. If --accepted_cnxn_timeout > 0, new connection requests are rejected by the server if we can't get a server thread within the specified timeout. We set the default timeout to be 5 minutes. The old behavior can be restored by setting --accepted_cnxn_timeout=0, i.e., no timeout. The timeout applies only to client facing thrift servers, i.e., HS2 and Beeswax servers. Testing: Added a new custom cluster test suite to exercise the new code. Ran core and exhaustive tests. Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M be/src/rpc/thrift-server.cc M be/src/rpc/thrift-server.h M be/src/service/impala-server.cc M common/thrift/metrics.json A tests/custom_cluster/test_frontend_connection_limit.py 7 files changed, 245 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12579/7 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 7 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: (6 comments) Addressed your comments, Michael. Please have another look. Thanks. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.h File be/src/rpc/TAcceptQueueServer.h: http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.h@88 PS6, Line 88: void CleanupAndClose(const std::string& error, > Please add a brief comment about what this function does and what those par Done http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@29 PS6, Line 29: DEFINE_int64(accepted_cnxn_timeout, 300, : "(Advanced) The amount of time in seconds an accepted connection will wait in the " : "queue before we time it out and reject the connection request. A value of 0 " : "means there is no timeout."); > Sorry but now that I realize this may affect all existing Thrift based serv Changed the implementation to pass the timeout value via the constructor chain from ImpalaServer. Now only HS2 and Beeswax servers have the timeout. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@195 PS6, Line 195: wait_time > nit: coding convention in Impala usually keeps the unit as the suffix of th Done http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@205 PS6, Line 205: msecs > nit: we either use ms or milliseconds. Done http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@207 PS6, Line 207: (wait_result == THRIFT_ETIMEDOUT) > Do we care about any potential non-zero result value ? Should we bail out i It does not seem to return any other non-zero value besides THRIFT_TIMEDOUT. https://github.com/apache/thrift/blob/master/lib/cpp/src/thrift/concurrency/Monitor.cpp#L78 http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@265 PS6, Line 265: if (FLAGS_accepted_cnxn_timeout != 0) : entry->expiration_time_ = : MonotonicMillis() + FLAGS_accepted_cnxn_timeout * MILLIS_PER_SEC; > nit styling: Done -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 19 Mar 2019 02:36:16 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: (6 comments) Looking good. Please see some more comments. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.h File be/src/rpc/TAcceptQueueServer.h: http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.h@88 PS6, Line 88: void CleanupAndClose(const std::string& error, Please add a brief comment about what this function does and what those parameters are. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@29 PS6, Line 29: DEFINE_int64(accepted_cnxn_timeout, 300, : "(Advanced) The amount of time in seconds an accepted connection will wait in the " : "queue before we time it out and reject the connection request. A value of 0 " : "means there is no timeout."); Sorry but now that I realize this may affect all existing Thrift based services, I would think it's safe to only enable this for external client facing Thrift services (e.g. HS2, Beeswax). For Catalog and statestore and Impala internal connections, it may be better to leave it as no timeout. In other words, it may make sense to have this timeout as part of the constructor of TAcceptQueueServer and create flags only for HS2 and Beeswax related Thrift services. Sorry for not noticing this earlier. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@195 PS6, Line 195: wait_time nit: coding convention in Impala usually keeps the unit as the suffix of the variable name so this can be wait_time_ms; http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@205 PS6, Line 205: msecs nit: we either use ms or milliseconds. http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@207 PS6, Line 207: (wait_result == THRIFT_ETIMEDOUT) Do we care about any potential non-zero result value ? Should we bail out in those cases too ? http://gerrit.cloudera.org:8080/#/c/12579/6/be/src/rpc/TAcceptQueueServer.cpp@265 PS6, Line 265: if (FLAGS_accepted_cnxn_timeout != 0) : entry->expiration_time_ = : MonotonicMillis() + FLAGS_accepted_cnxn_timeout * MILLIS_PER_SEC; nit styling: if (FLAGS_... != 0) { } -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 12 Mar 2019 01:50:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2362/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 06 Mar 2019 00:32:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Hello Michael Ho, Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12579 to look at the new patch set (#6). Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. IMPALA-7800: Reject new connections after --fe_service_threads The current implementation of the FE thrift server waits indefinitely to open the new session, if the maximum number of FE service threads specified by --fe_service_threads has been allocated. This patch introduces a startup flag to control how the server should treat new connection requests if we have run out of the configured number of server threads. If --accepted_cnxn_timeout > 0, new connection requests are rejected by the server if we can't get a server thread within the specified timeout. We set the default timeout to be 300 seconds. The old behavior can be restored by setting --accepted_cnxn_timeout=0, i.e., no timeout. Testing: Added a new custom cluster test suite to exercise the new code. Ran core and exhaustive tests. Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M common/thrift/metrics.json A tests/custom_cluster/test_frontend_connection_limit.py 4 files changed, 212 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12579/6 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@198 PS5, Line 198: if (entry->expiration_time_ != 0) : wait_time = entry->expiration_time_ - MonotonicMillis(); > styling nit: Done http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@200 PS5, Line 200: LOG(INFO) << "All " << maxTasks_ << " server threads are in use. " : << "Waiting for " << wait_time << " msecs."; > Does it make sense to log only if we are timing out a request below ? Why I am keeping it here is so we can diagnose if clients are potentially waiting for long periods for fe threads, even if timeout was infinite. A purely diagnostic log. So we could say, "If you want client connections to fail faster, please set connection timeout to x seconds". I can use a LOG_EVERY_N kind of macro to reduce the log pollution. Let me know what you think. http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@263 PS5, Line 263: MonotonicMillis() + FLAGS_accepted_cnxn_timeout * MILLIS_PER_SEC; > nit: indent 4 Done http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@267 PS5, Line 267: connection_setup_pool.Offer( > In theory, this could block too but it's still counted towards the expirati This is a very interesting question. Regardless of where we spent time, expiration_time_ could've elapsed by the time we got around to the wait in SetupConnection. I'm leaning towards waiting for a MAX(1, expiration_time_ - MonotonicMillis()). What do you think? http://gerrit.cloudera.org:8080/#/c/12579/5/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/5/tests/custom_cluster/test_frontend_connection_limit.py@1 PS5, Line 1: > nit: blank line Done -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 5 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 05 Mar 2019 23:46:51 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 5: (5 comments) http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@198 PS5, Line 198: if (entry->expiration_time_ != 0) : wait_time = entry->expiration_time_ - MonotonicMillis(); styling nit: if (entry->expiration_time_ != 0) { wait_time = entry->expiration_time_ - MonotonicMillis(); } http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@200 PS5, Line 200: LOG(INFO) << "All " << maxTasks_ << " server threads are in use. " : << "Waiting for " << wait_time << " msecs."; Does it make sense to log only if we are timing out a request below ? http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@263 PS5, Line 263: MonotonicMillis() + FLAGS_accepted_cnxn_timeout * MILLIS_PER_SEC; nit: indent 4 http://gerrit.cloudera.org:8080/#/c/12579/5/be/src/rpc/TAcceptQueueServer.cpp@267 PS5, Line 267: connection_setup_pool.Offer( In theory, this could block too but it's still counted towards the expiration_time_. So, is it possible for the code to reach line 199 above and compute a negative wait_time value ? http://gerrit.cloudera.org:8080/#/c/12579/5/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/5/tests/custom_cluster/test_frontend_connection_limit.py@1 PS5, Line 1: nit: blank line -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 5 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 01 Mar 2019 23:30:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 4: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2311/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 4 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 01 Mar 2019 01:14:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Hello Michael Ho, Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12579 to look at the new patch set (#5). Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. IMPALA-7800: Reject new connections after --fe_service_threads The current implementation of the FE thrift server waits indefinitely to open the new session, if the maximum number of FE service threads specified by --fe_service_threads has been allocated. This patch introduces a startup flag to control how the server should treat new connection requests if we have run out of the configured number of server threads. If --accepted_cnxn_timeout > 0, new connection requests are rejected by the server if we can't get a server thread within the specified timeout. We set the default timeout to be 300 seconds. The old behavior can be restored by setting --accepted_cnxn_timeout=0, i.e., no timeout. Testing: Added a new custom cluster test suite to exercise the new code. Ran core and exhaustive tests. Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M common/thrift/metrics.json A tests/custom_cluster/test_frontend_connection_limit.py 4 files changed, 209 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12579/5 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 5 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 3: (10 comments) Thanks for the review. Uploading PS 4 http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@28 PS3, Line 28: 120, > Will it be safer to have a higher default timeout (e.g. 300 seconds) to avo I've thought about this a bit, and came up with this value which I think is at the threshold of tolerance for a user trying to run an interactive query. But 300s is fine for me to. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@151 PS3, Line 151: const string& error) > nit: we tend to put the constant parameters followed by non-constant parame Done http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@193 PS3, Line 193: 0LL > nit: 'LL' not needed. Done http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@196 PS3, Line 196: if (entry->expiration_time_) > nit: if (entry->expiration_time_ != 0) { Done http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@197 PS3, Line 197: wait_time = entry->expiration_time_ - MonotonicMillis(); > Shouldn't this be inside the while loop below ? Done http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@199 PS3, Line 199: LOG(INFO) << "All " << maxTasks_ << " server threads are in use. " : << "Waiting for " << wait_time << " msecs."; > Will this lead to log spam ? Potentially, if we're constantly maxing out fe threads and new connections keep coming. I can reduce the log frequency. Let me know if you feel strongly about this. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@204 PS3, Line 204: Timing out connection request."; > Any chance we can print some details (e.g. IP address, port) here ? I haven't seen a 'simple' way to do this, but definitely would like to add more info about the client as well as 'which' server this is, e.g., beeswax or hs2. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@239 PS3, Line 239: shared_ptr > This can be "unique_ptr" now that TTransport lives inside TAcceptQueueEntry Tried this, but since unique_ptr is not copyable compilation fails at line 242 (calling a destroyed object). Tried to massage it a bit, but readability gets compromised. Will stick with shared_ptr for now. The shared_ptr gets moved to the SetupConnection thread anyway so hopefully this is not too obscure. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@260 PS3, Line 260: if (FLAGS_accepted_cnxn_timeout) > nit: if (FLAGS_accepted_cnxn_timeout != 0) { Done http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@261 PS3, Line 261: MILLIS_PER_SEC; > long line. Done -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 01 Mar 2019 00:38:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Hello Michael Ho, Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12579 to look at the new patch set (#4). Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. IMPALA-7800: Reject new connections after --fe_service_threads The current implementation of the FE thrift server waits indefinitely to open the new session, if the maximum number of FE service threads specified by --fe_service_threads has been allocated. This patch introduces a startup flag to control how the server should treat new connection requests if we have run out of the configured number of server threads. If --accepted_cnxn_timeout > 0, new connection requests are rejected by the server if we can't get a server thread within the specified timeout. We set the default timeout to be 120 seconds. The old behavior can be restored by setting --accepted_cnxn_timeout=0, i.e., no timeout. Testing: Added a new custom cluster test suite to exercise the new code. Ran core and exhaustive tests. Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M common/thrift/metrics.json A tests/custom_cluster/test_frontend_connection_limit.py 4 files changed, 209 insertions(+), 28 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12579/4 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 4 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 3: (10 comments) http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@28 PS3, Line 28: 120, Will it be safer to have a higher default timeout (e.g. 300 seconds) to avoid users' complaint given the current behavior is that we don't have a timeout ? Please also add a remark that setting it to 0 means there is no timeout. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@151 PS3, Line 151: const string& error) nit: we tend to put the constant parameters followed by non-constant parameters. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@193 PS3, Line 193: 0LL nit: 'LL' not needed. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@196 PS3, Line 196: if (entry->expiration_time_) nit: if (entry->expiration_time_ != 0) { http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@197 PS3, Line 197: wait_time = entry->expiration_time_ - MonotonicMillis(); Shouldn't this be inside the while loop below ? http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@199 PS3, Line 199: LOG(INFO) << "All " << maxTasks_ << " server threads are in use. " : << "Waiting for " << wait_time << " msecs."; Will this lead to log spam ? http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@204 PS3, Line 204: Timing out connection request."; Any chance we can print some details (e.g. IP address, port) here ? http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@239 PS3, Line 239: shared_ptr This can be "unique_ptr" now that TTransport lives inside TAcceptQueueEntry. This makes ownership easier to understand. http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@260 PS3, Line 260: if (FLAGS_accepted_cnxn_timeout) nit: if (FLAGS_accepted_cnxn_timeout != 0) { http://gerrit.cloudera.org:8080/#/c/12579/3/be/src/rpc/TAcceptQueueServer.cpp@261 PS3, Line 261: MILLIS_PER_SEC; long line. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 27 Feb 2019 22:26:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 3: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 27 Feb 2019 00:04:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3835/ DRY_RUN=true -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 26 Feb 2019 19:52:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2255/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 26 Feb 2019 19:15:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Hello Michael Ho, Thomas Marshall, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12579 to look at the new patch set (#3). Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. IMPALA-7800: Reject new connections after --fe_service_threads The current implementation of the FE thrift server waits indefinitely to open the new session, if the maximum number of FE service threads specified by --fe_service_threads has been allocated. This patch introduces a startup flag to control how the server should treat new connection requests if we have run out of the configured number of server threads. If --accepted_cnxn_timeout > 0, new connection requests are rejected by the server if we can't get a server thread within the specified timeout. We set the default timeout to be 120 seconds. The old behavior can be restored by setting --accepted_cnxn_timeout=0, i.e., no timeout. Testing: Added a new custom cluster test suite to exercise the new code. Ran core and exhaustive tests. Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M common/thrift/metrics.json A tests/custom_cluster/test_frontend_connection_limit.py 4 files changed, 204 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12579/3 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 3 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 2: (11 comments) Fixed http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@19 PS2, Line 19: import sys > flake8: F401 'sys' imported but unused Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@23 PS2, Line 23: from getpass import getuser > flake8: F401 'getpass.getuser' imported but unused Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@64 PS2, Line 64: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@64 PS2, Line 64: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@64 PS2, Line 64: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@64 PS2, Line 64: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@65 PS2, Line 65: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@65 PS2, Line 65: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@65 PS2, Line 65: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@65 PS2, Line 65: > flake8: E251 unexpected spaces around keyword / parameter equals Done http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@91 PS2, Line 91: > flake8: E261 at least two spaces before inline comment Done -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 26 Feb 2019 18:39:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/2246/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 26 Feb 2019 03:46:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12579 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 2: (11 comments) http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@19 PS2, Line 19: import sys flake8: F401 'sys' imported but unused http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@23 PS2, Line 23: from getpass import getuser flake8: F401 'getpass.getuser' imported but unused http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@64 PS2, Line 64: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@64 PS2, Line 64: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@64 PS2, Line 64: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@64 PS2, Line 64: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@65 PS2, Line 65: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@65 PS2, Line 65: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@65 PS2, Line 65: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@65 PS2, Line 65: flake8: E251 unexpected spaces around keyword / parameter equals http://gerrit.cloudera.org:8080/#/c/12579/2/tests/custom_cluster/test_frontend_connection_limit.py@91 PS2, Line 91: flake8: E261 at least two spaces before inline comment -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Tue, 26 Feb 2019 03:16:34 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12579 Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. IMPALA-7800: Reject new connections after --fe_service_threads The current implementation of the FE thrift server waits indefinitely to open the new session, if the maximum number of FE service threads specified by --fe_service_threads has been allocated. This patch introduces a startup flag to control how the server should treat new connection requests if we have run out of the configured number of server threads. If --accepted_cnxn_timeout > 0, new connection requests are rejected by the server if we can't get a server thread within the specified timeout. We set the default timeout to be 120 seconds. The old behavior can be restored by setting --accepted_cnxn_timeout=0, i.e., no timeout. Testing: Added a new custom cluster test suite to exercise the new code. Ran core and exhaustive tests. Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 --- M be/src/rpc/TAcceptQueueServer.cpp M be/src/rpc/TAcceptQueueServer.h M common/thrift/metrics.json A tests/custom_cluster/test_frontend_connection_limit.py 4 files changed, 206 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/79/12579/2 -- To view, visit http://gerrit.cloudera.org:8080/12579 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idb345c1d84cc2f691f54ded467f253e758f87e64 Gerrit-Change-Number: 12579 Gerrit-PatchSet: 2 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has abandoned this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Abandoned Changing the design. -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: > I meant "the timeout can also be turned off by setting it to 0 or > something by user". We will have a non-zero timeout by default. Given the feedback so far, I wonder if we need to go back to the drawing board and re-think the approach? -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Wed, 23 Jan 2019 20:54:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has posted comments on this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: (6 comments) > > (5 comments) > > > > The historical tidbit that struck my interests is the following: > > > > * This helps solve IMPALA-4135, where connections were timing out > > while waiting in the > > * OS accept queue, by ensuring that accept() is called as quickly > > as possible. > > */ > > class TAcceptQueueServer : public TServer { > > > > So, at some point in the past, users, when connecting to a busy > > impala, would hit a timeout. > > > (5 comments) > > > > The historical tidbit that struck my interests is the following: > > > > * This helps solve IMPALA-4135, where connections were timing out > > while waiting in the > > * OS accept queue, by ensuring that accept() is called as quickly > > as possible. > > */ > > class TAcceptQueueServer : public TServer { > > > > So, at some point in the past, users, when connecting to a busy > > impala, would hit a timeout. > > For some more context, IMPALA-4135 was addressing an issue where > backend connections would time out (in particular, due to queries > that contained a large number of fragments), I don't think we > particularly considered the impact on client connections. > > Now that we're transitioning the backend connections to krpc, the > original motivation for TAcceptQueueServer is becoming obsolete Thanks Thomas for sharing the historical context. Given what you said here, and the fact that we now use KRPC for the backend, I wonder if we could just revert to using TServer for the client facing servers? http://gerrit.cloudera.org:8080/#/c/12226/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12226/6//COMMIT_MSG@18 PS6, Line 18: By default we retain the current behavior. If > What's the point of this work if we don't change the default? When do we ex Great question! The motivation for setting it off my default was to 'preserve' the current blocking behavior of the HS2 client connection request; i.e., just to make sure we don't trigger some unintended consequences. Purely from a technical perspective, I'd rather have it on by default so we get the non-blocking server behavior. I'd be happy to turn it on by default. Let me know. http://gerrit.cloudera.org:8080/#/c/12226/6/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12226/6/be/src/rpc/TAcceptQueueServer.cpp@166 PS6, Line 166: if (FLAGS_reject_request_on_max_fe_threads) { : Synchronized s(tasksMonitor_); : if (maxTasks_ > 0 && tasks_.size() >= maxTasks_) { : LOG(INFO) << "All " << maxTasks_ << " server threads are in use. " : << "Rejecting new connection request."; : throw TTransportException("Server busy. Please try again later."); : } : } > Regardless of the value of FLAGS_reject_request_on_max_fe_threads, do you w I think it makes sense to warn so that we can tell what's going on. Even if we export these metrics we don't always get those in support cases. http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py@35 PS6, Line 35: statement = "select * from tpch_parquet.lineitem limit 5" > Please use a query with a "sleep(...)" in it. 'select * limit 5" can execut Done http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py@47 PS6, Line 47: query = self.statement.format() > I don't think you need the .format(), either here or in line 99. Done http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py@49 PS6, Line 49: client.execute(query) > Come to think of it, I don't even think you need to do a query. Just connec create_impala_client() does not seem to throw an exception when the server rejects the connection. I need to execute_query step to detect the connection failure... http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py@50 PS6, Line 50: except Exception as e: : print 'Unexpected exception: ' + e.message : sys.exit(1) > It's not nice to sys.exit() from tests. The exception is enough. Just let i Done -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-N
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: I meant "the timeout can also be turned off by setting it to 0 or something by user". We will have a non-zero timeout by default. -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Wed, 23 Jan 2019 01:51:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: It seems like a more gradual approach to tackling IMPALA-7800 is to implement a timeout instead of a boolean flag which rejects the new connection right away. The behavior of instant rejection can be emulated by having a really short timeout and we can have a large default timeout. The timeout should also be disabled by setting it to 0 or something. -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Wed, 23 Jan 2019 01:48:45 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Thomas Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: > (5 comments) > > The historical tidbit that struck my interests is the following: > > * This helps solve IMPALA-4135, where connections were timing out > while waiting in the > * OS accept queue, by ensuring that accept() is called as quickly > as possible. > */ > class TAcceptQueueServer : public TServer { > > So, at some point in the past, users, when connecting to a busy > impala, would hit a timeout. > (5 comments) > > The historical tidbit that struck my interests is the following: > > * This helps solve IMPALA-4135, where connections were timing out > while waiting in the > * OS accept queue, by ensuring that accept() is called as quickly > as possible. > */ > class TAcceptQueueServer : public TServer { > > So, at some point in the past, users, when connecting to a busy > impala, would hit a timeout. For some more context, IMPALA-4135 was addressing an issue where backend connections would time out (in particular, due to queries that contained a large number of fragments), I don't think we particularly considered the impact on client connections. Now that we're transitioning the backend connections to krpc, the original motivation for TAcceptQueueServer is becoming obsolete -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Mon, 21 Jan 2019 20:27:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py@49 PS6, Line 49: client.execute(query) Come to think of it, I don't even think you need to do a query. Just connecting eats the thread on the server side, so I don't think you need threading in the test at all. Just make 5 impala clients, and they'll each eat the thread. -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Fri, 18 Jan 2019 18:00:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: (5 comments) The historical tidbit that struck my interests is the following: * This helps solve IMPALA-4135, where connections were timing out while waiting in the * OS accept queue, by ensuring that accept() is called as quickly as possible. */ class TAcceptQueueServer : public TServer { So, at some point in the past, users, when connecting to a busy impala, would hit a timeout. For our purposes, let's pretend that "busy" is that all fe_service_threads are handling a session, and those sessions are ... lasting forever. Then, we decided, with IMPALA-4135, to make that "connect timeout" not work, by accepting the connection right away, and users would block. And now, we have an option to go full circle, where the user will get rejected right away, albeit without a timeout, and with an error message. Part of the problem stems from the fact that "idle" sessions still eat up a thread, at least for impala-shell. (I ran bin/start-impala-cluster.py --impalad_args=-fe_service_threads=1 and ran two impala-shell's. One of them is just stuck, with the TCP connection opened, but progress blocked in "self.imp_service.PingImpalaService()" from the perspective of impala-shell.) What's the motivation to make this change, and, also to default it off? We can already tell that a user has N sessions because they're exposed in the UI. I think the metrics impala.thrift-server.beeswax-frontend.connection-setup-queue-size impala.thrift-server.beeswax-frontend.connections-in-use impala.thrift-server.beeswax-frontend.total-connections might be somewhat insufficient to tell us we're in this state, but exposing the task queue size directly as a metric is something we can do. Today, without this flag, if there are many non-interactive, but well-behaved users, they can submit 256 queries concurrently, and only 64 of them will run at a time because of the connection limit, but everyone will eventually drain out. With this flag, we'll see cases where some of this queries get rejected. http://gerrit.cloudera.org:8080/#/c/12226/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12226/6//COMMIT_MSG@18 PS6, Line 18: By default we retain the current behavior. If What's the point of this work if we don't change the default? When do we expect it to be used? http://gerrit.cloudera.org:8080/#/c/12226/6/be/src/rpc/TAcceptQueueServer.cpp File be/src/rpc/TAcceptQueueServer.cpp: http://gerrit.cloudera.org:8080/#/c/12226/6/be/src/rpc/TAcceptQueueServer.cpp@166 PS6, Line 166: if (FLAGS_reject_request_on_max_fe_threads) { : Synchronized s(tasksMonitor_); : if (maxTasks_ > 0 && tasks_.size() >= maxTasks_) { : LOG(INFO) << "All " << maxTasks_ << " server threads are in use. " : << "Rejecting new connection request."; : throw TTransportException("Server busy. Please try again later."); : } : } Regardless of the value of FLAGS_reject_request_on_max_fe_threads, do you want to (a) warn that tasks are sitting the queue in the log? Note that we already have queue_size_metric_, so maybe it's already monitored. http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py File tests/custom_cluster/test_frontend_connection_limit.py: http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py@35 PS6, Line 35: statement = "select * from tpch_parquet.lineitem limit 5" Please use a query with a "sleep(...)" in it. 'select * limit 5" can execute very quickly, and your test_server_busy() test might be flaky because of it. http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py@47 PS6, Line 47: query = self.statement.format() I don't think you need the .format(), either here or in line 99. I also don't think you need the variable query, which you only use once, and is just self.statement. http://gerrit.cloudera.org:8080/#/c/12226/6/tests/custom_cluster/test_frontend_connection_limit.py@50 PS6, Line 50: except Exception as e: : print 'Unexpected exception: ' + e.message : sys.exit(1) It's not nice to sys.exit() from tests. The exception is enough. Just let it percolate out. -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12226 ) Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/1816/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Thomas Marshall Gerrit-Comment-Date: Fri, 18 Jan 2019 17:54:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7800: Reject new connections after --fe service threads
Zoram Thanga has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12226 Change subject: IMPALA-7800: Reject new connections after --fe_service_threads .. IMPALA-7800: Reject new connections after --fe_service_threads The current implementation of the FE thrift server waits indefinitely to open the new session, if the maximum number of FE service threads specified by --fe_service_threads has been allocated. This patch introduces a startup flag to control how the server should treat new connection requests if we have run out of the configured number of server threads. By default we retain the current behavior. If --reject_request_on_max_fe_threads=true, however, the new connection requests are rejected by the server. Testing: Added a new custom cluster test suite to exercise the new code. Ran core tests. Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 --- M be/src/rpc/TAcceptQueueServer.cpp A tests/custom_cluster/test_frontend_connection_limit.py 2 files changed, 115 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/26/12226/6 -- To view, visit http://gerrit.cloudera.org:8080/12226 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4b9d48aaff9e3b5854b5121798211c641c58a559 Gerrit-Change-Number: 12226 Gerrit-PatchSet: 6 Gerrit-Owner: Zoram Thanga