[jira] [Commented] (IMPALA-9154) KRPC DataStreamService threads blocked in PublishFilter
[ https://issues.apache.org/jira/browse/IMPALA-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979682#comment-16979682 ]

Michael Ho commented on IMPALA-9154:

Given the fix is non-trivial, it may make sense to back out the offending change for now.

> KRPC DataStreamService threads blocked in PublishFilter
> ---
>
> Key: IMPALA-9154
> URL: https://issues.apache.org/jira/browse/IMPALA-9154
> Project: IMPALA
> Issue Type: Bug
> Components: Distributed Exec
> Affects Versions: Impala 3.4.0
> Reporter: Tim Armstrong
> Assignee: Fang-Yu Rao
> Priority: Blocker
> Labels: hang
> Attachments: image-2019-11-13-08-30-27-178.png, pstack-exchange.txt
>
>
> I hit this on primitive_many_fragments when doing a single node perf run:
> {noformat}
> ./bin/single_node_perf_run.py --num_impalads=1 --scale=30 --ninja --workloads=targeted-perf --iterations=5
> {noformat}
> I noticed that the query was hung and the execution threads were hung sending
> row batches. Then looking at the RPCz page, all of the threads were busy:
> !image-2019-11-13-08-30-27-178.png!
> Multiple threads were stuck in UpdateFilter() - see [^pstack-exchange.txt].
> It looks like this is a deadlock bug because a KRPC thread is blocked waiting
> for an RPC that needs to be served by one of the limited threads from that
> same thread pool

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9154) KRPC DataStreamService threads blocked in PublishFilter
[ https://issues.apache.org/jira/browse/IMPALA-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980610#comment-16980610 ]

ASF subversion and git services commented on IMPALA-9154:

Commit e716e76cccf59c2780571429b1b945d6bbc61b8d in impala's branch refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=e716e76 ]

IMPALA-9154: Revert "IMPALA-7984: Port runtime filter from Thrift RPC to KRPC"

The previous patch, which ported runtime filters from Thrift RPC to KRPC, introduces a deadlock when only a very limited number of threads is available on the Impala cluster. Specifically, in that patch the Coordinator used a synchronous KRPC to propagate an aggregated filter to other hosts. A deadlock occurs if no thread is available on the receiving side to answer that KRPC, especially when the calling and receiving threads come from the same thread pool. One possible way to address this issue is to make the call that propagates a runtime filter asynchronous, which frees the calling thread. Until that fix is in place, we revert the patch.

This reverts commit ec11c18884988e838a8838e1e8ecc37461e1a138.

Change-Id: I32371a515fb607da396914502da8c7fb071406bc
Reviewed-on: http://gerrit.cloudera.org:8080/14780
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
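The deadlock described in the revert message can be reproduced in miniature. What follows is a minimal Python sketch, not Impala code: a `ThreadPoolExecutor` stands in for the KRPC service thread pool, and `publish_filter`, `update_filter_sync`, and `run` are hypothetical names chosen for illustration. A handler running on a pool thread blocks on a nested call that only the same pool can serve. A timeout is used so that the sketch reports the stall instead of hanging forever.

```python
import concurrent.futures as cf


def publish_filter():
    # Hypothetical stand-in for the receiving-side handler of the nested RPC.
    return "filter applied"


def update_filter_sync(pool, timeout_s):
    # Runs on a pool thread and then blocks on a nested call that must be
    # served by a thread from the same pool (the synchronous pattern).
    nested = pool.submit(publish_filter)
    try:
        return nested.result(timeout=timeout_s)
    except cf.TimeoutError:
        # A real synchronous RPC without a timeout would block here forever.
        return "stuck: no free thread to serve the nested call"


def run(num_threads, timeout_s=0.5):
    # num_threads plays the role of the service thread count (assumption:
    # one pool serves both the outer and the nested call).
    with cf.ThreadPoolExecutor(max_workers=num_threads) as pool:
        return pool.submit(update_filter_sync, pool, timeout_s).result()


print(run(1))
print(run(2))
```

With a single pool thread the nested call cannot start until the outer handler gives up, so `run(1)` reports the stall. With two threads a second worker picks up the nested call and `run(2)` completes normally.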
[jira] [Commented] (IMPALA-9154) KRPC DataStreamService threads blocked in PublishFilter
[ https://issues.apache.org/jira/browse/IMPALA-9154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019784#comment-17019784 ]

ASF subversion and git services commented on IMPALA-9154:

Commit 79aae231443a305ce8503dbc7b4335e8ae3f3946 in impala's branch refs/heads/master from Fang-Yu Rao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=79aae23 ]

IMPALA-9154: Make runtime filter propagation asynchronous

This patch fixes a bug introduced by IMPALA-7984, which ported the functions implementing the aggregation and propagation of runtime filters from Thrift RPC to KRPC. Specifically, in IMPALA-7984 the propagation of an aggregated runtime filter was implemented using a synchronous KRPC. Hence, when there is a very limited number of KRPC threads for Impala's data stream service, e.g., 1, a deadlock occurs if the node running the Coordinator tries to propagate the aggregated filter to itself, since no thread is available to receive the aggregated filter.

This patch makes the propagation of an aggregated runtime filter asynchronous to address the issue described above. To prevent the memory consumed by the aggregated filter from being reclaimed while the filter is still referenced by in-flight KRPCs, we add a field to the class Coordinator::FilterState that tracks the number of in-flight KRPCs propagating this aggregated filter, so that the memory is reclaimed only when all the associated KRPCs have completed. Moreover, when ReleaseExecResources() is invoked by the Coordinator to release all the resources associated with query execution, including the memory consumed by the aggregated runtime filters, we make sure that this memory is released only after the in-flight KRPCs associated with each aggregated filter have finished.

Testing:
- Passed primitive_many_fragments.test with the database tpch30 in an Impala minicluster started with the parameter --impalad_args=--datastream_service_num_svc_threads=1.
- Passed the exhaustive tests in the DEBUG build.
- Passed the core tests in the ASAN build.

Change-Id: Ifb6726d349be701f3a0602b2ad5a934082f188a0
Reviewed-on: http://gerrit.cloudera.org:8080/14975
Reviewed-by: Tim Armstrong
Tested-by: Impala Public Jenkins
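The in-flight counter described in the commit message above can be sketched as follows. This is an illustrative Python analogy under stated assumptions, not the actual C++ change: the `FilterState` class here only borrows the name of Coordinator::FilterState, and `num_inflight_rpcs`, `publish_async`, `release`, `deliver`, and `demo` are hypothetical names. Each asynchronous send increments a counter, each completion callback decrements it, and the payload is dropped only once the counter reaches zero.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FilterState:
    """Hypothetical stand-in for the aggregated filter and its bookkeeping."""

    def __init__(self, payload):
        self.payload = payload
        self.num_inflight_rpcs = 0
        self._cv = threading.Condition()

    def publish_async(self, pool, deliver, num_targets):
        # Queue one send per target and return without waiting for any reply,
        # so the calling thread is freed immediately.
        for target in range(num_targets):
            with self._cv:
                self.num_inflight_rpcs += 1
            pool.submit(deliver, target, self).add_done_callback(self._on_rpc_done)

    def _on_rpc_done(self, _future):
        # Completion callback: one fewer send still references the payload.
        with self._cv:
            self.num_inflight_rpcs -= 1
            self._cv.notify_all()

    def release(self):
        # Reclaim the payload only after every in-flight send has completed.
        with self._cv:
            self._cv.wait_for(lambda: self.num_inflight_rpcs == 0)
            self.payload = None


def demo(num_targets=3):
    received = []

    def deliver(target, state):
        # Reads the payload at delivery time; an early release would make
        # state.payload None and this call would fail.
        received.append((target, bytes(state.payload)))

    state = FilterState(bytearray(b"bloom"))
    # A single worker mirrors the one-service-thread setup used in testing.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(state.publish_async, pool, deliver, num_targets).result()
        state.release()
    return state, received
```

Calling `demo()` runs the publishing step on the only pool thread. Because that step merely enqueues the sends, the same thread is then free to deliver them, and `release()` waits until all of them have finished before dropping the payload. In Python the garbage collector would keep the buffer alive anyway, so the counter here only models the ordering constraint that matters for manually managed memory.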