[ https://issues.apache.org/jira/browse/IMPALA-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313707#comment-17313707 ]
Quanlong Huang commented on IMPALA-10621: ----------------------------------------- Thank [~wzhou]! Assigning this to me first. It seems a flaky issue. Haven't seen it in later runs. Will try to reproduce it. > TestSpillingDebugActionDimensions.test_spilling_large_rows hanging > ------------------------------------------------------------------ > > Key: IMPALA-10621 > URL: https://issues.apache.org/jira/browse/IMPALA-10621 > Project: IMPALA > Issue Type: Bug > Reporter: Quanlong Huang > Assignee: Quanlong Huang > Priority: Critical > > Saw a possible haning in test_spilling_large_rows in a downstream exhaustive > release build. The query didn't finish in 12mins: > {code:java} > query_test/test_spilling.py:83: in test_spilling_large_rows > self.run_test_case('QueryTest/spilling-large-rows', vector, > unique_database) > common/impala_test_suite.py:665: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:603: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:923: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:205: in execute > result = self.fetch_results(query_string, handle) > beeswax/impala_beeswax.py:452: in fetch_results > exec_result = self.__fetch_results(query_handle, max_rows) > beeswax/impala_beeswax.py:463: in __fetch_results > results = self.__do_rpc(lambda: self.imp_service.fetch(handle, False, > fetch_rows)) > beeswax/impala_beeswax.py:520: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > E INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> > E MESSAGE: Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to > unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max > allowed lag is 720000ms) {code} > impalad.INFO indicates it's waiting for the coordinator fragment to finish: > {code:java} > I0325 23:54:16.146754 16521 Frontend.java:1600] > b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select > group_concat(string_col), length(bigstr) from bigstrs2 > group by bigstr db: test_spilling_large_rows_6df9f183 > I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] > b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms > I0325 23:54:16.147109 16521 Frontend.java:1643] > b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished. > I0325 23:54:16.148593 20605 admission-controller.cc:1747] > b94d5cd2894cbe68:68acfd6300000000] Trying to admit > id=b94d5cd2894cbe68:68acfd6300000000 in pool_name=default-pool > executor_group_name=default per_host_mem_estimate=309.98 MB > dedicated_coord_mem_estimate=134.99 MB max_requests=-1 max_queued=200 > max_mem=-1.00 B > I0325 23:54:16.148609 20605 admission-controller.cc:1755] > b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, > agg_num_queued=0, agg_mem_reserved=1.00 GB, local_host(local_mem_admitted=0, > num_admitted_running=0, num_queued=0, backend_mem_reserved=0, > topN_query_stats: queries=[3e42ce0b92a72227:b4c127a000000000, > 0f472bfdce173e65:1920e85d00000000, cc48be51cfecde3a:c93d09f600000000], > total_mem_consumed=0; pool_level_stats: num_running=3, min=0, max=0, > pool_total_mem=0, average_per_query=0) > I0325 23:54:16.148628 20605 admission-controller.cc:1215] > b94d5cd2894cbe68:68acfd6300000000] Admitting query > id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.148648 20605 impala-server.cc:2150] > b94d5cd2894cbe68:68acfd6300000000] Registering query locations > I0325 23:54:16.148655 20605 coordinator.cc:150] > b94d5cd2894cbe68:68acfd6300000000] Exec() > query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select > group_concat(string_col), length(bigstr) from bigstrs2 > group by bigstr > I0325 23:54:16.148877 20605 coordinator.cc:474] > b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.149272 27701 control-service.cc:148] > b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): > query_id=b94d5cd2894cbe68:68acfd6300000000 > coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 > #instances=3 > I0325 23:54:16.149561 20605 coordinator.cc:533] > b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.149663 20609 query-state.cc:948] > b94d5cd2894cbe68:68acfd6300000002] Executing instance. > instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 > per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1 > I0325 23:54:16.149741 20611 query-state.cc:948] > b94d5cd2894cbe68:68acfd6300000000] Executing instance. > instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 > per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2 > I0325 23:54:16.149821 20613 query-state.cc:948] > b94d5cd2894cbe68:68acfd6300000004] Executing instance. > instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 > per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3 > I0325 23:54:16.795727 20609 query-state.cc:957] > b94d5cd2894cbe68:68acfd6300000002] Instance completed. > instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK > I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] > b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): > fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2 > I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] > get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: > host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 > remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] > query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: > impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 > I0325 23:54:16.968387 20613 query-state.cc:957] > b94d5cd2894cbe68:68acfd6300000004] Instance completed. > instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK > I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] > b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): > fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4 > I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query > id=b94d5cd2894cbe68:68acfd6300000000 execution completed > I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for > backends to finish, 1 remaining. > query_id=b94d5cd2894cbe68:68acfd6300000000{code} > After 12mins, it reaches the timeout limit: > {code:java} > I0326 00:06:54.688612 28694 status.cc:82] Query > b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend: > 127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is > 720000ms) > @ 0xc93371 > @ 0x1398edc > @ 0x138fd14 > @ 0x1596d91 > @ 0x159820a > @ 0x1e20f11 > @ 0x7f57d79ede24 > @ 0x7f57d443734c > I0326 00:06:54.688832 28683 impala-server.cc:1875] CancelFromThreadPool(): > cancelling query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.688863 28683 coordinator-backend-state.cc:974] > query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: > Sending CancelQueryFInstances rpc > I0326 00:06:54.688975 27708 control-service.cc:219] CancelQueryFInstances(): > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.688988 27708 query-exec-mgr.cc:126] QueryState: > query_id=b94d5cd2894cbe68:68acfd6300000000 refcnt=4 > I0326 00:06:54.688993 27708 query-state.cc:974] Cancel: > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.689061 28683 coordinator-backend-state.cc:974] > query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: > Not cancelling because the backend is already done: > I0326 00:06:54.689071 28683 coordinator.cc:994] CancelBackends() > query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 1 backends > I0326 00:06:54.689091 16521 coordinator.cc:1370] Release admission control > resources for query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.689462 16521 impala-server.cc:1408] UnregisterQuery(): > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.689471 16521 coordinator-backend-state.cc:974] > query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: > Not cancelling because the backend is already done: Cancelled > I0326 00:06:54.689476 16521 coordinator-backend-state.cc:974] > query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: > Not cancelling because the backend is already done: > I0326 00:06:54.689479 16521 coordinator.cc:994] CancelBackends() > query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 0 backends > {code} > This seems similar to IMPALA-10339. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org