[ 
https://issues.apache.org/jira/browse/IMPALA-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313707#comment-17313707
 ] 

Quanlong Huang commented on IMPALA-10621:
-----------------------------------------

Thank [~wzhou]! Assigning this to me first. It seems a flaky issue. Haven't 
seen it in later runs. Will try to reproduce it.

> TestSpillingDebugActionDimensions.test_spilling_large_rows hanging
> ------------------------------------------------------------------
>
>                 Key: IMPALA-10621
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10621
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Critical
>
> Saw a possible haning in test_spilling_large_rows in a downstream exhaustive 
> release build. The query didn't finish in 12mins:
> {code:java}
> query_test/test_spilling.py:83: in test_spilling_large_rows
>     self.run_test_case('QueryTest/spilling-large-rows', vector, 
> unique_database)
> common/impala_test_suite.py:665: in run_test_case
>     result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:603: in __exec_in_impala
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:923: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:205: in execute
>     result = self.fetch_results(query_string, handle)
> beeswax/impala_beeswax.py:452: in fetch_results
>     exec_result = self.__fetch_results(query_handle, max_rows)
> beeswax/impala_beeswax.py:463: in __fetch_results
>     results = self.__do_rpc(lambda: self.imp_service.fetch(handle, False, 
> fetch_rows))
> beeswax/impala_beeswax.py:520: in __do_rpc
>     raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E    MESSAGE: Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to 
> unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max 
> allowed lag is 720000ms) {code}
> impalad.INFO indicates it's waiting for the coordinator fragment to finish:
> {code:java}
> I0325 23:54:16.146754 16521 Frontend.java:1600] 
> b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select 
> group_concat(string_col), length(bigstr) from bigstrs2
> group by bigstr db: test_spilling_large_rows_6df9f183
> I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] 
> b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms
> I0325 23:54:16.147109 16521 Frontend.java:1643] 
> b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished.
> I0325 23:54:16.148593 20605 admission-controller.cc:1747] 
> b94d5cd2894cbe68:68acfd6300000000] Trying to admit 
> id=b94d5cd2894cbe68:68acfd6300000000 in pool_name=default-pool 
> executor_group_name=default per_host_mem_estimate=309.98 MB 
> dedicated_coord_mem_estimate=134.99 MB max_requests=-1 max_queued=200 
> max_mem=-1.00 B
> I0325 23:54:16.148609 20605 admission-controller.cc:1755] 
> b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, 
> agg_num_queued=0, agg_mem_reserved=1.00 GB,  local_host(local_mem_admitted=0, 
> num_admitted_running=0, num_queued=0, backend_mem_reserved=0, 
> topN_query_stats: queries=[3e42ce0b92a72227:b4c127a000000000, 
> 0f472bfdce173e65:1920e85d00000000, cc48be51cfecde3a:c93d09f600000000], 
> total_mem_consumed=0; pool_level_stats: num_running=3, min=0, max=0, 
> pool_total_mem=0, average_per_query=0)
> I0325 23:54:16.148628 20605 admission-controller.cc:1215] 
> b94d5cd2894cbe68:68acfd6300000000] Admitting query 
> id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.148648 20605 impala-server.cc:2150] 
> b94d5cd2894cbe68:68acfd6300000000] Registering query locations
> I0325 23:54:16.148655 20605 coordinator.cc:150] 
> b94d5cd2894cbe68:68acfd6300000000] Exec() 
> query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select 
> group_concat(string_col), length(bigstr) from bigstrs2
> group by bigstr
> I0325 23:54:16.148877 20605 coordinator.cc:474] 
> b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.149272 27701 control-service.cc:148] 
> b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): 
> query_id=b94d5cd2894cbe68:68acfd6300000000 
> coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 
> #instances=3
> I0325 23:54:16.149561 20605 coordinator.cc:533] 
> b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.149663 20609 query-state.cc:948] 
> b94d5cd2894cbe68:68acfd6300000002] Executing instance. 
> instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 
> per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1
> I0325 23:54:16.149741 20611 query-state.cc:948] 
> b94d5cd2894cbe68:68acfd6300000000] Executing instance. 
> instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 
> per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
> I0325 23:54:16.149821 20613 query-state.cc:948] 
> b94d5cd2894cbe68:68acfd6300000004] Executing instance. 
> instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 
> per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3
> I0325 23:54:16.795727 20609 query-state.cc:957] 
> b94d5cd2894cbe68:68acfd6300000002] Instance completed. 
> instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK
> I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] 
> b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): 
> fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2
> I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] 
> get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: 
> host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 
> remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] 
> query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: 
> impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000
> I0325 23:54:16.968387 20613 query-state.cc:957] 
> b94d5cd2894cbe68:68acfd6300000004] Instance completed. 
> instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK
> I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] 
> b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): 
> fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4
> I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query 
> id=b94d5cd2894cbe68:68acfd6300000000 execution completed
> I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for 
> backends to finish, 1 remaining. 
> query_id=b94d5cd2894cbe68:68acfd6300000000{code}
> After 12mins, it reaches the timeout limit:
> {code:java}
> I0326 00:06:54.688612 28694 status.cc:82] Query 
> b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend: 
> 127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is 
> 720000ms)
>     @           0xc93371
>     @          0x1398edc
>     @          0x138fd14
>     @          0x1596d91
>     @          0x159820a
>     @          0x1e20f11
>     @     0x7f57d79ede24
>     @     0x7f57d443734c
> I0326 00:06:54.688832 28683 impala-server.cc:1875] CancelFromThreadPool(): 
> cancelling query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.688863 28683 coordinator-backend-state.cc:974] 
> query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: 
> Sending CancelQueryFInstances rpc
> I0326 00:06:54.688975 27708 control-service.cc:219] CancelQueryFInstances(): 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.688988 27708 query-exec-mgr.cc:126] QueryState: 
> query_id=b94d5cd2894cbe68:68acfd6300000000 refcnt=4
> I0326 00:06:54.688993 27708 query-state.cc:974] Cancel: 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.689061 28683 coordinator-backend-state.cc:974] 
> query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: 
> Not cancelling because the backend is already done:
> I0326 00:06:54.689071 28683 coordinator.cc:994] CancelBackends() 
> query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 1 backends
> I0326 00:06:54.689091 16521 coordinator.cc:1370] Release admission control 
> resources for query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.689462 16521 impala-server.cc:1408] UnregisterQuery(): 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.689471 16521 coordinator-backend-state.cc:974] 
> query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: 
> Not cancelling because the backend is already done: Cancelled
> I0326 00:06:54.689476 16521 coordinator-backend-state.cc:974] 
> query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: 
> Not cancelling because the backend is already done:
> I0326 00:06:54.689479 16521 coordinator.cc:994] CancelBackends() 
> query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 0 backends
> {code}
> This seems similar to IMPALA-10339.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to