[jira] [Comment Edited] (IMPALA-10621) TestSpillingDebugActionDimensions.test_spilling_large_rows hanging

Wenzhe Zhou (Jira) Thu, 01 Apr 2021 15:30:06 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313453#comment-17313453
 ]


Wenzhe Zhou edited comment on IMPALA-10621 at 4/1/21, 10:29 PM:
----------------------------------------------------------------

Checked the log messages. This issue is different from  IMPALA-10339.

For IMPALA-10339, the query is expected to fail with error "Cannot perform hash 
join at node with id". The log message on the executor shows that the executor 
got the expected error when executing one fragment instance, and the executor 
sent the status reports to coordinator. But due to a bug to handle error with 
race condition, executor sent status report with the fragment instance as 
"done" without error first, then sent last status report with error. The 
coordinator ignore the last status report since all the fragment instances were 
"done" and the backend was marked as "completed" before it received last status 
report. So the query was not cancelled by coordinator after executor 
encountered error. The race condition was  fixed by the patch for IMPALA-10259.

The query for this issue is expected to finish successfully. We did not see any 
error for the query in the log files on coordinator and executor. One fragment 
instance was not completed on the executor which was resident on coordinator. 
Don't know why the execution of the fragment instance hang. There were some 
core-dumps for the build, but those core-dumps happened in the time range after 
this issue happened.

Log message on coordinator/executor

I0325 23:54:16.146649 16521 impala-server.cc:1324] 
b94d5cd2894cbe68:68acfd6300000000] Registered query 
query_id=b94d5cd2894cbe68:68acfd630000
 0000 session_id=ae4773fa0d696261:93f32cf8a19f93bc
 I0325 23:54:16.146754 16521 Frontend.java:1600] 
b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select 
group_concat(string_col), length(bigstr) from bigstrs2
 group by bigstr db: test_spilling_large_rows_6df9f183
 I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] 
b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms
 I0325 23:54:16.147109 16521 Frontend.java:1643] 
b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished.
 I0325 23:54:16.148593 20605 admission-controller.cc:1747] 
b94d5cd2894cbe68:68acfd6300000000] Trying to admit 
id=b94d5cd2894cbe68:68acfd630000
 0000 in pool_name=default-pool executor_group_name=default 
per_host_mem_estimate=309.98 MB dedicated_coord_mem_estimate=134.99 MB 
max_request
 s=-1 max_queued=200 max_mem=-1.00 B
 I0325 23:54:16.148609 20605 admission-controller.cc:1755] 
b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, agg_num_queued=0, 
agg_
 mem_reserved=1.00 GB, local_host(local_mem_admitted=0, num_admitted_running=0, 
num_queued=0, backend_mem_reserved=0, topN_query_stats: queri
 es=[3e42ce0b92a72227:b4c127a000000000, 0f472bfdce173e65:1920e85d00000000, 
cc48be51cfecde3a:c93d09f600000000], total_mem_consumed=0; pool_leve
 l_stats: num_running=3, min=0, max=0, pool_total_mem=0, average_per_query=0)
 I0325 23:54:16.148628 20605 admission-controller.cc:1215] 
b94d5cd2894cbe68:68acfd6300000000] Admitting query 
id=b94d5cd2894cbe68:68acfd6300000000
 I0325 23:54:16.148648 20605 impala-server.cc:2150] 
b94d5cd2894cbe68:68acfd6300000000] Registering query locations
 I0325 23:54:16.148655 20605 coordinator.cc:150] 
b94d5cd2894cbe68:68acfd6300000000] Exec() 
query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select 
group_concat(string_col), length(bigstr) from bigstrs2
 group by bigstr
 I0325 23:54:16.148877 20605 coordinator.cc:474] 
b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for 
query_id=b94d5cd2894cbe68:68acfd6300000000
 I0325 23:54:16.149272 27701 control-service.cc:148] 
b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): 
query_id=b94d5cd2894cbe68:68acfd6300000000 
coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 
#instances=3
 I0325 23:54:16.149561 20605 coordinator.cc:533] 
b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for 
query_id=b94d5cd2894cbe68:68acfd6300000000
 I0325 23:54:16.149663 20609 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000002] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 
per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1
 I0325 23:54:16.149741 20611 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000000] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 
per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
 I0325 23:54:16.149821 20613 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000004] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 
per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3
 I0325 23:54:16.795727 20609 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000002] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK
 I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] 
b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): 
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2
 I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] 
get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000
 I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: 
host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 
remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000
 I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] 
query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: 
impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000
 I0325 23:54:16.968387 20613 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000004] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK
 I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] 
b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): 
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4
 I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query 
id=b94d5cd2894cbe68:68acfd6300000000 execution completed
 I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for 
backends to finish, 1 remaining. query_id=b94d5cd2894cbe68:68acfd6300000000
 ...
 0325 23:54:21.159895 20608 query-state.cc:738] 
b94d5cd2894cbe68:68acfd6300000000] Cancelling fragment instances as directed by 
the coordinator. Returned status: Cancelled
 I0325 23:54:21.159915 20608 query-state.cc:974] 
b94d5cd2894cbe68:68acfd6300000000] Cancel: 
query_id=b94d5cd2894cbe68:68acfd6300000000

===

Log message on executor

I0325 23:54:16.149294 28040 control-service.cc:148] 
b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): 
query_id=b94d5cd2894cbe68:68acfd6300000000 
coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 
#instances=2
 I0325 23:54:16.149729 20610 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000001] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000001 fragment_idx=2 
per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=1
 I0325 23:54:16.149785 20612 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000003] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000003 fragment_idx=1 
per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=2
 I0325 23:54:16.309967 20610 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000001] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000001 #in-flight=1 status=OK
 I0325 23:54:16.719277 28724 ImpaladCatalog.java:225] Adding: 
DATABASE:test_do_methods_1dc275eb version: 61783 size: 289
 I0325 23:54:16.719313 28724 ImpaladCatalog.java:225] Adding: 
TABLE:test_do_methods_1dc275eb.test_do_methods version: 61785 size: 76
 I0325 23:54:16.719333 28724 ImpaladCatalog.java:225] Adding: 
DATABASE:test_print_header_92c38e62 version: 61782 size: 297
 I0325 23:54:16.719349 28724 ImpaladCatalog.java:225] Adding: 
TABLE:test_print_header_92c38e62.test_print_header version: 61784 size: 80
 I0325 23:54:16.719367 28724 ImpaladCatalog.java:225] Adding: 
CATALOG_SERVICE_ID version: 61785 size: 60
 I0325 23:54:16.719414 28724 impala-server.cc:2016] Catalog topic update 
applied with version: 61785 new min catalog object version: 36076
 I0325 23:54:16.808297 20612 krpc-data-stream-mgr.cc:308] 
b94d5cd2894cbe68:68acfd6300000003] DeregisterRecvr(): 
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000003, node=2
 I0325 23:54:16.899598 20612 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000003] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000003 #in-flight=0 status=OK
 I0325 23:54:16.899616 20607 query-state.cc:468] 
b94d5cd2894cbe68:68acfd6300000000] UpdateBackendExecState(): last report for 
b94d5cd2894cbe68:68acfd6300000000
 I0325 23:54:16.923373 20607 query-exec-mgr.cc:213] 
b94d5cd2894cbe68:68acfd6300000000] ReleaseQueryState(): deleted 
query_id=b94d5cd2894cbe68:68acfd6300000000


was (Author: wzhou):
Checked the log messages. This issue is different from  IMPALA-10339.

For IMPALA-10339, the query is expected to fail with error "Cannot perform hash 
join at node with id". The log message on the executor shows that the executor 
got the expected error when executing one fragment instance, and the executor 
sent the status reports to coordinator. But due to a bug to handle race 
condition, executor sent status report with the fragment instance as "done" 
without error first, then sent last status report with error. The coordinator 
ignore the last status report since all the fragment instances were "done" and 
the backend was marked as "completed" before it received last status report. So 
the query was not cancelled by coordinator after executor encountered error. 
The race condition will be fixed by the patch for IMPALA-10259.

The query for this issue is expected to finish successfully. We did not see any 
error for the query in the log files on coordinator and executor. One fragment 
instance was not completed on the executor which was resident on coordinator. 
Don't know why the execution of the fragment instance hang. There were some 
core-dumps for the build, but those core-dumps happened in the time range after 
this issue happened.

Log message on coordinator/executor

I0325 23:54:16.146649 16521 impala-server.cc:1324] 
b94d5cd2894cbe68:68acfd6300000000] Registered query 
query_id=b94d5cd2894cbe68:68acfd630000
0000 session_id=ae4773fa0d696261:93f32cf8a19f93bc
I0325 23:54:16.146754 16521 Frontend.java:1600] 
b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select 
group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr db: test_spilling_large_rows_6df9f183
I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] 
b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms
I0325 23:54:16.147109 16521 Frontend.java:1643] 
b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished.
I0325 23:54:16.148593 20605 admission-controller.cc:1747] 
b94d5cd2894cbe68:68acfd6300000000] Trying to admit 
id=b94d5cd2894cbe68:68acfd630000
0000 in pool_name=default-pool executor_group_name=default 
per_host_mem_estimate=309.98 MB dedicated_coord_mem_estimate=134.99 MB 
max_request
s=-1 max_queued=200 max_mem=-1.00 B
I0325 23:54:16.148609 20605 admission-controller.cc:1755] 
b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, agg_num_queued=0, 
agg_
mem_reserved=1.00 GB, local_host(local_mem_admitted=0, num_admitted_running=0, 
num_queued=0, backend_mem_reserved=0, topN_query_stats: queri
es=[3e42ce0b92a72227:b4c127a000000000, 0f472bfdce173e65:1920e85d00000000, 
cc48be51cfecde3a:c93d09f600000000], total_mem_consumed=0; pool_leve
l_stats: num_running=3, min=0, max=0, pool_total_mem=0, average_per_query=0)
I0325 23:54:16.148628 20605 admission-controller.cc:1215] 
b94d5cd2894cbe68:68acfd6300000000] Admitting query 
id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.148648 20605 impala-server.cc:2150] 
b94d5cd2894cbe68:68acfd6300000000] Registering query locations
I0325 23:54:16.148655 20605 coordinator.cc:150] 
b94d5cd2894cbe68:68acfd6300000000] Exec() 
query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select 
group_concat(string_col), length(bigstr) from bigstrs2
group by bigstr
I0325 23:54:16.148877 20605 coordinator.cc:474] 
b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for 
query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.149272 27701 control-service.cc:148] 
b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): 
query_id=b94d5cd2894cbe68:68acfd6300000000 
coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 
#instances=3
I0325 23:54:16.149561 20605 coordinator.cc:533] 
b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for 
query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.149663 20609 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000002] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 
per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1
I0325 23:54:16.149741 20611 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000000] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 
per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
I0325 23:54:16.149821 20613 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000004] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 
per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3
I0325 23:54:16.795727 20609 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000002] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK
I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] 
b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): 
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2
I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] 
get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: 
host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 
remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] 
query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: 
impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000
I0325 23:54:16.968387 20613 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000004] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK
I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] 
b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): 
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4
I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query 
id=b94d5cd2894cbe68:68acfd6300000000 execution completed
I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for 
backends to finish, 1 remaining. query_id=b94d5cd2894cbe68:68acfd6300000000
...
0325 23:54:21.159895 20608 query-state.cc:738] 
b94d5cd2894cbe68:68acfd6300000000] Cancelling fragment instances as directed by 
the coordinator. Returned status: Cancelled
I0325 23:54:21.159915 20608 query-state.cc:974] 
b94d5cd2894cbe68:68acfd6300000000] Cancel: 
query_id=b94d5cd2894cbe68:68acfd6300000000

===

Log message on executor

I0325 23:54:16.149294 28040 control-service.cc:148] 
b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): 
query_id=b94d5cd2894cbe68:68acfd6300000000 
coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 
#instances=2
I0325 23:54:16.149729 20610 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000001] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000001 fragment_idx=2 
per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=1
I0325 23:54:16.149785 20612 query-state.cc:948] 
b94d5cd2894cbe68:68acfd6300000003] Executing instance. 
instance_id=b94d5cd2894cbe68:68acfd6300000003 fragment_idx=1 
per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=2
I0325 23:54:16.309967 20610 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000001] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000001 #in-flight=1 status=OK
I0325 23:54:16.719277 28724 ImpaladCatalog.java:225] Adding: 
DATABASE:test_do_methods_1dc275eb version: 61783 size: 289
I0325 23:54:16.719313 28724 ImpaladCatalog.java:225] Adding: 
TABLE:test_do_methods_1dc275eb.test_do_methods version: 61785 size: 76
I0325 23:54:16.719333 28724 ImpaladCatalog.java:225] Adding: 
DATABASE:test_print_header_92c38e62 version: 61782 size: 297
I0325 23:54:16.719349 28724 ImpaladCatalog.java:225] Adding: 
TABLE:test_print_header_92c38e62.test_print_header version: 61784 size: 80
I0325 23:54:16.719367 28724 ImpaladCatalog.java:225] Adding: CATALOG_SERVICE_ID 
version: 61785 size: 60
I0325 23:54:16.719414 28724 impala-server.cc:2016] Catalog topic update applied 
with version: 61785 new min catalog object version: 36076
I0325 23:54:16.808297 20612 krpc-data-stream-mgr.cc:308] 
b94d5cd2894cbe68:68acfd6300000003] DeregisterRecvr(): 
fragment_instance_id=b94d5cd2894cbe68:68acfd6300000003, node=2
I0325 23:54:16.899598 20612 query-state.cc:957] 
b94d5cd2894cbe68:68acfd6300000003] Instance completed. 
instance_id=b94d5cd2894cbe68:68acfd6300000003 #in-flight=0 status=OK
I0325 23:54:16.899616 20607 query-state.cc:468] 
b94d5cd2894cbe68:68acfd6300000000] UpdateBackendExecState(): last report for 
b94d5cd2894cbe68:68acfd6300000000
I0325 23:54:16.923373 20607 query-exec-mgr.cc:213] 
b94d5cd2894cbe68:68acfd6300000000] ReleaseQueryState(): deleted 
query_id=b94d5cd2894cbe68:68acfd6300000000

> TestSpillingDebugActionDimensions.test_spilling_large_rows hanging
> ------------------------------------------------------------------
>
>                 Key: IMPALA-10621
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10621
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Priority: Critical
>
> Saw a possible haning in test_spilling_large_rows in a downstream exhaustive 
> release build. The query didn't finish in 12mins:
> {code:java}
> query_test/test_spilling.py:83: in test_spilling_large_rows
>     self.run_test_case('QueryTest/spilling-large-rows', vector, 
> unique_database)
> common/impala_test_suite.py:665: in run_test_case
>     result = exec_fn(query, user=test_section.get('USER', '').strip() or None)
> common/impala_test_suite.py:603: in __exec_in_impala
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:923: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:205: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:205: in execute
>     result = self.fetch_results(query_string, handle)
> beeswax/impala_beeswax.py:452: in fetch_results
>     exec_result = self.__fetch_results(query_handle, max_rows)
> beeswax/impala_beeswax.py:463: in __fetch_results
>     results = self.__do_rpc(lambda: self.imp_service.fetch(handle, False, 
> fetch_rows))
> beeswax/impala_beeswax.py:520: in __do_rpc
>     raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E    MESSAGE: Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to 
> unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max 
> allowed lag is 720000ms) {code}
> impalad.INFO indicates it's waiting for the coordinator fragment to finish:
> {code:java}
> I0325 23:54:16.146754 16521 Frontend.java:1600] 
> b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select 
> group_concat(string_col), length(bigstr) from bigstrs2
> group by bigstr db: test_spilling_large_rows_6df9f183
> I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] 
> b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms
> I0325 23:54:16.147109 16521 Frontend.java:1643] 
> b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished.
> I0325 23:54:16.148593 20605 admission-controller.cc:1747] 
> b94d5cd2894cbe68:68acfd6300000000] Trying to admit 
> id=b94d5cd2894cbe68:68acfd6300000000 in pool_name=default-pool 
> executor_group_name=default per_host_mem_estimate=309.98 MB 
> dedicated_coord_mem_estimate=134.99 MB max_requests=-1 max_queued=200 
> max_mem=-1.00 B
> I0325 23:54:16.148609 20605 admission-controller.cc:1755] 
> b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, 
> agg_num_queued=0, agg_mem_reserved=1.00 GB,  local_host(local_mem_admitted=0, 
> num_admitted_running=0, num_queued=0, backend_mem_reserved=0, 
> topN_query_stats: queries=[3e42ce0b92a72227:b4c127a000000000, 
> 0f472bfdce173e65:1920e85d00000000, cc48be51cfecde3a:c93d09f600000000], 
> total_mem_consumed=0; pool_level_stats: num_running=3, min=0, max=0, 
> pool_total_mem=0, average_per_query=0)
> I0325 23:54:16.148628 20605 admission-controller.cc:1215] 
> b94d5cd2894cbe68:68acfd6300000000] Admitting query 
> id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.148648 20605 impala-server.cc:2150] 
> b94d5cd2894cbe68:68acfd6300000000] Registering query locations
> I0325 23:54:16.148655 20605 coordinator.cc:150] 
> b94d5cd2894cbe68:68acfd6300000000] Exec() 
> query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select 
> group_concat(string_col), length(bigstr) from bigstrs2
> group by bigstr
> I0325 23:54:16.148877 20605 coordinator.cc:474] 
> b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.149272 27701 control-service.cc:148] 
> b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): 
> query_id=b94d5cd2894cbe68:68acfd6300000000 
> coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 
> #instances=3
> I0325 23:54:16.149561 20605 coordinator.cc:533] 
> b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.149663 20609 query-state.cc:948] 
> b94d5cd2894cbe68:68acfd6300000002] Executing instance. 
> instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 
> per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1
> I0325 23:54:16.149741 20611 query-state.cc:948] 
> b94d5cd2894cbe68:68acfd6300000000] Executing instance. 
> instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 
> per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
> I0325 23:54:16.149821 20613 query-state.cc:948] 
> b94d5cd2894cbe68:68acfd6300000004] Executing instance. 
> instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 
> per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3
> I0325 23:54:16.795727 20609 query-state.cc:957] 
> b94d5cd2894cbe68:68acfd6300000002] Instance completed. 
> instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK
> I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] 
> b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): 
> fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2
> I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] 
> get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: 
> host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 
> remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000
> I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] 
> query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: 
> impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000
> I0325 23:54:16.968387 20613 query-state.cc:957] 
> b94d5cd2894cbe68:68acfd6300000004] Instance completed. 
> instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK
> I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] 
> b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): 
> fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4
> I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query 
> id=b94d5cd2894cbe68:68acfd6300000000 execution completed
> I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for 
> backends to finish, 1 remaining. 
> query_id=b94d5cd2894cbe68:68acfd6300000000{code}
> After 12mins, it reaches the timeout limit:
> {code:java}
> I0326 00:06:54.688612 28694 status.cc:82] Query 
> b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend: 
> 127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is 
> 720000ms)
>     @           0xc93371
>     @          0x1398edc
>     @          0x138fd14
>     @          0x1596d91
>     @          0x159820a
>     @          0x1e20f11
>     @     0x7f57d79ede24
>     @     0x7f57d443734c
> I0326 00:06:54.688832 28683 impala-server.cc:1875] CancelFromThreadPool(): 
> cancelling query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.688863 28683 coordinator-backend-state.cc:974] 
> query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: 
> Sending CancelQueryFInstances rpc
> I0326 00:06:54.688975 27708 control-service.cc:219] CancelQueryFInstances(): 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.688988 27708 query-exec-mgr.cc:126] QueryState: 
> query_id=b94d5cd2894cbe68:68acfd6300000000 refcnt=4
> I0326 00:06:54.688993 27708 query-state.cc:974] Cancel: 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.689061 28683 coordinator-backend-state.cc:974] 
> query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: 
> Not cancelling because the backend is already done:
> I0326 00:06:54.689071 28683 coordinator.cc:994] CancelBackends() 
> query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 1 backends
> I0326 00:06:54.689091 16521 coordinator.cc:1370] Release admission control 
> resources for query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.689462 16521 impala-server.cc:1408] UnregisterQuery(): 
> query_id=b94d5cd2894cbe68:68acfd6300000000
> I0326 00:06:54.689471 16521 coordinator-backend-state.cc:974] 
> query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: 
> Not cancelling because the backend is already done: Cancelled
> I0326 00:06:54.689476 16521 coordinator-backend-state.cc:974] 
> query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: 
> Not cancelling because the backend is already done:
> I0326 00:06:54.689479 16521 coordinator.cc:994] CancelBackends() 
> query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 0 backends
> {code}
> This seems similar to IMPALA-10339.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Comment Edited] (IMPALA-10621) TestSpillingDebugActionDimensions.test_spilling_large_rows hanging

Reply via email to