[ https://issues.apache.org/jira/browse/IMPALA-10621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313453#comment-17313453 ]
Wenzhe Zhou edited comment on IMPALA-10621 at 4/1/21, 10:29 PM: ---------------------------------------------------------------- Checked the log messages. This issue is different from IMPALA-10339. For IMPALA-10339, the query is expected to fail with error "Cannot perform hash join at node with id". The log message on the executor shows that the executor got the expected error when executing one fragment instance, and the executor sent the status reports to coordinator. But due to a bug to handle error with race condition, executor sent status report with the fragment instance as "done" without error first, then sent last status report with error. The coordinator ignore the last status report since all the fragment instances were "done" and the backend was marked as "completed" before it received last status report. So the query was not cancelled by coordinator after executor encountered error. The race condition was fixed by the patch for IMPALA-10259. The query for this issue is expected to finish successfully. We did not see any error for the query in the log files on coordinator and executor. One fragment instance was not completed on the executor which was resident on coordinator. Don't know why the execution of the fragment instance hang. There were some core-dumps for the build, but those core-dumps happened in the time range after this issue happened. Log message on coordinator/executor I0325 23:54:16.146649 16521 impala-server.cc:1324] b94d5cd2894cbe68:68acfd6300000000] Registered query query_id=b94d5cd2894cbe68:68acfd630000 0000 session_id=ae4773fa0d696261:93f32cf8a19f93bc I0325 23:54:16.146754 16521 Frontend.java:1600] b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr db: test_spilling_large_rows_6df9f183 I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms I0325 23:54:16.147109 16521 Frontend.java:1643] b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished. I0325 23:54:16.148593 20605 admission-controller.cc:1747] b94d5cd2894cbe68:68acfd6300000000] Trying to admit id=b94d5cd2894cbe68:68acfd630000 0000 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=309.98 MB dedicated_coord_mem_estimate=134.99 MB max_request s=-1 max_queued=200 max_mem=-1.00 B I0325 23:54:16.148609 20605 admission-controller.cc:1755] b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, agg_num_queued=0, agg_ mem_reserved=1.00 GB, local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: queri es=[3e42ce0b92a72227:b4c127a000000000, 0f472bfdce173e65:1920e85d00000000, cc48be51cfecde3a:c93d09f600000000], total_mem_consumed=0; pool_leve l_stats: num_running=3, min=0, max=0, pool_total_mem=0, average_per_query=0) I0325 23:54:16.148628 20605 admission-controller.cc:1215] b94d5cd2894cbe68:68acfd6300000000] Admitting query id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.148648 20605 impala-server.cc:2150] b94d5cd2894cbe68:68acfd6300000000] Registering query locations I0325 23:54:16.148655 20605 coordinator.cc:150] b94d5cd2894cbe68:68acfd6300000000] Exec() query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr I0325 23:54:16.148877 20605 coordinator.cc:474] b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.149272 27701 control-service.cc:148] b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): query_id=b94d5cd2894cbe68:68acfd6300000000 coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 #instances=3 I0325 23:54:16.149561 20605 coordinator.cc:533] b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.149663 20609 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000002] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1 I0325 23:54:16.149741 20611 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000000] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2 I0325 23:54:16.149821 20613 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000004] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3 I0325 23:54:16.795727 20609 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000002] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2 I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 I0325 23:54:16.968387 20613 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000004] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4 I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query id=b94d5cd2894cbe68:68acfd6300000000 execution completed I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for backends to finish, 1 remaining. query_id=b94d5cd2894cbe68:68acfd6300000000 ... 0325 23:54:21.159895 20608 query-state.cc:738] b94d5cd2894cbe68:68acfd6300000000] Cancelling fragment instances as directed by the coordinator. Returned status: Cancelled I0325 23:54:21.159915 20608 query-state.cc:974] b94d5cd2894cbe68:68acfd6300000000] Cancel: query_id=b94d5cd2894cbe68:68acfd6300000000 === Log message on executor I0325 23:54:16.149294 28040 control-service.cc:148] b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): query_id=b94d5cd2894cbe68:68acfd6300000000 coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 #instances=2 I0325 23:54:16.149729 20610 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000001] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000001 fragment_idx=2 per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=1 I0325 23:54:16.149785 20612 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000003] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000003 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=2 I0325 23:54:16.309967 20610 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000001] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000001 #in-flight=1 status=OK I0325 23:54:16.719277 28724 ImpaladCatalog.java:225] Adding: DATABASE:test_do_methods_1dc275eb version: 61783 size: 289 I0325 23:54:16.719313 28724 ImpaladCatalog.java:225] Adding: TABLE:test_do_methods_1dc275eb.test_do_methods version: 61785 size: 76 I0325 23:54:16.719333 28724 ImpaladCatalog.java:225] Adding: DATABASE:test_print_header_92c38e62 version: 61782 size: 297 I0325 23:54:16.719349 28724 ImpaladCatalog.java:225] Adding: TABLE:test_print_header_92c38e62.test_print_header version: 61784 size: 80 I0325 23:54:16.719367 28724 ImpaladCatalog.java:225] Adding: CATALOG_SERVICE_ID version: 61785 size: 60 I0325 23:54:16.719414 28724 impala-server.cc:2016] Catalog topic update applied with version: 61785 new min catalog object version: 36076 I0325 23:54:16.808297 20612 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000003] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000003, node=2 I0325 23:54:16.899598 20612 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000003] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000003 #in-flight=0 status=OK I0325 23:54:16.899616 20607 query-state.cc:468] b94d5cd2894cbe68:68acfd6300000000] UpdateBackendExecState(): last report for b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.923373 20607 query-exec-mgr.cc:213] b94d5cd2894cbe68:68acfd6300000000] ReleaseQueryState(): deleted query_id=b94d5cd2894cbe68:68acfd6300000000 was (Author: wzhou): Checked the log messages. This issue is different from IMPALA-10339. For IMPALA-10339, the query is expected to fail with error "Cannot perform hash join at node with id". The log message on the executor shows that the executor got the expected error when executing one fragment instance, and the executor sent the status reports to coordinator. But due to a bug to handle race condition, executor sent status report with the fragment instance as "done" without error first, then sent last status report with error. The coordinator ignore the last status report since all the fragment instances were "done" and the backend was marked as "completed" before it received last status report. So the query was not cancelled by coordinator after executor encountered error. The race condition will be fixed by the patch for IMPALA-10259. The query for this issue is expected to finish successfully. We did not see any error for the query in the log files on coordinator and executor. One fragment instance was not completed on the executor which was resident on coordinator. Don't know why the execution of the fragment instance hang. There were some core-dumps for the build, but those core-dumps happened in the time range after this issue happened. Log message on coordinator/executor I0325 23:54:16.146649 16521 impala-server.cc:1324] b94d5cd2894cbe68:68acfd6300000000] Registered query query_id=b94d5cd2894cbe68:68acfd630000 0000 session_id=ae4773fa0d696261:93f32cf8a19f93bc I0325 23:54:16.146754 16521 Frontend.java:1600] b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr db: test_spilling_large_rows_6df9f183 I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms I0325 23:54:16.147109 16521 Frontend.java:1643] b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished. I0325 23:54:16.148593 20605 admission-controller.cc:1747] b94d5cd2894cbe68:68acfd6300000000] Trying to admit id=b94d5cd2894cbe68:68acfd630000 0000 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=309.98 MB dedicated_coord_mem_estimate=134.99 MB max_request s=-1 max_queued=200 max_mem=-1.00 B I0325 23:54:16.148609 20605 admission-controller.cc:1755] b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, agg_num_queued=0, agg_ mem_reserved=1.00 GB, local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0, topN_query_stats: queri es=[3e42ce0b92a72227:b4c127a000000000, 0f472bfdce173e65:1920e85d00000000, cc48be51cfecde3a:c93d09f600000000], total_mem_consumed=0; pool_leve l_stats: num_running=3, min=0, max=0, pool_total_mem=0, average_per_query=0) I0325 23:54:16.148628 20605 admission-controller.cc:1215] b94d5cd2894cbe68:68acfd6300000000] Admitting query id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.148648 20605 impala-server.cc:2150] b94d5cd2894cbe68:68acfd6300000000] Registering query locations I0325 23:54:16.148655 20605 coordinator.cc:150] b94d5cd2894cbe68:68acfd6300000000] Exec() query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select group_concat(string_col), length(bigstr) from bigstrs2 group by bigstr I0325 23:54:16.148877 20605 coordinator.cc:474] b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.149272 27701 control-service.cc:148] b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): query_id=b94d5cd2894cbe68:68acfd6300000000 coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 #instances=3 I0325 23:54:16.149561 20605 coordinator.cc:533] b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.149663 20609 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000002] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1 I0325 23:54:16.149741 20611 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000000] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2 I0325 23:54:16.149821 20613 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000004] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3 I0325 23:54:16.795727 20609 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000002] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2 I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 I0325 23:54:16.968387 20613 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000004] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4 I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query id=b94d5cd2894cbe68:68acfd6300000000 execution completed I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for backends to finish, 1 remaining. query_id=b94d5cd2894cbe68:68acfd6300000000 ... 0325 23:54:21.159895 20608 query-state.cc:738] b94d5cd2894cbe68:68acfd6300000000] Cancelling fragment instances as directed by the coordinator. Returned status: Cancelled I0325 23:54:21.159915 20608 query-state.cc:974] b94d5cd2894cbe68:68acfd6300000000] Cancel: query_id=b94d5cd2894cbe68:68acfd6300000000 === Log message on executor I0325 23:54:16.149294 28040 control-service.cc:148] b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): query_id=b94d5cd2894cbe68:68acfd6300000000 coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 #instances=2 I0325 23:54:16.149729 20610 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000001] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000001 fragment_idx=2 per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=1 I0325 23:54:16.149785 20612 query-state.cc:948] b94d5cd2894cbe68:68acfd6300000003] Executing instance. instance_id=b94d5cd2894cbe68:68acfd6300000003 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=2 I0325 23:54:16.309967 20610 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000001] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000001 #in-flight=1 status=OK I0325 23:54:16.719277 28724 ImpaladCatalog.java:225] Adding: DATABASE:test_do_methods_1dc275eb version: 61783 size: 289 I0325 23:54:16.719313 28724 ImpaladCatalog.java:225] Adding: TABLE:test_do_methods_1dc275eb.test_do_methods version: 61785 size: 76 I0325 23:54:16.719333 28724 ImpaladCatalog.java:225] Adding: DATABASE:test_print_header_92c38e62 version: 61782 size: 297 I0325 23:54:16.719349 28724 ImpaladCatalog.java:225] Adding: TABLE:test_print_header_92c38e62.test_print_header version: 61784 size: 80 I0325 23:54:16.719367 28724 ImpaladCatalog.java:225] Adding: CATALOG_SERVICE_ID version: 61785 size: 60 I0325 23:54:16.719414 28724 impala-server.cc:2016] Catalog topic update applied with version: 61785 new min catalog object version: 36076 I0325 23:54:16.808297 20612 krpc-data-stream-mgr.cc:308] b94d5cd2894cbe68:68acfd6300000003] DeregisterRecvr(): fragment_instance_id=b94d5cd2894cbe68:68acfd6300000003, node=2 I0325 23:54:16.899598 20612 query-state.cc:957] b94d5cd2894cbe68:68acfd6300000003] Instance completed. instance_id=b94d5cd2894cbe68:68acfd6300000003 #in-flight=0 status=OK I0325 23:54:16.899616 20607 query-state.cc:468] b94d5cd2894cbe68:68acfd6300000000] UpdateBackendExecState(): last report for b94d5cd2894cbe68:68acfd6300000000 I0325 23:54:16.923373 20607 query-exec-mgr.cc:213] b94d5cd2894cbe68:68acfd6300000000] ReleaseQueryState(): deleted query_id=b94d5cd2894cbe68:68acfd6300000000 > TestSpillingDebugActionDimensions.test_spilling_large_rows hanging > ------------------------------------------------------------------ > > Key: IMPALA-10621 > URL: https://issues.apache.org/jira/browse/IMPALA-10621 > Project: IMPALA > Issue Type: Bug > Reporter: Quanlong Huang > Priority: Critical > > Saw a possible haning in test_spilling_large_rows in a downstream exhaustive > release build. The query didn't finish in 12mins: > {code:java} > query_test/test_spilling.py:83: in test_spilling_large_rows > self.run_test_case('QueryTest/spilling-large-rows', vector, > unique_database) > common/impala_test_suite.py:665: in run_test_case > result = exec_fn(query, user=test_section.get('USER', '').strip() or None) > common/impala_test_suite.py:603: in __exec_in_impala > result = self.__execute_query(target_impalad_client, query, user=user) > common/impala_test_suite.py:923: in __execute_query > return impalad_client.execute(query, user=user) > common/impala_connection.py:205: in execute > return self.__beeswax_client.execute(sql_stmt, user=user) > beeswax/impala_beeswax.py:205: in execute > result = self.fetch_results(query_string, handle) > beeswax/impala_beeswax.py:452: in fetch_results > exec_result = self.__fetch_results(query_handle, max_rows) > beeswax/impala_beeswax.py:463: in __fetch_results > results = self.__do_rpc(lambda: self.imp_service.fetch(handle, False, > fetch_rows)) > beeswax/impala_beeswax.py:520: in __do_rpc > raise ImpalaBeeswaxException(self.__build_error_message(b), b) > E ImpalaBeeswaxException: ImpalaBeeswaxException: > E INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> > E MESSAGE: Query b94d5cd2894cbe68:68acfd6300000000 cancelled due to > unresponsive backend: 127.0.0.1:27000 has not sent a report in 753530ms (max > allowed lag is 720000ms) {code} > impalad.INFO indicates it's waiting for the coordinator fragment to finish: > {code:java} > I0325 23:54:16.146754 16521 Frontend.java:1600] > b94d5cd2894cbe68:68acfd6300000000] Analyzing query: select > group_concat(string_col), length(bigstr) from bigstrs2 > group by bigstr db: test_spilling_large_rows_6df9f183 > I0325 23:54:16.147089 16521 BaseAuthorizationChecker.java:110] > b94d5cd2894cbe68:68acfd6300000000] Authorization check took 0 ms > I0325 23:54:16.147109 16521 Frontend.java:1643] > b94d5cd2894cbe68:68acfd6300000000] Analysis and authorization finished. > I0325 23:54:16.148593 20605 admission-controller.cc:1747] > b94d5cd2894cbe68:68acfd6300000000] Trying to admit > id=b94d5cd2894cbe68:68acfd6300000000 in pool_name=default-pool > executor_group_name=default per_host_mem_estimate=309.98 MB > dedicated_coord_mem_estimate=134.99 MB max_requests=-1 max_queued=200 > max_mem=-1.00 B > I0325 23:54:16.148609 20605 admission-controller.cc:1755] > b94d5cd2894cbe68:68acfd6300000000] Stats: agg_num_running=0, > agg_num_queued=0, agg_mem_reserved=1.00 GB, local_host(local_mem_admitted=0, > num_admitted_running=0, num_queued=0, backend_mem_reserved=0, > topN_query_stats: queries=[3e42ce0b92a72227:b4c127a000000000, > 0f472bfdce173e65:1920e85d00000000, cc48be51cfecde3a:c93d09f600000000], > total_mem_consumed=0; pool_level_stats: num_running=3, min=0, max=0, > pool_total_mem=0, average_per_query=0) > I0325 23:54:16.148628 20605 admission-controller.cc:1215] > b94d5cd2894cbe68:68acfd6300000000] Admitting query > id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.148648 20605 impala-server.cc:2150] > b94d5cd2894cbe68:68acfd6300000000] Registering query locations > I0325 23:54:16.148655 20605 coordinator.cc:150] > b94d5cd2894cbe68:68acfd6300000000] Exec() > query_id=b94d5cd2894cbe68:68acfd6300000000 stmt=select > group_concat(string_col), length(bigstr) from bigstrs2 > group by bigstr > I0325 23:54:16.148877 20605 coordinator.cc:474] > b94d5cd2894cbe68:68acfd6300000000] starting execution on 2 backends for > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.149272 27701 control-service.cc:148] > b94d5cd2894cbe68:68acfd6300000000] ExecQueryFInstances(): > query_id=b94d5cd2894cbe68:68acfd6300000000 > coord=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 > #instances=3 > I0325 23:54:16.149561 20605 coordinator.cc:533] > b94d5cd2894cbe68:68acfd6300000000] started execution on 2 backends for > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.149663 20609 query-state.cc:948] > b94d5cd2894cbe68:68acfd6300000002] Executing instance. > instance_id=b94d5cd2894cbe68:68acfd6300000002 fragment_idx=2 > per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=1 > I0325 23:54:16.149741 20611 query-state.cc:948] > b94d5cd2894cbe68:68acfd6300000000] Executing instance. > instance_id=b94d5cd2894cbe68:68acfd6300000000 fragment_idx=0 > per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2 > I0325 23:54:16.149821 20613 query-state.cc:948] > b94d5cd2894cbe68:68acfd6300000004] Executing instance. > instance_id=b94d5cd2894cbe68:68acfd6300000004 fragment_idx=1 > per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=3 > I0325 23:54:16.795727 20609 query-state.cc:957] > b94d5cd2894cbe68:68acfd6300000002] Instance completed. > instance_id=b94d5cd2894cbe68:68acfd6300000002 #in-flight=2 status=OK > I0325 23:54:16.817301 20613 krpc-data-stream-mgr.cc:308] > b94d5cd2894cbe68:68acfd6300000004] DeregisterRecvr(): > fragment_instance_id=b94d5cd2894cbe68:68acfd6300000004, node=2 > I0325 23:54:16.854861 16521 impala-beeswax-server.cc:215] > get_results_metadata(): query_id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.908766 27701 coordinator.cc:1027] Backend completed: > host=impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27001 > remaining=2 query_id=b94d5cd2894cbe68:68acfd6300000000 > I0325 23:54:16.908782 27701 coordinator-backend-state.cc:371] > query_id=b94d5cd2894cbe68:68acfd6300000000: first in-progress backend: > impala-ec2-centos74-m5-4xlarge-ondemand-1971.vpc.cloudera.com:27000 > I0325 23:54:16.968387 20613 query-state.cc:957] > b94d5cd2894cbe68:68acfd6300000004] Instance completed. > instance_id=b94d5cd2894cbe68:68acfd6300000004 #in-flight=1 status=OK > I0325 23:54:17.001765 20611 krpc-data-stream-mgr.cc:308] > b94d5cd2894cbe68:68acfd6300000000] DeregisterRecvr(): > fragment_instance_id=b94d5cd2894cbe68:68acfd6300000000, node=4 > I0325 23:54:17.001814 16521 coordinator.cc:706] ExecState: query > id=b94d5cd2894cbe68:68acfd6300000000 execution completed > I0325 23:54:17.001832 16521 coordinator.cc:878] Coordinator waiting for > backends to finish, 1 remaining. > query_id=b94d5cd2894cbe68:68acfd6300000000{code} > After 12mins, it reaches the timeout limit: > {code:java} > I0326 00:06:54.688612 28694 status.cc:82] Query > b94d5cd2894cbe68:68acfd6300000000 cancelled due to unresponsive backend: > 127.0.0.1:27000 has not sent a report in 753530ms (max allowed lag is > 720000ms) > @ 0xc93371 > @ 0x1398edc > @ 0x138fd14 > @ 0x1596d91 > @ 0x159820a > @ 0x1e20f11 > @ 0x7f57d79ede24 > @ 0x7f57d443734c > I0326 00:06:54.688832 28683 impala-server.cc:1875] CancelFromThreadPool(): > cancelling query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.688863 28683 coordinator-backend-state.cc:974] > query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: > Sending CancelQueryFInstances rpc > I0326 00:06:54.688975 27708 control-service.cc:219] CancelQueryFInstances(): > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.688988 27708 query-exec-mgr.cc:126] QueryState: > query_id=b94d5cd2894cbe68:68acfd6300000000 refcnt=4 > I0326 00:06:54.688993 27708 query-state.cc:974] Cancel: > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.689061 28683 coordinator-backend-state.cc:974] > query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: > Not cancelling because the backend is already done: > I0326 00:06:54.689071 28683 coordinator.cc:994] CancelBackends() > query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 1 backends > I0326 00:06:54.689091 16521 coordinator.cc:1370] Release admission control > resources for query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.689462 16521 impala-server.cc:1408] UnregisterQuery(): > query_id=b94d5cd2894cbe68:68acfd6300000000 > I0326 00:06:54.689471 16521 coordinator-backend-state.cc:974] > query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27000: > Not cancelling because the backend is already done: Cancelled > I0326 00:06:54.689476 16521 coordinator-backend-state.cc:974] > query_id=b94d5cd2894cbe68:68acfd6300000000 target backend=127.0.0.1:27001: > Not cancelling because the backend is already done: > I0326 00:06:54.689479 16521 coordinator.cc:994] CancelBackends() > query_id=b94d5cd2894cbe68:68acfd6300000000, tried to cancel 0 backends > {code} > This seems similar to IMPALA-10339. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org