[ https://issues.apache.org/jira/browse/IMPALA-10259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253728#comment-17253728 ]
Wenzhe Zhou edited comment on IMPALA-10259 at 12/22/20, 8:09 PM: ----------------------------------------------------------------- This issue happened recently in cdpd-master core ASAN build: [https://master-02.jenkins.cloudera.com/job/impala-cdpd-master-core-asan/497/] It happened when executing test case: tests/query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj. According to log messages, coordinator scheduled query fragment instances on three executors, which were listening on port 27000, 27001, 27002 respectively. The second executor, which was listening on port 27001, sent status report to coordinator with instance_exec_status as done to make BackendState::num_remaining_instances_ decrease to 0 so that coordinator change the corresponding BackendState as done. Later, the executor got error "Error from query 0b4f6b681d27f917:a71a3c0b00000000: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0", and sent last status report to the coordinator. But the coordinator did not apply the exec status report and returned response with Status::OK since the BackendState was marked as done. This made the executor to hit DCHECK failure. Here are the log messages on three impalad: *Log messages on executor impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000* I1213 21:16:24.329084 16251 impala-server.cc:1212] 0b4f6b681d27f917:a71a3c0b00000000] Registered query query_id=0b4f6b681d27f917:a71a3c0b00000000 session_id=f6448bd69bd96332:83f4be5c320631b9 I1213 21:16:24.329277 16251 Frontend.java:1522] 0b4f6b681d27f917:a71a3c0b00000000] Analyzing query: select l_orderkey, l_partkey, l_suppkey, l_linenumber from lineitem where if(l_orderkey % 2 = 0, NULL, l_partkey) not in ( select p_partkey from part where p_retailprice != l_extendedprice * l_tax) order by 1,2,3,4 limit 5 db: tpch_parquet I1213 21:16:24.331490 16251 BaseAuthorizationChecker.java:110] 0b4f6b681d27f917:a71a3c0b00000000] Authorization check took 2 ms I1213 21:16:24.331568 16251 Frontend.java:1564] 0b4f6b681d27f917:a71a3c0b00000000] Analysis and authorization finished. I1213 21:16:24.336283 30797 admission-controller.cc:1630] 0b4f6b681d27f917:a71a3c0b00000000] Trying to admit id=0b4f6b681d27f917:a71a3c0b00000000 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=520.59 MB dedicated_coord_mem_estimate=100.02 MB max_requests=-1 max_queued=200 max_mem=-1.00 B I1213 21:16:24.336367 30797 admission-controller.cc:1638] 0b4f6b681d27f917:a71a3c0b00000000] Stats: agg_num_running=3, agg_num_queued=0, agg_mem_reserved=9.72 GB, local_host(local_mem_admitted=248.80 GB, num_admitted_running=3, num_queued=0, backend_mem_reserved=139.99 KB, topN_query_stats: queries=[3f4a3ecbde524632:bd18d56c00000000, 874bc9e6299fc558:6a5dfa3d00000000, 164f449fbd3220c6:3ee5b1c800000000, b440cb236aa701ee:f6fa8b0700000000], total_mem_consumed=139.99 KB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=4, min=11.29 KB, max=59.36 KB, pool_total_mem=139.99 KB, average_per_query=35.00 KB) I1213 21:16:24.336488 30797 admission-controller.cc:1185] 0b4f6b681d27f917:a71a3c0b00000000] Admitting query id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:24.336814 30797 impala-server.cc:2032] 0b4f6b681d27f917:a71a3c0b00000000] Registering query locations I1213 21:16:24.336901 30797 coordinator.cc:148] 0b4f6b681d27f917:a71a3c0b00000000] Exec() query_id=0b4f6b681d27f917:a71a3c0b00000000 stmt=select l_orderkey, l_partkey, l_suppkey, l_linenumber from lineitem where if(l_orderkey % 2 = 0, NULL, l_partkey) not in ( select p_partkey from part where p_retailprice != l_extendedprice * l_tax) order by 1,2,3,4 limit 5 I1213 21:16:24.338374 30797 coordinator.cc:472] 0b4f6b681d27f917:a71a3c0b00000000] starting execution on 3 backends for query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:24.340186 23828 control-service.cc:142] 0b4f6b681d27f917:a71a3c0b00000000] ExecQueryFInstances(): query_id=0b4f6b681d27f917:a71a3c0b00000000 coord=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 #instances=2 I1213 21:16:24.342927 30801 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000003] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000003 fragment_idx=1 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=4 I1213 21:16:24.343096 30802 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000000] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=5 I1213 21:16:24.343809 30797 coordinator.cc:531] 0b4f6b681d27f917:a71a3c0b00000000] started execution on 3 backends for query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:24.395705 30807 admission-controller.cc:1638] 1e4895c902a7df9d:ea55d7b700000000] Stats: agg_num_running=3, agg_num_queued=0, agg_mem_reserved=9.78 GB, local_host(local_mem_admitted=250.31 GB, num_admitted_running=3, num_queued=0, backend_mem_reserved=29.65 MB, topN_query_stats: queries=[0b4f6b681d27f917:a71a3c0b00000000, 3f4a3ecbde524632:bd18d56c00000000, c946a8029dce14c0:a87471b400000000, b440cb236aa701ee:f6fa8b0700000000], total_mem_consumed=29.65 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=4, min=11.29 KB, max=29.56 MB, pool_total_mem=29.65 MB, average_per_query=7.41 MB) I1213 21:16:24.465718 30813 admission-controller.cc:1638] 8f49abb9e4f88fb1:898d815700000000] Stats: agg_num_running=3, agg_num_queued=0, agg_mem_reserved=9.78 GB, local_host(local_mem_admitted=250.31 GB, num_admitted_running=3, num_queued=0, backend_mem_reserved=29.65 MB, topN_query_stats: queries=[0b4f6b681d27f917:a71a3c0b00000000, 3f4a3ecbde524632:bd18d56c00000000, c946a8029dce14c0:a87471b400000000, b440cb236aa701ee:f6fa8b0700000000], total_mem_consumed=29.65 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=4, min=11.29 KB, max=29.56 MB, pool_total_mem=29.65 MB, average_per_query=7.41 MB) {color:#de350b}_--- got status report from executor on port 27001, backend state was set as done, but status OK so coordinator did not call UpdateExecState to update exec state._{color} *I1213 21:16:44.394950 23824 coordinator.cc:959] Backend completed: host=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27001 remaining=3 query_id=0b4f6b681d27f917:a71a3c0b00000000* I1213 21:16:44.395043 23824 coordinator-backend-state.cc:362] query_id=0b4f6b681d27f917:a71a3c0b00000000: first in-progress backend: impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 I1213 21:16:44.653590 2706 krpc-data-stream-mgr.cc:306] 0b4f6b681d27f917:a71a3c0b00000003] DeregisterRecvr(): fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000003, node=4 I1213 21:16:45.173908 30801 runtime-state.cc:196] 0b4f6b681d27f917:a71a3c0b00000003] Error from query 0b4f6b681d27f917:a71a3c0b00000000: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:*27000* by fragment 0b4f6b681d27f917:a71a3c0b00000003 Memory left in process limit: 11.27 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=9.70 MB Total=99.70 MB Peak=99.79 MB Fragment 0b4f6b681d27f917:a71a3c0b00000003: Reservation=90.00 MB OtherMemory=9.65 MB Total=99.65 MB Peak=99.74 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=3.44 MB Total=45.94 MB Peak=47.18 MB Queued Batches: Total=1.97 MB Peak=3.16 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB Fragment 0b4f6b681d27f917:a71a3c0b00000000: Reservation=0 OtherMemory=8.00 KB Total=8.00 KB Peak=8.00 KB EXCHANGE_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=1.44 KB Peak=423.00 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:45.174247 30801 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000003] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000003 I1213 21:16:45.183601 30799 query-state.cc:464] 0b4f6b681d27f917:a71a3c0b00000000] UpdateBackendExecState(): last report for 0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.183585 30801 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000003] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000003 #in-flight=5 status=MEM_LIMIT_EXCEEDED: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 by fragment 0b4f6b681d27f917:a71a3c0b00000003 Memory left in process limit: 11.27 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=9.70 MB Total=99.70 MB Peak=99.79 MB Fragment 0b4f6b681d27f917:a71a3c0b00000003: Reservation=90.00 MB OtherMemory=9.65 MB Total=99.65 MB Peak=99.74 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=3.44 MB Total=45.94 MB Peak=47.18 MB Queued Batches: Total=1.97 MB Peak=3.16 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB Fragment 0b4f6b681d27f917:a71a3c0b00000000: Reservation=0 OtherMemory=8.00 KB Total=8.00 KB Peak=8.00 KB EXCHANGE_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=1.44 KB Peak=423.00 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:45.192860 23824 coordinator.cc:959] Backend completed: host=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 remaining=2 query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.192946 23824 coordinator-backend-state.cc:362] query_id=0b4f6b681d27f917:a71a3c0b00000000: first in-progress backend: impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27002 I1213 21:16:45.193042 23824 coordinator.cc:712] ExecState: query id=0b4f6b681d27f917:a71a3c0b00000000 finstance=0b4f6b681d27f917:a71a3c0b00000003 on host=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 (EXECUTING -> ERROR) status=Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 by fragment 0b4f6b681d27f917:a71a3c0b00000003 Memory left in process limit: 11.27 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=9.70 MB Total=99.70 MB Peak=99.79 MB Fragment 0b4f6b681d27f917:a71a3c0b00000003: Reservation=90.00 MB OtherMemory=9.65 MB Total=99.65 MB Peak=99.74 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=3.44 MB Total=45.94 MB Peak=47.18 MB Queued Batches: Total=1.97 MB Peak=3.16 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB Fragment 0b4f6b681d27f917:a71a3c0b00000000: Reservation=0 OtherMemory=8.00 KB Total=8.00 KB Peak=8.00 KB EXCHANGE_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=1.44 KB Peak=423.00 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:45.193156 23824 coordinator-backend-state.cc:921] query_id=0b4f6b681d27f917:a71a3c0b00000000 target backend=127.0.0.1:27000: Not cancelling because the backend is already done: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 by fragment 0b4f6b681d27f917:a71a3c0b00000003 Memory left in process limit: 11.27 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=9.70 MB Total=99.70 MB Peak=99.79 MB Fragment 0b4f6b681d27f917:a71a3c0b00000003: Reservation=90.00 MB OtherMemory=9.65 MB Total=99.65 MB Peak=99.74 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=3.44 MB Total=45.94 MB Peak=47.18 MB Queued Batches: Total=1.97 MB Peak=3.16 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB Fragment 0b4f6b681d27f917:a71a3c0b00000000: Reservation=0 OtherMemory=8.00 KB Total=8.00 KB Peak=8.00 KB EXCHANGE_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=1.44 KB Peak=423.00 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:45.193229 23824 coordinator-backend-state.cc:921] query_id=0b4f6b681d27f917:a71a3c0b00000000 target backend=127.0.0.1:27002: Sending CancelQueryFInstances rpc I1213 21:16:45.194350 23824 coordinator-backend-state.cc:921] query_id=0b4f6b681d27f917:a71a3c0b00000000 target backend=127.0.0.1:27001: Not cancelling because the backend is already done: I1213 21:16:45.194430 23824 coordinator.cc:926] CancelBackends() query_id=0b4f6b681d27f917:a71a3c0b00000000, tried to cancel 1 backends I1213 21:16:45.194624 23824 coordinator.cc:1272] Release admission control resources for query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.197295 30799 query-state.cc:686] 0b4f6b681d27f917:a71a3c0b00000000] Cancelling fragment instances as directed by the coordinator. Returned status: Cancelled I1213 21:16:45.197345 30799 query-state.cc:923] 0b4f6b681d27f917:a71a3c0b00000000] Cancel: query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.197432 30799 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000000] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.197516 30799 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000000] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000003 I1213 21:16:45.198832 30802 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000000] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.198951 30802 krpc-data-stream-mgr.cc:306] 0b4f6b681d27f917:a71a3c0b00000000] DeregisterRecvr(): fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000000, node=5 I1213 21:16:45.199146 30802 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000000] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000000 #in-flight=4 status=CANCELLED: Cancelled I1213 21:16:45.220476 16251 impala-beeswax-server.cc:260] close(): query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.220543 16251 impala-server.cc:1290] UnregisterQuery(): query_id=0b4f6b681d27f917:a71a3c0b00000000 *Log messages on executor impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27001* I1213 21:16:24.341915 24136 control-service.cc:142] 0b4f6b681d27f917:a71a3c0b00000000] ExecQueryFInstances(): query_id=0b4f6b681d27f917:a71a3c0b00000000 coord=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 #instances=2 I1213 21:16:24.345479 30805 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000001] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000001 fragment_idx=2 per_fragment_instance_idx=0 coord_state_idx=2 #in-flight=3 I1213 21:16:24.345636 30806 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000004] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000004 fragment_idx=1 per_fragment_instance_idx=2 coord_state_idx=2 #in-flight=4 I1213 21:16:24.631848 30805 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000001] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000001 I1213 21:16:24.632047 30805 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000001] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000001 #in-flight=3 status=OK ...... 1213 21:16:43.751832 2547 krpc-data-stream-mgr.cc:306] 0b4f6b681d27f917:a71a3c0b00000004] DeregisterRecvr(): fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000004, node=4 I1213 21:16:44.356305 30806 runtime-state.cc:196] 0b4f6b681d27f917:a71a3c0b00000004] Error from query 0b4f6b681d27f917:a71a3c0b00000000: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:*27001* by fragment 0b4f6b681d27f917:a71a3c0b00000004 Memory left in process limit: 1.80 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=10.88 MB Total=100.88 MB Peak=100.88 MB Fragment 0b4f6b681d27f917:a71a3c0b00000001: Reservation=0 OtherMemory=0 Total=0 Peak=6.13 MB HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 Peak=6.08 MB KrpcDataStreamSender (dst_id=4): Total=0 Peak=22.57 KB Fragment 0b4f6b681d27f917:a71a3c0b00000004: Reservation=90.00 MB OtherMemory=10.84 MB Total=100.84 MB Peak=100.84 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=4.63 MB Total=47.13 MB Peak=47.18 MB Queued Batches: Total=3.16 MB Peak=3.17 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB CodeGen: Total=852.00 B Peak=52.50 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:44.356761 30806 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000004] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000004 II1213 21:16:44.381556 30806 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000004] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000004 #in-flight=2 status=MEM_LIMIT_EXCEEDED: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27001 by fragment 0b4f6b681d27f917:a71a3c0b00000004 Memory left in process limit: 1.80 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=10.88 MB Total=100.88 MB Peak=100.88 MB Fragment 0b4f6b681d27f917:a71a3c0b00000001: Reservation=0 OtherMemory=0 Total=0 Peak=6.13 MB HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 Peak=6.08 MB KrpcDataStreamSender (dst_id=4): Total=0 Peak=22.57 KB Fragment 0b4f6b681d27f917:a71a3c0b00000004: Reservation=90.00 MB OtherMemory=10.84 MB Total=100.84 MB Peak=100.84 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=4.63 MB Total=47.13 MB Peak=47.18 MB Queued Batches: Total=3.16 MB Peak=3.17 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB CodeGen: Total=852.00 B Peak=52.50 KB CodeGen: Total=42.90 KB Peak=6.36 MB *I*1213 21:16:44.396633 30804 query-state.cc:464] 0b4f6b681d27f917:a71a3c0b00000000] UpdateBackendExecState(): last report for 0b4f6b681d27f917:a71a3c0b00000000** *F1213 21:16:44.399322 30804 query-state.cc:877] 0b4f6b681d27f917:a71a3c0b00000000] Check failed: is_cancelled_.Load() == 1 (0 vs. 1)* *Log messages on executor impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27002* I1213 21:16:24.340821 24179 control-service.cc:142] 0b4f6b681d27f917:a71a3c0b00000000] ExecQueryFInstances(): query_id=0b4f6b681d27f917:a71a3c0b00000000 coord=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 #instances=1 I1213 21:16:24.343477 30803 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000002] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000002 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=1 I1213 21:16:33.402086 32272 krpc-data-stream-mgr.cc:306] 0b4f6b681d27f917:a71a3c0b00000002] DeregisterRecvr(): fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000002, node=4 I1213 21:16:45.193837 24179 control-service.cc:213] CancelQueryFInstances(): query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.193897 24179 query-exec-mgr.cc:126] QueryState: query_id=0b4f6b681d27f917:a71a3c0b00000000 refcnt=3 I1213 21:16:45.193934 24179 query-state.cc:923] Cancel: query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.193979 24179 krpc-data-stream-mgr.cc:337] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000002 I1213 21:16:45.194149 30803 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000002] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000002 I1213 21:16:45.199045 30800 query-state.cc:464] 0b4f6b681d27f917:a71a3c0b00000000] UpdateBackendExecState(): last report for 0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.199033 30803 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000002] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000002 #in-flight=0 status=CANCELLED: Cancelled I1213 21:16:45.202457 30800 query-state.cc:686] 0b4f6b681d27f917:a71a3c0b00000000] Cancelling fragment instances as directed by the coordinator. Returned status: Cancelled I1213 21:16:45.202517 30800 query-state.cc:923] 0b4f6b681d27f917:a71a3c0b00000000] Cancel: query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.214329 30800 query-exec-mgr.cc:213] 0b4f6b681d27f917:a71a3c0b00000000] ReleaseQueryState(): deleted query_id=0b4f6b681d27f917:a71a3c0b00000000 was (Author: wzhou): This issue happened recently in cdpd-master core ASAN build: [https://master-02.jenkins.cloudera.com/job/impala-cdpd-master-core-asan/497/] It happened when executing test case: tests/query_test/test_spilling.py::TestSpillingDebugActionDimensions::test_spilling_naaj. According to log messages, coordinator scheduled query fragment instances on three executors, which were listening on port 27000, 27001, 27002 respectively. The second executor, which was listening on port 27001, sent status report to coordinator with instance_exec_status as done to make BackendState::num_remaining_instances_ decrease to 0 so that coordinator change the corresponding BackendState as done. Later, the executor got error "Error from query 0b4f6b681d27f917:a71a3c0b00000000: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0", and sent last status report to the coordinator. But the coordinator did not apply the exec status report and returned response with Status::OK since the BackendState was marked as done. This made the executor to hit DCHECK failure. Here are the log messages on three impalad: *Log messages on executor impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000* I1213 21:16:24.329084 16251 impala-server.cc:1212] 0b4f6b681d27f917:a71a3c0b00000000] Registered query query_id=0b4f6b681d27f917:a71a3c0b00000000 session_id=f6448bd69bd96332:83f4be5c320631b9 I1213 21:16:24.329277 16251 Frontend.java:1522] 0b4f6b681d27f917:a71a3c0b00000000] Analyzing query: select l_orderkey, l_partkey, l_suppkey, l_linenumber from lineitem where if(l_orderkey % 2 = 0, NULL, l_partkey) not in ( select p_partkey from part where p_retailprice != l_extendedprice * l_tax) order by 1,2,3,4 limit 5 db: tpch_parquet I1213 21:16:24.331490 16251 BaseAuthorizationChecker.java:110] 0b4f6b681d27f917:a71a3c0b00000000] Authorization check took 2 ms I1213 21:16:24.331568 16251 Frontend.java:1564] 0b4f6b681d27f917:a71a3c0b00000000] Analysis and authorization finished. I1213 21:16:24.336283 30797 admission-controller.cc:1630] 0b4f6b681d27f917:a71a3c0b00000000] Trying to admit id=0b4f6b681d27f917:a71a3c0b00000000 in pool_name=default-pool executor_group_name=default per_host_mem_estimate=520.59 MB dedicated_coord_mem_estimate=100.02 MB max_requests=-1 max_queued=200 max_mem=-1.00 B I1213 21:16:24.336367 30797 admission-controller.cc:1638] 0b4f6b681d27f917:a71a3c0b00000000] Stats: agg_num_running=3, agg_num_queued=0, agg_mem_reserved=9.72 GB, local_host(local_mem_admitted=248.80 GB, num_admitted_running=3, num_queued=0, backend_mem_reserved=139.99 KB, topN_query_stats: queries=[3f4a3ecbde524632:bd18d56c00000000, 874bc9e6299fc558:6a5dfa3d00000000, 164f449fbd3220c6:3ee5b1c800000000, b440cb236aa701ee:f6fa8b0700000000], total_mem_consumed=139.99 KB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=4, min=11.29 KB, max=59.36 KB, pool_total_mem=139.99 KB, average_per_query=35.00 KB) I1213 21:16:24.336488 30797 admission-controller.cc:1185] 0b4f6b681d27f917:a71a3c0b00000000] Admitting query id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:24.336814 30797 impala-server.cc:2032] 0b4f6b681d27f917:a71a3c0b00000000] Registering query locations I1213 21:16:24.336901 30797 coordinator.cc:148] 0b4f6b681d27f917:a71a3c0b00000000] Exec() query_id=0b4f6b681d27f917:a71a3c0b00000000 stmt=select l_orderkey, l_partkey, l_suppkey, l_linenumber from lineitem where if(l_orderkey % 2 = 0, NULL, l_partkey) not in ( select p_partkey from part where p_retailprice != l_extendedprice * l_tax) order by 1,2,3,4 limit 5 I1213 21:16:24.338374 30797 coordinator.cc:472] 0b4f6b681d27f917:a71a3c0b00000000] starting execution on 3 backends for query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:24.340186 23828 control-service.cc:142] 0b4f6b681d27f917:a71a3c0b00000000] ExecQueryFInstances(): query_id=0b4f6b681d27f917:a71a3c0b00000000 coord=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 #instances=2 I1213 21:16:24.342927 30801 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000003] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000003 fragment_idx=1 per_fragment_instance_idx=1 coord_state_idx=0 #in-flight=4 I1213 21:16:24.343096 30802 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000000] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000000 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=5 I1213 21:16:24.343809 30797 coordinator.cc:531] 0b4f6b681d27f917:a71a3c0b00000000] started execution on 3 backends for query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:24.395705 30807 admission-controller.cc:1638] 1e4895c902a7df9d:ea55d7b700000000] Stats: agg_num_running=3, agg_num_queued=0, agg_mem_reserved=9.78 GB, local_host(local_mem_admitted=250.31 GB, num_admitted_running=3, num_queued=0, backend_mem_reserved=29.65 MB, topN_query_stats: queries=[0b4f6b681d27f917:a71a3c0b00000000, 3f4a3ecbde524632:bd18d56c00000000, c946a8029dce14c0:a87471b400000000, b440cb236aa701ee:f6fa8b0700000000], total_mem_consumed=29.65 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=4, min=11.29 KB, max=29.56 MB, pool_total_mem=29.65 MB, average_per_query=7.41 MB) I1213 21:16:24.465718 30813 admission-controller.cc:1638] 8f49abb9e4f88fb1:898d815700000000] Stats: agg_num_running=3, agg_num_queued=0, agg_mem_reserved=9.78 GB, local_host(local_mem_admitted=250.31 GB, num_admitted_running=3, num_queued=0, backend_mem_reserved=29.65 MB, topN_query_stats: queries=[0b4f6b681d27f917:a71a3c0b00000000, 3f4a3ecbde524632:bd18d56c00000000, c946a8029dce14c0:a87471b400000000, b440cb236aa701ee:f6fa8b0700000000], total_mem_consumed=29.65 MB, fraction_of_pool_total_mem=1; pool_level_stats: num_running=4, min=11.29 KB, max=29.56 MB, pool_total_mem=29.65 MB, average_per_query=7.41 MB) {color:#de350b}_--- got status report from executor on port 27001, backend state was set as done, but status OK so coordinator did not call UpdateExecState to update exec state._{color} *I1213 21:16:44.394950 23824 coordinator.cc:959] Backend completed: host=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27001 remaining=3 query_id=0b4f6b681d27f917:a71a3c0b00000000* I1213 21:16:44.395043 23824 coordinator-backend-state.cc:362] query_id=0b4f6b681d27f917:a71a3c0b00000000: first in-progress backend: impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 I1213 21:16:44.653590 2706 krpc-data-stream-mgr.cc:306] 0b4f6b681d27f917:a71a3c0b00000003] DeregisterRecvr(): fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000003, node=4 I1213 21:16:45.173908 30801 runtime-state.cc:196] 0b4f6b681d27f917:a71a3c0b00000003] Error from query 0b4f6b681d27f917:a71a3c0b00000000: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:*27000* by fragment 0b4f6b681d27f917:a71a3c0b00000003 Memory left in process limit: 11.27 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=9.70 MB Total=99.70 MB Peak=99.79 MB Fragment 0b4f6b681d27f917:a71a3c0b00000003: Reservation=90.00 MB OtherMemory=9.65 MB Total=99.65 MB Peak=99.74 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=3.44 MB Total=45.94 MB Peak=47.18 MB Queued Batches: Total=1.97 MB Peak=3.16 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB Fragment 0b4f6b681d27f917:a71a3c0b00000000: Reservation=0 OtherMemory=8.00 KB Total=8.00 KB Peak=8.00 KB EXCHANGE_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=1.44 KB Peak=423.00 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:45.174247 30801 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000003] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000003 I1213 21:16:45.183601 30799 query-state.cc:464] 0b4f6b681d27f917:a71a3c0b00000000] UpdateBackendExecState(): last report for 0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.183585 30801 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000003] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000003 #in-flight=5 status=MEM_LIMIT_EXCEEDED: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 by fragment 0b4f6b681d27f917:a71a3c0b00000003 Memory left in process limit: 11.27 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=9.70 MB Total=99.70 MB Peak=99.79 MB Fragment 0b4f6b681d27f917:a71a3c0b00000003: Reservation=90.00 MB OtherMemory=9.65 MB Total=99.65 MB Peak=99.74 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=3.44 MB Total=45.94 MB Peak=47.18 MB Queued Batches: Total=1.97 MB Peak=3.16 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB Fragment 0b4f6b681d27f917:a71a3c0b00000000: Reservation=0 OtherMemory=8.00 KB Total=8.00 KB Peak=8.00 KB EXCHANGE_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=1.44 KB Peak=423.00 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:45.192860 23824 coordinator.cc:959] Backend completed: host=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 remaining=2 query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.192946 23824 coordinator-backend-state.cc:362] query_id=0b4f6b681d27f917:a71a3c0b00000000: first in-progress backend: impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27002 I1213 21:16:45.193042 23824 coordinator.cc:712] ExecState: query id=0b4f6b681d27f917:a71a3c0b00000000 finstance=0b4f6b681d27f917:a71a3c0b00000003 on host=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 (EXECUTING -> ERROR) status=Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 by fragment 0b4f6b681d27f917:a71a3c0b00000003 Memory left in process limit: 11.27 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=9.70 MB Total=99.70 MB Peak=99.79 MB Fragment 0b4f6b681d27f917:a71a3c0b00000003: Reservation=90.00 MB OtherMemory=9.65 MB Total=99.65 MB Peak=99.74 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=3.44 MB Total=45.94 MB Peak=47.18 MB Queued Batches: Total=1.97 MB Peak=3.16 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB Fragment 0b4f6b681d27f917:a71a3c0b00000000: Reservation=0 OtherMemory=8.00 KB Total=8.00 KB Peak=8.00 KB EXCHANGE_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=1.44 KB Peak=423.00 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:45.193156 23824 coordinator-backend-state.cc:921] query_id=0b4f6b681d27f917:a71a3c0b00000000 target backend=127.0.0.1:27000: Not cancelling because the backend is already done: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 by fragment 0b4f6b681d27f917:a71a3c0b00000003 Memory left in process limit: 11.27 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=9.70 MB Total=99.70 MB Peak=99.79 MB Fragment 0b4f6b681d27f917:a71a3c0b00000003: Reservation=90.00 MB OtherMemory=9.65 MB Total=99.65 MB Peak=99.74 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=3.44 MB Total=45.94 MB Peak=47.18 MB Queued Batches: Total=1.97 MB Peak=3.16 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB Fragment 0b4f6b681d27f917:a71a3c0b00000000: Reservation=0 OtherMemory=8.00 KB Total=8.00 KB Peak=8.00 KB EXCHANGE_NODE (id=5): Reservation=0 OtherMemory=0 Total=0 Peak=0 KrpcDeferredRpcs: Total=0 Peak=0 PLAN_ROOT_SINK: Total=0 Peak=0 CodeGen: Total=1.44 KB Peak=423.00 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:45.193229 23824 coordinator-backend-state.cc:921] query_id=0b4f6b681d27f917:a71a3c0b00000000 target backend=127.0.0.1:27002: Sending CancelQueryFInstances rpc I1213 21:16:45.194350 23824 coordinator-backend-state.cc:921] query_id=0b4f6b681d27f917:a71a3c0b00000000 target backend=127.0.0.1:27001: Not cancelling because the backend is already done: I1213 21:16:45.194430 23824 coordinator.cc:926] CancelBackends() query_id=0b4f6b681d27f917:a71a3c0b00000000, tried to cancel 1 backends I1213 21:16:45.194624 23824 coordinator.cc:1272] Release admission control resources for query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.197295 30799 query-state.cc:686] 0b4f6b681d27f917:a71a3c0b00000000] Cancelling fragment instances as directed by the coordinator. Returned status: Cancelled I1213 21:16:45.197345 30799 query-state.cc:923] 0b4f6b681d27f917:a71a3c0b00000000] Cancel: query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.197432 30799 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000000] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.197516 30799 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000000] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000003 I1213 21:16:45.198832 30802 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000000] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.198951 30802 krpc-data-stream-mgr.cc:306] 0b4f6b681d27f917:a71a3c0b00000000] DeregisterRecvr(): fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000000, node=5 I1213 21:16:45.199146 30802 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000000] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000000 #in-flight=4 status=CANCELLED: Cancelled I1213 21:16:45.220476 16251 impala-beeswax-server.cc:260] close(): query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.220543 16251 impala-server.cc:1290] UnregisterQuery(): query_id=0b4f6b681d27f917:a71a3c0b00000000 *Log messages on executor impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27001* I1213 21:16:24.341915 24136 control-service.cc:142] 0b4f6b681d27f917:a71a3c0b00000000] ExecQueryFInstances(): query_id=0b4f6b681d27f917:a71a3c0b00000000 coord=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 #instances=2 I1213 21:16:24.345479 30805 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000001] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000001 fragment_idx=2 per_fragment_instance_idx=0 coord_state_idx=2 #in-flight=3 I1213 21:16:24.345636 30806 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000004] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000004 fragment_idx=1 per_fragment_instance_idx=2 coord_state_idx=2 #in-flight=4 I1213 21:16:24.631848 30805 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000001] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000001 I1213 21:16:24.632047 30805 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000001] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000001 #in-flight=3 status=OK ...... 1213 21:16:43.751832 2547 krpc-data-stream-mgr.cc:306] 0b4f6b681d27f917:a71a3c0b00000004] DeregisterRecvr(): fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000004, node=4 I1213 21:16:44.356305 30806 runtime-state.cc:196] 0b4f6b681d27f917:a71a3c0b00000004] Error from query 0b4f6b681d27f917:a71a3c0b00000000: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:*27001* by fragment 0b4f6b681d27f917:a71a3c0b00000004 Memory left in process limit: 1.80 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=10.88 MB Total=100.88 MB Peak=100.88 MB Fragment 0b4f6b681d27f917:a71a3c0b00000001: Reservation=0 OtherMemory=0 Total=0 Peak=6.13 MB HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 Peak=6.08 MB KrpcDataStreamSender (dst_id=4): Total=0 Peak=22.57 KB Fragment 0b4f6b681d27f917:a71a3c0b00000004: Reservation=90.00 MB OtherMemory=10.84 MB Total=100.84 MB Peak=100.84 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=4.63 MB Total=47.13 MB Peak=47.18 MB Queued Batches: Total=3.16 MB Peak=3.17 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB CodeGen: Total=852.00 B Peak=52.50 KB CodeGen: Total=42.90 KB Peak=6.36 MB I1213 21:16:44.356761 30806 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000004] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000004 II1213 21:16:44.381556 30806 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000004] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000004 #in-flight=2 status=MEM_LIMIT_EXCEEDED: Memory limit exceeded: Could not free memory by spilling to disk: scratch_limit is 0 Error occurred on backend impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27001 by fragment 0b4f6b681d27f917:a71a3c0b00000004 Memory left in process limit: 1.80 GB Query(0b4f6b681d27f917:a71a3c0b00000000): Reservation=90.00 MB ReservationLimit=90.00 MB OtherMemory=10.88 MB Total=100.88 MB Peak=100.88 MB Fragment 0b4f6b681d27f917:a71a3c0b00000001: Reservation=0 OtherMemory=0 Total=0 Peak=6.13 MB HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 Peak=6.08 MB KrpcDataStreamSender (dst_id=4): Total=0 Peak=22.57 KB Fragment 0b4f6b681d27f917:a71a3c0b00000004: Reservation=90.00 MB OtherMemory=10.84 MB Total=100.84 MB Peak=100.84 MB SORT_NODE (id=3): Total=6.12 MB Peak=8.05 MB HASH_JOIN_NODE (id=2): Reservation=47.50 MB OtherMemory=83.25 KB Total=47.58 MB Peak=47.58 MB Exprs: Total=17.12 KB Peak=17.12 KB Hash Join Builder (join_node_id=2): Total=13.12 KB Peak=21.12 KB Hash Join Builder (join_node_id=2) Exprs: Total=13.12 KB Peak=13.12 KB HDFS_SCAN_NODE (id=0): Reservation=42.50 MB OtherMemory=4.63 MB Total=47.13 MB Peak=47.18 MB Queued Batches: Total=3.16 MB Peak=3.17 MB EXCHANGE_NODE (id=4): Reservation=0 OtherMemory=0 Total=0 Peak=7.63 MB KrpcDeferredRpcs: Total=0 Peak=0 KrpcDataStreamSender (dst_id=5): Total=4.41 KB Peak=4.41 KB CodeGen: Total=852.00 B Peak=52.50 KB CodeGen: Total=42.90 KB Peak=6.36 MB I*1213 21:16:44.396633 30804 query-state.cc:464] 0b4f6b681d27f917:a71a3c0b00000000] UpdateBackendExecState(): last report for 0b4f6b681d27f917:a71a3c0b00000000* *F1213 21:16:44.399322 30804 query-state.cc:877] 0b4f6b681d27f917:a71a3c0b00000000] Check failed: is_cancelled_.Load() == 1 (0 vs. 1)* *Log messages on executor impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27002* I1213 21:16:24.340821 24179 control-service.cc:142] 0b4f6b681d27f917:a71a3c0b00000000] ExecQueryFInstances(): query_id=0b4f6b681d27f917:a71a3c0b00000000 coord=impala-ec2-centos74-r5-4xlarge-ondemand-053c.vpc.cloudera.com:27000 #instances=1 I1213 21:16:24.343477 30803 query-state.cc:897] 0b4f6b681d27f917:a71a3c0b00000002] Executing instance. instance_id=0b4f6b681d27f917:a71a3c0b00000002 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=1 #in-flight=1 I1213 21:16:33.402086 32272 krpc-data-stream-mgr.cc:306] 0b4f6b681d27f917:a71a3c0b00000002] DeregisterRecvr(): fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000002, node=4 I1213 21:16:45.193837 24179 control-service.cc:213] CancelQueryFInstances(): query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.193897 24179 query-exec-mgr.cc:126] QueryState: query_id=0b4f6b681d27f917:a71a3c0b00000000 refcnt=3 I1213 21:16:45.193934 24179 query-state.cc:923] Cancel: query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.193979 24179 krpc-data-stream-mgr.cc:337] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000002 I1213 21:16:45.194149 30803 krpc-data-stream-mgr.cc:337] 0b4f6b681d27f917:a71a3c0b00000002] cancelling active streams for fragment_instance_id=0b4f6b681d27f917:a71a3c0b00000002 I1213 21:16:45.199045 30800 query-state.cc:464] 0b4f6b681d27f917:a71a3c0b00000000] UpdateBackendExecState(): last report for 0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.199033 30803 query-state.cc:906] 0b4f6b681d27f917:a71a3c0b00000002] Instance completed. instance_id=0b4f6b681d27f917:a71a3c0b00000002 #in-flight=0 status=CANCELLED: Cancelled I1213 21:16:45.202457 30800 query-state.cc:686] 0b4f6b681d27f917:a71a3c0b00000000] Cancelling fragment instances as directed by the coordinator. Returned status: Cancelled I1213 21:16:45.202517 30800 query-state.cc:923] 0b4f6b681d27f917:a71a3c0b00000000] Cancel: query_id=0b4f6b681d27f917:a71a3c0b00000000 I1213 21:16:45.214329 30800 query-exec-mgr.cc:213] 0b4f6b681d27f917:a71a3c0b00000000] ReleaseQueryState(): deleted query_id=0b4f6b681d27f917:a71a3c0b00000000 > Hit DCHECK in TestImpalaShell.test_completed_query_errors_2 > ----------------------------------------------------------- > > Key: IMPALA-10259 > URL: https://issues.apache.org/jira/browse/IMPALA-10259 > Project: IMPALA > Issue Type: Bug > Components: Backend > Reporter: Quanlong Huang > Assignee: Wenzhe Zhou > Priority: Blocker > Labels: broken-build, crash > > TestImpalaShell.test_completed_query_errors_2 hits a DCHECK in a core ASAN > build: > {code:java} > F1016 17:08:54.728466 19955 query-state.cc:877] > 924f4ce603ac07bb:a08656e300000000] Check failed: is_cancelled_.Load() == 1 (0 > vs. 1) {code} > The test is: > {code:java} > shell.test_shell_commandline.TestImpalaShell.test_completed_query_errors_2[table_format_and_file_extension: > ('textfile', '.txt') | protocol: hs2] {code} > The query is: > {code:java} > I1016 17:08:49.026532 19947 Frontend.java:1522] > 924f4ce603ac07bb:a08656e300000000] Analyzing query: select id, cnt from > functional_parquet.bad_column_metadata t, (select 1 cnt) u db: default {code} > Query options: > {code:java} > I1016 17:08:49.020670 19947 impala-hs2-server.cc:269] > TClientRequest.queryOptions: TQueryOptions { > 01: abort_on_error (bool) = true, > 02: max_errors (i32) = 100, > 03: disable_codegen (bool) = false, > 04: batch_size (i32) = 0, > 05: num_nodes (i32) = 0, > 06: max_scan_range_length (i64) = 0, > 07: num_scanner_threads (i32) = 0, > 11: debug_action (string) = "", > 12: mem_limit (i64) = 0, > 15: hbase_caching (i32) = 0, > 16: hbase_cache_blocks (bool) = false, > 17: parquet_file_size (i64) = 0, > 18: explain_level (i32) = 1, > 19: sync_ddl (bool) = false, > 24: disable_outermost_topn (bool) = false, > 26: query_timeout_s (i32) = 0, > 28: appx_count_distinct (bool) = false, > 29: disable_unsafe_spills (bool) = false, > 31: exec_single_node_rows_threshold (i32) = 100, > 32: optimize_partition_key_scans (bool) = false, > 33: replica_preference (i32) = 0, > 34: schedule_random_replica (bool) = false, > 36: disable_streaming_preaggregations (bool) = false, > 37: runtime_filter_mode (i32) = 2, > 38: runtime_bloom_filter_size (i32) = 1048576, > 39: runtime_filter_wait_time_ms (i32) = 0, > 40: disable_row_runtime_filtering (bool) = false, > 41: max_num_runtime_filters (i32) = 10, > 42: parquet_annotate_strings_utf8 (bool) = false, > 43: parquet_fallback_schema_resolution (i32) = 0, > 45: s3_skip_insert_staging (bool) = true, > 46: runtime_filter_min_size (i32) = 1048576, > 47: runtime_filter_max_size (i32) = 16777216, > 48: prefetch_mode (i32) = 1, > 49: strict_mode (bool) = false, > 50: scratch_limit (i64) = -1, > 51: enable_expr_rewrites (bool) = true, > 52: decimal_v2 (bool) = true, > 53: parquet_dictionary_filtering (bool) = true, > 54: parquet_array_resolution (i32) = 0, > 55: parquet_read_statistics (bool) = true, > 56: default_join_distribution_mode (i32) = 0, > 57: disable_codegen_rows_threshold (i32) = 50000, > 58: default_spillable_buffer_size (i64) = 2097152, > 59: min_spillable_buffer_size (i64) = 65536, > 60: max_row_size (i64) = 524288, > 61: idle_session_timeout (i32) = 0, > 62: compute_stats_min_sample_size (i64) = 1073741824, > 63: exec_time_limit_s (i32) = 0, > 64: shuffle_distinct_exprs (bool) = true, > 65: max_mem_estimate_for_admission (i64) = 0, > 66: thread_reservation_limit (i32) = 3000, > 67: thread_reservation_aggregate_limit (i32) = 0, > 68: kudu_read_mode (i32) = 0, > 69: allow_erasure_coded_files (bool) = false, > 70: timezone (string) = "", > 71: scan_bytes_limit (i64) = 0, > 72: cpu_limit_s (i64) = 0, > 73: topn_bytes_limit (i64) = 536870912, > 74: client_identifier (string) = "Impala Shell v4.0.0-SNAPSHOT (1e30eec) > built on Fri Oct 16 13:26:18 PDT 2020", > 75: resource_trace_ratio (double) = 0, > 76: num_remote_executor_candidates (i32) = 3, > 77: num_rows_produced_limit (i64) = 0, > 78: planner_testcase_mode (bool) = false, > 79: default_file_format (i32) = 0, > 80: parquet_timestamp_type (i32) = 0, > 81: parquet_read_page_index (bool) = true, > 82: parquet_write_page_index (bool) = true, > 84: disable_hdfs_num_rows_estimate (bool) = false, > 86: spool_query_results (bool) = false, > 87: default_transactional_type (i32) = 0, > 88: statement_expression_limit (i32) = 250000, > 89: max_statement_length_bytes (i32) = 16777216, > 90: disable_data_cache (bool) = false, > 91: max_result_spooling_mem (i64) = 104857600, > 92: max_spilled_result_spooling_mem (i64) = 1073741824, > 93: disable_hbase_num_rows_estimate (bool) = false, > 94: fetch_rows_timeout_ms (i64) = 10000, > 95: now_string (string) = "", > 96: parquet_object_store_split_size (i64) = 268435456, > 97: mem_limit_executors (i64) = 0, > 98: broadcast_bytes_limit (i64) = 34359738368, > 99: preagg_bytes_limit (i64) = -1, > 100: enable_cnf_rewrites (bool) = true, > 101: max_cnf_exprs (i32) = 0, > 102: kudu_snapshot_read_timestamp_micros (i64) = 0, > 103: retry_failed_queries (bool) = false, > 104: enabled_runtime_filter_types (i32) = 3, > 105: async_codegen (bool) = false, > 106: enable_distinct_semi_join_optimization (bool) = true, > 107: sort_run_bytes_limit (i64) = -1, > 108: max_fs_writers (i32) = 0, > 109: refresh_updated_hms_partitions (bool) = false, > 110: spool_all_results_for_retries (bool) = true, > 112: use_local_tz_for_unix_timestamp_conversions (bool) = false, > 113: convert_legacy_hive_parquet_utc_timestamps (bool) = false, > 114: enable_outer_join_to_inner_transformation (bool) = false, > } {code} > Stacktrace: > {code:java} > Thread 392 (crashed) > 0 libc-2.17.so + 0x351f7 > 1 impalad!google::LogMessage::Flush() + 0x1eb > 2 impalad!google::LogMessageFatal::~LogMessageFatal() + 0x9 > 3 impalad!impala::QueryState::MonitorFInstances() [query-state.cc : 877 + > 0x45] > 4 impalad!impala::QueryExecMgr::ExecuteQueryHelper(impala::QueryState*) > [query-exec-mgr.cc : 162 + 0x8] > 5 impalad!boost::_bi::bind_t<void, boost::_mfi::mf1<void, > impala::QueryExecMgr, impala::QueryState*>, > boost::_bi::list2<boost::_bi::value<impala::QueryExecMgr*>, > boost::_bi::value<impala::QueryState*> > >::operator()() [bind.hpp : 1222 + > 0xe] > 6 impalad!boost::function0<void>::operator()() const [function_template.hpp > : 770 + 0x5] > 7 impalad!impala::Thread::SuperviseThread(std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const&, > std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, > impala::Promise<long, (impala::PromiseMode)0>*) [thread.cc : 360 + 0x9] > 8 impalad!void > boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > >, > boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, > boost::_bi::value<impala::ThreadDebugInfo*>, > boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > >::operator()<void (*)(std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const&, > std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > const&, boost::function<void ()>, impala::ThreadDebugInfo const*, > impala::Promise<long, (impala::PromiseMode)0>*), > boost::_bi::list0>(boost::_bi::type<void>, void > (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > const&, std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const&, boost::function<void > ()>, impala::ThreadDebugInfo const*, impala::Promise<long, > (impala::PromiseMode)0>*), boost::_bi::list0&, int) [bind.hpp : 531 + 0x12] > 9 impalad!boost::_bi::bind_t<void, void > (*)(std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > const&, std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const&, boost::function<void > ()>, impala::ThreadDebugInfo const*, impala::Promise<long, > (impala::PromiseMode)0>*), > boost::_bi::list5<boost::_bi::value<std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > >, > boost::_bi::value<std::__cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > >, boost::_bi::value<boost::function<void ()> >, > boost::_bi::value<impala::ThreadDebugInfo*>, > boost::_bi::value<impala::Promise<long, (impala::PromiseMode)0>*> > > >::operator()() [bind.hpp : 1222 + 0xe] > 10 impalad!thread_proxy + 0x72 > 11 libpthread-2.17.so + 0x7e25 > 12 libc-2.17.so + 0xf834d {code} > This looks like IMPALA-10050 but not sure if it's the same cause. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org