[ https://issues.apache.org/jira/browse/IMPALA-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17080076#comment-17080076 ]
Sahil Takiar commented on IMPALA-5746:
--------------------------------------

[~twmarshall] and I discussed this a bit on the review for the test case [https://gerrit.cloudera.org/#/c/15666/], but moving the conversation here. Originally I thought IMPALA-2990 fixed this, but unfortunately the situation is more complicated. There is at least one scenario where killing a coordinator does not cause executors to kill their orphaned fragments; the fragments only get killed after the status report RPC has been failing for 10 minutes. On a cluster started via {{./bin/start-impala-cluster.py}}, I ran the following query (oddly, with a slightly different cluster topology things behave a bit differently, so perhaps there is a race condition somewhere):
{code:java}
select *
from tpch.lineitem t1, tpch.lineitem t2, tpch.lineitem t3
where t1.l_orderkey = t2.l_orderkey
  and t1.l_orderkey = t3.l_orderkey
  and t3.l_orderkey = t2.l_orderkey
order by t1.l_orderkey, t2.l_orderkey, t3.l_orderkey
limit 100;
{code}
I waited for the query to run for a bit (the progress bar said it was about 50% complete).
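The retry behavior described here — the executor keeps retrying the status report and only tears its fragments down once the whole retry window has elapsed — can be sketched as a small simulation. This is a hypothetical Python model, not Impala's actual code (the real loop lives in query-state.cc); the function names, the 5-second report cadence, and the injectable clock are all assumptions for illustration:

```python
import time

REPORT_INTERVAL_S = 5      # assumed reporting cadence, not Impala's actual default
MAX_RETRY_WINDOW_S = 600   # the 10-minute window observed in this repro

def run_report_loop(send_report, cancel_fragments, clock=time.monotonic,
                    sleep=time.sleep, interval_s=REPORT_INTERVAL_S,
                    max_retry_window_s=MAX_RETRY_WINDOW_S):
    """Keep sending status reports to the coordinator.

    After max_retry_window_s of consecutive failures, cancel the fragment
    instances (mirroring 'Consecutive failed reports = N. Time spent
    retrying = Tms' followed by 'Cancelling fragment instances ...').
    Returns the number of consecutive failed reports at cancellation.
    In this sketch the loop only exits via cancellation; the real loop
    also exits when the query finishes.
    """
    first_failure_ts = None
    consecutive_failures = 0
    while True:
        if send_report():
            # A successful report resets the retry bookkeeping.
            first_failure_ts, consecutive_failures = None, 0
        else:
            consecutive_failures += 1
            if first_failure_ts is None:
                first_failure_ts = clock()
            elif clock() - first_failure_ts >= max_retry_window_s:
                # Coordinator unreachable for the whole window: give up and
                # release the fragments' resources.
                cancel_fragments()
                return consecutive_failures
        sleep(interval_s)

class FakeClock:
    """Deterministic clock so the 10-minute window can be simulated instantly."""
    def __init__(self):
        self.now_s = 0.0
    def now(self):
        return self.now_s
    def sleep(self, seconds):
        self.now_s += seconds
```

With a `send_report` that always fails, this model only calls `cancel_fragments` once 600 s of consecutive failures have accumulated; the real cadence evidently differs (the logs below show 9 failures over ~220 s of retrying, suggesting a longer interval or backoff), but the shape — retry until the window expires, then self-cancel — is the same.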
Killed the coordinator, waited for a bit, and then looked at the /memz page of one of the executors, which showed this:
{code:java}
Process: Limit=7.28 GB Total=1.27 GB Peak=1.44 GB
  Buffer Pool: Free Buffers: Total=0
  Buffer Pool: Clean Pages: Total=0
  Buffer Pool: Unused Reservation: Total=-18.30 MB
  Control Service Queue: Limit=50.00 MB Total=0 Peak=15.24 KB
  Data Stream Service Queue: Limit=372.92 MB Total=0 Peak=2.01 MB
  Data Stream Manager Early RPCs: Total=0 Peak=0
  TCMalloc Overhead: Total=30.94 MB
  RequestPool=default-pool: Total=1.12 GB Peak=1.18 GB
    Query(3e42b7e4a9f9b58b:72759e5d00000000): Reservation=1.10 GB ReservationLimit=5.83 GB OtherMemory=17.17 MB Total=1.12 GB Peak=1.18 GB
      Runtime Filter Bank: Reservation=10.00 MB ReservationLimit=10.00 MB OtherMemory=0 Total=10.00 MB Peak=10.00 MB
      Fragment 3e42b7e4a9f9b58b:72759e5d00000008: Reservation=0 OtherMemory=0 Total=0 Peak=65.57 MB
        HDFS_SCAN_NODE (id=2): Reservation=0 OtherMemory=0 Total=0 Peak=65.42 MB
        KrpcDataStreamSender (dst_id=8): Total=0 Peak=150.41 KB
      Fragment 3e42b7e4a9f9b58b:72759e5d00000005: Reservation=0 OtherMemory=0 Total=0 Peak=65.57 MB
        HDFS_SCAN_NODE (id=1): Reservation=0 OtherMemory=0 Total=0 Peak=65.42 MB
        KrpcDataStreamSender (dst_id=7): Total=0 Peak=150.41 KB
      Fragment 3e42b7e4a9f9b58b:72759e5d00000002: Reservation=0 OtherMemory=0 Total=0 Peak=65.91 MB
        HDFS_SCAN_NODE (id=0): Reservation=0 OtherMemory=0 Total=0 Peak=65.91 MB
        KrpcDataStreamSender (dst_id=6): Total=0 Peak=150.41 KB
      Fragment 3e42b7e4a9f9b58b:72759e5d0000000b: Reservation=1.09 GB OtherMemory=17.06 MB Total=1.11 GB Peak=1.11 GB
        SORT_NODE (id=5): Total=148.00 KB Peak=148.00 KB
        HASH_JOIN_NODE (id=4): Reservation=558.00 MB OtherMemory=42.25 KB Total=558.04 MB Peak=558.06 MB
          Exprs: Total=13.12 KB Peak=13.12 KB
          Hash Join Builder (join_node_id=4): Total=13.12 KB Peak=21.12 KB
            Hash Join Builder (join_node_id=4) Exprs: Total=13.12 KB Peak=13.12 KB
        HASH_JOIN_NODE (id=3): Reservation=558.00 MB OtherMemory=34.25 KB Total=558.03 MB Peak=558.05 MB
          Exprs: Total=13.12 KB Peak=13.12 KB
          Hash Join Builder (join_node_id=3): Total=13.12 KB Peak=21.12 KB
            Hash Join Builder (join_node_id=3) Exprs: Total=13.12 KB Peak=13.12 KB
        EXCHANGE_NODE (id=6): Reservation=16.84 MB OtherMemory=0 Total=16.84 MB Peak=16.85 MB
          KrpcDeferredRpcs: Total=0 Peak=37.36 KB
        EXCHANGE_NODE (id=7): Reservation=0 OtherMemory=0 Total=0 Peak=2.54 MB
          KrpcDeferredRpcs: Total=0 Peak=0
        EXCHANGE_NODE (id=8): Reservation=0 OtherMemory=0 Total=0 Peak=16.69 MB
          KrpcDeferredRpcs: Total=0 Peak=37.56 KB
        KrpcDataStreamSender (dst_id=9): Total=272.00 B Peak=272.00 B
        CodeGen: Total=12.64 KB Peak=696.50 KB
        CodeGen: Total=12.64 KB Peak=696.50 KB
        CodeGen: Total=12.64 KB Peak=696.50 KB
        CodeGen: Total=75.92 KB Peak=5.00 MB
  Untracked Memory: Total=147.84 MB
{code}
The logs of the Impala executor show:
{code:java}
I0409 14:44:01.852174 28903 kudu-status-util.h:55] 3e42b7e4a9f9b58b:72759e5d00000000] ReportExecStatus() RPC failed: Network error: Client connection negotiation failed: client connection to 127.0.0.1:27000: connect: Connection refused (error 111)
W0409 14:44:01.852253 28903 query-state.cc:498] 3e42b7e4a9f9b58b:72759e5d00000000] Failed to send ReportExecStatus() RPC for query 3e42b7e4a9f9b58b:72759e5d00000000. Consecutive failed reports = 9. Time spent retrying = 220034ms.
I0409 14:44:04.862691 8833 krpc-data-stream-mgr.cc:422] Reduced stream ID cache from 3 items, to 2, eviction took: 0
I0409 14:44:51.856971 8752 connection.cc:445] Transfer of RPC call RPC call impala.ControlService.ReportExecStatus -> {remote=127.0.0.1:27000 (stakiar-desktop), user_credentials={real_user=impala}, network_plane=control} aborted: Runtime error: RPC transfer destroyed before it finished sending
{code}
This repeats in a loop until the 10-minute timeout is hit, and then the fragment cancels itself:
{code:java}
E0409 14:51:36.889616 28903 query-state.cc:523] 3e42b7e4a9f9b58b:72759e5d00000000] Cancelling fragment instances due to failure to reach the coordinator.
(ReportExecStatus() RPC failed: Network error: Client connection negotiation failed: client connection to 127.0.0.1:27000: connect: Connection refused (error 111) ).
I0409 14:51:36.889686 28903 query-state.cc:751] 3e42b7e4a9f9b58b:72759e5d00000000] Cancel: query_id=3e42b7e4a9f9b58b:72759e5d00000000
I0409 14:51:36.889746 28903 krpc-data-stream-mgr.cc:337] 3e42b7e4a9f9b58b:72759e5d00000000] cancelling active streams for fragment_instance_id=3e42b7e4a9f9b58b:72759e5d0000000b
{code}
The /memz page does not change until the 10-minute timeout is hit; after that, it shows no running queries. So the situation is a lot better than before (10 minutes vs. 30+ minutes, and the timeout is configurable), but 10 minutes is probably still too long. I'm not positive I understand exactly what is going on here. It seems that some fragments release their resources and others don't. Maybe this is just timing, but {{impala-server.num-fragments-in-flight}} is 1, which means there is only one fragment still running, and the value does not drop to 0 until the 10-minute timeout is hit.

> Remote fragments continue to hold onto memory after stopping the coordinator daemon
> -----------------------------------------------------------------------------------
>
>                 Key: IMPALA-5746
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5746
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Distributed Exec
>    Affects Versions: Impala 2.10.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Sahil Takiar
>            Priority: Critical
>         Attachments: remote_fragments_holding_memory.txt
>
>
> Repro
> # Start running queries
> # Kill the coordinator node
> # On the running Impalad check the memz tab, remote fragments continue to run and hold on to resources
> Remote fragments held on to memory 30+ minutes after stopping the coordinator service.
> Attached thread dump from an Impalad running remote fragments.
> Snapshot of memz tab 30 minutes after killing the coordinator
> {code}
> Process: Limit=201.73 GB Total=5.32 GB Peak=179.36 GB
> Free Disk IO Buffers: Total=1.87 GB Peak=1.87 GB
> RequestPool=root.default: Total=1.35 GB Peak=178.51 GB
> Query(f64169d4bb3c901c:3a21d8ae00000000): Total=2.64 MB Peak=104.73 MB
> Fragment f64169d4bb3c901c:3a21d8ae00000051: Total=2.64 MB Peak=2.67 MB
> AGGREGATION_NODE (id=15): Total=2.54 MB Peak=2.57 MB
> Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=12.29 KB
> DataStreamSender (dst_id=17): Total=85.31 KB Peak=85.31 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
> Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.54 MB
> Query(2a4f12b3b4b1dc8c:db7e8cf200000000): Total=258.29 MB Peak=412.98 MB
> Fragment 2a4f12b3b4b1dc8c:db7e8cf20000008c: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
> Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
> Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.66 MB
> Query(68421d2a5dea0775:83f5d97200000000): Total=282.77 MB Peak=443.53 MB
> Fragment 68421d2a5dea0775:83f5d9720000004a: Total=26.77 MB Peak=26.92 MB
> SORT_NODE (id=8): Total=8.00 KB Peak=8.00 KB
> Exprs: Total=4.00 KB Peak=4.00 KB
> ANALYTIC_EVAL_NODE (id=7): Total=4.00 KB Peak=4.00 KB
> Exprs: Total=4.00 KB Peak=4.00 KB
> SORT_NODE (id=6): Total=24.00 MB Peak=24.00 MB
> AGGREGATION_NODE (id=12): Total=2.72 MB Peak=2.83 MB
> Exprs: Total=85.12 KB Peak=85.12 KB
> EXCHANGE_NODE (id=11): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=84.80 KB
> DataStreamSender (dst_id=13): Total=1.27 KB Peak=1.27 KB
> CodeGen: Total=24.80 KB Peak=4.13 MB
> Block Manager: Limit=161.39 GB Total=280.50 MB Peak=286.52 MB
> Query(e94c89fa89a74d27:82812bf900000000): Total=258.29 MB Peak=436.85 MB
> Fragment e94c89fa89a74d27:82812bf90000008e: Total=2.29 MB Peak=2.29 MB
> SORT_NODE (id=11): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=20): Total=2.27 MB Peak=2.27 MB
> Exprs: Total=25.12 KB Peak=25.12 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=21): Total=3.88 KB Peak=3.88 KB
> CodeGen: Total=4.17 KB Peak=1.05 MB
> Block Manager: Limit=161.39 GB Total=256.25 MB Peak=321.62 MB
> Query(4e43dad3bdc935d8:938b8b7e00000000): Total=2.65 MB Peak=105.60 MB
> Fragment 4e43dad3bdc935d8:938b8b7e00000052: Total=2.65 MB Peak=2.68 MB
> AGGREGATION_NODE (id=15): Total=2.55 MB Peak=2.57 MB
> Exprs: Total=30.12 KB Peak=30.12 KB
> EXCHANGE_NODE (id=14): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=13.68 KB
> DataStreamSender (dst_id=17): Total=91.41 KB Peak=91.41 KB
> CodeGen: Total=1.53 KB Peak=374.50 KB
> Block Manager: Limit=161.39 GB Total=512.00 KB Peak=1.30 MB
> Query(b34bdd65f1ed017e:5a0291bd00000000): Total=2.37 MB Peak=106.56 MB
> Fragment b34bdd65f1ed017e:5a0291bd0000004b: Total=2.37 MB Peak=2.37 MB
> SORT_NODE (id=6): Total=4.00 KB Peak=4.00 KB
> AGGREGATION_NODE (id=10): Total=2.35 MB Peak=2.35 MB
> Exprs: Total=34.12 KB Peak=34.12 KB
> EXCHANGE_NODE (id=9): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.23 KB
> DataStreamSender (dst_id=11): Total=3.45 KB Peak=3.45 KB
> CodeGen: Total=4.51 KB Peak=1.11 MB
> Block Manager: Limit=161.39 GB Total=256.00 KB Peak=912.81 KB
> Query(b74ba58d53b6c45f:3e8228600000000): Total=190.41 MB Peak=425.09 MB
> Fragment b74ba58d53b6c45f:3e822860000009f: Total=67.90 KB Peak=2.34 MB
> SORT_NODE (id=14): Total=4.00 KB Peak=4.00 KB
> HASH_JOIN_NODE (id=13): Total=42.25 KB Peak=42.25 KB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=13): Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=13) Exprs: Total=9.12 KB Peak=9.12 KB
> HDFS_SCAN_NODE (id=11): Total=0 Peak=0
> EXCHANGE_NODE (id=24): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=25): Total=1.05 KB Peak=1.05 KB
> CodeGen: Total=12.59 KB Peak=2.29 MB
> Block Manager: Limit=161.39 GB Total=160.75 MB Peak=160.83 MB
> Fragment b74ba58d53b6c45f:3e8228600000085: Total=2.32 MB Peak=2.32 MB
> AGGREGATION_NODE (id=21): Total=2.29 MB Peak=2.29 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> EXCHANGE_NODE (id=20): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=23): Total=22.09 KB Peak=22.09 KB
> CodeGen: Total=2.37 KB Peak=546.00 KB
> Fragment b74ba58d53b6c45f:3e8228600000060: Total=188.02 MB Peak=188.34 MB
> Runtime Filter Bank: Total=16.00 MB Peak=16.00 MB
> AGGREGATION_NODE (id=9): Total=1.67 MB Peak=1.67 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> HASH_JOIN_NODE (id=8): Total=1.13 MB Peak=1.15 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=8): Total=1.01 MB Peak=1.02 MB
> Hash Join Builder (join_node_id=8) Exprs: Total=9.12 KB Peak=9.12 KB
> HASH_JOIN_NODE (id=7): Total=169.14 MB Peak=169.14 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=7): Total=169.01 MB Peak=169.02 MB
> Hash Join Builder (join_node_id=7) Exprs: Total=9.12 KB Peak=9.12 KB
> EXCHANGE_NODE (id=17): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=587.50 KB
> EXCHANGE_NODE (id=18): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=316.11 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.70 KB
> DataStreamSender (dst_id=20): Total=58.39 KB Peak=58.39 KB
> CodeGen: Total=16.80 KB Peak=2.83 MB
> Query(cb4c14997ad6add2:c8f120100000000): Total=190.36 MB Peak=443.00 MB
> Fragment cb4c14997ad6add2:c8f1201000000a4: Total=67.90 KB Peak=2.34 MB
> SORT_NODE (id=14): Total=4.00 KB Peak=4.00 KB
> HASH_JOIN_NODE (id=13): Total=42.25 KB Peak=42.25 KB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=13): Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=13) Exprs: Total=9.12 KB Peak=9.12 KB
> HDFS_SCAN_NODE (id=11): Total=0 Peak=0
> EXCHANGE_NODE (id=24): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=25): Total=1.05 KB Peak=1.05 KB
> CodeGen: Total=12.59 KB Peak=2.29 MB
> Block Manager: Limit=161.39 GB Total=160.75 MB Peak=160.83 MB
> Fragment cb4c14997ad6add2:c8f120100000088: Total=2.33 MB Peak=2.33 MB
> AGGREGATION_NODE (id=21): Total=2.29 MB Peak=2.29 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> EXCHANGE_NODE (id=20): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=23): Total=26.83 KB Peak=26.83 KB
> CodeGen: Total=2.37 KB Peak=546.00 KB
> Fragment cb4c14997ad6add2:c8f120100000063: Total=187.97 MB Peak=188.08 MB
> Runtime Filter Bank: Total=16.00 MB Peak=16.00 MB
> AGGREGATION_NODE (id=9): Total=1.67 MB Peak=1.67 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> HASH_JOIN_NODE (id=8): Total=1.14 MB Peak=1.15 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=8): Total=1.01 MB Peak=1.02 MB
> Hash Join Builder (join_node_id=8) Exprs: Total=9.12 KB Peak=9.12 KB
> HASH_JOIN_NODE (id=7): Total=169.07 MB Peak=169.14 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=7): Total=169.01 MB Peak=169.02 MB
> Hash Join Builder (join_node_id=7) Exprs: Total=9.12 KB Peak=9.12 KB
> EXCHANGE_NODE (id=17): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=314.15 KB
> EXCHANGE_NODE (id=18): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=861.18 KB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.70 KB
> DataStreamSender (dst_id=20): Total=58.39 KB Peak=58.39 KB
> CodeGen: Total=16.80 KB Peak=2.83 MB
> Query(f04a57ce97102dd7:c2a1081700000000): Total=190.31 MB Peak=419.11 MB
> Fragment f04a57ce97102dd7:c2a1081700000085: Total=2.33 MB Peak=2.33 MB
> AGGREGATION_NODE (id=21): Total=2.29 MB Peak=2.29 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> EXCHANGE_NODE (id=20): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=0
> DataStreamSender (dst_id=23): Total=23.67 KB Peak=23.67 KB
> CodeGen: Total=2.37 KB Peak=546.00 KB
> Block Manager: Limit=161.39 GB Total=160.75 MB Peak=160.83 MB
> Fragment f04a57ce97102dd7:c2a1081700000060: Total=187.99 MB Peak=188.07 MB
> Runtime Filter Bank: Total=16.00 MB Peak=16.00 MB
> AGGREGATION_NODE (id=9): Total=1.68 MB Peak=1.68 MB
> Exprs: Total=44.12 KB Peak=44.12 KB
> HASH_JOIN_NODE (id=8): Total=1.14 MB Peak=1.15 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=8): Total=1.01 MB Peak=1.02 MB
> Hash Join Builder (join_node_id=8) Exprs: Total=9.12 KB Peak=9.12 KB
> HASH_JOIN_NODE (id=7): Total=169.09 MB Peak=169.14 MB
> Exprs: Total=9.12 KB Peak=9.12 KB
> Hash Join Builder (join_node_id=7): Total=169.01 MB Peak=169.02 MB
> Hash Join Builder (join_node_id=7) Exprs: Total=9.12 KB Peak=9.12 KB
> EXCHANGE_NODE (id=17): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=156.71 KB
> EXCHANGE_NODE (id=18): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=1.32 MB
> EXCHANGE_NODE (id=19): Total=0 Peak=0
> DataStreamRecvr: Total=0 Peak=4.70 KB
> DataStreamSender (dst_id=20): Total=58.39 KB Peak=58.39 KB
> CodeGen: Total=16.80 KB Peak=2.83 MB
> Untracked Memory: Total=2.10 GB
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)