[jira] [Created] (IMPALA-6188) test_top_n_reclaim is flaky

Tim Armstrong (JIRA) Wed, 15 Nov 2017 15:15:06 -0800

Tim Armstrong created IMPALA-6188:
-------------------------------------

             Summary: test_top_n_reclaim is flaky
                 Key: IMPALA-6188
                 URL: https://issues.apache.org/jira/browse/IMPALA-6188
             Project: IMPALA
          Issue Type: Bug
          Components: Infrastructure
    Affects Versions: Impala 2.11.0
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong
            Priority: Critical



[~jbapple] reported a test failure here: 
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/607/

{noformat}

03:16:32  TestTopNReclaimQuery.test_top_n_reclaim[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
03:16:32 [gw10] linux2 -- Python 2.7.12 /home/ubuntu/Impala/bin/../infra/python/env/bin/python
03:16:32 query_test/test_queries.py:246: in test_top_n_reclaim
03:16:32     result = self.execute_query(self.QUERY, exec_options)
03:16:32 common/impala_test_suite.py:512: in wrapper
03:16:32     return function(*args, **kwargs)
03:16:32 common/impala_test_suite.py:537: in execute_query
03:16:32     return self.__execute_query(self.client, query, query_options)
03:16:32 common/impala_test_suite.py:604: in __execute_query
03:16:32     return impalad_client.execute(query, user=user)
03:16:32 common/impala_connection.py:160: in execute
03:16:32     return self.__beeswax_client.execute(sql_stmt, user=user)
03:16:32 beeswax/impala_beeswax.py:173: in execute
03:16:32     handle = self.__execute_query(query_string.strip(), user=user)
03:16:32 beeswax/impala_beeswax.py:341: in __execute_query
03:16:32     self.wait_for_completion(handle)
03:16:32 beeswax/impala_beeswax.py:361: in wait_for_completion
03:16:32     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
03:16:32 E   ImpalaBeeswaxException: ImpalaBeeswaxException:
03:16:32 E    Query aborted:Memory limit exceeded
03:16:32 ---------------------------- Captured stderr setup -----------------------------
03:16:32 -- connecting to: localhost:21000
03:16:32 ----------------------------- Captured stderr call -----------------------------
03:16:32 SET batch_size=0;
03:16:32 SET num_nodes=0;
03:16:32 SET disable_codegen_rows_threshold=0;
03:16:32 SET disable_codegen=False;
03:16:32 SET abort_on_error=1;
03:16:32 SET mem_limit=50m;
03:16:32 SET exec_single_node_rows_threshold=0;
03:16:32 -- executing against localhost:21000
03:16:32 select * from tpch.lineitem order by l_orderkey desc limit 10;;
{noformat}
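The test ID lists six parametrized exec_option values, but the captured stderr shows seven SET statements. The extra one, mem_limit=50m, is presumably added by the test itself, and it is the limit the query later exceeds. A minimal sketch that recovers the effective options from the log and diffs them against the parametrized ones (the helper parse_set_lines is hypothetical, not part of the Impala test framework):

```python
# Hypothetical helper: parse "SET key=value;" lines from captured stderr
# and compare them with the exec_option dict shown in the test ID.

CAPTURED_STDERR = """\
SET batch_size=0;
SET num_nodes=0;
SET disable_codegen_rows_threshold=0;
SET disable_codegen=False;
SET abort_on_error=1;
SET mem_limit=50m;
SET exec_single_node_rows_threshold=0;
"""

# exec_option values from the test ID, as strings to match the SET lines.
PARAMETRIZED = {
    "batch_size": "0",
    "num_nodes": "0",
    "disable_codegen_rows_threshold": "0",
    "disable_codegen": "False",
    "abort_on_error": "1",
    "exec_single_node_rows_threshold": "0",
}


def parse_set_lines(text):
    """Return {option: value} for every 'SET key=value;' line in text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.upper().startswith("SET "):
            continue
        key, _, value = line[4:].rstrip(";").partition("=")
        options[key.strip()] = value.strip()
    return options


effective = parse_set_lines(CAPTURED_STDERR)
# Options that were set but are not part of the test matrix.
extra = {k: v for k, v in effective.items() if k not in PARAMETRIZED}
print(extra)  # {'mem_limit': '50m'}
```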

I was able to reproduce something similar locally by running in a loop:
{noformat}
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    Query aborted:Memory limit exceeded: Failed to allocate memory in TopNNode::ReclaimTuplePool.
E   SORT_NODE (id=1) could not allocate 190.00 B without exceeding limit.
E   Error occurred on backend tarmstrong-box:22000 by fragment c641289b5b0652a4:9ce6652000000001
E   Memory left in process limit: 8.15 GB
E   Memory left in query limit: -2.95 MB
E   Query(c641289b5b0652a4:9ce6652000000000): memory limit exceeded. Limit=50.00 MB Reservation=0 ReservationLimit=0 OtherMemory=52.95 MB Total=52.95 MB Peak=52.95 MB
E     Fragment c641289b5b0652a4:9ce6652000000000: Reservation=0 OtherMemory=8.30 KB Total=8.30 KB Peak=232.50 KB
E       EXCHANGE_NODE (id=2): Total=0 Peak=0
E       DataStreamRecvr: Total=0 Peak=0
E       PLAN_ROOT_SINK: Total=0 Peak=0
E       CodeGen: Total=305.00 B Peak=224.50 KB
E     Fragment c641289b5b0652a4:9ce6652000000001: Reservation=0 OtherMemory=52.94 MB Total=52.94 MB Peak=52.94 MB
E       SORT_NODE (id=1): Total=702.00 KB Peak=706.00 KB
E       HDFS_SCAN_NODE (id=0): Total=52.23 MB Peak=52.23 MB
E       DataStreamSender (dst_id=2): Total=688.00 B Peak=688.00 B
E       CodeGen: Total=23.94 KB Peak=1.64 MB
{noformat}
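Reproducing a flaky failure like this usually means rerunning the same query until it breaks. A minimal loop-until-failure sketch, assuming the single query run is wrapped in some zero-argument callable (run_until_failure and the flaky stand-in below are hypothetical, not Impala test-suite code):

```python
import itertools


def run_until_failure(attempt, max_runs=100):
    """Call attempt() repeatedly until it raises or max_runs is reached.

    Returns (run_number, exception) for the first failure, or (None, None)
    if every run passed. attempt is any zero-argument callable, e.g. a
    hypothetical wrapper that executes the TopN query once.
    """
    for run in range(1, max_runs + 1):
        try:
            attempt()
        except Exception as exc:  # record the first failure and stop
            return run, exc
    return None, None


# Stand-in for the real query: fails on the third call only.
calls = itertools.count(1)


def flaky():
    if next(calls) == 3:
        raise RuntimeError("Memory limit exceeded")


run, exc = run_until_failure(flaky)
print("%s %s" % (run, exc))  # 3 Memory limit exceeded
```

Stopping at the first failure keeps the failing run's error message intact for inspection; a cap on max_runs keeps the loop bounded when the failure does not reproduce.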

My current hypothesis is that, in the failing runs, the scan node is spinning up an extra scanner thread.
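The memory breakdown in the reproduction already points at the scan rather than the TopN: the 190-byte allocation in TopNNode::ReclaimTuplePool is merely where the limit tripped. As a back-of-the-envelope check, summing the reported per-node totals reproduces the 52.95 MB query total and the -2.95 MB headroom. It also shows HDFS_SCAN_NODE alone (52.23 MB) is above the 50 MB mem_limit, while SORT_NODE holds about 702 KB. This assumes Impala's KB/MB are 1024-based, which is what makes the sums line up. It is consistent with, but does not prove, the scanner-thread hypothesis.

```python
# Back-of-the-envelope check of the memory breakdown quoted above.
# Assumption: Impala reports KB/MB as 1024-based units.
KB = 1024.0
MB = 1024.0 * KB

MEM_LIMIT = 50 * MB  # SET mem_limit=50m

# Totals copied from the error message, in bytes.
CONSUMERS = {
    # Fragment c641289b5b0652a4:9ce6652000000001 (scan + TopN)
    "HDFS_SCAN_NODE (id=0)": 52.23 * MB,
    "SORT_NODE (id=1)": 702.00 * KB,
    "DataStreamSender (dst_id=2)": 688.00,
    "CodeGen (scan fragment)": 23.94 * KB,
    # Fragment c641289b5b0652a4:9ce6652000000000, whole-fragment total
    "coordinator fragment total": 8.30 * KB,
}

total = sum(CONSUMERS.values())
headroom = MEM_LIMIT - total
scan = CONSUMERS["HDFS_SCAN_NODE (id=0)"]

print("query total: %.2f MB" % (total / MB))     # query total: 52.95 MB
print("headroom: %.2f MB" % (headroom / MB))     # headroom: -2.95 MB
print("scan node alone over limit: %s" % (scan > MEM_LIMIT))  # ... True
print("scan node share of total: %.1f%%" % (100 * scan / total))
```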



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
