Tim Armstrong created IMPALA-6188:
-------------------------------------
Summary: test_top_n_reclaim is flaky
Key: IMPALA-6188
URL: https://issues.apache.org/jira/browse/IMPALA-6188
Project: IMPALA
Issue Type: Bug
Components: Infrastructure
Affects Versions: Impala 2.11.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong
Priority: Critical
[~jbapple] reported a test failure here:
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/607/
{noformat}
03:16:32 TestTopNReclaimQuery.test_top_n_reclaim[exec_option: {'batch_size':
0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen':
False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} |
table_format: text/none]
03:16:32 [gw10] linux2 -- Python 2.7.12
/home/ubuntu/Impala/bin/../infra/python/env/bin/python
03:16:32 query_test/test_queries.py:246: in test_top_n_reclaim
03:16:32 result = self.execute_query(self.QUERY, exec_options)
03:16:32 common/impala_test_suite.py:512: in wrapper
03:16:32 return function(*args, **kwargs)
03:16:32 common/impala_test_suite.py:537: in execute_query
03:16:32 return self.__execute_query(self.client, query, query_options)
03:16:32 common/impala_test_suite.py:604: in __execute_query
03:16:32 return impalad_client.execute(query, user=user)
03:16:32 common/impala_connection.py:160: in execute
03:16:32 return self.__beeswax_client.execute(sql_stmt, user=user)
03:16:32 beeswax/impala_beeswax.py:173: in execute
03:16:32 handle = self.__execute_query(query_string.strip(), user=user)
03:16:32 beeswax/impala_beeswax.py:341: in __execute_query
03:16:32 self.wait_for_completion(handle)
03:16:32 beeswax/impala_beeswax.py:361: in wait_for_completion
03:16:32 raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
03:16:32 E ImpalaBeeswaxException: ImpalaBeeswaxException:
03:16:32 E Query aborted:Memory limit exceeded
03:16:32 ---------------------------- Captured stderr setup
-----------------------------
03:16:32 -- connecting to: localhost:21000
03:16:32 ----------------------------- Captured stderr call
-----------------------------
03:16:32 SET batch_size=0;
03:16:32 SET num_nodes=0;
03:16:32 SET disable_codegen_rows_threshold=0;
03:16:32 SET disable_codegen=False;
03:16:32 SET abort_on_error=1;
03:16:32 SET mem_limit=50m;
03:16:32 SET exec_single_node_rows_threshold=0;
03:16:32 -- executing against localhost:21000
03:16:32 select * from tpch.lineitem order by l_orderkey desc limit 10;;
{noformat}
I was able to reproduce a similar failure locally by running in a loop:
{noformat}
E ImpalaBeeswaxException: ImpalaBeeswaxException:
E Query aborted:Memory limit exceeded: Failed to allocate memory in
TopNNode::ReclaimTuplePool.
E SORT_NODE (id=1) could not allocate 190.00 B without exceeding limit.
E Error occurred on backend tarmstrong-box:22000 by fragment
c641289b5b0652a4:9ce6652000000001
E Memory left in process limit: 8.15 GB
E Memory left in query limit: -2.95 MB
E Query(c641289b5b0652a4:9ce6652000000000): memory limit exceeded.
Limit=50.00 MB Reservation=0 ReservationLimit=0 OtherMemory=52.95 MB
Total=52.95 MB Peak=52.95 MB
E Fragment c641289b5b0652a4:9ce6652000000000: Reservation=0
OtherMemory=8.30 KB Total=8.30 KB Peak=232.50 KB
E EXCHANGE_NODE (id=2): Total=0 Peak=0
E DataStreamRecvr: Total=0 Peak=0
E PLAN_ROOT_SINK: Total=0 Peak=0
E CodeGen: Total=305.00 B Peak=224.50 KB
E Fragment c641289b5b0652a4:9ce6652000000001: Reservation=0
OtherMemory=52.94 MB Total=52.94 MB Peak=52.94 MB
E SORT_NODE (id=1): Total=702.00 KB Peak=706.00 KB
E HDFS_SCAN_NODE (id=0): Total=52.23 MB Peak=52.23 MB
E DataStreamSender (dst_id=2): Total=688.00 B Peak=688.00 B
E CodeGen: Total=23.94 KB Peak=1.64 MB
{noformat}
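For anyone else trying to reproduce this, below is a minimal sketch of a loop runner. It is not the script used for the reproduction above (the report does not say whether the test or the bare query was looped). The helper name run_until_failure and its max_iters parameter are hypothetical, and it assumes Python 3.5+ for subprocess.run, unlike the Python 2.7 test environment in the log.

```python
import subprocess


def run_until_failure(cmd, max_iters=100):
    """Run cmd repeatedly until it exits non-zero.

    Returns (iteration, combined_output) for the first failing run,
    or None if all max_iters runs passed. Hypothetical helper, not
    part of the Impala test infrastructure.
    """
    for i in range(1, max_iters + 1):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            universal_newlines=True,   # decode output as text
        )
        if proc.returncode != 0:
            return i, proc.stdout
    return None
```

Pass whatever command runs the single test in your environment as the cmd list. Stopping at the first failure keeps the failing run's output, including the memory dump, available for inspection.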
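My current hypothesis is that an extra scanner thread is being spun up in the
failure case. In the dump above, HDFS_SCAN_NODE alone accounts for 52.23 MB,
which is already over the 50 MB mem_limit.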
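To check which node dominates in a dump like the one above, the per-node totals can be pulled out with a small ad hoc parser. This is only a sketch: node_totals and to_bytes are hypothetical helpers, not Impala tooling, the sample text is copied from the second fragment in the dump, and it assumes the KB/MB/GB units are 1024-based.

```python
import re

# Assumption: units in the dump are 1024-based.
UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}

# Matches lines such as "E HDFS_SCAN_NODE (id=0): Total=52.23 MB Peak=52.23 MB".
# Wrapped "Fragment ...: Reservation=0" lines do not match and are skipped.
NODE_RE = re.compile(
    r"^\s*(?:E\s+)?(?P<name>\w+(?: \([^)]*\))?): "
    r"Total=(?P<total>-?[\d.]+(?: (?:B|KB|MB|GB))?)"
)


def to_bytes(text):
    """Convert a size such as '52.23 MB' (or a bare '0') to bytes."""
    m = re.fullmatch(r"(-?\d+(?:\.\d+)?)\s*(B|KB|MB|GB)?", text.strip())
    if m is None:
        raise ValueError("unrecognised size: %r" % text)
    value, unit = m.groups()
    return float(value) * UNITS[unit or "B"]


def node_totals(dump):
    """Return a list of (node_name, total_bytes) in dump order."""
    nodes = []
    for line in dump.splitlines():
        m = NODE_RE.match(line)
        if m:
            nodes.append((m.group("name"), to_bytes(m.group("total"))))
    return nodes


SAMPLE = """\
E SORT_NODE (id=1): Total=702.00 KB Peak=706.00 KB
E HDFS_SCAN_NODE (id=0): Total=52.23 MB Peak=52.23 MB
E DataStreamSender (dst_id=2): Total=688.00 B Peak=688.00 B
E CodeGen: Total=23.94 KB Peak=1.64 MB
"""

limit = to_bytes("50.00 MB")
for name, total in sorted(node_totals(SAMPLE), key=lambda kv: kv[1], reverse=True):
    print("%-30s %10.2f MB  %5.1f%% of limit"
          % (name, total / UNITS["MB"], 100.0 * total / limit))
```

Sorting by total puts the scan node first, which matches the observation that the scan, not the TopN node that reported the failed 190 B allocation, is what exhausts the query limit.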