[ https://issues.apache.org/jira/browse/HAWQ-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruilong Huo reassigned HAWQ-139:
--------------------------------

    Assignee: Ruilong Huo  (was: Lei Chang)

> Out of memory with 10 concurrent TPC-H workload in YARN mode
> ------------------------------------------------------------
>
>                 Key: HAWQ-139
>                 URL: https://issues.apache.org/jira/browse/HAWQ-139
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Resource Manager
>            Reporter: Ruilong Huo
>            Assignee: Ruilong Huo
>
> On an 18-node HAWQ cluster with YARN configured, a workload of 10 concurrent
> TPC-H streams (10G data per node) fails with an "out of memory" error.
> Further analysis shows that one TPC-H query 9 session ran out of memory while
> using about 1.7G, although the query is expected to use about 1G.
> For a long-term fix, we need to investigate the resource manager and executor
> to identify action items. For a short-term fix, we give HAWQ an 8G memory
> buffer by default instead of 2G.
> {code}
> 91265 [2015-11-03 12:31:15] select
>     nation,
>     o_year,
>     sum(amount) as sum_profit
> from
>     (
>         select
>             n_name as nation,
>             extract(year from o_orderdate) as o_year,
>             l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount
>         from
>             part,
>             supplier,
>             lineitem,
>             partsupp,
>             orders,
>             nation
>         where
>             s_suppkey = l_suppkey
>             and ps_suppkey = l_suppkey
>             and ps_partkey = l_partkey
>             and p_partkey = l_partkey
>             and o_orderkey = l_orderkey
>             and s_nationkey = n_nationkey
>             and p_name like '%aquamarine%'
>     ) as profit
> group by
>     nation,
>     o_year
> order by
>     nation,
>     o_year desc;
> 91272 [2015-11-03 12:31:21] psql:/data1/gpadmin/pulse2-agent/agents/agent1/work/HAWQ-main-SystemTest-yarn/rhel5_x86_64/lsp/report/20151103-114720/performance_tpch_concurrent/tpch_parquet_10gpn_nocomp_part_random_10c_gpadmin/tmp/1_8_TPCH_Query_09.sql:32: ERROR: Canceling query because of high VMEM usage. Used: 1748MB, available 480MB, red zone: 9216MB (runaway_cleaner.c:135) (seg74 bcn-w3:5532 pid=33619) (dispatcher.c:1681)
> ***|tpch_parquet_10gpn_nocomp_part_random_10c_gpadmin_1_8_TPCH_Query_09.sql|127665
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)