Gopal V created HIVE-5144:
-----------------------------
Summary: HashTableSink allocates empty new Object[] arrays & OOMs - use a static emptyRow instead
Key: HIVE-5144
URL: https://issues.apache.org/jira/browse/HIVE-5144
Project: Hive
Issue Type: Bug
Components: Query Processor
Environment: Ubuntu LXC + -Xmx4096m client opts
Reporter: Gopal V
Assignee: Gopal V
Priority: Minor
The map-join hashtable sink in the local-task creates an in-memory hashtable
with the following code.
{code}
Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
...
MapJoinRowContainer rowContainer = tableContainer.get(key);
if (rowContainer == null) {
rowContainer = new MapJoinRowContainer();
rowContainer.add(value);
{code}
But for a query where joinValues[alias].size() == 0, this results in a
large number of unnecessary allocations: a new empty Object[] for every row,
plus a new row container for every key. These would be better served by a
copy-on-write default value container and a pre-allocated zero-length object
array, which is immutable (the only immutable array there is in Java).
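A minimal sketch of the proposed fix, assuming a simplified model of the sink. `EmptyRowSketch`, `RowContainer`, `computeValues` and `put` are hypothetical names for illustration, not Hive's `JoinUtil` or `MapJoinRowContainer` API. Empty value rows share one static zero-length array, and a key whose only row is empty points at a single shared default container that is copied only when a second row arrives for that key.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not Hive's actual classes or API.
class EmptyRowSketch {
    // One shared zero-length row. It has no elements to mutate, so every
    // key can safely reference this single instance.
    static final Object[] EMPTY_ROW = new Object[0];

    // Simplified stand-in for a map-join row container.
    static final class RowContainer {
        // Shared default holding exactly one empty row; never modified in place.
        static final RowContainer DEFAULT =
                new RowContainer(Collections.singletonList(EMPTY_ROW), true);

        private final List<Object[]> rows;
        private final boolean shared;

        private RowContainer(List<Object[]> rows, boolean shared) {
            this.rows = rows;
            this.shared = shared;
        }

        static RowContainer newMutable() {
            return new RowContainer(new ArrayList<>(), false);
        }

        // Copy-on-write: adding to the shared default returns a private copy;
        // a private container is appended to in place and returned as-is.
        RowContainer add(Object[] row) {
            if (shared) {
                RowContainer copy = new RowContainer(new ArrayList<>(rows), false);
                copy.rows.add(row);
                return copy;
            }
            rows.add(row);
            return this;
        }

        int size() {
            return rows.size();
        }
    }

    // Stand-in for computing join values: with no value columns, return the
    // shared EMPTY_ROW instead of allocating a new Object[0] per row.
    static Object[] computeValues(Object[] row, int numValueColumns) {
        if (numValueColumns == 0) {
            return EMPTY_ROW;
        }
        Object[] values = new Object[numValueColumns];
        System.arraycopy(row, 0, values, 0, numValueColumns);
        return values;
    }

    static void put(Map<Object, RowContainer> table, Object key, Object[] value) {
        RowContainer existing = table.get(key);
        if (existing == null) {
            // First row for this key: reuse the shared default for empty rows.
            table.put(key, value.length == 0
                    ? RowContainer.DEFAULT
                    : RowContainer.newMutable().add(value));
        } else {
            // add() may hand back a fresh copy, so store the result back.
            RowContainer updated = existing.add(value);
            if (updated != existing) {
                table.put(key, updated);
            }
        }
    }

    public static void main(String[] args) {
        Map<Object, RowContainer> table = new HashMap<>();
        for (long key = 0; key < 100_000; key++) {
            put(table, key, computeValues(new Object[] {key}, 0));
        }
        // Every key shares the same default container and the same EMPTY_ROW.
        System.out.println(table.get(0L) == table.get(99_999L)); // true
    }
}
```

Sharing the default is safe only because nothing mutates it in place; a duplicate key gets its own copy, so the number of matching rows per key is still preserved for the join.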
The query tested is roughly the following, which scans all of
customer_demographics in the hash-table sink:
{code}
select c_salutation, count(1)
from customer
JOIN customer_demographics ON customer.c_current_cdemo_sk =
customer_demographics.cd_demo_sk
group by c_salutation
limit 10
;
{code}
When running with the current trunk, the code fails with an OOM with 512 MB of RAM:
{code}
2013-08-23 05:11:26 Processing rows: 1400000 Hashtable size: 1399999
Memory usage: 292418944 percentage: 0.579
Execution failed with exit status: 3
Obtaining error information
{code}