[ https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-5144:
--------------------------
    Attachment: HIVE-5144.02.patch

Bad merge in patch.

{code}
- if((hasFilter(alias) && joinFilters[alias].size() > 0) || joinValues[alias]
+ if((hasFilter(alias) && filterMaps[alias].length > 0) || joinValues[alias].
{code}

The check is supposed to be on filterMaps not joinFilters.

This fixes test-failures found in the last run.

> HashTableSink allocates empty new Object[] arrays & OOMs - use a static
> emptyRow instead
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-5144
>                 URL: https://issues.apache.org/jira/browse/HIVE-5144
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>         Environment: Ubuntu LXC + -Xmx512m client opts
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Minor
>              Labels: perfomance
>         Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch
>
>
> The map-join hashtable sink in the local-task creates an in-memory hashtable
> with the following code.
> {code}
>  Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
> ...
>  MapJoinRowContainer rowContainer = tableContainer.get(key);
>  if (rowContainer == null) {
>    rowContainer = new MapJoinRowContainer();
>    rowContainer.add(value);
> {code}
> But for a query where the joinValues[alias].size() == 0, this results in a
> large number of unnecessary allocations which would be better served with a
> copy-on-write default value container & a pre-allocated zero object array
> which is immutable (the only immutable array there is in java).
> The query tested is roughly the following to scan all of
> customer_demographics in the hash-sink
> {code}
> select c_salutation, count(1)
> from customer
> JOIN customer_demographics ON customer.c_current_cdemo_sk =
> customer_demographics.cd_demo_sk
> group by c_salutation
> limit 10
> ;
> {code}
> When running with current trunk, the code results in an OOM with 512Mb ram.
> {code}
> 2013-08-23 05:11:26 Processing rows: 1400000 Hashtable size: 1399999
> Memory usage: 292418944 percentage: 0.579
> Execution failed with exit status: 3
> Obtaining error information
> {code}
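
For illustration only, here is a minimal Java sketch of the idea in the description above: when no value columns are needed, every hashtable entry can hold the same pre-allocated zero-length array instead of a fresh `new Object[0]`. The names `EmptyRowSketch`, `EMPTY_ROW`, `computeValues` and `buildTable` are hypothetical. They are not Hive classes and do not reflect the contents of the attached patches. A plain `HashMap` of lists stands in for the table and row containers, and the copy-on-write default container is not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmptyRowSketch {
    // A zero-length array has no elements to overwrite, so one instance
    // can be shared safely by every row that carries no value columns.
    static final Object[] EMPTY_ROW = new Object[0];

    // Hypothetical stand-in for the value computation: picks the
    // requested columns out of a row.
    static Object[] computeValues(Object[] row, int[] valueColumns) {
        if (valueColumns.length == 0) {
            return EMPTY_ROW; // no allocation on this path
        }
        Object[] out = new Object[valueColumns.length];
        for (int i = 0; i < valueColumns.length; i++) {
            out[i] = row[valueColumns[i]];
        }
        return out;
    }

    // Builds an in-memory table keyed on one column, storing the
    // computed values of each row under its key.
    static Map<Object, List<Object[]>> buildTable(
            Object[][] rows, int keyColumn, int[] valueColumns) {
        Map<Object, List<Object[]>> table = new HashMap<>();
        for (Object[] row : rows) {
            table.computeIfAbsent(row[keyColumn], k -> new ArrayList<>())
                 .add(computeValues(row, valueColumns));
        }
        return table;
    }

    public static void main(String[] args) {
        Object[][] rows = { {1, "a"}, {2, "b"}, {1, "c"} };
        Map<Object, List<Object[]>> table = buildTable(rows, 0, new int[0]);
        // Every stored value is the same shared instance.
        System.out.println(table.get(1).get(0) == EMPTY_ROW);
    }
}
```

With an empty `valueColumns`, the per-row cost in this sketch drops to the list slot holding the shared reference; rows that do carry values still get their own array.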