[
https://issues.apache.org/jira/browse/HIVE-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752025#comment-13752025
]
Hudson commented on HIVE-5144:
------------------------------
FAILURE: Integrated in Hive-trunk-hadoop1-ptest #141 (See
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/141/])
HIVE-5144 : HashTableSink allocates empty new Object[] arrays & OOMs - use a
static emptyRow instead (Gopal V via Ashutosh Chauhan) (hashutosh:
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1517877)
*
/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java
> HashTableSink allocates empty new Object[] arrays & OOMs - use a static
> emptyRow instead
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-5144
> URL: https://issues.apache.org/jira/browse/HIVE-5144
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Environment: Ubuntu LXC + -Xmx512m client opts
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Minor
> Labels: perfomance
> Fix For: 0.12.0
>
> Attachments: HIVE-5144.01.patch, HIVE-5144.02.patch
>
>
> The map-join hashtable sink in the local-task creates an in-memory hashtable
> with the following code.
> {code}
> Object[] value = JoinUtil.computeMapJoinValues(row, joinValues[alias],
> ...
> MapJoinRowContainer rowContainer = tableContainer.get(key);
> if (rowContainer == null) {
> rowContainer = new MapJoinRowContainer();
> rowContainer.add(value);
> {code}
> But for a query where joinValues[alias].size() == 0, this results in a
> large number of unnecessary allocations. These would be better served by a
> copy-on-write default value container and a pre-allocated zero-length
> Object[], which is effectively immutable (a zero-length array is the only
> kind of Java array whose contents cannot be changed).
> The query tested is roughly the following, which scans all of
> customer_demographics in the hash sink:
> {code}
> select c_salutation, count(1)
> from customer
> JOIN customer_demographics ON customer.c_current_cdemo_sk =
> customer_demographics.cd_demo_sk
> group by c_salutation
> limit 10
> ;
> {code}
> When running with current trunk, the query fails with an OOM with 512 MB of RAM.
> {code}
> 2013-08-23 05:11:26 Processing rows: 1400000 Hashtable size: 1399999
> Memory usage: 292418944 percentage: 0.579
> Execution failed with exit status: 3
> Obtaining error information
> {code}
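
The shared-empty-row idea in the description above can be sketched as follows. This is a minimal illustration, not the HIVE-5144 patch: EmptyRowSketch, RowContainer, canonicalize and build are hypothetical names, RowContainer is only a stand-in for Hive's MapJoinRowContainer, and the copy-on-write default container mentioned in the description is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Hive code base.
final class EmptyRowSketch {
    // A zero-length array has no elements to overwrite, so a single
    // instance can be shared by every entry that carries no values.
    static final Object[] EMPTY_ROW = new Object[0];

    // Swap any zero-length row for the shared instance; other rows pass through.
    static Object[] canonicalize(Object[] row) {
        return row.length == 0 ? EMPTY_ROW : row;
    }

    // Hypothetical stand-in for a per-key row container.
    static final class RowContainer {
        private final List<Object[]> rows = new ArrayList<>();

        void add(Object[] row) {
            rows.add(canonicalize(row));
        }

        int size() {
            return rows.size();
        }

        Object[] get(int i) {
            return rows.get(i);
        }
    }

    // Build a key -> rows table, creating a container the first time a key is seen.
    static Map<Integer, RowContainer> build(int[] keys, Object[][] values) {
        Map<Integer, RowContainer> table = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            table.computeIfAbsent(keys[i], k -> new RowContainer()).add(values[i]);
        }
        return table;
    }
}
```

In this sketch the table never retains a distinct zero-length array per entry, so any freshly computed empty row becomes collectable as soon as it is added. Avoiding the allocation altogether would mean returning the shared instance at the point where the values are computed.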