Hari Sankar Sivarama Subramaniyan created HIVE-12084:
--------------------------------------------------------

             Summary: Hive queries with ORDER BY and large LIMIT fail with OutOfMemoryError: Java heap space
                 Key: HIVE-12084
                 URL: https://issues.apache.org/jira/browse/HIVE-12084
             Project: Hive
          Issue Type: Bug
            Reporter: Hari Sankar Sivarama Subramaniyan
            Assignee: Hari Sankar Sivarama Subramaniyan


STEPS TO REPRODUCE:
{code}
CREATE TABLE `sample_07` (`code` string, `description` string, `total_emp` int, `salary` int)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH 'sample_07.csv' INTO TABLE sample_07;
set hive.limit.pushdown.memory.usage=0.9999;
select * from sample_07 order by salary LIMIT 999999999;
{code}

This results in:
{code}
Caused by: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hive.ql.exec.TopNHash.initialize(TopNHash.java:113)
        at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initializeOp(ReduceSinkOperator.java:234)
        at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:68)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
        at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
        at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
{code}

The basic issue lies with the top-n optimization: TopNHash sizes its backing allocation from the LIMIT value, so a very large LIMIT (here 999999999) triggers an allocation the heap cannot satisfy. We need to bound the top-n optimization. Ideally we would detect, before attempting the allocation, that the required bytes exceed the budget implied by hive.limit.pushdown.memory.usage, and fall back to the ordinary sort path instead of allocating.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
