Hari Sankar Sivarama Subramaniyan created HIVE-12084:
--------------------------------------------------------
             Summary: Hive queries with ORDER BY and large LIMIT fail with OutOfMemoryError: Java heap space
                 Key: HIVE-12084
                 URL: https://issues.apache.org/jira/browse/HIVE-12084
             Project: Hive
          Issue Type: Bug
            Reporter: Hari Sankar Sivarama Subramaniyan
            Assignee: Hari Sankar Sivarama Subramaniyan

STEPS TO REPRODUCE:
{code}
CREATE TABLE `sample_07` (
  `code` string,
  `description` string,
  `total_emp` int,
  `salary` int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TextFile;

load data local inpath 'sample_07.csv' into table sample_07;

set hive.limit.pushdown.memory.usage=0.9999;

select * from sample_07 order by salary LIMIT 999999999;
{code}

This will result in:
{code}
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hive.ql.exec.TopNHash.initialize(TopNHash.java:113)
	at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initializeOp(ReduceSinkOperator.java:234)
	at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:68)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
{code}

The basic issue lies with the top-N optimization: its size needs a cap. Ideally we would detect that the bytes to be allocated would exceed the budget implied by hive.limit.pushdown.memory.usage without actually attempting the allocation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
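As a rough sketch of the proposed check (not Hive's actual code): estimate the top-N hash's footprint from the LIMIT and a per-row cost up front, and skip the pushdown when it would not fit in the fraction of heap allowed by hive.limit.pushdown.memory.usage. The class name, method names, and the per-row byte figure below are illustrative assumptions only.

```java
// Hypothetical pre-check: decide whether a top-N hash for
// ORDER BY ... LIMIT pushdown fits its memory budget *before*
// allocating anything, instead of letting the allocation throw
// OutOfMemoryError inside TopNHash.initialize().
public class TopNBudgetCheck {

    // Assumed per-row overhead for the top-N structures (index entry,
    // hash code, serialized key reference). The real number would be
    // derived from Hive's own data structures; 64 is a placeholder.
    static final long ESTIMATED_BYTES_PER_ROW = 64L;

    /**
     * Returns true when a top-N hash of size {@code limit} fits within
     * {@code memoryUsageFraction} (hive.limit.pushdown.memory.usage)
     * of {@code maxHeapBytes}; false means the optimization should be
     * disabled and the query should fall back to a plain sort.
     */
    static boolean topNFitsInBudget(long limit,
                                    double memoryUsageFraction,
                                    long maxHeapBytes) {
        long budget = (long) (memoryUsageFraction * maxHeapBytes);
        // Division-based guard avoids long overflow for huge limits
        // such as the 999999999 in the repro above.
        if (limit > budget / ESTIMATED_BYTES_PER_ROW) {
            return false;
        }
        return limit * ESTIMATED_BYTES_PER_ROW <= budget;
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // pretend a 1 GB reducer heap
        // The LIMIT from the repro: roughly 64 GB estimated, over budget.
        System.out.println(topNFitsInBudget(999_999_999L, 0.9999, heap));
        // A sane LIMIT easily fits.
        System.out.println(topNFitsInBudget(1000L, 0.9999, heap));
    }
}
```

With such a check, the oversized case above would simply disable the pushdown rather than attempt (and fail) a multi-gigabyte allocation.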