[ https://issues.apache.org/jira/browse/SPARK-16766 ]
Davies Liu updated SPARK-16766:
-------------------------------
    Priority: Minor  (was: Critical)

> TakeOrderedAndProjectExec easily cause OOM
> ------------------------------------------
>
>                 Key: SPARK-16766
>                 URL: https://issues.apache.org/jira/browse/SPARK-16766
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: drow blonde messi
>            Priority: Minor
>
> I found that a very simple SQL statement can easily cause an OOM, for example:
>
>   insert into xyz2 select * from xyz order by x limit 900000000;
>
> The problem is obvious: TakeOrderedAndProjectExec always allocates a huge
> Object array (with size equal to the limit count) when executeCollect or
> doExecute is called.
>
> In Spark 1.6, terminal and non-terminal TakeOrderedAndProject work the same
> way: they call RDD.takeOrdered(limit), which builds a huge
> BoundedPriorityQueue for every partition.
>
> In Spark 2.0, non-terminal TakeOrderedAndProject switched to
> org.apache.spark.util.collection.Utils.takeOrdered, but the problem still
> exists: the expression ordering.leastOf(input.asJava, num).iterator.asScala
> calls the leastOf method of com.google.common.collect.Ordering, which
> allocates a large Object array:
>
>   int bufferCap = k * 2;
>   @SuppressWarnings("unchecked") // we'll only put E's in
>   E[] buffer = (E[]) new Object[bufferCap];
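
To make the memory cost concrete, here is a minimal back-of-the-envelope sketch
(not Spark or Guava source; the GB figure assumes plain 8-byte object references,
roughly half that with compressed oops) of what the Guava buffer allocation alone
costs for the LIMIT in the example above:

  object LeastOfBufferSketch {
    def main(args: Array[String]): Unit = {
      // The LIMIT from the SQL statement above.
      val limit = 900000000L

      // Guava's Ordering.leastOf pre-allocates a buffer of 2 * k references
      // (int bufferCap = k * 2; new Object[bufferCap]), regardless of how
      // many rows the partition actually holds.
      val bufferCap = limit * 2

      // Assumed 8 bytes per reference; the reference array alone, before any
      // row data, already costs:
      val bytes = bufferCap * 8L
      println(f"buffer of $bufferCap%d refs ~ ${bytes.toDouble / (1L << 30)}%.1f GB")
    }
  }

In other words, the per-task memory is proportional to the limit rather than to
the partition size, which is why a LIMIT close to the table size behaves like a
full collect on every task.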