[ https://issues.apache.org/jira/browse/SPARK-16766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu updated SPARK-16766:
-------------------------------
    Priority: Minor  (was: Critical)

> TakeOrderedAndProjectExec easily causes OOM
> -------------------------------------------
>
>                 Key: SPARK-16766
>                 URL: https://issues.apache.org/jira/browse/SPARK-16766
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: drow blonde messi
>            Priority: Minor
>
> I found that a very simple SQL statement can easily cause an OOM, for
> example:
> "insert into xyz2 select * from xyz order by x limit 900000000;"
> The problem is obvious: TakeOrderedAndProjectExec always allocates a huge
> Object array (its size equals the limit count) whenever executeCollect or
> doExecute is called.
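> For illustration, here is a simplified Scala sketch of the collect path
> (names follow Spark 2.0's operator; the real code differs in details, this
> only shows the allocation pattern):
>     // simplified sketch of TakeOrderedAndProjectExec.executeCollect
>     override def executeCollect(): Array[InternalRow] = {
>       val ord = new LazilyGeneratedOrdering(sortOrder, child.output)
>       // takeOrdered materializes an Array of `limit` elements on the
>       // driver, so limit = 900000000 means ~900M row references at once
>       child.execute().map(_.copy()).takeOrdered(limit)(ord)
>     }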
> In Spark 1.6, the terminal and non-terminal TakeOrderedAndProject work the
> same way: both call RDD.takeOrdered(limit), which builds a huge
> BoundedPriorityQueue for every partition.
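> A minimal sketch of the 1.6-era RDD.takeOrdered, leaving out the real
> code's empty-partition handling:
>     def takeOrdered(num: Int)(implicit ord: Ordering[T]): Array[T] = {
>       mapPartitions { items =>
>         // one queue PER PARTITION, each bounded by the full limit
>         val queue = new BoundedPriorityQueue[T](num)(ord.reverse)
>         queue ++= items
>         Iterator.single(queue)
>       }.reduce { (q1, q2) => q1 ++= q2; q1 }.toArray.sorted(ord)
>     }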
> In Spark 2.0, the non-terminal TakeOrderedAndProject switched to
> org.apache.spark.util.collection.Utils.takeOrdered, but the problem still
> exists: the expression ordering.leastOf(input.asJava, num).iterator.asScala
> calls the leastOf method of com.google.common.collect.Ordering, which
> allocates a large Object array:
>     // from Ordering.leastOf: the scratch buffer holds twice the limit
>     int bufferCap = k * 2;
>     @SuppressWarnings("unchecked") // we'll only put E's in
>     E[] buffer = (E[]) new Object[bufferCap];
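> Back-of-the-envelope footprint of that buffer for the statement above,
> assuming plain 8-byte references on a 64-bit JVM (compressed oops would
> halve this, and the rows themselves come on top):
>     val limit = 900000000L
>     val bufferCap = limit * 2          // leastOf allocates k * 2 slots
>     val refBytes = bufferCap * 8       // reference array alone
>     println(refBytes / (1L << 30))     // ~13 GiB before any row data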



