[ https://issues.apache.org/jira/browse/SPARK-21033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16467803#comment-16467803 ]

Thomas Graves commented on SPARK-21033:
---------------------------------------

[~cloud_fan] The follow-up PR [https://github.com/apache/spark/pull/21077] 
didn't go into Spark 2.3.0. It should have had its own JIRA, and we need to 
update the fix version. Can you please fix this so we properly track which 
version it is in? Also, does this need to be backported to 2.3.1?

> fix the potential OOM in UnsafeExternalSorter
> ---------------------------------------------
>
>                 Key: SPARK-21033
>                 URL: https://issues.apache.org/jira/browse/SPARK-21033
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 2.3.0
>
>
> In `UnsafeInMemorySorter`, one record may take 32 bytes: one `long` for the 
> record pointer, one `long` for the key prefix, and two more `long`s as the 
> temporary buffer for radix sort.
> In `UnsafeExternalSorter`, we set 
> `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to `1024 * 1024 * 1024 / 2`, 
> expecting the maximum size of the pointer array to be 8 GB. However, this is 
> wrong: `1024 * 1024 * 1024 / 2 * 32` is actually 16 GB, and if we grow the 
> pointer array before reaching this limit, we may hit the max-page-size error.
> Users may see an exception like this on large datasets:
> {code}
> Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more than 17179869176 bytes
>     at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
>     at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
>     at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
>     at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
> ...
> {code}
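The size arithmetic in the description can be checked with a few lines of Java. This is a minimal sketch under stated assumptions, not Spark code. The class name `PointerArraySizeCheck` and the helper `maxRecordsPerPage` are hypothetical. The 4-`long` record layout follows the description. The page-size limit is the 17179869176 from the exception message, which equals (2^31 - 1) * 8.

```java
// Simplified model of the pointer-array sizing described in SPARK-21033.
// Hypothetical illustration only; this is not code from Spark.
final class PointerArraySizeCheck {

    // Assumed layout from the description: pointer + key prefix + 2 longs of radix-sort buffer.
    static final long BYTES_PER_RECORD = 4L * Long.BYTES; // 32 bytes

    // Threshold quoted in the issue; 1024 * 1024 * 1024 / 2 still fits in an int.
    static final long SPILL_THRESHOLD = 1024 * 1024 * 1024 / 2; // 536,870,912 records

    // Limit taken from the exception message: 17179869176 == (2^31 - 1) * 8.
    static final long MAX_PAGE_BYTES = ((1L << 31) - 1) * 8L;

    // Hypothetical helper: largest record count whose pointer array fits in one page.
    static long maxRecordsPerPage(long maxPageBytes, long bytesPerRecord) {
        return maxPageBytes / bytesPerRecord;
    }

    public static void main(String[] args) {
        // 16 bytes per record (no radix buffer) gives the 8 GB figure.
        long withoutRadixBuffer = SPILL_THRESHOLD * 2L * Long.BYTES;
        // 32 bytes per record gives the real size.
        long withRadixBuffer = SPILL_THRESHOLD * BYTES_PER_RECORD;

        System.out.println("16 bytes/record: " + withoutRadixBuffer + " bytes");
        System.out.println("32 bytes/record: " + withRadixBuffer + " bytes");
        System.out.println("page-size limit: " + MAX_PAGE_BYTES + " bytes");
        System.out.println("fits in a page:  " + (withRadixBuffer <= MAX_PAGE_BYTES));
        System.out.println("max records:     "
                + maxRecordsPerPage(MAX_PAGE_BYTES, BYTES_PER_RECORD));

        // Pitfall: the same product evaluated in int arithmetic wraps around (2^34 mod 2^32).
        int overflowed = 1024 * 1024 * 1024 / 2 * 32;
        System.out.println("int product:     " + overflowed);
    }
}
```

Under these assumptions, the pointer array for 536,870,912 records needs 17,179,869,184 bytes. That is 8 bytes more than the limit, so at most 536,870,911 records fit in one page. The last line shows a related pitfall. Evaluated in `int` arithmetic, the same product wraps around to 0, so sizes like these have to be computed as `long`.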



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

