[jira] [Updated] (SPARK-21033) fix the potential OOM in UnsafeExternalSorter

2017-10-28 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-21033:

Description: 
In `UnsafeInMemorySorter`, one record may take 32 bytes: 1 `long` for the 
pointer, 1 `long` for the key prefix, and another 2 `long`s as the temporary 
buffer for radix sort.

In `UnsafeExternalSorter`, we set `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to 
`1024 * 1024 * 1024 / 2`, hoping to cap the pointer array at 8 GB. However, this 
is wrong: `1024 * 1024 * 1024 / 2 * 32` is actually 16 GB, and if we grow the 
pointer array before reaching this limit, we may hit the max-page-size error.
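
For reference, the arithmetic works out as follows. This is a standalone 
back-of-the-envelope check, not Spark code:
{code:java}
public class SpillThresholdMath {
  public static void main(String[] args) {
    // 32 bytes per record, as described above
    long bytesPerRecord = 8 /* pointer */ + 8 /* key prefix */ + 16 /* radix-sort buffer */;
    // DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD = 1024 * 1024 * 1024 / 2
    long spillThreshold = 1024L * 1024 * 1024 / 2;
    long pointerArrayBytes = spillThreshold * bytesPerRecord;
    // max page size quoted in the exception below
    long maxPageBytes = 17179869176L;
    System.out.println(pointerArrayBytes);                 // 17179869184, i.e. 16 GB, not 8 GB
    System.out.println(pointerArrayBytes > maxPageBytes);  // true: the request can exceed the limit
  }
}
{code}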

Users may see an exception like this on large datasets:
{code}
Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more 
than 17179869176 bytes
at 
org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
...
{code}

  was:
## What changes were proposed in this pull request?

In `UnsafeInMemorySorter`, one record may take 32 bytes: 1 `long` for the 
pointer, 1 `long` for the key prefix, and another 2 `long`s as the temporary 
buffer for radix sort.

In `UnsafeExternalSorter`, we set `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to 
`1024 * 1024 * 1024 / 2`, hoping to cap the pointer array at 8 GB. However, this 
is wrong: `1024 * 1024 * 1024 / 2 * 32` is actually 16 GB, and if we grow the 
pointer array before reaching this limit, we may hit the max-page-size error.

Users may see an exception like this on large datasets:
{code}
Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more 
than 17179869176 bytes
at 
org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
...
{code}

Setting `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to a smaller number is not 
enough: users can still set the config to a large number and trigger the 
too-large-page-size issue. This PR fixes it by explicitly handling the 
too-large-page-size exception in the sorter and spilling instead.
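
For illustration, a minimal sketch of this approach (not the exact patch), 
assuming the allocation path throws a dedicated exception, called 
`TooLargePageException` here, when a single page would exceed the maximum page 
size; the sorter can then fall back to spilling instead of failing the task:
{code:java}
// Sketch only: method names follow the existing sorter, the exception name is an assumption.
private void growPointerArrayIfNecessary() throws IOException {
  if (!inMemSorter.hasSpaceForAnotherRecord()) {
    long used = inMemSorter.getMemoryUsage();
    LongArray array;
    try {
      // Growing the pointer array may request a page larger than the max page size.
      array = allocateArray(used / 8 * 2);
    } catch (TooLargePageException e) {
      // The grown pointer array would not fit in a single page: spill to disk
      // and reset the in-memory sorter instead of propagating the error.
      spill();
      return;
    }
    if (inMemSorter.hasSpaceForAnotherRecord()) {
      // allocateArray itself triggered a spill, so the new array is not needed.
      freeArray(array);
    } else {
      inMemSorter.expandPointerArray(array);
    }
  }
}
{code}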

This PR also changes the type of 
`spark.shuffle.spill.numElementsForceSpillThreshold` to int, because it is only 
compared with `numRecords`, which is an int. This is an internal conf, so there 
is no serious compatibility issue.
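
As a rough sketch of where that threshold is consulted (simplified from the 
sorter's insert path, for illustration only): the count being compared is an 
int, so an int-typed config is sufficient.
{code:java}
// Illustrative, simplified insert path; not the exact Spark source.
public void insertRecord(Object base, long offset, int length, long prefix,
                         boolean prefixIsNull) throws IOException {
  // numRecords() returns an int, so comparing against an int threshold is enough.
  if (inMemSorter.numRecords() >= numElementsForSpillThreshold) {
    spill();  // force a spill once the configured record count is reached
  }
  growPointerArrayIfNecessary();
  // ... then copy the record into the current data page and register it with inMemSorter
}
{code}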

## How was this patch tested?

TODO


> fix the potential OOM in UnsafeExternalSorter
> -
>
> Key: SPARK-21033
> URL: https://issues.apache.org/jira/browse/SPARK-21033
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>
> In `UnsafeInMemorySorter`, one record may take 32 bytes: 1 `long` for the 
> pointer, 1 `long` for the key prefix, and another 2 `long`s as the temporary 
> buffer for radix sort.
> In `UnsafeExternalSorter`, we set `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` 
> to `1024 * 1024 * 1024 / 2`, hoping to cap the pointer array at 8 GB. However, 
> this is wrong: `1024 * 1024 * 1024 / 2 * 32` is actually 16 GB, and if we grow 
> the pointer array before reaching this limit, we may hit the max-page-size 
> error.
> Users may see an exception like this on large datasets:
> {code}
> Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with 
> more than 17179869176 bytes
> at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
> at 
> org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Updated] (SPARK-21033) fix the potential OOM in UnsafeExternalSorter

2017-10-28 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-21033:

Description: 
## What changes were proposed in this pull request?

In `UnsafeInMemorySorter`, one record may take 32 bytes: 1 `long` for the 
pointer, 1 `long` for the key prefix, and another 2 `long`s as the temporary 
buffer for radix sort.

In `UnsafeExternalSorter`, we set `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to 
`1024 * 1024 * 1024 / 2`, hoping to cap the pointer array at 8 GB. However, this 
is wrong: `1024 * 1024 * 1024 / 2 * 32` is actually 16 GB, and if we grow the 
pointer array before reaching this limit, we may hit the max-page-size error.

Users may see an exception like this on large datasets:
{code}
Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with more 
than 17179869176 bytes
at 
org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
...
{code}

Setting `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to a smaller number is not 
enough: users can still set the config to a large number and trigger the 
too-large-page-size issue. This PR fixes it by explicitly handling the 
too-large-page-size exception in the sorter and spilling instead.

This PR also changes the type of 
`spark.shuffle.spill.numElementsForceSpillThreshold` to int, because it is only 
compared with `numRecords`, which is an int. This is an internal conf, so there 
is no serious compatibility issue.

## How was this patch tested?

TODO

> fix the potential OOM in UnsafeExternalSorter
> -
>
> Key: SPARK-21033
> URL: https://issues.apache.org/jira/browse/SPARK-21033
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>
> ## What changes were proposed in this pull request?
> In `UnsafeInMemorySorter`, one record may take 32 bytes: 1 `long` for the 
> pointer, 1 `long` for the key prefix, and another 2 `long`s as the temporary 
> buffer for radix sort.
> In `UnsafeExternalSorter`, we set `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` 
> to `1024 * 1024 * 1024 / 2`, hoping to cap the pointer array at 8 GB. However, 
> this is wrong: `1024 * 1024 * 1024 / 2 * 32` is actually 16 GB, and if we grow 
> the pointer array before reaching this limit, we may hit the max-page-size 
> error.
> Users may see an exception like this on large datasets:
> {code}
> Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with 
> more than 17179869176 bytes
> at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:241)
> at 
> org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:121)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:374)
> at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:396)
> at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94)
> ...
> {code}
> Setting `DEFAULT_NUM_ELEMENTS_FOR_SPILL_THRESHOLD` to a smaller number is not 
> enough: users can still set the config to a large number and trigger the 
> too-large-page-size issue. This PR fixes it by explicitly handling the 
> too-large-page-size exception in the sorter and spilling instead.
> This PR also changes the type of 
> `spark.shuffle.spill.numElementsForceSpillThreshold` to int, because it is 
> only compared with `numRecords`, which is an int. This is an internal conf, so 
> there is no serious compatibility issue.
> ## How was this patch tested?
> TODO



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org