[GitHub] spark pull request #22072: [SPARK-25081][Core]Nested spill in ShuffleExterna...

2018-08-13 Thread zsxwing
GitHub user zsxwing closed the pull request at:

https://github.com/apache/spark/pull/22072


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22072: [SPARK-25081][Core]Nested spill in ShuffleExterna...

2018-08-10 Thread zsxwing
GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/22072

[SPARK-25081][Core]Nested spill in ShuffleExternalSorter should not access 
released memory page (branch-2.2)

## What changes were proposed in this pull request?

Backport https://github.com/apache/spark/pull/22062 to branch-2.2.

## How was this patch tested?

Jenkins


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-25081-2.2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22072.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22072


commit 1a6452ef0939c09c09801cff78b0214d7979bf6d
Author: Shixiong Zhu 
Date:   2018-08-10T17:53:44Z

Nested spill in ShuffleExternalSorter should not access released memory page

This issue is pretty similar to 
[SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907).

"allocateArray" in 
[ShuffleInMemorySorter.reset](https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99)
 may trigger a spill and cause ShuffleInMemorySorter to access the released 
`array`. Another task may then get the same memory page from the pool, so 
two tasks end up accessing the same memory page. When a task reads memory 
written by another task, many types of failures may happen. Here are some 
examples I have seen:

- JVM crash. (This is easy to reproduce in a unit test, as we fill newly 
allocated and deallocated memory with 0xa5 and 0x5a bytes, which usually point 
to an invalid memory address.)
- java.lang.IllegalArgumentException: Comparison method violates its 
general contract!
- java.lang.NullPointerException at 
org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384)
- java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size 
-536870912 because the size after growing exceeds size limitation 2147483632

This PR resets states in `ShuffleInMemorySorter.reset` before calling 
`allocateArray` to fix the issue.
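
The ordering problem can be illustrated with a toy model. The sketch below is not Spark code: `Page`, `ToySorter`, and `nestedSpillTouchesFreedPage` are hypothetical names, and the allocator is assumed to always trigger a spill, to stand in for memory pressure. It only shows why clearing the sorter's state before allocating turns the nested spill into a no-op.

```java
/** Toy model of the nested-spill hazard. None of these types are Spark classes. */
class NestedSpillSketch {

    /** Hypothetical stand-in for a memory page handed out by a pool. */
    static final class Page {
        boolean freed = false;
        final long[] data;
        Page(int size) { data = new long[size]; }
    }

    /** Hypothetical sorter whose array allocation can re-enter spill(). */
    static final class ToySorter {
        private final boolean resetStateFirst;
        private Page array = new Page(4);
        private int pos = 0;
        boolean touchedFreedPage = false;

        ToySorter(boolean resetStateFirst) { this.resetStateFirst = resetStateFirst; }

        void insert(long value) { array.data[pos++] = value; }

        /** Walks the buffered records and flags any access to a freed page. */
        void spill() {
            for (int i = 0; i < pos; i++) {
                if (array.freed) {
                    touchedFreedPage = true; // in a real JVM this is the unsafe read
                }
            }
            pos = 0;
        }

        /** Assumption: every allocation is under memory pressure and spills first. */
        private Page allocateArray(int size) {
            spill();
            return new Page(size);
        }

        void reset() {
            if (resetStateFirst) {
                pos = 0;                  // nested spill now has nothing to write
                array.freed = true;       // release the old page
                array = null;             // drop the stale reference
                array = allocateArray(4);
            } else {
                array.freed = true;       // release the old page
                array = allocateArray(4); // nested spill still sees pos > 0
                pos = 0;
            }
        }
    }

    /** Runs one insert/reset cycle and reports whether a freed page was read. */
    static boolean nestedSpillTouchesFreedPage(boolean resetStateFirst) {
        ToySorter sorter = new ToySorter(resetStateFirst);
        sorter.insert(1L);
        sorter.insert(2L);
        sorter.reset();
        return sorter.touchedFreedPage;
    }

    public static void main(String[] args) {
        System.out.println("old order touches freed page: " + nestedSpillTouchesFreedPage(false));
        System.out.println("new order touches freed page: " + nestedSpillTouchesFreedPage(true));
    }
}
```

In this model, the old ordering lets the nested spill walk records that live in a page already returned to the pool, while the new ordering leaves the spill with zero records to visit.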

Without the fix, the new unit test crashes the JVM.

Closes #22062 from zsxwing/SPARK-25081.

Authored-by: Shixiong Zhu 
Signed-off-by: Shixiong Zhu 



