Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20561#discussion_r167421526
  
    --- Diff: 
sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java
 ---
    @@ -98,10 +99,22 @@ public UnsafeKVExternalSorter(
             numElementsForSpillThreshold,
             canUseRadixSort);
         } else {
    -      // The array will be used to do in-place sort, which require half of 
the space to be empty.
    -      // Note: each record in the map takes two entries in the array, one 
is record pointer,
    -      // another is the key prefix.
    -      assert(map.numKeys() * 2 <= map.getArray().size() / 2);
    +      LongArray pointArray = map.getArray();
    +      // `BytesToBytesMap`'s point array is only guaranteed to hold all 
the distinct keys, but
    +      // `UnsafeInMemorySorter`'s point array need to hold all the 
entries. Since `BytesToBytesMap`
    +      // can have duplicated keys, here we need a check to make sure the 
point array can hold
    +      // all the entries in `BytesToBytesMap`.
    +      // The point array will be used to do in-place sort, which requires 
half of the space to be
    +      // empty. Note: each record in the map takes two entries in the 
point array, one is record
    +      // pointer, another is key prefix. So the required size of point 
array is `numRecords * 4`.
    +      // TODO: It's possible to change UnsafeInMemorySorter to have 
multiple entries with same key,
    +      // so that we can always reuse the point array.
    +      if (map.numValues() > pointArray.size() / 4) {
    +        // Here we ask the map to allocate memory, so that the memory 
manager won't ask the map
    +        // to spill, if the memory is not enough.
    +        pointArray = map.allocateArray(map.numValues() * 4L);
    +      }
    +
           // During spilling, the array in map will not be used, so we can 
borrow that and use it
           // as the underlying array for in-memory sorter (it's always large 
enough).
    --- End diff --
    
    Shall we update the comment here too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to