Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20561#discussion_r167421526 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java --- @@ -98,10 +99,22 @@ public UnsafeKVExternalSorter( numElementsForSpillThreshold, canUseRadixSort); } else { - // The array will be used to do in-place sort, which require half of the space to be empty. - // Note: each record in the map takes two entries in the array, one is record pointer, - // another is the key prefix. - assert(map.numKeys() * 2 <= map.getArray().size() / 2); + LongArray pointArray = map.getArray(); + // `BytesToBytesMap`'s point array is only guaranteed to hold all the distinct keys, but + // `UnsafeInMemorySorter`'s point array need to hold all the entries. Since `BytesToBytesMap` + // can have duplicated keys, here we need a check to make sure the point array can hold + // all the entries in `BytesToBytesMap`. + // The point array will be used to do in-place sort, which requires half of the space to be + // empty. Note: each record in the map takes two entries in the point array, one is record + // pointer, another is key prefix. So the required size of point array is `numRecords * 4`. + // TODO: It's possible to change UnsafeInMemorySorter to have multiple entries with same key, + // so that we can always reuse the point array. + if (map.numValues() > pointArray.size() / 4) { + // Here we ask the map to allocate memory, so that the memory manager won't ask the map + // to spill, if the memory is not enough. + pointArray = map.allocateArray(map.numValues() * 4L); + } + // During spilling, the array in map will not be used, so we can borrow that and use it // as the underlying array for in-memory sorter (it's always large enough). --- End diff -- Shall we update the comment here too?
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org