Github user sryza commented on the pull request:

    https://github.com/apache/spark/pull/3438#issuecomment-64317326
  
    The main changes we implemented here are:
    * When a shuffle operation has a key ordering, sort records by key on the map side in addition to sorting by partition (see the first sketch after this list).
    * On the reduce side, keep fetched blocks in serialized form, deserializing and merging them lazily as records are passed to the operation's output iterator.  This means that only (# of blocks being merged) records need to be deserialized at any point in time.  This part can be found in `SortShuffleReader`; the second sketch below illustrates the idea.
    * If the fetched blocks overflow memory, merge them to an on-disk file.
    * Add a `TieredDiskMerger` that avoids random I/O by merging on-disk blocks 100 at a time (see the third sketch below).  It should be usable by `ExternalAppendOnlyMap` and `ExternalSorter` as well.
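
    The first sketch below is a minimal illustration of the map-side ordering, not the PR's actual code; all names are hypothetical. Records carry their target partition id, and when the shuffle defines a key ordering, the comparator falls back to it within each partition.

```scala
// Minimal sketch (illustrative names, not Spark's actual code): sort
// map-side records by partition id and, when a key ordering is defined,
// by key within each partition.
object MapSideSortSketch {
  def sortRecords[K, V](
      records: Array[((Int, K), V)],           // ((partitionId, key), value)
      keyOrdering: Option[Ordering[K]]): Array[((Int, K), V)] = {
    val ord: Ordering[(Int, K)] = keyOrdering match {
      case Some(kOrd) =>
        // Partition id first, then key within the partition.
        Ordering.Tuple2(Ordering.Int, kOrd)
      case None =>
        // No key ordering requested: sorting by partition alone suffices.
        Ordering.by[(Int, K), Int](_._1)
    }
    records.sortBy(_._1)(ord)
  }
}
```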
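    The reduce-side merge can be pictured as a k-way merge over iterators that deserialize lazily. This is a hedged sketch of the idea rather than `SortShuffleReader` itself: a priority queue keyed on each block iterator's head record means only one deserialized record per block is held at any moment.

```scala
import scala.collection.mutable

// Sketch of the streaming merge idea (not the real SortShuffleReader):
// each fetched block is exposed as an Iterator[(K, V)] whose next() call
// deserializes one record. A priority queue over the heads of these
// iterators yields a fully merged stream while keeping only one
// deserialized record per block in memory.
object KWayMergeSketch {
  def merge[K, V](
      blocks: Seq[Iterator[(K, V)]])(implicit ord: Ordering[K]): Iterator[(K, V)] = {
    // BufferedIterator lets us peek at the head record without consuming it.
    val heap = mutable.PriorityQueue.empty[BufferedIterator[(K, V)]](
      // Reverse the ordering so the smallest head key is dequeued first.
      Ordering.by[BufferedIterator[(K, V)], K](_.head._1)(ord.reverse))
    blocks.foreach { it =>
      val buf = it.buffered
      if (buf.hasNext) heap.enqueue(buf)
    }
    new Iterator[(K, V)] {
      def hasNext: Boolean = heap.nonEmpty
      def next(): (K, V) = {
        val top = heap.dequeue()
        val record = top.next()          // deserializes exactly one record
        if (top.hasNext) heap.enqueue(top)
        record
      }
    }
  }
}
```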
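    Finally, an illustrative sketch of the tiered-merge idea; `maxFanIn`, `mergeGroup`, and the file-name plumbing are all stand-ins, not the PR's API. On-disk blocks are merged in groups of up to 100, repeating over the outputs until a single file remains, so each pass reads its inputs sequentially.

```scala
// Hypothetical sketch of tiered merging: cap the fan-in of each merge pass
// so only a bounded number of files are read at once.
object TieredMergeSketch {
  val maxFanIn = 100  // merge at most this many on-disk blocks in one pass

  // mergeGroup stands in for a sequential k-way merge of a group of files
  // into one new on-disk file (it could be built on the merge sketch above).
  @annotation.tailrec
  def tieredMerge(files: List[String], mergeGroup: List[String] => String): String =
    files match {
      case Nil           => sys.error("nothing to merge")
      case single :: Nil => single
      case _ =>
        // Merge groups of up to maxFanIn files, then repeat on the results.
        tieredMerge(files.grouped(maxFanIn).map(mergeGroup).toList, mergeGroup)
    }
}
```

    Capping the fan-in bounds the number of concurrently open files (and hence the amount of seek-heavy interleaved reading), at the cost of extra passes when many blocks spill.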

