Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1609#issuecomment-50929329

So I looked through this and I also think it would be good to split it into smaller patches for 1.1. As far as I can see there are several orthogonal improvements here:

- Shuffle file consolidation fixes that Aaron copied in https://github.com/apache/spark/pull/1678
- ExternalAppendOnlyMap fixes to deal with writes past end of stream; we also need these in ExternalSorter
- Fixes to directory creation in DiskBlockManager (I'm still not sure when this would be a problem actually if all accesses to these directories are through getFile; needs some investigation)
- Fixes to isSymlink (though as is, this seems like it would only compile on Java 7)
- Improvements to the API of DiskBlockObjectWriter

Of these, the first two are most critical. So I'd like to get those into 1.1, and then we can do API refactoring and the other fixes on the master branch. For the directory creation fix I'd still like to understand when that can be a problem (I'm probably just missing something), but it's also one we can add in 1.1 during the QA window.

I'm going to update the JIRA to create sub-tasks for these things so we can track where each one is fixed. Thanks again for putting this together Mridul, this is very helpful.