[ https://issues.apache.org/jira/browse/MAPREDUCE-4755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V resolved MAPREDUCE-4755.
--------------------------------
    Resolution: Not a Problem

> Rewrite MapOutputBuffer to use direct buffers & allow parallel sort+collect
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4755
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4755
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>         Environment: Ubuntu 12.10 x86_64 (Bulldozer 8-core)
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: optimization, sort
>
> The MapOutputBuffer has been written with a very severe constraint on the
> amount of memory it can consume. This results in code that has to page-in &
> page-out (i.e., spill) data as it passes through the map buffers.
> With the advent of the java.nio package, there is a fast and portable mmap
> alternative to handling your own buffers. This exists outside the GC space of
> Java and yet provides decently fast memory access to all the data.
> The suggestion is that using mmap() direct buffers can be faster when a spill
> is involved, and simpler than the current spill logic when given enough
> address space, and that it uses the buffer cache to deliver best-effort I/O.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
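
For readers unfamiliar with the java.nio mapping the description refers to, here is a minimal sketch of writing records into a memory-mapped buffer and reading them back. It is an illustration only, not the patch proposed in this issue and not Hadoop's MapOutputBuffer code; the class name MmapBufferSketch, the method roundTrip, and the temp-file name are hypothetical. It assumes a non-empty input array and a file small enough to map in one region.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MmapBufferSketch {

    /** Writes the values into a memory-mapped region of the file, then reads them back. */
    public static int[] roundTrip(Path file, int[] values) throws IOException {
        long size = (long) values.length * Integer.BYTES;
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // A READ_WRITE mapping grows the file to `size` bytes if it is shorter.
            // The mapped pages live outside the Java heap and are backed by the
            // operating system's page cache.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
            for (int v : values) {
                buf.putInt(v);
            }
            // Ask the OS to write dirty pages to the backing file; without this
            // call the write-back happens on a best-effort basis.
            buf.force();

            // Rewind to the start and read the same bytes back.
            buf.flip();
            int[] out = new int[values.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = buf.getInt();
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapout", ".buf");
        tmp.toFile().deleteOnExit();
        System.out.println(Arrays.toString(roundTrip(tmp, new int[] {3, 1, 2})));
        // prints [3, 1, 2]
    }
}
```

The mapping stays valid after the channel is closed and is only released when the buffer is garbage collected, which is one reason a real implementation would need to manage the lifetime of such buffers carefully.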