[ https://issues.apache.org/jira/browse/MAPREDUCE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992070#comment-12992070 ]
Arun C Murthy commented on MAPREDUCE-2308:
------------------------------------------

You are hitting the JVM limit on the size of an array... we'll need to change io.sort.mb to use multiple buffers...

> Sort buffer size (io.sort.mb) is limited to < 2 GB
> --------------------------------------------------
>
>                 Key: MAPREDUCE-2308
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2308
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0
>         Environment: Cloudera CDH3b3 (0.20.2+)
>            Reporter: Jay Hacker
>            Priority: Minor
>
> I have MapReduce jobs that use a large amount of per-task memory, because the algorithm I'm using converges faster if more data is together on a node. I have my JVM heap size set at 3200 MB, and if I use the popular rule of thumb that io.sort.mb should be ~70% of that, I get 2240 MB. I rounded this down to 2048 MB, but map tasks crash with:
> {noformat}
> java.io.IOException: Invalid "io.sort.mb": 2048
>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:790)
>         ...
> {noformat}
> MapTask.MapOutputBuffer implements its buffer with a byte[] of size io.sort.mb (in bytes), and it sanity-checks the size before allocating the array. The problem is that Java arrays can't have more than 2^31 - 1 elements (even with a 64-bit JVM); this is a limitation of the Java language specification itself. As memory and data sizes grow, this would seem to be a crippling limitation of Java.
> It would be nice if this ceiling were documented, and an error issued sooner, e.g. at jobtracker startup upon reading the config. Going forward, we may need to implement some array-of-arrays hack for large buffers. :(
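
For reference, a minimal sketch of the arithmetic behind the failure (the class name SortMbLimit is illustrative, not from the Hadoop source): 2048 MB expressed in bytes is exactly 2^31, one past what a Java int, and therefore a Java array length, can hold.

{noformat}
public class SortMbLimit {
    public static void main(String[] args) {
        int sortmb = 2048;
        // 2048 << 20 = 2^31 overflows int: the result wraps to Integer.MIN_VALUE.
        int bytes = sortmb << 20;
        System.out.println(bytes);                         // -2147483648
        // Done in long arithmetic, the true size is one past Integer.MAX_VALUE,
        // i.e. one past the largest legal Java array length.
        long trueBytes = (long) sortmb << 20;
        System.out.println(trueBytes > Integer.MAX_VALUE); // true
        // new byte[bytes] would throw NegativeArraySizeException here, and even
        // sizes just under 2^31 - 1 can fail on JVMs that reserve header words.
    }
}
{noformat}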
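
And a hypothetical sketch of the "array of arrays" approach suggested above, striping one logical buffer across fixed-size byte[] chunks. The ChunkedBuffer class and all of its names are invented for illustration; they are not part of any Hadoop patch:

{noformat}
// Sidesteps the 2^31 - 1 array-element ceiling by addressing a logical
// buffer with a long position, split into fixed-size byte[] chunks.
public class ChunkedBuffer {
    private static final int CHUNK_BITS = 26;               // 64 MB per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final byte[][] chunks;
    private final long capacity;

    public ChunkedBuffer(long capacityBytes) {
        this.capacity = capacityBytes;
        // Round up to the number of chunks needed to cover the capacity.
        int nChunks = (int) ((capacityBytes + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new byte[nChunks][];
        for (int i = 0; i < nChunks; i++) {
            long remaining = capacityBytes - ((long) i << CHUNK_BITS);
            chunks[i] = new byte[(int) Math.min(CHUNK_SIZE, remaining)];
        }
    }

    // High bits of the position pick the chunk, low bits the offset within it.
    public byte get(long pos) {
        return chunks[(int) (pos >>> CHUNK_BITS)][(int) (pos & CHUNK_MASK)];
    }

    public void put(long pos, byte b) {
        chunks[(int) (pos >>> CHUNK_BITS)][(int) (pos & CHUNK_MASK)] = b;
    }

    public long capacity() {
        return capacity;
    }
}
{noformat}

Each access costs one shift and one mask to pick the chunk and the offset; with 64 MB chunks, a 2 GB buffer is 32 chunks, and capacities well past 2 GB become addressable on a 64-bit heap.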