[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992070#comment-12992070
 ] 

Arun C Murthy commented on MAPREDUCE-2308:
------------------------------------------

You are hitting the JVM limit on the size of an array... we'll need to change 
the io.sort.mb to use multiple buffers...

> Sort buffer size (io.sort.mb) is limited to < 2 GB
> --------------------------------------------------
>
>                 Key: MAPREDUCE-2308
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2308
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0
>         Environment: Cloudera CDH3b3 (0.20.2+)
>            Reporter: Jay Hacker
>            Priority: Minor
>
> I have MapReduce jobs that use a large amount of per-task memory, because the 
> algorithm I'm using converges faster if more data is together on a node.  I 
> have my JVM heap size set at 3200 MB, and if I use the popular rule of thumb 
> that io.sort.mb should be ~70% of that, I get 2240 MB.  I rounded this down 
> to 2048 MB, but map tasks crash with :
> {noformat}
> java.io.IOException: Invalid "io.sort.mb": 2048
>         at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:790)
>         ...
> {noformat}
> MapTask.MapOutputBuffer implements its buffer with a byte[] of size 
> io.sort.mb (in bytes), and is sanity checking the size before allocating the 
> array.  The problem is that Java arrays can't have more than 2^31 - 1 
> elements (even with a 64-bit JVM), and this is a limitation of the Java 
> language specificiation itself.  As memory and data sizes grow, this would 
> seem to be a crippling limtiation of Java.
> It would be nice if this ceiling were documented, and an error issued sooner, 
> e.g. in jobtracker startup upon reading the config.  Going forward, we may 
> need to implement some array of arrays hack for large buffers. :(

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to