This seems to reproduce 100% of the time when the teragen job is set to generate anything over 1 GB.
What can I do to mitigate it?
Hadoop version: 0.20.1
hadoop jar $HADOOP_INSTALL/hadoop-0.20.1-dev-examples.jar teragen 100000000 /user/terasort-input
Generating 100000000 using 2 maps with step of 50000000
11/04/22 13:26:01 INFO mapred.JobClient: Running job: job_201104221307_0002
11/04/22 13:26:03 INFO mapred.JobClient: map 0% reduce 0%
...
11/04/22 13:35:39 INFO mapred.JobClient: map 99% reduce 0%
11/04/22 13:35:45 INFO mapred.JobClient: map 100% reduce 0%
11/04/22 13:35:47 INFO mapred.JobClient: Job complete: job_201104221307_0002
11/04/22 13:35:47 INFO mapred.JobClient: Counters: 13
11/04/22 13:35:47 INFO mapred.JobClient: Job Counters
11/04/22 13:35:47 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=1153434
11/04/22 13:35:47 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/04/22 13:35:47 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/04/22 13:35:47 INFO mapred.JobClient: Launched map tasks=2
11/04/22 13:35:47 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
11/04/22 13:35:47 INFO mapred.JobClient: FileSystemCounters
11/04/22 13:35:47 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=10000000000
11/04/22 13:35:47 INFO mapred.JobClient: Map-Reduce Framework
11/04/22 13:35:47 INFO mapred.JobClient: Map input records=100000000
11/04/22 13:35:47 INFO mapred.JobClient: PHYSICAL_MEMORY_BYTES=201478144
11/04/22 13:35:47 INFO mapred.JobClient: Spilled Records=0
11/04/22 13:35:47 INFO mapred.JobClient: CPU_MILLISECONDS=214280
11/04/22 13:35:47 INFO mapred.JobClient: VIRTUAL_MEMORY_BYTES=1269215232
11/04/22 13:35:47 INFO mapred.JobClient: Map input bytes=100000000
11/04/22 13:35:47 INFO mapred.JobClient: Map output records=100000000
hadoop jar $HADOOP_INSTALL/hadoop-0.20.1-dev-examples.jar terasort /user/terasort-input /user/terasort-output
11/04/22 13:36:39 INFO terasort.TeraSort: starting
11/04/22 13:36:39 INFO mapred.FileInputFormat: Total input paths to process : 2
java.lang.IllegalArgumentException: Offset 2147483648 is outside of file (0..2147483647)
at org.apache.hadoop.mapred.FileInputFormat.getBlockIndex(FileInputFormat.java:396)
at org.apache.hadoop.mapred.FileInputFormat.getSplitHosts(FileInputFormat.java:552)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:357)
at org.apache.hadoop.examples.terasort.TeraInputFormat.getSplits(TeraInputFormat.java:209)
at org.apache.hadoop.examples.terasort.TeraInputFormat.writePartitionFile(TeraInputFormat.java:116)
at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:243)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
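
For what it's worth, the numbers in the logs above line up with a signed 32-bit limit: the upper bound in the exception, 2147483647, is 2^31 - 1, and each teragen output file should be well past that. Below is a rough sketch of the arithmetic only. It assumes the output is split evenly across the two map tasks, and the helper names are made up for illustration. It does not show which component is capping the reported file length.

```python
# Back-of-the-envelope check of the sizes reported in the logs.
# Assumption: the teragen output is split evenly across the two map tasks.
# The function names below are hypothetical helpers, not Hadoop APIs.

ROWS = 100_000_000                   # "Map output records" from teragen
HDFS_BYTES_WRITTEN = 10_000_000_000  # from the teragen counters
MAPS = 2                             # "Launched map tasks=2"
INT32_MAX = 2**31 - 1                # 2147483647, upper bound in the exception


def bytes_per_row(total_bytes: int, rows: int) -> int:
    """Average row size implied by the counters."""
    return total_bytes // rows


def bytes_per_file(total_bytes: int, files: int) -> int:
    """Expected size of each output file under an even split."""
    return total_bytes // files


def exceeds_int32(length: int) -> bool:
    """True if a length cannot be represented as a signed 32-bit integer."""
    return length > INT32_MAX


if __name__ == "__main__":
    per_file = bytes_per_file(HDFS_BYTES_WRITTEN, MAPS)
    print("bytes per row :", bytes_per_row(HDFS_BYTES_WRITTEN, ROWS))
    print("bytes per file:", per_file)
    print("int32 max     :", INT32_MAX)
    print("over int32?   :", exceeds_int32(per_file))
```

The failing offset in the exception, 2147483648, is exactly one past that 32-bit maximum, which is why the file length looks truncated rather than genuinely that size.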
--
View this message in context:
http://lucene.472066.n3.nabble.com/hadoop-terasort-IllegalArgumentException-Offset-2147483648-is-outside-of-file-tp2852851p2852851.html
Sent from the Lucene - General mailing list archive at Nabble.com.