On Jul 24, 2011, at 2:34 AM, Joey Echeverria wrote:

> You're running out of memory trying to generate the splits. You need to set a
> bigger heap for your driver program. Assuming you're using the hadoop jar
> command to launch your job, you can do this by setting HADOOP_HEAPSIZE to a
> larger value in $HADOOP_HOME/conf/hadoop-env.sh

As Harsh pointed out, please use HADOOP_CLIENT_OPTS and not HADOOP_HEAPSIZE for the job-client.
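A minimal sketch of that, assuming you launch via the hadoop jar command from a shell (the 2 GB value and the jar name are only illustrative):

    # Raise the heap of only the client-side JVM that computes the splits;
    # HADOOP_HEAPSIZE would also raise the heaps of the Hadoop daemons.
    export HADOOP_CLIENT_OPTS="-Xmx2g $HADOOP_CLIENT_OPTS"
    hadoop jar my-job.jar ...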
Arun

> -Joey
>
> On Jul 24, 2011 5:07 AM, "Gagan Bansal" <gagan.ban...@gmail.com> wrote:
> > Hi All,
> >
> > I am getting the following error on running a job on about 12 TB of data.
> > This happens before any mappers or reducers are launched.
> > Also the job starts fine if I reduce the amount of input data. Any ideas as
> > to what may be the reason for this error?
> >
> > Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
> >     at java.util.Arrays.copyOf(Arrays.java:2786)
> >     at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:71)
> >     at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
> >     at org.apache.hadoop.io.UTF8.writeChars(UTF8.java:278)
> >     at org.apache.hadoop.io.UTF8.writeString(UTF8.java:250)
> >     at org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:131)
> >     at org.apache.hadoop.ipc.RPC$Invocation.write(RPC.java:111)
> >     at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:741)
> >     at org.apache.hadoop.ipc.Client.call(Client.java:1011)
> >     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
> >     at $Proxy6.getBlockLocations(Unknown Source)
> >     at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> >     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> >     at $Proxy6.getBlockLocations(Unknown Source)
> >     at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:359)
> >     at org.apache.hadoop.hdfs.DFSClient.getBlockLocations(DFSClient.java:380)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:178)
> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:234)
> >     at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:946)
> >     at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:938)
> >     at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:854)
> >     at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:807)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:807)
> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:781)
> >     at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:876)
> >
> > Gagan Bansal