Jobtracker throws GC overhead limit exceeded
Hi,

We are using Hadoop 0.19 on a small cluster of 7 machines (7 datanodes, 4 task trackers), and we typically have 3-4 jobs running at a time. We have been facing the following error on the Jobtracker:

java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded

It seems to be thrown by RunningJob.killJob() and usually occurs after a day or so of starting up the cluster. In the Jobtracker's output file:

Exception in thread initJobs java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.String.substring(String.java:1939)
    at java.lang.String.substring(String.java:1904)
    at org.apache.hadoop.fs.Path.getName(Path.java:188)
    at org.apache.hadoop.fs.ChecksumFileSystem.isChecksumFile(ChecksumFileSystem.java:70)
    at org.apache.hadoop.fs.ChecksumFileSystem$1.accept(ChecksumFileSystem.java:442)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:726)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:748)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:457)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:723)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:748)
    at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:660)
    at org.apache.hadoop.mapred.JobHistory$JobInfo.finalizeRecovery(JobHistory.java:746)
    at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:1532)
    at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2232)
    at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:1938)
    at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:1953)
    at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2012)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:62)
    at java.lang.Thread.run(Thread.java:619)

Please help!

Thanks,
Meghana
Re: Jobtracker throws GC overhead limit exceeded
Meghana,

What is the heapsize for your JT? Try increasing that.

Also, we've fixed a huge number of issues in the JT (and Hadoop overall) since 0.19. Can you upgrade to 0.20.203, the latest stable release?

thanks,
Arun

Sent from my iPhone

On Jul 4, 2011, at 11:10 PM, Meghana meghana.mara...@germinait.com wrote:
[quoted message trimmed]
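For reference, in Hadoop releases of this vintage the daemon heap is usually raised in conf/hadoop-env.sh. This is a sketch only: the 2048m value is illustrative, not from the thread, and it assumes your hadoop-env.sh still has the stock HADOOP_HEAPSIZE and per-daemon HADOOP_JOBTRACKER_OPTS hooks. The JobTracker must be restarted for the change to take effect.

```shell
# conf/hadoop-env.sh -- illustrative values, adjust to your hardware.

# HADOOP_HEAPSIZE sets the default max heap (in MB) for all Hadoop daemons.
export HADOOP_HEAPSIZE=1000

# If present, HADOOP_JOBTRACKER_OPTS appends JVM flags for the JobTracker
# only, so its heap can be raised without touching the other daemons.
export HADOOP_JOBTRACKER_OPTS="-Xmx2048m $HADOOP_JOBTRACKER_OPTS"
```

A larger heap defers rather than removes the pressure: a JobTracker that retains history for many completed jobs will eventually hit the same GC overhead limit again, so upgrading remains the real fix.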
Re: Jobtracker throws GC overhead limit exceeded
Hey Arun,

The JT heapsize (Xmx) is 512m. Will try increasing it, thanks!

Yes, migrating to 0.20 is definitely on my to-do list, but some urgent issues have taken priority for now :(

Thanks,
..meghana

On 5 July 2011 12:25, Arun C Murthy a...@hortonworks.com wrote:
[quoted message trimmed]