Jobtracker throws GC overhead limit exceeded

2011-07-05 Thread Meghana
Hi,

We are using Hadoop 0.19 on a small cluster of 7 machines (7 datanodes, 4
task trackers), and we typically have 3-4 jobs running at a time. We have
been facing the following error on the Jobtracker:

java.io.IOException: java.lang.OutOfMemoryError: GC overhead limit exceeded

It seems to be thrown by RunningJob.killJob(), and it usually shows up a day
or so after the cluster starts up.

In the Jobtracker's output file:
Exception in thread initJobs java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.String.substring(String.java:1939)
    at java.lang.String.substring(String.java:1904)
    at org.apache.hadoop.fs.Path.getName(Path.java:188)
    at org.apache.hadoop.fs.ChecksumFileSystem.isChecksumFile(ChecksumFileSystem.java:70)
    at org.apache.hadoop.fs.ChecksumFileSystem$1.accept(ChecksumFileSystem.java:442)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:726)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:748)
    at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:457)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:723)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:748)
    at org.apache.hadoop.mapred.JobHistory$JobInfo.getJobHistoryFileName(JobHistory.java:660)
    at org.apache.hadoop.mapred.JobHistory$JobInfo.finalizeRecovery(JobHistory.java:746)
    at org.apache.hadoop.mapred.JobTracker.finalizeJob(JobTracker.java:1532)
    at org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:2232)
    at org.apache.hadoop.mapred.JobInProgress.terminateJob(JobInProgress.java:1938)
    at org.apache.hadoop.mapred.JobInProgress.terminate(JobInProgress.java:1953)
    at org.apache.hadoop.mapred.JobInProgress.fail(JobInProgress.java:2012)
    at org.apache.hadoop.mapred.EagerTaskInitializationListener$JobInitThread.run(EagerTaskInitializationListener.java:62)
    at java.lang.Thread.run(Thread.java:619)


Please help!

Thanks,

Meghana


Re: Jobtracker throws GC overhead limit exceeded

2011-07-05 Thread Arun C Murthy
Meghana,

What is the heapsize for your JT? Try increasing that.

Also, we've fixed a huge number of issues in the JT (and Hadoop overall) since 
0.19. Can you upgrade to 0.20.203, the latest stable release?
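As a sketch, the JT heap is usually raised in conf/hadoop-env.sh. The values
below are illustrative (size them to your hardware), and note that the
per-daemon HADOOP_JOBTRACKER_OPTS hook is from 0.20-era scripts; an unpatched
0.19 install may only honor HADOOP_HEAPSIZE:

```shell
# conf/hadoop-env.sh -- illustrative values only; size to your hardware.

# Default heap (in MB) for every Hadoop daemon started by the scripts:
export HADOOP_HEAPSIZE=2000

# Or target just the JobTracker and leave the other daemons at the
# default (this hook is present in 0.20-era hadoop-env.sh):
export HADOOP_JOBTRACKER_OPTS="-Xmx2048m $HADOOP_JOBTRACKER_OPTS"
```

Either way, the JT has to be restarted for the change to take effect.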

thanks,
Arun

Sent from my iPhone

On Jul 4, 2011, at 11:10 PM, Meghana meghana.mara...@germinait.com wrote:



Re: Jobtracker throws GC overhead limit exceeded

2011-07-05 Thread Meghana
Hey Arun,

The JT heapsize (Xmx) is 512m. Will try increasing it, thanks!
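In case it helps while the bigger heap soaks in: heap pressure on a running JT
can be watched with the stock JDK tools. The commands below are a sketch and
assume jps/jstat/jmap from the JT's own JDK are on the PATH:

```shell
# Find the JobTracker pid by its main-class name.
JT_PID=$(jps | awk '/JobTracker/ {print $1}')

# Sample GC activity every 5 seconds (Ctrl-C to stop); old gen (the O
# column) pinned near 100% with back-to-back full GCs is the pattern
# that precedes "GC overhead limit exceeded".
jstat -gcutil "$JT_PID" 5000

# Histogram of live objects, to see what is actually filling the heap.
jmap -histo:live "$JT_PID" | head -20
```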

Yes, migrating to 0.20 is definitely on my to-do list, but some urgent
issues have taken priority for now :(

Thanks,

..meghana


On 5 July 2011 12:25, Arun C Murthy a...@hortonworks.com wrote:
