[jira] [Commented] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice
[ https://issues.apache.org/jira/browse/MAPREDUCE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968014#comment-13968014 ] Liyin Liang commented on MAPREDUCE-5775: The patch looks good to me. +1 > SleepJob.createJob setNumReduceTasks twice > -- > > Key: MAPREDUCE-5775 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775 > Project: Hadoop Map/Reduce > Issue Type: Bug > Affects Versions: trunk > Reporter: Liyin Liang > Assignee: jhanver chand sharma > Priority: Minor > Attachments: MAPREDUCE-5775.patch > > > createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) twice, > which is unnecessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice
[ https://issues.apache.org/jira/browse/MAPREDUCE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5775: --- Attachment: (was: MAPREDUCE-5775.diff) > SleepJob.createJob setNumReduceTasks twice > -- > > Key: MAPREDUCE-5775 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775 > Project: Hadoop Map/Reduce > Issue Type: Bug > Affects Versions: trunk > Reporter: Liyin Liang > Assignee: jhanver chand sharma > Priority: Minor > Attachments: MAPREDUCE-5775.patch > > > createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) twice, > which is unnecessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
[ https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang reassigned MAPREDUCE-5799: -- Assignee: Liyin Liang > add default value of MR_AM_ADMIN_USER_ENV > - > > Key: MAPREDUCE-5799 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Affects Versions: 2.3.0 > Reporter: Liyin Liang > Assignee: Liyin Liang > Priority: Minor > Attachments: MAPREDUCE-5799.diff > > > Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager. > This job will fail with the following error: > {code} > 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] > org.apache.hadoop.mapred.LocalContainerLauncher: Error running local > (uberized) 'child' : java.lang.UnsatisfiedLinkError: > org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z > at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native > Method) > at > org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63) > at > org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132) > at > org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148) > at > org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163) > at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700) > at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at > org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317) > at > org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232) > at java.lang.Thread.run(Thread.java:662) > {code} > When creating a ContainerLaunchContext for a task in > TaskAttemptImpl.createCommonContainerLaunchContext(), DEFAULT_MAPRED_ADMIN_USER_ENV, which is > "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. > Whereas when creating a ContainerLaunchContext for the MR AppMaster in > YARNRunner.createApplicationSubmissionContext(), no default > environment is set. So an uber-mode job fails to find the native libraries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
[ https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5799: --- Status: Patch Available (was: Open) > add default value of MR_AM_ADMIN_USER_ENV > - > > Key: MAPREDUCE-5799 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Affects Versions: 2.3.0 > Reporter: Liyin Liang > Priority: Minor > Attachments: MAPREDUCE-5799.diff > > > Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager. > This job will fail with the following error: > {code} > 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] > org.apache.hadoop.mapred.LocalContainerLauncher: Error running local > (uberized) 'child' : java.lang.UnsatisfiedLinkError: > org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z > at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native > Method) > at > org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63) > at > org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132) > at > org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148) > at > org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163) > at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700) > at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at > org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317) > at > org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232) > at java.lang.Thread.run(Thread.java:662) > {code} > When creating a ContainerLaunchContext for a task in > TaskAttemptImpl.createCommonContainerLaunchContext(), DEFAULT_MAPRED_ADMIN_USER_ENV, which is > "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. > Whereas when creating a ContainerLaunchContext for the MR AppMaster in > YARNRunner.createApplicationSubmissionContext(), no default > environment is set. So an uber-mode job fails to find the native libraries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
[ https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5799: --- Attachment: MAPREDUCE-5799.diff Although we can add
{code}
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native</value>
</property>
{code}
to the client's yarn-site.xml to make this job pass, it is better to give MR_AM_ADMIN_USER_ENV a default value and avoid the problem entirely. Attaching a patch that adds DEFAULT_MR_AM_ADMIN_USER_ENV. > add default value of MR_AM_ADMIN_USER_ENV > - > > Key: MAPREDUCE-5799 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Affects Versions: 2.3.0 > Reporter: Liyin Liang > Priority: Minor > Attachments: MAPREDUCE-5799.diff > > > Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager. > This job will fail with the following error: > {code} > 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] > org.apache.hadoop.mapred.LocalContainerLauncher: Error running local > (uberized) 'child' : java.lang.UnsatisfiedLinkError: > org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z > at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native > Method) > at > org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63) > at > org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132) > at > org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148) > at > org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163) > at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583) > at > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462) > at > org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700) > at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) > at > org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317) > at > org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232) > at java.lang.Thread.run(Thread.java:662) > {code} > When creating a ContainerLaunchContext for a task in > TaskAttemptImpl.createCommonContainerLaunchContext(), DEFAULT_MAPRED_ADMIN_USER_ENV, which is > "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. > Whereas when creating a ContainerLaunchContext for the MR AppMaster in > YARNRunner.createApplicationSubmissionContext(), no default > environment is set. So an uber-mode job fails to find the native libraries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV
Liyin Liang created MAPREDUCE-5799: -- Summary: add default value of MR_AM_ADMIN_USER_ENV Key: MAPREDUCE-5799 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Liyin Liang Priority: Minor Submit a 1 map + 1 reduce sleep job with the following config:
{code}
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>
{code}
And the LinuxContainerExecutor is enabled on the NodeManager. This job will fail with the following error: {code} 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] org.apache.hadoop.mapred.LocalContainerLauncher: Error running local (uberized) 'child' : java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method) at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63) at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148) at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163) at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700) at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317) at org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232) at java.lang.Thread.run(Thread.java:662) {code} When creating a ContainerLaunchContext for a task in TaskAttemptImpl.createCommonContainerLaunchContext(), DEFAULT_MAPRED_ADMIN_USER_ENV, which is "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. Whereas when creating a ContainerLaunchContext for the MR AppMaster in YARNRunner.createApplicationSubmissionContext(), no default environment is set. So an uber-mode job fails to find the native libraries. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice
[ https://issues.apache.org/jira/browse/MAPREDUCE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5775: --- Assignee: Liyin Liang Affects Version/s: trunk Status: Patch Available (was: Open) > SleepJob.createJob setNumReduceTasks twice > -- > > Key: MAPREDUCE-5775 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775 > Project: Hadoop Map/Reduce > Issue Type: Bug > Affects Versions: trunk > Reporter: Liyin Liang > Assignee: Liyin Liang > Priority: Minor > Attachments: MAPREDUCE-5775.diff > > > createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) twice, > which is unnecessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice
[ https://issues.apache.org/jira/browse/MAPREDUCE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5775: --- Attachment: MAPREDUCE-5775.diff Attaching a patch that removes one job.setNumReduceTasks(numReducer) call from each SleepJob.java. > SleepJob.createJob setNumReduceTasks twice > -- > > Key: MAPREDUCE-5775 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775 > Project: Hadoop Map/Reduce > Issue Type: Bug > Reporter: Liyin Liang > Priority: Minor > Attachments: MAPREDUCE-5775.diff > > > createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) twice, > which is unnecessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice
Liyin Liang created MAPREDUCE-5775: -- Summary: SleepJob.createJob setNumReduceTasks twice Key: MAPREDUCE-5775 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Liyin Liang Priority: Minor createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) twice, which is unnecessary. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903999#comment-13903999 ] Liyin Liang commented on MAPREDUCE-5487: The following line is not necessary any more. It should be deleted. {code} static final Configuration conf = new JobConf(); {code} > In task processes, JobConf is unnecessarily loaded again in Limits > -- > > Key: MAPREDUCE-5487 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: performance, task >Affects Versions: 2.1.0-beta >Reporter: Sandy Ryza >Assignee: Sandy Ryza > Fix For: 2.3.0 > > Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch > > > Limits statically loads a JobConf, which incurs costs of reading files from > disk and parsing XML. The contents of this JobConf are identical to the one > loaded by YarnChild (before adding job.xml as a resource). Allowing Limits > to initialize with the JobConf loaded in YarnChild would reduce task startup > time. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
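The pattern behind this fix can be restated as a self-contained sketch: instead of a static field that eagerly constructs its own configuration (re-reading files from disk and parsing XML), let the hosting process hand in the configuration it has already loaded. The class below is purely illustrative, not Hadoop's actual Limits code:

```java
// Illustrative sketch of "initialize with the caller's config" versus
// "statically load your own": init() lets the process (YarnChild in the
// real code) reuse its already-parsed config; getConf() only falls back
// to the expensive load when nothing was provided.
class LimitsSketch {
    private static String conf;  // stands in for the JobConf

    // Called once, early, by the hosting process.
    static synchronized void init(String loadedConf) {
        if (conf == null) {
            conf = loadedConf;   // reuse, don't reload from disk
        }
    }

    static synchronized String getConf() {
        if (conf == null) {
            conf = loadExpensively();  // old eager behaviour as fallback
        }
        return conf;
    }

    private static String loadExpensively() {
        return "parsed-from-disk";     // placeholder for disk I/O + XML parse
    }

    public static void main(String[] args) {
        init("preloaded-job-conf");
        System.out.println(getConf());
    }
}
```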
[jira] [Commented] (MAPREDUCE-5691) Throttle shuffle's bandwidth utilization
[ https://issues.apache.org/jira/browse/MAPREDUCE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858531#comment-13858531 ] Liyin Liang commented on MAPREDUCE-5691: [~sandyr] As long-term work, limiting network I/O using cgroups is a better way to solve this problem. [~jira.shegalov] Our cluster's users run thousands of jobs every day; it's difficult for them to set parameters for each specific job. > Throttle shuffle's bandwidth utilization > > > Key: MAPREDUCE-5691 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5691 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Affects Versions: 2.2.0 > Reporter: Liyin Liang > Attachments: ganglia-slave.jpg > > > In our Hadoop cluster, a reducer of a big job can utilize all the bandwidth > during the shuffle phase. Then any task reading data from the machine that is > running that reducer becomes very slow. > It's better to move DataTransferThrottler from hadoop-hdfs to hadoop-common, > and create a throttler for Shuffle to throttle each Fetcher. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5691) Throttle shuffle's bandwidth utilization
[ https://issues.apache.org/jira/browse/MAPREDUCE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5691: --- Attachment: ganglia-slave.jpg Attaching a Ganglia network metrics graph. The reducer utilizes all of the inbound bandwidth, so the throttling should happen on the reducer side. > Throttle shuffle's bandwidth utilization > > > Key: MAPREDUCE-5691 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5691 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Affects Versions: 2.2.0 > Reporter: Liyin Liang > Attachments: ganglia-slave.jpg > > > In our Hadoop cluster, a reducer of a big job can utilize all the bandwidth > during the shuffle phase. Then any task reading data from the machine that is > running that reducer becomes very slow. > It's better to move DataTransferThrottler from hadoop-hdfs to hadoop-common, > and create a throttler for Shuffle to throttle each Fetcher. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (MAPREDUCE-5691) Throttle shuffle's bandwidth utilization
Liyin Liang created MAPREDUCE-5691: -- Summary: Throttle shuffle's bandwidth utilization Key: MAPREDUCE-5691 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5691 Project: Hadoop Map/Reduce Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Liyin Liang In our Hadoop cluster, a reducer of a big job can utilize all the bandwidth during the shuffle phase. Then any task reading data from the machine that is running that reducer becomes very slow. It's better to move DataTransferThrottler from hadoop-hdfs to hadoop-common, and create a throttler for Shuffle to throttle each Fetcher. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
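For context, HDFS's DataTransferThrottler works on a period-based budget: within each short period a sender may move a fixed number of bytes, and once the budget is spent it waits out the rest of the period. The self-contained sketch below shows that idea applied to a shuffle Fetcher; the class name, the explicit clock parameter, and returning a wait time instead of sleeping are simplifications for illustration, not the HDFS implementation:

```java
// Sketch of a period-based bandwidth throttler: within each period the
// caller may send bytesPerPeriod bytes; once the budget is used up,
// throttle() reports how long to wait before sending more.
class ThrottlerSketch {
    private final long bytesPerPeriod;  // byte budget per period
    private final long periodMs;        // accounting period length
    private long bytesUsed;             // bytes consumed this period
    private long periodStartMs;         // start of the current period

    ThrottlerSketch(long bytesPerSec, long periodMs) {
        this.bytesPerPeriod = bytesPerSec * periodMs / 1000;
        this.periodMs = periodMs;
    }

    // Records numBytes sent at time nowMs; returns how many ms the
    // caller should sleep before sending more (0 if within budget).
    synchronized long throttle(long numBytes, long nowMs) {
        if (nowMs - periodStartMs >= periodMs) {
            periodStartMs = nowMs;  // new period: reset the budget
            bytesUsed = 0;
        }
        bytesUsed += numBytes;
        return (bytesUsed <= bytesPerPeriod)
                ? 0
                : periodStartMs + periodMs - nowMs;  // wait out the period
    }

    public static void main(String[] args) {
        ThrottlerSketch t = new ThrottlerSketch(1024 * 1024, 500); // ~1 MiB/s
        System.out.println(t.throttle(512 * 1024, 0));    // 0: within budget
        System.out.println(t.throttle(512 * 1024, 100));  // 400: budget spent
    }
}
```

A per-Fetcher instance of something like this is what the issue proposes, so one big job's reducers cannot saturate a machine's inbound link.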
[jira] [Updated] (MAPREDUCE-5684) TestMRJobs.testFailingMapper occasionally fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5684: --- Attachment: MAPREDUCE-5684-1.diff This patch changes the assertion to accept both the TIPFAILED and FAILED statuses, and calls verifyFailingMapperCounters(job) only if the status is TIPFAILED. > TestMRJobs.testFailingMapper occasionally fails > --- > > Key: MAPREDUCE-5684 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5684 > Project: Hadoop Map/Reduce > Issue Type: Bug > Affects Versions: 2.2.0 > Reporter: Liyin Liang > Attachments: MAPREDUCE-5684-1.diff > > > TestMRJobs is occasionally failing with the error: > {code} > --- > Test set: org.apache.hadoop.mapreduce.v2.TestMRJobs > --- > Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 323.503 sec > <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestMRJobs > testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time elapsed: > 15.657 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at > org.apache.hadoop.mapreduce.v2.TestMRJobs.testFailingMapper(TestMRJobs.java:313) > {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
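The shape of the patched check can be sketched as follows; the enum and helper methods below stand in for the real test code and task-status type, purely for illustration:

```java
// Illustrative only: the real test inspects the task completion status
// reported by the job. Either failure status is acceptable, but the
// failure counters are only verified in the TIPFAILED case.
class FailingMapperCheckSketch {
    enum Status { TIPFAILED, FAILED, SUCCEEDED }

    // Mirrors the relaxed assertion: either failure status passes.
    static boolean isAcceptedFailure(Status s) {
        return s == Status.TIPFAILED || s == Status.FAILED;
    }

    // Counters are only verified once the whole TIP has failed.
    static boolean shouldVerifyCounters(Status s) {
        return s == Status.TIPFAILED;
    }

    public static void main(String[] args) {
        System.out.println(isAcceptedFailure(Status.FAILED));     // true
        System.out.println(shouldVerifyCounters(Status.FAILED));  // false
    }
}
```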
[jira] [Assigned] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang reassigned MAPREDUCE-5690: -- Assignee: Liyin Liang > TestLocalMRNotification.testMR occasionally fails > - > > Key: MAPREDUCE-5690 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690 > Project: Hadoop Map/Reduce > Issue Type: Bug > Affects Versions: 2.2.0 > Reporter: Liyin Liang > Assignee: Liyin Liang > Attachments: MAPREDUCE-5690.1.diff > > > TestLocalMRNotification is occasionally failing with the error: > {code} > --- > Test set: org.apache.hadoop.mapred.TestLocalMRNotification > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification > testMR(org.apache.hadoop.mapred.TestLocalMRNotification) Time elapsed: > 24.881 sec <<< ERROR! > java.io.IOException: Job cleanup didn't start in 20 seconds > at > org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685) > at > org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at junit.framework.TestCase.runTest(TestCase.java:168) > at junit.framework.TestCase.runBare(TestCase.java:134) > at junit.framework.TestResult$1.protect(TestResult.java:110) > at junit.framework.TestResult.runProtected(TestResult.java:128) > at junit.framework.TestResult.run(TestResult.java:113) > at junit.framework.TestCase.run(TestCase.java:124) > at junit.framework.TestSuite.runTest(TestSuite.java:243) > at junit.framework.TestSuite.run(TestSuite.java:238) > at > org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5690: --- Status: Patch Available (was: Open) > TestLocalMRNotification.testMR occasionally fails > - > > Key: MAPREDUCE-5690 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690 > Project: Hadoop Map/Reduce > Issue Type: Bug > Affects Versions: 2.2.0 > Reporter: Liyin Liang > Assignee: Liyin Liang > Attachments: MAPREDUCE-5690.1.diff > > > TestLocalMRNotification is occasionally failing with the error: > {code} > --- > Test set: org.apache.hadoop.mapred.TestLocalMRNotification > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification > testMR(org.apache.hadoop.mapred.TestLocalMRNotification) Time elapsed: > 24.881 sec <<< ERROR! > java.io.IOException: Job cleanup didn't start in 20 seconds > at > org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685) > at > org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at junit.framework.TestCase.runTest(TestCase.java:168) > at junit.framework.TestCase.runBare(TestCase.java:134) > at junit.framework.TestResult$1.protect(TestResult.java:110) > at junit.framework.TestResult.runProtected(TestResult.java:128) > at junit.framework.TestResult.run(TestResult.java:113) > at junit.framework.TestCase.run(TestCase.java:124) > at junit.framework.TestSuite.runTest(TestSuite.java:243) > at junit.framework.TestSuite.run(TestSuite.java:238) > at > org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) > at > 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5690: --- Attachment: MAPREDUCE-5690.1.diff This patch waits for the job's map progress before calling job.killJob(). > TestLocalMRNotification.testMR occasionally fails > - > > Key: MAPREDUCE-5690 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690 > Project: Hadoop Map/Reduce > Issue Type: Bug > Affects Versions: 2.2.0 > Reporter: Liyin Liang > Attachments: MAPREDUCE-5690.1.diff > > > TestLocalMRNotification is occasionally failing with the error: > {code} > --- > Test set: org.apache.hadoop.mapred.TestLocalMRNotification > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification > testMR(org.apache.hadoop.mapred.TestLocalMRNotification) Time elapsed: > 24.881 sec <<< ERROR! > java.io.IOException: Job cleanup didn't start in 20 seconds > at > org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685) > at > org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at junit.framework.TestCase.runTest(TestCase.java:168) > at junit.framework.TestCase.runBare(TestCase.java:134) > at junit.framework.TestResult$1.protect(TestResult.java:110) > at junit.framework.TestResult.runProtected(TestResult.java:128) > at junit.framework.TestResult.run(TestResult.java:113) > at junit.framework.TestCase.run(TestCase.java:124) > at junit.framework.TestSuite.runTest(TestSuite.java:243) > at junit.framework.TestSuite.run(TestSuite.java:238) > at > 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852529#comment-13852529 ] Liyin Liang commented on MAPREDUCE-5690: The failure of TestLocalMRNotification.testMR is caused by UtilsForTests.runJobKill(). In UtilsForTests.runJobKill(), a job with a KillMapper is submitted to the LocalJobRunner. Once the job is in RUNNING status, it is killed via job.killJob(), and the test then waits for the job to complete with a 20-second timeout. The problem is that job.killJob() is meant to interrupt the KillMapper, which sleeps for a long time; but if job.killJob() is invoked before the KillMapper has launched, the job keeps running the mapper for a long time and job cleanup misses the 20-second window. > TestLocalMRNotification.testMR occasionally fails > - > > Key: MAPREDUCE-5690 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690 > Project: Hadoop Map/Reduce > Issue Type: Bug > Affects Versions: 2.2.0 > Reporter: Liyin Liang > > TestLocalMRNotification is occasionally failing with the error: > {code} > --- > Test set: org.apache.hadoop.mapred.TestLocalMRNotification > --- > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification > testMR(org.apache.hadoop.mapred.TestLocalMRNotification) Time elapsed: > 24.881 sec <<< ERROR! 
> java.io.IOException: Job cleanup didn't start in 20 seconds > at > org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685) > at > org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at junit.framework.TestCase.runTest(TestCase.java:168) > at junit.framework.TestCase.runBare(TestCase.java:134) > at junit.framework.TestResult$1.protect(TestResult.java:110) > at junit.framework.TestResult.runProtected(TestResult.java:128) > at junit.framework.TestResult.run(TestResult.java:113) > at junit.framework.TestCase.run(TestCase.java:124) > at junit.framework.TestSuite.runTest(TestSuite.java:243) > at junit.framework.TestSuite.run(TestSuite.java:238) > at > org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) > {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
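The fix that follows from this analysis is to wait until the mapper has actually started (i.e. the job reports map progress) before issuing job.killJob(), so the kill has something to interrupt. A generic, self-contained sketch of that wait loop; the real patch polls the running job's progress, and the helper and its names here are illustrative:

```java
import java.util.function.BooleanSupplier;

// Generic poll-until-true helper in the spirit of the fix: before
// calling job.killJob(), wait until the map task has actually started
// making progress.
class WaitForProgress {
    // Polls cond every intervalMs until it holds or timeoutMs elapses.
    static boolean waitFor(BooleanSupplier cond, long timeoutMs, long intervalMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (cond.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;  // treat interruption as "gave up waiting"
            }
        }
        return cond.getAsBoolean();  // one last check at the deadline
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Simulated "map progress" that appears after ~50 ms; in the test
        // the condition would be something like job.mapProgress() > 0.
        boolean ok = waitFor(() -> System.currentTimeMillis() - start > 50, 1000, 10);
        System.out.println(ok);
    }
}
```

Killing only after this wait succeeds removes the race without changing what the test verifies.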
[jira] [Created] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails
Liyin Liang created MAPREDUCE-5690: -- Summary: TestLocalMRNotification.testMR occasionally fails Key: MAPREDUCE-5690 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Liyin Liang TestLocalMRNotification is occasionally failing with the error: {code} --- Test set: org.apache.hadoop.mapred.TestLocalMRNotification --- Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec <<< FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification testMR(org.apache.hadoop.mapred.TestLocalMRNotification) Time elapsed: 24.881 sec <<< ERROR! java.io.IOException: Job cleanup didn't start in 20 seconds at org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685) at org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at junit.framework.TestCase.runTest(TestCase.java:168) at junit.framework.TestCase.runBare(TestCase.java:134) at junit.framework.TestResult$1.protect(TestResult.java:110) at junit.framework.TestResult.runProtected(TestResult.java:128) at junit.framework.TestResult.run(TestResult.java:113) at junit.framework.TestCase.run(TestCase.java:124) at junit.framework.TestSuite.runTest(TestSuite.java:243) at junit.framework.TestSuite.run(TestSuite.java:238) at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83) at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5684) TestMRJobs.testFailingMapper occasionally fails
[ https://issues.apache.org/jira/browse/MAPREDUCE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848985#comment-13848985 ] Liyin Liang commented on MAPREDUCE-5684: The failure happens when job.getTaskCompletionEvents(0, 2) is redirected to the history server. In the .jhist file, all the failed attempts' status is FAILED. > TestMRJobs.testFailingMapper occasionally fails > --- > > Key: MAPREDUCE-5684 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5684 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Liyin Liang > > TestMRJobs is occasionally failing with the error: > {code} > --- > Test set: org.apache.hadoop.mapreduce.v2.TestMRJobs > --- > Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 323.503 sec > <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestMRJobs > testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time elapsed: > 15.657 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:147) > at > org.apache.hadoop.mapreduce.v2.TestMRJobs.testFailingMapper(TestMRJobs.java:313) > {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (MAPREDUCE-5684) TestMRJobs.testFailingMapper occasionally fails
Liyin Liang created MAPREDUCE-5684: -- Summary: TestMRJobs.testFailingMapper occasionally fails Key: MAPREDUCE-5684 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5684 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Liyin Liang TestMRJobs is occasionally failing with the error: {code} --- Test set: org.apache.hadoop.mapreduce.v2.TestMRJobs --- Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 323.503 sec <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestMRJobs testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs) Time elapsed: 15.657 sec <<< FAILURE! java.lang.AssertionError: expected: but was: at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:147) at org.apache.hadoop.mapreduce.v2.TestMRJobs.testFailingMapper(TestMRJobs.java:313) {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5614) job history file name should escape queue name
[ https://issues.apache.org/jira/browse/MAPREDUCE-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5614: --- Target Version/s: (was: 2.3.0) > job history file name should escape queue name > -- > > Key: MAPREDUCE-5614 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: mr-5614-2.diff, mr-5614.diff > > > Our cluster's queue name contains hyphen e.g. cug-taobao. Because hyphen is > the delimiter of job history file name, JobHistoryServer shows "cug" as the > queue name. To fix this problem, we should escape queuename in job history > file name. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
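The MAPREDUCE-5614 report above boils down to a delimiter collision: the hyphen both separates the fields of the history file name and can appear inside a queue name like cug-taobao. The shape of the fix can be sketched with a toy percent-style escape. This is illustrative only; the class, the helper names, and the %2D encoding are invented for the example, and the real patch uses the job-history code's own escaping utilities.

```java
public class QueueNameEscape {
    // Percent-encode '%' first, then the delimiter, so escaping is reversible.
    static String escape(String s) {
        return s.replace("%", "%25").replace("-", "%2D");
    }

    static String unescape(String s) {
        return s.replace("%2D", "-").replace("%25", "%");
    }

    public static void main(String[] args) {
        // A simplified history file name: id-queue-user, joined by hyphens.
        String fileName = String.join("-", "job_1", escape("cug-taobao"), "hadoop");
        // Without escaping, split("-") would report the queue as just "cug".
        String queue = unescape(fileName.split("-")[1]);
        System.out.println(queue); // cug-taobao
    }
}
```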
[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847119#comment-13847119 ] Liyin Liang commented on MAPREDUCE-5623: Nice patch! > TestJobCleanup fails because of RejectedExecutionException and NPE. > --- > > Key: MAPREDUCE-5623 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Tsuyoshi OZAWA >Assignee: Jason Lowe > Attachments: MAPREDUCE-5623.1.patch, MAPREDUCE-5623.2.patch, > MAPREDUCE-5623.3.patch > > > org.apache.hadoop.mapred.TestJobCleanup can fail because of > RejectedExecutionException by NonAggregatingLogHandler. This problem is > described in YARN-1409. TestJobCleanup can still fail after fixing > RejectedExecutionException, because of NPE by Job#getCounters()'s returning > null. > {code} > --- > Test set: org.apache.hadoop.mapred.TestJobCleanup > --- > Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup > testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup) Time elapsed: > 31.068 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199) > at > org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296) > {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5679) TestJobHistoryParsing has race condition
[ https://issues.apache.org/jira/browse/MAPREDUCE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5679: --- Attachment: MAPREDUCE-5679-3.diff Incorporated Jason Lowe's comment. Thanks for the review. > TestJobHistoryParsing has race condition > > > Key: MAPREDUCE-5679 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: MAPREDUCE-5679-2.diff, MAPREDUCE-5679-3.diff, > MAPREDUCE-5679.diff > > > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of > race condition. > {noformat} > testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing) > Time elapsed: 4.102 sec <<< ERROR! > java.io.IOException: Unable to initialize History Viewer > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:137) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.(JobHistoryParser.java:86) > at > org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.(HistoryViewer.java:85) > at > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339) > at > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125) > {noformat} > In the checkHistoryParsing() function, after > {code} > HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId); > {code} > a thread named MoveIntermediateToDone will be launched to move history file > from done_intermediate to done directory. 
> If the history file is moved, > {code} > HistoryViewer viewer = new HistoryViewer(fc.makeQualified( > fileInfo.getHistoryFile()).toString(), conf, true); > {code} > will throw an IOException, because the history file is not found. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5679) TestJobHistoryParsing has race condition
[ https://issues.apache.org/jira/browse/MAPREDUCE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5679: --- Attachment: MAPREDUCE-5679-2.diff testHistoryParsingForFailedAttempts() and testCountersForFailedTask() have the same race conditions. > TestJobHistoryParsing has race condition > > > Key: MAPREDUCE-5679 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: MAPREDUCE-5679-2.diff, MAPREDUCE-5679.diff > > > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of > race condition. > {noformat} > testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing) > Time elapsed: 4.102 sec <<< ERROR! > java.io.IOException: Unable to initialize History Viewer > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:137) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.(JobHistoryParser.java:86) > at > org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.(HistoryViewer.java:85) > at > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339) > at > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125) > {noformat} > In the checkHistoryParsing() function, after > {code} > HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId); > {code} > a thread named MoveIntermediateToDone will be launched to move history file > from done_intermediate to done 
directory. > If the history file is moved, > {code} > HistoryViewer viewer = new HistoryViewer(fc.makeQualified( > fileInfo.getHistoryFile()).toString(), conf, true); > {code} > will throw an IOException, because the history file is not found. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5679) TestJobHistoryParsing has race condition
[ https://issues.apache.org/jira/browse/MAPREDUCE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5679: --- Assignee: Liyin Liang Target Version/s: 2.4.0 Status: Patch Available (was: Open) > TestJobHistoryParsing has race condition > > > Key: MAPREDUCE-5679 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: MAPREDUCE-5679.diff > > > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of > race condition. > {noformat} > testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing) > Time elapsed: 4.102 sec <<< ERROR! > java.io.IOException: Unable to initialize History Viewer > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:137) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.(JobHistoryParser.java:86) > at > org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.(HistoryViewer.java:85) > at > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339) > at > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125) > {noformat} > In the checkHistoryParsing() function, after > {code} > HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId); > {code} > a thread named MoveIntermediateToDone will be launched to move history file > from done_intermediate to done directory. 
> If the history file is moved, > {code} > HistoryViewer viewer = new HistoryViewer(fc.makeQualified( > fileInfo.getHistoryFile()).toString(), conf, true); > {code} > will throw an IOException, because the history file is not found. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5679) TestJobHistoryParsing has race condition
[ https://issues.apache.org/jira/browse/MAPREDUCE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5679: --- Attachment: MAPREDUCE-5679.diff This patch wraps the "test output for HistoryViewer" section in a lock on fileInfo to avoid the race condition. > TestJobHistoryParsing has race condition > > > Key: MAPREDUCE-5679 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Liyin Liang > Attachments: MAPREDUCE-5679.diff > > > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of > race condition. > {noformat} > testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing) > Time elapsed: 4.102 sec <<< ERROR! > java.io.IOException: Unable to initialize History Viewer > at > org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520) > at > org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:137) > at > org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.(JobHistoryParser.java:86) > at > org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.(HistoryViewer.java:85) > at > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339) > at > org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125) > {noformat} > In the checkHistoryParsing() function, after > {code} > HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId); > {code} > a thread named MoveIntermediateToDone will be launched to move history file > from done_intermediate to done directory. 
> If the history file is moved, > {code} > HistoryViewer viewer = new HistoryViewer(fc.makeQualified( > fileInfo.getHistoryFile()).toString(), conf, true); > {code} > will throw an IOException, because the history file is not found. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
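The fix described in the update above (reading the history file under a lock on fileInfo, so the background MoveIntermediateToDone thread cannot move it mid-read) can be sketched with plain files. All names here are invented stand-ins for the illustration, not the actual Hadoop classes: a plain Object plays the role of the HistoryFileInfo monitor, and the reader checks both locations under that lock.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class HistoryMoveRace {
    // Stand-in for the HistoryFileInfo instance whose monitor guards the move.
    static final Object fileInfo = new Object();

    // Reads the history file from whichever directory currently holds it.
    // Because it holds the same lock as the mover, the read never observes
    // a half-completed move (the IOException in the bug report).
    static String readHistory(Path intermediate, Path done) throws Exception {
        synchronized (fileInfo) {
            Path current = Files.exists(done) ? done : intermediate;
            return new String(Files.readAllBytes(current));
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("jhist");
        Path intermediate = dir.resolve("done_intermediate.jhist");
        Path done = dir.resolve("done.jhist");
        Files.write(intermediate, "job history".getBytes());

        // Background mover, analogous to the MoveIntermediateToDone thread.
        Thread mover = new Thread(() -> {
            synchronized (fileInfo) {
                try {
                    Files.move(intermediate, done);
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        });
        mover.start();
        System.out.println(readHistory(intermediate, done)); // job history
        mover.join();
    }
}
```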
[jira] [Created] (MAPREDUCE-5679) TestJobHistoryParsing has race condition
Liyin Liang created MAPREDUCE-5679: -- Summary: TestJobHistoryParsing has race condition Key: MAPREDUCE-5679 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Liyin Liang org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of race condition. {noformat} testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing) Time elapsed: 4.102 sec <<< ERROR! java.io.IOException: Unable to initialize History Viewer at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520) at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:137) at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798) at org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.(JobHistoryParser.java:86) at org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.(HistoryViewer.java:85) at org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339) at org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125) {noformat} In the checkHistoryParsing() function, after {code} HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId); {code} a thread named MoveIntermediateToDone will be launched to move history file from done_intermediate to done directory. If the history file is moved, {code} HistoryViewer viewer = new HistoryViewer(fc.makeQualified( fileInfo.getHistoryFile()).toString(), conf, true); {code} will throw an IOException, because the history file is not found. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845957#comment-13845957 ] Liyin Liang commented on MAPREDUCE-5623: Hi Jason Lowe, The patch looks good to me. I think testKilledJob() has the same problem as testFailedJob(), so it would be better to fix both. > TestJobCleanup fails because of RejectedExecutionException and NPE. > --- > > Key: MAPREDUCE-5623 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Tsuyoshi OZAWA >Assignee: Jason Lowe > Attachments: MAPREDUCE-5623.1.patch, MAPREDUCE-5623.2.patch > > > org.apache.hadoop.mapred.TestJobCleanup can fail because of > RejectedExecutionException by NonAggregatingLogHandler. This problem is > described in YARN-1409. TestJobCleanup can still fail after fixing > RejectedExecutionException, because of NPE by Job#getCounters()'s returning > null. > {code} > --- > Test set: org.apache.hadoop.mapred.TestJobCleanup > --- > Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup > testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup) Time elapsed: > 31.068 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199) > at > org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296) > {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845028#comment-13845028 ] Liyin Liang commented on MAPREDUCE-5623: Hi Jason Lowe, If the client is redirected to the history server, job.getCounters() will return null, because the .jhist file of a failed job doesn't contain job-level counters. > TestJobCleanup fails because of RejectedExecutionException and NPE. > --- > > Key: MAPREDUCE-5623 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Tsuyoshi OZAWA >Assignee: Tsuyoshi OZAWA > Attachments: MAPREDUCE-5623.1.patch > > > org.apache.hadoop.mapred.TestJobCleanup can fail because of > RejectedExecutionException by NonAggregatingLogHandler. This problem is > described in YARN-1409. TestJobCleanup can still fail after fixing > RejectedExecutionException, because of NPE by Job#getCounters()'s returning > null. > {code} > --- > Test set: org.apache.hadoop.mapred.TestJobCleanup > --- > Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec > <<< FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup > testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup) Time elapsed: > 31.068 sec <<< ERROR! > java.lang.NullPointerException: null > at > org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199) > at > org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296) > {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5614) job history file name should escape queue name
[ https://issues.apache.org/jira/browse/MAPREDUCE-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5614: --- Attachment: mr-5614-2.diff Update patch to incorporate Zhijie's comment. > job history file name should escape queue name > -- > > Key: MAPREDUCE-5614 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: mr-5614-2.diff, mr-5614.diff > > > Our cluster's queue name contains hyphen e.g. cug-taobao. Because hyphen is > the delimiter of job history file name, JobHistoryServer shows "cug" as the > queue name. To fix this problem, we should escape queuename in job history > file name. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-5614) job history file name should escape queue name
[ https://issues.apache.org/jira/browse/MAPREDUCE-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5614: --- Attachment: mr-5614.diff Attached a patch to escape the queue name. > job history file name should escape queue name > -- > > Key: MAPREDUCE-5614 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: mr-5614.diff > > > Our cluster's queue name contains hyphen e.g. cug-taobao. Because hyphen is > the delimiter of job history file name, JobHistoryServer shows "cug" as the > queue name. To fix this problem, we should escape queuename in job history > file name. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-5614) job history file name should escape queue name
[ https://issues.apache.org/jira/browse/MAPREDUCE-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-5614: --- Status: Patch Available (was: Open) > job history file name should escape queue name > -- > > Key: MAPREDUCE-5614 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: mr-5614.diff > > > Our cluster's queue name contains hyphen e.g. cug-taobao. Because hyphen is > the delimiter of job history file name, JobHistoryServer shows "cug" as the > queue name. To fix this problem, we should escape queuename in job history > file name. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Created] (MAPREDUCE-5614) job history file name should escape queue name
Liyin Liang created MAPREDUCE-5614: -- Summary: job history file name should escape queue name Key: MAPREDUCE-5614 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.2.0 Reporter: Liyin Liang Assignee: Liyin Liang Our cluster's queue name contains hyphen e.g. cug-taobao. Because hyphen is the delimiter of job history file name, JobHistoryServer shows "cug" as the queue name. To fix this problem, we should escape queuename in job history file name. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-4978: --- Status: Patch Available (was: Open) > Add a updateJobWithSplit() method for new-api job > - > > Key: MAPREDUCE-4978 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.1.2 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: 4978-1.diff > > > HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api > job. It's better to add another method for new-api job. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-4978: --- Fix Version/s: 1.2.0 > Add a updateJobWithSplit() method for new-api job > - > > Key: MAPREDUCE-4978 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.1.2 >Reporter: Liyin Liang >Assignee: Liyin Liang > Fix For: 1.2.0 > > Attachments: 4978-1.diff > > > HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api > job. It's better to add another method for new-api job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-4978: --- Affects Version/s: (was: 1.1.1) 1.1.2 > Add a updateJobWithSplit() method for new-api job > - > > Key: MAPREDUCE-4978 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.1.2 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: 4978-1.diff > > > HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api > job. It's better to add another method for new-api job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572109#comment-13572109 ] Liyin Liang commented on MAPREDUCE-4978: The attached patch adds a new method updateJobWithSplit() for new-api jobs only. This patch also fixes MAPREDUCE-1743. > Add a updateJobWithSplit() method for new-api job > - > > Key: MAPREDUCE-4978 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.1.1 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: 4978-1.diff > > > HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api > job. It's better to add another method for new-api job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-4978: --- Affects Version/s: 1.1.1 > Add a updateJobWithSplit() method for new-api job > - > > Key: MAPREDUCE-4978 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.1.1 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: 4978-1.diff > > > HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api > job. It's better to add another method for new-api job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-4978: --- Attachment: 4978-1.diff > Add a updateJobWithSplit() method for new-api job > - > > Key: MAPREDUCE-4978 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Affects Versions: 1.1.1 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: 4978-1.diff > > > HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api > job. It's better to add another method for new-api job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
[ https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-4978: --- Fix Version/s: (was: 1.2.0) > Add a updateJobWithSplit() method for new-api job > - > > Key: MAPREDUCE-4978 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Liyin Liang >Assignee: Liyin Liang > > HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api > job. It's better to add another method for new-api job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job
Liyin Liang created MAPREDUCE-4978: -- Summary: Add a updateJobWithSplit() method for new-api job Key: MAPREDUCE-4978 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Liyin Liang Assignee: Liyin Liang Fix For: 1.2.0 HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api job. It's better to add another method for new-api job. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (MAPREDUCE-1743) conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20
[ https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang reassigned MAPREDUCE-1743: -- Assignee: Liyin Liang > conf.get("map.input.file") returns null when using MultipleInputs in Hadoop > 0.20 > > > Key: MAPREDUCE-1743 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.2 >Reporter: Yuanyuan Tian >Assignee: Liyin Liang > Attachments: mr-1743.diff > > > There is a problem in getting the input file name in the mapper when using > MultipleInputs in Hadoop 0.20. I need to use MultipleInputs to support > different formats for my inputs to my MapReduce job. And inside each > mapper, I also need to know the exact input file that the mapper is > processing. However, conf.get("map.input.file") returns null. Can anybody > help me solve this problem? Thanks in advance. > public class Test extends Configured implements Tool{ > static class InnerMapper extends MapReduceBase implements > Mapper > { > > > public void configure(JobConf conf) > { > String inputName = conf.get("map.input.file"); > ... > } > > } > > public int run(String[] arg0) throws Exception { > JobConf job; > job = new JobConf(Test.class); > ... > > MultipleInputs.addInputPath(conf, new Path("A"), > TextInputFormat.class); > MultipleInputs.addInputPath(conf, new Path("B"), > SequenceFileFormat.class); > ... > } > } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-1743) conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20
[ https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-1743: --- Attachment: mr-1743.diff Attaching a patch based on branch-1.1 with Jim's solution. This patch works well in our production cluster. > conf.get("map.input.file") returns null when using MultipleInputs in Hadoop > 0.20 > > > Key: MAPREDUCE-1743 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.2 >Reporter: Yuanyuan Tian > Attachments: mr-1743.diff > > > There is a problem in getting the input file name in the mapper when using > MultipleInputs in Hadoop 0.20. I need to use MultipleInputs to support > different formats for my inputs to my MapReduce job. And inside each > mapper, I also need to know the exact input file that the mapper is > processing. However, conf.get("map.input.file") returns null. Can anybody > help me solve this problem? Thanks in advance. > public class Test extends Configured implements Tool{ > static class InnerMapper extends MapReduceBase implements > Mapper > { > > > public void configure(JobConf conf) > { > String inputName = conf.get("map.input.file"); > ... > } > > } > > public int run(String[] arg0) throws Exception { > JobConf job; > job = new JobConf(Test.class); > ... > > MultipleInputs.addInputPath(conf, new Path("A"), > TextInputFormat.class); > MultipleInputs.addInputPath(conf, new Path("B"), > SequenceFileFormat.class); > ... > } > } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-1743) conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20
[ https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424704#comment-13424704 ] Liyin Liang commented on MAPREDUCE-1743: Jim's solution is nice. > conf.get("map.input.file") returns null when using MultipleInputs in Hadoop > 0.20 > > > Key: MAPREDUCE-1743 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.2 >Reporter: Yuanyuan Tian > > There is a problem in getting the input file name in the mapper when using > MultipleInputs in Hadoop 0.20. I need to use MultipleInputs to support > different formats for my inputs to my MapReduce job. And inside each > mapper, I also need to know the exact input file that the mapper is > processing. However, conf.get("map.input.file") returns null. Can anybody > help me solve this problem? Thanks in advance. > public class Test extends Configured implements Tool{ > static class InnerMapper extends MapReduceBase implements > Mapper > { > > > public void configure(JobConf conf) > { > String inputName = conf.get("map.input.file"); > ... > } > > } > > public int run(String[] arg0) throws Exception { > JobConf job; > job = new JobConf(Test.class); > ... > > MultipleInputs.addInputPath(conf, new Path("A"), > TextInputFormat.class); > MultipleInputs.addInputPath(conf, new Path("B"), > SequenceFileFormat.class); > ... > } > } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-4478) TaskTracker's heartbeat is out of control
[ https://issues.apache.org/jira/browse/MAPREDUCE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-4478: --- Attachment: 4478.diff Attaching a patch to fix this bug. I am not sure whether the synchronized block is necessary. > TaskTracker's heartbeat is out of control > - > > Key: MAPREDUCE-4478 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4478 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.0.3 >Reporter: Liyin Liang > Attachments: 4478.diff > > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4478) TaskTracker's heartbeat is out of control
[ https://issues.apache.org/jira/browse/MAPREDUCE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422022#comment-13422022 ] Liyin Liang commented on MAPREDUCE-4478: Two configuration items control the TaskTracker's heartbeat interval. One is *mapreduce.tasktracker.outofband.heartbeat*. The other is *mapreduce.tasktracker.outofband.heartbeat.damper*. If we set *mapreduce.tasktracker.outofband.heartbeat* to true and leave *mapreduce.tasktracker.outofband.heartbeat.damper* at its default value (100), the TaskTracker may send heartbeats with no interval at all. The code that controls the heartbeat interval is as follows:
{code:java}
long now = System.currentTimeMillis();

// accelerate to account for multiple finished tasks up-front
long remaining = (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
while (remaining > 0) {
  // sleeps for the wait time or
  // until there are *enough* empty slots to schedule tasks
  synchronized (finishedCount) {
    finishedCount.wait(remaining);

    // Recompute
    now = System.currentTimeMillis();
    remaining = (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;

    if (remaining <= 0) {
      // Reset count
      finishedCount.set(0);
      break;
    }
  }
}
{code}
On the first computation, if *finishedCount* is greater than zero, *getHeartbeatInterval(finishedCount.get())* returns zero, so *remaining* is less than or equal to zero. In that case the *while* loop is skipped entirely, and *finishedCount* is never reset to zero. > TaskTracker's heartbeat is out of control > - > > Key: MAPREDUCE-4478 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4478 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.0.3 >Reporter: Liyin Liang > -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
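The failure mode described in the comment above can be sketched with a simplified, hypothetical model (the class `HeartbeatDamper` and method `nextWait` are illustrative names, not the actual TaskTracker code): once a task has finished, the computed interval is zero, the wait loop is skipped, and the counter is never reset, so every subsequent heartbeat also computes a zero interval.

```java
/**
 * Simplified model of the damped heartbeat-interval logic quoted above.
 * With full damping (damper = 100), any finished task drops the interval
 * to zero. All names and constants here are illustrative.
 */
public class HeartbeatDamper {
    static final long BASE_INTERVAL_MS = 3000;

    /** Mutable stand-in for the TaskTracker's finishedCount. */
    static int finishedCount = 0;

    static long intervalFor(int finished) {
        // Full damping: any finished task means "heartbeat immediately".
        return finished > 0 ? 0 : BASE_INTERVAL_MS;
    }

    /** Returns the interval that would be waited before the next heartbeat. */
    static long nextWait(boolean withResetFix) {
        long remaining = intervalFor(finishedCount);
        if (remaining > 0) {
            // Models the real wait loop, whose body resets the count.
            finishedCount = 0;
        } else if (withResetFix) {
            // The missing reset: without it, finishedCount stays > 0 and
            // every later heartbeat also computes a zero interval.
            finishedCount = 0;
        }
        return remaining;
    }
}
```

Running the model with and without the reset shows why the heartbeat goes "out of control": without the fix the counter stays at 1 forever, so heartbeats are sent back to back.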
[jira] [Moved] (MAPREDUCE-4478) TaskTracker's heartbeat is out of control
[ https://issues.apache.org/jira/browse/MAPREDUCE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang moved HDFS-3722 to MAPREDUCE-4478: -- Affects Version/s: (was: 1.0.3) (was: 1.0.2) (was: 1.0.1) (was: 1.0.0) 1.0.0 1.0.1 1.0.2 1.0.3 Key: MAPREDUCE-4478 (was: HDFS-3722) Project: Hadoop Map/Reduce (was: Hadoop HDFS) > TaskTracker's heartbeat is out of control > - > > Key: MAPREDUCE-4478 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4478 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 1.0.3, 1.0.2, 1.0.1, 1.0.0 >Reporter: Liyin Liang > -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2349) speed up list[located]status calls from input formats
[ https://issues.apache.org/jira/browse/MAPREDUCE-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266293#comment-13266293 ] Liyin Liang commented on MAPREDUCE-2349: This jira is very valuable for large, busy clusters. > speed up list[located]status calls from input formats > - > > Key: MAPREDUCE-2349 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2349 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: task >Reporter: Joydeep Sen Sarma > > when a job has many input paths - listStatus - or the improved > listLocatedStatus - calls (invoked from the getSplits() method) can take a > long time. Most of the time is spent waiting for the previous call to > complete and then dispatching the next call. > This can be greatly speeded up by dispatching multiple calls at once (via > executors). If the same filesystem client is used - then the calls are much > better pipelined (since calls are serialized) and don't impose extra burden > on the namenode while at the same time greatly reducing the latency to the > client. In a simple test on non-peak hours, this resulted in the getSplits() > time reducing from about 3s to about 0.5s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
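The "dispatch multiple calls at once (via executors)" idea from the issue above can be sketched as follows. This is a hedged, self-contained illustration: `lister` stands in for `FileSystem.listStatus()`, and the class and method names are invented for the example, not taken from the actual MAPREDUCE-2349 patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/**
 * Sketch of parallel per-path listing: submit every call up-front so the
 * round trips overlap, then collect the results in input order.
 */
public class ParallelLister {
    public static <T> List<T> listAll(List<String> paths,
                                      Function<String, T> lister,
                                      int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Dispatch all listing calls at once instead of serially.
            List<Future<T>> futures = new ArrayList<>();
            for (String p : paths) {
                futures.add(pool.submit(() -> lister.apply(p)));
            }
            // Collect results, preserving the input-path order.
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                try {
                    results.add(f.get());
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With a real Hadoop `FileSystem`, `lister` would wrap `fs.listStatus(new Path(p))`; sharing one filesystem client keeps the RPCs pipelined as the issue describes.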
[jira] [Commented] (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080715#comment-13080715 ] Liyin Liang commented on MAPREDUCE-2209: Hi Subroto, Your analysis is great and your patch looks good to me. However, I found another issue, MAPREDUCE-2364, which duplicates this one. What's more, its solution is mostly the same as your patch. I think one of them should be closed as a duplicate. > TaskTracker's heartbeat hang for several minutes when copying large job.jar > from HDFS > - > > Key: MAPREDUCE-2209 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.23.0 > Environment: hadoop version: 0.19.1 >Reporter: Liyin Liang >Priority: Blocker > Attachments: 2209-1.diff, MAPREDUCE-2209.patch > > > If a job's jar file is very large, e.g 200m+, the TaskTracker's heartbeat > hang for several minutes when localizing the job. The jstack of related > threads are as follows: > {code:borderStyle=solid} > "TaskLauncher for task" daemon prio=10 tid=0x002b05ee5000 nid=0x1adf > runnable [0x42e56000] >java.lang.Thread.State: RUNNABLE > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) > - locked <0x002afc892ec8> (a sun.nio.ch.Util$1) > - locked <0x002afc892eb0> (a > java.util.Collections$UnmodifiableSet) > - locked <0x002afc8927d8> (a sun.nio.ch.EPollSelectorImpl) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150) > at > 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > - locked <0x002afce26158> (a java.io.BufferedInputStream) > at java.io.DataInputStream.readShort(DataInputStream.java:295) > at > org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304) > at > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556) > - locked <0x002afce26218> (a > org.apache.hadoop.hdfs.DFSClient$DFSInputStream) > at > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673) > - locked <0x002afce26218> (a > org.apache.hadoop.hdfs.DFSClient$DFSInputStream) > at java.io.DataInputStream.read(DataInputStream.java:83) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142) > at > org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214) > at > org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195) > at > org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824) > - locked <0x002afce2d260> (a > org.apache.hadoop.mapred.TaskTracker$RunningJob) > at > org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745) > at > org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103) > at > org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710) > "Map-events fetcher for all reduce tasks on > tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 > tid=0x002b05ef8000 > nid=0x1ada waiting for monitor entry [0x42d55000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582) > - waiting to lock <0x002afce2d260> (a > 
org.apache.hadoop.mapred.TaskTracker$RunningJob) > at > org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617) > - locked <0x002a9eefe1f8> (a java.util.TreeMap) > "IPC Server handler 2 on 50050" daemon prio=10 tid=0x002b050eb000 > nid=0x1ab0 waiting for monitor entry [0x4234b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.
[jira] [Commented] (MAPREDUCE-2364) Shouldn't hold lock on rjob while localizing resources.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078631#comment-13078631 ] Liyin Liang commented on MAPREDUCE-2364: I think this issue is the same as MAPREDUCE-2209. > Shouldn't hold lock on rjob while localizing resources. > --- > > Key: MAPREDUCE-2364 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2364 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.203.0 >Reporter: Owen O'Malley >Assignee: Devaraj Das > Fix For: 0.20.204.0 > > Attachments: MAPREDUCE-2364.patch, > no-lock-localize-branch-0.20-security.patch, no-lock-localize-trunk.patch > > > There is a deadlock while localizing resources on the TaskTracker. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073442#comment-13073442 ] Liyin Liang commented on MAPREDUCE-2209: Hi Subroto, In fact, we have fixed this issue by reducing the lock scope of _TaskTracker::getMapCompletionEvents()_. It works well in our 1500-node production cluster. I will attach a diff file for 0.19. > TaskTracker's heartbeat hang for several minutes when copying large job.jar > from HDFS > - > > Key: MAPREDUCE-2209 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209 > Project: Hadoop Map/Reduce > Issue Type: Bug > Environment: hadoop version: 0.19.1 >Reporter: Liyin Liang >Priority: Blocker > Attachments: 2209-1.diff > > > If a job's jar file is very large, e.g 200m+, the TaskTracker's heartbeat > hang for several minutes when localizing the job. The jstack of related > threads are as follows: > {code:borderStyle=solid} > "TaskLauncher for task" daemon prio=10 tid=0x002b05ee5000 nid=0x1adf > runnable [0x42e56000] >java.lang.Thread.State: RUNNABLE > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) > - locked <0x002afc892ec8> (a sun.nio.ch.Util$1) > - locked <0x002afc892eb0> (a > java.util.Collections$UnmodifiableSet) > - locked <0x002afc8927d8> (a sun.nio.ch.EPollSelectorImpl) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > 
at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > - locked <0x002afce26158> (a java.io.BufferedInputStream) > at java.io.DataInputStream.readShort(DataInputStream.java:295) > at > org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304) > at > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556) > - locked <0x002afce26218> (a > org.apache.hadoop.hdfs.DFSClient$DFSInputStream) > at > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673) > - locked <0x002afce26218> (a > org.apache.hadoop.hdfs.DFSClient$DFSInputStream) > at java.io.DataInputStream.read(DataInputStream.java:83) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142) > at > org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214) > at > org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195) > at > org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824) > - locked <0x002afce2d260> (a > org.apache.hadoop.mapred.TaskTracker$RunningJob) > at > org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745) > at > org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103) > at > org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710) > "Map-events fetcher for all reduce tasks on > tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 > tid=0x002b05ef8000 > nid=0x1ada waiting for monitor entry [0x42d55000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582) > - waiting to lock <0x002afce2d260> (a > org.apache.hadoop.mapred.TaskTracker$RunningJob) > at > org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617) > - 
locked <0x002a9eefe1f8> (a java.util.TreeMap) > "IPC Server handler 2 on 50050" daemon prio=10 tid=0x002b050eb000 > nid=0x1ab0 waiting for monitor entry [0x4234b000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:2684) > - waiting to lock <0x002a9eefe1f8
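The stack traces above show the TaskLauncher thread holding the `RunningJob` monitor through a slow HDFS copy, which in turn blocks the map-events fetcher and the heartbeat path. The lock-narrowing idea behind the fixes discussed in MAPREDUCE-2209/2364 can be sketched as follows; the class and method bodies here are purely illustrative (only the names `RunningJob` and `localizeJob` echo the real TaskTracker code), not the actual patch.

```java
/**
 * Sketch: do the slow copy OUTSIDE the per-job lock, and take the lock
 * only to publish the cheap state update.
 */
public class Localizer {
    static class RunningJob {
        boolean localized = false;
    }

    // Stand-in for FileSystem.copyToLocalFile(), which can take minutes
    // for a 200 MB+ job.jar.
    static String slowCopyFromHdfs(String jarPath) {
        return "/local/" + jarPath;
    }

    static String localizeJob(RunningJob rjob, String jarPath) {
        // Before the fix, this copy ran inside synchronized (rjob),
        // blocking every other thread that needed the rjob monitor.
        String localJar = slowCopyFromHdfs(jarPath);
        synchronized (rjob) {
            // Only this cheap update needs the lock.
            rjob.localized = true;
        }
        return localJar;
    }
}
```

The design point is that the monitor's hold time now depends only on the cheap state update, so heartbeat-related threads never wait behind an HDFS transfer.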
[jira] [Updated] (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2209: --- Attachment: 2209-1.diff > TaskTracker's heartbeat hang for several minutes when copying large job.jar > from HDFS > - > > Key: MAPREDUCE-2209 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209 > Project: Hadoop Map/Reduce > Issue Type: Bug > Environment: hadoop version: 0.19.1 >Reporter: Liyin Liang >Priority: Blocker > Attachments: 2209-1.diff > > > If a job's jar file is very large, e.g 200m+, the TaskTracker's heartbeat > hang for several minutes when localizing the job. The jstack of related > threads are as follows: > {code:borderStyle=solid} > "TaskLauncher for task" daemon prio=10 tid=0x002b05ee5000 nid=0x1adf > runnable [0x42e56000] >java.lang.Thread.State: RUNNABLE > at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) > at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215) > at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) > at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) > - locked <0x002afc892ec8> (a sun.nio.ch.Util$1) > - locked <0x002afc892eb0> (a > java.util.Collections$UnmodifiableSet) > - locked <0x002afc8927d8> (a sun.nio.ch.EPollSelectorImpl) > at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) > at > org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260) > at > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150) > at > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > at java.io.BufferedInputStream.read(BufferedInputStream.java:237) > - locked <0x002afce26158> (a java.io.BufferedInputStream) > at java.io.DataInputStream.readShort(DataInputStream.java:295) > at > 
org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304) > at > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556) > - locked <0x002afce26218> (a > org.apache.hadoop.hdfs.DFSClient$DFSInputStream) > at > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673) > - locked <0x002afce26218> (a > org.apache.hadoop.hdfs.DFSClient$DFSInputStream) > at java.io.DataInputStream.read(DataInputStream.java:83) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209) > at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142) > at > org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214) > at > org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195) > at > org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824) > - locked <0x002afce2d260> (a > org.apache.hadoop.mapred.TaskTracker$RunningJob) > at > org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745) > at > org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103) > at > org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710) > "Map-events fetcher for all reduce tasks on > tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 > tid=0x002b05ef8000 > nid=0x1ada waiting for monitor entry [0x42d55000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582) > - waiting to lock <0x002afce2d260> (a > org.apache.hadoop.mapred.TaskTracker$RunningJob) > at > org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617) > - locked <0x002a9eefe1f8> (a java.util.TreeMap) > "IPC Server handler 2 on 50050" daemon prio=10 tid=0x002b050eb000 > nid=0x1ab0 waiting for monitor entry [0x4234b000] >java.lang.Thread.State: BLOCKED (on 
object monitor) > at > org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:2684) > - waiting to lock <0x002a9eefe1f8> (a java.util.TreeMap) > - locked <0x002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.
[jira] [Resolved] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6
[ https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang resolved MAPREDUCE-2510. Resolution: Fixed > TaskTracker throw OutOfMemoryError after upgrade to jetty6 > -- > > Key: MAPREDUCE-2510 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Liyin Liang > > Our product cluster's TaskTracker sometimes throw OutOfMemoryError after > upgrade to jetty6. The exception in TT's log is as follows: > 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: Java heap space > at java.io.BufferedInputStream.(BufferedInputStream.java:178) > at > org.apache.hadoop.fs.BufferedFSInputStream.(BufferedFSInputStream.java:44) > at > org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359) > at > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:324) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) > at > 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) > Exceptions in .out file: > java.lang.OutOfMemoryError: Java heap space > Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap > space > Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap > space > java.lang.OutOfMemoryError: Java heap space > java.lang.reflect.InvocationTargetException > Exception in thread "IPC Server handler 6 on 50050" at > sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126) > at org.mortbay.log.Log.warn(Log.java:181) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:324) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) > at 
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-2714) When a job is retired by the same user's another job, its jobconf file is not deleted from the log directory of the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068206#comment-13068206 ] Liyin Liang commented on MAPREDUCE-2714: Attaching a patch for 0.20 branch. > When a job is retired by the same user's another job, its jobconf file is not > deleted from the log directory of the JobTracker > --- > > Key: MAPREDUCE-2714 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2714 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: 2714-1.diff > > > After MAPREDUCE-130, the job's conf copy will be deleted from the log > directory of the JobTracker when the job is retired. However, it just works > if the job is retired by _RetireJobs_ thread of JobTracker. If a job is > retired by the same user's another job, its conf copy will not be deleted. > This kind of retire happens in _JobTracker::finalizeJob(job)_, when > JobTracker maintains more than _MAX_COMPLETE_USER_JOBS_IN_MEMORY_ jobs > information in memory for a given user. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-2714) When a job is retired by the same user's another job, its jobconf file is not deleted from the log directory of the JobTracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2714: --- Attachment: 2714-1.diff > When a job is retired by the same user's another job, its jobconf file is not > deleted from the log directory of the JobTracker > --- > > Key: MAPREDUCE-2714 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2714 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.1, 0.20.2 >Reporter: Liyin Liang >Assignee: Liyin Liang > Attachments: 2714-1.diff > > > After MAPREDUCE-130, the job's conf copy will be deleted from the log > directory of the JobTracker when the job is retired. However, it just works > if the job is retired by _RetireJobs_ thread of JobTracker. If a job is > retired by the same user's another job, its conf copy will not be deleted. > This kind of retire happens in _JobTracker::finalizeJob(job)_, when > JobTracker maintains more than _MAX_COMPLETE_USER_JOBS_IN_MEMORY_ jobs > information in memory for a given user. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (MAPREDUCE-2714) When a job is retired by the same user's another job, its jobconf file is not deleted from the log directory of the JobTracker
When a job is retired by the same user's another job, its jobconf file is not deleted from the log directory of the JobTracker --- Key: MAPREDUCE-2714 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2714 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 0.20.1 Reporter: Liyin Liang Assignee: Liyin Liang
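The missing cleanup described above amounts to deleting the job's conf copy on this second retirement path as well. A minimal sketch of such a deletion is below; the file layout, class, and method names are illustrative stand-ins, not the actual MAPREDUCE-2714 patch (which changes JobTracker.finalizeJob):

```java
import java.io.File;

// Illustrative sketch: remove a retired job's conf copy from the
// JobTracker log directory. The "<jobId>_conf.xml" layout is an
// assumption for this example, not taken from the real JobTracker.
class RetiredJobCleanup {
    static boolean deleteJobConf(File logDir, String jobId) {
        File conf = new File(logDir, jobId + "_conf.xml");
        // Treat "already gone" as success; otherwise try to delete it.
        return !conf.exists() || conf.delete();
    }
}
```

The point of the fix is simply that this call must run on every retirement path, not only in the RetireJobs thread.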
[jira] [Commented] (MAPREDUCE-2339) optimize JobInProgress.getTaskInProgress(taskid)
[ https://issues.apache.org/jira/browse/MAPREDUCE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067452#comment-13067452 ] Liyin Liang commented on MAPREDUCE-2339: Nice patch! A user submitted a job with more than 680,000 map tasks to our cluster. The JobTracker then became inefficient at processing heartbeats: many threads were blocked and lots of requests were queued. From a jstack of the JobTracker process, we found that most of the time was spent in JIP.getTaskInProgress(). This patch is a good way to improve JIP.getTaskInProgress()'s performance and it fixes our problem. > optimize JobInProgress.getTaskInProgress(taskid) > > > Key: MAPREDUCE-2339 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2339 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.20.2, 0.21.0 >Reporter: Kang Xiao > Attachments: MAPREDUCE-2339.patch, MAPREDUCE-2339.patch > > > JobInProgress.getTaskInProgress(taskid) uses a linear search to find the > TaskInProgress object by taskid. In fact, it can be replaced by a much more > efficient array-index operation.
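The array-index idea behind the patch can be sketched without the Hadoop classes as follows. The class, the string-based task ids, and the method names here are illustrative stand-ins (the real JobInProgress holds TaskInProgress[] arrays for maps and reduces, and the real task id type is org.apache.hadoop.mapreduce.TaskID), so treat this as a sketch of the technique rather than the actual fix:

```java
// Sketch: a task id already encodes whether the task is a map or a
// reduce and its position, so the lookup can index the right array
// in O(1) instead of linearly scanning every task.
class JobSketch {
    private final String[] maps;     // stand-in for TaskInProgress[] maps
    private final String[] reduces;  // stand-in for TaskInProgress[] reduces

    JobSketch(String[] maps, String[] reduces) {
        this.maps = maps;
        this.reduces = reduces;
    }

    // id uses a simplified encoding for this sketch, e.g. "m_000003"
    // or "r_000001": a type prefix followed by the task index.
    String getTaskInProgress(String id) {
        int idx = Integer.parseInt(id.substring(2));
        String[] pool = id.startsWith("m_") ? maps : reduces;
        return (idx >= 0 && idx < pool.length) ? pool[idx] : null;
    }
}
```

With 680,000 map tasks, replacing a linear scan with this direct index is exactly the kind of change that unblocks heartbeat processing.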
[jira] [Commented] (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator
[ https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049621#comment-13049621 ] Liyin Liang commented on MAPREDUCE-1904: This is a great patch. Here is part of the stack trace when a worker thread is blocked: {code} "1797055149@qtp0-98" prio=10 tid=0x002aa1a4 nid=0x333 waiting for monitor entry [0x49dc5000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:377) - waiting to lock <0xa090> (a org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:142) at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3086) {code} I wrote a job with one map, which outputs 1M of data, and 100 reduces. Each reduce spawns 10 threads that fetch data from the map side 3k times, just like the shuffle phase. When this job runs, most worker threads are blocked on AllocatorPerContext. With the LRUCache, most worker threads are instead blocked on LOG.info(), as in the following stack trace: {code} "1793911889@qtp0-101" prio=10 tid=0x002aa153 nid=0x34f2 waiting for monitor entry [0x41d45000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.log4j.Category.callAppenders(Category.java:204) - waiting to lock <0xa01be928> (a org.apache.log4j.spi.RootLogger) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:133) at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3246) {code} With LRUCache + LOG.info() disabled: this job takes 3 mins, 19 sec to run. Without LRUCache + LOG.info() enabled: this job takes just 37 sec to run. By the way, the LRUCache should use *mapId* as the key instead of *(jobId + mapId)*, because jobId is just part of mapId.
> Reducing locking contention in TaskTracker.MapOutputServlet's > LocalDirAllocator > --- > > Key: MAPREDUCE-1904 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: tasktracker >Affects Versions: 0.20.1 >Reporter: Rajesh Balamohan > Attachments: LocalDirAllocator.JPG, LocalDirAllocator_Monitor.JPG, > MAPREDUCE-1904-RC10.patch, MAPREDUCE-1904-trunk.patch, TaskTracker- yourkit > profiler output .jpg, Thread profiler output showing contention.jpg, profiler > output after applying the patch.jpg > > > While profiling the TaskTracker with the Sort benchmark, it was observed that threads > block on LocalDirAllocator.getLocalPathToRead() in order to get the index > file and temporary map output file. > As LocalDirAllocator is tied to the ServletContext, only one instance is > available per TaskTracker HTTP server. Given the jobid & mapid, > LocalDirAllocator retrieves the index file path and the temporary map output file > path. getLocalPathToRead() is internally synchronized. > Introducing an LRUCache for this lookup reduces the contention heavily > (an LRUCache with key = jobid + mapid and value = path to the file). The size of the > LRUCache can be varied based on the environment, and I observed a throughput > improvement on the order of 4-7% with the introduction of the LRUCache.
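The LRUCache described above can be sketched with java.util.LinkedHashMap's access-order mode. The class name, capacity, and String value type are illustrative, not taken from the actual MAPREDUCE-1904 patch; following the comment above, this sketch keys on the map id alone rather than jobId + mapId:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache of mapId -> local output path. accessOrder=true
// makes iteration order least-recently-used first, and overriding
// removeEldestEntry evicts the LRU entry once capacity is exceeded.
class PathCache extends LinkedHashMap<String, String> {
    private final int capacity;

    PathCache(int capacity) {
        super(16, 0.75f, true); // true = access order, i.e. LRU behavior
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > capacity; // drop the least-recently-used entry
    }
}
```

Note that LinkedHashMap is not thread-safe, so a servlet would still need to synchronize around the cache (a much shorter critical section than getLocalPathToRead()) or wrap it with Collections.synchronizedMap.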
[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6
[ https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048963#comment-13048963 ] Liyin Liang commented on MAPREDUCE-2510: We plan to build our own Jetty version based on 6.1.14, with the following patches applied to fix OOM bugs: JETTY-1157 (don't hold the array passed in write(byte[])), JETTY-861 (switch buffer pools to a ThreadLocal implementation), and JETTY-1188 (null out old jobs in QueuedThreadPool). It works well in our test cluster. > TaskTracker throw OutOfMemoryError after upgrade to jetty6 > -- > > Key: MAPREDUCE-2510 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Liyin Liang > > Our product cluster's TaskTracker sometimes throw OutOfMemoryError after > upgrade to jetty6. The exception in TT's log is as follows: > 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: Java heap space > at java.io.BufferedInputStream.(BufferedInputStream.java:178) > at > org.apache.hadoop.fs.BufferedFSInputStream.(BufferedFSInputStream.java:44) > at > org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359) > at > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) > at > 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:324) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) > Exceptions in .out file: > java.lang.OutOfMemoryError: Java heap space > Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap > space > Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap > space > java.lang.OutOfMemoryError: Java heap space > java.lang.reflect.InvocationTargetException > Exception in thread "IPC Server handler 6 on 50050" at > sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126) > at org.mortbay.log.Log.warn(Log.java:181) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:324) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) >
[jira] [Commented] (MAPREDUCE-143) OOM in the TaskTracker while serving map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045237#comment-13045237 ] Liyin Liang commented on MAPREDUCE-143: --- bq. I think we ran into the same issue, any work around or config tweak to avoid running into this? Thanks. I have created MAPREDUCE-2510 for this problem. As Chris commented, Jetty 6.1.26 does not have this behavior. However, Jetty 6.1.26 has its own bugs, MAPREDUCE-2529 and MAPREDUCE-2530, which are more serious than the OOM. > OOM in the TaskTracker while serving map outputs > > > Key: MAPREDUCE-143 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-143 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Devaraj Das > > Saw this exception in the TT logs: > 2009-02-06 06:18:08,553 ERROR org.mortbay.log: EXCEPTION > java.lang.OutOfMemoryError: GC overhead limit exceeded > 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: GC overhead limit exceeded > 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:39) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:312) > at > org.mortbay.io.nio.IndirectNIOBuffer.(IndirectNIOBuffer.java:28) > at > org.mortbay.jetty.nio.AbstractNIOConnector.newBuffer(AbstractNIOConnector.java:71) > at > org.mortbay.jetty.AbstractBuffers.getBuffer(AbstractBuffers.java:131) > at org.mortbay.jetty.HttpGenerator.addContent(HttpGenerator.java:145) > at > org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:642) > at > org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577) > at > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2879)
[jira] [Commented] (MAPREDUCE-2529) Recognize Jetty bug 1342 and handle it
[ https://issues.apache.org/jira/browse/MAPREDUCE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041381#comment-13041381 ] Liyin Liang commented on MAPREDUCE-2529: After upgrading to Jetty 6.1.26, our production cluster met the same problem. Through observation, we found that the TT throws lots of "java.io.IOException: Broken pipe" exceptions when serving map output, and in this case Jetty prints logs like the following: 2011-05-30 00:11:06,389 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@6cf3a37f Busy selector - injecting delay 3 times So we simply grep for "Busy selector" in the TT's log to detect this bug. > Recognize Jetty bug 1342 and handle it > -- > > Key: MAPREDUCE-2529 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2529 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker >Affects Versions: 0.20.204.0, 0.23.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > Attachments: jetty1342-20security.patch > > > We are seeing many instances of Jetty-1342 > (http://jira.codehaus.org/browse/JETTY-1342). The bug doesn't cause Jetty to > stop responding altogether; some fetches go through, but a lot of them throw > exceptions and eventually fail. The only way we have found to get the TT out > of this state is to restart the TT. This jira is to catch this particular > exception (or perhaps a configurable regex) and handle it in an automated way, > either blacklisting or shutting down the TT after seeing a configurable number > of them.
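The "grep for Busy selector" detection described above can be sketched as a small log scan. The class name is hypothetical and the log path is a placeholder; a real deployment would point it at the actual TaskTracker log file:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Illustrative detector for the JETTY-1342 symptom discussed above:
// count "Busy selector" lines in a TaskTracker log. A monitoring
// script could alert (or restart the TT) once the count is non-zero.
class BusySelectorCheck {
    static long countBusySelector(String logPath) throws IOException {
        long hits = 0;
        try (BufferedReader r = new BufferedReader(new FileReader(logPath))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.contains("Busy selector")) {
                    hits++; // each hit is Jetty injecting a selector delay
                }
            }
        }
        return hits;
    }
}
```

Usage (path is a placeholder): long n = BusySelectorCheck.countBusySelector("/var/log/hadoop/tasktracker.log");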
[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6
[ https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039736#comment-13039736 ] Liyin Liang commented on MAPREDUCE-2510: After upgrading our production cluster's Jetty version to 6.1.26, the checkpoint became very slow: the fsimage download that took 2 mins for a 10G image before the upgrade takes 15 mins for a 9.95G image after it. What's more, there are many "JVM BUG(s)" lines in the NN's log file: 2011-05-26 22:46:48,807 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@173ab5e JVM BUG(s) - injecting delay 59 times 2011-05-26 22:46:48,807 INFO org.mortbay.log: org.mortbay.io.nio.SelectorManager$SelectSet@173ab5e JVM BUG(s) - recreating selector 59 times, canceled keys 944 times According to Jetty 6.1.26's code, Jetty's selector sleeps for some time when it prints these logs.
[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6
[ https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039487#comment-13039487 ] Liyin Liang commented on MAPREDUCE-2510: Hi Koji, we just triggered this bug in our test cluster with Jetty 6.1.26. Could you please share your workaround?
[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6
[ https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035935#comment-13035935 ] Liyin Liang commented on MAPREDUCE-2510: Hi Chris, I really appreciate your comments! Jetty 6.1.26 did free the references to Runnable instances. We'll upgrade our cluster's Jetty version as soon as possible. Thanks again.
[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6
[ https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035185#comment-13035185 ] Liyin Liang commented on MAPREDUCE-2510: Hi Chris, HADOOP-6882 upgraded Jetty to 6.1.26, and that jira has been checked in to the 0.20 branch. But I still don't know why 6.1.26 does not have this behavior.
[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6
[ https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035183#comment-13035183 ] Liyin Liang commented on MAPREDUCE-2510: Hi Chris, our Jetty version is 6.1.14, the same as trunk. Is there an issue tracking an upgrade of Jetty to 6.1.26? Why does Jetty 6.1.26 not have this behavior? I saw that Cloudera's cdh3u0 uses Jetty 6.1.26. Thanks > TaskTracker throw OutOfMemoryError after upgrade to jetty6 > -- > > Key: MAPREDUCE-2510 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Liyin Liang > > Our product cluster's TaskTracker sometimes throw OutOfMemoryError after > upgrade to jetty6. The exception in TT's log is as follows: > 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: Java heap space > at java.io.BufferedInputStream.(BufferedInputStream.java:178) > at > org.apache.hadoop.fs.BufferedFSInputStream.(BufferedFSInputStream.java:44) > at > org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359) > at > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:324) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) > Exceptions in .out file: > java.lang.OutOfMemoryError: Java heap space > Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap > space > Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap > space > java.lang.OutOfMemoryError: Java heap space > java.lang.reflect.InvocationTargetException > Exception in thread "IPC Server handler 6 on 50050" at > sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126) > at org.mortbay.log.Log.warn(Log.java:181) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at 
org.mortbay.jetty.Server.handle(Server.java:324) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.jav
[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6
[ https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035175#comment-13035175 ] Liyin Liang commented on MAPREDUCE-2510: The following comments are copied from MAPREDUCE-143: We dumped the heap of the TaskTracker and analyzed it with MAT. We found that one instance of "org.mortbay.thread.QueuedThreadPool" occupies 853,258,184 bytes (72.51% of the heap). This object contains a "java.lang.Runnable[]" with 7200 elements. The QueuedThreadPool of jetty6 owns an array of pending jobs. If an idle thread is available, a job is dispatched directly; otherwise the job is queued in the array. Initially the size of the array is _maxThreads (tasktracker.http.threads). When it is full, the array grows to array.length() + _maxThreads. Because this growth has no limit, the array can occupy too much memory when there are many fetch requests from reduce tasks. So is this a jetty6 bug? > TaskTracker throw OutOfMemoryError after upgrade to jetty6 > -- > > Key: MAPREDUCE-2510 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Liyin Liang > > Our product cluster's TaskTracker sometimes throw OutOfMemoryError after > upgrade to jetty6. 
The exception in TT's log is as follows: > 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: Java heap space > at java.io.BufferedInputStream.(BufferedInputStream.java:178) > at > org.apache.hadoop.fs.BufferedFSInputStream.(BufferedFSInputStream.java:44) > at > org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359) > at > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:324) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) > Exceptions in .out file: > java.lang.OutOfMemoryError: Java heap 
space > Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap > space > Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap > space > java.lang.OutOfMemoryError: Java heap space > java.lang.reflect.InvocationTargetException > Exception in thread "IPC Server handler 6 on 50050" at > sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126) > at org.mortbay.log.Log.warn(Log.java:181) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.jav
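The unbounded queue growth analyzed in the comments above can be modeled in a few lines. This is an illustrative sketch, not Jetty's source: the class name is hypothetical, and only the growth rule reported in the analysis is reproduced (capacity starts at _maxThreads and grows by another _maxThreads each time the array fills, with no upper bound).

```java
import java.util.Arrays;

// Hypothetical model of jetty 6's QueuedThreadPool pending-job array,
// following only the growth rule described in the JIRA comments.
public class UnboundedJobQueueSketch {
    private Runnable[] jobs;
    private int size;
    private final int maxThreads;

    public UnboundedJobQueueSketch(int maxThreads) {
        this.maxThreads = maxThreads;
        this.jobs = new Runnable[maxThreads]; // initial capacity == _maxThreads
    }

    /** Queue a job; when the array is full, grow by another _maxThreads slots. */
    public void dispatch(Runnable job) {
        if (size == jobs.length) {
            // No cap on growth: during a shuffle fetch storm the array keeps
            // expanding until the heap is exhausted, matching the OOM and the
            // 7200-element Runnable[] seen in the MAT dump.
            jobs = Arrays.copyOf(jobs, jobs.length + maxThreads);
        }
        jobs[size++] = job;
    }

    public int size() { return size; }
    public int capacity() { return jobs.length; }
}
```

With tasktracker.http.threads = 40 and 7200 queued fetch requests (the numbers from the heap dump), the array in this model grows in steps of 40 up to 7200 slots, and nothing ever shrinks or bounds it.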
[jira] [Created] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6
TaskTracker throw OutOfMemoryError after upgrade to jetty6 -- Key: MAPREDUCE-2510 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Liyin Liang Our product cluster's TaskTracker sometimes throw OutOfMemoryError after upgrade to jetty6. The exception in TT's log is as follows: 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput java.lang.OutOfMemoryError: Java heap space at java.io.BufferedInputStream.(BufferedInputStream.java:178) at org.apache.hadoop.fs.BufferedFSInputStream.(BufferedFSInputStream.java:44) at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359) at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) Exceptions in .out file: java.lang.OutOfMemoryError: Java heap space Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap space Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap space java.lang.OutOfMemoryError: Java heap space java.lang.reflect.InvocationTargetException Exception in thread "IPC Server handler 6 on 50050" at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126) at org.mortbay.log.Log.warn(Log.java:181) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) -- This message is automatically 
generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-143) OOM in the TaskTracker while serving map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032367#comment-13032367 ] Liyin Liang commented on MAPREDUCE-143: --- The QueuedThreadPool of jetty6 owns an array of pending jobs. If an idle thread is available, a job is dispatched directly; otherwise the job is queued in the array. Initially the size of the array is _maxThreads (40). When it is full, the array grows to array.length() + _maxThreads. Because this growth has no limit, the array can occupy too much memory when there are many fetch requests from reduce tasks. So is this a jetty6 bug? > OOM in the TaskTracker while serving map outputs > > > Key: MAPREDUCE-143 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-143 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Devaraj Das > > Saw this exception in the TT logs: > 2009-02-06 06:18:08,553 ERROR org.mortbay.log: EXCEPTION > java.lang.OutOfMemoryError: GC overhead limit exceeded > 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: GC overhead limit exceeded > 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:39) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:312) > at > org.mortbay.io.nio.IndirectNIOBuffer.(IndirectNIOBuffer.java:28) > at > org.mortbay.jetty.nio.AbstractNIOConnector.newBuffer(AbstractNIOConnector.java:71) > at > org.mortbay.jetty.AbstractBuffers.getBuffer(AbstractBuffers.java:131) > at org.mortbay.jetty.HttpGenerator.addContent(HttpGenerator.java:145) > at > org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:642) > at > org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577) > at > org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2879) -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-143) OOM in the TaskTracker while serving map outputs
[ https://issues.apache.org/jira/browse/MAPREDUCE-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032254#comment-13032254 ] Liyin Liang commented on MAPREDUCE-143: --- Our cluster hit a similar problem after upgrading to jetty6. Related log: 2011-05-11 16:24:26,914 ERROR org.mortbay.log: Error for /mapOutput java.lang.OutOfMemoryError: Java heap space at java.io.BufferedInputStream.(BufferedInputStream.java:178) at org.apache.hadoop.fs.BufferedFSInputStream.(BufferedFSInputStream.java:44) at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359) at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:324) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403) at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522) 2011-05-11 17:31:39,376 ERROR org.mortbay.log: Error for /mapOutput 2011-05-11 17:31:45,523 ERROR org.mortbay.log: Error for /mapOutput java.lang.OutOfMemoryError: Java heap space jmap -heap result: Heap Configuration: MinHeapFreeRatio = 40 MaxHeapFreeRatio = 70 MaxHeapSize = 1610612736 (1536.0MB) NewSize = 1310720 (1.25MB) MaxNewSize = 17592186044415 MB OldSize = 5439488 (5.1875MB) NewRatio = 2 SurvivorRatio= 8 PermSize = 21757952 (20.75MB) MaxPermSize = 85983232 (82.0MB) Heap Usage: PS Young Generation Eden Space: capacity = 61865984 (59.0MB) used = 61865984 (59.0MB) free = 0 (0.0MB) 100.0% used From Space: capacity = 178913280 (170.625MB) used = 11205368 (10.686271667480469MB) free = 167707912 (159.93872833251953MB) 6.263016361893315% used To Space: capacity = 178913280 (170.625MB) used = 0 (0.0MB) free = 178913280 (170.625MB) 0.0% used PS Old Generation capacity = 1073741824 (1024.0MB) used = 1073710024 (1023.9696731567383MB) free = 31800 (0.03032684326171875MB) 99.99703839421272% used PS Perm Generation capacity = 21757952 (20.75MB) used = 17614112 (16.798126220703125MB) free = 4143840 (3.951873779296875MB) 80.95482516001506% used We dumped the heap of the TaskTracker and analyzed it with MAT. We found that one instance of "org.mortbay.thread.QueuedThreadPool" occupies 853,258,184 bytes (72.51% of the heap). This object contains a "java.lang.Runnable[]" with 7200 elements. 
> OOM in the TaskTracker while serving map outputs > > > Key: MAPREDUCE-143 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-143 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Devaraj Das > > Saw this exception in the TT logs: > 2009-02-06 06:18:08,553 ERROR org.mortbay.log: EXCEPTION > java.lang.OutOfMemoryError: GC overhead limit exceeded > 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: GC overhead limit exceeded > 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput > java.lang.OutOfMemoryError: Java heap space > at java.nio.HeapByteBuffer.(HeapByteBuffer.java:39) > at java.nio.ByteBuffer.allocate(ByteBuffer.java:312) > at > org.mortbay.io.nio.IndirectNIOBuffer.(IndirectNIOBuffer.java:28) > at > org.mortbay.jetty.nio.AbstractNIOConnector.newBuffer(AbstractNIOConnector.java:
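One standard way to prevent this class of OOM is to bound the work queue and apply back-pressure to submitters. The sketch below uses the JDK's java.util.concurrent.ThreadPoolExecutor; it is a general mitigation sketch, not the fix that Jetty or Hadoop actually shipped.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPoolSketch {
    /**
     * Build a pool whose pending-job queue can never exceed queueCapacity.
     * Once the queue is full, CallerRunsPolicy makes the submitting thread
     * execute the job itself, slowing producers down instead of letting the
     * backlog grow until the heap is exhausted.
     */
    public static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

The design trade-off is that back-pressure surfaces as slower submissions (here, slower accept/dispatch of fetch requests) rather than as an unbounded heap, which is usually the safer failure mode for a long-lived daemon like the TaskTracker.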
[jira] Commented: (MAPREDUCE-2271) TestSetupTaskScheduling failing in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987020#action_12987020 ] Liyin Liang commented on MAPREDUCE-2271: Hi Todd, I think the test case testNumSlotsUsedForTaskCleanup is supposed to check that one task-cleanup task needs only one slot, even for high-RAM jobs. This test case creates a fake high-RAM job with one map task and one reduce task, each requiring 2 slots, and then checks that each heartbeat schedules one task-cleanup task that needs only one slot. So it needn't create a dummy tracker status with FAILED_UNCLEAN tasks. The result of the change in MAPREDUCE-2207 is that task-cleanup tasks can't be scheduled on trackers that have FAILED_UNCLEAN tasks to report during heartbeat, no matter which tracker the task failed on. As a result, no task-cleanup task is scheduled during heartbeat in this test case. The following code: {code} List tasks = jobTracker.getSetupAndCleanupTasks(ttStatus); {code} will always return *null* as long as ttStatus has tasks with FAILED_UNCLEAN status. > TestSetupTaskScheduling failing in trunk > > > Key: MAPREDUCE-2271 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2271 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.23.0 >Reporter: Todd Lipcon >Assignee: Liyin Liang >Priority: Blocker > Attachments: 2271-1.diff > > > This test case is failing in trunk after the commit of MAPREDUCE-2207
[jira] Updated: (MAPREDUCE-2271) TestSetupTaskScheduling failing in trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2271: --- Attachment: 2271-1.diff With [MAPREDUCE-2207|https://issues.apache.org/jira/browse/MAPREDUCE-2207], a tracker can't get any task-cleanup task if it has tasks in the _FAILED_UNCLEAN_ state. The _testNumSlotsUsedForTaskCleanup_ test in _TestSetupTaskScheduling_ creates a dummy tracker status with two _FAILED_UNCLEAN_ tasks to report, so the jobtracker returns null when _getSetupAndCleanupTasks_ is called with this tracker status. I think it's useless to add task statuses to the tracker status in that test case, because the job already has two task-setup tasks to schedule and the job's two tasks are already in the _FAILED_UNCLEAN_ state. In other words, the job's task statuses need not be updated. So we can just remove the _addNewTaskStatus_ code, as in 2271-1.diff. > TestSetupTaskScheduling failing in trunk > > > Key: MAPREDUCE-2271 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2271 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.23.0 >Reporter: Todd Lipcon >Priority: Blocker > Attachments: 2271-1.diff > > > This test case is failing in trunk after the commit of MAPREDUCE-2207
[jira] Commented: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983984#action_12983984 ] Liyin Liang commented on MAPREDUCE-2207: Hi Todd, The committed patch is exactly the same one I ran the tests against. Maybe I made some mistake when running "ant test". I'll work on [MAPREDUCE-2271|https://issues.apache.org/jira/browse/MAPREDUCE-2271] to fix TestSetupTaskScheduling. By the way, I don't understand why the result of "ant test-patch" was +1. > Task-cleanup task should not be scheduled on the node that the task just > failed > --- > > Key: MAPREDUCE-2207 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.23.0 >Reporter: Scott Chen >Assignee: Liyin Liang > Fix For: 0.23.0 > > Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff, 2207-3.diff, > 2207-3.diff, ant-test.txt > > > Currently the task-cleanup task always go to the same node that the task just > failed. > There is a higher chance that it hits a bad node. This should be changed.
[jira] Commented: (MAPREDUCE-2264) Job status exceeds 100% in some cases
[ https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981623#action_12981623 ] Liyin Liang commented on MAPREDUCE-2264: I think [HADOOP-5210|https://issues.apache.org/jira/browse/HADOOP-5210] has fixed this bug. > Job status exceeds 100% in some cases > -- > > Key: MAPREDUCE-2264 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Reporter: Adam Kramer > > I'm looking now at my jobtracker's list of running reduce tasks. One of them > is 120.05% complete, the other is 107.28% complete. > I understand that these numbers are estimates, but there is no case in which > an estimate of 100% for a non-complete task is better than an estimate of > 99.99%, nor is there any case in which an estimate greater than 100% is valid. > I suggest that whatever logic is computing these set 99.99% as a hard maximum. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
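The hard maximum proposed in the report could be a one-line clamp applied wherever the estimate is rendered. This is a hypothetical helper, not the JobTracker's actual progress code; 0.9999 is the 99.99% ceiling suggested above.

```java
public class ProgressClamp {
    /**
     * Clamp a progress estimate so an incomplete task never displays as
     * 100% (or more); a genuinely complete task still shows exactly 100%.
     */
    public static float clamp(float estimate, boolean complete) {
        if (complete) {
            return 1.0f;
        }
        return Math.min(estimate, 0.9999f); // hard maximum for estimates
    }
}
```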
[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2207: --- Attachment: ant-test.txt Hi Scott, The ant-test.txt file is the result of "ant test". The result of "ant test-patch" is as follows: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. [exec] [exec] +1 system test framework. The patch passed system test framework compile. > Task-cleanup task should not be scheduled on the node that the task just > failed > --- > > Key: MAPREDUCE-2207 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.23.0 >Reporter: Scott Chen > Fix For: 0.23.0 > > Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff, 2207-3.diff, > 2207-3.diff, ant-test.txt > > > Currently the task-cleanup task always go to the same node that the task just > failed. > There is a higher chance that it hits a bad node. This should be changed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2207: --- Attachment: 2207-3.diff > Task-cleanup task should not be scheduled on the node that the task just > failed > --- > > Key: MAPREDUCE-2207 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.23.0 >Reporter: Scott Chen > Fix For: 0.23.0 > > Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff, 2207-3.diff, > 2207-3.diff > > > Currently the task-cleanup task always go to the same node that the task just > failed. > There is a higher chance that it hits a bad node. This should be changed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2207: --- Attachment: 2207-3.diff Hi Scott, I agree with you. According to [MAPREDUCE-2118|https://issues.apache.org/jira/browse/MAPREDUCE-2118], maybe getJobSetupAndCleanupTasks will not hold JT lock in the future. I have changed the name of the method hasFailedAndNeedCleanupTask() to hasFailedUncleanTask(). Thanks for your advice. > Task-cleanup task should not be scheduled on the node that the task just > failed > --- > > Key: MAPREDUCE-2207 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.23.0 >Reporter: Scott Chen > Fix For: 0.23.0 > > Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff, 2207-3.diff > > > Currently the task-cleanup task always go to the same node that the task just > failed. > There is a higher chance that it hits a bad node. This should be changed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2207: --- Attachment: 2207-2.diff move the logic to server side according to Scott's comment. > Task-cleanup task should not be scheduled on the node that the task just > failed > --- > > Key: MAPREDUCE-2207 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.23.0 >Reporter: Scott Chen > Fix For: 0.23.0 > > Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff > > > Currently the task-cleanup task always go to the same node that the task just > failed. > There is a higher chance that it hits a bad node. This should be changed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975403#action_12975403 ] Liyin Liang commented on MAPREDUCE-2207: Hi Scott, If we move this logic to the server side, every heartbeat has to call hasFailedAndNeedCleanupTaskToReport() inside the JobTracker lock. Would there be a performance loss for the JT? > Task-cleanup task should not be scheduled on the node that the task just > failed > --- > > Key: MAPREDUCE-2207 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.23.0 >Reporter: Scott Chen > Fix For: 0.23.0 > > Attachments: 0.19.1.diff, 2207-1.diff > > > Currently the task-cleanup task always go to the same node that the task just > failed. > There is a higher chance that it hits a bad node. This should be changed.
[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2207: --- Attachment: 2207-1.diff > Task-cleanup task should not be scheduled on the node that the task just > failed > --- > > Key: MAPREDUCE-2207 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: jobtracker >Affects Versions: 0.23.0 >Reporter: Scott Chen > Fix For: 0.23.0 > > Attachments: 0.19.1.diff, 2207-1.diff > > > Currently the task-cleanup task always go to the same node that the task just > failed. > There is a higher chance that it hits a bad node. This should be changed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2207: --- Release Note: Task-cleanup task should not be scheduled on the node that the task just failed Status: Patch Available (was: Open) Patch with unit test for trunk. The patch just adds an _assert_ in TestTaskFail.java to test the feature.
[jira] Commented: (MAPREDUCE-2026) JobTracker.getJobCounters() should not hold JobTracker lock while calling JobInProgress.getCounters()
[ https://issues.apache.org/jira/browse/MAPREDUCE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972349#action_12972349 ] Liyin Liang commented on MAPREDUCE-2026: Hi Joydeep, your patch moves incrementTaskCounters outside the JobInProgress lock in getCounters(). Should we do the same in getMapCounters() and getReduceCounters()? > JobTracker.getJobCounters() should not hold JobTracker lock while calling > JobInProgress.getCounters() > - > > Key: MAPREDUCE-2026 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2026 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Scott Chen >Assignee: Joydeep Sen Sarma > Fix For: 0.22.0 > > Attachments: 2026.1.patch, MAPREDUCE-2026.txt > > > JobTracker.getJobCounters() locks the JobTracker and calls > JobInProgress.getCounters(). > JobInProgress.getCounters() can be very expensive because it aggregates all > the task counters. > JobTracker jstacks show that this method is one of the > bottlenecks of JobTracker performance. > JobInProgress.getCounters() should be callable outside the > JobTracker lock because it already takes the JobInProgress lock. > For example, it is used by jobdetails.jsp without a JobTracker lock.
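The lock-narrowing idea discussed in this issue can be sketched as follows (hypothetical stand-in classes, not the real JobTracker/JobInProgress API): the tracker-wide lock is held only for the job lookup, while the expensive counter aggregation runs under the job's own lock, so heartbeats are not blocked behind it.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: "CounterSketch" stands in for the JobTracker and
// "Job" for JobInProgress; names are made up for this example.
class CounterSketch {
    static class Job {
        private long total;
        synchronized void add(long n) { total += n; }
        // Expensive per-task aggregation would run here, protected
        // only by the Job's own lock.
        synchronized long getCounters() { return total; }
    }

    private final Map<String, Job> jobs = new HashMap<>();

    synchronized void addJob(String id) { jobs.put(id, new Job()); }

    // Tracker-wide lock covers only the map lookup.
    synchronized Job lookup(String id) { return jobs.get(id); }

    // The costly getCounters() call happens after the tracker lock
    // has been released.
    long getJobCounters(String id) {
        Job job = lookup(id);
        return job == null ? 0 : job.getCounters();
    }

    public static void main(String[] args) {
        CounterSketch jt = new CounterSketch();
        jt.addJob("job_1");
        jt.lookup("job_1").add(5);
        System.out.println(jt.getJobCounters("job_1")); // prints 5
    }
}
```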
[jira] Commented: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971126#action_12971126 ] Liyin Liang commented on MAPREDUCE-2207: Hi Scott, I'm happy to work on this JIRA and provide a patch with unit test for trunk.
[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2207: --- Attachment: 0.19.1.diff For hadoop 0.19.1
[jira] Commented: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed
[ https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970709#action_12970709 ] Liyin Liang commented on MAPREDUCE-2207: Hi Scott, our production cluster hit a similar problem with the job setup task. Our fix: the TT does not ask for a new task when it reports a failed job setup/cleanup task in its heartbeat. I'll attach our patch, which is based on 0.19.1.
[jira] Updated: (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liyin Liang updated MAPREDUCE-2209: --- Description: If a job's jar file is very large, e.g. 200 MB+, the TaskTracker's heartbeat hangs for several minutes while the job is being localized. The jstacks of the related threads are as follows:
{code:borderStyle=solid}
"TaskLauncher for task" daemon prio=10 tid=0x002b05ee5000 nid=0x1adf runnable [0x42e56000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x002afc892ec8> (a sun.nio.ch.Util$1)
	- locked <0x002afc892eb0> (a java.util.Collections$UnmodifiableSet)
	- locked <0x002afc8927d8> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	- locked <0x002afce26158> (a java.io.BufferedInputStream)
	at java.io.DataInputStream.readShort(DataInputStream.java:295)
	at org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556)
	- locked <0x002afce26218> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
	- locked <0x002afce26218> (a org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
	at java.io.DataInputStream.read(DataInputStream.java:83)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214)
	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824)
	- locked <0x002afce2d260> (a org.apache.hadoop.mapred.TaskTracker$RunningJob)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745)
	at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103)
	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710)

"Map-events fetcher for all reduce tasks on tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 tid=0x002b05ef8000 nid=0x1ada waiting for monitor entry [0x42d55000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582)
	- waiting to lock <0x002afce2d260> (a org.apache.hadoop.mapred.TaskTracker$RunningJob)
	at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617)
	- locked <0x002a9eefe1f8> (a java.util.TreeMap)

"IPC Server handler 2 on 50050" daemon prio=10 tid=0x002b050eb000 nid=0x1ab0 waiting for monitor entry [0x4234b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:2684)
	- waiting to lock <0x002a9eefe1f8> (a java.util.TreeMap)
	- locked <0x002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker)
	at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

"main" prio=10 tid=0x40113800 nid=0x197d waiting for monitor entry [0x4022a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1196)
	- waiting to lock <0x002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker)
	at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1068)
	at org.apache.hadoop.
[jira] Commented: (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS
[ https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969238#action_12969238 ] Liyin Liang commented on MAPREDUCE-2209: I set up a cluster with the latest version, 0.21.0. To simulate the large-job.jar problem, I let the TaskLauncher thread sleep for 100 seconds just before it downloads job.jar in the localizeJobJarFile function. The heartbeat of some TTs then hangs for almost 100 seconds. Basically, the jstack is the same as on 0.19:
{code:borderStyle=solid}
"TaskLauncher for MAP tasks" daemon prio=10 tid=0x2aab3145a800 nid=0x3fe8 waiting on condition [0x440b3000..0x440b3a10]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.mapred.TaskTracker.localizeJobJarFile(TaskTracker.java:1150)
	at org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1074)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
	- locked <0x2aaab3a86f10> (a org.apache.hadoop.mapred.TaskTracker$RunningJob)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2248)
	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2213)

"Map-events fetcher for all reduce tasks on tracker_hd2:localhost.localdomain/127.0.0.1:36128" daemon prio=10 tid=0x2aab31451c00 nid=0x3fde waiting for monitor entry [0x41a4..0x41a40d90]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:800)
	- waiting to lock <0x2aaab3a86f10> (a org.apache.hadoop.mapred.TaskTracker$RunningJob)
	at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:834)
	- locked <0x2aaab38ee1b8> (a java.util.TreeMap)

"IPC Server handler 0 on 36128" daemon prio=10 tid=0x4368ac00 nid=0x3fc8 waiting for monitor entry [0x425f6000..0x425f7c90]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:3254)
	- waiting to lock <0x2aaab38ee1b8> (a java.util.TreeMap)
	- locked <0x2aaab37f1708> (a org.apache.hadoop.mapred.TaskTracker)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)

"main" prio=10 tid=0x42fff400 nid=0x3f91 waiting for monitor entry [0x41ef..0x41ef0ed0]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1535)
	- waiting to lock <0x2aaab37f1708> (a org.apache.hadoop.mapred.TaskTracker)
	at org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1433)
	at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2330)
	at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3462)
{code}
Lock order of the related threads:
TaskLauncher (localizeJobJarFile): locked RunningJob
Map-events fetcher: locked runningJobs, waiting to lock RunningJob
IPC Server handler (getMapCompletionEvents): locked TaskTracker, waiting to lock runningJobs
main (transmitHeartBeat): waiting to lock TaskTracker
So the TaskTracker is locked indirectly while job.jar is being downloaded.
> TaskTracker's heartbeat hang for several minutes when copying large job.jar > from HDFS > - > > Key: MAPREDUCE-2209 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209 > Project: Hadoop Map/Reduce > Issue Type: Bug > Environment: hadoop version: 0.19.1 >Reporter: Liyin Liang >Priority: Blocker > > If a job's jar file is very large, e.g 200m+, the TaskTracker's heartbeat > hang for several minutes when localizing the job. The jstack
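One way to break the indirect lock chain described in this issue can be sketched under assumed names (runningJobLock stands in for the RunningJob monitor; this is a sketch of the general pattern, not the actual Hadoop patch): do the slow copy with no lock held, and take the lock only for the cheap final publish step, so the fetcher-to-heartbeat lock chain can no longer stall behind the download.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch: download to a temporary path lock-free, then only
// the atomic move/publish runs inside the synchronized section.
class LocalizerSketch {
    private final Object runningJobLock = new Object(); // stand-in lock

    Path localize(Path remoteJar, Path localJar) throws IOException {
        // Slow part: copy with no lock held; heartbeat-related threads
        // that contend on runningJobLock stay unblocked.
        Path tmp = Files.createTempFile("job", ".jar.tmp");
        Files.copy(remoteJar, tmp, StandardCopyOption.REPLACE_EXISTING);

        // Fast part: only the publish happens under the lock.
        synchronized (runningJobLock) {
            return Files.move(tmp, localJar, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("remote", ".jar");
        Files.write(src, new byte[]{1, 2, 3});
        Path dst = Files.createTempFile("local", ".jar");
        new LocalizerSketch().localize(src, dst);
        System.out.println(Files.size(dst)); // prints 3
    }
}
```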
[jira] Created: (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS
TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS - Key: MAPREDUCE-2209 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209 Project: Hadoop Map/Reduce Issue Type: Bug Environment: hadoop version: 0.19.1 Reporter: Liyin Liang Priority: Blocker If a job's jar file is very large, e.g 200m+, the TaskTracker's heartbeat hang for several minutes when localizing the job.
[jira] Created: (MAPREDUCE-2168) We should implement limits on shuffle connections to TaskTracker per job
We should implement limits on shuffle connections to TaskTracker per job - Key: MAPREDUCE-2168 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2168 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Liyin Liang Because trailing map tasks are hit by all reduces simultaneously, all the worker threads of a TaskTracker's HTTP server may be occupied by one job's reduce tasks fetching map outputs. That TaskTracker's iowait and load then become very high (100+ in our cluster, with tasktracker.http.threads set to 100). Worse, other jobs' reduces have to wait, possibly several minutes, to connect to the TaskTracker and fetch their maps' outputs. So I think we should implement limits on shuffle connections: 1. limit the number (or percentage) of worker threads occupied by the same job's reduces; 2. limit the number of worker threads serving the same map output simultaneously. Thoughts? P.S. we are using Hadoop 0.19.
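Proposal (1) above could be sketched with a per-job semaphore (illustrative class and method names, not an actual Hadoop API): a reduce fetch that cannot get a permit is turned away, e.g. with HTTP 503, and retried later, so one job cannot monopolize all the HTTP worker threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of a per-job shuffle-connection limit.
class ShuffleLimiter {
    private final int perJobLimit;
    private final Map<String, Semaphore> permits = new ConcurrentHashMap<>();

    ShuffleLimiter(int perJobLimit) {
        this.perJobLimit = perJobLimit;
    }

    // Called when a reduce opens a fetch; false means "reject and retry".
    boolean tryBegin(String jobId) {
        return permits
            .computeIfAbsent(jobId, id -> new Semaphore(perJobLimit))
            .tryAcquire();
    }

    // Called when the fetch completes, releasing the worker thread.
    void end(String jobId) {
        Semaphore s = permits.get(jobId);
        if (s != null) s.release();
    }

    public static void main(String[] args) {
        ShuffleLimiter limiter = new ShuffleLimiter(1);
        System.out.println(limiter.tryBegin("job_1")); // prints true
        System.out.println(limiter.tryBegin("job_1")); // prints false: limit reached
        limiter.end("job_1");
        System.out.println(limiter.tryBegin("job_1")); // prints true
    }
}
```

A percentage-based cap (proposal 1's variant) would size each semaphore as a fraction of the thread pool instead of a fixed count.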
[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
[ https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920079#action_12920079 ] Liyin Liang commented on MAPREDUCE-1943: Your latest patch is based on your previous patch. Why? > Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes > > > Key: MAPREDUCE-1943 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Mahadev konar >Assignee: Mahadev konar > Fix For: 0.22.0 > > Attachments: MAPREDUCE-1943-0.20-yahoo.patch, > MAPREDUCE-1943-0.20-yahoo.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, > MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, > MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, > MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, > MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, > MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, > MAPREDUCE-1943-yahoo-hadoop-0.20S.patch > > > We have come across issues in production clusters wherein users abuse > counters, statusreport messages and split sizes. One such case was when one > of the users had 100 million counters. This leads to the jobtracker going out of > memory and becoming unresponsive. In this jira I am proposing to put sane limits > on the status report length, the number of counters and the size of block > locations returned by the input split.
[jira] Commented: (MAPREDUCE-1533) Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
[ https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904607#action_12904607 ] Liyin Liang commented on MAPREDUCE-1533: The patch has a small problem. I think lines 257-259 of Counters.java should be:
{code}
for (String subcounter : subcountersArray) {
  builder.append(subcounter);
}
{code}
instead of:
{code}
for (Counter counter : subcounters.values()) {
  builder.append(counter.makeEscapedCompactString());
}
{code}
> Reduce or remove usage of String.format() usage in > CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString() > -- > > Key: MAPREDUCE-1533 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobtracker >Affects Versions: 0.20.1 >Reporter: Rajesh Balamohan >Assignee: Dick King > Fix For: 0.22.0 > > Attachments: mapreduce-1533--2010-05-10a.patch, > mapreduce-1533--2010-05-21.patch, mapreduce-1533--2010-05-21a.patch, > mapreduce-1533--2010-05-24.patch, MAPREDUCE-1533-and-others-20100413.1.txt, > MAPREDUCE-1533-and-others-20100413.bugfix.txt, mapreduce-1533-v1.4.patch, > mapreduce-1533-v1.8.patch > > > When short jobs are executed in hadoop with OutOfBandHeardBeat=true, the JT > executes the heartBeat() method heavily. This internally makes a call to > CapacityTaskScheduler.updateQSIObjects(). > CapacityTaskScheduler.updateQSIObjects() internally calls String.format() > for setting the job scheduling information. Based on the datastructure size > of "jobQueuesManager" and "queueInfoMap", the number of times String.format() > gets executed becomes very high. String.format() internally does pattern > matching, which turns out to be very heavy. (This was revealed while profiling the > JT. Almost 57% of time was spent in CapacityScheduler.assignTasks(), out of > which String.format() took 46%.) > Would it be possible to do String.format() only at the time of invoking > JobInProgress.getSchedulingInfo? This might reduce the pressure on the JT while > processing heartbeats.
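The correction suggested in the comment can be shown in isolation (stand-in class; subcountersArray is assumed to already hold the escaped compact strings, as in the patch under review): the builder should append the cached strings rather than re-rendering each counter.

```java
// Illustrative sketch of the suggested Counters.java loop: join strings
// that were already escaped once, instead of calling
// makeEscapedCompactString() again on every counter.
class CountersSketch {
    static String join(String[] subcountersArray) {
        StringBuilder builder = new StringBuilder();
        for (String subcounter : subcountersArray) {
            builder.append(subcounter); // cached escaped form, no rework
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[]{"{a:1}", "{b:2}"})); // prints {a:1}{b:2}
    }
}
```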
[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker
[ https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904091#action_12904091 ] Liyin Liang commented on MAPREDUCE-1247: Hi Guanyin, our production cluster met the same problem. Would you please attach your patch file? Thanks. > Send out-of-band heartbeat to avoid fake lost tasktracker > - > > Key: MAPREDUCE-1247 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-1247 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: ZhuGuanyin >Assignee: ZhuGuanyin > > Currently the TaskTracker reports task status to the JobTracker through heartbeats. > Sometimes the tasktracker locks itself to do some cleanup work, > like removing task temp data on disk, and the heartbeat thread hangs for a > long time waiting for the lock. The jobtracker then thinks the tracker has been lost and > reschedules all of its finished maps and unfinished reduces on > other tasktrackers. We call this a "fake lost tasktracker", and sometimes it is not > acceptable, especially when we run some large jobs. So we introduce an > out-of-band heartbeat mechanism to send a heartbeat immediately in that case.
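The out-of-band idea described in this issue can be sketched like this (illustrative names, not the MAPREDUCE-1247 patch): the heartbeat loop normally waits out its interval on a private monitor, but a task-state change may notify it to send immediately instead of waiting, so the JobTracker keeps hearing from the tracker even while another thread is stuck in a slow cleanup.

```java
// Hypothetical sketch: "HeartbeatSketch" stands in for the TaskTracker's
// heartbeat loop; the real patch wires this into transmitHeartBeat().
class HeartbeatSketch {
    private final Object wake = new Object(); // private monitor, not the TT lock
    private int sent;

    // Called by task-state transitions to trigger an out-of-band heartbeat.
    void signalOutOfBand() {
        synchronized (wake) {
            wake.notify();
        }
    }

    // One iteration of the heartbeat loop: wait at most the interval,
    // returning early if signalled, then send.
    void heartbeatOnce(long intervalMillis) throws InterruptedException {
        synchronized (wake) {
            wake.wait(intervalMillis);
        }
        sent++; // transmitHeartBeat() would go here
    }

    int sentCount() { return sent; }

    public static void main(String[] args) throws InterruptedException {
        HeartbeatSketch hb = new HeartbeatSketch();
        hb.heartbeatOnce(10);
        System.out.println(hb.sentCount()); // prints 1
    }
}
```

Because the wait uses its own monitor rather than the TaskTracker lock, a cleanup thread holding the TaskTracker lock cannot delay the wake-up itself; contention only reappears at the actual send.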