[jira] [Commented] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice

2014-04-13 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968014#comment-13968014
 ] 

Liyin Liang commented on MAPREDUCE-5775:


The patch looks good to me. +1

> SleepJob.createJob setNumReduceTasks twice
> --
>
> Key: MAPREDUCE-5775
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Liyin Liang
>Assignee: jhanver chand sharma
>Priority: Minor
> Attachments: MAPREDUCE-5775.patch
>
>
> createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) 
> twice, which is unnecessary.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice

2014-04-13 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5775:
---

Attachment: (was: MAPREDUCE-5775.diff)

> SleepJob.createJob setNumReduceTasks twice
> --
>
> Key: MAPREDUCE-5775
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Liyin Liang
>Assignee: jhanver chand sharma
>Priority: Minor
> Attachments: MAPREDUCE-5775.patch
>
>
> createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) 
> twice, which is unnecessary.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV

2014-03-26 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang reassigned MAPREDUCE-5799:
--

Assignee: Liyin Liang

> add default value of MR_AM_ADMIN_USER_ENV
> -
>
> Key: MAPREDUCE-5799
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
>Priority: Minor
> Attachments: MAPREDUCE-5799.diff
>
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] 
> org.apache.hadoop.mapred.LocalContainerLauncher: Error running local 
> (uberized) 'child' : java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
> at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
> Method)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
> at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in 
> TaskAttemptImpl.createCommonContainerLaunchContext(), 
> DEFAULT_MAPRED_ADMIN_USER_ENV, which is 
> "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. 
> However, when creating a ContainerLaunchContext for the MR AppMaster in 
> YARNRunner.createApplicationSubmissionContext(), no default environment is 
> set. So an uber-mode job fails to find the native libraries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV

2014-03-18 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5799:
---

Status: Patch Available  (was: Open)

> add default value of MR_AM_ADMIN_USER_ENV
> -
>
> Key: MAPREDUCE-5799
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Liyin Liang
>Priority: Minor
> Attachments: MAPREDUCE-5799.diff
>
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] 
> org.apache.hadoop.mapred.LocalContainerLauncher: Error running local 
> (uberized) 'child' : java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
> at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
> Method)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
> at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in 
> TaskAttemptImpl.createCommonContainerLaunchContext(), 
> DEFAULT_MAPRED_ADMIN_USER_ENV, which is 
> "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. 
> However, when creating a ContainerLaunchContext for the MR AppMaster in 
> YARNRunner.createApplicationSubmissionContext(), no default environment is 
> set. So an uber-mode job fails to find the native libraries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV

2014-03-18 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5799:
---

Attachment: MAPREDUCE-5799.diff

Although we can add
{code}
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native</value>
</property>
{code}
to the client's yarn-site.xml to make this job pass, it's better to give 
MR_AM_ADMIN_USER_ENV a default value so the problem is avoided in the first 
place.
Attaching a patch that adds DEFAULT_MR_AM_ADMIN_USER_ENV.
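For illustration, a minimal sketch of the idea (the constant's placement and 
the fallback call are assumptions for this sketch, not necessarily the exact 
patch):
{code}
// In MRJobConfig: give the AM the same admin environment default that tasks
// already get, so an uberized AM can load libhadoop/libsnappy.
public static final String DEFAULT_MR_AM_ADMIN_USER_ENV =
    "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native";

// In YARNRunner.createApplicationSubmissionContext(): fall back to the
// default when yarn.app.mapreduce.am.admin.user.env is not configured.
MRApps.setEnvFromInputString(environment,
    conf.get(MRJobConfig.MR_AM_ADMIN_USER_ENV,
        MRJobConfig.DEFAULT_MR_AM_ADMIN_USER_ENV));
{code}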

> add default value of MR_AM_ADMIN_USER_ENV
> -
>
> Key: MAPREDUCE-5799
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Liyin Liang
>Priority: Minor
> Attachments: MAPREDUCE-5799.diff
>
>
> Submit a 1 map + 1 reduce sleep job with the following config:
> {code}
> <property>
>   <name>mapreduce.map.output.compress</name>
>   <value>true</value>
> </property>
> <property>
>   <name>mapreduce.map.output.compress.codec</name>
>   <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
>   <name>mapreduce.job.ubertask.enable</name>
>   <value>true</value>
> </property>
> {code}
> And the LinuxContainerExecutor is enabled on the NodeManager.
> This job will fail with the following error:
> {code}
> 2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] 
> org.apache.hadoop.mapred.LocalContainerLauncher: Error running local 
> (uberized) 'child' : java.lang.UnsatisfiedLinkError: 
> org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
> at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
> Method)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
> at 
> org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
> at 
> org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
> at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
> at 
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
> at 
> org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
> at java.lang.Thread.run(Thread.java:662)
> {code}
> When creating a ContainerLaunchContext for a task in 
> TaskAttemptImpl.createCommonContainerLaunchContext(), 
> DEFAULT_MAPRED_ADMIN_USER_ENV, which is 
> "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. 
> However, when creating a ContainerLaunchContext for the MR AppMaster in 
> YARNRunner.createApplicationSubmissionContext(), no default environment is 
> set. So an uber-mode job fails to find the native libraries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5799) add default value of MR_AM_ADMIN_USER_ENV

2014-03-18 Thread Liyin Liang (JIRA)
Liyin Liang created MAPREDUCE-5799:
--

 Summary: add default value of MR_AM_ADMIN_USER_ENV
 Key: MAPREDUCE-5799
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5799
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Liyin Liang
Priority: Minor


Submit a 1 map + 1 reduce sleep job with the following config:
{code}
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapreduce.job.ubertask.enable</name>
  <value>true</value>
</property>
{code}
And the LinuxContainerExecutor is enabled on the NodeManager.
This job will fail with the following error:
{code}
2014-03-18 21:28:20,153 FATAL [uber-SubtaskRunner] 
org.apache.hadoop.mapred.LocalContainerLauncher: Error running local (uberized) 
'child' : java.lang.UnsatisfiedLinkError: 
org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native 
Method)
at 
org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63)
at 
org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
at 
org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at 
org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1990)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:774)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.runSubtask(LocalContainerLauncher.java:317)
at 
org.apache.hadoop.mapred.LocalContainerLauncher$SubtaskRunner.run(LocalContainerLauncher.java:232)
at java.lang.Thread.run(Thread.java:662)
{code}

When creating a ContainerLaunchContext for a task in 
TaskAttemptImpl.createCommonContainerLaunchContext(), 
DEFAULT_MAPRED_ADMIN_USER_ENV, which is 
"LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native", is added to the environment. 
However, when creating a ContainerLaunchContext for the MR AppMaster in 
YARNRunner.createApplicationSubmissionContext(), no default environment is set. 
So an uber-mode job fails to find the native libraries.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice

2014-03-02 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5775:
---

 Assignee: Liyin Liang
Affects Version/s: trunk
   Status: Patch Available  (was: Open)

> SleepJob.createJob setNumReduceTasks twice
> --
>
> Key: MAPREDUCE-5775
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: trunk
>Reporter: Liyin Liang
>Assignee: Liyin Liang
>Priority: Minor
> Attachments: MAPREDUCE-5775.diff
>
>
> createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) 
> twice, which is unnecessary.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice

2014-03-02 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5775:
---

Attachment: MAPREDUCE-5775.diff

Attaching a patch that removes one job.setNumReduceTasks(numReducer) call from 
each SleepJob.java.
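For illustration, a minimal sketch of the redundant call (variable names taken 
from the description; surrounding setup omitted):
{code}
// In each SleepJob.createJob(): the reduce count was set twice with the
// same value; the patch keeps only one of the two calls.
Job job = Job.getInstance(conf, "sleep");
job.setNumReduceTasks(numReducer);
// ... mapper, reducer and input/output format setup ...
// job.setNumReduceTasks(numReducer);  // duplicate call removed by the patch
{code}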

> SleepJob.createJob setNumReduceTasks twice
> --
>
> Key: MAPREDUCE-5775
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Liyin Liang
>Priority: Minor
> Attachments: MAPREDUCE-5775.diff
>
>
> createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) 
> twice, which is unnecessary.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (MAPREDUCE-5775) SleepJob.createJob setNumReduceTasks twice

2014-03-02 Thread Liyin Liang (JIRA)
Liyin Liang created MAPREDUCE-5775:
--

 Summary: SleepJob.createJob setNumReduceTasks twice
 Key: MAPREDUCE-5775
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5775
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Liyin Liang
Priority: Minor


createJob() in both SleepJob classes calls job.setNumReduceTasks(numReducer) 
twice, which is unnecessary.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5487) In task processes, JobConf is unnecessarily loaded again in Limits

2014-02-18 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903999#comment-13903999
 ] 

Liyin Liang commented on MAPREDUCE-5487:


The following line is no longer necessary and should be deleted.
{code}
static final Configuration conf = new JobConf();
{code}
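For illustration, a minimal sketch of the direction this points in (the init() 
signature is an assumption for this sketch, not necessarily the committed 
change):
{code}
// Limits reuses a Configuration supplied by the caller (e.g. YarnChild's
// already-loaded JobConf) instead of statically constructing a second
// JobConf, which would read and parse the XML resources again.
public static synchronized void init(Configuration configuration) {
  if (conf == null) {
    conf = configuration;  // no extra disk read or XML parse in the task JVM
  }
}
{code}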

> In task processes, JobConf is unnecessarily loaded again in Limits
> --
>
> Key: MAPREDUCE-5487
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5487
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance, task
>Affects Versions: 2.1.0-beta
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 2.3.0
>
> Attachments: MAPREDUCE-5487-1.patch, MAPREDUCE-5487.patch
>
>
> Limits statically loads a JobConf, which incurs the cost of reading files from 
> disk and parsing XML.  The contents of this JobConf are identical to the one 
> loaded by YarnChild (before adding job.xml as a resource).  Allowing Limits 
> to initialize with the JobConf loaded in YarnChild would reduce task startup 
> time.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (MAPREDUCE-5691) Throttle shuffle's bandwidth utilization

2013-12-29 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13858531#comment-13858531
 ] 

Liyin Liang commented on MAPREDUCE-5691:


[~sandyr] As long-term work, limiting network IO using cgroups is a better 
way to solve this problem. 
[~jira.shegalov] Our cluster users run thousands of jobs every day. It's 
difficult for them to set parameters for each specific job.

> Throttle shuffle's bandwidth utilization
> 
>
> Key: MAPREDUCE-5691
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5691
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
> Attachments: ganglia-slave.jpg
>
>
> In our Hadoop cluster, a reducer of a big job can use all of a machine's 
> bandwidth during the shuffle phase. Any task reading data from the machine 
> running that reducer then becomes very slow.
> It would be better to move DataTransferThrottler from hadoop-hdfs to 
> hadoop-common, and create a throttler for Shuffle to throttle each Fetcher.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-5691) Throttle shuffle's bandwidth utilization

2013-12-22 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5691:
---

Attachment: ganglia-slave.jpg

Attaching a Ganglia network metrics picture. The reducer uses all of the 
inbound bandwidth, so the throttling should go on the reducer side.

> Throttle shuffle's bandwidth utilization
> 
>
> Key: MAPREDUCE-5691
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5691
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
> Attachments: ganglia-slave.jpg
>
>
> In our Hadoop cluster, a reducer of a big job can use all of a machine's 
> bandwidth during the shuffle phase. Any task reading data from the machine 
> running that reducer then becomes very slow.
> It would be better to move DataTransferThrottler from hadoop-hdfs to 
> hadoop-common, and create a throttler for Shuffle to throttle each Fetcher.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (MAPREDUCE-5691) Throttle shuffle's bandwidth utilization

2013-12-18 Thread Liyin Liang (JIRA)
Liyin Liang created MAPREDUCE-5691:
--

 Summary: Throttle shuffle's bandwidth utilization
 Key: MAPREDUCE-5691
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5691
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 2.2.0
Reporter: Liyin Liang


In our Hadoop cluster, a reducer of a big job can use all of a machine's 
bandwidth during the shuffle phase. Any task reading data from the machine 
running that reducer then becomes very slow.
It would be better to move DataTransferThrottler from hadoop-hdfs to 
hadoop-common, and create a throttler for Shuffle to throttle each Fetcher.
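For illustration, a minimal sketch of what a throttled fetch loop could look 
like (buffer size, stream names and the bandwidth parameter are assumptions):
{code}
// Once DataTransferThrottler lives in hadoop-common, each Fetcher can
// throttle the bytes it copies from a map output stream.
DataTransferThrottler throttler = new DataTransferThrottler(bandwidthPerSec);

byte[] buf = new byte[64 * 1024];
int n;
while ((n = input.read(buf)) > 0) {
  output.write(buf, 0, n);
  throttler.throttle(n);  // blocks once the per-period byte budget is spent
}
{code}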



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5684) TestMRJobs.testFailingMapper occasionally fails

2013-12-18 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5684:
---

Attachment: MAPREDUCE-5684-1.diff

This patch changes the assert to accept both TIPFAILED and FAILED statuses, 
and calls verifyFailingMapperCounters(job) only if the status is TIPFAILED.
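For illustration, a minimal sketch of the adjusted assertion (variable names 
are assumptions, not the exact patch):
{code}
TaskCompletionEvent.Status status = events[0].getStatus();
Assert.assertTrue("unexpected status: " + status,
    status == TaskCompletionEvent.Status.TIPFAILED
        || status == TaskCompletionEvent.Status.FAILED);
if (status == TaskCompletionEvent.Status.TIPFAILED) {
  // counters are only verified when the AM (not the history server) answered
  verifyFailingMapperCounters(job);
}
{code}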

> TestMRJobs.testFailingMapper occasionally fails
> ---
>
> Key: MAPREDUCE-5684
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5684
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
> Attachments: MAPREDUCE-5684-1.diff
>
>
> TestMRJobs is occasionally failing with the error:
> {code}
> ---
> Test set: org.apache.hadoop.mapreduce.v2.TestMRJobs
> ---
> Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 323.503 sec 
> <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestMRJobs
> testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time elapsed: 
> 15.657 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<TIPFAILED> but was:<FAILED>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:147)
> at 
> org.apache.hadoop.mapreduce.v2.TestMRJobs.testFailingMapper(TestMRJobs.java:313)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Assigned] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails

2013-12-18 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang reassigned MAPREDUCE-5690:
--

Assignee: Liyin Liang

> TestLocalMRNotification.testMR occasionally fails
> -
>
> Key: MAPREDUCE-5690
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: MAPREDUCE-5690.1.diff
>
>
> TestLocalMRNotification is occasionally failing with the error:
> {code}
> ---
> Test set: org.apache.hadoop.mapred.TestLocalMRNotification
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification
> testMR(org.apache.hadoop.mapred.TestLocalMRNotification)  Time elapsed: 
> 24.881 sec  <<< ERROR!
> java.io.IOException: Job cleanup didn't start in 20 seconds
> at 
> org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685)
> at 
> org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at 
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails

2013-12-18 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5690:
---

Status: Patch Available  (was: Open)

> TestLocalMRNotification.testMR occasionally fails
> -
>
> Key: MAPREDUCE-5690
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: MAPREDUCE-5690.1.diff
>
>
> TestLocalMRNotification is occasionally failing with the error:
> {code}
> ---
> Test set: org.apache.hadoop.mapred.TestLocalMRNotification
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification
> testMR(org.apache.hadoop.mapred.TestLocalMRNotification)  Time elapsed: 
> 24.881 sec  <<< ERROR!
> java.io.IOException: Job cleanup didn't start in 20 seconds
> at 
> org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685)
> at 
> org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at 
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails

2013-12-18 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5690:
---

Attachment: MAPREDUCE-5690.1.diff

This patch waits for the job's map progress to advance before calling 
job.killJob().
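For illustration, a minimal sketch of the idea (polling interval and exception 
handling are assumptions):
{code}
// Wait until the mapper has actually started before killing the job, so
// killJob() has a running KillMapper to interrupt.
RunningJob job = jobClient.submitJob(conf);
while (job.mapProgress() == 0.0f && !job.isComplete()) {
  Thread.sleep(50);  // poll until some map progress is reported
}
job.killJob();
{code}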

> TestLocalMRNotification.testMR occasionally fails
> -
>
> Key: MAPREDUCE-5690
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
> Attachments: MAPREDUCE-5690.1.diff
>
>
> TestLocalMRNotification is occasionally failing with the error:
> {code}
> ---
> Test set: org.apache.hadoop.mapred.TestLocalMRNotification
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification
> testMR(org.apache.hadoop.mapred.TestLocalMRNotification)  Time elapsed: 
> 24.881 sec  <<< ERROR!
> java.io.IOException: Job cleanup didn't start in 20 seconds
> at 
> org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685)
> at 
> org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at 
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails

2013-12-18 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13852529#comment-13852529
 ] 

Liyin Liang commented on MAPREDUCE-5690:


The failure of TestLocalMRNotification.testMR is caused by 
UtilsForTests.runJobKill(). In UtilsForTests.runJobKill(), a job with 
KillMapper is submitted to LocalJobRunner. When the job is in the RUNNING 
state, it is killed via job.killJob(), and the test then waits up to 20 
seconds for the job to complete.
The problem is that job.killJob() is intended to interrupt KillMapper, which 
sleeps for a long time. But if job.killJob() is invoked before KillMapper has 
been launched, the job goes on to run the mapper for a long time anyway.


> TestLocalMRNotification.testMR occasionally fails
> -
>
> Key: MAPREDUCE-5690
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>
> TestLocalMRNotification is occasionally failing with the error:
> {code}
> ---
> Test set: org.apache.hadoop.mapred.TestLocalMRNotification
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification
> testMR(org.apache.hadoop.mapred.TestLocalMRNotification)  Time elapsed: 
> 24.881 sec  <<< ERROR!
> java.io.IOException: Job cleanup didn't start in 20 seconds
> at 
> org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685)
> at 
> org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at junit.framework.TestCase.runTest(TestCase.java:168)
> at junit.framework.TestCase.runBare(TestCase.java:134)
> at junit.framework.TestResult$1.protect(TestResult.java:110)
> at junit.framework.TestResult.runProtected(TestResult.java:128)
> at junit.framework.TestResult.run(TestResult.java:113)
> at junit.framework.TestCase.run(TestCase.java:124)
> at junit.framework.TestSuite.runTest(TestSuite.java:243)
> at junit.framework.TestSuite.run(TestSuite.java:238)
> at 
> org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails

2013-12-18 Thread Liyin Liang (JIRA)
Liyin Liang created MAPREDUCE-5690:
--

 Summary: TestLocalMRNotification.testMR occasionally fails
 Key: MAPREDUCE-5690
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Liyin Liang


TestLocalMRNotification is occasionally failing with the error:
{code}
---
Test set: org.apache.hadoop.mapred.TestLocalMRNotification
---
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec <<< 
FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification
testMR(org.apache.hadoop.mapred.TestLocalMRNotification)  Time elapsed: 24.881 
sec  <<< ERROR!
java.io.IOException: Job cleanup didn't start in 20 seconds
at 
org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685)
at 
org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at junit.framework.TestCase.runTest(TestCase.java:168)
at junit.framework.TestCase.runBare(TestCase.java:134)
at junit.framework.TestResult$1.protect(TestResult.java:110)
at junit.framework.TestResult.runProtected(TestResult.java:128)
at junit.framework.TestResult.run(TestResult.java:113)
at junit.framework.TestCase.run(TestCase.java:124)
at junit.framework.TestSuite.runTest(TestSuite.java:243)
at junit.framework.TestSuite.run(TestSuite.java:238)
at 
org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
at 
org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
at 
org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
{code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5684) TestMRJobs.testFailingMapper occasionally fails

2013-12-16 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848985#comment-13848985
 ] 

Liyin Liang commented on MAPREDUCE-5684:


The failure happens when job.getTaskCompletionEvents(0, 2) is redirected to 
the history server. In the .jhist file, the status of every failed attempt is 
FAILED.

> TestMRJobs.testFailingMapper occasionally fails
> ---
>
> Key: MAPREDUCE-5684
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5684
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>
> TestMRJobs is occasionally failing with the error:
> {code}
> ---
> Test set: org.apache.hadoop.mapreduce.v2.TestMRJobs
> ---
> Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 323.503 sec 
> <<< FAILURE! - in org.apache.hadoop.mapreduce.v2.TestMRJobs
> testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time elapsed: 
> 15.657 sec  <<< FAILURE!
> java.lang.AssertionError: expected:<TIPFAILED> but was:<FAILED>
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.failNotEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:128)
> at org.junit.Assert.assertEquals(Assert.java:147)
> at 
> org.apache.hadoop.mapreduce.v2.TestMRJobs.testFailingMapper(TestMRJobs.java:313)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (MAPREDUCE-5684) TestMRJobs.testFailingMapper occasionally fails

2013-12-16 Thread Liyin Liang (JIRA)
Liyin Liang created MAPREDUCE-5684:
--

 Summary: TestMRJobs.testFailingMapper occasionally fails
 Key: MAPREDUCE-5684
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5684
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Liyin Liang


TestMRJobs is occasionally failing with the error:
{code}
---
Test set: org.apache.hadoop.mapreduce.v2.TestMRJobs
---
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 323.503 sec <<< 
FAILURE! - in org.apache.hadoop.mapreduce.v2.TestMRJobs
testFailingMapper(org.apache.hadoop.mapreduce.v2.TestMRJobs)  Time elapsed: 
15.657 sec  <<< FAILURE!
java.lang.AssertionError: expected:<TIPFAILED> but was:<FAILED>
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:147)
at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testFailingMapper(TestMRJobs.java:313)
{code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5614) job history file name should escape queue name

2013-12-14 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5614:
---

Target Version/s:   (was: 2.3.0)

> job history file name should escape queue name
> --
>
> Key: MAPREDUCE-5614
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: mr-5614-2.diff, mr-5614.diff
>
>
> Our cluster's queue names contain hyphens, e.g. cug-taobao. Because the hyphen 
> is the delimiter in job history file names, the JobHistoryServer shows "cug" 
> as the queue name. To fix this problem, we should escape the queue name in 
> the job history file name.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.

2013-12-12 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13847119#comment-13847119
 ] 

Liyin Liang commented on MAPREDUCE-5623:


Nice patch!

> TestJobCleanup fails because of RejectedExecutionException and NPE.
> ---
>
> Key: MAPREDUCE-5623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Tsuyoshi OZAWA
>Assignee: Jason Lowe
> Attachments: MAPREDUCE-5623.1.patch, MAPREDUCE-5623.2.patch, 
> MAPREDUCE-5623.3.patch
>
>
> org.apache.hadoop.mapred.TestJobCleanup can fail because of a 
> RejectedExecutionException thrown by NonAggregatingLogHandler. This problem is 
> described in YARN-1409. TestJobCleanup can still fail after fixing the 
> RejectedExecutionException, because of an NPE caused by Job#getCounters() 
> returning null.
> {code}
> ---
> Test set: org.apache.hadoop.mapred.TestJobCleanup
> ---
> Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup
> testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup)  Time elapsed: 
> 31.068 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199)
> at 
> org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5679) TestJobHistoryParsing has race condition

2013-12-12 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5679:
---

Attachment: MAPREDUCE-5679-3.diff

Incorporated Jason Lowe's comment. Thanks for the review.

> TestJobHistoryParsing has race condition
> 
>
> Key: MAPREDUCE-5679
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: MAPREDUCE-5679-2.diff, MAPREDUCE-5679-3.diff, 
> MAPREDUCE-5679.diff
>
>
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of a 
> race condition.
> {noformat}
> testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing)
>   Time elapsed: 4.102 sec  <<< ERROR!
> java.io.IOException: Unable to initialize History Viewer
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.<init>(JobHistoryParser.java:86)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:85)
> at 
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339)
> at 
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125)
> {noformat}
> In the checkHistoryParsing() function, after 
> {code}
> HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId);
> {code}
> a thread named MoveIntermediateToDone is launched to move the history file 
> from the done_intermediate directory to the done directory.
> If the history file has already been moved, 
> {code}
>   HistoryViewer viewer = new HistoryViewer(fc.makeQualified(
>   fileInfo.getHistoryFile()).toString(), conf, true);
> {code}
> will throw an IOException, because the history file is not found.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5679) TestJobHistoryParsing has race condition

2013-12-11 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5679:
---

Attachment: MAPREDUCE-5679-2.diff

testHistoryParsingForFailedAttempts() and testCountersForFailedTask() have the 
same race condition.

> TestJobHistoryParsing has race condition
> 
>
> Key: MAPREDUCE-5679
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: MAPREDUCE-5679-2.diff, MAPREDUCE-5679.diff
>
>
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of a 
> race condition.
> {noformat}
> testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing)
>   Time elapsed: 4.102 sec  <<< ERROR!
> java.io.IOException: Unable to initialize History Viewer
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.<init>(JobHistoryParser.java:86)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:85)
> at 
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339)
> at 
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125)
> {noformat}
> In the checkHistoryParsing() function, after 
> {code}
> HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId);
> {code}
> a thread named MoveIntermediateToDone is launched to move the history file 
> from the done_intermediate directory to the done directory.
> If the history file has already been moved, 
> {code}
>   HistoryViewer viewer = new HistoryViewer(fc.makeQualified(
>   fileInfo.getHistoryFile()).toString(), conf, true);
> {code}
> will throw an IOException, because the history file is not found.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5679) TestJobHistoryParsing has race condition

2013-12-11 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5679:
---

Assignee: Liyin Liang
Target Version/s: 2.4.0
  Status: Patch Available  (was: Open)

> TestJobHistoryParsing has race condition
> 
>
> Key: MAPREDUCE-5679
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: MAPREDUCE-5679.diff
>
>
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of a 
> race condition.
> {noformat}
> testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing)
>   Time elapsed: 4.102 sec  <<< ERROR!
> java.io.IOException: Unable to initialize History Viewer
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.<init>(JobHistoryParser.java:86)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:85)
> at 
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339)
> at 
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125)
> {noformat}
> In the checkHistoryParsing() function, after 
> {code}
> HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId);
> {code}
> a thread named MoveIntermediateToDone is launched to move the history file 
> from the done_intermediate directory to the done directory.
> If the history file has already been moved, 
> {code}
>   HistoryViewer viewer = new HistoryViewer(fc.makeQualified(
>   fileInfo.getHistoryFile()).toString(), conf, true);
> {code}
> will throw an IOException, because the history file is not found.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5679) TestJobHistoryParsing has race condition

2013-12-11 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5679:
---

Attachment: MAPREDUCE-5679.diff

This patch encapsulates the "test output for HistoryViewer" section in a lock 
on fileInfo to avoid the race condition.
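For illustration, a minimal sketch of the locking (the identifiers come from 
the issue description; the assertions inside are elided):
{code}
// Holding the fileInfo lock keeps the MoveIntermediateToDone thread from
// relocating the history file while HistoryViewer is reading it.
synchronized (fileInfo) {
  HistoryViewer viewer = new HistoryViewer(fc.makeQualified(
      fileInfo.getHistoryFile()).toString(), conf, true);
  // ... verify the HistoryViewer output here, still under the lock ...
}
{code}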

> TestJobHistoryParsing has race condition
> 
>
> Key: MAPREDUCE-5679
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
> Attachments: MAPREDUCE-5679.diff
>
>
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of a 
> race condition.
> {noformat}
> testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing)
>   Time elapsed: 4.102 sec  <<< ERROR!
> java.io.IOException: Unable to initialize History Viewer
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
> at 
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
> at 
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798)
> at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.<init>(JobHistoryParser.java:86)
> at 
> org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:85)
> at 
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339)
> at 
> org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125)
> {noformat}
> In the checkHistoryParsing() function, after 
> {code}
> HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId);
> {code}
> a thread named MoveIntermediateToDone is launched to move the history file 
> from the done_intermediate directory to the done directory.
> If the history file has already been moved, 
> {code}
>   HistoryViewer viewer = new HistoryViewer(fc.makeQualified(
>   fileInfo.getHistoryFile()).toString(), conf, true);
> {code}
> will throw an IOException, because the history file is not found.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Created] (MAPREDUCE-5679) TestJobHistoryParsing has race condition

2013-12-11 Thread Liyin Liang (JIRA)
Liyin Liang created MAPREDUCE-5679:
--

 Summary: TestJobHistoryParsing has race condition
 Key: MAPREDUCE-5679
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5679
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Liyin Liang


org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing can fail because of a 
race condition.
{noformat}
testHistoryParsingWithParseErrors(org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing)
  Time elapsed: 4.102 sec  <<< ERROR!
java.io.IOException: Unable to initialize History Viewer
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:520)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:398)
at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:137)
at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:339)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:798)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.<init>(JobHistoryParser.java:86)
at 
org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:85)
at 
org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.checkHistoryParsing(TestJobHistoryParsing.java:339)
at 
org.apache.hadoop.mapreduce.v2.hs.TestJobHistoryParsing.testHistoryParsingWithParseErrors(TestJobHistoryParsing.java:125)
{noformat}

In the checkHistoryParsing() function, after 
{code}
HistoryFileInfo fileInfo = jobHistory.getJobFileInfo(jobId);
{code}
a thread named MoveIntermediateToDone is launched to move the history file 
from the done_intermediate directory to the done directory.
If the history file has already been moved, 
{code}
  HistoryViewer viewer = new HistoryViewer(fc.makeQualified(
  fileInfo.getHistoryFile()).toString(), conf, true);
{code}
will throw an IOException, because the history file is not found.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.

2013-12-11 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845957#comment-13845957
 ] 

Liyin Liang commented on MAPREDUCE-5623:


Hi Jason Lowe,
The patch looks good to me. I think testKilledJob() has the same problem as 
testFailedJob(), so it would be better to fix both.

> TestJobCleanup fails because of RejectedExecutionException and NPE.
> ---
>
> Key: MAPREDUCE-5623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Tsuyoshi OZAWA
>Assignee: Jason Lowe
> Attachments: MAPREDUCE-5623.1.patch, MAPREDUCE-5623.2.patch
>
>
> org.apache.hadoop.mapred.TestJobCleanup can fail because of a 
> RejectedExecutionException thrown by NonAggregatingLogHandler. This problem is 
> described in YARN-1409. TestJobCleanup can still fail after fixing the 
> RejectedExecutionException, because of an NPE caused by Job#getCounters() 
> returning null.
> {code}
> ---
> Test set: org.apache.hadoop.mapred.TestJobCleanup
> ---
> Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup
> testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup)  Time elapsed: 
> 31.068 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199)
> at 
> org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Commented] (MAPREDUCE-5623) TestJobCleanup fails because of RejectedExecutionException and NPE.

2013-12-10 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845028#comment-13845028
 ] 

Liyin Liang commented on MAPREDUCE-5623:


Hi Jason Lowe,
If the client is redirected to the history server, job.getCounters() will 
return null, because the .jhist file of a failed job doesn't contain job-level 
counters.
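For illustration, a minimal sketch that makes the failure mode explicit in the 
test (this names the symptom; it is not necessarily the fix the patch takes):
{code}
Counters counters = job.getCounters();
// After a redirect to the history server, a failed job's .jhist carries no
// job-level counters, so getCounters() can legitimately return null here.
Assert.assertNotNull("counters unavailable from the history server", counters);
{code}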

> TestJobCleanup fails because of RejectedExecutionException and NPE.
> ---
>
> Key: MAPREDUCE-5623
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5623
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Tsuyoshi OZAWA
>Assignee: Tsuyoshi OZAWA
> Attachments: MAPREDUCE-5623.1.patch
>
>
> org.apache.hadoop.mapred.TestJobCleanup can fail because of a 
> RejectedExecutionException thrown by NonAggregatingLogHandler. This problem is 
> described in YARN-1409. TestJobCleanup can still fail after fixing the 
> RejectedExecutionException, because of an NPE caused by Job#getCounters() 
> returning null.
> {code}
> ---
> Test set: org.apache.hadoop.mapred.TestJobCleanup
> ---
> Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 140.933 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestJobCleanup
> testCustomAbort(org.apache.hadoop.mapred.TestJobCleanup)  Time elapsed: 
> 31.068 sec  <<< ERROR!
> java.lang.NullPointerException: null
> at 
> org.apache.hadoop.mapred.TestJobCleanup.testFailedJob(TestJobCleanup.java:199)
> at 
> org.apache.hadoop.mapred.TestJobCleanup.testCustomAbort(TestJobCleanup.java:296)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)


[jira] [Updated] (MAPREDUCE-5614) job history file name should escape queue name

2013-11-10 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5614:
---

Attachment: mr-5614-2.diff

Updated the patch to incorporate Zhijie's comment.

> job history file name should escape queue name
> --
>
> Key: MAPREDUCE-5614
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: mr-5614-2.diff, mr-5614.diff
>
>
> Our cluster's queue names contain hyphens, e.g. cug-taobao. Because the hyphen 
> is the delimiter in job history file names, the JobHistoryServer shows "cug" 
> as the queue name. To fix this problem, we should escape the queue name in 
> the job history file name.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5614) job history file name should escape queue name

2013-11-07 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5614:
---

Attachment: mr-5614.diff

Attach a patch to escape the queue name.

> job history file name should escape queue name
> --
>
> Key: MAPREDUCE-5614
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: mr-5614.diff
>
>
> Our cluster's queue names contain hyphens, e.g. cug-taobao. Because the hyphen 
> is the delimiter in job history file names, the JobHistoryServer shows "cug" 
> as the queue name. To fix this problem, we should escape the queue name in 
> the job history file name.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-5614) job history file name should escape queue name

2013-11-07 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-5614:
---

Status: Patch Available  (was: Open)

> job history file name should escape queue name
> --
>
> Key: MAPREDUCE-5614
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: mr-5614.diff
>
>
> Our cluster's queue names contain hyphens, e.g. cug-taobao. Because the hyphen 
> is the delimiter in job history file names, the JobHistoryServer shows "cug" 
> as the queue name. To fix this problem, we should escape the queue name in 
> the job history file name.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (MAPREDUCE-5614) job history file name should escape queue name

2013-11-07 Thread Liyin Liang (JIRA)
Liyin Liang created MAPREDUCE-5614:
--

 Summary: job history file name should escape queue name
 Key: MAPREDUCE-5614
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5614
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Liyin Liang
Assignee: Liyin Liang


Our cluster's queue names contain hyphens, e.g. cug-taobao. Because the hyphen 
is the delimiter in job history file names, the JobHistoryServer shows "cug" as 
the queue name. To fix this problem, we should escape the queue name in the job 
history file name.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2013-11-04 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-4978:
---

Status: Patch Available  (was: Open)

> Add a updateJobWithSplit() method for new-api job
> -
>
> Key: MAPREDUCE-4978
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.1.2
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: 4978-1.diff
>
>
> HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
> jobs. It would be better to add another method for new-api jobs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2013-03-15 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-4978:
---

Fix Version/s: 1.2.0

> Add a updateJobWithSplit() method for new-api job
> -
>
> Key: MAPREDUCE-4978
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.1.2
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Fix For: 1.2.0
>
> Attachments: 4978-1.diff
>
>
> HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
> jobs. It would be better to add another method for new-api jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2013-03-15 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-4978:
---

Affects Version/s: (was: 1.1.1)
   1.1.2

> Add a updateJobWithSplit() method for new-api job
> -
>
> Key: MAPREDUCE-4978
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.1.2
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: 4978-1.diff
>
>
> HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
> jobs. It would be better to add another method for new-api jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2013-02-05 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13572109#comment-13572109
 ] 

Liyin Liang commented on MAPREDUCE-4978:


The attached patch adds a new method updateJobWithSplit() for new-api jobs 
only. This patch also fixes MAPREDUCE-1743.
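
For reference, a hedged sketch of what a new-api variant might look like (the 
attached 4978-1.diff is the authoritative change; this only assumes the old-api 
property names map.input.file / map.input.start / map.input.length):

{code:java}
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

class NewApiSplitInfo {
  // Sketch only: mirror the old-api updateJobWithSplit() for new-api
  // (org.apache.hadoop.mapreduce) file splits.
  static void updateJobWithSplit(JobConf conf, InputSplit split) {
    if (split instanceof FileSplit) {
      FileSplit fileSplit = (FileSplit) split;
      conf.set("map.input.file", fileSplit.getPath().toString());
      conf.setLong("map.input.start", fileSplit.getStart());
      conf.setLong("map.input.length", fileSplit.getLength());
    }
  }
}
{code}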

> Add a updateJobWithSplit() method for new-api job
> -
>
> Key: MAPREDUCE-4978
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.1.1
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: 4978-1.diff
>
>
> HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
> jobs. It would be better to add another method for new-api jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2013-02-05 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-4978:
---

Affects Version/s: 1.1.1

> Add a updateJobWithSplit() method for new-api job
> -
>
> Key: MAPREDUCE-4978
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.1.1
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: 4978-1.diff
>
>
> HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
> jobs. It would be better to add another method for new-api jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2013-02-05 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-4978:
---

Attachment: 4978-1.diff

> Add a updateJobWithSplit() method for new-api job
> -
>
> Key: MAPREDUCE-4978
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 1.1.1
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: 4978-1.diff
>
>
> HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
> jobs. It would be better to add another method for new-api jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2013-02-05 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-4978:
---

Fix Version/s: (was: 1.2.0)

> Add a updateJobWithSplit() method for new-api job
> -
>
> Key: MAPREDUCE-4978
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Liyin Liang
>Assignee: Liyin Liang
>
> HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
> jobs. It would be better to add another method for new-api jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2013-02-05 Thread Liyin Liang (JIRA)
Liyin Liang created MAPREDUCE-4978:
--

 Summary: Add a updateJobWithSplit() method for new-api job
 Key: MAPREDUCE-4978
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Liyin Liang
Assignee: Liyin Liang
 Fix For: 1.2.0


HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
jobs. It would be better to add another method for new-api jobs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (MAPREDUCE-1743) conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

2013-01-11 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang reassigned MAPREDUCE-1743:
--

Assignee: Liyin Liang

> conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 
> 0.20
> 
>
> Key: MAPREDUCE-1743
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2
>Reporter: Yuanyuan Tian
>Assignee: Liyin Liang
> Attachments: mr-1743.diff
>
>
> There is a problem in getting the input file name in the mapper when using 
> MultipleInputs in Hadoop 0.20. I need to use MultipleInputs to support 
> different formats for my inputs to my MapReduce job. And inside each 
> mapper, I also need to know the exact input file that the mapper is 
> processing. However, conf.get("map.input.file") returns null. Can anybody 
> help me solve this problem? Thanks in advance.
> public class Test extends Configured implements Tool{
>   static class InnerMapper extends MapReduceBase implements 
> Mapper
>   {
>   
>   
>   public void configure(JobConf conf)
>   {   
>   String inputName=conf.get("map.input.file");
>   ...
>   }
>   
>   }
>   
>   public int run(String[] arg0) throws Exception {
>   JobConf job;
>   job = new JobConf(Test.class);
>   ...
>   
>   MultipleInputs.addInputPath(conf, new Path("A"), 
> TextInputFormat.class);
>   MultipleInputs.addInputPath(conf, new Path("B"), 
> SequenceFileFormat.class);
>   ...
>   }
> }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (MAPREDUCE-1743) conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

2013-01-11 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-1743:
---

Attachment: mr-1743.diff

Attach a patch based on branch-1.1 with Jim's solution.This patch works well in 
our production cluster.

> conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 
> 0.20
> 
>
> Key: MAPREDUCE-1743
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2
>Reporter: Yuanyuan Tian
> Attachments: mr-1743.diff
>
>
> There is a problem in getting the input file name in the mapper when using 
> MultipleInputs in Hadoop 0.20. I need to use MultipleInputs to support 
> different formats for my inputs to my MapReduce job. And inside each 
> mapper, I also need to know the exact input file that the mapper is 
> processing. However, conf.get("map.input.file") returns null. Can anybody 
> help me solve this problem? Thanks in advance.
> public class Test extends Configured implements Tool{
>   static class InnerMapper extends MapReduceBase implements 
> Mapper
>   {
>   
>   
>   public void configure(JobConf conf)
>   {   
>   String inputName=conf.get("map.input.file");
>   ...
>   }
>   
>   }
>   
>   public int run(String[] arg0) throws Exception {
>   JobConf job;
>   job = new JobConf(Test.class);
>   ...
>   
>   MultipleInputs.addInputPath(conf, new Path("A"), 
> TextInputFormat.class);
>   MultipleInputs.addInputPath(conf, new Path("B"), 
> SequenceFileFormat.class);
>   ...
>   }
> }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-1743) conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

2012-07-29 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13424704#comment-13424704
 ] 

Liyin Liang commented on MAPREDUCE-1743:


Jim's solution is nice.

> conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 
> 0.20
> 
>
> Key: MAPREDUCE-1743
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1743
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.2
>Reporter: Yuanyuan Tian
>
> There is a problem in getting the input file name in the mapper when using 
> MultipleInputs in Hadoop 0.20. I need to use MultipleInputs to support 
> different formats for my inputs to my MapReduce job. And inside each 
> mapper, I also need to know the exact input file that the mapper is 
> processing. However, conf.get("map.input.file") returns null. Can anybody 
> help me solve this problem? Thanks in advance.
> public class Test extends Configured implements Tool{
>   static class InnerMapper extends MapReduceBase implements 
> Mapper
>   {
>   
>   
>   public void configure(JobConf conf)
>   {   
>   String inputName=conf.get("map.input.file");
>   ...
>   }
>   
>   }
>   
>   public int run(String[] arg0) throws Exception {
>   JobConf job;
>   job = new JobConf(Test.class);
>   ...
>   
>   MultipleInputs.addInputPath(conf, new Path("A"), 
> TextInputFormat.class);
>   MultipleInputs.addInputPath(conf, new Path("B"), 
> SequenceFileFormat.class);
>   ...
>   }
> }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-4478) TaskTracker's heartbeat is out of control

2012-07-25 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-4478:
---

Attachment: 4478.diff

Attach a patch to fix this bug. I don't know whether the synchronized block is 
necessary.

> TaskTracker's heartbeat is out of control
> -
>
> Key: MAPREDUCE-4478
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4478
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.0.3
>Reporter: Liyin Liang
> Attachments: 4478.diff
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4478) TaskTracker's heartbeat is out of control

2012-07-24 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13422022#comment-13422022
 ] 

Liyin Liang commented on MAPREDUCE-4478:


There are two configuration items that control the TaskTracker's heartbeat 
interval: *mapreduce.tasktracker.outofband.heartbeat* and 
*mapreduce.tasktracker.outofband.heartbeat.damper*. If we set 
*mapreduce.tasktracker.outofband.heartbeat* to true and leave 
*mapreduce.tasktracker.outofband.heartbeat.damper* at its default value 
(100), the TaskTracker may send heartbeats without any interval.

The code to control heartbeat interval is as follows:
{code:java}
long now = System.currentTimeMillis();

// accelerate to account for multiple finished tasks up-front
long remaining = 
  (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
while (remaining > 0) {
  // sleeps for the wait time or 
  // until there are *enough* empty slots to schedule tasks
  synchronized (finishedCount) {
finishedCount.wait(remaining);

// Recompute
now = System.currentTimeMillis();
remaining = 
  (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;

if (remaining <= 0) {
  // Reset count 
  finishedCount.set(0);
  break;
}
  }
}
{code}

On the first computation, if *finishedCount* is more than zero, 
*getHeartbeatInterval(finishedCount.get())* returns zero, so *remaining* is 
less than or equal to zero. In this case the *while* loop is skipped entirely, 
and *finishedCount* is never reset to zero. A sketch of a fix follows.
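
A sketch of one possible fix (illustration only; the attached 4478.diff is 
authoritative): move the reset out of the loop, so it happens even when the 
loop body is never entered.

{code:java}
long now = System.currentTimeMillis();
long remaining =
  (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
while (remaining > 0) {
  synchronized (finishedCount) {
    finishedCount.wait(remaining);

    // Recompute
    now = System.currentTimeMillis();
    remaining =
      (lastHeartbeat + getHeartbeatInterval(finishedCount.get())) - now;
  }
}
// Reset unconditionally, so a backlog of finished tasks cannot pin the
// computed interval at zero across every subsequent heartbeat.
finishedCount.set(0);
{code}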


> TaskTracker's heartbeat is out of control
> -
>
> Key: MAPREDUCE-4478
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4478
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.0.3
>Reporter: Liyin Liang
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Moved] (MAPREDUCE-4478) TaskTracker's heartbeat is out of control

2012-07-24 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang moved HDFS-3722 to MAPREDUCE-4478:
--

Affects Version/s: (was: 1.0.3)
   (was: 1.0.2)
   (was: 1.0.1)
   (was: 1.0.0)
   1.0.0
   1.0.1
   1.0.2
   1.0.3
  Key: MAPREDUCE-4478  (was: HDFS-3722)
  Project: Hadoop Map/Reduce  (was: Hadoop HDFS)

> TaskTracker's heartbeat is out of control
> -
>
> Key: MAPREDUCE-4478
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4478
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 1.0.3, 1.0.2, 1.0.1, 1.0.0
>Reporter: Liyin Liang
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-2349) speed up list[located]status calls from input formats

2012-05-01 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266293#comment-13266293
 ] 

Liyin Liang commented on MAPREDUCE-2349:


This jira is very meaningful for large, busy clusters.

> speed up list[located]status calls from input formats
> -
>
> Key: MAPREDUCE-2349
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2349
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: task
>Reporter: Joydeep Sen Sarma
>
> when a job has many input paths - listStatus - or the improved 
> listLocatedStatus - calls (invoked from the getSplits() method) can take a 
> long time. Most of the time is spent waiting for the previous call to 
> complete and then dispatching the next call. 
> This can be greatly speeded up by dispatching multiple calls at once (via 
> executors). If the same filesystem client is used - then the calls are much 
> better pipelined (since calls are serialized) and don't impose extra burden 
> on the namenode while at the same time greatly reducing the latency to the 
> client. In a simple test on non-peak hours, this resulted in the getSplits() 
> time reducing from about 3s to about 0.5s.
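
A self-contained sketch of the dispatching idea (illustration only: the 
thread-pool size and class name are made up, and this is not the jira's patch):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelListStatus {
  /** Dispatch listStatus() for all paths at once instead of serially. */
  public static List<FileStatus> listAll(final FileSystem fs, List<Path> dirs)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      List<Future<FileStatus[]>> futures =
        new ArrayList<Future<FileStatus[]>>();
      for (final Path dir : dirs) {
        futures.add(pool.submit(new Callable<FileStatus[]>() {
          public FileStatus[] call() throws Exception {
            return fs.listStatus(dir);
          }
        }));
      }
      List<FileStatus> all = new ArrayList<FileStatus>();
      for (Future<FileStatus[]> f : futures) {
        for (FileStatus s : f.get()) {   // propagates the first failure
          all.add(s);
        }
      }
      return all;
    } finally {
      pool.shutdown();
    }
  }
}
{code}

Using one shared FileSystem client keeps the calls pipelined on a single 
connection, as the description notes, while the executor overlaps the 
round-trip waits.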

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS

2011-08-07 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080715#comment-13080715
 ] 

Liyin Liang commented on MAPREDUCE-2209:


Hi Subroto,
Your analysis is great and your patch looks good to me. However, I found 
another issue, MAPREDUCE-2364, which duplicates this one. What's more, its 
solution is mostly the same as your patch. I think one of them should be 
closed as a duplicate.

> TaskTracker's heartbeat hang for several minutes when copying large job.jar 
> from HDFS
> -
>
> Key: MAPREDUCE-2209
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.23.0
> Environment: hadoop version: 0.19.1
>Reporter: Liyin Liang
>Priority: Blocker
> Attachments: 2209-1.diff, MAPREDUCE-2209.patch
>
>
> If a job's jar file is very large, e.g. 200 MB+, the TaskTracker's heartbeat 
> hangs for several minutes while localizing the job. The jstacks of the 
> related threads are as follows:
> {code:borderStyle=solid}
> "TaskLauncher for task" daemon prio=10 tid=0x002b05ee5000 nid=0x1adf 
> runnable [0x42e56000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> - locked <0x002afc892ec8> (a sun.nio.ch.Util$1)
> - locked <0x002afc892eb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x002afc8927d8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> - locked <0x002afce26158> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readShort(DataInputStream.java:295)
> at 
> org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556)
> - locked <0x002afce26218> (a 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
> - locked <0x002afce26218> (a 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> at java.io.DataInputStream.read(DataInputStream.java:83)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
> at 
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214)
> at 
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195)
> at 
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824)
> - locked <0x002afce2d260> (a 
> org.apache.hadoop.mapred.TaskTracker$RunningJob)
> at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745)
> at 
> org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103)
> at 
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710)
> "Map-events fetcher for all reduce tasks on 
> tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 
> tid=0x002b05ef8000 
> nid=0x1ada waiting for monitor entry [0x42d55000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582)
> - waiting to lock <0x002afce2d260> (a 
> org.apache.hadoop.mapred.TaskTracker$RunningJob)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617)
> - locked <0x002a9eefe1f8> (a java.util.TreeMap)
> "IPC Server handler 2 on 50050" daemon prio=10 tid=0x002b050eb000 
> nid=0x1ab0 waiting for monitor entry [0x4234b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.

[jira] [Commented] (MAPREDUCE-2364) Shouldn't hold lock on rjob while localizing resources.

2011-08-03 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13078631#comment-13078631
 ] 

Liyin Liang commented on MAPREDUCE-2364:


I think this issue is the same as MAPREDUCE-2209.

> Shouldn't hold lock on rjob while localizing resources.
> ---
>
> Key: MAPREDUCE-2364
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2364
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.203.0
>Reporter: Owen O'Malley
>Assignee: Devaraj Das
> Fix For: 0.20.204.0
>
> Attachments: MAPREDUCE-2364.patch, 
> no-lock-localize-branch-0.20-security.patch, no-lock-localize-trunk.patch
>
>
> There is a deadlock while localizing resources on the TaskTracker.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS

2011-07-31 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073442#comment-13073442
 ] 

Liyin Liang commented on MAPREDUCE-2209:


Hi Subroto,
  In fact, we have fixed this issue by reducing the lock scope of 
_TaskTracker::getMapCompletionEvents()_, and it works well in our 1500-node 
production cluster. I will attach a diff file for 0.19. A generic sketch of the 
technique follows.
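
A generic, self-contained sketch of the technique (not the attached 
2209-1.diff): take a short, targeted lock only to snapshot shared state, and do 
the slow work outside the monitor, so heartbeat processing is never blocked 
behind it.

{code:java}
import java.util.ArrayList;
import java.util.List;

class EventBuffer {
  private final List<String> events = new ArrayList<String>();

  void add(String event) {
    synchronized (events) {
      events.add(event);
    }
  }

  // Copy under a short lock, then let the caller process the copy outside
  // it, so other threads (e.g. the heartbeat sender) are never blocked
  // behind slow work such as copying a large job.jar.
  List<String> snapshot() {
    synchronized (events) {
      return new ArrayList<String>(events);
    }
  }
}
{code}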

> TaskTracker's heartbeat hang for several minutes when copying large job.jar 
> from HDFS
> -
>
> Key: MAPREDUCE-2209
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
> Environment: hadoop version: 0.19.1
>Reporter: Liyin Liang
>Priority: Blocker
> Attachments: 2209-1.diff
>
>
> If a job's jar file is very large, e.g. 200 MB+, the TaskTracker's heartbeat 
> hangs for several minutes while localizing the job. The jstacks of the 
> related threads are as follows:
> {code:borderStyle=solid}
> "TaskLauncher for task" daemon prio=10 tid=0x002b05ee5000 nid=0x1adf 
> runnable [0x42e56000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> - locked <0x002afc892ec8> (a sun.nio.ch.Util$1)
> - locked <0x002afc892eb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x002afc8927d8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> - locked <0x002afce26158> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readShort(DataInputStream.java:295)
> at 
> org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556)
> - locked <0x002afce26218> (a 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
> - locked <0x002afce26218> (a 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> at java.io.DataInputStream.read(DataInputStream.java:83)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
> at 
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214)
> at 
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195)
> at 
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824)
> - locked <0x002afce2d260> (a 
> org.apache.hadoop.mapred.TaskTracker$RunningJob)
> at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745)
> at 
> org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103)
> at 
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710)
> "Map-events fetcher for all reduce tasks on 
> tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 
> tid=0x002b05ef8000 
> nid=0x1ada waiting for monitor entry [0x42d55000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582)
> - waiting to lock <0x002afce2d260> (a 
> org.apache.hadoop.mapred.TaskTracker$RunningJob)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617)
> - locked <0x002a9eefe1f8> (a java.util.TreeMap)
> "IPC Server handler 2 on 50050" daemon prio=10 tid=0x002b050eb000 
> nid=0x1ab0 waiting for monitor entry [0x4234b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:2684)
> - waiting to lock <0x002a9eefe1f8

[jira] [Updated] (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS

2011-07-31 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2209:
---

Attachment: 2209-1.diff

> TaskTracker's heartbeat hang for several minutes when copying large job.jar 
> from HDFS
> -
>
> Key: MAPREDUCE-2209
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
> Environment: hadoop version: 0.19.1
>Reporter: Liyin Liang
>Priority: Blocker
> Attachments: 2209-1.diff
>
>
> If a job's jar file is very large, e.g. 200 MB+, the TaskTracker's heartbeat 
> hangs for several minutes while localizing the job. The jstacks of the 
> related threads are as follows:
> {code:borderStyle=solid}
> "TaskLauncher for task" daemon prio=10 tid=0x002b05ee5000 nid=0x1adf 
> runnable [0x42e56000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
> - locked <0x002afc892ec8> (a sun.nio.ch.Util$1)
> - locked <0x002afc892eb0> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x002afc8927d8> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> - locked <0x002afce26158> (a java.io.BufferedInputStream)
> at java.io.DataInputStream.readShort(DataInputStream.java:295)
> at 
> org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556)
> - locked <0x002afce26218> (a 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> at 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
> - locked <0x002afce26218> (a 
> org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
> at java.io.DataInputStream.read(DataInputStream.java:83)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
> at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
> at 
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214)
> at 
> org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195)
> at 
> org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824)
> - locked <0x002afce2d260> (a 
> org.apache.hadoop.mapred.TaskTracker$RunningJob)
> at 
> org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745)
> at 
> org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103)
> at 
> org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710)
> "Map-events fetcher for all reduce tasks on 
> tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 
> tid=0x002b05ef8000 
> nid=0x1ada waiting for monitor entry [0x42d55000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582)
> - waiting to lock <0x002afce2d260> (a 
> org.apache.hadoop.mapred.TaskTracker$RunningJob)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617)
> - locked <0x002a9eefe1f8> (a java.util.TreeMap)
> "IPC Server handler 2 on 50050" daemon prio=10 tid=0x002b050eb000 
> nid=0x1ab0 waiting for monitor entry [0x4234b000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:2684)
> - waiting to lock <0x002a9eefe1f8> (a java.util.TreeMap)
> - locked <0x002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker)
> at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.

[jira] [Resolved] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6

2011-07-20 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang resolved MAPREDUCE-2510.


Resolution: Fixed

> TaskTracker throw OutOfMemoryError after upgrade to jetty6
> --
>
> Key: MAPREDUCE-2510
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Liyin Liang
>
> Our production cluster's TaskTracker sometimes throws OutOfMemoryError after 
> upgrading to jetty6. The exception in the TT's log is as follows:
> 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
> at 
> org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> Exceptions in .out file:
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap 
> space
> Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap 
> space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.reflect.InvocationTargetException
> Exception in thread "IPC Server handler 6 on 50050" at 
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126)
> at org.mortbay.log.Log.warn(Log.java:181)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-2714) When a job is retired by the same user's another job, its jobconf file is not deleted from the log directory of the JobTracker

2011-07-20 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068206#comment-13068206
 ] 

Liyin Liang commented on MAPREDUCE-2714:


Attaching a patch for the 0.20 branch.

> When a job is retired by the same user's another job, its jobconf file is not 
> deleted from the log directory of the JobTracker 
> ---
>
> Key: MAPREDUCE-2714
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2714
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.20.2
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: 2714-1.diff
>
>
> After MAPREDUCE-130, the job's conf copy will be deleted from the log 
> directory of the JobTracker when the job is retired. However, this only works 
> if the job is retired by the _RetireJobs_ thread of the JobTracker. If a job 
> is retired by another job from the same user, its conf copy will not be 
> deleted. This kind of retirement happens in _JobTracker::finalizeJob(job)_, 
> when the JobTracker maintains information for more than 
> _MAX_COMPLETE_USER_JOBS_IN_MEMORY_ jobs in memory for a given user.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (MAPREDUCE-2714) When a job is retired by the same user's another job, its jobconf file is not deleted from the log directory of the JobTracker

2011-07-20 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2714:
---

Attachment: 2714-1.diff

> When a job is retired by the same user's another job, its jobconf file is not 
> deleted from the log directory of the JobTracker 
> ---
>
> Key: MAPREDUCE-2714
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2714
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.1, 0.20.2
>Reporter: Liyin Liang
>Assignee: Liyin Liang
> Attachments: 2714-1.diff
>
>
> After MAPREDUCE-130, the job's conf copy will be deleted from the log 
> directory of the JobTracker when the job is retired. However, this only works 
> if the job is retired by the _RetireJobs_ thread of the JobTracker. If a job 
> is retired by another job from the same user, its conf copy will not be 
> deleted. This kind of retirement happens in _JobTracker::finalizeJob(job)_, 
> when the JobTracker maintains information for more than 
> _MAX_COMPLETE_USER_JOBS_IN_MEMORY_ jobs in memory for a given user.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (MAPREDUCE-2714) When a job is retired by the same user's another job, its jobconf file is not deleted from the log directory of the JobTracker

2011-07-20 Thread Liyin Liang (JIRA)
When a job is retired by the same user's another job, its jobconf file is not 
deleted from the log directory of the JobTracker 
---

 Key: MAPREDUCE-2714
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2714
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2, 0.20.1
Reporter: Liyin Liang
Assignee: Liyin Liang


After MAPREDUCE-130, the job's conf copy will be deleted from the log directory 
of the JobTracker when the job is retired. However, this only works if the job 
is retired by the _RetireJobs_ thread of the JobTracker. If a job is retired by 
another job from the same user, its conf copy will not be deleted. This kind of 
retirement happens in _JobTracker::finalizeJob(job)_, when the JobTracker 
maintains information for more than _MAX_COMPLETE_USER_JOBS_IN_MEMORY_ jobs in 
memory for a given user. A sketch of the fix direction follows.
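
A sketch of the fix direction (the helper and the file-name pattern are 
assumptions; the attached 2714-1.diff is authoritative): factor the conf-copy 
cleanup into a helper that every retire path, including finalizeJob(), calls.

{code:java}
import java.io.File;

class JobConfCleanup {
  // Hypothetical helper and naming pattern: delete the JobTracker-local copy
  // of the job's conf whenever a job is retired, regardless of which code
  // path retires it.
  static void deleteLocalJobConf(File logDir, String jobId) {
    File conf = new File(logDir, jobId + "_conf.xml");
    if (conf.exists() && !conf.delete()) {
      System.err.println("Failed to delete " + conf.getPath());
    }
  }
}
{code}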

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-2339) optimize JobInProgress.getTaskInProgress(taskid)

2011-07-18 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067452#comment-13067452
 ] 

Liyin Liang commented on MAPREDUCE-2339:


Nice patch!
A user submitted a job with more than 680,000 map tasks to our cluster. The 
JobTracker then became too inefficient to process heartbeats: many threads were 
blocked and lots of requests were queued. From a jstack of the JobTracker 
process, we found most of the time was spent in JIP.getTaskInProgress().
This patch is a good way to improve JIP.getTaskInProgress()'s performance and 
fixes our problem.

> optimize JobInProgress.getTaskInProgress(taskid)
> 
>
> Key: MAPREDUCE-2339
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2339
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.20.2, 0.21.0
>Reporter: Kang Xiao
> Attachments: MAPREDUCE-2339.patch, MAPREDUCE-2339.patch
>
>
> JobInProgress.getTaskInProgress(taskid) uses a linear search to get the 
> TaskInProgress object by taskid. In fact, it can be replaced by a much more 
> efficient array index operation.
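
A sketch of the array-index lookup (illustration only; the field and method 
names mirror JobInProgress internals, but the attached patch is authoritative):

{code:java}
// Sketch only: the task id encodes its partition number, so index directly
// into the maps[] / reduces[] TIP arrays instead of scanning them.
TaskInProgress getTaskInProgress(TaskID taskId) {
  int idx = taskId.getId();   // partition number embedded in the task id
  if (taskId.isMap()) {
    return (idx >= 0 && idx < maps.length) ? maps[idx] : null;
  }
  return (idx >= 0 && idx < reduces.length) ? reduces[idx] : null;
}
{code}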

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-1904) Reducing locking contention in TaskTracker.MapOutputServlet's LocalDirAllocator

2011-06-14 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049621#comment-13049621
 ] 

Liyin Liang commented on MAPREDUCE-1904:


This is a great patch. Here is part of the stack when a worker thread is blocked:
{code}
"1797055149@qtp0-98" prio=10 tid=0x002aa1a4 nid=0x333 waiting for 
monitor entry [0x49dc5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:377)
- waiting to lock <0xa090> (a 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:142)
at 
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3086)
{code}

I have written a job with one map that outputs 1 MB of data, and 100 reduces. 
Each reduce spawns 10 threads that fetch data from the map side 3k times, just 
like the shuffle phase. When running this job, most worker threads are blocked 
on AllocatorPerContext.

With the LRUCache, most worker threads are blocked on LOG.info(), as in the following stack.
{code}
"1793911889@qtp0-101" prio=10 tid=0x002aa153 nid=0x34f2 waiting for 
monitor entry [0x41d45000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:204)
- waiting to lock <0xa01be928> (a 
org.apache.log4j.spi.RootLogger)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at 
org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:133)
at 
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3246)
{code}

With LRUCache + LOG.info() disabled: this job takes 3 mins, 19 sec to run.
Without LRUCache + LOG.info() enabled: this job takes just 37 sec to run.

B.t.w., the LRUCache should use *mapId* as the key instead of *(jobId + mapId)*, 
because the jobId is just part of the mapId. A sketch of such a cache follows.
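
For illustration, a minimal access-ordered cache keyed by mapId (a sketch under 
stated assumptions, not the MAPREDUCE-1904 patch):

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Access-ordered LinkedHashMap gives LRU eviction; the key is the mapId
// alone, since the jobId is already embedded in the mapId string.
public class PathLruCache extends LinkedHashMap<String, String> {
  private final int capacity;

  public PathLruCache(int capacity) {
    super(16, 0.75f, true);   // true = access order
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
    return size() > capacity;
  }
}
{code}

LinkedHashMap's access-order mode promotes entries on get(), so 
removeEldestEntry() evicts the least recently used path once the capacity is 
exceeded.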

> Reducing locking contention in TaskTracker.MapOutputServlet's 
> LocalDirAllocator
> ---
>
> Key: MAPREDUCE-1904
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1904
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: tasktracker
>Affects Versions: 0.20.1
>Reporter: Rajesh Balamohan
> Attachments: LocalDirAllocator.JPG, LocalDirAllocator_Monitor.JPG, 
> MAPREDUCE-1904-RC10.patch, MAPREDUCE-1904-trunk.patch, TaskTracker- yourkit 
> profiler output .jpg, Thread profiler output showing contention.jpg, profiler 
> output after applying the patch.jpg
>
>
> While profiling tasktracker with Sort benchmark, it was observed that threads 
> block on LocalDirAllocator.getLocalPathToRead() in order to get the index 
> file and temporary map output file.
> As LocalDirAllocator is tied up with ServetContext,  only one instance would 
> be available per tasktracker httpserver.  Given the jobid & mapid, 
> LocalDirAllocator retrieves index file path and temporary map output file 
> path. getLocalPathToRead() is internally synchronized.
> Introducing a LRUCache for this lookup reduces the contention heavily 
> (LRUCache with key =jobid +mapid and value=PATH to the file). Size of the 
> LRUCache can be varied based on the environment and I observed a throughput 
> improvement in the order of 4-7% with the introduction of LRUCache.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6

2011-06-13 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048963#comment-13048963
 ] 

Liyin Liang commented on MAPREDUCE-2510:


We plan to build our own jetty version based on 6.1.14, with the following 
patches to fix OOM bugs:
JETTY-1157, don't hold the array passed in write(byte[]).
JETTY-861, switch buffer pools to a ThreadLocal implementation.
JETTY-1188, null out old jobs in QueuedThreadPool.

It works well in our test cluster.

> TaskTracker throw OutOfMemoryError after upgrade to jetty6
> --
>
> Key: MAPREDUCE-2510
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Liyin Liang
>
> Our production cluster's TaskTracker sometimes throws OutOfMemoryError after 
> upgrading to jetty6. The exception in the TT's log is as follows:
> 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
> at 
> org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> Exceptions in .out file:
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap 
> space
> Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap 
> space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.reflect.InvocationTargetException
> Exception in thread "IPC Server handler 6 on 50050" at 
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126)
> at org.mortbay.log.Log.warn(Log.java:181)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
>

[jira] [Commented] (MAPREDUCE-143) OOM in the TaskTracker while serving map outputs

2011-06-06 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045237#comment-13045237
 ] 

Liyin Liang commented on MAPREDUCE-143:
---

bq. I think we ran into the same issue, any work around or config tweak to 
avoid running into this? Thanks. 
I have created MAPREDUCE-2510 for this problem. As Chris commented, Jetty 
6.1.26 does not have this behavior. However, Jetty 6.1.26 has its own bugs, 
MAPREDUCE-2529 and MAPREDUCE-2530, which are more serious than the OOM. 

> OOM in the TaskTracker while serving map outputs
> 
>
> Key: MAPREDUCE-143
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-143
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Devaraj Das
>
> Saw this exception in the TT logs:
> 2009-02-06 06:18:08,553 ERROR org.mortbay.log: EXCEPTION
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
> at 
> org.mortbay.io.nio.IndirectNIOBuffer.<init>(IndirectNIOBuffer.java:28)
> at 
> org.mortbay.jetty.nio.AbstractNIOConnector.newBuffer(AbstractNIOConnector.java:71)
> at 
> org.mortbay.jetty.AbstractBuffers.getBuffer(AbstractBuffers.java:131)
> at org.mortbay.jetty.HttpGenerator.addContent(HttpGenerator.java:145)
> at 
> org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:642)
> at 
> org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2879)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-2529) Recognize Jetty bug 1342 and handle it

2011-05-30 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041381#comment-13041381
 ] 

Liyin Liang commented on MAPREDUCE-2529:


After upgrading to Jetty 6.1.26, our production cluster met the same problem. 
Through observation, we found the TT throws lots of "java.io.IOException: 
Broken pipe" errors when serving map output, and Jetty prints logs as follows in this case.

2011-05-30 00:11:06,389 INFO org.mortbay.log: 
org.mortbay.io.nio.SelectorManager$SelectSet@6cf3a37f Busy selector - injecting 
delay 3 times

So we just grep for "Busy selector" in the TT's log to detect this bug. 
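
For example, a minimal detection sketch in Java (the default log path is an 
assumption; adjust it for your deployment):

{code:}
// Minimal sketch of the grep-based detection described above. The default
// log path is an assumption, not part of our setup.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BusySelectorCheck {
    public static void main(String[] args) throws IOException {
        String log = args.length > 0 ? args[0] : "/var/log/hadoop/tasktracker.log";
        long hits = 0;
        BufferedReader in = new BufferedReader(new FileReader(log));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains("Busy selector")) {
                    hits++;
                }
            }
        } finally {
            in.close();
        }
        if (hits > 0) {
            System.out.println("JETTY-1342 suspected: " + hits
                + " \"Busy selector\" lines in " + log);
        }
    }
}
{code}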

> Recognize Jetty bug 1342 and handle it
> --
>
> Key: MAPREDUCE-2529
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2529
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
>Affects Versions: 0.20.204.0, 0.23.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
> Attachments: jetty1342-20security.patch
>
>
> We are seeing many instances of the Jetty-1342 
> (http://jira.codehaus.org/browse/JETTY-1342). The bug doesn't cause Jetty to 
> stop responding altogether, some fetches go through but a lot of them throw 
> exceptions and eventually fail. The only way we have found to get the TT out 
> of this state is to restart the TT.  This jira is to catch this particular 
> exception (or perhaps a configurable regex) and handle it in an automated way 
> to either blacklist or shutdown the TT after seeing it a configurable number 
> of them.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6

2011-05-26 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039736#comment-13039736
 ] 

Liyin Liang commented on MAPREDUCE-2510:


After upgrading our production cluster's Jetty version to 6.1.26, the checkpoint 
became very slow. 

                   fsimage size   download time
Before upgrading   10G            2 mins
After upgrading    9.95G          15 mins

What's more, there are many "JVM BUG(s)" logs in NN's log file:
2011-05-26 22:46:48,807 INFO org.mortbay.log: 
org.mortbay.io.nio.SelectorManager$SelectSet@173ab5e JVM BUG(s) - injecting 
delay 59 times

2011-05-26 22:46:48,807 INFO org.mortbay.log: 
org.mortbay.io.nio.SelectorManager$SelectSet@173ab5e JVM BUG(s) - recreating 
selector 59 times, canceled keys 944 times

According to Jetty 6.1.26's code, Jetty's Selector sleeps for a while whenever it 
prints the above logs, which explains the slow download.
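
For reference, the behavior described above looks roughly like this (a sketch 
only, not Jetty's actual source; the pause and threshold constants are invented):

{code:}
import java.io.IOException;
import java.nio.channels.Selector;

// Rough sketch (not Jetty's actual source) of the 6.1.26 behavior described
// above: when select() keeps returning instantly with no ready keys, the
// loop logs, injects a short sleep, and eventually recreates the selector.
// PAUSE_MS and RECREATE_THRESHOLD are invented values.
public class SelectorWorkaroundSketch {
    private static final int PAUSE_MS = 50;
    private static final int RECREATE_THRESHOLD = 64;

    public static void loop(Selector selector)
            throws IOException, InterruptedException {
        int zeroReturns = 0;
        while (true) {
            long before = System.currentTimeMillis();
            int selected = selector.select(1000);
            long elapsed = System.currentTimeMillis() - before;
            if (selected == 0 && elapsed < 2) {
                // select() returned immediately with nothing ready
                zeroReturns++;
                System.out.println("JVM BUG(s) - injecting delay "
                    + zeroReturns + " times");
                Thread.sleep(PAUSE_MS); // these injected sleeps slow transfers
                if (zeroReturns >= RECREATE_THRESHOLD) {
                    System.out.println("JVM BUG(s) - recreating selector");
                    selector = Selector.open(); // simplified: real code re-registers keys
                    zeroReturns = 0;
                }
            } else {
                zeroReturns = 0;
                selector.selectedKeys().clear(); // real code would process the keys
            }
        }
    }
}
{code}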


> TaskTracker throw OutOfMemoryError after upgrade to jetty6
> --
>
> Key: MAPREDUCE-2510
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Liyin Liang
>
> Our production cluster's TaskTracker sometimes throws an OutOfMemoryError after 
> the upgrade to jetty6. The exception in the TT's log is as follows:
> 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
> at 
> org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> Exceptions in .out file:
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap 
> space
> Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap 
> space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.reflect.InvocationTargetException
> Exception in thread "IPC Server handler 6 on 50050" at 
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126)
> at org.mortbay.log.Log.warn(Log.java:181)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
>   

[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6

2011-05-25 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039487#comment-13039487
 ] 

Liyin Liang commented on MAPREDUCE-2510:


Hi Koji,
We just triggered this bug in our test cluster with Jetty 6.1.26. Could you please 
share your workaround?

> TaskTracker throw OutOfMemoryError after upgrade to jetty6
> --
>
> Key: MAPREDUCE-2510
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Liyin Liang
>
> Our production cluster's TaskTracker sometimes throws an OutOfMemoryError after 
> the upgrade to jetty6. The exception in the TT's log is as follows:
> 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
> at 
> org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> Exceptions in .out file:
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap 
> space
> Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap 
> space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.reflect.InvocationTargetException
> Exception in thread "IPC Server handler 6 on 50050" at 
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126)
> at org.mortbay.log.Log.warn(Log.java:181)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

--

[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6

2011-05-18 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035935#comment-13035935
 ] 

Liyin Liang commented on MAPREDUCE-2510:


Hi Chris,
I really appreciate your comments! Jetty 6.1.26 does free the references to 
Runnable instances. We'll upgrade our cluster's Jetty version ASAP.
Thanks again.

> TaskTracker throw OutOfMemoryError after upgrade to jetty6
> --
>
> Key: MAPREDUCE-2510
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Liyin Liang
>
> Our production cluster's TaskTracker sometimes throws an OutOfMemoryError after 
> the upgrade to jetty6. The exception in the TT's log is as follows:
> 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
> at 
> org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> Exceptions in .out file:
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap 
> space
> Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap 
> space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.reflect.InvocationTargetException
> Exception in thread "IPC Server handler 6 on 50050" at 
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126)
> at org.mortbay.log.Log.warn(Log.java:181)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.Queue

[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6

2011-05-17 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035185#comment-13035185
 ] 

Liyin Liang commented on MAPREDUCE-2510:


Hi Chris,
HADOOP-6882 upgrades the version of Jetty to 6.1.26. That jira has been checked in to 
the 0.20 branch. But I still don't know why 6.1.26 does not have this behavior.

> TaskTracker throw OutOfMemoryError after upgrade to jetty6
> --
>
> Key: MAPREDUCE-2510
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Liyin Liang
>
> Our production cluster's TaskTracker sometimes throws an OutOfMemoryError after 
> the upgrade to jetty6. The exception in the TT's log is as follows:
> 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
> at 
> org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> Exceptions in .out file:
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap 
> space
> Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap 
> space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.reflect.InvocationTargetException
> Exception in thread "IPC Server handler 6 on 50050" at 
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126)
> at org.mortbay.log.Log.warn(Log.java:181)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.Queued

[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6

2011-05-17 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035183#comment-13035183
 ] 

Liyin Liang commented on MAPREDUCE-2510:


Hi Chris, 
our Jetty version is 6.1.14, the same as trunk. Is there a jira about 
upgrading Jetty to 6.1.26? Why does Jetty 6.1.26 not have this behavior? I saw 
that Cloudera's cdh3u0 uses Jetty 6.1.26.

Thanks

> TaskTracker throw OutOfMemoryError after upgrade to jetty6
> --
>
> Key: MAPREDUCE-2510
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Liyin Liang
>
> Our production cluster's TaskTracker sometimes throws an OutOfMemoryError after 
> the upgrade to jetty6. The exception in the TT's log is as follows:
> 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
> at 
> org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> Exceptions in .out file:
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap 
> space
> Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap 
> space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.reflect.InvocationTargetException
> Exception in thread "IPC Server handler 6 on 50050" at 
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126)
> at org.mortbay.log.Log.warn(Log.java:181)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.jav

[jira] [Commented] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6

2011-05-17 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035175#comment-13035175
 ] 

Liyin Liang commented on MAPREDUCE-2510:


The following comments are copied from MAPREDUCE-143:
We dumped the heap of the TaskTracker and analyzed it with MAT. We found one instance 
of "org.mortbay.thread.QueuedThreadPool" occupying 853,258,184 bytes (72.51%). 
This object contains a "java.lang.Runnable[]" with 7200 elements.

The QueuedThreadPool of jetty6 owns an array of jobs. If an idle thread is 
available, a job is dispatched directly; otherwise the job is queued in the 
array. Initially the size of the array is _maxThreads (tasktracker.http.threads). 
When it is full, the size grows to array.length() + _maxThreads. Because this 
growth has no limit, the array can occupy too much memory when there are lots of 
fetch requests from reduce tasks. So is this a jetty6 bug?
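
To illustrate, here is a simplified sketch of that growth pattern (not Jetty's 
actual source; the dispatch helpers are placeholders):

{code:}
// Simplified sketch (not Jetty's actual source) of the growth pattern
// described above: hand a job to an idle thread if one exists, otherwise
// queue it in an array that grows by _maxThreads with no upper bound.
public class UnboundedJobQueueSketch {
    private Runnable[] jobs;       // queued jobs waiting for a thread
    private int size;              // number of queued jobs
    private final int maxThreads;  // e.g. tasktracker.http.threads

    public UnboundedJobQueueSketch(int maxThreads) {
        this.maxThreads = maxThreads;
        this.jobs = new Runnable[maxThreads]; // initial capacity = _maxThreads
    }

    public synchronized void dispatch(Runnable job) {
        if (idleThreadAvailable()) {
            runOnIdleThread(job); // fast path: hand off directly
            return;
        }
        if (size == jobs.length) {
            // Grow by another _maxThreads. Nothing caps this, so a burst of
            // shuffle fetches can make the array (and the Runnables it pins)
            // fill most of the heap, as in the MAT dump above.
            Runnable[] bigger = new Runnable[jobs.length + maxThreads];
            System.arraycopy(jobs, 0, bigger, 0, size);
            jobs = bigger;
        }
        jobs[size++] = job;
    }

    // Placeholders standing in for the pool's real thread bookkeeping.
    private boolean idleThreadAvailable() { return false; }
    private void runOnIdleThread(Runnable job) { job.run(); }
}
{code}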

> TaskTracker throw OutOfMemoryError after upgrade to jetty6
> --
>
> Key: MAPREDUCE-2510
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Liyin Liang
>
> Our production cluster's TaskTracker sometimes throws an OutOfMemoryError after 
> the upgrade to jetty6. The exception in the TT's log is as follows:
> 2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)
> at 
> org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
> at 
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:324)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
> Exceptions in .out file:
> java.lang.OutOfMemoryError: Java heap space
> Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap 
> space
> Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap 
> space
> java.lang.OutOfMemoryError: Java heap space
> java.lang.reflect.InvocationTargetException
> Exception in thread "IPC Server handler 6 on 50050" at 
> sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126)
> at org.mortbay.log.Log.warn(Log.java:181)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.jav

[jira] [Created] (MAPREDUCE-2510) TaskTracker throw OutOfMemoryError after upgrade to jetty6

2011-05-17 Thread Liyin Liang (JIRA)
TaskTracker throw OutOfMemoryError after upgrade to jetty6
--

 Key: MAPREDUCE-2510
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2510
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Liyin Liang


Our production cluster's TaskTracker sometimes throws an OutOfMemoryError after 
the upgrade to jetty6. The exception in the TT's log is as follows:
2011-05-17 19:16:40,756 ERROR org.mortbay.log: Error for /mapOutput

java.lang.OutOfMemoryError: Java heap space

at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)

at 
org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)

at 
org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)

at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)

at 
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)

at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)

at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)

at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)

at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)

at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)

at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)

at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)

at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:324)

at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)

at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)

at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)

at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)

at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)

at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

Exceptions in .out file:
java.lang.OutOfMemoryError: Java heap space

Exception in thread "process reaper" java.lang.OutOfMemoryError: Java heap space

Exception in thread "pool-1-thread-1" java.lang.OutOfMemoryError: Java heap 
space

java.lang.OutOfMemoryError: Java heap space

java.lang.reflect.InvocationTargetException

Exception in thread "IPC Server handler 6 on 50050" at 
sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source)

at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.mortbay.log.Slf4jLog.warn(Slf4jLog.java:126)

at org.mortbay.log.Log.warn(Log.java:181)

at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:449)

at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)

at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)

at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)

at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)

at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)

at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:324)

at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)

at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)

at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)

at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)

at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)

at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-143) OOM in the TaskTracker while serving map outputs

2011-05-12 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032367#comment-13032367
 ] 

Liyin Liang commented on MAPREDUCE-143:
---

The QueuedThreadPool of jetty6 owns an array of jobs. If an idle thread is 
available, a job is dispatched directly; otherwise the job is queued in the 
array. Initially the size of the array is _maxThreads (40). When it is full, the 
size grows to array.length() + _maxThreads. Because this growth has no limit, the 
array can occupy too much memory when there are lots of fetch requests from 
reduce tasks. So is this a jetty6 bug?

> OOM in the TaskTracker while serving map outputs
> 
>
> Key: MAPREDUCE-143
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-143
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Devaraj Das
>
> Saw this exception in the TT logs:
> 2009-02-06 06:18:08,553 ERROR org.mortbay.log: EXCEPTION
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
> at 
> org.mortbay.io.nio.IndirectNIOBuffer.<init>(IndirectNIOBuffer.java:28)
> at 
> org.mortbay.jetty.nio.AbstractNIOConnector.newBuffer(AbstractNIOConnector.java:71)
> at 
> org.mortbay.jetty.AbstractBuffers.getBuffer(AbstractBuffers.java:131)
> at org.mortbay.jetty.HttpGenerator.addContent(HttpGenerator.java:145)
> at 
> org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:642)
> at 
> org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
> at 
> org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2879)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-143) OOM in the TaskTracker while serving map outputs

2011-05-11 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13032254#comment-13032254
 ] 

Liyin Liang commented on MAPREDUCE-143:
---

Our cluster met a similar problem after the upgrade to jetty6.
Related log:
2011-05-11 16:24:26,914 ERROR org.mortbay.log: Error for /mapOutput

java.lang.OutOfMemoryError: Java heap space

at java.io.BufferedInputStream.<init>(BufferedInputStream.java:178)

at 
org.apache.hadoop.fs.BufferedFSInputStream.<init>(BufferedFSInputStream.java:44)
 

at 
org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:176)

at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:359)

at 
org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3040)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) 

at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)

at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)

at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)

at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)

at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)

at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)

at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)

at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 

at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:324)

at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)

at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)

at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)

at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)

at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)

at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
2011-05-11 17:31:39,376 ERROR org.mortbay.log: Error for /mapOutput

2011-05-11 17:31:45,523 ERROR org.mortbay.log: Error for /mapOutput

java.lang.OutOfMemoryError: Java heap space

jmap -heap result:
Heap Configuration:

   MinHeapFreeRatio = 40

   MaxHeapFreeRatio = 70

   MaxHeapSize  = 1610612736 (1536.0MB)

   NewSize  = 1310720 (1.25MB)

   MaxNewSize   = 17592186044415 MB

   OldSize  = 5439488 (5.1875MB)

   NewRatio = 2

   SurvivorRatio= 8

   PermSize = 21757952 (20.75MB)

   MaxPermSize  = 85983232 (82.0MB)



Heap Usage:

PS Young Generation

Eden Space:

   capacity = 61865984 (59.0MB)

   used = 61865984 (59.0MB)

   free = 0 (0.0MB)

   100.0% used

From Space:

   capacity = 178913280 (170.625MB)

   used = 11205368 (10.686271667480469MB)

   free = 167707912 (159.93872833251953MB)

   6.263016361893315% used

To Space:

   capacity = 178913280 (170.625MB)

   used = 0 (0.0MB)

   free = 178913280 (170.625MB)

   0.0% used

PS Old Generation

   capacity = 1073741824 (1024.0MB)

   used = 1073710024 (1023.9696731567383MB)

   free = 31800 (0.03032684326171875MB)

   99.99703839421272% used

PS Perm Generation

   capacity = 21757952 (20.75MB)

   used = 17614112 (16.798126220703125MB)

   free = 4143840 (3.951873779296875MB)

   80.95482516001506% used


We dumped the heap of the TaskTracker and analyzed it with MAT. We found one instance 
of "org.mortbay.thread.QueuedThreadPool" occupying 853,258,184 bytes (72.51%). 
This object contains a "java.lang.Runnable[]" with 7200 elements.

> OOM in the TaskTracker while serving map outputs
> 
>
> Key: MAPREDUCE-143
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-143
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Devaraj Das
>
> Saw this exception in the TT logs:
> 2009-02-06 06:18:08,553 ERROR org.mortbay.log: EXCEPTION
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2009-02-06 06:18:11,247 ERROR org.mortbay.log: Error for /mapOutput
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
> at 
> org.mortbay.io.nio.IndirectNIOBuffer.<init>(IndirectNIOBuffer.java:28)
> at 
> org.mortbay.jetty.nio.AbstractNIOConnector.newBuffer(AbstractNIOConnector.java:

[jira] Commented: (MAPREDUCE-2271) TestSetupTaskScheduling failing in trunk

2011-01-26 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987020#action_12987020
 ] 

Liyin Liang commented on MAPREDUCE-2271:


Hi Todd,
I think the test case testNumSlotsUsedForTaskCleanup is supposed to check 
that one task-cleanup task needs only one slot, even for high-RAM jobs. This test 
case creates a fake high-RAM job with one map task and one reduce task. Each 
task requires 2 slots. Then it checks that each heartbeat will schedule one 
task-cleanup task that needs only one slot. So it needn't create a dummy 
tracker status with FAILED_UNCLEAN tasks.
The result of the change in MAPREDUCE-2207 is that task-cleanup tasks can't 
be scheduled on trackers that have FAILED_UNCLEAN tasks to report during heartbeat, 
no matter which tracker the task failed on. This causes no task-cleanup task to 
be scheduled during heartbeat in the test case. The following code:
{code:}
List<Task> tasks = jobTracker.getSetupAndCleanupTasks(ttStatus);
{code}
will always return *null* whenever ttStatus has tasks with FAILED_UNCLEAN 
status. 
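
Roughly, the server-side check behaves like this (a method-level sketch 
assuming the old JobTracker's types; not the committed MAPREDUCE-2207 code, 
and pickSetupOrCleanupTasks is a hypothetical helper):

{code:}
// Illustrative sketch of the behavior described above: if the reporting
// tracker's status carries any FAILED_UNCLEAN task, no setup/cleanup task
// is handed out this heartbeat. Not the committed MAPREDUCE-2207 code.
List<Task> getSetupAndCleanupTasks(TaskTrackerStatus ttStatus) {
    for (TaskStatus ts : ttStatus.getTaskReports()) {
        if (ts.getRunState() == TaskStatus.State.FAILED_UNCLEAN) {
            return null; // the test's dummy status always hits this branch
        }
    }
    return pickSetupOrCleanupTasks(ttStatus); // hypothetical helper
}
{code}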

> TestSetupTaskScheduling failing in trunk
> 
>
> Key: MAPREDUCE-2271
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2271
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Liyin Liang
>Priority: Blocker
> Attachments: 2271-1.diff
>
>
> This test case is failing in trunk after the commit of MAPREDUCE-2207

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2271) TestSetupTaskScheduling failing in trunk

2011-01-23 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2271:
---

Attachment: 2271-1.diff

With [MAPREDUCE-2207|https://issues.apache.org/jira/browse/MAPREDUCE-2207], a 
tracker can't get any task-cleanup task if it has tasks in the _FAILED_UNCLEAN_ 
state. The _testNumSlotsUsedForTaskCleanup_ of _TestSetupTaskScheduling_ 
creates a dummy tracker status with two _FAILED_UNCLEAN_ tasks to report. So 
the jobtracker returns null when _getSetupAndCleanupTasks_ is called with this 
tracker status.
I think it's useless to add task statuses to the tracker status in that test 
case, because the job already has two task-setup tasks to schedule and the 
job's two tasks' statuses are _FAILED_UNCLEAN_. In other words, the job's task 
statuses need not be updated.
So we can just remove the _addNewTaskStatus_ code, as in 2271-1.diff.

> TestSetupTaskScheduling failing in trunk
> 
>
> Key: MAPREDUCE-2271
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2271
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Priority: Blocker
> Attachments: 2271-1.diff
>
>
> This test case is failing in trunk after the commit of MAPREDUCE-2207

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2011-01-19 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983984#action_12983984
 ] 

Liyin Liang commented on MAPREDUCE-2207:


Hi Todd,
The committed patch is exactly the same one I ran the tests against. Maybe I made 
some mistakes when running "ant test".  I'll work on [MAPREDUCE-2271| 
https://issues.apache.org/jira/browse/MAPREDUCE-2271] to fix 
TestSetupTaskScheduling. By the way, I don't understand why the result of "ant 
test-patch" was +1.

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
>Assignee: Liyin Liang
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff, 2207-3.diff, 
> 2207-3.diff, ant-test.txt
>
>
> Currently the task-cleanup task always goes to the same node where the task just 
> failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-2264) Job status exceeds 100% in some cases

2011-01-13 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12981623#action_12981623
 ] 

Liyin Liang commented on MAPREDUCE-2264:


I think [HADOOP-5210|https://issues.apache.org/jira/browse/HADOOP-5210] has 
fixed this bug.

> Job status exceeds 100% in some cases 
> --
>
> Key: MAPREDUCE-2264
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2264
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Reporter: Adam Kramer
>
> I'm looking now at my jobtracker's list of running reduce tasks. One of them 
> is 120.05% complete, the other is 107.28% complete.
> I understand that these numbers are estimates, but there is no case in which 
> an estimate of 100% for a non-complete task is better than an estimate of 
> 99.99%, nor is there any case in which an estimate greater than 100% is valid.
> I suggest that whatever logic is computing these set 99.99% as a hard maximum.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2011-01-07 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2207:
---

Attachment: ant-test.txt

Hi Scott, 
The ant-test.txt file is the result of "ant test". The result of "ant 
test-patch" is as follows:

 [exec] +1 overall.  
 [exec] 
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 1.3.9) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 system test framework.  The patch passed system test 
framework compile.


> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff, 2207-3.diff, 
> 2207-3.diff, ant-test.txt
>
>
> Currently the task-cleanup task always goes to the same node where the task just 
> failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2011-01-03 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2207:
---

Attachment: 2207-3.diff

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff, 2207-3.diff, 
> 2207-3.diff
>
>
> Currently the task-cleanup task always goes to the same node where the task just 
> failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2011-01-03 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2207:
---

Attachment: 2207-3.diff

Hi Scott,
I agree with you. According to 
[MAPREDUCE-2118|https://issues.apache.org/jira/browse/MAPREDUCE-2118], maybe 
getJobSetupAndCleanupTasks will not hold the JT lock in the future.

I have changed the name of the method hasFailedAndNeedCleanupTask() to 
hasFailedUncleanTask(). Thanks for your advice.

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff, 2207-3.diff
>
>
> Currently the task-cleanup task always goes to the same node where the task just 
> failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2011-01-02 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2207:
---

Attachment: 2207-2.diff

Moved the logic to the server side according to Scott's comment.

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff, 2207-1.diff, 2207-2.diff
>
>
> Currently the task-cleanup task always goes to the same node where the task just 
> failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2010-12-27 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975403#action_12975403
 ] 

Liyin Liang commented on MAPREDUCE-2207:


Hi Scott,
If we move this logic to the server side, every heartbeat has to call 
hasFailedAndNeedCleanupTaskToReport() inside the JobTracker lock. Would there 
be a performance loss on the JT?

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff, 2207-1.diff
>
>
> Currently the task-cleanup task always goes to the same node where the task just 
> failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2010-12-26 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2207:
---

Attachment: 2207-1.diff

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff, 2207-1.diff
>
>
> Currently the task-cleanup task always goes to the same node where the task just 
> failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2010-12-26 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2207:
---

Release Note: Task-cleanup task should not be scheduled on the node that 
the task just failed
  Status: Patch Available  (was: Open)

Patch with a unit test for trunk. The patch just adds an _assert_ in 
TestTaskFail.java to test the feature.

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff
>
>
> Currently the task-cleanup task always goes to the same node where the task just 
> failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-2026) JobTracker.getJobCounters() should not hold JobTracker lock while calling JobInProgress.getCounters()

2010-12-16 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972349#action_12972349
 ] 

Liyin Liang commented on MAPREDUCE-2026:


Hi Joydeep, your patch moved incrementTaskCounters out of the JobInProgress lock 
in getCounters(). Should we do the same thing in 
getMapCounters() and getReduceCounters()?
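
For reference, the lock-narrowing pattern in question looks roughly like this 
(a schematic sketch with assumed field and accessor names, not the actual patch):

{code:}
// Schematic sketch of the lock-narrowing pattern under discussion
// (assumed field/accessor names, not the actual JobInProgress source):
// snapshot the tasks under the lock, then aggregate counters outside it.
Counters getCountersSketch() {
    TaskInProgress[] snapshot;
    synchronized (this) { // hold the JobInProgress lock only briefly
        snapshot = tasks.toArray(new TaskInProgress[tasks.size()]);
    }
    Counters result = new Counters(); // expensive aggregation, done lock-free
    for (TaskInProgress tip : snapshot) {
        result.incrAllCounters(tip.getCounters()); // assumed accessor
    }
    return result;
}
{code}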

> JobTracker.getJobCounters() should not hold JobTracker lock while calling 
> JobInProgress.getCounters()
> -
>
> Key: MAPREDUCE-2026
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2026
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Scott Chen
>Assignee: Joydeep Sen Sarma
> Fix For: 0.22.0
>
> Attachments: 2026.1.patch, MAPREDUCE-2026.txt
>
>
> JobTracker.getJobCounter() will lock JobTracker and call 
> JobInProgress.getCounters().
> JobInProgress.getCounters() can be very expensive because it aggregates all 
> the task counters.
> We found from the JobTracker jstacks that this method is one of the 
> bottlenecks of JobTracker performance.
> JobInProgress.getCounters() should be able to be called outside the 
> JobTracker lock because it already holds the JobInProgress lock.
> For example, it is used by jobdetails.jsp without a JobTracker lock.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2010-12-13 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971126#action_12971126
 ] 

Liyin Liang commented on MAPREDUCE-2207:


Hi Scott,
I'm happy to work on this JIRA and provide a patch with a unit test for trunk. 

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff
>
>
> Currently the task-cleanup task always goes to the same node where the task 
> just failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2010-12-12 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2207:
---

Attachment: 0.19.1.diff

Patch for Hadoop 0.19.1.

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
> Attachments: 0.19.1.diff
>
>
> Currently the task-cleanup task always goes to the same node where the task 
> just failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-2207) Task-cleanup task should not be scheduled on the node that the task just failed

2010-12-12 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970709#action_12970709
 ] 

Liyin Liang commented on MAPREDUCE-2207:


Hi Scott,
Our production cluster hit a similar problem with the job setup task. To fix 
it, we made the TaskTracker not ask for a new task in the heartbeat that 
reports a failed job setup/cleanup task. I'll attach our patch, which is based 
on 0.19.1.
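
A minimal sketch of the idea, paraphrased rather than copied from our 0.19.1 
patch; isFailedSetupOrCleanup() is a hypothetical helper:
{code:borderStyle=solid}
// Sketch of the fix, not the exact 0.19.1 patch. When this heartbeat
// reports a FAILED job setup/cleanup attempt, do not ask for a new task,
// so the JobTracker cannot hand the retry straight back to this node.
boolean reportingFailedSetupOrCleanup = false;
for (TaskStatus taskStatus : status.getTaskReports()) {
  if (isFailedSetupOrCleanup(taskStatus)) {  // hypothetical helper
    reportingFailedSetupOrCleanup = true;
    break;
  }
}
boolean askForNewTask =
    enoughFreeSpace(localMinSpaceStart) && !reportingFailedSetupOrCleanup;
// askForNewTask is then passed to InterTrackerProtocol.heartbeat(...).
{code}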

> Task-cleanup task should not be scheduled on the node that the task just 
> failed
> ---
>
> Key: MAPREDUCE-2207
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2207
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobtracker
>Affects Versions: 0.23.0
>Reporter: Scott Chen
> Fix For: 0.23.0
>
>
> Currently the task-cleanup task always goes to the same node where the task 
> just failed.
> There is a higher chance that it hits a bad node. This should be changed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS

2010-12-08 Thread Liyin Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Liang updated MAPREDUCE-2209:
---

Description: 
If a job's jar file is very large, e.g. 200 MB+, the TaskTracker's heartbeat 
hangs for several minutes while localizing the job. The jstacks of the related 
threads are as follows:
{code:borderStyle=solid}
"TaskLauncher for task" daemon prio=10 tid=0x002b05ee5000 nid=0x1adf 
runnable [0x42e56000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x002afc892ec8> (a sun.nio.ch.Util$1)
- locked <0x002afc892eb0> (a java.util.Collections$UnmodifiableSet)
- locked <0x002afc8927d8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked <0x002afce26158> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readShort(DataInputStream.java:295)
at 
org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556)
- locked <0x002afce26218> (a 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
- locked <0x002afce26218> (a 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824)
- locked <0x002afce2d260> (a 
org.apache.hadoop.mapred.TaskTracker$RunningJob)
at 
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745)
at 
org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103)
at 
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710)

"Map-events fetcher for all reduce tasks on 
tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 
tid=0x002b05ef8000 
nid=0x1ada waiting for monitor entry [0x42d55000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582)
- waiting to lock <0x002afce2d260> (a 
org.apache.hadoop.mapred.TaskTracker$RunningJob)
at 
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617)
- locked <0x002a9eefe1f8> (a java.util.TreeMap)


"IPC Server handler 2 on 50050" daemon prio=10 tid=0x002b050eb000 
nid=0x1ab0 waiting for monitor entry [0x4234b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:2684)
- waiting to lock <0x002a9eefe1f8> (a java.util.TreeMap)
- locked <0x002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

"main" prio=10 tid=0x40113800 nid=0x197d waiting for monitor entry 
[0x4022a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1196)
- waiting to lock <0x002a9eac1de8> (a 
org.apache.hadoop.mapred.TaskTracker)
at 
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1068)
at org.apache.hadoop.

[jira] Commented: (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS

2010-12-08 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969238#action_12969238
 ] 

Liyin Liang commented on MAPREDUCE-2209:


I set up a cluster with the latest version, 0.21.0. To simulate the large 
job.jar problem, I let the TaskLauncher thread sleep for 100 seconds just 
before downloading job.jar in the localizeJobJarFile() function. The heartbeat 
of some TaskTrackers then hangs for almost 100 seconds. Basically, the jstack 
is the same as on 0.19:
{code:borderStyle=solid}
"TaskLauncher for MAP tasks" daemon prio=10 tid=0x2aab3145a800 nid=0x3fe8 
waiting on condition [0x440b3000..0x440b3a10]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJobJarFile(TaskTracker.java:1150)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1074)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
- locked <0x2aaab3a86f10> (a 
org.apache.hadoop.mapred.TaskTracker$RunningJob)
at 
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2248)
at 
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2213)

"Map-events fetcher for all reduce tasks on 
tracker_hd2:localhost.localdomain/127.0.0.1:36128" daemon prio=10 tid=0x2aab
31451c00 nid=0x3fde waiting for monitor entry 
[0x41a4..0x41a40d90]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:800)
- waiting to lock <0x2aaab3a86f10> (a 
org.apache.hadoop.mapred.TaskTracker$RunningJob)
at 
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:834)
- locked <0x2aaab38ee1b8> (a java.util.TreeMap)

"IPC Server handler 0 on 36128" daemon prio=10 tid=0x4368ac00 
nid=0x3fc8 waiting for monitor entry [0x425f6000..0x425
f7c90]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:3254)
- waiting to lock <0x2aaab38ee1b8> (a java.util.TreeMap)
- locked <0x2aaab37f1708> (a org.apache.hadoop.mapred.TaskTracker)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:342)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1350)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1346)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:742)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1344)

"main" prio=10 tid=0x42fff400 nid=0x3f91 waiting for monitor entry 
[0x41ef..0x41ef0ed0]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracker.java:1535)
- waiting to lock <0x2aaab37f1708> (a 
org.apache.hadoop.mapred.TaskTracker)
at 
org.apache.hadoop.mapred.TaskTracker.offerService(TaskTracker.java:1433)
at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:2330)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3462)
{code}
Lock order of the related threads:
- TaskLauncher (localizeJobJarFile): holds RunningJob
- Map-events fetcher: holds runningJobs, waiting to lock RunningJob
- IPC Server handler (getMapCompletionEvents): holds TaskTracker, waiting to 
lock runningJobs
- main (transmitHeartBeat): waiting to lock TaskTracker
So the TaskTracker lock is held indirectly while job.jar is being downloaded.
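
To make the chain concrete, here is a toy Java sketch of the four monitors 
involved; the object names mirror the jstack, and downloadJobJar() and 
sendHeartbeat() are placeholders rather than TaskTracker code:
{code:borderStyle=solid}
// Toy illustration of the blocking chain above; each block runs on the
// thread named in its comment. While the TaskLauncher holds the RunningJob
// monitor during the slow job.jar download, every other thread blocks on a
// lock held earlier in the chain, so the heartbeat stalls.
final Object runningJob  = new Object(); // per-job RunningJob
final Object runningJobs = new Object(); // the TreeMap of running jobs
final Object taskTracker = new Object(); // the TaskTracker instance

new Thread(() -> {            // TaskLauncher (localizeJob)
  synchronized (runningJob) { downloadJobJar(); /* slow HDFS copy */ }
}).start();
new Thread(() -> {            // Map-events fetcher
  synchronized (runningJobs) { synchronized (runningJob) { /* ... */ } }
}).start();
new Thread(() -> {            // IPC handler (getMapCompletionEvents)
  synchronized (taskTracker) { synchronized (runningJobs) { /* ... */ } }
}).start();
synchronized (taskTracker) {  // main (transmitHeartBeat)
  sendHeartbeat();            // hangs until the download completes
}
{code}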

> TaskTracker's heartbeat hang for several minutes when copying large job.jar 
> from HDFS
> -
>
> Key: MAPREDUCE-2209
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
> Environment: hadoop version: 0.19.1
>Reporter: Liyin Liang
>Priority: Blocker
>
> If a job's jar file is very large, e.g. 200 MB+, the TaskTracker's heartbeat 
> hangs for several minutes while localizing the job. The jstack 

[jira] Created: (MAPREDUCE-2209) TaskTracker's heartbeat hang for several minutes when copying large job.jar from HDFS

2010-12-03 Thread Liyin Liang (JIRA)
TaskTracker's heartbeat hang for several minutes when copying large job.jar 
from HDFS
-

 Key: MAPREDUCE-2209
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2209
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: hadoop version: 0.19.1
Reporter: Liyin Liang
Priority: Blocker


If a job's jar file is very large, e.g. 200 MB+, the TaskTracker's heartbeat 
hangs for several minutes while localizing the job. The jstacks of the related 
threads are as follows:

"TaskLauncher for task" daemon prio=10 tid=0x002b05ee5000 nid=0x1adf 
runnable [0x42e56000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x002afc892ec8> (a sun.nio.ch.Util$1)
- locked <0x002afc892eb0> (a java.util.Collections$UnmodifiableSet)
- locked <0x002afc8927d8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at 
org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:260)
at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:155)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:150)
at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:123)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked <0x002afce26158> (a java.io.BufferedInputStream)
at java.io.DataInputStream.readShort(DataInputStream.java:295)
at 
org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1304)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1556)
- locked <0x002afce26218> (a 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
at 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1673)
- locked <0x002afce26218> (a 
org.apache.hadoop.hdfs.DFSClient$DFSInputStream)
at java.io.DataInputStream.read(DataInputStream.java:83)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:209)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:142)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1214)
at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1195)
at 
org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:824)
- locked <0x002afce2d260> (a 
org.apache.hadoop.mapred.TaskTracker$RunningJob)
at 
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1745)
at 
org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:103)
at 
org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1710)

"Map-events fetcher for all reduce tasks on 
tracker_r01a08025:localhost/127.0.0.1:50050" daemon prio=10 
tid=0x002b05ef8000 
nid=0x1ada waiting for monitor entry [0x42d55000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.reducesInShuffle(TaskTracker.java:582)
- waiting to lock <0x002afce2d260> (a 
org.apache.hadoop.mapred.TaskTracker$RunningJob)
at 
org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:617)
- locked <0x002a9eefe1f8> (a java.util.TreeMap)


"IPC Server handler 2 on 50050" daemon prio=10 tid=0x002b050eb000 
nid=0x1ab0 waiting for monitor entry [0x4234b000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:2684)
- waiting to lock <0x002a9eefe1f8> (a java.util.TreeMap)
- locked <0x002a9eac1de8> (a org.apache.hadoop.mapred.TaskTracker)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)

"main" prio=10 tid=0x40113800 nid=0x197d waiting for monitor entry 
[0x4022a000]
   java.lang.Thread.State: BLOCKED (on object monitor)
at 
org.apache.hadoop.mapred.TaskTracker.transmitHeartBeat(TaskTracke

[jira] Created: (MAPREDUCE-2168) We should implement limits on shuffle connections to TaskTracker per job

2010-11-01 Thread Liyin Liang (JIRA)
We should implement limits on shuffle connections to TaskTracker per job
-

 Key: MAPREDUCE-2168
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2168
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Liyin Liang


Because the outputs of trailing map tasks are fetched by all reduces 
simultaneously, all the worker threads of a TaskTracker's HTTP server may be 
occupied by one job's reduce tasks fetching map outputs. This TaskTracker's 
iowait and load then become very high (100+ in our cluster, where 
tasktracker.http.threads is set to 100). What's more, other jobs' reduces have 
to wait some time (maybe several minutes) to connect to the TaskTracker to 
fetch their maps' outputs.
So I think we should implement limits on shuffle connections:
1. limit the number (or percentage) of worker threads occupied by the same 
job's reduces (a rough sketch follows below);
2. limit the number of worker threads serving the same map output 
simultaneously.
Thoughts?

ps: we are using Hadoop 0.19.
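
As a rough illustration of option 1, the TaskTracker's map-output servlet 
could gate each job's fetches with a per-job counting semaphore. This is a 
sketch under assumptions: MAX_THREADS_PER_JOB and getJobId() are made up, and 
sendMapOutput() stands in for the existing transfer code.
{code:borderStyle=solid}
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import javax.servlet.http.*;

// Sketch only: cap how many shuffle worker threads one job may occupy.
public class MapOutputServletSketch extends HttpServlet {
  private static final int MAX_THREADS_PER_JOB = 30;  // assumed limit
  private final ConcurrentHashMap<String, Semaphore> perJobPermits =
      new ConcurrentHashMap<String, Semaphore>();

  public void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    String jobId = getJobId(req);
    perJobPermits.putIfAbsent(jobId, new Semaphore(MAX_THREADS_PER_JOB));
    if (!perJobPermits.get(jobId).tryAcquire()) {
      // Too many concurrent fetches for this job: make the reduce back
      // off and retry instead of tying up another worker thread.
      resp.sendError(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
      return;
    }
    try {
      sendMapOutput(req, resp);
    } finally {
      perJobPermits.get(jobId).release();
    }
  }

  private String getJobId(HttpServletRequest req) {
    return req.getParameter("job");  // hypothetical request parameter
  }

  private void sendMapOutput(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // placeholder for the existing map-output transfer code
  }
}
{code}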

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1943) Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes

2010-10-11 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920079#action_12920079
 ] 

Liyin Liang commented on MAPREDUCE-1943:


Your latest patch is based on your previous patch. Why is that?

> Implement limits on per-job JobConf, Counters, StatusReport, Split-Sizes
> 
>
> Key: MAPREDUCE-1943
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1943
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
> Fix For: 0.22.0
>
> Attachments: MAPREDUCE-1943-0.20-yahoo.patch, 
> MAPREDUCE-1943-0.20-yahoo.patch, MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S-fix.patch, 
> MAPREDUCE-1943-yahoo-hadoop-0.20S.patch
>
>
> We have come across issues in production clusters wherein users abuse 
> counters, status-report messages, and split sizes. One such case was when one 
> of the users had 100 million counters. This leads to the JobTracker going out 
> of memory and becoming unresponsive. In this JIRA I am proposing to put sane 
> limits on the status report length, the number of counters, and the size of 
> block locations returned by the input split. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1533) Reduce or remove usage of String.format() usage in CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()

2010-08-31 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904607#action_12904607
 ] 

Liyin Liang commented on MAPREDUCE-1533:


The patch has a small problem.
I think lines 257-259 of Counters.java should be:
{code:borderStyle=solid}
for (String subcounter : subcountersArray) {
  builder.append(subcounter);
}
{code}
instead of:
{code:borderStyle=solid}
for (Counter counter : subcounters.values()) {
  builder.append(counter.makeEscapedCompactString());
}
{code}

> Reduce or remove usage of String.format() usage in 
> CapacityTaskScheduler.updateQSIObjects and Counters.makeEscapedString()
> --
>
> Key: MAPREDUCE-1533
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobtracker
>Affects Versions: 0.20.1
>Reporter: Rajesh Balamohan
>Assignee: Dick King
> Fix For: 0.22.0
>
> Attachments: mapreduce-1533--2010-05-10a.patch, 
> mapreduce-1533--2010-05-21.patch, mapreduce-1533--2010-05-21a.patch, 
> mapreduce-1533--2010-05-24.patch, MAPREDUCE-1533-and-others-20100413.1.txt, 
> MAPREDUCE-1533-and-others-20100413.bugfix.txt, mapreduce-1533-v1.4.patch, 
> mapreduce-1533-v1.8.patch
>
>
> When short jobs are executed in Hadoop with OutOfBandHeartBeat=true, the JT 
> executes the heartBeat() method heavily. This internally makes a call to 
> CapacityTaskScheduler.updateQSIObjects(), which in turn calls String.format() 
> to set the job scheduling information. Depending on the sizes of the 
> "jobQueuesManager" and "queueInfoMap" data structures, the number of times 
> String.format() gets executed becomes very high. String.format() internally 
> does pattern matching, which turns out to be very heavy. (This was revealed 
> while profiling the JT: almost 57% of the time was spent in 
> CapacityScheduler.assignTasks(), out of which String.format() took 46%.)
> Would it be possible to do String.format() only at the time of invoking 
> JobInProgress.getSchedulingInfo()? This might reduce the pressure on the JT 
> while processing heartbeats. 
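
For what it's worth, a minimal sketch of the lazy-formatting idea suggested 
above; the field and method names are simplified and do not come from the 
attached patches:
{code:borderStyle=solid}
// Sketch: store raw numbers on every heartbeat (cheap) and only build
// the formatted string when the scheduling info is actually requested,
// e.g. by the web UI. Names are simplified, not from the patches.
private volatile int runningMaps;
private volatile int runningReduces;

// Hot path, called per heartbeat: no String.format() here.
void updateSchedulingNumbers(int maps, int reduces) {
  runningMaps = maps;
  runningReduces = reduces;
}

// Cold path, called only when the info is displayed.
String getSchedulingInfo() {
  return String.format("%d running map tasks, %d running reduce tasks",
                       runningMaps, runningReduces);
}
{code}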

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (MAPREDUCE-1247) Send out-of-band heartbeat to avoid fake lost tasktracker

2010-08-29 Thread Liyin Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904091#action_12904091
 ] 

Liyin Liang commented on MAPREDUCE-1247:


Hi Guanyin, our production cluster hit the same problem. Would you please 
attach your patch file? Thanks.

> Send out-of-band heartbeat to avoid fake lost tasktracker
> -
>
> Key: MAPREDUCE-1247
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1247
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: ZhuGuanyin
>Assignee: ZhuGuanyin
>
> Currently the TaskTracker reports task status to the JobTracker through the 
> heartbeat. Sometimes the TaskTracker locks itself to do some cleanup job, 
> like removing task temp data on disk, and the heartbeat thread then hangs for 
> a long time while waiting for the lock, so the JobTracker just thinks the 
> TaskTracker is lost and reschedules all of its finished maps and unfinished 
> reduces on other TaskTrackers. We call this a "fake lost tasktracker", and 
> sometimes it is not acceptable, especially when we run some large jobs. So we 
> introduce an out-of-band heartbeat mechanism to send an out-of-band heartbeat 
> in that case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.