[jira] [Updated] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error

2012-03-18 Thread Arun C Murthy (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy updated MAPREDUCE-4030:
-

Target Version/s: 0.23.2

> If the nodemanager on which the maptask is executed is going down before the 
> mapoutput is consumed by the reducer,then the job is failing with shuffle 
> error
> 
>
> Key: MAPREDUCE-4030
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: Nishan Shetty
>
> My cluster has 2 NMs.
> The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1.
> While the job was running and the mappers were about 99% complete, one of 
> the NMs went down.
> The job failed with the following trace:
> "Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
>   at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143)
> Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
>   at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
>   at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error

2012-03-18 Thread Arun C Murthy (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232479#comment-13232479
 ] 

Arun C Murthy commented on MAPREDUCE-4030:
--

Nishan - can you please check the reducer log and the AM log to verify whether 
the AM was notified of the map and whether it was re-run? That should happen 
before the reduce bails out. Thanks.






[jira] [Created] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error

2012-03-18 Thread Nishan Shetty (Created) (JIRA)
If the nodemanager on which the maptask is executed is going down before the 
mapoutput is consumed by the reducer,then the job is failing with shuffle error


 Key: MAPREDUCE-4030
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: Nishan Shetty


My cluster has 2 NMs.
The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1.
While the job was running and the mappers were about 99% complete, one of the 
NMs went down.
The job failed with the following trace:

"Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
  at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
  at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
  at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
  at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240)
  at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)"
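
To make the failure mode in the trace concrete, here is a minimal sketch of a reducer-side counter that gives up once too many distinct map outputs have failed to copy. It is not Hadoop's ShuffleScheduler: the class name, the method names other than the idea of "copyFailed", and the configurable limit are all assumptions made for illustration, and the real health check may weigh other factors.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Hadoop code: tracks the distinct map attempts
// whose output a reducer failed to fetch.
final class FetchFailureTracker {
    private final int maxFailedUniqueFetches; // assumed limit, chosen by the caller
    private final Set<String> failedMaps = new HashSet<>();

    FetchFailureTracker(int maxFailedUniqueFetches) {
        this.maxFailedUniqueFetches = maxFailedUniqueFetches;
    }

    // Record one failed copy. Repeated failures for the same map count once.
    void copyFailed(String mapAttemptId) throws IOException {
        failedMaps.add(mapAttemptId);
        if (failedMaps.size() >= maxFailedUniqueFetches) {
            throw new IOException("Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.");
        }
    }

    int uniqueFailures() {
        return failedMaps.size();
    }
}
```

Under this simplified model, a node going down makes every map output hosted on it unreachable at once, so the distinct-failure count can reach the limit quickly on a two-node cluster.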





[jira] [Commented] (MAPREDUCE-4008) ResourceManager throws MetricsException on start up saying QueueMetrics MBean already exists

2012-03-18 Thread Devaraj K (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232425#comment-13232425
 ] 

Devaraj K commented on MAPREDUCE-4008:
--

{quote}
-1 tests included. The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.
{quote}
No new tests are needed, since the patch only logs the exception when this 
issue happens.





> ResourceManager throws MetricsException on start up saying QueueMetrics MBean 
> already exists
> 
>
> Key: MAPREDUCE-4008
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4008
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2, resourcemanager, scheduler
>Affects Versions: 0.24.0
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: MAPREDUCE-4008.patch
>
>
> {code:xml}
> 2012-03-14 15:22:23,089 WARN org.apache.hadoop.metrics2.util.MBeans: Error 
> creating MBean object name: 
> Hadoop:service=ResourceManager,name=QueueMetrics,q0=default
> org.apache.hadoop.metrics2.MetricsException: 
> org.apache.hadoop.metrics2.MetricsException: 
> Hadoop:service=ResourceManager,name=QueueMetrics,q0=default already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:117)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:102)
>   at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:91)
>   at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:55)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:218)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:93)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:243)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:227)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:288)
>   at $Proxy6.postStart(Unknown Source)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:183)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:155)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54)
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.start(ResourceManager.java:454)
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:588)
> Caused by: org.apache.hadoop.metrics2.MetricsException: 
> Hadoop:service=ResourceManager,name=QueueMetrics,q0=default already exists!
>   at 
> org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:113)
>   ... 19 more
> 2012-03-14 15:22:23,090 WARN org.apache.hadoop.metrics2.util.MBeans: Failed 
> to register MBean "null"
> javax.management.RuntimeOperationsException: Exception occurred trying to 
> register the MBean
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:969)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:917)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482)
>   at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:218)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:93)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:243)
>   at 
> org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:227)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcces

[jira] [Commented] (MAPREDUCE-3992) Reduce fetcher doesn't verify HTTP status code of response

2012-03-18 Thread Steve Loughran (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232303#comment-13232303
 ] 

Steve Loughran commented on MAPREDUCE-3992:
---

+1
* I understand why there aren't tests here; it's very hard to set up a test for 
this.
* I've reviewed the code and it looks fairly straightforward: it bails out early 
on a response != 200.
* If there's no error code checking in one part of the system, there's always the 
risk the same thing has happened elsewhere, which is something that we need to 
keep an eye out for.


> Reduce fetcher doesn't verify HTTP status code of response
> --
>
> Key: MAPREDUCE-3992
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3992
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1
>Affects Versions: 0.23.1, 0.24.0, 1.0.1
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Attachments: mr-3992.txt
>
>
> Currently, the reduce fetch code doesn't check the HTTP status code of the 
> response. This can lead to the following situation:
> - the map output servlet gets an IOException after setting the headers but 
> before the first call to flush()
> - this causes it to send a response with a non-OK result code, including the 
> exception text as the response body (response.sendError() does this if the 
> response isn't committed)
> - it will still include the response headers indicating it's a valid response
> In the case of a merge-to-memory, the compression codec might then try to 
> interpret the HTML response as compressed data, resulting in either a huge 
> allocation (OOME) or some other nasty error. This bug seems to be present in 
> MR1, but I haven't checked trunk/MR2 yet.
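
The check described above can be sketched in a few lines. This is not the attached patch: the class and method names are hypothetical, and the only library calls used are the standard java.net.HttpURLConnection accessors. The idea is simply to reject any response that is not 200 OK before its body reaches a decompressor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

// Hypothetical helper, not taken from the patch.
final class FetchStatusCheck {
    // Throws if the status is anything other than 200 OK.
    static void verifyStatus(int statusCode, String statusMessage) throws IOException {
        if (statusCode != HttpURLConnection.HTTP_OK) {
            throw new IOException("Got invalid response code " + statusCode
                    + " from map output server: " + statusMessage);
        }
    }

    // Check the status first, and only then expose the response body.
    static InputStream openVerified(HttpURLConnection conn) throws IOException {
        verifyStatus(conn.getResponseCode(), conn.getResponseMessage());
        return conn.getInputStream();
    }
}
```

With a guard like this, an HTML error page would surface as an IOException rather than being interpreted as compressed map output.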





[jira] [Commented] (MAPREDUCE-4028) Hadoop Capacity-Scheduler

2012-03-18 Thread cldoltd (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232287#comment-13232287
 ] 

cldoltd commented on MAPREDUCE-4028:


thank you!
my job has run successfully

> Hadoop Capacity-Scheduler
> -
>
> Key: MAPREDUCE-4028
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4028
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/capacity-sched
>Affects Versions: 1.0.0
>Reporter: cldoltd
>
> I configured the capacity scheduler successfully,
> but when I run a job I get this error:
>   
> Queue "default" does not exist
> The full error is:
> 12/01/18 16:21:04 ERROR security.UserGroupInformation: 
> PriviledgedActionException as:adtech 
> cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
> java.io.IOException: Queue "default" does not exist
> at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3943)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
> Caused by: java.io.IOException: Queue "default" does not exist
> at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:437)
> at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3941)
> ... 11 more
> I checked the queues with hadoop queue:
> hadoop queue -showacls
> Queue acls for user :  john
> Queue  Operations
> =
> queue1  submit-job,administer-jobs
> queue2  submit-job,administer-jobs
> queue3  submit-job,administer-jobs
> queue4  submit-job,administer-jobs
> queue5  submit-job,administer-jobs
> queue6  submit-job,administer-jobs
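> My config in mapred-site.xml:
> <property>
>   <name>mapred.jobtracker.taskScheduler</name>
>   <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
> </property>
> <property>
>   <name>mapred.queue.names</name>
>   <value>queue1,queue2,queue3,queue4,queue5,queue6</value>
> </property>
> <property>
>   <name>mapred.acls.enabled</name>
>   <value>false</value>
> </property>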





[jira] [Created] (MAPREDUCE-4029) NodeManager status web page should express 'last update' times as seconds ago

2012-03-18 Thread Harsh J (Created) (JIRA)
NodeManager status web page should express 'last update' times as seconds ago
-

 Key: MAPREDUCE-4029
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4029
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: webapps
Affects Versions: 0.23.1
Reporter: Harsh J


The 'Last health update' field on the MR2 apps' nodes page (at 
http://host:8088/cluster/nodes) is currently a raw timestamp, which doesn't 
convey what the field means. It ought to be shown as seconds elapsed since 
now(), as was the case in the JobTracker for heartbeats.
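
As a rough illustration of the requested display, the conversion from a stored timestamp to a "seconds ago" string could look like the sketch below. The class name, the method name, and the exact output wording are assumptions, not the web app's code.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical formatter for a 'last update' column.
final class LastUpdateFormat {
    // Both arguments are epoch milliseconds. A timestamp in the future is
    // clamped to zero rather than shown as a negative age.
    static String secondsAgo(long lastUpdateMillis, long nowMillis) {
        long elapsedMillis = Math.max(0L, nowMillis - lastUpdateMillis);
        return TimeUnit.MILLISECONDS.toSeconds(elapsedMillis) + " sec ago";
    }
}
```

A caller would typically pass System.currentTimeMillis() as the second argument at render time.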





[jira] [Resolved] (MAPREDUCE-4028) Hadoop Capacity-Scheduler

2012-03-18 Thread Harsh J (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved MAPREDUCE-4028.


Resolution: Invalid

Hi,

Your job is being submitted with its queue name set to 'default' (because that 
is the default, and you have not changed it to queue{1-…}).

This JIRA is for reporting genuinely determined bugs. What you have run into is 
simply a user error. I recommend hitting the mailing list 
(mapreduce-u...@hadoop.apache.org) first for all your user problems, and only 
report bugs after you are sure.

For your current issue, which is not a bug, please re-read the tutorial section 
on how to use queues with your jobs: 
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Submitting+Jobs+to+Queues
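
To show why the submission is rejected, here is a small sketch that mimics the lookup with plain java.util.Properties instead of Hadoop's own configuration classes. It assumes the Hadoop 1.x property name mapred.job.queue.name for the job's queue (not shown in this thread) and that it falls back to "default" when unset; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration, not JobTracker code.
final class QueueCheckSketch {
    // Queue a job would be submitted to; assumed to be "default" when unset.
    static String jobQueue(Properties jobConf) {
        return jobConf.getProperty("mapred.job.queue.name", "default");
    }

    // True if the queue is listed in the cluster's mapred.queue.names.
    static boolean queueExists(Properties clusterConf, String queue) {
        String names = clusterConf.getProperty("mapred.queue.names", "default");
        List<String> configured = Arrays.asList(names.trim().split("\\s*,\\s*"));
        return configured.contains(queue.trim());
    }
}
```

With mapred.queue.names set to queue1 through queue6 and no queue set on the job, the lookup resolves to "default", which is not in the list. Setting the job's queue to one of the configured names makes the check pass.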






[jira] [Updated] (MAPREDUCE-3992) Reduce fetcher doesn't verify HTTP status code of response

2012-03-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3992:
--

Target Version/s: 0.24.0, 0.23.3, 1.0.3  (was: 0.23.3, 1.0.2, 0.24.0)






[jira] [Updated] (MAPREDUCE-3993) reduce fetch catch clause should catch RTEs as well

2012-03-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3993:
--

Target Version/s: 0.23.3, 1.0.3  (was: 0.23.3, 1.0.2)

> reduce fetch catch clause should catch RTEs as well
> ---
>
> Key: MAPREDUCE-3993
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3993
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1, mrv2
>Affects Versions: 0.23.1, 1.0.2
>Reporter: Todd Lipcon
>
> When using a compression codec for intermediate compression, some cases of 
> corrupt data can cause the codec to throw exceptions other than IOException 
> (e.g. java.lang.InternalError). This will currently cause the whole reduce task 
> to fail, instead of simply treating it like another case of a failed fetch.
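
A minimal sketch of the proposed behaviour follows; the class, interface, and method names are hypothetical and this is not the eventual patch. One detail matters for the example given: java.lang.InternalError is an Error, not a RuntimeException, so a catch clause for RuntimeException alone would not cover it.

```java
import java.io.IOException;

// Hypothetical wrapper around one map-output copy.
final class SafeFetch {
    // Stand-in for the code that copies and decompresses one map output.
    interface Copy {
        void run() throws IOException;
    }

    // Returns true on success, false if the copy should be counted as a
    // failed fetch and retried, instead of failing the whole reduce task.
    static boolean tryCopy(Copy copy) {
        try {
            copy.run();
            return true;
        } catch (IOException | RuntimeException | InternalError e) {
            // Corrupt data can surface from a codec as any of these types.
            return false;
        }
    }
}
```

Other Error subclasses, such as OutOfMemoryError, are deliberately left uncaught in this sketch, since they usually indicate a problem that retrying the fetch will not fix.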





[jira] [Created] (MAPREDUCE-4028) Hadoop Capacity-Scheduler

2012-03-18 Thread cldoltd (Created) (JIRA)
Hadoop Capacity-Scheduler
-

 Key: MAPREDUCE-4028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: contrib/capacity-sched
Affects Versions: 1.0.0
Reporter: cldoltd


I configured the capacity scheduler successfully,
but when I run a job I get this error:
  
Queue "default" does not exist

The full error is:

12/01/18 16:21:04 ERROR security.UserGroupInformation: 
PriviledgedActionException as:adtech 
cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: 
java.io.IOException: Queue "default" does not exist
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3943)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
Caused by: java.io.IOException: Queue "default" does not exist
at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:437)
at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3941)
... 11 more


I checked the queues with hadoop queue:


hadoop queue -showacls
Queue acls for user :  john

Queue  Operations
=
queue1  submit-job,administer-jobs
queue2  submit-job,administer-jobs
queue3  submit-job,administer-jobs
queue4  submit-job,administer-jobs
queue5  submit-job,administer-jobs
queue6  submit-job,administer-jobs

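My config in mapred-site.xml:

<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>

<property>
  <name>mapred.queue.names</name>
  <value>queue1,queue2,queue3,queue4,queue5,queue6</value>
</property>

<property>
  <name>mapred.acls.enabled</name>
  <value>false</value>
</property>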






[jira] [Updated] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException

2012-03-18 Thread Matt Foley (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Foley updated MAPREDUCE-3583:
--

Fix Version/s: (was: 1.1.0)
   (was: 0.24.0)

> ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
> -
>
> Key: MAPREDUCE-3583
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.20.205.0
> Environment: 64-bit Linux:
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
> 17:42:25 UTC 2011 x86_64 GNU/Linux
>Reporter: Zhihong Yu
>Assignee: Zhihong Yu
>Priority: Critical
> Fix For: 0.23.2, 1.0.2
>
> Attachments: mapreduce-3583-trunk-v2.txt, 
> mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, 
> mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, 
> mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, 
> mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, 
> mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, 
> mapreduce-3583-v7.txt, mapreduce-3583.txt
>
>
> HBase PreCommit builds frequently gave us NumberFormatException.
> From 
> https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/:
> {code}
> 2011-12-20 01:44:01,180 WARN  [main] mapred.JobClient(784): No job jar file 
> set.  User classes may not be found. See JobConf(Class) or 
> JobConf#setJar(String).
> java.lang.NumberFormatException: For input string: "18446743988060683582"
>   at 
> java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>   at java.lang.Long.parseLong(Long.java:422)
>   at java.lang.Long.parseLong(Long.java:468)
>   at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413)
>   at 
> org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148)
>   at 
> org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
>   at org.apache.hadoop.mapred.Task.initialize(Task.java:536)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
>   at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {code}
> From hadoop 0.20.205 source code, looks like ppid was 18446743988060683582, 
> causing NFE:
> {code}
> // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss)
>  pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)),
> {code}
> You can find information on the OS at the beginning of 
> https://builds.apache.org/job/PreCommit-HBASE-Build/553/console:
> {code}
> asf011.sp2.ygridcore.net
> Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 
> 17:42:25 UTC 2011 x86_64 GNU/Linux
> core file size  (blocks, -c) 0
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 20
> file size   (blocks, -f) unlimited
> pending signals (-i) 16382
> max locked memory   (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files  (-n) 6
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 8192
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 2048
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
> 6
> Running in Jenkins mode
> {code}
> From Nicolas Sze:
> {noformat}
> It looks like that the ppid is a 64-bit positive integer but Java long is 
> signed and so only works with 63-bit positive integers.  In your case,
>   2^64 > 18446743988060683582 > 2^63.
> Therefore, there is a NFE. 
> {noformat}
> I propose changing allProcessInfo to Map so that we avoid parsing large 
> integers and don't encounter this problem.
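
The overflow can be reproduced with the value from the trace. The sketch below uses only java.lang.Long and java.math.BigInteger; the class and method names are hypothetical, and it is not the proposed patch, which avoids numeric parsing altogether.

```java
import java.math.BigInteger;

// Hypothetical demonstration of the parsing limit.
final class PpidParseSketch {
    // True if the field parses as a signed 64-bit long.
    static boolean fitsInLong(String field) {
        try {
            Long.parseLong(field);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Arbitrary-precision parse, which accepts values at or above 2^63.
    static BigInteger parseUnsigned(String field) {
        return new BigInteger(field);
    }
}
```

For "18446743988060683582", fitsInLong returns false because the value lies between 2^63 and 2^64, while parseUnsigned accepts it. Keeping the field as a String, or using BigInteger where a number is really needed, sidesteps the exception.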
