[jira] [Updated] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error
[ https://issues.apache.org/jira/browse/MAPREDUCE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated MAPREDUCE-4030: - Target Version/s: 0.23.2 > If the nodemanager on which the maptask is executed is going down before the > mapoutput is consumed by the reducer,then the job is failing with shuffle > error > > > Key: MAPREDUCE-4030 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Reporter: Nishan Shetty > > My cluster has 2 NM's. > The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1. > When the job execution is in progress and Mappers has finished about 99% > completion,one of the NM has gone down. > The job has failed with the following trace > "Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error > in shuffle in fetcher#1 at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123) at > org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:396) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143) Caused by: > java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. at > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253) > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152) " -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error
[ https://issues.apache.org/jira/browse/MAPREDUCE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232479#comment-13232479 ] Arun C Murthy commented on MAPREDUCE-4030: -- Nishan - can u pls check reducer log and AM log to verify if the AM was notified of the map and it was re-run? That should happen before the reduce bailed out. Thanks. > If the nodemanager on which the maptask is executed is going down before the > mapoutput is consumed by the reducer,then the job is failing with shuffle > error > > > Key: MAPREDUCE-4030 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Reporter: Nishan Shetty > > My cluster has 2 NM's. > The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1. > When the job execution is in progress and Mappers has finished about 99% > completion,one of the NM has gone down. > The job has failed with the following trace > "Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error > in shuffle in fetcher#1 at > org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123) at > org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371) at > org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148) at > java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:396) at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143) Caused by: > java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out. 
at > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253) > at > org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187) > at > org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240) > at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152) "
[jira] [Created] (MAPREDUCE-4030) If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error
If the nodemanager on which the maptask is executed is going down before the mapoutput is consumed by the reducer,then the job is failing with shuffle error

Key: MAPREDUCE-4030
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4030
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Reporter: Nishan Shetty

My cluster has 2 NMs.
The value of "mapreduce.job.reduce.slowstart.completedmaps" is set to 1.
While the job was running and the mappers were about 99% complete, one of the NMs went down.
The job failed with the following trace:

"Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
    at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:123)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:371)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:148)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:143)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.checkReducerHealth(ShuffleScheduler.java:253)
    at org.apache.hadoop.mapreduce.task.reduce.ShuffleScheduler.copyFailed(ShuffleScheduler.java:187)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:240)
    at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152)
"
[jira] [Commented] (MAPREDUCE-4008) ResourceManager throws MetricsException on start up saying QueueMetrics MBean already exists
[ https://issues.apache.org/jira/browse/MAPREDUCE-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232425#comment-13232425 ] Devaraj K commented on MAPREDUCE-4008: -- {quote} -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {quote} It doesn't need new tests since it is logging the exception when this issue happens. > ResourceManager throws MetricsException on start up saying QueueMetrics MBean > already exists > > > Key: MAPREDUCE-4008 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4008 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2, resourcemanager, scheduler >Affects Versions: 0.24.0 >Reporter: Devaraj K >Assignee: Devaraj K > Attachments: MAPREDUCE-4008.patch > > > {code:xml} > 2012-03-14 15:22:23,089 WARN org.apache.hadoop.metrics2.util.MBeans: Error > creating MBean object name: > Hadoop:service=ResourceManager,name=QueueMetrics,q0=default > org.apache.hadoop.metrics2.MetricsException: > org.apache.hadoop.metrics2.MetricsException: > Hadoop:service=ResourceManager,name=QueueMetrics,q0=default already exists! 
> at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:117) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:102) > at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:91) > at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:55) > at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:218) > at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:93) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:243) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:227) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$3.invoke(MetricsSystemImpl.java:288) > at $Proxy6.postStart(Unknown Source) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.start(MetricsSystemImpl.java:183) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.init(MetricsSystemImpl.java:155) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.init(DefaultMetricsSystem.java:54) > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.initialize(DefaultMetricsSystem.java:50) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.start(ResourceManager.java:454) > at > org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:588) > Caused by: org.apache.hadoop.metrics2.MetricsException: > Hadoop:service=ResourceManager,name=QueueMetrics,q0=default already exists! > at > org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:113) > ... 
19 more > 2012-03-14 15:22:23,090 WARN org.apache.hadoop.metrics2.util.MBeans: Failed > to register MBean "null" > javax.management.RuntimeOperationsException: Exception occurred trying to > register the MBean > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:969) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:917) > at > com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:312) > at > com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:482) > at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57) > at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:218) > at > org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:93) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:243) > at > org.apache.hadoop.metrics2.impl.MetricsSystemImpl$1.postStart(MetricsSystemImpl.java:227) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcces
[jira] [Commented] (MAPREDUCE-3992) Reduce fetcher doesn't verify HTTP status code of response
[ https://issues.apache.org/jira/browse/MAPREDUCE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232303#comment-13232303 ]

Steve Loughran commented on MAPREDUCE-3992:

+1
* I understand why there aren't tests here; it's very hard to set up a test for this.
* I've reviewed the code and it looks fairly straightforward: it bails out early on a response != 200.
* If there's no error code checking in one part of the system, there's always the risk that the same thing has happened elsewhere, which is something we need to keep an eye out for.

> Reduce fetcher doesn't verify HTTP status code of response
>
> Key: MAPREDUCE-3992
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3992
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1
> Affects Versions: 0.23.1, 0.24.0, 1.0.1
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: mr-3992.txt
>
> Currently, the reduce fetch code doesn't check the HTTP status code of the response. This can lead to the following situation:
> - the map output servlet gets an IOException after setting the headers but before the first call to flush()
> - this causes it to send a response with a non-OK result code, including the exception text as the response body (response.sendError() does this if the response isn't committed)
> - it will still include the response headers indicating it's a valid response
> In the case of a merge-to-memory, the compression codec might then try to interpret the HTML response as compressed data, resulting in either a huge allocation (OOME) or some other nasty error. This bug seems to be present in MR1, but haven't checked trunk/MR2 yet.
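The check described in this issue (bail out before reading the body when the response is not 200) can be sketched roughly as follows. This is an illustration only, not the code from mr-3992.txt: the class name `FetchStatusCheck` and both method names are hypothetical, and only the standard `java.net.HttpURLConnection` API is assumed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;

/** Hypothetical helper for illustration; not taken from the attached patch. */
final class FetchStatusCheck {
    private FetchStatusCheck() {}

    /** Throws an IOException for any status code other than 200 OK. */
    static void verifyStatus(int statusCode, String statusMessage) throws IOException {
        if (statusCode != HttpURLConnection.HTTP_OK) {
            throw new IOException("Got invalid response code " + statusCode
                    + " from map output server: " + statusMessage);
        }
    }

    /**
     * Returns the response body only after the status code has been verified,
     * so an HTML error page is never handed to a decompression codec.
     */
    static InputStream openVerified(HttpURLConnection connection) throws IOException {
        verifyStatus(connection.getResponseCode(), connection.getResponseMessage());
        return connection.getInputStream();
    }
}
```

Raising an IOException here would let a caller treat the bad response like any other failed fetch, instead of trying to decode the error body.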
[jira] [Commented] (MAPREDUCE-4028) Hadoop Capacity-Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13232287#comment-13232287 ] cldoltd commented on MAPREDUCE-4028: thank you! my job has run successfully > Hadoop Capacity-Scheduler > - > > Key: MAPREDUCE-4028 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4028 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/capacity-sched >Affects Versions: 1.0.0 >Reporter: cldoltd > > I config success capacity-scheduler > But when i run job has error: > > Queue "default" does not exist > this error is : > 12/01/18 16:21:04 ERROR security.UserGroupInformation: > PriviledgedActionException as:adtech > cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: > java.io.IOException: Queue "default" does not exist > at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3943) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) > Caused by: java.io.IOException: Queue "default" does not exist > at > org.apache.hadoop.mapred.JobInProgress.(JobInProgress.java:437) > at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3941) > ... 
11 more > i check hadoop queue > hadoop queue -showacls > Queue acls for user : john > Queue Operations > = > queue1 submit-job,administer-jobs > queue2 submit-job,administer-jobs > queue3 submit-job,administer-jobs > queue4 submit-job,administer-jobs > queue5 submit-job,administer-jobs > queue6 submit-job,administer-jobs > my config in mapresite.xml: > > mapred.jobtracker.taskScheduler > org.apache.hadoop.mapred.CapacityTaskScheduler > > > mapred.queue.names > queue1,queue2,queue3,queue4,queue5,queue6 > > > mapred.acls.enabled > false >
[jira] [Created] (MAPREDUCE-4029) NodeManager status web page should express 'last update' times as seconds ago
NodeManager status web page should express 'last update' times as seconds ago

Key: MAPREDUCE-4029
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4029
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: webapps
Affects Versions: 0.23.1
Reporter: Harsh J

The 'Last health update' field on the MR2 apps' nodes page (at http://host:8088/cluster/nodes) is currently a raw timestamp, which isn't very informative for what the field means. It ought to be shown as seconds elapsed since now(), as was the case in the JobTracker for heartbeats.
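As a rough sketch of the requested display, a millisecond timestamp can be converted to "seconds ago" as below. The class and method names are hypothetical and nothing here is taken from the YARN webapp code; the current time is passed in explicitly so the calculation is easy to check.

```java
/** Hypothetical formatter for illustration; not part of the YARN web UI. */
final class LastUpdateFormat {
    private LastUpdateFormat() {}

    /** Whole seconds elapsed between the last update and now, never negative. */
    static long secondsAgo(long lastUpdateMillis, long nowMillis) {
        return Math.max(0L, (nowMillis - lastUpdateMillis) / 1000L);
    }

    /** Renders the elapsed time in a short human-readable form. */
    static String describe(long lastUpdateMillis, long nowMillis) {
        return secondsAgo(lastUpdateMillis, nowMillis) + "s ago";
    }
}
```

In a live page the second argument would typically be `System.currentTimeMillis()`; clamping at zero guards against small clock differences between hosts.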
[jira] [Resolved] (MAPREDUCE-4028) Hadoop Capacity-Scheduler
[ https://issues.apache.org/jira/browse/MAPREDUCE-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harsh J resolved MAPREDUCE-4028. Resolution: Invalid Hi, Your job is submitting with its queue name set to 'default' (cause, its default and you have not changed it to queue{1-…}). This JIRA is for reporting genuinely determined bugs. What you have run into is simply a user error. I recommend hitting the mailing list (mapreduce-u...@hadoop.apache.org) first for all your user problems, and only report bugs after you are sure. For your current issue, which is not a bug, please re-read the tutorial section on how to use queues with your jobs: http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Submitting+Jobs+to+Queues > Hadoop Capacity-Scheduler > - > > Key: MAPREDUCE-4028 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4028 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: contrib/capacity-sched >Affects Versions: 1.0.0 >Reporter: cldoltd > > I config success capacity-scheduler > But when i run job has error: > > Queue "default" does not exist > this error is : > 12/01/18 16:21:04 ERROR security.UserGroupInformation: > PriviledgedActionException as:adtech > cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: > java.io.IOException: Queue "default" does not exist > at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3943) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384) > at java.security.AccessController.doPrivileged(Native Method) > at 
javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382) > Caused by: java.io.IOException: Queue "default" does not exist > at > org.apache.hadoop.mapred.JobInProgress.(JobInProgress.java:437) > at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3941) > ... 11 more > i check hadoop queue > hadoop queue -showacls > Queue acls for user : john > Queue Operations > = > queue1 submit-job,administer-jobs > queue2 submit-job,administer-jobs > queue3 submit-job,administer-jobs > queue4 submit-job,administer-jobs > queue5 submit-job,administer-jobs > queue6 submit-job,administer-jobs > my config in mapresite.xml: > > mapred.jobtracker.taskScheduler > org.apache.hadoop.mapred.CapacityTaskScheduler > > > mapred.queue.names > queue1,queue2,queue3,queue4,queue5,queue6 > > > mapred.acls.enabled > false >
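The resolution above says the job was submitted to the queue named 'default' because no other queue was chosen, while the cluster only defines queue1 through queue6. The sketch below models that mismatch with plain `java.util.Properties`. It is not Hadoop code: the class `QueueNameCheck` and its methods are hypothetical, and it assumes the Hadoop 1.x property names `mapred.job.queue.name` (per-job queue, falling back to "default") and `mapred.queue.names` (queues defined on the cluster).

```java
import java.util.Arrays;
import java.util.Properties;

/** Hypothetical model of the queue lookup; not the JobTracker implementation. */
final class QueueNameCheck {
    // Assumed Hadoop 1.x property names; verify against your version's documentation.
    static final String QUEUE_NAME_KEY = "mapred.job.queue.name";
    static final String QUEUE_NAMES_KEY = "mapred.queue.names";

    private QueueNameCheck() {}

    /** The queue a job will be submitted to; "default" when the job sets nothing. */
    static String effectiveQueue(Properties jobConf) {
        return jobConf.getProperty(QUEUE_NAME_KEY, "default").trim();
    }

    /** True if the cluster configuration lists the given queue. */
    static boolean isConfigured(Properties clusterConf, String queue) {
        String names = clusterConf.getProperty(QUEUE_NAMES_KEY, "default");
        return Arrays.asList(names.trim().split("\\s*,\\s*")).contains(queue);
    }
}
```

With the reporter's queue list, a job that leaves the queue unset resolves to "default", which is not in the list. Setting the job's queue to one of queue1 through queue6 makes the lookup succeed, which matches the advice to read the tutorial section on submitting jobs to queues.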
[jira] [Updated] (MAPREDUCE-3992) Reduce fetcher doesn't verify HTTP status code of response
[ https://issues.apache.org/jira/browse/MAPREDUCE-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-3992:

Target Version/s: 0.24.0, 0.23.3, 1.0.3 (was: 0.23.3, 1.0.2, 0.24.0)

> Reduce fetcher doesn't verify HTTP status code of response
>
> Key: MAPREDUCE-3992
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3992
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1
> Affects Versions: 0.23.1, 0.24.0, 1.0.1
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: mr-3992.txt
>
> Currently, the reduce fetch code doesn't check the HTTP status code of the response. This can lead to the following situation:
> - the map output servlet gets an IOException after setting the headers but before the first call to flush()
> - this causes it to send a response with a non-OK result code, including the exception text as the response body (response.sendError() does this if the response isn't committed)
> - it will still include the response headers indicating it's a valid response
> In the case of a merge-to-memory, the compression codec might then try to interpret the HTML response as compressed data, resulting in either a huge allocation (OOME) or some other nasty error. This bug seems to be present in MR1, but haven't checked trunk/MR2 yet.
[jira] [Updated] (MAPREDUCE-3993) reduce fetch catch clause should catch RTEs as well
[ https://issues.apache.org/jira/browse/MAPREDUCE-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Foley updated MAPREDUCE-3993:

Target Version/s: 0.23.3, 1.0.3 (was: 0.23.3, 1.0.2)

> reduce fetch catch clause should catch RTEs as well
>
> Key: MAPREDUCE-3993
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3993
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv1, mrv2
> Affects Versions: 0.23.1, 1.0.2
> Reporter: Todd Lipcon
>
> When using a compression codec for intermediate compression, some cases of corrupt data can cause the codec to throw exceptions other than IOException (eg java.lang.InternalError). This will currently cause the whole reduce task to fail, instead of simply treating it like another case of a failed fetch.
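One way to read the proposal in this issue is sketched below: widen the catch clause around the read so that non-IOException failures are also reported as a failed fetch. Every name here (`FetchFailureHandling`, `Result`, `OutputReader`) is hypothetical and stands in for the real fetcher and codec classes. Note that `java.lang.InternalError`, the example the issue gives, is an `Error` and not a `RuntimeException`, so it has to be listed separately.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch; the real reduce fetcher is structured differently. */
final class FetchFailureHandling {
    /** Outcome of one fetch attempt. */
    enum Result { SUCCESS, FAILED_FETCH }

    /** Stand-in for code that reads and decompresses one map output. */
    interface OutputReader {
        void readFully(InputStream in) throws IOException;
    }

    private FetchFailureHandling() {}

    /** Reads one map output, mapping codec-level failures to a failed fetch. */
    static Result fetch(InputStream in, OutputReader reader) {
        try {
            reader.readFully(in);
            return Result.SUCCESS;
        } catch (IOException | RuntimeException | InternalError e) {
            // Corrupt input may surface as something other than IOException;
            // report it as a failed fetch rather than failing the whole task.
            return Result.FAILED_FETCH;
        }
    }
}
```

Catching `InternalError` is a deliberate and narrow choice in this sketch; other `Error` subclasses such as `OutOfMemoryError` are left to propagate.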
[jira] [Created] (MAPREDUCE-4028) Hadoop Capacity-Scheduler
Hadoop Capacity-Scheduler

Key: MAPREDUCE-4028
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4028
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: contrib/capacity-sched
Affects Versions: 1.0.0
Reporter: cldoltd

I configured the capacity-scheduler successfully, but when I run a job I get this error:

Queue "default" does not exist

The full error is:

12/01/18 16:21:04 ERROR security.UserGroupInformation: PriviledgedActionException as:adtech cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.IOException: Queue "default" does not exist
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3943)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)
Caused by: java.io.IOException: Queue "default" does not exist
    at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:437)
    at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3941)
    ... 11 more

I checked the queues:

hadoop queue -showacls
Queue acls for user : john
Queue Operations
=
queue1 submit-job,administer-jobs
queue2 submit-job,administer-jobs
queue3 submit-job,administer-jobs
queue4 submit-job,administer-jobs
queue5 submit-job,administer-jobs
queue6 submit-job,administer-jobs

My config in mapresite.xml:

mapred.jobtracker.taskScheduler
org.apache.hadoop.mapred.CapacityTaskScheduler

mapred.queue.names
queue1,queue2,queue3,queue4,queue5,queue6

mapred.acls.enabled
false
[jira] [Updated] (MAPREDUCE-3583) ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException
[ https://issues.apache.org/jira/browse/MAPREDUCE-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Foley updated MAPREDUCE-3583: -- Fix Version/s: (was: 1.1.0) (was: 0.24.0) > ProcfsBasedProcessTree#constructProcessInfo() may throw NumberFormatException > - > > Key: MAPREDUCE-3583 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-3583 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 0.20.205.0 > Environment: 64-bit Linux: > asf011.sp2.ygridcore.net > Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 > 17:42:25 UTC 2011 x86_64 GNU/Linux >Reporter: Zhihong Yu >Assignee: Zhihong Yu >Priority: Critical > Fix For: 0.23.2, 1.0.2 > > Attachments: mapreduce-3583-trunk-v2.txt, > mapreduce-3583-trunk-v2.txt, mapreduce-3583-trunk-v3.txt, > mapreduce-3583-trunk-v4.txt, mapreduce-3583-trunk-v5.txt, > mapreduce-3583-trunk-v6.txt, mapreduce-3583-trunk-v7.txt, > mapreduce-3583-trunk.txt, mapreduce-3583-v2.txt, mapreduce-3583-v3.txt, > mapreduce-3583-v4.txt, mapreduce-3583-v5.txt, mapreduce-3583-v6.txt, > mapreduce-3583-v7.txt, mapreduce-3583.txt > > > HBase PreCommit builds frequently gave us NumberFormatException. > From > https://builds.apache.org/job/PreCommit-HBASE-Build/553//testReport/org.apache.hadoop.hbase.mapreduce/TestHFileOutputFormat/testMRIncrementalLoad/: > {code} > 2011-12-20 01:44:01,180 WARN [main] mapred.JobClient(784): No job jar file > set. User classes may not be found. See JobConf(Class) or > JobConf#setJar(String). 
> java.lang.NumberFormatException: For input string: "18446743988060683582" > at > java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) > at java.lang.Long.parseLong(Long.java:422) > at java.lang.Long.parseLong(Long.java:468) > at > org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:413) > at > org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:148) > at > org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401) > at org.apache.hadoop.mapred.Task.initialize(Task.java:536) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:353) > at org.apache.hadoop.mapred.Child$4.run(Child.java:255) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083) > at org.apache.hadoop.mapred.Child.main(Child.java:249) > {code} > From hadoop 0.20.205 source code, looks like ppid was 18446743988060683582, > causing NFE: > {code} > // Set (name) (ppid) (pgrpId) (session) (utime) (stime) (vsize) (rss) > pinfo.updateProcessInfo(m.group(2), Integer.parseInt(m.group(3)), > {code} > You can find information on the OS at the beginning of > https://builds.apache.org/job/PreCommit-HBASE-Build/553/console: > {code} > asf011.sp2.ygridcore.net > Linux asf011.sp2.ygridcore.net 2.6.32-33-server #71-Ubuntu SMP Wed Jul 20 > 17:42:25 UTC 2011 x86_64 GNU/Linux > core file size (blocks, -c) 0 > data seg size (kbytes, -d) unlimited > scheduling priority (-e) 20 > file size (blocks, -f) unlimited > pending signals (-i) 16382 > max locked memory (kbytes, -l) 64 > max memory size (kbytes, -m) unlimited > open files (-n) 6 > pipe size(512 bytes, -p) 8 > POSIX message queues (bytes, -q) 819200 > real-time priority (-r) 0 > stack size (kbytes, -s) 8192 > cpu time (seconds, -t) unlimited > max user 
processes (-u) 2048 > virtual memory (kbytes, -v) unlimited > file locks (-x) unlimited > 6 > Running in Jenkins mode > {code} > From Nicolas Sze: > {noformat} > It looks like that the ppid is a 64-bit positive integer but Java long is > signed and so only works with 63-bit positive integers. In your case, > 2^64 > 18446743988060683582 > 2^63. > Therefore, there is a NFE. > {noformat} > I propose changing allProcessInfo to Map so that we > don't encounter this problem by avoiding parsing large integer.
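The arithmetic in the quoted explanation can be checked with a short sketch. It shows that the reported value lies between 2^63 and 2^64, so `Long.parseLong` rejects it, while `java.math.BigInteger` can hold it. This is an illustration, not the attached patch: the issue proposes avoiding the parse altogether, and the class and method names below are hypothetical.

```java
import java.math.BigInteger;

/** Hypothetical illustration of the overflow; not code from the patch. */
final class ProcfsNumberParsing {
    /** 2^63 - 1, the largest value a signed Java long can hold. */
    static final BigInteger LONG_MAX = BigInteger.valueOf(Long.MAX_VALUE);

    private ProcfsNumberParsing() {}

    /** True if Long.parseLong accepts the decimal string. */
    static boolean fitsInSignedLong(String decimal) {
        try {
            Long.parseLong(decimal);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Parses a decimal field of any size without overflow. */
    static BigInteger parseField(String decimal) {
        return new BigInteger(decimal);
    }
}
```

For the string "18446743988060683582", `fitsInSignedLong` returns false, and `parseField` returns a value greater than `LONG_MAX` but less than 2^64, which is consistent with the NumberFormatException in the stack trace.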