[jira] [Commented] (MAPREDUCE-4338) NodeManager daemon is failing to start.
[ https://issues.apache.org/jira/browse/MAPREDUCE-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393661#comment-13393661 ]

srikanth ayalasomayajulu commented on MAPREDUCE-4338:
-----------------------------------------------------

I feel there is a connectivity issue. Can you please let me know where to check the connectivity? This error is blocking my work. Can you please help?

> NodeManager daemon is failing to start.
> ---------------------------------------
>
>                 Key: MAPREDUCE-4338
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4338
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 0.23.0
>         Environment: Ubuntu Server 11.04
>            Reporter: srikanth ayalasomayajulu
>              Labels: features, hadoop
>             Fix For: 0.23.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The NodeManager daemon is not starting on the slave machines and gives the error below:
> 2012-06-12 19:05:56,172 FATAL nodemanager.NodeManager (NodeManager.java:main(233)) - Error starting NodeManager
> org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
>         at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:231)
> Caused by: org.apache.avro.AvroRuntimeException: java.lang.reflect.UndeclaredThrowableException
>         at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:132)
>         at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
>         ... 2 more
> Caused by: java.lang.reflect.UndeclaredThrowableException
>         at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:161)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:128)
>         ... 3 more
> Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: Call From mvm5/192.168.100.177 to mvm4:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>         at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
>         at $Proxy14.registerNodeManager(Unknown Source)
>         at org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
>         ... 5 more
> Caused by: java.net.ConnectException: Call From mvm5/192.168.100.177 to mvm4:8025 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:617)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1089)
>         at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
>         ... 7 more
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:419)
>         at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:460)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:557)
>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1065)
>         ... 8 more
> 2012-06-12 19:05:56,184 INFO ipc.Server (Server.java:stop(1709)) - Stopping server on 47645
> 2012-06-12 19:05:56,184 INFO ipc.Server (Server.java:stop(1709)) - Stopping server on 4344
> 2012-06-12 19:05:56,190 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(199)) - Stopping NodeManager metrics system...
> 2012-06-12 19:05:56,190 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stopSources(408)) - Stopping metrics source JvmMetrics
> 2012-06-12 19:05:56,191 INFO nodemanager.NodeManager (StringUtils.java:run(605)) - SHUTDOWN_MSG:
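The root cause in the trace is a plain TCP "Connection refused" when the NodeManager on mvm5 tries to register with the ResourceManager's resource-tracker port on mvm4:8025, so the first thing to check is whether that host:port is reachable at all (e.g. the RM is running and not bound to a different interface). A minimal sketch of such a reachability probe; the hostname and port are taken from the log above, and the class name is hypothetical:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts, and unresolvable hosts alike.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults come from the stack trace above; adjust for your cluster.
        String host = args.length > 0 ? args[0] : "mvm4";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8025;
        System.out.println(host + ":" + port + " reachable = "
                + canConnect(host, port, 2000));
    }
}
```

If the probe fails from the slave but succeeds on the RM host itself, the RM is likely listening on a different address than the NodeManagers are configured to use (the resource-tracker address in yarn-site.xml), which is the usual cause behind the ConnectionRefused wiki page linked in the log.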
[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393569#comment-13393569 ]

Tsuyoshi OZAWA commented on MAPREDUCE-4343:
-------------------------------------------

s/course/coarse/

> ZK recovery support for ResourceManager
> ---------------------------------------
>
>                 Key: MAPREDUCE-4343
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4343
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Harsh J
>         Attachments: MR-4343.1.patch
>
>
> MAPREDUCE-279 included bits and pieces of possible ZK integration for YARN's RM, but it looks like it failed to complete it (for scalability reasons? etc?), and there seems to be no JIRA tracking this feature that has already been claimed publicly as a good part about YARN.
> If it did complete it, we should document how to use it. Setting the following only yields:
> {code}
> <property>
>   <name>yarn.resourcemanager.store.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.zookeeper-store.address</name>
>   <value>test.vm:2181/yarn-recovery-store</value>
> </property>
> {code}
> {code}
> Error starting ResourceManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
>         at org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFactory.getStore(StoreFactory.java:32)
>         at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:621)
> Caused by: java.lang.NoSuchMethodException: org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
>         at java.lang.Class.getConstructor0(Class.java:2706)
>         at java.lang.Class.getDeclaredConstructor(Class.java:1985)
>         at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)
>         ... 2 more
> {code}
> This JIRA is hence filed to track the addition/completion of recovery via ZK.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
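The NoSuchMethodException for ZKStore.<init>() in the report above is the failure mode of reflective instantiation when the configured class has no constructor matching the one being looked up. A self-contained illustration (the store class names here are hypothetical stand-ins, not Hadoop's):

```java
import java.lang.reflect.Constructor;

public class ReflectDemo {
    // A store class with only a parameterized constructor -- no no-arg <init>.
    static class BadStore {
        BadStore(String zkQuorum) { }
    }

    // A store class with the default constructor reflection expects.
    static class GoodStore {
        GoodStore() { }
    }

    /** Mimics reflective instantiation: look up the no-arg constructor and invoke it. */
    static Object newInstance(Class<?> clazz) throws Exception {
        // Throws NoSuchMethodException naming clazz.<init>() if absent.
        Constructor<?> ctor = clazz.getDeclaredConstructor();
        ctor.setAccessible(true);
        return ctor.newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(newInstance(GoodStore.class).getClass().getSimpleName());
        try {
            newInstance(BadStore.class);
        } catch (NoSuchMethodException e) {
            System.out.println("NoSuchMethodException: " + e.getMessage());
        }
    }
}
```

This is why adding a default constructor to ZKStore (the "1st problem" discussed below) makes the reflective StoreFactory path work.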
[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393567#comment-13393567 ]

Tsuyoshi OZAWA commented on MAPREDUCE-4343:
-------------------------------------------

It's a very coarse patch, but it works in my environment, so could you review it?
[jira] [Updated] (MAPREDUCE-4343) ZK recovery support for ResourceManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsuyoshi OZAWA updated MAPREDUCE-4343:
--------------------------------------

    Attachment: MR-4343.1.patch

Harsh, the attached patch allows ResourceManager to use ZKStore.
[jira] [Commented] (MAPREDUCE-4203) Create equivalent of ProcfsBasedProcessTree for Windows
[ https://issues.apache.org/jira/browse/MAPREDUCE-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393560#comment-13393560 ]

Jonathan Eagles commented on MAPREDUCE-4203:
--------------------------------------------

Bikas, when you say "based on winutils", it makes me think you are basing this patch on some existing code. If so, can you comment on the source and licensing?

> Create equivalent of ProcfsBasedProcessTree for Windows
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-4203
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4203
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>         Attachments: MAPREDUCE-4203.branch-1-win.1.patch, MAPREDUCE-4203.patch, test.cpp
>
>
> ProcfsBasedProcessTree is used by the TaskTracker to get process information like memory and cpu usage. This information is used to manage resources etc. The current implementation is based on Linux procfs functionality and hence does not work on other platforms, specifically Windows.
[jira] [Resolved] (MAPREDUCE-297) generalize the TT / JT servers to handle more generic tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved MAPREDUCE-297.
-------------------------------

    Resolution: Duplicate

Resolving as a duplicate of MAPREDUCE-279, which has provided a way for this. Do reopen if my interpretation of the description is wrong.

> generalize the TT / JT servers to handle more generic tasks
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-297
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-297
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: eric baldeschwieler
>
>
> We've been discussing a proposal to generalize the TT / JT servers to handle more generic tasks and move job-specific work out of the job tracker and into client code, so the whole system is both much more general and has more coherent layering. The result would look more like condor/pbs-like systems (or presumably borg), with map-reduce as a user job.
> Such a system would allow the current map-reduce code to coexist with other work-queuing libraries, or maybe even persistent services on the same Hadoop cluster, although that would be a stretch goal. We'll kick off a thread with some documents soon.
> Our primary goal in going this way would be to get better utilization out of map-reduce clusters and support a richer scheduling model. The ability to support alternative job frameworks would just be gravy!
> Putting this in as a placeholder. Hope to get folks talking about this to post some more detail.
[jira] [Commented] (MAPREDUCE-3868) Reenable Raid
[ https://issues.apache.org/jira/browse/MAPREDUCE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393520#comment-13393520 ]

Weiyan Wang commented on MAPREDUCE-3868:
----------------------------------------

Moved to svn. @Scott, please follow the steps below to apply the patch:
1. Run MAPREDUCE-3868v1.sh svn (you may need to modify the script to use MKPARENT="--parents")
2. patch -p0 -i MAPREDUCE-3868-3.patch
3. svn add hadoop-assemblies/src/main/resources/assemblies/hadoop-raid-dist.xml
4. svn add hadoop-hdfs-project/hadoop-hdfs-raid/pom.xml

> Reenable Raid
> -------------
>
>                 Key: MAPREDUCE-3868
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3868
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: contrib/raid
>            Reporter: Scott Chen
>            Assignee: Weiyan Wang
>         Attachments: MAPREDUCE-3868-1.patch, MAPREDUCE-3868-2.patch, MAPREDUCE-3868-3.patch, MAPREDUCE-3868.patch, MAPREDUCE-3868v1.patch, MAPREDUCE-3868v1.sh
>
>
> Currently Raid is outdated and not compiled. Make it compile.
[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393519#comment-13393519 ]

Tsuyoshi OZAWA commented on MAPREDUCE-4343:
-------------------------------------------

I'm working on the 2nd problem now, and I haven't yet estimated how large the work is. I'll report the status before I go to sleep.
[jira] [Updated] (MAPREDUCE-3868) Reenable Raid
[ https://issues.apache.org/jira/browse/MAPREDUCE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weiyan Wang updated MAPREDUCE-3868:
-----------------------------------

    Attachment: MAPREDUCE-3868-3.patch
[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393514#comment-13393514 ]

Harsh J commented on MAPREDUCE-4343:
------------------------------------

Hi,

With (1) done in your local build, are you able to successfully validate that (2) is possible today already? In case it is (that the RM recovers its state properly, once ZKStore is fixed), we can just do (1) here, add some docs on how to use ZKStore (the configs), and resolve the ticket.
[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager
[ https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393510#comment-13393510 ]

Tsuyoshi OZAWA commented on MAPREDUCE-4343:
-------------------------------------------

Harsh, I think this problem should be split into 2 tickets as follows:
1. Fix the runtime error of ZKStore by adding a default constructor.
2. Add ZK recovery support to the ResourceManager.
In fact, I've already created a patch fixing the 1st problem. Should I attach the file here or on a new ticket?
[jira] [Updated] (MAPREDUCE-4329) security.task.umbilical.protocol.acl should not be configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sho Shimauchi updated MAPREDUCE-4329:
-------------------------------------

    Attachment: MAPREDUCE-4329.txt

Replaced the comment on MapReducePolicyProvider with Harsh's suggestion.

> security.task.umbilical.protocol.acl should not be configurable
> ----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4329
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4329
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 1.0.3
>            Reporter: Sho Shimauchi
>            Assignee: Sho Shimauchi
>         Attachments: MAPREDUCE-4329.txt, MAPREDUCE-4329.txt
>
>
> On running a MapReduce job, the username is changed to the job id and the job fails. The exception is as follows:
> {code}
> 2012-06-08 19:39:26,555 WARN org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying to get groups for user job_201206081934_0002
> org.apache.hadoop.util.Shell$ExitCodeException: id: job_201206081934_0002: no such user
>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
>         at org.apache.hadoop.util.Shell.run(Shell.java:182)
>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
>         at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:68)
>         at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:45)
>         at org.apache.hadoop.security.Groups.getGroups(Groups.java:79)
>         at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1026)
>         at org.apache.hadoop.security.authorize.AccessControlList.isUserAllowed(AccessControlList.java:141)
>         at org.apache.hadoop.security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:99)
>         at org.apache.hadoop.ipc.Server.authorize(Server.java:1659)
>         at org.apache.hadoop.ipc.Server$Connection.authorizeConnection(Server.java:1320)
>         at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1286)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1182)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:537)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:344)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:680)
> {code}
> This issue can be reproduced by the following steps:
> 1. Set hadoop.security.authorization = true in core-site.xml:
> {code}
> <property>
>   <name>hadoop.security.authorization</name>
>   <value>true</value>
> </property>
> {code}
> 2. Set any value except '*' for security.task.umbilical.protocol.acl in hadoop-policy.xml:
> {code}
> <property>
>   <name>security.task.umbilical.protocol.acl</name>
>   <value>sho sho</value>
>   <description>ACL for TaskUmbilicalProtocol, used by the map and reduce
>   tasks to communicate with the parent tasktracker.
>   The ACL is a comma-separated list of user and group names. The user and
>   group list is separated by a blank. For e.g. "alice,bob users,wheel".
>   A special value of "*" means all users are allowed.</description>
> </property>
> {code}
> 3. Run any MapReduce job.
> h4. Code Analysis
> ./src/mapred/org/apache/hadoop/mapred/Child.java:102-118
> {code}
>     UserGroupInformation taskOwner =
>       UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
>     taskOwner.addToken(jt);
>
>     // Set the credentials
>     defaultConf.setCredentials(credentials);
>
>     final TaskUmbilicalProtocol umbilical =
>       taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
>         @Override
>         public TaskUmbilicalProtocol run() throws Exception {
>           return (TaskUmbilicalProtocol)RPC.getProxy(TaskUmbilicalProtocol.class,
>               TaskUmbilicalProtocol.versionID,
>               address,
>               defaultConf);
>         }
>       });
> {code}
> This code indicates that TaskUmbilicalProtocol uses the job id as the username.
> This code came from MAPREDUCE-1457.
> https://issues.apache.org/jira/browse/MAPREDUCE-1457
> Devaraj said as follows in the JIRA:
> {quote}
> 2) In Child.java, the task authenticates to the TaskTracker using the jobtoken. The username in the jobtoken is jobId. The doAs block done using taskOwner is required so that the username mentioned in the token and the one doing the operation matches.
> {quote}
> We can't change security.task.umbilical.protocol
[jira] [Commented] (MAPREDUCE-4329) security.task.umbilical.protocol.acl should not be configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393480#comment-13393480 ] Sho Shimauchi commented on MAPREDUCE-4329: -- Thanks for reviewing the patch! Yeah your comment makes makes more sense, so I'll update the patch to replace the old comment to yours. This property was set as deprecated in MAPREDUCE-2746 so I don't think there is nothing to do for trunk. {code} Configuration.addDeprecation("security.task.umbilical.protocol.acl", new String[] { MRJobConfig.MR_AM_SECURITY_SERVICE_AUTHORIZATION_TASK_UMBILICAL }); {code} > security.task.umbilical.protocol.acl should not be configurable > --- > > Key: MAPREDUCE-4329 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4329 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security >Affects Versions: 1.0.3 >Reporter: Sho Shimauchi >Assignee: Sho Shimauchi > Attachments: MAPREDUCE-4329.txt > > > On running MapReduce job, username is changed to jobid and the job fails. 
> Exception is as follows:
> {code}
> 2012-06-08 19:39:26,555 WARN org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying to get groups for user job_201206081934_0002
> org.apache.hadoop.util.Shell$ExitCodeException: id: job_201206081934_0002: no such user
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
>     at org.apache.hadoop.util.Shell.run(Shell.java:182)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
>     at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
>     at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:68)
>     at org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:45)
>     at org.apache.hadoop.security.Groups.getGroups(Groups.java:79)
>     at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1026)
>     at org.apache.hadoop.security.authorize.AccessControlList.isUserAllowed(AccessControlList.java:141)
>     at org.apache.hadoop.security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:99)
>     at org.apache.hadoop.ipc.Server.authorize(Server.java:1659)
>     at org.apache.hadoop.ipc.Server$Connection.authorizeConnection(Server.java:1320)
>     at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1286)
>     at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1182)
>     at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:537)
>     at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:344)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:680)
> {code}
> This issue can be reproduced by following steps:
> 1. set hadoop.security.authorization = true in core-site.xml
> {code}
> <property>
>   <name>hadoop.security.authorization</name>
>   <value>true</value>
> </property>
> {code}
> 2. set any value except for '*' to security.task.umbilical.protocol.acl in hadoop-policy.xml
> {code}
> <property>
>   <name>security.task.umbilical.protocol.acl</name>
>   <value>sho sho</value>
>   <description>ACL for TaskUmbilicalProtocol, used by the map and reduce
>   tasks to communicate with the parent tasktracker.
>   The ACL is a comma-separated list of user and group names. The user and
>   group list is separated by a blank. For e.g. "alice,bob users,wheel".
>   A special value of "*" means all users are allowed.</description>
> </property>
> {code}
> 3. run any mapreduce job.
> h4. Code Analysis
> ./src/mapred/org/apache/hadoop/mapred/Child.java:102-118
> {code}
> UserGroupInformation taskOwner =
>     UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
> taskOwner.addToken(jt);
>
> // Set the credentials
> defaultConf.setCredentials(credentials);
>
> final TaskUmbilicalProtocol umbilical =
>     taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
>       @Override
>       public TaskUmbilicalProtocol run() throws Exception {
>         return (TaskUmbilicalProtocol) RPC.getProxy(TaskUmbilicalProtocol.class,
>             TaskUmbilicalProtocol.versionID,
>             address,
>             defaultConf);
>       }
>     });
> {code}
> This code indicates that TaskUmbilicalProtocol uses jobid as username.
> This code came from MAPREDUCE-1457.
> https://issues.apache.org/jira/browse/MAPREDUCE
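The failure quoted above happens in the service-level authorization check: the task authenticates as its job ID rather than a real user, so any ACL other than '*' rejects it. The following is a minimal sketch of that check, assuming the "users groups" ACL format described in the property's description; it is a simplified stand-in, not Hadoop's actual AccessControlList implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified model of the ACL check: "user1,user2 group1,group2",
// where the special value '*' allows all users.
public class UmbilicalAclSketch {
    static boolean isUserAllowed(String acl, String user, Set<String> groups) {
        if ("*".equals(acl.trim())) {
            return true; // special value: everyone is allowed
        }
        String[] parts = acl.trim().split("\\s+", 2);
        Set<String> users = new HashSet<>(Arrays.asList(parts[0].split(",")));
        if (users.contains(user)) {
            return true;
        }
        if (parts.length > 1) {
            for (String g : parts[1].split(",")) {
                if (groups.contains(g)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The task connects as its job ID; 'id job_...' finds no such user,
        // so the group lookup fails and the user list cannot match either:
        Set<String> noGroups = new HashSet<>();
        System.out.println(isUserAllowed("sho sho", "job_201206081934_0002", noGroups)); // false
        System.out.println(isUserAllowed("*", "job_201206081934_0002", noGroups));       // true
    }
}
```

This illustrates why any concrete user/group list in security.task.umbilical.protocol.acl can only break jobs: the dynamic job-ID "user" is never in the list.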
[jira] [Created] (MAPREDUCE-4345) ZK-based High Availability (HA) for ResourceManager (RM)
Harsh J created MAPREDUCE-4345:
--

Summary: ZK-based High Availability (HA) for ResourceManager (RM)
Key: MAPREDUCE-4345
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4345
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Harsh J

One of the goals presented on MAPREDUCE-279 was to have high availability. One way that was discussed, per Mahadev/others on https://issues.apache.org/jira/browse/MAPREDUCE-2648 and other places, was ZK:

{quote}
Am not sure, if you already know about the MR-279 branch (the next version of MR framework). We've been trying to integrate ZK into the framework from the beginning. As for now, we are just doing restart with ZK but soon we should have a HA soln with ZK.
{quote}

There is now MAPREDUCE-4343 that tracks recoverability via ZK. This JIRA is meant to track HA via ZK. Currently there isn't an HA solution for the RM, via ZK or otherwise.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
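For context on the HA approach discussed above: ZK-based active/standby failover is commonly built on ephemeral sequential znodes, where each candidate registers one and the holder of the lowest sequence number is active; when its session dies, its znode disappears and the next candidate takes over. The sketch below simulates only that election rule in-process; the class and method names are illustrative, and a real implementation would use the ZooKeeper client API against a real ensemble.

```java
import java.util.Map;
import java.util.TreeMap;

// In-process simulation of the ZooKeeper leader-election recipe an RM HA
// solution could build on: lowest ephemeral sequence number is the active RM.
public class RmElectionSketch {
    // znode sequence number -> RM id (TreeMap keeps keys sorted ascending)
    private final TreeMap<Integer, String> candidates = new TreeMap<>();
    private int nextSeq = 0;

    int join(String rmId) {
        int seq = nextSeq++;
        candidates.put(seq, rmId); // models creating an ephemeral sequential znode
        return seq;
    }

    void fail(int seq) {
        candidates.remove(seq); // session expiry deletes the ephemeral znode
    }

    String activeRm() {
        Map.Entry<Integer, String> leader = candidates.firstEntry();
        return leader == null ? null : leader.getValue();
    }

    public static void main(String[] args) {
        RmElectionSketch election = new RmElectionSketch();
        int rm1 = election.join("rm1");
        election.join("rm2");
        System.out.println(election.activeRm()); // rm1 (lowest sequence wins)
        election.fail(rm1);                      // active RM's session dies
        System.out.println(election.activeRm()); // rm2 takes over
    }
}
```

The ephemeral property is what makes this failover automatic: no heartbeat protocol is needed beyond the ZK session itself.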
[jira] [Commented] (MAPREDUCE-4329) security.task.umbilical.protocol.acl should not be configurable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393476#comment-13393476 ]

Harsh J commented on MAPREDUCE-4329:
--

Thanks Sho. The patch looks good; just one comment (about the comment):

{code}
+ * Since TaskUmbilicalProtocol always uses job ID as username, setting
+ * specific user/group to security.task.umbilical.protocol.acl doesn't
+ * work. This property was removed from hadoop-policy.xml in
+ * MAPREDUCE-4329 but the property itself still works because we cannot
+ * remove the following code.
{code}

I think the following may be better; what are your thoughts?

{code}
Since TaskUmbilicalProtocol uses the job ID (of the task that uses it, hence
dynamic) as its identifier, due to the security implementation, setting
specific users/groups in security.task.umbilical.protocol.acl has no effect
other than breaking jobs. This should never be configured to anything apart
from '*', and hence MAPREDUCE-4329 removes it from the docs but this line
remains to not break the protocol provider.
{code}

Also, is this issue present with MR2 ACLs too, Sho? Or let me know if you'd like me to investigate that instead. We can do it with a supplementary patch (trunk first, branch-1 afterwards).
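For reference, the addDeprecation call quoted earlier in this thread works by redirecting lookups of the old key to the new one, which is why the removed property "still works" even after it is gone from hadoop-policy.xml. The sketch below is a simplified model of that mapping, not Hadoop's actual Configuration class, and the new key name used here is only a stand-in for the value of MRJobConfig.MR_AM_SECURITY_SERVICE_AUTHORIZATION_TASK_UMBILICAL.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of Configuration key deprecation: reads and writes of a
// deprecated key are redirected to its replacement key.
public class DeprecationSketch {
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, String> deprecations = new HashMap<>();

    void addDeprecation(String oldKey, String newKey) {
        deprecations.put(oldKey, newKey);
    }

    void set(String key, String value) {
        values.put(deprecations.getOrDefault(key, key), value);
    }

    String get(String key) {
        return values.get(deprecations.getOrDefault(key, key));
    }

    public static void main(String[] args) {
        DeprecationSketch conf = new DeprecationSketch();
        // Hypothetical new key name, standing in for the MRJobConfig constant:
        conf.addDeprecation("security.task.umbilical.protocol.acl",
            "security.job.task.protocol.acl");
        // An old hadoop-policy.xml entry still takes effect under the new key:
        conf.set("security.task.umbilical.protocol.acl", "*");
        System.out.println(conf.get("security.job.task.protocol.acl")); // *
    }
}
```

This is why the patch comment notes the property cannot simply be deleted: the deprecation mapping must stay so that existing configurations keep resolving.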