[jira] [Commented] (MAPREDUCE-4338) NodeManager daemon is failing to start.

2012-06-17 Thread srikanth ayalasomayajulu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393661#comment-13393661
 ] 

srikanth ayalasomayajulu commented on MAPREDUCE-4338:
-

I feel there is a connectivity issue. Can you please let me know where to check 
the connectivity? This error is blocking me from moving forward in my work. Can 
you please help me?
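On the connectivity question above: the stack trace below shows the NodeManager being refused a connection to the ResourceManager's resource tracker at mvm4:8025 (configured via yarn.resourcemanager.resource-tracker.address in yarn-site.xml), so that host/port pair is the place to check. A minimal, self-contained probe one could run is sketched here; the helper name and timeout are ours, not from the report:

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectivityCheck {
    // Attempts a TCP connection to host:port; returns false on refusal,
    // timeout, or an unresolvable host -- the failure modes behind the
    // java.net.ConnectException seen in the NodeManager log.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // In the reported cluster this would be: canConnect("mvm4", 8025, 2000)
        System.out.println(canConnect("localhost", 1, 500)); // typically refused
    }
}
```

If this returns false from the slave, the ResourceManager is not listening on that address (not started, wrong bind address, or a firewall in between).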

> NodeManager daemon is failing to start.
> ---
>
> Key: MAPREDUCE-4338
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4338
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 0.23.0
> Environment: Ubuntu Server 11.04, 
>Reporter: srikanth ayalasomayajulu
>  Labels: features, hadoop
> Fix For: 0.23.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The NodeManager daemon is not getting started on the slave machines, and it 
> gives an error like the one below.
> 2012-06-12 19:05:56,172 FATAL nodemanager.NodeManager 
> (NodeManager.java:main(233)) - Error starting NodeManager
> org.apache.hadoop.yarn.YarnException: Failed to Start 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager
> at 
> org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:231)
> Caused by: org.apache.avro.AvroRuntimeException: 
> java.lang.reflect.UndeclaredThrowableException
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:132)
> at 
> org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
> ... 2 more
> Caused by: java.lang.reflect.UndeclaredThrowableException
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:66)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:161)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.start(NodeStatusUpdaterImpl.java:128)
> ... 3 more
> Caused by: com.google.protobuf.ServiceException: java.net.ConnectException: 
> Call From mvm5/192.168.100.177 to mvm4:8025 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at 
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
> at $Proxy14.registerNodeManager(Unknown Source)
> at 
> org.apache.hadoop.yarn.server.api.impl.pb.client.ResourceTrackerPBClientImpl.registerNodeManager(ResourceTrackerPBClientImpl.java:59)
> ... 5 more
> Caused by: java.net.ConnectException: Call From mvm5/192.168.100.177 to 
> mvm4:8025 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:617)
> at org.apache.hadoop.ipc.Client.call(Client.java:1089)
> at 
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
> ... 7 more
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:419)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:460)
> at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:557)
> at 
> org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:205)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1195)
> at org.apache.hadoop.ipc.Client.call(Client.java:1065)
> ... 8 more
> 2012-06-12 19:05:56,184 INFO  ipc.Server (Server.java:stop(1709)) - Stopping 
> server on 47645
> 2012-06-12 19:05:56,184 INFO  ipc.Server (Server.java:stop(1709)) - Stopping 
> server on 4344
> 2012-06-12 19:05:56,190 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stop(199)) - Stopping NodeManager metrics system...
> 2012-06-12 19:05:56,190 INFO  impl.MetricsSystemImpl 
> (MetricsSystemImpl.java:stopSources(408)) - Stopping metrics source JvmMetrics
> 2012-06-12 19:05:56,191 INFO  nodemanager.NodeManager 
> (StringUtils.java:run(605)) - SHUTDOWN_MSG:

[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager

2012-06-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393569#comment-13393569
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-4343:
---

s/course/coarse/

> ZK recovery support for ResourceManager
> ---
>
> Key: MAPREDUCE-4343
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4343
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Harsh J
> Attachments: MR-4343.1.patch
>
>
> MAPREDUCE-279 included bits and pieces of possible ZK integration for YARN's 
> RM, but looks like it failed to complete it (for scalability reasons? etc?) 
> and there seems to be no JIRA tracking this feature that has been already 
> claimed publicly as a good part about YARN.
> If it did complete it, we should document how to use it. Setting the 
> following only yields:
> {code}
> <property>
>   <name>yarn.resourcemanager.store.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.zookeeper-store.address</name>
>   <value>test.vm:2181/yarn-recovery-store</value>
> </property>
> {code}
> {code}
> Error starting ResourceManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFactory.getStore(StoreFactory.java:32)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:621)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2706)
> at java.lang.Class.getDeclaredConstructor(Class.java:1985)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)
> ... 2 more
> {code}
> This JIRA is hence filed to track the addition/completion of recovery via ZK.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager

2012-06-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393567#comment-13393567
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-4343:
---

It's a very course patch but it works on my environment, so could you review it?

> ZK recovery support for ResourceManager
> ---
>
> Key: MAPREDUCE-4343
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4343
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Harsh J
> Attachments: MR-4343.1.patch
>
>
> MAPREDUCE-279 included bits and pieces of possible ZK integration for YARN's 
> RM, but looks like it failed to complete it (for scalability reasons? etc?) 
> and there seems to be no JIRA tracking this feature that has been already 
> claimed publicly as a good part about YARN.
> If it did complete it, we should document how to use it. Setting the 
> following only yields:
> {code}
> <property>
>   <name>yarn.resourcemanager.store.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.zookeeper-store.address</name>
>   <value>test.vm:2181/yarn-recovery-store</value>
> </property>
> {code}
> {code}
> Error starting ResourceManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFactory.getStore(StoreFactory.java:32)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:621)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2706)
> at java.lang.Class.getDeclaredConstructor(Class.java:1985)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)
> ... 2 more
> {code}
> This JIRA is hence filed to track the addition/completion of recovery via ZK.





[jira] [Updated] (MAPREDUCE-4343) ZK recovery support for ResourceManager

2012-06-17 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-4343:
--

Attachment: MR-4343.1.patch

Harsh,

The attached patch allows ResourceManager to use ZKStore.

> ZK recovery support for ResourceManager
> ---
>
> Key: MAPREDUCE-4343
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4343
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Harsh J
> Attachments: MR-4343.1.patch
>
>
> MAPREDUCE-279 included bits and pieces of possible ZK integration for YARN's 
> RM, but looks like it failed to complete it (for scalability reasons? etc?) 
> and there seems to be no JIRA tracking this feature that has been already 
> claimed publicly as a good part about YARN.
> If it did complete it, we should document how to use it. Setting the 
> following only yields:
> {code}
> <property>
>   <name>yarn.resourcemanager.store.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.zookeeper-store.address</name>
>   <value>test.vm:2181/yarn-recovery-store</value>
> </property>
> {code}
> {code}
> Error starting ResourceManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFactory.getStore(StoreFactory.java:32)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:621)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2706)
> at java.lang.Class.getDeclaredConstructor(Class.java:1985)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)
> ... 2 more
> {code}
> This JIRA is hence filed to track the addition/completion of recovery via ZK.





[jira] [Commented] (MAPREDUCE-4203) Create equivalent of ProcfsBasedProcessTree for Windows

2012-06-17 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393560#comment-13393560
 ] 

Jonathan Eagles commented on MAPREDUCE-4203:


Bikas, when you say "based on winutils", it makes me think you are basing this 
patch on some existing code. If so, can you comment on the source and licensing?

> Create equivalent of ProcfsBasedProcessTree for Windows
> ---
>
> Key: MAPREDUCE-4203
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4203
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Bikas Saha
>Assignee: Bikas Saha
> Attachments: MAPREDUCE-4203.branch-1-win.1.patch, 
> MAPREDUCE-4203.patch, test.cpp
>
>
> ProcfsBasedProcessTree is used by the TaskTracker to get process information 
> like memory and cpu usage. This information is used to manage resources etc. 
> The current implementation is based on Linux procfs functionality and hence 
> does not work on other platforms, specifically windows.





[jira] [Resolved] (MAPREDUCE-297) generalize the TT / JT servers to handle more generic tasks

2012-06-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved MAPREDUCE-297.
---

Resolution: Duplicate

Resolving as a duplicate of MAPREDUCE-279, which has provided a way for this.

But do reopen if my interpretation of the description turns out to be wrong.

> generalize the TT / JT servers to handle more generic tasks
> ---
>
> Key: MAPREDUCE-297
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-297
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: eric baldeschwieler
>
> We've been discussing a proposal to generalize the TT / JT servers to handle 
> more generic tasks and move job specific work out of the job tracker and into 
> client code so the whole system is both much more general and has more 
> coherent layering.  The result would look more like condor/pbs like systems 
> (or presumably borg) with map-reduce as a user job.
> Such a system would allow the current map-reduce code to coexist with other 
> work-queuing libraries or maybe even persistent services on the same Hadoop 
> cluster, although that would be a stretch goal.  We'll kick off a thread with 
> some documents soon.
> Our primary goal in going this way would be to get better utilization out of 
> map-reduce clusters and support a richer scheduling model.  The ability to 
> support alternative job frameworks would just be gravy!
> 
> Putting this in as a place holder.  Hope to get folks talking about this to 
> post some more detail.





[jira] [Commented] (MAPREDUCE-3868) Reenable Raid

2012-06-17 Thread Weiyan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393520#comment-13393520
 ] 

Weiyan Wang commented on MAPREDUCE-3868:


Moved to svn. @Scott, please follow the steps below to apply the patch:
1. Run MAPREDUCE-3868v1.sh svn (you may need to modify the script to use 
MKPARENT="--parents").
2. patch -p0 -i MAPREDUCE-3868-3.patch
3. svn add hadoop-assemblies/src/main/resources/assemblies/hadoop-raid-dist.xml
4. svn add hadoop-hdfs-project/hadoop-hdfs-raid/pom.xml

> Reenable Raid
> -
>
> Key: MAPREDUCE-3868
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3868
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/raid
>Reporter: Scott Chen
>Assignee: Weiyan Wang
> Attachments: MAPREDUCE-3868-1.patch, MAPREDUCE-3868-2.patch, 
> MAPREDUCE-3868-3.patch, MAPREDUCE-3868.patch, MAPREDUCE-3868v1.patch, 
> MAPREDUCE-3868v1.sh
>
>
> Currently Raid is outdated and not compiled. Make it compile.





[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager

2012-06-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393519#comment-13393519
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-4343:
---

I'm trying to solve the 2nd problem now, but I haven't yet estimated how large 
the work is.
I'll report the status before I go to sleep.

> ZK recovery support for ResourceManager
> ---
>
> Key: MAPREDUCE-4343
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4343
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Harsh J
>
> MAPREDUCE-279 included bits and pieces of possible ZK integration for YARN's 
> RM, but looks like it failed to complete it (for scalability reasons? etc?) 
> and there seems to be no JIRA tracking this feature that has been already 
> claimed publicly as a good part about YARN.
> If it did complete it, we should document how to use it. Setting the 
> following only yields:
> {code}
> <property>
>   <name>yarn.resourcemanager.store.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.zookeeper-store.address</name>
>   <value>test.vm:2181/yarn-recovery-store</value>
> </property>
> {code}
> {code}
> Error starting ResourceManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFactory.getStore(StoreFactory.java:32)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:621)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2706)
> at java.lang.Class.getDeclaredConstructor(Class.java:1985)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)
> ... 2 more
> {code}
> This JIRA is hence filed to track the addition/completion of recovery via ZK.





[jira] [Updated] (MAPREDUCE-3868) Reenable Raid

2012-06-17 Thread Weiyan Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiyan Wang updated MAPREDUCE-3868:
---

Attachment: MAPREDUCE-3868-3.patch

> Reenable Raid
> -
>
> Key: MAPREDUCE-3868
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3868
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: contrib/raid
>Reporter: Scott Chen
>Assignee: Weiyan Wang
> Attachments: MAPREDUCE-3868-1.patch, MAPREDUCE-3868-2.patch, 
> MAPREDUCE-3868-3.patch, MAPREDUCE-3868.patch, MAPREDUCE-3868v1.patch, 
> MAPREDUCE-3868v1.sh
>
>
> Currently Raid is outdated and not compiled. Make it compile.





[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager

2012-06-17 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393514#comment-13393514
 ] 

Harsh J commented on MAPREDUCE-4343:


Hi,

With (1) done in your local build, are you able to successfully validate that 
(2) is possible today already? In case it is (that the RM recovers its state 
properly, once ZKStore is fixed), we can just do (1) here and add some docs on 
how to use ZKStore (the configs) and resolve the ticket.

> ZK recovery support for ResourceManager
> ---
>
> Key: MAPREDUCE-4343
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4343
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Harsh J
>
> MAPREDUCE-279 included bits and pieces of possible ZK integration for YARN's 
> RM, but looks like it failed to complete it (for scalability reasons? etc?) 
> and there seems to be no JIRA tracking this feature that has been already 
> claimed publicly as a good part about YARN.
> If it did complete it, we should document how to use it. Setting the 
> following only yields:
> {code}
> <property>
>   <name>yarn.resourcemanager.store.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.zookeeper-store.address</name>
>   <value>test.vm:2181/yarn-recovery-store</value>
> </property>
> {code}
> {code}
> Error starting ResourceManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFactory.getStore(StoreFactory.java:32)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:621)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2706)
> at java.lang.Class.getDeclaredConstructor(Class.java:1985)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)
> ... 2 more
> {code}
> This JIRA is hence filed to track the addition/completion of recovery via ZK.





[jira] [Commented] (MAPREDUCE-4343) ZK recovery support for ResourceManager

2012-06-17 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393510#comment-13393510
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-4343:
---

Harsh,

I think this problem should be split into 2 tickets as follows:

1. Fix the runtime error of ZKStore by adding a default constructor.
2. Add ZK recovery support to the ResourceManager.

In fact, I've already created a patch to fix the 1st problem.
Should I attach the file here or on the new ticket?
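For context, the NoSuchMethodException in the quoted trace is what reflection-based factories such as ReflectionUtils.newInstance raise when the configured class lacks a no-arg constructor, which is why adding a default constructor fixes (1). A minimal standalone sketch of the failure mode (class and method names here are hypothetical, not Hadoop's):

```java
import java.lang.reflect.Constructor;

public class NoArgCtorDemo {
    // Stand-in for a store class declaring only a parameterized constructor,
    // like ZKStore before the fix described above (illustrative only).
    static class Store {
        Store(String zkAddress) { }
    }

    // True if cls declares a no-arg constructor, which reflective
    // instantiation requires; false means NoSuchMethodException was thrown,
    // the same failure seen in the ResourceManager startup log.
    static boolean hasNoArgCtor(Class<?> cls) {
        try {
            Constructor<?> c = cls.getDeclaredConstructor();
            return c != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgCtor(Store.class)); // no default ctor: false
    }
}
```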

> ZK recovery support for ResourceManager
> ---
>
> Key: MAPREDUCE-4343
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4343
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Harsh J
>
> MAPREDUCE-279 included bits and pieces of possible ZK integration for YARN's 
> RM, but looks like it failed to complete it (for scalability reasons? etc?) 
> and there seems to be no JIRA tracking this feature that has been already 
> claimed publicly as a good part about YARN.
> If it did complete it, we should document how to use it. Setting the 
> following only yields:
> {code}
> <property>
>   <name>yarn.resourcemanager.store.class</name>
>   <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore</value>
> </property>
> <property>
>   <name>yarn.resourcemanager.zookeeper-store.address</name>
>   <value>test.vm:2181/yarn-recovery-store</value>
> </property>
> {code}
> {code}
> Error starting ResourceManager
> java.lang.RuntimeException: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:128)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.StoreFactory.getStore(StoreFactory.java:32)
> at 
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:621)
> Caused by: java.lang.NoSuchMethodException: 
> org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKStore.<init>()
> at java.lang.Class.getConstructor0(Class.java:2706)
> at java.lang.Class.getDeclaredConstructor(Class.java:1985)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:122)
> ... 2 more
> {code}
> This JIRA is hence filed to track the addition/completion of recovery via ZK.





[jira] [Updated] (MAPREDUCE-4329) security.task.umbilical.protocol.acl should not be configurable

2012-06-17 Thread Sho Shimauchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sho Shimauchi updated MAPREDUCE-4329:
-

Attachment: MAPREDUCE-4329.txt

Replaced the comment on MapReducePolicyProvider with Harsh's one.

> security.task.umbilical.protocol.acl should not be configurable
> ---
>
> Key: MAPREDUCE-4329
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4329
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.3
>Reporter: Sho Shimauchi
>Assignee: Sho Shimauchi
> Attachments: MAPREDUCE-4329.txt, MAPREDUCE-4329.txt
>
>
> On running a MapReduce job, the username is changed to the job ID and the job fails.
> Exception is as follows:
> {code}
> 2012-06-08 19:39:26,555 WARN 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying 
> to get groups for user job_201206081934_0002
> org.apache.hadoop.util.Shell$ExitCodeException: id: job_201206081934_0002: no 
> such user
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
> at org.apache.hadoop.util.Shell.run(Shell.java:182)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
> at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:68)
> at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:45)
> at org.apache.hadoop.security.Groups.getGroups(Groups.java:79)
> at 
> org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1026)
> at 
> org.apache.hadoop.security.authorize.AccessControlList.isUserAllowed(AccessControlList.java:141)
> at 
> org.apache.hadoop.security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:99)
> at org.apache.hadoop.ipc.Server.authorize(Server.java:1659)
> at 
> org.apache.hadoop.ipc.Server$Connection.authorizeConnection(Server.java:1320)
> at 
> org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1286)
> at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1182)
> at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:537)
> at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:344)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:680)
> {code}
> This issue can be reproduced by following steps:
> 1. set hadoop.security.authorization = true in core-site.xml
> {code}
>   <property>
>     <name>hadoop.security.authorization</name>
>     <value>true</value>
>   </property>
> {code}
> 2. set any value except for '*' to security.task.umbilical.protocol.acl in 
> hadoop-policy.xml
> {code}
>   <property>
>     <name>security.task.umbilical.protocol.acl</name>
>     <value>sho sho</value>
>     <description>ACL for TaskUmbilicalProtocol, used by the map and reduce 
>     tasks to communicate with the parent tasktracker. 
>     The ACL is a comma-separated list of user and group names. The user and 
>     group list is separated by a blank. For e.g. "alice,bob users,wheel". 
>     A special value of "*" means all users are allowed.</description>
>   </property>
> {code}
> 3. run any mapreduce job.
> h4. Code Analysis
> ./src/mapred/org/apache/hadoop/mapred/Child.java:102-118
> {code}
> UserGroupInformation taskOwner 
>  = 
> UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
> taskOwner.addToken(jt);
> 
> // Set the credentials
> defaultConf.setCredentials(credentials);
> 
> final TaskUmbilicalProtocol umbilical = 
>   taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
> @Override
> public TaskUmbilicalProtocol run() throws Exception {
>   return 
> (TaskUmbilicalProtocol)RPC.getProxy(TaskUmbilicalProtocol.class,
>   TaskUmbilicalProtocol.versionID,
>   address,
>   defaultConf);
> }
> });
> {code}
> This code indicates that TaskUmbilicalProtocol uses jobid as username.
> This code came from MAPREDUCE-1457. 
> https://issues.apache.org/jira/browse/MAPREDUCE-1457
> Devaraj said as follows in the JIRA:
> {quote}
> 2) In Child.java, the task authenticates to the TaskTracker using the 
> jobtoken. The username in the jobtoken is jobId. The doAs block done using 
> taskOwner is required so that the username mentioned in the token and the one 
> doing the operation matches.
> {quote}
> We can't change security.task.umbilical.protocol

[jira] [Commented] (MAPREDUCE-4329) security.task.umbilical.protocol.acl should not be configurable

2012-06-17 Thread Sho Shimauchi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393480#comment-13393480
 ] 

Sho Shimauchi commented on MAPREDUCE-4329:
--

Thanks for reviewing the patch!
Yeah, your comment makes more sense, so I'll update the patch to replace the old 
comment with yours.


This property was marked as deprecated in MAPREDUCE-2746, so I don't think there 
is anything to do for trunk.

{code}
Configuration.addDeprecation("security.task.umbilical.protocol.acl",
    new String[] {
        MRJobConfig.MR_AM_SECURITY_SERVICE_AUTHORIZATION_TASK_UMBILICAL
    });
{code}
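As an aside, the ACL value format at issue here (per the hadoop-policy.xml description quoted below: a comma-separated user list and a comma-separated group list separated by a blank, with "*" meaning everyone) can be sketched as follows. This is only an illustration of the format, not Hadoop's actual AccessControlList implementation, and the helper name is ours:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class AclSketch {
    // Checks "<users> <groups>" style ACLs, e.g. "alice,bob users,wheel".
    // "*" is the special value allowing all users.
    static boolean isAllowed(String acl, String user, Set<String> userGroups) {
        String trimmed = acl.trim();
        if (trimmed.equals("*")) {
            return true; // all users allowed
        }
        String[] parts = trimmed.split(" ", 2);
        Set<String> users = new HashSet<>(Arrays.asList(parts[0].split(",")));
        if (users.contains(user)) {
            return true;
        }
        if (parts.length == 2) {
            for (String g : parts[1].split(",")) {
                if (userGroups.contains(g)) {
                    return true; // membership in any listed group suffices
                }
            }
        }
        return false;
    }
}
```

Under this format, the "sho sho" value in the reproduction allows only user "sho" or members of group "sho", which is why a task authenticating as a job ID username is rejected.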

> security.task.umbilical.protocol.acl should not be configurable
> ---
>
> Key: MAPREDUCE-4329
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4329
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 1.0.3
>Reporter: Sho Shimauchi
>Assignee: Sho Shimauchi
> Attachments: MAPREDUCE-4329.txt
>
>
> On running a MapReduce job, the username is changed to the job ID and the job fails.
> Exception is as follows:
> {code}
> 2012-06-08 19:39:26,555 WARN 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying 
> to get groups for user job_201206081934_0002
> org.apache.hadoop.util.Shell$ExitCodeException: id: job_201206081934_0002: no 
> such user
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
> at org.apache.hadoop.util.Shell.run(Shell.java:182)
> at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
> at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
> at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getUnixGroups(ShellBasedUnixGroupsMapping.java:68)
> at 
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping.getGroups(ShellBasedUnixGroupsMapping.java:45)
> at org.apache.hadoop.security.Groups.getGroups(Groups.java:79)
> at 
> org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1026)
> at 
> org.apache.hadoop.security.authorize.AccessControlList.isUserAllowed(AccessControlList.java:141)
> at 
> org.apache.hadoop.security.authorize.ServiceAuthorizationManager.authorize(ServiceAuthorizationManager.java:99)
> at org.apache.hadoop.ipc.Server.authorize(Server.java:1659)
> at 
> org.apache.hadoop.ipc.Server$Connection.authorizeConnection(Server.java:1320)
> at 
> org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1286)
> at 
> org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1182)
> at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:537)
> at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:344)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:680)
> {code}
> This issue can be reproduced by the following steps:
> 1. set hadoop.security.authorization = true in core-site.xml
> {code}
>   <property>
>     <name>hadoop.security.authorization</name>
>     <value>true</value>
>   </property>
> {code}
> 2. set any value other than '*' for security.task.umbilical.protocol.acl in 
> hadoop-policy.xml
> {code}
>   <property>
>     <name>security.task.umbilical.protocol.acl</name>
>     <value>sho sho</value>
>     <description>ACL for TaskUmbilicalProtocol, used by the map and reduce 
>     tasks to communicate with the parent tasktracker. 
>     The ACL is a comma-separated list of user and group names. The user and 
>     group list is separated by a blank. For e.g. "alice,bob users,wheel". 
>     A special value of "*" means all users are allowed.</description>
>   </property>
> {code}
> 3. run any mapreduce job.
> h4. Code Analysis
> ./src/mapred/org/apache/hadoop/mapred/Child.java:102-118
> {code}
> UserGroupInformation taskOwner =
>     UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
> taskOwner.addToken(jt);
> 
> // Set the credentials
> defaultConf.setCredentials(credentials);
> 
> final TaskUmbilicalProtocol umbilical =
>   taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
>     @Override
>     public TaskUmbilicalProtocol run() throws Exception {
>       return (TaskUmbilicalProtocol)RPC.getProxy(TaskUmbilicalProtocol.class,
>           TaskUmbilicalProtocol.versionID,
>           address,
>           defaultConf);
>     }
>   });
> {code}
> This code indicates that TaskUmbilicalProtocol uses the job ID as the username.
> This code came from MAPREDUCE-1457. 
> https://issues.apache.org/jira/browse/MAPREDUCE
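For reference, the ACL string format described in the quoted hadoop-policy.xml snippet ("alice,bob users,wheel", with '*' meaning everyone) can be sketched as a small parser. This is an illustration of the format only, not Hadoop's AccessControlList implementation:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the documented ACL format: a comma-separated user list
// and a comma-separated group list, separated by a blank; "*" allows
// all users. A job-ID "user" matches neither list, which is why any
// non-"*" value breaks the umbilical protocol.
public class AclSketch {
  final boolean allowAll;
  final List<String> users;
  final List<String> groups;

  AclSketch(String acl) {
    String s = acl.trim();
    allowAll = s.equals("*");
    String[] parts = s.split("\\s+", 2);
    users = allowAll ? List.of() : Arrays.asList(parts[0].split(","));
    groups = (allowAll || parts.length < 2)
        ? List.of() : Arrays.asList(parts[1].split(","));
  }

  boolean isUserAllowed(String user, List<String> userGroups) {
    if (allowAll || users.contains(user)) return true;
    for (String g : userGroups) {
      if (groups.contains(g)) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    AclSketch acl = new AclSketch("alice,bob users,wheel");
    System.out.println(acl.isUserAllowed("alice", List.of()));
    System.out.println(acl.isUserAllowed("carol", List.of("users")));
    // A job ID is never a listed user and has no groups, so it is denied.
    System.out.println(acl.isUserAllowed("job_201206081934_0002", List.of()));
  }
}
```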

[jira] [Created] (MAPREDUCE-4345) ZK-based High Availability (HA) for ResourceManager (RM)

2012-06-17 Thread Harsh J (JIRA)
Harsh J created MAPREDUCE-4345:
--

 Summary: ZK-based High Availability (HA) for ResourceManager (RM)
 Key: MAPREDUCE-4345
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4345
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Harsh J


One of the goals presented on MAPREDUCE-279 was to have high availability. One 
way that was discussed, per Mahadev/others on 
https://issues.apache.org/jira/browse/MAPREDUCE-2648 and other places, was ZK:

{quote}
Am not sure, if you already know about the MR-279 branch (the next version of 
MR framework). We've been trying to integrate ZK into the framework from the 
beginning. As for now, we are just doing restart with ZK but soon we should 
have a HA soln with ZK.
{quote}

There is now MAPREDUCE-4343 that tracks recoverability via ZK. This JIRA is 
meant to track HA via ZK.

Currently there isn't an HA solution for the RM, via ZK or otherwise.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (MAPREDUCE-4329) security.task.umbilical.protocol.acl should not be configurable

2012-06-17 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393476#comment-13393476
 ] 

Harsh J commented on MAPREDUCE-4329:


Thanks Sho. Patch looks good, just one comment (about the comment):

{code}
+   * Since TaskUmbilicalProtocol always uses job ID as username, setting
+   * specific user/group to security.task.umbilical.protocol.acl doesn't
+   * work. This property was removed from hadoop-policy.xml in
+   * MAPREDUCE-4329 but the property itself still works because we cannot
+   * remove the following code.
{code}

I think the following may be better, what are your thoughts?

{code}
Since TaskUmbilicalProtocol uses the job ID (of the task that uses it, hence 
dynamic) as its identifier, due to the security implementation, setting 
specific users/groups in security.task.umbilical.protocol.acl has no effect 
other than breaking jobs. This should never be configured to anything apart 
from '*', and hence MAPREDUCE-4329 removes it from the docs but this line 
remains to not break the protocol provider.
{code}

Also, is this issue present with MR2 ACLs too, Sho? Or let me know if you'd like 
me to investigate that instead. We can do it with a supplementary patch (trunk 
first, branch-1 afterwards).
