When is decommissioning done?
Hi,

I'm trying to decommission some nodes. The process I tried to follow is:

1) Add them to conf/excluding (hadoop-site.xml points there).
2) Invoke hadoop dfsadmin -refreshNodes.

This returns immediately, so I thought it was done; I killed off the cluster and rebooted without the excluded nodes, but then fsck was very unhappy...

Is there some way to watch the progress of decommissioning?

Thanks,
-- David
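[In concrete terms, the two steps above look something like the following sketch; the exclude-file path and hostname are placeholders for whatever the configuration actually uses:

    # append the host to be decommissioned to the exclude file
    # (path and hostname are placeholders)
    echo "node17.example.com" >> /path/to/conf/excluding

    # tell the namenode to re-read its include/exclude files; note that
    # this only *starts* decommissioning and returns immediately
    bin/hadoop dfsadmin -refreshNodes
]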
Re: When is decommissioning done?
I'm starting to think I'm doing things wrong. I have an absolute path in dfs.hosts.exclude that lists the nodes I want decommissioned, and a dfs.hosts that lists the nodes I want to remain commissioned (it points to the slaves file). Nothing seems to do anything... What am I missing?

-- David

On Thu, Dec 4, 2008 at 12:48 AM, David Hall [EMAIL PROTECTED] wrote:
> Hi,
>
> I'm trying to decommission some nodes. The process I tried to follow is:
>
> 1) Add them to conf/excluding (hadoop-site.xml points there).
> 2) Invoke hadoop dfsadmin -refreshNodes.
>
> This returns immediately, so I thought it was done; I killed off the cluster and rebooted without the excluded nodes, but then fsck was very unhappy...
>
> Is there some way to watch the progress of decommissioning?
>
> Thanks,
> -- David
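[For reference, a minimal hadoop-site.xml sketch of the two properties being described here; the file paths are placeholders:

    <!-- hosts allowed to connect to the namenode (the slaves file) -->
    <property>
      <name>dfs.hosts</name>
      <value>/absolute/path/to/conf/slaves</value>
    </property>

    <!-- hosts to decommission -->
    <property>
      <name>dfs.hosts.exclude</name>
      <value>/absolute/path/to/conf/excluding</value>
    </property>
]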
Re: When is decommissioning done?
Thanks for the link. I followed that guide, and now I see rather strange behavior.

If dfs.hosts is set (I didn't have it set when I wrote my last email) to an empty file when I start the cluster, nothing happens when I -refreshNodes; I take it that's expected. If it is set to the hosts I want to keep, none of the datanodes come up at startup; they die with the error below, and on the dfshealth page they are all listed as dead. If instead it is empty at startup and I then add the hosts, every datanode dies when I -refreshNodes.

Thoughts? I'm running 0.18.2. (We haven't moved to Java 6 here yet.)

Thanks!
-- David

2008-12-04 01:18:10,909 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.dfs.DisallowedDatanodeException: Datanode denied communication with namenode: HOST:PORT (redacted)
        at org.apache.hadoop.dfs.FSNamesystem.registerDatanode(FSNamesystem.java:1938)
        at org.apache.hadoop.dfs.NameNode.register(NameNode.java:585)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
        at org.apache.hadoop.ipc.Client.call(Client.java:715)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at org.apache.hadoop.dfs.$Proxy4.register(Unknown Source)
        at org.apache.hadoop.dfs.DataNode.register(DataNode.java:529)
        at org.apache.hadoop.dfs.DataNode.runDatanodeDaemon(DataNode.java:2960)
        at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2995)
        at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3116)

On Thu, Dec 4, 2008 at 9:12 AM, Konstantin Shvachko [EMAIL PROTECTED] wrote:
> Just for reference, these links:
> http://wiki.apache.org/hadoop/FAQ#17
> http://hadoop.apache.org/core/docs/r0.19.0/hdfs_user_guide.html#DFSAdmin+Command
>
> Decommissioning does not happen all at once. -refreshNodes just starts the process; it does not complete it. There can be a lot of blocks on the nodes you want to decommission, and replication takes time. The progress can be monitored on the name-node web UI: right after -refreshNodes, the nodes you chose for decommissioning will show the state "Decommission In Progress". Wait until that changes to "Decommissioned", and only then turn the node off.
>
> --Konstantin
>
> David Hall wrote:
>> I'm starting to think I'm doing things wrong. I have an absolute path in dfs.hosts.exclude that lists the nodes I want decommissioned, and a dfs.hosts that lists the nodes I want to remain commissioned (it points to the slaves file). Nothing seems to do anything... What am I missing?
>>
>> -- David
>>
>> On Thu, Dec 4, 2008 at 12:48 AM, David Hall [EMAIL PROTECTED] wrote:
>>> Hi,
>>>
>>> I'm trying to decommission some nodes. The process I tried to follow is:
>>>
>>> 1) Add them to conf/excluding (hadoop-site.xml points there).
>>> 2) Invoke hadoop dfsadmin -refreshNodes.
>>>
>>> This returns immediately, so I thought it was done; I killed off the cluster and rebooted without the excluded nodes, but then fsck was very unhappy...
>>>
>>> Is there some way to watch the progress of decommissioning?
>>>
>>> Thanks,
>>> -- David
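[To complement Konstantin's answer: besides the name-node web UI, the decommission state can usually also be checked from the command line. This is a sketch; whether the per-node decommission state appears in the -report output depends on the Hadoop version, so fall back to the dfshealth page if it does not:

    # dump the namenode's view of every datanode and look for the
    # decommission state in the entry for the node in question
    bin/hadoop dfsadmin -report

    # shut the node down only once its state reads "Decommissioned",
    # not while it is still "Decommission In Progress"
]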
Killing hadoop streaming jobs
Hi,

It seems that when you kill a Hadoop streaming job, it doesn't kill the underlying processes; it only stops the job from processing new input. In the event of a long-running input (say, someone not using streaming the way they probably should), this is less than ideal. Is there any way to kill the job quickly without ssh'ing into the machines running the tasks?

Thanks,
David Hall
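[For what it's worth, the job itself can be killed from any machine with a Hadoop client, no ssh required; whether the underlying streaming child processes die right away depends on the TaskTracker reaping them, which is exactly the gap described above. The job id is a placeholder:

    # list running jobs and note the id of the streaming job
    bin/hadoop job -list

    # kill the whole job (id is a placeholder)
    bin/hadoop job -kill job_200812030001_0042
]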
Re: Adding $CLASSPATH to Map/Reduce tasks
On Fri, Sep 26, 2008 at 7:50 AM, Samuel Guo [EMAIL PROTECTED] wrote:
> Maybe you can use:
>
>   bin/hadoop jar -libjars ${your-depends-jars} your.mapred.jar args
>
> See details: http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/JobShell.html

Most of our classes aren't in jars. I suppose it wouldn't be too bad to tell ant to jar them up, but with the hack it's easy enough not to bother.

-- David

On Thu, Sep 25, 2008 at 12:26 PM, David Hall [EMAIL PROTECTED] wrote:
> On Sun, Sep 21, 2008 at 9:41 PM, David Hall [EMAIL PROTECTED] wrote:
>> On Sun, Sep 21, 2008 at 9:35 PM, Arun C Murthy [EMAIL PROTECTED] wrote:
>>> On Sep 21, 2008, at 2:05 PM, David Hall wrote:
>>>> (New to this list)
>>>>
>>>> Hi,
>>>>
>>>> My research group is setting up a small (20-node) cluster. All of these machines are linked by NFS. We have a fairly entrenched codebase and development cycle, and in particular we'd like to be able to access users' $CLASSPATHs in the forked JVMs run by the Map and Reduce tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to disallow this by specifying its own.
>>>
>>> Using jars on NFS might hurt if you have thousands of tasks, causing too much load. The better solution might be to use the DistributedCache:
>>> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache
>>>
>>> Specifically:
>>> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
>>> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
>>>
>>> Arun
>>
>> Good point... I hadn't thought of that, but at the moment we're dealing with barrier to adoption rather than efficiency. We'll have to go back to PBS if we can't get users (read: picky PhD students) on board. I'd rather avoid that scenario... In the meantime, I think I figured out a hack that I'm going to try.
>
> In case anyone's curious, the hack is to create a jar file with a manifest whose Class-Path field is set to all the directories and jars you want, put that jar in the lib/ folder of another jar, and pass that final jar in as the user jar to a job. Works like a charm. :-)
>
> -- David
>
>> Thanks! -- David
>
>>>> Is there any easy way to trick hadoop into making these visible? If not, if I were to submit a patch that would (optionally) add $CLASSPATH to the forked JVMs' classpath, would it be considered?
>>>>
>>>> Thanks,
>>>> David Hall
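[For anyone who wants to reproduce the hack, here is a sketch of the steps described above. Every path, jar name, and class name is a placeholder; note also that Class-Path manifest entries are space-separated, directories need a trailing slash, and whether absolute paths are honored can depend on the JVM:

    # 1) build a wrapper jar whose manifest Class-Path lists the
    #    directories and jars the tasks should see (placeholders)
    cat > manifest.txt <<'EOF'
    Class-Path: /nfs/group/classes/ /nfs/group/lib/somedep.jar
    EOF
    jar cfm wrapper.jar manifest.txt

    # 2) put the wrapper jar into the lib/ folder of the job jar
    mkdir -p lib
    cp wrapper.jar lib/
    jar uf myjob.jar lib/wrapper.jar

    # 3) submit myjob.jar as the job jar as usual (placeholder class)
    bin/hadoop jar myjob.jar my.group.MyJob args...
]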
Adding $CLASSPATH to Map/Reduce tasks
(New to this list)

Hi,

My research group is setting up a small (20-node) cluster. All of these machines are linked by NFS. We have a fairly entrenched codebase and development cycle, and in particular we'd like to be able to access users' $CLASSPATHs in the forked JVMs run by the Map and Reduce tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to disallow this by specifying its own.

Is there any easy way to trick hadoop into making these visible? If not, if I were to submit a patch that would (optionally) add $CLASSPATH to the forked JVMs' classpath, would it be considered?

Thanks,
David Hall