When is decommissioning done?

2008-12-04 Thread David Hall
Hi,

I'm trying to decommission some nodes. The process I tried to follow is:

1) add them to conf/excluding (hadoop-site points there)
2) invoke hadoop dfsadmin -refreshNodes
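For reference, a sketch of those two steps as commands (the path and hostnames below are placeholders; the exclude file is just one node name per line):

  $ cat /absolute/path/to/conf/excluding
  node07.example.com
  node12.example.com
  $ hadoop dfsadmin -refreshNodes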

This returns immediately, so I thought it was done, so I killed off
the cluster and rebooted without the new nodes, but then fsck was very
unhappy...

Is there some way to watch the progress of decommissioning?

Thanks,
-- David


Re: When is decommissioning done?

2008-12-04 Thread David Hall
I'm starting to think I'm doing things wrong.

I have an absolute path to dfs.hosts.exclude that includes what I want
decommissioned, and a dfs.hosts which includes those I want to remain
commissioned (this points to the slaves file).
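As a sketch of that configuration (paths below are placeholders), hadoop-site.xml on the namenode would contain roughly:

  <property>
    <name>dfs.hosts</name>
    <value>/absolute/path/to/conf/slaves</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/absolute/path/to/conf/excluding</value>
  </property>

Both files are plain lists of node names, one per line. When dfs.hosts is set and non-empty, datanodes not listed in it are refused outright; nodes listed in dfs.hosts.exclude are decommissioned once -refreshNodes is issued.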

Nothing seems to do anything...

What am I missing?

-- David

On Thu, Dec 4, 2008 at 12:48 AM, David Hall [EMAIL PROTECTED] wrote:
 Hi,

 I'm trying to decommission some nodes. The process I tried to follow is:

 1) add them to conf/excluding (hadoop-site points there)
 2) invoke hadoop dfsadmin -refreshNodes

 This returns immediately, so I thought it was done, so I killed off
 the cluster and rebooted without the new nodes, but then fsck was very
 unhappy...

 Is there some way to watch the progress of decommissioning?

 Thanks,
 -- David



Re: When is decommissioning done?

2008-12-04 Thread David Hall
Thanks for the link.

I followed that guide, and now I have rather strange behavior. If I
have dfs.hosts set (I didn't when I wrote my last email) to an empty
file when I start the cluster, nothing happens when I refreshNodes; I
take it that's expected. If it's set to the hosts I want to keep, none
of the datanodes come up at startup; they die with the error below. On
the dfshealth page, they're all listed as dead. If instead it's empty
on startup and then I add the hosts, everyone dies when I
refreshNodes.

Thoughts? I'm running 0.18.2. (We haven't moved to Java 6 here yet)

Thanks!
-- David

2008-12-04 01:18:10,909 ERROR org.apache.hadoop.dfs.DataNode:
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.dfs.DisallowedDatanodeException: Datanode denied
communication with namenode: HOST:PORT # changed.
at org.apache.hadoop.dfs.FSNamesystem.registerDatanode(FSNamesystem.java:1938)
at org.apache.hadoop.dfs.NameNode.register(NameNode.java:585)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)

at org.apache.hadoop.ipc.Client.call(Client.java:715)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
at org.apache.hadoop.dfs.$Proxy4.register(Unknown Source)
at org.apache.hadoop.dfs.DataNode.register(DataNode.java:529)
at org.apache.hadoop.dfs.DataNode.runDatanodeDaemon(DataNode.java:2960)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2995)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:3116)
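
For what it's worth, DisallowedDatanodeException is what the namenode throws when it refuses a datanode's registration, typically because the node is not in the dfs.hosts include file. So when every datanode dies this way, the usual suspect is a mismatch between the names in that file and the names the nodes register with (fully-qualified hostname vs. short name vs. IP address). The entries below are only placeholders; the include file is one entry per line:

  node01.example.com
  node02.example.com
  node03.example.com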


On Thu, Dec 4, 2008 at 9:12 AM, Konstantin Shvachko [EMAIL PROTECTED] wrote:
 Just for reference, these links:
 http://wiki.apache.org/hadoop/FAQ#17
 http://hadoop.apache.org/core/docs/r0.19.0/hdfs_user_guide.html#DFSAdmin+Command

 Decommissioning does not happen all at once.
 -refreshNodes just starts the process; it does not complete it.
 There could be a lot of blocks on the nodes you want to decommission,
 and replication takes time.
 The progress can be monitored on the name-node web UI.
 Right after -refreshNodes, the nodes you chose for decommissioning will show
 the state "Decommission In Progress" on the web UI. Wait until that changes
 to "Decommissioned" before turning the node off.
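
 A rough command-line equivalent (the exact label of the status field differs
 between versions, so treat the grep as a sketch):

   hadoop dfsadmin -report | grep -i decommission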

 --Konstantin


 David Hall wrote:

 I'm starting to think I'm doing things wrong.

 I have an absolute path to dfs.hosts.exclude that includes what I want
 decommissioned, and a dfs.hosts which includes those I want to remain
 commissioned (this points to the slaves file).

 Nothing seems to do anything...

 What am I missing?

 -- David

 On Thu, Dec 4, 2008 at 12:48 AM, David Hall [EMAIL PROTECTED] wrote:

 Hi,

 I'm trying to decommission some nodes. The process I tried to follow is:

 1) add them to conf/excluding (hadoop-site points there)
 2) invoke hadoop dfsadmin -refreshNodes

 This returns immediately, so I thought it was done, so I killed off
 the cluster and rebooted without the new nodes, but then fsck was very
 unhappy...

 Is there some way to watch the progress of decommissioning?

 Thanks,
 -- David





Killing hadoop streaming jobs

2008-11-28 Thread David Hall
Hi,

So, it seems that when you kill a Hadoop streaming job, it doesn't
kill the underlying processes, but only stops the job from processing new
input. In the event of a long-running input (say, someone not using
streaming as they probably should), this is less than ideal. Is there
any way to quickly kill the job without ssh'ing into the machines
running the task?
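
If the goal is just to stop the job itself, that much can be done from any machine with the Hadoop client installed, no ssh needed (the job id below is a placeholder); it's the externally spawned process that may linger until it next touches its input or output:

  hadoop job -list
  hadoop job -kill job_200811280001_0001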

Thanks,
David Hall


Re: Adding $CLASSPATH to Map/Reduce tasks

2008-09-26 Thread David Hall
On Fri, Sep 26, 2008 at 7:50 AM, Samuel Guo [EMAIL PROTECTED] wrote:
 maybe you can use
 bin/hadoop jar -libjars ${your-depends-jars} your.mapred.jar args

 see details:
 http://hadoop.apache.org/core/docs/r0.18.1/api/org/apache/hadoop/mapred/JobShell.html

Most of our classes aren't packaged into jars. I suppose it wouldn't be
too bad to tell ant to jar them up, but with the hack it's easy enough
not to bother.
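
For the record, the jar-it-up route would look roughly like this (paths, jar names, and the main class are placeholders), reusing the -libjars form quoted above:

  jar cf our-classes.jar -C build/classes .
  bin/hadoop jar -libjars our-classes.jar our-job.jar our.Main args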

-- David


 On Thu, Sep 25, 2008 at 12:26 PM, David Hall [EMAIL PROTECTED]wrote:

 On Sun, Sep 21, 2008 at 9:41 PM, David Hall [EMAIL PROTECTED]
 wrote:
  On Sun, Sep 21, 2008 at 9:35 PM, Arun C Murthy [EMAIL PROTECTED]
 wrote:
 
  On Sep 21, 2008, at 2:05 PM, David Hall wrote:
 
  (New to this list)
 
  Hi,
 
  My research group is setting up a small (20-node) cluster. All of
  these machines are linked by NFS. We have a fairly entrenched
  codebase/development cycle, and in particular we'd like to be able to
  access user $CLASSPATHs in the forked JVMs run by the Map and Reduce
  tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to
  disallow this by specifying its own.
 
 
  Using jars on NFS might hurt if you have thousands of tasks, causing
  too much load.
 
  The better solution might be to use the DistributedCache:
 
 http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache
 
  Specifically:
 
 http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
 
 http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)
 
  Arun
 
  Good point... I hadn't thought of that, but at the moment we're dealing
  with barrier-to-adoption rather than efficiency. We'll have to go back
  to PBS if we can't get users (read: picky PhD students) on board. I'd
  rather avoid that scenario...
 
  In the meantime, I think I figured out a hack that I'm going to try.

 In case anyone's curious, the hack is to create a jar file with a
 manifest that has the Class-Path field set to all the directories and
 jars you want, and to put that in the lib/ folder of another jar, and
 pass that final jar in as the User Jar to a job.

 Works like a charm. :-)

 -- David
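
 Spelled out, the trick above would look roughly like the following sketch
 (every path and name here is made up; Class-Path entries in a manifest are
 space-separated, and directories need a trailing slash):

   # a jar containing nothing but a manifest that points at the NFS paths
   printf 'Class-Path: /nfs/group/classes/ /nfs/group/lib/dep.jar\n' > manifest.txt
   jar cfm classpath.jar manifest.txt

   # tuck it into the lib/ directory of the jar submitted as the user jar
   mkdir -p jobroot/lib
   cp classpath.jar jobroot/lib/
   jar cf job.jar -C jobroot .
   bin/hadoop jar job.jar our.Main args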

 
  Thanks!
 
  -- David
 
 
  Is there any easy way to trick Hadoop into making these visible? If
  not, if I were to submit a patch that would (optionally) add
  $CLASSPATH to the forked JVMs' classpath, would it be considered?
 
  Thanks,
  David Hall
 
 
 




Adding $CLASSPATH to Map/Reduce tasks

2008-09-21 Thread David Hall
(New to this list)

Hi,

My research group is setting up a small (20-node) cluster. All of
these machines are linked by NFS. We have a fairly entrenched
codebase/development cycle, and in particular we'd like to be able to
access user $CLASSPATHs in the forked JVMs run by the Map and Reduce
tasks. However, TaskRunner.java (http://tinyurl.com/4enkg4) seems to
disallow this by specifying its own.

Is there any easy way to trick Hadoop into making these visible? If
not, if I were to submit a patch that would (optionally) add
$CLASSPATH to the forked JVMs' classpath, would it be considered?

Thanks,
David Hall