Re: Best way to reduce an 8-node cluster in half and get hdfs to come out of safe mode

2010-08-06 Thread He Chen
Way#3

1) bring up all 8 dn and the nn
2) retire one of your 4 nodes:
   kill the datanode process
   hadoop dfsadmin -refreshNodes  (this should be done on nn)
3) repeat step 2) three more times

On Fri, Aug 6, 2010 at 1:21 AM, Allen Wittenauer
awittena...@linkedin.com wrote:


 On Aug 5, 2010, at 10:42 PM, Steve Kuo wrote:

  As part of our experimentation, the plan is to pull 4 slave nodes out of
 an
  8-slave/1-master cluster. With replication factor set to 3, I thought
  losing half of the cluster might be too much for hdfs to recover from.  Thus
  I copied all relevant data out of hdfs to local disk and reconfigured the
  cluster.

 It depends.  If you have configured Hadoop to have a topology such that the
 8 nodes were in 2 logical racks, then it would have worked just fine.  If
 you didn't have any topology configured, then each node is considered its
 own rack.  So pulling half of the grid down means you are likely losing a
 good chunk of all your blocks.
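As a back-of-the-envelope check (not from the thread itself): with replication factor 3 and every node acting as its own rack, the fraction of blocks whose three replicas all land on the 4 removed nodes is roughly C(4,3)/C(8,3), assuming replicas are spread uniformly over distinct nodes:

```python
from math import comb

nodes, removed, replication = 8, 4, 3

# Fraction of blocks with ALL replicas on the 4 removed nodes,
# assuming each block's replicas sit on 3 distinct, uniformly
# chosen nodes (i.e., no rack topology configured).
p_all_lost = comb(removed, replication) / comb(nodes, replication)
print(p_all_lost)  # 4/56, about 7%
```

Roughly 7% of blocks lose every replica, which is far above the 0.1% shortfall the default safe-mode threshold (0.999) tolerates, so the namenode stays in safe mode.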
  The 4 slave nodes started okay but hdfs never left safe mode.  The nn.log
  has the following lines.  What is the best way to deal with this?  Shall I
  restart the cluster with 8 nodes and then delete
  /data/hadoop-hadoop/mapred/system?  Or shall I reformat hdfs?

 Two ways to go:

 Way #1:

 1) configure dfs.hosts
 2) bring up all 8 nodes
 3) configure dfs.hosts.exclude to include the 4 you don't want
 4) dfsadmin -refreshNodes to start decommissioning the 4 you don't want
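 Concretely, Way #1 might look like the sketch below. The file paths and the slave5..slave8 hostnames are made up, and hdfs-site.xml is assumed to already point dfs.hosts / dfs.hosts.exclude at these files:

```shell
# Include file: all 8 datanodes are allowed to connect.
printf '%s\n' slave1 slave2 slave3 slave4 slave5 slave6 slave7 slave8 \
  > /etc/hadoop/conf/dfs.include

# Exclude file: the 4 nodes to decommission.
printf '%s\n' slave5 slave6 slave7 slave8 > /etc/hadoop/conf/dfs.exclude

# Have the namenode re-read both files and begin decommissioning.
hadoop dfsadmin -refreshNodes

# Watch until the 4 nodes report "Decommission Status : Decommissioned".
hadoop dfsadmin -report
```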

 Way #2:

 1) configure a topology
 2) bring up all 8 nodes
 3) setrep all files +1
 4) wait for nn to finish replication
 5) pull 4 nodes
 6) bring down nn
 7) remove topology
 8) bring nn up
 9) setrep -1
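 As a command-level sketch of Way #2 (the "/" path and the exact config property are assumptions on my part):

```shell
hadoop fs -setrep -R 4 /           # step 3: bump replication on everything by one
hadoop fsck / | grep -i replica    # step 4: repeat until nothing is under-replicated

# steps 5-8: stop the 4 datanodes, stop the namenode, remove the
# topology.script.file.name property from the config, restart the namenode

hadoop fs -setrep -R 3 /           # step 9: drop replication back down
```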

-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Research Assistant of Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588


Re: Best way to reduce an 8-node cluster in half and get hdfs to come out of safe mode

2010-08-06 Thread Allen Wittenauer

On Aug 6, 2010, at 8:35 AM, He Chen wrote:

 Way#3
 
 1) bring up all 8 dn and the nn
 2) retire one of your 4 nodes:
   kill the datanode process
   hadoop dfsadmin -refreshNodes  (this should be done on nn)

No need to refresh nodes.  It only re-reads the dfs.hosts.* files.


 3) repeat step 2) three more times

Depending upon what the bandwidth param is, this should theoretically take
significantly longer, since you need the grid to get back to healthy before
each kill.

Best way to reduce an 8-node cluster in half and get hdfs to come out of safe mode

2010-08-05 Thread Steve Kuo
As part of our experimentation, the plan is to pull 4 slave nodes out of an
8-slave/1-master cluster.  With replication factor set to 3, I thought
losing half of the cluster might be too much for hdfs to recover from.  Thus I
copied all relevant data out of hdfs to local disk and reconfigured the
cluster.

The 4 slave nodes started okay but hdfs never left safe mode.  The nn.log
has the following lines.  What is the best way to deal with this?  Shall I
restart the cluster with 8 nodes and then delete
/data/hadoop-hadoop/mapred/system?  Or shall I reformat hdfs?
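For reference: if the missing blocks really are gone with the removed nodes, the namenode can be taken out of safe mode by hand with the standard dfsadmin commands below (0.20-era syntax). Forcing it out is safe only if you can live with the lost files; fsck will then report them as corrupt:

```shell
hadoop dfsadmin -safemode get     # check whether safe mode is on
hadoop dfsadmin -safemode leave   # force the namenode out of safe mode
hadoop fsck / -files              # see which files lost blocks
```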

Thanks.

2010-08-05 22:28:12,921 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
ugi=hadoop,hadoop   ip=/10.128.135.100  cmd=listStatus
src=/data/hadoop-hadoop/mapred/system   dst=null  perm=null
2010-08-05 22:28:12,923 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 9000, call delete(/data/hadoop-hadoop/mapred/system, true) from
10.128.135.100:52368: error:
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
/data/hadoop-hadoop/mapred/system. Name node is in safe mode.
The reported blocks 64 needs additional 3 blocks to reach the threshold
0.9990 of total blocks 68. Safe mode will be turned off automatically.
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
/data/hadoop-hadoop/mapred/system. Name node is in safe mode.
The reported blocks 64 needs additional 3 blocks to reach the threshold
0.9990 of total blocks 68. Safe mode will be turned off automatically.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1741)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1721)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:565)
at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:968)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:962)
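
The numbers in that log line are consistent with the namenode truncating threshold * total blocks to an integer (a sketch of the arithmetic, not the actual FSNamesystem code):

```python
total_blocks = 68
reported = 64
threshold = 0.9990

# The namenode needs int(threshold * total) blocks reported
# before it will leave safe mode on its own.
block_threshold = int(total_blocks * threshold)   # 67
missing = block_threshold - reported              # 3, matching the log
print(block_threshold, missing)
```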