Yes, it should print something along the lines of:

The reported blocks 11 has reached the threshold 0.9990 of total blocks 11. Safe mode will be turned off automatically in 8 seconds.
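If you'd rather not watch the log for that message, dfsadmin can report the safe mode state directly. A minimal sketch (standard commands on 0.20-era releases; adjust paths and user for your install):

    # print the current safe mode state ("Safe mode is ON" / "Safe mode is OFF")
    hadoop dfsadmin -safemode get

    # block until the NameNode has left safe mode (useful in scripts)
    hadoop dfsadmin -safemode wait

You can also grep your NameNode log for "safe mode" (case-insensitively); the threshold message above should show up there as well.
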
-Joey

On Fri, Jul 29, 2011 at 12:26 AM, Rahul Das <rahul.h...@gmail.com> wrote:
> No, there was no error; only the following things happened:
>
> 2011-07-21 14:14:30,039 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop,hadoop ip=/xx.xx.xx.xx cmd=create src=/user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f dst=null perm=hadoop:supergroup:rw-r--r--
> 2011-07-21 14:14:30,041 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f . blk_-3217626427379030207_15834365
> 2011-07-21 14:14:30,120 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f is closed by DFSClient_1277823200
>
> Is there any way I can find out from the log when safe mode is over?
>
> Regards,
> Rahul
>
> On Thu, Jul 28, 2011 at 6:16 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>
>> Nothing from around 1630?
>>
>> -Joey
>>
>> On Jul 28, 2011, at 5:06, Rahul Das <rahul.h...@gmail.com> wrote:
>>
>> Hi Joey,
>>
>> The log is too big to attach to the mail. What I found is that there are no errors during this time, only a few warnings like:
>>
>> 2011-07-21 14:13:47,814 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: PendingReplicationMonitor timed out block blk_-6058282241824946206_13375223
>> ...
>> ...
>> 2011-07-21 14:30:49,511 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Inconsistent size for block blk_8615896953045629213_15838442 reported from xx.xx.xx.xx:50010 current size is 1950720 reported size is 2448907
>>
>> I think the edits file was too large; that's why it took such a long time.
>>
>> Regards,
>> Rahul
>>
>> On Fri, Jul 22, 2011 at 9:33 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>>
>>> The long startup time after the restart looks like it was caused by the SecondaryNameNode not having been able to roll the edits log for some time.
>>> Can you post your NameNode log from around the same time as this SecondaryNameNode log (2011-07-21 16:00-16:30)?
>>>
>>> -Joey
>>>
>>> On Fri, Jul 22, 2011 at 8:29 AM, Rahul Das <rahul.h...@gmail.com> wrote:
>>>>
>>>> Yes, I have a SecondaryNameNode running. Here are the logs for the SecondaryNameNode:
>>>>
>>>> 2011-07-21 16:02:47,908 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/hadoop/tmp/dfs/namesecondary/current/edits of size 12751835 edits # 138217 loaded in 1581 seconds.
>>>> 2011-07-21 16:03:21,925 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2045516451 saved in 29 seconds.
>>>> 2011-07-21 16:03:24,974 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
>>>> 2011-07-21 16:03:25,545 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL xx.xx.xx.xx:50070putimage=1&port=50090&machine=xx.xx.xx.xx&token=-18:1554828842:0:1311242583000:1311240481442
>>>> 2011-07-21 16:29:24,356 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
>>>> 2011-07-21 16:29:24,358 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Call to xx.xx.xx.xx:9000 failed on local exception: java.io.IOException: Connection reset by peer
>>>>
>>>> Regards,
>>>> Rahul
>>>>
>>>> On Fri, Jul 22, 2011 at 5:40 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>>>>
>>>>> Do you have an instance of the SecondaryNameNode in your cluster?
>>>>>
>>>>> -Joey
>>>>>
>>>>> On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <rahul.h...@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am running a Hadoop cluster with 20 data nodes. Yesterday I found that the NameNode was not responding (no reads or writes to HDFS were happening). It was stuck for a few hours, so I shut down the NameNode and found the following error in the NameNode log:
>>>>>>
>>>>>> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from xx.xx.xx.xx:13568: output error
>>>>>>
>>>>>> This error appeared for every data node, and the data nodes were not able to communicate with the NameNode.
>>>>>>
>>>>>> After I restarted the NameNode:
>>>>>>
>>>>>> 2011-07-21 16:31:54,110 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>>>>>> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
>>>>>> 2011-07-21 16:31:54,223 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: xx.xx.xx.xx:9000
>>>>>> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>>>>>> 2011-07-21 16:31:54,226 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>>>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
>>>>>> 2011-07-21 16:31:54,287 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>>>>>> 2011-07-21 16:31:54,289 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>>>>>> 2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 15817482
>>>>>> 2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 82
>>>>>> 2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2042701824 loaded in 166 seconds.
>>>>>> 2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded in 1406 seconds.
>>>>>>
>>>>>> And then it halts for a long time. After about an hour it starts working again.
>>>>>>
>>>>>> My question is: why does the "IPC Server Responder" error occur, and is there a way to deal with it?
>>>>>> Also, if my NameNode is busy doing something, what is the way to find out what it is doing?
>>>>>>
>>>>>> Regards,
>>>>>> Rahul
>>>>>
>>>>> --
>>>>> Joseph Echeverria
>>>>> Cloudera, Inc.
>>>>> 443.305.9434
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
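P.S. Coming back to the earlier point about the SecondaryNameNode not having rolled the edits log for some time: on 0.20-era releases the checkpoint interval is controlled by fs.checkpoint.period (seconds between checkpoints, default 3600) and fs.checkpoint.size (edits log size in bytes that forces an early checkpoint, default 64 MB). A rough sketch of the relevant core-site.xml entries, shown with the default values (illustrative only, not a tuning recommendation):

    <property>
      <name>fs.checkpoint.period</name>
      <value>3600</value>
    </property>
    <property>
      <name>fs.checkpoint.size</name>
      <value>67108864</value>
    </property>

Keep in mind that if the checkpoint itself keeps failing (like the "Connection reset by peer" error in the SecondaryNameNode log above), the edits file will keep growing regardless of these settings, and the NameNode will have to replay all of it on the next restart.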