Yes, it should print something along the lines of:

The reported blocks 11 has reached the threshold 0.9990 of total blocks 11. Safe mode will be turned off automatically in 8 seconds.
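If you'd rather not watch the log for that message, dfsadmin can report the safe mode state directly. A minimal sketch (standard commands on 0.20-era releases; adjust paths and user for your install):

    # print the current safe mode state ("Safe mode is ON" / "Safe mode is OFF")
    hadoop dfsadmin -safemode get

    # block until the NameNode has left safe mode (useful in scripts)
    hadoop dfsadmin -safemode wait

You can also grep your NameNode log for "safe mode" (case-insensitively); the threshold message above should show up there as well.
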
-Joey

On Fri, Jul 29, 2011 at 12:26 AM, Rahul Das <rahul.h...@gmail.com> wrote:
> No, there was no error; only the following things happened:
>
> 2011-07-21 14:14:30,039 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit: ugi=hadoop,hadoop ip=/xx.xx.xx.xx cmd=create src=/user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f dst=null perm=hadoop:supergroup:rw-r--r--
> 2011-07-21 14:14:30,041 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f . blk_-3217626427379030207_15834365
> 2011-07-21 14:14:30,120 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f is closed by DFSClient_1277823200
>
> Is there any way I can find out from the log when safe mode is over?
>
> Regards,
> Rahul
>
> On Thu, Jul 28, 2011 at 6:16 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>
>> Nothing from around 1630?
>>
>> -Joey
>>
>> On Jul 28, 2011, at 5:06, Rahul Das <rahul.h...@gmail.com> wrote:
>>
>> Hi Joey,
>>
>> The log is too big to attach to the mail. What I found is that there are no errors during this time, only a few warnings like:
>>
>> 2011-07-21 14:13:47,814 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: PendingReplicationMonitor timed out block blk_-6058282241824946206_13375223
>> ...
>> ...
>> 2011-07-21 14:30:49,511 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Inconsistent size for block blk_8615896953045629213_15838442 reported from xx.xx.xx.xx:50010 current size is 1950720 reported size is 2448907
>>
>> I think the edits file was too large; that's why it took such a long time.
>>
>> Regards,
>> Rahul
>>
>> On Fri, Jul 22, 2011 at 9:33 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>>
>>> The long startup time after the restart looks like it was caused by the SecondaryNameNode not having been able to roll the edits log for some time.
>>> Can you post your NameNode log from around the same time as this SecondaryNameNode log (2011-07-21 16:00-16:30)?
>>>
>>> -Joey
>>>
>>> On Fri, Jul 22, 2011 at 8:29 AM, Rahul Das <rahul.h...@gmail.com> wrote:
>>>>
>>>> Yes, I have a SecondaryNameNode running. Here are the logs for the SecondaryNameNode:
>>>>
>>>> 2011-07-21 16:02:47,908 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/hadoop/tmp/dfs/namesecondary/current/edits of size 12751835 edits # 138217 loaded in 1581 seconds.
>>>> 2011-07-21 16:03:21,925 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2045516451 saved in 29 seconds.
>>>> 2011-07-21 16:03:24,974 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 0 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
>>>> 2011-07-21 16:03:25,545 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL xx.xx.xx.xx:50070putimage=1&port=50090&machine=xx.xx.xx.xx&token=-18:1554828842:0:1311242583000:1311240481442
>>>> 2011-07-21 16:29:24,356 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in doCheckpoint:
>>>> 2011-07-21 16:29:24,358 ERROR org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: java.io.IOException: Call to xx.xx.xx.xx:9000 failed on local exception: java.io.IOException: Connection reset by peer
>>>>
>>>> Regards,
>>>> Rahul
>>>>
>>>> On Fri, Jul 22, 2011 at 5:40 PM, Joey Echeverria <j...@cloudera.com> wrote:
>>>>>
>>>>> Do you have an instance of the SecondaryNameNode in your cluster?
>>>>>
>>>>> -Joey
>>>>>
>>>>> On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das <rahul.h...@gmail.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am running a Hadoop cluster with 20 data nodes. Yesterday I found that the NameNode was not responding (no reads or writes to HDFS were happening). It was stuck for a few hours, so I shut down the NameNode and found the following error in the NameNode log:
>>>>>>
>>>>>> 2011-07-21 16:15:31,500 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getProtocolVersion(org.apache.hadoop.hdfs.protocol.ClientProtocol, 41) from xx.xx.xx.xx:13568: output error
>>>>>>
>>>>>> This error appeared for every data node, and the data nodes were not able to communicate with the NameNode.
>>>>>>
>>>>>> After I restarted the NameNode:
>>>>>>
>>>>>> 2011-07-21 16:31:54,110 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
>>>>>> 2011-07-21 16:31:54,216 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=9000
>>>>>> 2011-07-21 16:31:54,223 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: xx.xx.xx.xx:9000
>>>>>> 2011-07-21 16:31:54,225 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
>>>>>> 2011-07-21 16:31:54,226 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
>>>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>>>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>>>>>> 2011-07-21 16:31:54,280 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
>>>>>> 2011-07-21 16:31:54,287 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
>>>>>> 2011-07-21 16:31:54,289 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
>>>>>> 2011-07-21 16:31:54,880 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 15817482
>>>>>> 2011-07-21 16:34:38,463 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 82
>>>>>> 2011-07-21 16:34:41,177 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2042701824 loaded in 166 seconds.
>>>>>> 2011-07-21 16:58:07,624 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /home/hadoop/current/edits of size 12751835 edits # 138217 loaded in 1406 seconds.
>>>>>>
>>>>>> And then it halts for a long time. After about an hour it starts working again.
>>>>>>
>>>>>> My question is: why does the "IPC Server Responder" error occur, and is there a way to deal with it?
>>>>>> Also, if my NameNode is busy doing something, what is the way to find out what it is doing?
>>>>>>
>>>>>> Regards,
>>>>>> Rahul
>>>>>
>>>>> --
>>>>> Joseph Echeverria
>>>>> Cloudera, Inc.
>>>>> 443.305.9434
>>>
>>> --
>>> Joseph Echeverria
>>> Cloudera, Inc.
>>> 443.305.9434

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
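P.S. Coming back to the earlier point about the SecondaryNameNode not having rolled the edits log for some time: on 0.20-era releases the checkpoint interval is controlled by fs.checkpoint.period (seconds between checkpoints, default 3600) and fs.checkpoint.size (edits log size in bytes that forces an early checkpoint, default 64 MB). A rough sketch of the relevant core-site.xml entries, shown with the default values (illustrative only, not a tuning recommendation):

    <property>
      <name>fs.checkpoint.period</name>
      <value>3600</value>
    </property>
    <property>
      <name>fs.checkpoint.size</name>
      <value>67108864</value>
    </property>

Keep in mind that if the checkpoint itself keeps failing (like the "Connection reset by peer" error in the SecondaryNameNode log above), the edits file will keep growing regardless of these settings, and the NameNode will have to replay all of it on the next restart.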