[ https://issues.apache.org/jira/browse/HADOOP-4866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656994#action_12656994 ]

Brian Bockelman commented on HADOOP-4866:
-----------------------------------------

Hey all,

Below is the exception which is happening on the datanode side (not for the 
exact same block, but I believe it's the same problem).

We have been growing our cluster constantly, so almost every day we either take
nodes out (to install new drives and re-kickstart them) or add new ones.  This
has caused a lot of churn through the decommissioning process and the balancer.
We also run a continuous external transfer load test that deletes files within
seconds of a successful transfer.  I'd believe you if you claimed we were
pushing the boundaries :)

I have a few other patches to apply to the namenode today; I'll try getting to 
the one Nicholas posted and see if that solves it.

Brian

2008-12-16 08:31:46,084 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 50020, call recoverBlock(blk_5492981093339503298_94099, false, [Lorg.apache.hadoop.hdfs.protocol.DatanodeInfo;@6325950d) from 129.93.239.144:39774: error: org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block (=blk_5492981093339503298_94099) not found
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:1898)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.commitBlockSynchronization(NameNode.java:410)
        at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

org.apache.hadoop.ipc.RemoteException: java.io.IOException: Block (=blk_5492981093339503298_94099) not found
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.commitBlockSynchronization(FSNamesystem.java:1898)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.commitBlockSynchronization(NameNode.java:410)
        at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)

        at org.apache.hadoop.ipc.Client.call(Client.java:696)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
        at $Proxy4.commitBlockSynchronization(Unknown Source)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.syncBlock(DataNode.java:1461)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1442)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1508)
        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:892)
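To make the trace easier to follow, here is a toy Java model of the call chain it shows (DataNode.recoverBlock -> syncBlock -> NameNode.commitBlockSynchronization -> FSNamesystem). BlockSyncDemo, ToyNameNode, and ToyDataNode are hypothetical names, and the in-memory block map keyed by block id is an assumption made for illustration; this is not Hadoop's code. It only shows one way the "Block (=...) not found" message can come back to the DataNode: the block drops out of the NameNode's view (here, because its file was deleted) before recovery commits. It is not a confirmed diagnosis of this issue.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Demo entry point; declared first so the Java 11+ source launcher runs it.
public class BlockSyncDemo {
    public static void main(String[] args) {
        ToyNameNode nn = new ToyNameNode();
        ToyDataNode dn = new ToyDataNode(nn);
        long id = 5492981093339503298L;
        nn.addBlock(id, 94099L);
        nn.deleteBlock(id); // file deleted before block recovery commits
        System.out.println(dn.recoverBlock(id, 94099L));
    }
}

// Hypothetical, heavily simplified stand-in for the NameNode side.
class ToyNameNode {
    // Assumption: block id -> generation stamp, held in memory.
    private final Map<Long, Long> blocksMap = new HashMap<>();

    void addBlock(long blockId, long genStamp) {
        blocksMap.put(blockId, genStamp);
    }

    // Deleting a file is modeled as dropping its block from the map.
    void deleteBlock(long blockId) {
        blocksMap.remove(blockId);
    }

    // Stand-in for commitBlockSynchronization: fails if the block is unknown.
    void commitBlockSynchronization(long blockId, long genStamp) throws IOException {
        if (!blocksMap.containsKey(blockId)) {
            // Same message format as in the log above.
            throw new IOException("Block (=blk_" + blockId + "_" + genStamp + ") not found");
        }
        blocksMap.put(blockId, genStamp); // record the synchronized generation stamp
    }
}

// Hypothetical stand-in for the DataNode side of block recovery.
class ToyDataNode {
    private final ToyNameNode namenode;

    ToyDataNode(ToyNameNode namenode) {
        this.namenode = namenode;
    }

    // Models recoverBlock -> syncBlock -> commitBlockSynchronization.
    // Returns null on success, or the error message sent back by the NameNode.
    String recoverBlock(long blockId, long genStamp) {
        try {
            namenode.commitBlockSynchronization(blockId, genStamp);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }
}
```

Running it (e.g. `java BlockSyncDemo.java` with Java 11 or later) prints `Block (=blk_5492981093339503298_94099) not found`, the same text as the log, for the block that was deleted mid-recovery.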


> NameNode error in commitBlockSynchronization
> --------------------------------------------
>
>                 Key: HADOOP-4866
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4866
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: Brian Bockelman
>         Attachments: 4866_20081215.patch
>
>
> The NameNode continuously reports an error in commitBlockSynchronization.
> This happens for ~5 blocks at a rate of 5-10Hz.  I have no idea when this
> started, because it has been going on for days, well past the start of our
> current logs.
> This appears to be a new symptom in 0.19.0, but I have no idea what could be
> causing it.
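Since the description mentions ~5 blocks failing at 5-10Hz, one quick way to see exactly which blocks are involved is to count the "not found" messages per block in a NameNode or DataNode log. The sketch below is a hypothetical helper (NotFoundBlockCounter is not part of Hadoop). It assumes block names of the form blk_<id>_<generationStamp>, as in the trace above, allows negative ids, and groups matches by block id while ignoring the generation stamp.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Hadoop: counts "Block (=...) not found"
// messages per block in a log read from standard input.
public class NotFoundBlockCounter {
    // Matches e.g. "Block (=blk_5492981093339503298_94099) not found".
    // Group 1 is "blk_<id>" (ids may be negative), group 2 the generation stamp.
    private static final Pattern NOT_FOUND =
            Pattern.compile("Block \\(=(blk_-?\\d+)_(\\d+)\\) not found");

    // Returns "blk_<id>" -> number of matches, sorted by block name.
    // The generation stamp is ignored so retries of the same block group together.
    static Map<String, Integer> countByBlock(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = NOT_FOUND.matcher(line);
            while (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = in.lines().collect(Collectors.toList());
        countByBlock(lines).forEach((block, n) -> System.out.println(block + "\t" + n));
    }
}
```

For example, `java NotFoundBlockCounter.java < datanode.log` (file name illustrative) prints one tab-separated line per block. Each failed recovery appears twice in a log like the one above (the IPC server line and the RemoteException trace), so the counts are matching lines, not distinct RPCs.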

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
