Ugh, that solved the problem. Thanks Dhruba!

Thanks,
C G

Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
If you look at the log message starting with "STARTUP_MSG: build =..." you will see that the namenode and the good datanode were built by CG, whereas the bad datanodes were compiled by hadoopqa!
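Dhruba's check can be scripted. Below is a minimal sketch (the log path and filename patterns are hypothetical; adjust to your own HADOOP_LOG_DIR) that pulls the svn revision out of each daemon's startup banner. That revision is the "BV" in the handshake error, so more than one distinct value across the logs means mismatched builds:

```shell
# Sketch only: extract the "-r <rev>" value from each daemon's
# "STARTUP_MSG: build = <repo> -r <rev>; compiled by ..." banner line.
# More than one distinct revision across namenode and datanode logs
# explains a "namenode BV = X; datanode BV = Y" handshake failure.
build_revs() {
  sed -n 's/.*STARTUP_MSG:[[:space:]]*build = .* -r \([0-9][0-9]*\);.*/\1/p' "$@" | sort -u
}

# Hypothetical usage, one pattern per daemon type:
# build_revs /var/log/hadoop/hadoop-*-namenode-*.log /var/log/hadoop/hadoop-*-datanode-*.log
```

If the function prints a single revision, all daemons were compiled from the same source; two or more revisions point at the stale-build problem seen in this thread.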
thanks,
dhruba

On Fri, May 23, 2008 at 9:01 AM, C G wrote:
> 2008-05-23 11:53:25,377 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG: host = primary/10.2.13.1
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.16.4-dev
> STARTUP_MSG: build = svn+ssh://[EMAIL PROTECTED]/srv/svn/repositories/svnvmc/overdrive/trunk/hadoop-0.16.4 -r 2182; compiled by 'cg' on Mon May 19 17:47:05 EDT 2008
> ************************************************************/
> 2008-05-23 11:53:26,107 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=54310
> 2008-05-23 11:53:26,136 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: overdrive1-node-primary/10.2.13.1:54310
> 2008-05-23 11:53:26,146 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2008-05-23 11:53:26,149 INFO org.apache.hadoop.dfs.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=cg,cg
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: supergroup=supergroup
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: isPermissionEnabled=true
> 2008-05-23 11:53:36,064 INFO org.apache.hadoop.fs.FSNamesystem: Finished loading FSImage in 9788 msecs
> 2008-05-23 11:53:36,079 INFO org.apache.hadoop.dfs.StateChange: STATE* SafeModeInfo.enter: Safe mode is ON.
> Safe mode will be turned off automatically.
> 2008-05-23 11:53:36,115 INFO org.apache.hadoop.fs.FSNamesystem: Registered FSNamesystemStatusMBean
> 2008-05-23 11:53:36,339 INFO org.mortbay.util.Credential: Checking Resource aliases
> 2008-05-23 11:53:36,410 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
> 2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
> 2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
> 2008-05-23 11:53:36,752 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED]
> 2008-05-23 11:53:36,925 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
> 2008-05-23 11:53:36,926 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50070
> 2008-05-23 11:53:36,926 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED]
> 2008-05-23 11:53:36,926 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up at: 0.0.0.0:50070
> 2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 54310: starting
> 2008-05-23 11:53:36,940 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 54310: starting
> 2008-05-23 11:53:37,096 INFO org.apache.hadoop.dfs.NameNode: Error report from worker9:50010: Incompatible build versions: namenode BV = 2182; datanode BV = 652614
> 2008-05-23 11:53:37,097 INFO org.apache.hadoop.dfs.NameNode: Error report from worker12:50010: Incompatible build versions: namenode BV = 2182; datanode BV = 652614
> [error above repeated for all nodes in system]
> 2008-05-23 11:53:42,082 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 10.2.13.1:50010 storage DS-1855907496-10.2.13.1-50010-1198767012191
> 2008-05-23 11:53:42,094 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/10.2.13.1:50010
>
> Oddly enough, the DataNode associated with the master node is up and running:
>
> 2008-05-23 11:53:25,380 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG: host = primary/10.2.13.1
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.16.4-dev
> STARTUP_MSG: build = svn+ssh://[EMAIL PROTECTED]/srv/svn/repositories/svnvmc/overdrive/trunk/hadoop-0.16.4 -r 2182; compiled by 'cg' on Mon May 19 17:47:05 EDT 2008
> ************************************************************/
> 2008-05-23 11:53:40,786 INFO org.apache.hadoop.dfs.DataNode: Registered FSDatasetStatusMBean
> 2008-05-23 11:53:40,786 INFO org.apache.hadoop.dfs.DataNode: Opened server at 50010
> 2008-05-23 11:53:40,793 INFO org.apache.hadoop.dfs.DataNode: Balancing bandwith is 1048576 bytes/s
> 2008-05-23 11:53:41,838 INFO org.mortbay.util.Credential: Checking Resource aliases
> 2008-05-23 11:53:41,868 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
> 2008-05-23 11:53:41,869 INFO org.mortbay.util.Container: Started HttpContext[/static,/static]
> 2008-05-23 11:53:41,869 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
> 2008-05-23 11:53:42,051 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED]
> 2008-05-23 11:53:42,079 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
> 2008-05-23 11:53:42,081 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50075
> 2008-05-23 11:53:42,081 INFO org.mortbay.util.Container: Started [EMAIL PROTECTED]
> 2008-05-23 11:53:42,101 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null
> 2008-05-23 11:53:42,120 INFO org.apache.hadoop.dfs.DataNode: 10.2.13.1:50010In DataNode.run, data = FSDataset{dirpath='/data/HDFS/data/current'}
> 2008-05-23 11:53:42,121 INFO org.apache.hadoop.dfs.DataNode: using BLOCKREPORT_INTERVAL of 3368704msec Initial delay: 60000msec
> 2008-05-23 11:53:46,169 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 66383 blocks got processed in 3027 msecs
> 2008-05-23 11:54:47,033 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_672882539393226281
> 2008-05-23 11:54:47,070 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_4933623861101284298
> 2008-05-23 11:54:51,834 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-4096375515223627412
> 2008-05-23 11:54:52,834 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-4329313062145243554
> 2008-05-23 11:54:52,869 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5951529374648563965
> 2008-05-23 11:54:53,033 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5526809302368511891
> 2008-05-23 11:55:07,101 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_3384706504442100270
> 2008-05-23 11:56:23,966 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-8100668927196678325
> 2008-05-23 11:56:24,165 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-9045089577001067802
> 2008-05-23 11:56:53,365 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-5156742068519955681
> 2008-05-23 11:56:53,375 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_8099933609289941991
> 2008-05-23 11:56:57,164 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-519952963742834206
> 2008-05-23 11:56:57,565 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-7514486773323267604
> 2008-05-23 11:56:59,366 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5706035426017364787
> 2008-05-23 11:56:59,398 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-8163260915256505245
> 2008-05-23 11:57:08,455 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-3800057016159468929
> 2008-05-23 11:57:27,159 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-1945776220462007170
> 2008-05-23 11:57:41,058 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_1059797111434771921
> 2008-05-23 11:57:50,107 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_335910613100888045
> 2008-05-23 11:58:04,999 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-758702836140613218
> 2008-05-23 11:58:17,060 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5680261036802662113
> 2008-05-23 11:58:31,128 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_6577967380328271133
> 2008-05-23 11:58:45,185 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-7268945479231310134
> 2008-05-23 11:58:59,450 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_5582966652198891861
> 2008-05-23 11:59:14,499 INFO org.apache.hadoop.dfs.DataBlockScanner: Verification succeeded for blk_-8204668722708860846
>
> Raghu Angadi wrote:
>
> Can you attach initialization part of NameNode?
>
> thanks,
> Raghu.
>
> C G wrote:
>> We've recently upgraded from 0.15.0 to 0.16.4. Two nights ago we had a problem where DFS nodes could not communicate. After not finding anything obviously wrong we decided to shut down DFS and restart. Following restart I was seeing a corrupted system with significant amounts of missing data. Further checking showed that DataNodes on all slaves did not start due to what looks like a version skew issue.
>>
>> Our distribution is a straight 0.16.4 dist, so I'm having difficulty understanding what's causing this issue.
>>
>> Note that we haven't finalized the upgrade yet.
>>
>> Any help understanding this problem would be very much appreciated. We have several TB of data in our system and reloading from scratch would be a big problem.
>>
>> Here is the log from one of the failed nodes:
>>
>> /************************************************************
>> STARTUP_MSG: Starting DataNode
>> STARTUP_MSG: host = worker9/10.2.0.9
>> STARTUP_MSG: args = []
>> STARTUP_MSG: version = 0.16.4
>> STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
>> ************************************************************/
>> 2008-05-23 08:10:47,196 FATAL org.apache.hadoop.dfs.DataNode: Incompatible build versions: namenode BV = 2182; datanode BV = 652614
>> 2008-05-23 08:10:47,202 ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible build versions: namenode BV = 2182; datanode BV = 652614
>>     at org.apache.hadoop.dfs.DataNode.handshake(DataNode.java:342)
>>     at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:213)
>>     at org.apache.hadoop.dfs.DataNode.<init>(DataNode.java:162)
>>     at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
>>     at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
>>     at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
>>     at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)
>> 2008-05-23 08:10:47,203 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down DataNode at worker9/10.2.0.9
>> ************************************************************/