Ugh, that solved the problem.  Thanks Dhruba!
   
  Thanks,
  C G

Dhruba Borthakur <[EMAIL PROTECTED]> wrote:
  If you look at the log message starting with "STARTUP_MSG: build
=..." you will see that the namenode and the good datanode were built by CG,
whereas the bad datanodes were compiled by hadoopqa!

thanks,
dhruba
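
A minimal sketch of scripting that check: extract the svn revision (the "BV" in the error messages below) from a STARTUP_MSG build line. The sample line and the log path in the comment are illustrative stand-ins; on a real node you would point the grep at your actual namenode/datanode logs.

```shell
# Stand-in for a line grepped from a real log, e.g.:
#   grep 'STARTUP_MSG:   build' /path/to/logs/hadoop-*-datanode-*.log
line='STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; compiled by hadoopqa on Fri May 2 00:18:12 UTC 2008'

# Extract the revision number after "-r"; this is the build version (BV)
# that the datanode/namenode handshake compares.
rev=$(printf '%s\n' "$line" | sed -n 's/.*-r \([0-9][0-9]*\);.*/\1/p')
echo "$rev"
```

Running the same extraction on every host and diffing the results would flag a 2182 vs 652614 skew immediately.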

On Fri, May 23, 2008 at 9:01 AM, C G wrote:
> 2008-05-23 11:53:25,377 INFO org.apache.hadoop.dfs.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG: host = primary/10.2.13.1
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.16.4-dev
> STARTUP_MSG: build = svn+ssh://[EMAIL 
> PROTECTED]/srv/svn/repositories/svnvmc/overdrive/trunk/hadoop-0.16.4 -r 2182; 
> compiled
> by 'cg' on Mon May 19 17:47:05 EDT 2008
> ************************************************************/
> 2008-05-23 11:53:26,107 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
> Initializing RPC Metrics with hostName=NameNode, port=54310
> 2008-05-23 11:53:26,136 INFO org.apache.hadoop.dfs.NameNode: Namenode up at: 
> overdrive1-node-primary/10.2.13.1:54310
> 2008-05-23 11:53:26,146 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2008-05-23 11:53:26,149 INFO org.apache.hadoop.dfs.NameNodeMetrics: 
> Initializing NameNodeMeterics using context 
> object:org.apache.hadoop.metrics.spi.NullContext
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: fsOwner=cg,cg
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: 
> supergroup=supergroup
> 2008-05-23 11:53:26,463 INFO org.apache.hadoop.fs.FSNamesystem: 
> isPermissionEnabled=true
> 2008-05-23 11:53:36,064 INFO org.apache.hadoop.fs.FSNamesystem: Finished 
> loading FSImage in 9788 msecs
> 2008-05-23 11:53:36,079 INFO org.apache.hadoop.dfs.StateChange: STATE* 
> SafeModeInfo.enter: Safe mode is ON.
> Safe mode will be turned off automatically.
> 2008-05-23 11:53:36,115 INFO org.apache.hadoop.fs.FSNamesystem: Registered 
> FSNamesystemStatusMBean
> 2008-05-23 11:53:36,339 INFO org.mortbay.util.Credential: Checking Resource 
> aliases
> 2008-05-23 11:53:36,410 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
> 2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started 
> HttpContext[/static,/static]
> 2008-05-23 11:53:36,410 INFO org.mortbay.util.Container: Started 
> HttpContext[/logs,/logs]
> 2008-05-23 11:53:36,752 INFO org.mortbay.util.Container: Started [EMAIL 
> PROTECTED]
> 2008-05-23 11:53:36,925 INFO org.mortbay.util.Container: Started 
> WebApplicationContext[/,/]
> 2008-05-23 11:53:36,926 INFO org.mortbay.http.SocketListener: Started 
> SocketListener on 0.0.0.0:50070
> 2008-05-23 11:53:36,926 INFO org.mortbay.util.Container: Started [EMAIL 
> PROTECTED]
> 2008-05-23 11:53:36,926 INFO org.apache.hadoop.fs.FSNamesystem: Web-server up 
> at: 0.0.0.0:50070
> 2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server 
> Responder: starting
> 2008-05-23 11:53:36,927 INFO org.apache.hadoop.ipc.Server: IPC Server 
> listener on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 0 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 1 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 2 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 3 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 4 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 5 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 6 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 7 on 54310: starting
> 2008-05-23 11:53:36,939 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 8 on 54310: starting
> 2008-05-23 11:53:36,940 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 9 on 54310: starting
> 2008-05-23 11:53:37,096 INFO org.apache.hadoop.dfs.NameNode: Error report 
> from worker9:50010: Incompatible build versions: 
> namenode BV = 2182; datanode BV = 652614
> 2008-05-23 11:53:37,097 INFO org.apache.hadoop.dfs.NameNode: Error report 
> from worker12:50010: Incompatible build versions: 
> namenode BV = 2182; datanode BV = 652614
> [error above repeated for all nodes in system]
> 2008-05-23 11:53:42,082 INFO org.apache.hadoop.dfs.StateChange: BLOCK* 
> NameSystem.registerDatanode: node registration from 10.2.13.1:50010 
> storage DS-1855907496-10.2.13.1-50010-1198767012191
> 2008-05-23 11:53:42,094 INFO org.apache.hadoop.net.NetworkTopology: Adding a 
> new node: /default-rack/10.2.13.1:50010
>
> Oddly enough, the DataNode associated with the master node is up and running:
>
> 2008-05-23 11:53:25,380 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting DataNode
> STARTUP_MSG: host = primary/10.2.13.1
> STARTUP_MSG: args = []
> STARTUP_MSG: version = 0.16.4-dev
> STARTUP_MSG: build = svn+ssh://[EMAIL 
> PROTECTED]/srv/svn/repositories/svnvmc/overdrive/trunk/hadoop-0.16.4 -r 2182; 
> compiled
> by 'cg' on Mon May 19 17:47:05 EDT 2008
> ************************************************************/
> 2008-05-23 11:53:40,786 INFO org.apache.hadoop.dfs.DataNode: Registered 
> FSDatasetStatusMBean
> 2008-05-23 11:53:40,786 INFO org.apache.hadoop.dfs.DataNode: Opened server at 
> 50010
> 2008-05-23 11:53:40,793 INFO org.apache.hadoop.dfs.DataNode: Balancing 
> bandwith is 1048576 bytes/s
> 2008-05-23 11:53:41,838 INFO org.mortbay.util.Credential: Checking Resource 
> aliases
> 2008-05-23 11:53:41,868 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
> 2008-05-23 11:53:41,869 INFO org.mortbay.util.Container: Started 
> HttpContext[/static,/static]
> 2008-05-23 11:53:41,869 INFO org.mortbay.util.Container: Started 
> HttpContext[/logs,/logs]
> 2008-05-23 11:53:42,051 INFO org.mortbay.util.Container: Started [EMAIL 
> PROTECTED]
> 2008-05-23 11:53:42,079 INFO org.mortbay.util.Container: Started 
> WebApplicationContext[/,/]
> 2008-05-23 11:53:42,081 INFO org.mortbay.http.SocketListener: Started 
> SocketListener on 0.0.0.0:50075
> 2008-05-23 11:53:42,081 INFO org.mortbay.util.Container: Started [EMAIL 
> PROTECTED]
> 2008-05-23 11:53:42,101 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
> Initializing JVM Metrics with processName=DataNode, sessionId=null
> 2008-05-23 11:53:42,120 INFO org.apache.hadoop.dfs.DataNode: 
> 10.2.13.1:50010In DataNode.run, data = 
> FSDataset{dirpath='/data/HDFS/data/current'}
> 2008-05-23 11:53:42,121 INFO org.apache.hadoop.dfs.DataNode: using 
> BLOCKREPORT_INTERVAL of 3368704msec Initial delay: 60000msec
> 2008-05-23 11:53:46,169 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 
> 66383 blocks got processed in 3027 msecs
> 2008-05-23 11:54:47,033 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_672882539393226281
> 2008-05-23 11:54:47,070 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_4933623861101284298
> 2008-05-23 11:54:51,834 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-4096375515223627412
> 2008-05-23 11:54:52,834 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-4329313062145243554
> 2008-05-23 11:54:52,869 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_5951529374648563965
> 2008-05-23 11:54:53,033 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_5526809302368511891
> 2008-05-23 11:55:07,101 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_3384706504442100270
> 2008-05-23 11:56:23,966 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-8100668927196678325
> 2008-05-23 11:56:24,165 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-9045089577001067802
> 2008-05-23 11:56:53,365 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-5156742068519955681
> 2008-05-23 11:56:53,375 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_8099933609289941991
> 2008-05-23 11:56:57,164 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-519952963742834206
> 2008-05-23 11:56:57,565 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-7514486773323267604
> 2008-05-23 11:56:59,366 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_5706035426017364787
> 2008-05-23 11:56:59,398 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-8163260915256505245
> 2008-05-23 11:57:08,455 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-3800057016159468929
> 2008-05-23 11:57:27,159 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-1945776220462007170
> 2008-05-23 11:57:41,058 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_1059797111434771921
> 2008-05-23 11:57:50,107 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_335910613100888045
> 2008-05-23 11:58:04,999 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-758702836140613218
> 2008-05-23 11:58:17,060 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_5680261036802662113
> 2008-05-23 11:58:31,128 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_6577967380328271133
> 2008-05-23 11:58:45,185 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-7268945479231310134
> 2008-05-23 11:58:59,450 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_5582966652198891861
> 2008-05-23 11:59:14,499 INFO org.apache.hadoop.dfs.DataBlockScanner: 
> Verification succeeded for blk_-8204668722708860846
>
> Raghu Angadi wrote:
>
> Can you attach initialization part of NameNode?
>
> thanks,
> Raghu.
>
> C G wrote:
>> We've recently upgraded from 0.15.0 to 0.16.4. Two nights ago we had a 
>> problem where DFS nodes could not communicate. After not finding anything 
>> obviously wrong we decided to shut down DFS and restart. Following restart I 
>> was seeing a corrupted system with significant amounts of missing data. 
>> Further checking showed that DataNodes on all slaves did not start due to 
>> what looks like a version skew issue.
>>
>> Our distribution is a straight 0.16.4 dist, so I'm having difficulty 
>> understanding what's causing this issue.
>>
>> Note that we haven't finalized the upgrade yet.
>>
>> Any help understanding this problem would be very much appreciated. We have 
>> several TB of data in our system and reloading from scratch would be a big 
>> problem.
>>
>> Here is the log from one of the failed nodes:
>>
>> /************************************************************
>> STARTUP_MSG: Starting DataNode
>> STARTUP_MSG: host = worker9/10.2.0.9
>> STARTUP_MSG: args = []
>> STARTUP_MSG: version = 0.16.4
>> STARTUP_MSG: build = 
>> http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 652614; 
>> compiled by 'hadoopqa' on Fri May 2 00:18:12 UTC 2008
>> ************************************************************/
>> 2008-05-23 08:10:47,196 FATAL org.apache.hadoop.dfs.DataNode: Incompatible 
>> build versions: namenode BV = 2182; datanode BV = 652614
>> 2008-05-23 08:10:47,202 ERROR org.apache.hadoop.dfs.DataNode: 
>> java.io.IOException: Incompatible build versions: namenode BV = 2182; 
>> datanode BV = 652614
>> at org.apache.hadoop.dfs.DataNode.handshake(DataNode.java:342)
>> at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:213)
>> at org.apache.hadoop.dfs.DataNode.(DataNode.java:162)
>> at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:2512)
>> at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2456)
>> at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:2477)
>> at org.apache.hadoop.dfs.DataNode.main(DataNode.java:2673)
>> 2008-05-23 08:10:47,203 INFO org.apache.hadoop.dfs.DataNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down DataNode at worker9/10.2.0.9
>> ************************************************************/
>>
>>
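
The FATAL above comes down to a plain equality check on the two build revisions during the datanode's handshake with the namenode. A toy sketch of that check (a hypothetical shell function, not Hadoop's actual Java code):

```shell
# Toy model of the version handshake: the datanode refuses to register
# unless its build revision exactly matches the namenode's.
check_build() {
  if [ "$1" = "$2" ]; then
    echo "compatible"
  else
    echo "incompatible: namenode BV = $1; datanode BV = $2"
  fi
}

# The mismatch from the thread: namenode at r2182, datanodes at r652614.
check_build 2182 652614
```

As the thread shows, the fix is simply to deploy the same build (same svn revision) to every node; an identical version string like 0.16.4 is not enough when the revisions differ.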


       
       
