Our CDH2 production grid just crashed with what looks like a master node failure. When I logged in, the JobTracker process was gone but the NameNode was still up. Running an ls against HDFS failed with a connection error.
We decided to restart it. This is what the NameNode log shows right now:

2011-12-17 01:37:35,568 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-12-17 01:37:35,612 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-12-17 01:37:35,613 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-12-17 01:37:35,613 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-12-17 01:37:35,620 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-12-17 01:37:35,621 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-12-17 01:37:35,648 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 16978046
2011-12-17 01:43:24,023 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 1
2011-12-17 01:43:24,025 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2589456651 loaded in 348 seconds.
2011-12-17 01:43:24,030 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size 3885 edits # 43 loaded in 0 seconds.

What comes next in the startup sequence? We have a ton of data on there. Is there any way to estimate how long startup will take?
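For what it's worth, the log above already gives enough numbers for a rough back-of-envelope on the phase that has completed. A small sketch (plain arithmetic, nothing Hadoop-specific; the remaining time is dominated by safe mode, i.e. waiting for DataNode block reports, which this log alone can't predict):

```python
# Rates implied by the NameNode startup log above.
# These are only for the fsimage-load phase; edits replay was ~0s here,
# and the safe-mode / block-report phase depends on the DataNodes.

image_bytes = 2_589_456_651   # "Image file of size 2589456651"
load_seconds = 348            # "loaded in 348 seconds"
num_files = 16_978_046        # "Number of files = 16978046"

bytes_per_sec = image_bytes / load_seconds   # ~7.4 MB/s image read rate
files_per_sec = num_files / load_seconds     # ~48,800 namespace objects/s

print(f"fsimage load rate: {bytes_per_sec / 1e6:.1f} MB/s")
print(f"namespace load rate: {files_per_sec:,.0f} files/s")
```

The same rates would let you estimate a future restart's image-load time from the fsimage size on disk, but the block-report wait has to be observed (e.g. via `hadoop dfsadmin -safemode get`) rather than computed.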