Our CDH2 production grid just crashed with some sort of master node failure.
When I checked, the JobTracker process was gone, but the NameNode process was
still running. Even so, an ls on HDFS failed with no connection.

We decided to go for a restart. This is in the namenode log right now:

2011-12-17 01:37:35,568 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-12-17 01:37:35,612 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2011-12-17 01:37:35,613 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2011-12-17 01:37:35,613 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=true
2011-12-17 01:37:35,620 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2011-12-17 01:37:35,621 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2011-12-17 01:37:35,648 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 16978046
2011-12-17 01:43:24,023 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 1
2011-12-17 01:43:24,025 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2589456651 loaded in 348 seconds.
2011-12-17 01:43:24,030 INFO org.apache.hadoop.hdfs.server.common.Storage: Edits file /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size 3885 edits # 43 loaded in 0 seconds.


What comes next in the startup sequence? We have a lot of data on the cluster
(about 17 million files, per the log above).
Is there any way to estimate the remaining startup time?
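
For reference, here is a rough back-of-envelope on the image load phase, using
only the numbers and timestamps from the log above. It is a sketch, not an
estimate of the remaining startup time: it says nothing about later phases such
as waiting for DataNode block reports in safe mode. The function and variable
names are made up for illustration.

```python
from datetime import datetime

# Values copied from the NameNode log above.
IMAGE_BYTES = 2589456651   # "Image file of size ..."
IMAGE_FILES = 16978046     # "Number of files = ..."
LOAD_SECONDS = 348         # "... loaded in 348 seconds."


def parse_ts(ts: str) -> datetime:
    """Parse a log4j-style timestamp such as '2011-12-17 01:37:35,648'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def image_load_rates(image_bytes: int, files: int, seconds: float):
    """Return (MB per second, files per second) for the image load."""
    return image_bytes / seconds / 1e6, files / seconds


# Cross-check the reported 348 seconds against the two timestamps
# that bracket the image load in the log.
start = parse_ts("2011-12-17 01:37:35,648")
end = parse_ts("2011-12-17 01:43:24,025")
elapsed = (end - start).total_seconds()

mb_per_s, files_per_s = image_load_rates(IMAGE_BYTES, IMAGE_FILES, LOAD_SECONDS)

print(f"elapsed between timestamps: {elapsed:.1f} s")
print(f"image load rate: {mb_per_s:.2f} MB/s, {files_per_s:.0f} files/s")
```

So the roughly 2.6 GB image accounts for nearly six minutes of the restart so
far, and the 43 edits loaded in under a second.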
