Maybe this is a bad sign -- the edits.new file was created before the master
node crashed, and it is huge:

-bash-3.2$ ls -lh /hadoop/hadoop-metadata/cache/dfs/name/current
total 41G
-rw-r--r-- 1 hadoop hadoop 3.8K Jan 27  2011 edits
-rw-r--r-- 1 hadoop hadoop  39G Dec 17 00:44 edits.new
-rw-r--r-- 1 hadoop hadoop 2.5G Jan 27  2011 fsimage
-rw-r--r-- 1 hadoop hadoop    8 Jan 27  2011 fstime
-rw-r--r-- 1 hadoop hadoop  101 Jan 27  2011 VERSION

Could this mean something was wrong with our SecondaryNameNode and the rolling
of the edits file?
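
If I remember the mechanics right, the SecondaryNameNode periodically asks the
NameNode to roll the edit log (new edits start landing in edits.new), pulls
fsimage and edits, merges them, and uploads the merged image back, after which
edits.new is folded back into edits. So edits.new should only exist while a
checkpoint is in flight; one that has grown to 39G would suggest the last
checkpoint never finished, and the Jan 27 timestamp on edits would then be
roughly when the last one succeeded. A quick sanity check I plan to run -- the
paths and property names below are just the stock 0.20/CDH2 defaults, not
verified against our configs:

# on the host listed in conf/masters (where start-dfs.sh launches the secondary):
jps | grep -i secondarynamenode                    # is the process alive at all?
grep -ci checkpoint $HADOOP_HOME/logs/*secondarynamenode*.log   # any checkpoint activity?
# checkpoint triggers (0.20 defaults: 3600 s / 64 MB):
grep -A1 -E 'fs\.checkpoint\.(period|size)' $HADOOP_HOME/conf/core-site.xml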

On Sat, Dec 17, 2011 at 2:53 AM, Meng Mao <meng...@gmail.com> wrote:

> None of the worker nodes' DataNode logs have recorded anything after the
> initial STARTUP_MSG banner:
> STARTUP_MSG:   host = prod1-worker075/10.2.19.75
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.1+169.56
> STARTUP_MSG:   build =  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3;
> compiled by 'root' on Tue Feb  9 13:40:08 EST 2010
> ************************************************************/
>
> On Sat, Dec 17, 2011 at 2:00 AM, Meng Mao <meng...@gmail.com> wrote:
>
>> Our CDH2 production grid just crashed with some sort of master node
>> failure.
>> When I logged in, the JobTracker process was gone but the NameNode was
>> still up. Trying to ls on HDFS failed to connect.
>>
>> We decided to go for a restart. Right now the NameNode log shows:
>>
>> 2011-12-17 01:37:35,568 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
>> Initializing NameNodeMeterics using context
>> object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-12-17 01:37:35,612 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
>> 2011-12-17 01:37:35,613 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
>> 2011-12-17 01:37:35,613 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
>> isPermissionEnabled=true
>> 2011-12-17 01:37:35,620 INFO
>> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
>> Initializing FSNamesystemMetrics using context
>> object:org.apache.hadoop.metrics.spi.NullContext
>> 2011-12-17 01:37:35,621 INFO
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
>> FSNamesystemStatusMBean
>> 2011-12-17 01:37:35,648 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Number of files = 16978046
>> 2011-12-17 01:43:24,023 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Number of files under
>> construction = 1
>> 2011-12-17 01:43:24,025 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Image file of size 2589456651
>> loaded in 348 seconds.
>> 2011-12-17 01:43:24,030 INFO
>> org.apache.hadoop.hdfs.server.common.Storage: Edits file
>> /hadoop/hadoop-metadata/cache/dfs/name/current/edits of size 3885 edits #
>> 43 loaded in 0 seconds.
>>
>>
>> What comes next in the startup sequence? We have a ton of data on there.
>> Is there any way to estimate how long startup will take?
>>
>
>
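
To take a stab at my own startup-time question above: if I understand the 0.20
startup correctly, after the image and the small edits file load, the NameNode
still has to replay edits.new (the 39G file), write out a fresh fsimage, and
then sit in safe mode until enough datanodes have sent block reports. Purely as
a back-of-envelope yardstick -- this assumes edits replay moves bytes at the
same rate as the image load, which it probably won't, since edits are replayed
transaction by transaction:

# image load from the log: 2,589,456,651 bytes in 348 s; scale that rate to 39 GB
awk 'BEGIN { r = 2589456651/348;
             printf "%.1f MB/s image load; ~%.0f min for 39 GB at that rate\n",
                    r/1048576, 39*1024*1048576/r/60 }'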
