The NameNode will not come out of safe mode because it is still waiting for the 
datanodes to report the blocks it expects. 
I should have added: try to get the full output of fsck:
fsck <path> -openforwrite -files -blocks -locations
The -openforwrite output should tell you which files were open at the time of 
the checkpoint. You might want to double check that that is actually the case, 
i.e. that those files were being written at that moment. Maybe by looking at 
the filenames you can tell whether they were part of a job that was running.
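
Something like this, run from the hadoop install dir (the output file path and 
the grep patterns below are just a starting point; the report is large, so it 
is easier to dig through a saved copy):

  bin/hadoop fsck / -openforwrite -files -blocks -locations > /tmp/fsck-full.out
  grep -iE 'corrupt|missing|openforwrite' /tmp/fsck-full.out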

For any missing block, you might also want to cross verify on the datanode to 
see if it is really missing.
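
For example, for a block id that fsck reports as missing, something like this 
on the datanode should find the block file (the directory below is only a 
placeholder for your actual dfs.data.dir, and the block id is made up):

  find /data/dfs/data -name 'blk_1234567890*'

If the block file is actually on disk but the namenode still reports it 
missing, that points to a block-report problem rather than lost data.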

Once you are convinced that those are the only corrupt files, and that you can 
live with losing them, start the datanodes. 
The namenode will still not come out of safe mode since you have missing 
blocks. Leave it for a while, run fsck and look around, and if everything is 
OK, bring the namenode out of safe mode.
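
Roughly (run from the hadoop install dir, and only leave safe mode once you 
are happy with the fsck report):

  bin/hadoop dfsadmin -safemode get
  bin/hadoop fsck /
  bin/hadoop dfsadmin -safemode leave
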
I hope you started this namenode with the old image and empty edits. You do 
not want your latest edits to be replayed, since they contain your delete 
transactions.
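
To spell out what I mean by the old image: the importCheckpoint route (see the 
JIRA comment in my earlier mail below, and double check the exact 
preconditions there) is roughly to stop the namenode, move the current 
dfs.name.dir contents aside after backing them up, point fs.checkpoint.dir at 
a copy of the secondary's checkpoint, and then start the namenode with:

  bin/hadoop namenode -importCheckpoint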

Thanks,
Lohit



----- Original Message ----
From: Sagar Naik <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, November 14, 2008 12:11:46 PM
Subject: Re: Recovery of files in hadoop 18

Hey Lohit,

Thanks for your help.
I did as per your suggestion and imported from the secondary namenode.
We have some corrupted files.

But for some reason, the namenode is still in safe mode. It has been an hour 
or so.
The fsck report is :

Total size:    6954466496842 B (Total open files size: 543469222 B)
Total dirs:    1159
Total files:   1354155 (Files currently being written: 7673)
Total blocks (validated):      1375725 (avg. block size 5055128 B) (Total open file blocks (not validated): 50)
********************************
CORRUPT FILES:        1574
MISSING BLOCKS:       1574
MISSING SIZE:         1165735334 B
CORRUPT BLOCKS:       1574
********************************
Minimally replicated blocks:   1374151 (99.88559 %)
Over-replicated blocks:        0 (0.0 %)
Under-replicated blocks:       26619 (1.9349071 %)
Mis-replicated blocks:         0 (0.0 %)
Default replication factor:    3
Average block replication:     2.977127
Corrupt blocks:                1574
Missing replicas:              26752 (0.65317154 %)


Do you think I should manually override safe mode, delete all the corrupted 
files, and restart?
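
i.e. something along these lines, unless there is a better way:

  bin/hadoop dfsadmin -safemode leave
  bin/hadoop fsck / -delete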

-Sagar


lohit wrote:
> If you have trash enabled, deleted files are moved to the trash folder before 
> being permanently removed, so you can restore them from there (hope you have 
> fs.trash.interval set).
> 
> If not, shut down the cluster.
> Take a backup of your dfs.name.dir (on the namenode) and fs.checkpoint.dir (on 
> the secondary namenode).
> 
> The secondary namenode should have the last updated image; try to start the 
> namenode from that image, and don't use the edits from the namenode yet. Try 
> doing an importCheckpoint as explained here: 
> https://issues.apache.org/jira/browse/HADOOP-2585?focusedCommentId=12558173#action_12558173.
>  Start only the namenode and run fsck -files. It will throw a lot of messages 
> saying you are missing blocks, but that's fine since you haven't started the 
> datanodes yet. But if it shows your files, that means they haven't been 
> deleted yet. This will give you a view of the system as of the last 
> checkpoint. Then start the datanodes; once everything is up, run fsck and 
> check the consistency of the system. You would lose all changes that have 
> happened since the last checkpoint. 
> 
> Hope that helps,
> Lohit
> 
> 
> 
> ----- Original Message ----
> From: Sagar Naik <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, November 14, 2008 10:38:45 AM
> Subject: Recovery of files in hadoop 18
> 
> Hi,
> I accidentally deleted the root folder in our HDFS.
> I have stopped HDFS.
> 
> Is there any way to recover the files from the secondary namenode?
> 
> Please help.
> 
> 
> -Sagar
>  
