Hi,

I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm trying 
not to lose the precious data in it.  I accidentally ran bin/hadoop namenode 
-format on a *new DN* that I had just added to the cluster.  Is it possible for 
that to corrupt HDFS?  I also had to explicitly kill the DN daemons before 
that, because bin/stop-all.sh didn't stop them for some reason (it had always 
done so before).
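
For reference, this is roughly what happened on the new node (the pid file 
path below is just the usual default, so treat that part as a guess on my 
part):

  # killed the DN by hand, since bin/stop-all.sh wouldn't
  kill `cat /tmp/hadoop-$USER-datanode-$HOSTNAME.pid`   # pid file path is a guess
  # and then, by mistake, ran the format on that same box
  bin/hadoop namenode -format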

Is there any way to salvage the data?  I have a 4-node cluster with a 
replication factor of 3, though fsck reports lots of missing and 
under-replicated blocks:

  ********************************
  CORRUPT FILES:        3355
  MISSING BLOCKS:       3462
  MISSING SIZE:         17708821225 B
  ********************************
 Minimally replicated blocks:   28802 (89.269775 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       17025 (52.76779 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     1.7750744
 Missing replicas:              17025 (29.727087 %)
 Number of data-nodes:          4
 Number of racks:               1


The filesystem under path '/' is CORRUPT
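
In case it helps, I can also pull a per-file view of which blocks are missing 
and where the remaining replicas live (going by the fsck usage text, so the 
exact flags are my assumption):

  # list every file, its blocks, and the DNs holding each replica
  bin/hadoop fsck / -files -blocks -locations
  # and double-check that all 4 DNs are actually reporting in
  bin/hadoop dfsadmin -report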


What can one do at this point to save the data?  If I run bin/hadoop fsck -move 
or -delete, will I lose some of the data?  Or will I simply end up with fewer 
block replicas and thus have to force re-replication in order to get back to a 
"safe" number of replicas?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
