Lohit,
I ran fsck after replacing 1 DN (with data on it) with 1 blank DN and starting all daemons. The fsck report includes this line: "Missing replicas: 17025 (29.727087 %)". According to your explanation, this means that after I removed 1 DN I started missing about 30% of the blocks, right? Wouldn't that mean that 30% of all blocks were *only* on the 1 DN that I removed? But how could that be when I have a replication factor of 3?

If I run bin/hadoop balancer with my old DN back in the cluster (and the new DN removed), I do get the happy "The cluster is balanced" response. Wouldn't that mean that everything is peachy, and that with a replication factor of 3, removing 1 DN should leave only some portion of blocks under-replicated, but none *completely* missing from HDFS?

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: lohit <[EMAIL PROTECTED]>
> To: core-user@hadoop.apache.org
> Sent: Friday, May 9, 2008 1:33:56 AM
> Subject: Re: Corrupt HDFS and salvaging data
>
> Hi Otis,
>
> The namenode has location information about all replicas of a block. When you
> run fsck, the namenode checks for those replicas. If all replicas of a block
> are missing, then fsck reports the block as missing. Otherwise the block is
> counted among the under-replicated blocks. If you specify the -move or
> -delete option along with fsck, files with such missing blocks are moved to
> /lost+found or deleted, depending on the option.
> At what point did you run the fsck command? Was it after the datanodes were
> stopped? When you run namenode -format, it deletes the directories specified
> in dfs.name.dir. If a directory already exists, it asks for confirmation.
>
> Thanks,
> Lohit
>
> ----- Original Message ----
> From: Otis Gospodnetic
> To: core-user@hadoop.apache.org
> Sent: Thursday, May 8, 2008 9:00:34 PM
> Subject: Re: Corrupt HDFS and salvaging data
>
> Hi,
>
> Update:
> It seems fsck reports HDFS as corrupt when a significant-enough number of
> block replicas is missing (or something like that).
> fsck reported corrupt HDFS after I replaced 1 old DN with 1 new DN. After I
> restarted Hadoop with the old set of DNs, fsck stopped reporting corrupt HDFS
> and started reporting *healthy* HDFS.
>
> I'll follow up with a re-balancing question in a separate email.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
> > From: Otis Gospodnetic
> > To: core-user@hadoop.apache.org
> > Sent: Thursday, May 8, 2008 11:35:01 PM
> > Subject: Corrupt HDFS and salvaging data
> >
> > Hi,
> >
> > I have a case of a corrupt HDFS (according to bin/hadoop fsck) and I'm
> > trying not to lose the precious data in it. I accidentally ran bin/hadoop
> > namenode -format on a *new DN* that I had just added to the cluster. Is it
> > possible for that to corrupt HDFS? I also had to explicitly kill the DN
> > daemons before that, because bin/stop-all.sh didn't stop them for some
> > reason (it always did before).
> >
> > Is there any way to salvage the data?
> > I have a 4-node cluster with a replication factor of 3, though fsck
> > reports lots of under-replicated blocks:
> >
> >  ********************************
> >  CORRUPT FILES:        3355
> >  MISSING BLOCKS:       3462
> >  MISSING SIZE:         17708821225 B
> >  ********************************
> > Minimally replicated blocks:   28802 (89.269775 %)
> > Over-replicated blocks:        0 (0.0 %)
> > Under-replicated blocks:       17025 (52.76779 %)
> > Mis-replicated blocks:         0 (0.0 %)
> > Default replication factor:    3
> > Average block replication:     1.7750744
> > Missing replicas:              17025 (29.727087 %)
> > Number of data-nodes:          4
> > Number of racks:               1
> >
> >
> > The filesystem under path '/' is CORRUPT
> >
> >
> > What can one do at this point to save the data? If I run bin/hadoop fsck
> > -move or -delete, will I lose some of the data? Or will I simply end up
> > with fewer block replicas and thus have to force re-balancing in order to
> > get back to a "safe" number of replicas?
> >
> > Thanks,
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
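
[Editor's note: the percentages in the fsck report quoted above can be reconciled with a little arithmetic. The sketch below uses only the figures from the report itself; the interpretation that "Missing replicas" is measured against the number of currently existing replicas (rather than against the block count) is an assumption inferred from how the numbers line up, not something the fsck documentation in this thread states.]

```python
# Figures copied from the fsck report in the thread above.
minimally_replicated = 28802   # reported as 89.269775 % of all blocks
missing_blocks = 3462          # blocks with ALL replicas gone
under_replicated = 17025       # reported as 52.76779 %
missing_replicas = 17025       # reported as 29.727087 %
avg_replication = 1.7750744

# Total blocks = minimally replicated blocks + completely missing blocks.
total_blocks = minimally_replicated + missing_blocks
print(total_blocks)  # 32264

# The block-level percentages use total_blocks as the base:
print(100 * minimally_replicated / total_blocks)  # ~89.269775
print(100 * under_replicated / total_blocks)      # ~52.76779

# "Missing replicas", however, lines up with the number of replicas that
# currently exist (average replication * total blocks), not with the
# block count -- this base is an assumption inferred from the arithmetic:
existing_replicas = avg_replication * total_blocks
print(100 * missing_replicas / existing_replicas)  # ~29.727

# So roughly 30% of *replicas* are missing, not 30% of blocks. The blocks
# that are gone entirely are the 3462 MISSING BLOCKS, i.e. about 10.7%:
print(100 * missing_blocks / total_blocks)  # ~10.73
```

Under this reading, the 29.7% figure Otis is worried about does not mean 30% of blocks lived only on the removed DN; the blocks with no surviving replica are the separately reported 3462 MISSING BLOCKS.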