2suresh:
> When you brought down the DN, the blocks in it were
> replicated to the remaining DNs. When the DN was
> added back, the blocks in it were over replicated, resulting
> in deletion of the extra replica.
Hm, this makes sense if, after starting the DN which still has some block data on it, the existing blocks are caught up.

2Stack:
> Could it be that the du was counting the downed DNs blocks for a
> while.

Do you mean du may still count blocks of a DN which was considered "dead" by the NN (and the NN had already started replication of the under-replicated blocks)? Sounds weird: if the NN sees the blocks as under-replicated then du should also see them as under-replicated (i.e. not allocated on the dead DN), right?

> When you brought back the old DN, NN told it
> clean up blocks it had replicated elsewhere?

Not sure about this. I watched the status of the DN and the blocks on it for a while via the NN web UI: there's a table for that (it shows the DN list with the block count for each). I *think* after the DN was back the table showed 0 blocks for it. Although, I think at some point after the DN stop (or was it after the DN had already started back?) I noticed that the total number of blocks was bigger than it was before I stopped it. In the long run the hdfs status check told me that all blocks are replicated exactly 2 times.

Unfortunately I haven't watched the status & stats closely during the procedure of reconfiguring the DN, as I didn't expect something weird to happen. Will watch more closely next time.

Alex.

On Tue, Mar 15, 2011 at 10:32 AM, suresh srinivas <srini30...@gmail.com> wrote:

> When you brought down the DN, the blocks in it were replicated to the
> remaining DNs. When the DN was added back, the blocks in it were over
> replicated, resulting in deletion of the extra replica.
>
> On Mon, Mar 14, 2011 at 7:34 AM, Alex Baranau <alex.barano...@gmail.com> wrote:
>
>> Hello,
>>
>> As far as I understand, since the "hadoop fs -du" command uses Linux' "du"
>> internally, this means that the number of replicas (at the moment the command
>> is run) affects the result. Is that correct?
>>
>> I have the following case.
>> I have a small (1 master + 5 slaves, each with DN, TT & RS) test HBase
>> cluster with replication set to 2.
>> The table's data size is monitored with
>> the help of the "hadoop fs -du" command. There's a table which is constantly
>> written to: data is only added to it.
>> At some point I decided to reconfigure one of the slaves and shut it down.
>> After reconfiguration (HBase had already marked it as dead) I brought it up
>> again. Things went smoothly. However, on the table size graph (which I drew from
>> data fetched with the "hadoop fs -du" command) I noticed a little spike up in
>> data size, and then it went down to the normal/expected values. Can it be
>> that at some point of the taking out/reconfiguring/adding back node
>> procedure the blocks were over-replicated? I'd expect them to be
>> under-replicated for some time (while the DN is down) and I'd expect to see the
>> inverted spike: a small decrease in data amount and then a return to the "expected"
>> rate (after all blocks got replicated again). Any ideas?
>>
>> Thank you,
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> --
> Regards,
> Suresh
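P.S. A side note for anyone watching a similar procedure: rather than inferring replication state from the NN web UI or from du graphs, `hadoop fsck /` reports under- and over-replicated block counts directly (and `-files -blocks -locations` shows per-block replica placement). As for the size spike itself, here's a toy back-of-the-envelope sketch (all numbers hypothetical) of why a graph that counts raw replicated bytes would spike *up* while an extra replica exists, before the NN prunes it:

```shell
# Hypothetical numbers: a table holding 10 GiB of logical data,
# on a cluster with replication factor 2.
logical_bytes=$((10 * 1024 * 1024 * 1024))
replication=2

# Raw bytes consumed on the DNs once the table is fully replicated:
raw_bytes=$((logical_bytes * replication))
echo "steady state: $raw_bytes"          # 2x the logical size

# While the returned DN still holds its old copies, some blocks briefly
# have replication+1 replicas; the raw footprint grows by up to
# logical_bytes until the NN deletes the extra replicas:
over_replicated_raw=$((logical_bytes * (replication + 1)))
echo "during over-replication: $over_replicated_raw"
```

If, on the other hand, du reported only the logical (pre-replication) size, the graph shouldn't move at all during the procedure — which is exactly the ambiguity the question above hinges on.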