The Namenode lazily instructs a Datanode to delete blocks. In response to every heartbeat from a Datanode, the Namenode instructs it to delete a maximum of 100 blocks. Typically, the heartbeat interval is 3 seconds. The heartbeat thread in the Datanode deletes the block files synchronously before it can send the next heartbeat, which is why a small number (like 100) was chosen.
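
For a rough feel of what this throttle implies, here is a back-of-the-envelope sketch in Python. It is not Hadoop code: the function names are made up for illustration, and it assumes that block replicas are spread evenly across the Datanodes and that the disk work always fits within one heartbeat interval.

```python
import math

# Hypothetical helpers for illustration only; these are not Hadoop APIs.
# Defaults follow the numbers above: 100 blocks per heartbeat, 3 second heartbeat.

def cluster_deletion_rate(datanodes, blocks_per_heartbeat=100, heartbeat_interval_s=3.0):
    """Upper bound on blocks deleted per second across the whole cluster."""
    return datanodes * blocks_per_heartbeat / heartbeat_interval_s

def estimated_deletion_seconds(total_replicas, datanodes,
                               blocks_per_heartbeat=100, heartbeat_interval_s=3.0):
    """Lower bound on the time to drain the deletion backlog.

    Assumes replicas are evenly distributed over the Datanodes and that
    each heartbeat removes a full batch without extra disk delay.
    """
    per_node = math.ceil(total_replicas / datanodes)
    heartbeats = math.ceil(per_node / blocks_per_heartbeat)
    return heartbeats * heartbeat_interval_s

if __name__ == "__main__":
    # Example inputs: 8 Datanodes and 400,000 block replicas to delete.
    print(cluster_deletion_rate(8) * 3.0)                 # blocks per 3 second heartbeat
    print(estimated_deletion_seconds(400_000, 8) / 60.0)  # minutes
```

Under those assumptions, 400,000 replicas on 8 Datanodes need at least 500 heartbeats per node, or about 25 minutes, before any disk latency is counted.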
If you have 8 Datanodes, your system will probably delete at most about 800 blocks (8 x 100) every 3 seconds.

Thanks,
dhruba

-----Original Message-----
From: André Martin [mailto:[EMAIL PROTECTED]
Sent: Friday, March 21, 2008 3:06 PM
To: core-user@hadoop.apache.org
Subject: Re: Performance / cluster scaling question

After waiting a few hours (without having any load), the block count and "DFS Used" space seem to go down...
My question is: is the hardware simply too weak/slow to send the block deletion requests to the datanodes in a timely manner, or do those "crappy" HDDs simply cause the delay? I noticed that it can take up to 40 minutes to delete ~400,000 files at once manually using "rm -r"...
Actually, my main concern is why the performance, i.e. the throughput, goes down. Any ideas?