Thanks Samuel,
Your information is correct.
I have also read the code for garbage collection (invalidation) of blocks.

But I found the namenode is not fair in processing the invalidation work for
each datanode.
In my cluster, there are 5 datanodes. The storage IDs are:

node1: DS-978762906-10.24.1.12-50010-1237686434530
node2: DS-489086185-10.24.1.14-50010-1237686416330
node3: DS-1170985665-10.24.1.16-50010-1237686426395
node4: DS-1024388083-10.24.1.18-50010-1237686404482
node5: DS-2136798339-10.24.1.20-50010-1237686444430
I know the storage ID is generated
by org.apache.hadoop.hdfs.server.datanode.DataNode.setNewStorageID(...).

In org.apache.hadoop.hdfs.server.namenode.FSNamesystem

  // Keeps a Collection for every named machine containing
  // blocks that have recently been invalidated and are thought to live
  // on the machine in question.
  // Mapping: StorageID -> ArrayList<Block>
  //
  private Map<String, Collection<Block>> recentInvalidateSets =
    new TreeMap<String, Collection<Block>>();

In org.apache.hadoop.hdfs.server.namenode.FSNamesystem.ReplicationMonitor
This thread runs at an interval of replicationRecheckInterval = 3000 milliseconds.
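
For reference, the monitor loop is roughly the following (my paraphrase, not
the verbatim source):

  // Paraphrased sketch of FSNamesystem.ReplicationMonitor.run()
  public void run() {
    while (fsRunning) {
      try {
        computeDatanodeWork();                     // replication + invalidation work
        processPendingReplications();
        Thread.sleep(replicationRecheckInterval);  // 3000 ms
      } catch (InterruptedException ie) {
        break;
      }
    }
  }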

It then goes into computeDatanodeWork(), where nodesToProcess = 2.

Then it goes into computeInvalidateWork(nodesToProcess), whose for loop
executes only 2 iterations.

In each iteration, it goes into invalidateWorkForOneNode(), which always picks
the first node in the map and invalidates blocks on that node:
  String firstNodeId = recentInvalidateSets.keySet().iterator().next();
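
As I read FSNamesystem, the scheduling path is roughly this (a paraphrased
sketch, not the verbatim source):

  int computeInvalidateWork(int nodesToProcess) {
    int blockCnt = 0;
    for (int nodeCnt = 0; nodeCnt < nodesToProcess; nodeCnt++) {
      int work = invalidateWorkForOneNode();
      if (work == 0) {
        break;                // nothing left to schedule this round
      }
      blockCnt += work;
    }
    return blockCnt;
  }

  int invalidateWorkForOneNode() {
    synchronized (this) {
      if (recentInvalidateSets.isEmpty()) {
        return 0;
      }
      // Always restarts from the first key of the sorted TreeMap.
      String firstNodeId = recentInvalidateSets.keySet().iterator().next();
      return invalidateWorkForOneNode(firstNodeId);  // schedules up to the per-node limit
    }
  }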

TreeMap is a sorted map, so the keySet is:
[1024388083-10.24.1.18-50010-1237686404482,
1170985665-10.24.1.16-50010-1237686426395,
2136798339-10.24.1.20-50010-1237686444430,
489086185-10.24.1.14-50010-1237686416330,
978762906-10.24.1.12-50010-1237686434530]

So the node order in recentInvalidateSets is:
[node4, node3, node5, node2, node1]
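
Just to double-check the ordering, here is a small stand-alone demo (using the
storage IDs above without the DS- prefix; the prefix is common to all keys, so
it does not change the relative order):

  import java.util.TreeMap;

  public class TreeMapOrderDemo {
    public static void main(String[] args) {
      // TreeMap orders String keys lexicographically, not numerically,
      // so "1024..." sorts before "978..." even though 978762906 < 1024388083.
      TreeMap<String, String> ids = new TreeMap<String, String>();
      ids.put("978762906-10.24.1.12-50010-1237686434530", "node1");
      ids.put("489086185-10.24.1.14-50010-1237686416330", "node2");
      ids.put("1170985665-10.24.1.16-50010-1237686426395", "node3");
      ids.put("1024388083-10.24.1.18-50010-1237686404482", "node4");
      ids.put("2136798339-10.24.1.20-50010-1237686444430", "node5");
      System.out.println(ids.values());  // prints [node4, node3, node5, node2, node1]
    }
  }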

So every time in invalidateWorkForOneNode(), it will always process node4,
then node3, then node5, then node2, and finally node1.

My application is a write-heavy HBase application, so there are many blocks to
invalidate on each datanode. Every 3000 milliseconds, at most two datanodes
are processed. Since node1 is the last one in the TreeMap, its blocks have no
chance of being garbage collected.
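
Here is a toy simulation of that behaviour (my own sketch; the block counts
and the blocksPerRound / blockInvalidateLimit numbers are made up just to
model a write-heavy workload):

  import java.util.TreeMap;

  public class InvalidateStarvationDemo {
    public static void main(String[] args) {
      // Pending invalidated-block counts per storage ID, in the same TreeMap
      // order as recentInvalidateSets: node4, node3, node5, node2, node1.
      TreeMap<String, Integer> pending = new TreeMap<String, Integer>();
      String[] ids = {"1024388083 (node4)", "1170985665 (node3)",
                      "2136798339 (node5)", "489086185 (node2)",
                      "978762906 (node1)"};
      for (String id : ids) pending.put(id, 0);

      int nodesToProcess = 2;          // datanodes handled per 3-second round
      int blocksPerRound = 100;        // new deletions arriving per node each round
      int blockInvalidateLimit = 100;  // blocks scheduled per chosen node per round

      for (int round = 1; round <= 10; round++) {
        // A write-heavy workload keeps adding work for every node, so no key
        // is ever removed from the map.
        for (String id : pending.keySet()) pending.put(id, pending.get(id) + blocksPerRound);
        // The scheduler always starts from the first key of the sorted map.
        int done = 0;
        for (String id : pending.keySet()) {
          if (done++ == nodesToProcess) break;
          pending.put(id, Math.max(0, pending.get(id) - blockInvalidateLimit));
        }
      }
      // node4 and node3 end at 0, while node5, node2 and node1 still have
      // 1000 pending blocks each: they were never scheduled.
      System.out.println(pending);
    }
  }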

I think the HDFS namenode should fix this issue.
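
For example (just my own illustration of one possible direction, not an actual
patch): the scheduler could remember where the previous round stopped and
continue from the next storage ID, wrapping around, so that every datanode
eventually gets a turn:

  import java.util.TreeMap;

  // Illustration only: a round-robin pick that remembers where the previous
  // round stopped, so keys later in the sorted map also get their turns.
  public class RoundRobinPickDemo {
    private String lastPicked = null;

    String pickNext(TreeMap<String, ?> recentInvalidateSets) {
      if (recentInvalidateSets.isEmpty()) {
        return null;
      }
      String next = (lastPicked == null) ? null
                                         : recentInvalidateSets.higherKey(lastPicked);
      if (next == null) {
        next = recentInvalidateSets.firstKey();  // wrap around to the front
      }
      lastPicked = next;
      return next;
    }

    public static void main(String[] args) {
      TreeMap<String, Integer> sets = new TreeMap<String, Integer>();
      for (String id : new String[] {"id-1", "id-2", "id-3", "id-4", "id-5"}) {
        sets.put(id, 0);
      }
      RoundRobinPickDemo picker = new RoundRobinPickDemo();
      for (int i = 0; i < 7; i++) {
        System.out.println(picker.pickNext(sets));  // cycles through all five ids
      }
    }
  }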

Schubert
On Thu, Mar 26, 2009 at 2:57 PM, Samuel Guo <guosi...@gmail.com> wrote:

> After a file is deleted, HDFS does not immediately reclaim the available
> physical storage. It does so only lazily during garbage collection. When a
> file is deleted by the application, the master removes the file's metadata
> from *FSNamesystem* and logs the deletion immediately. And the file's
> deleted blocks information will be collected in each DataNodeDescriptor's
> *invalidateBlocks* set in Namenode. During the heartbeats between NN and
> DN,
> NN will scan the specified DN's DataNodeDescriptor's invalidateBlocks set,
> find the blocks to be deleted in DN and send a *DNA_INVALIDATE*
> BlockCommand
> to DN. And the *BlockScanner* thread running on DN will scan, find and
> delete these blocks after DN receives the *DNA_INVALIDATE* BlockCommand.
>
> You can search *DNA_INVALIDATE* in DataNode.java and NameNode.java files,
> and find the logic of the garbage collection. Hope it will be helpful.
>
> On Thu, Mar 26, 2009 at 11:07 AM, schubert zhang <zson...@gmail.com>
> wrote:
>
> > Thanks Andrew and Billy.
> > I think the subject of this mail thread is not appropriate; it may not be a
> > balance issue.
> > The problem seems the block deleting scheduler in HDFS.
> >
> > Last night (timezone +8), I slowed down my application, and this morning I
> > found that almost all garbage blocks had been deleted.
> > Here is the current blocks number of each datanode:
> > node1: 10651
> > node2: 10477
> > node3: 12185
> > node4: 11607
> > node5: 14000
> >
> > It seems fine.
> > But I want to study the code of HDFS and understand the policy of deleting
> > blocks on datanodes. Can anyone in the Hadoop community give me some
> > advice?
> >
> > Schubert
> >
> > On Thu, Mar 26, 2009 at 7:55 AM, Andrew Purtell <apurt...@apache.org>
> > wrote:
> >
> >
> > >
> > > > From: schubert zhang <zson...@gmail.com>
> > > > From another point of view, I think HBase cannot control which node
> > > > blocks are deleted on; it just deletes files, and HDFS deletes the
> > > > blocks wherever they happen to be located.
> > >
> > > Yes, that is exactly correct.
> > >
> > > Best regards,
> > >
> > >   - Andy
> > >
> > >
> > >
> > >
> > >
> >
>
