[ https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134814#comment-13134814 ]
Konstantin Shvachko commented on HDFS-2477:
-------------------------------------------

Tomasz, as I understand your intentions, you want to optimize {{DatanodeDescriptor.moveBlockToHead()}} by replacing the combination of {{listRemove()}} and {{listInsert()}} calls with a single call to a method that moves the list element directly to the head, which avoids redundant {{findDatanode()}} calls. This sounds like a good idea.

Implementation-wise I'd propose to restructure it a bit. Instead of introducing the helper class DatanodeIndex, which is somewhat confusing, I'd rather implement it with extra parameters and return values by changing the signature of {{DatanodeDescriptor.moveBlockToHead()}}, like
{code}
/**
 * DatanodeDescriptor.moveBlockToHead()
 * @return the index of the head of the blockList
 */
int moveBlockToHead(BlockInfo b, int curIndex, int headIndex) {
  blockList = b.listMoveToHead(blockList, this, curIndex, headIndex);
  return curIndex; // new headIndex
}
{code}
where {{BlockInfo.listMoveToHead()}} is the implementation of your moveBlockToHeadFast(). I think this implementation belongs in BlockInfo rather than DatanodeDescriptor, as I tried to confine all logic related to triplets and the related list operations inside BlockInfo.

Then the BlockManager code will look something like this
{code}
private void reportDiff() {
  ....
  int headIndex = 0;
  while(itBR.hasNext()) {
    ....
    // move block to the head of the list
    if(storedBlock != null) {
      // look up the index only after the null check
      int curIndex = storedBlock.findDatanode(dn);
      if(curIndex >= 0) {
        headIndex = dn.moveBlockToHead(storedBlock, curIndex, headIndex);
      }
    }
    ....
  }
}
{code}
I probably didn't get all the details right, but I hope the idea is clear and makes sense.

Also please
- remove white space changes,
- make sure tabs are replaced with spaces (you need to set up your Eclipse environment correctly), and
- add JavaDoc for {{getSetPrevious()}} and {{getSetNext()}}, if you decide to keep these methods in your implementation.
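
To make the single-step move concrete, here is a minimal, self-contained sketch of what a {{listMoveToHead()}} along these lines might look like. It is not the actual BlockInfo code: the class {{SketchBlock}}, its {{name}} field, and the {{names()}} helper are hypothetical stand-ins, and the triplet layout (datanode, previous, next for each replica) is an assumption based on the discussion above. The point it illustrates is that the caller passes in {{curIndex}} and {{headIndex}}, so the moved block and the old head need no extra {{findDatanode()}} lookups; only the two neighbours are still searched.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for BlockInfo (illustration only). */
final class SketchBlock {
    final String name;
    // Assumed layout: [3*i] = datanode, [3*i+1] = previous block, [3*i+2] = next block.
    private final Object[] triplets;

    SketchBlock(String name, Object... datanodes) {
        this.name = name;
        this.triplets = new Object[3 * datanodes.length];
        for (int i = 0; i < datanodes.length; i++) {
            triplets[3 * i] = datanodes[i];
        }
    }

    /** Linear scan over the triplets; the lookup we want to avoid repeating. */
    int findDatanode(Object dn) {
        for (int i = 0; i < triplets.length / 3; i++) {
            if (triplets[3 * i] == dn) {
                return i;
            }
        }
        return -1;
    }

    SketchBlock getPrevious(int i) { return (SketchBlock) triplets[3 * i + 1]; }
    SketchBlock getNext(int i) { return (SketchBlock) triplets[3 * i + 2]; }
    void setPrevious(int i, SketchBlock b) { triplets[3 * i + 1] = b; }
    void setNext(int i, SketchBlock b) { triplets[3 * i + 2] = b; }

    /** Inserts this block at the head of dn's list and returns the new head. */
    SketchBlock listInsert(SketchBlock head, Object dn) {
        int i = findDatanode(dn);
        setPrevious(i, null);
        setNext(i, head);
        if (head != null) {
            head.setPrevious(head.findDatanode(dn), this);
        }
        return this;
    }

    /**
     * Moves this block to the head of dn's list in one step.
     * curIndex is this block's index for dn, headIndex is the current head's
     * index for dn; both are supplied by the caller instead of being searched.
     */
    SketchBlock listMoveToHead(SketchBlock head, Object dn, int curIndex, int headIndex) {
        if (head == this) {
            return this; // already at the head, nothing to do
        }
        SketchBlock prev = getPrevious(curIndex); // non-null: this is not the head
        SketchBlock next = getNext(curIndex);
        // Unlink this block from its current position.
        prev.setNext(prev.findDatanode(dn), next);
        if (next != null) {
            next.setPrevious(next.findDatanode(dn), prev);
        }
        // Relink it in front of the old head.
        setPrevious(curIndex, null);
        setNext(curIndex, head);
        head.setPrevious(headIndex, this);
        return this;
    }

    /** Test helper: block names in list order for the given datanode. */
    static List<String> names(SketchBlock head, Object dn) {
        List<String> out = new ArrayList<>();
        for (SketchBlock b = head; b != null; b = b.getNext(b.findDatanode(dn))) {
            out.add(b.name);
        }
        return out;
    }
}
```

In a loop like the {{reportDiff()}} sketch above, the caller would keep the returned head and reuse the moved block's {{curIndex}} as the next {{headIndex}}, which is why the proposed {{moveBlockToHead()}} returns {{curIndex}}.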

> Optimize computing the diff between a block report and the namenode state.
> --------------------------------------------------------------------------
>
>                 Key: HDFS-2477
>                 URL: https://issues.apache.org/jira/browse/HDFS-2477
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>            Reporter: Tomasz Nykiel
>            Assignee: Tomasz Nykiel
>         Attachments: reportDiff.patch, reportDiff.patch-2, reportDiff.patch-3
>
>
> When a block report is processed at the NN, the BlockManager.reportDiff
> traverses all blocks contained in the report, and for each one block, which
> is also present in the corresponding datanode descriptor, the block is moved
> to the head of the list of the blocks in this datanode descriptor.
> With HDFS-395 the huge majority of the blocks in the report, are also present
> in the datanode descriptor, which means that almost every block in the report
> will have to be moved to the head of the list.
> Currently this operation is performed by DatanodeDescriptor.moveBlockToHead,
> which removes a block from a list and then inserts it. In this process, we
> call findDatanode several times (afair 6 times for each moveBlockToHead
> call). findDatanode is relatively expensive, since it linearly goes through
> the triplets to locate the given datanode.
> With this patch, we do some memoization of findDatanode, so we can reclaim 2
> findDatanode calls. Our experiments show that this can improve the reportDiff
> (which is executed under write lock) by around 15%. Currently with HDFS-395,
> reportDiff is responsible for almost 100% of the block report processing time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira