[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-11-11 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Attachment: hashStructures.patch-9

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5, 
> hashStructures.patch-6, hashStructures.patch-7, hashStructures.patch-8, 
> hashStructures.patch-9
>
>
> This patch introduces two hash data structures for storing under-replicated, 
> over-replicated and invalidated blocks:
> 1. LightWeightHashSet
> 2. LightWeightLinkedSet
> Currently, in all these cases we use java.util.TreeSet, which adds 
> unnecessary overhead.
> The main bottlenecks addressed by this patch are:
> - periods of cluster instability, when these queues (especially the 
> under-replicated one) tend to grow drastically,
> - initial cluster startup, when the queues are initialized after leaving 
> safemode,
> - block reports,
> - explicit acks for block addition and deletion.
> Properties of the introduced structures:
> 1. They are CPU-optimized.
> 2. They shrink and expand as the number of stored elements changes relative 
> to the current capacity.
> 3. Add/contains/delete operations run in expected O(1) time (unlike the 
> current O(log n) for TreeSet).
> 4. The sets are equipped with fast access methods for polling a number of 
> elements (get+remove), which are used for handling the queues.
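
The properties listed above (hash-based O(1) operations, growing and shrinking, and a bulk get+remove) can be illustrated with a minimal sketch. This is not the LightWeightHashSet from the attached patch: the class name SketchHashSet, the pollN method, the minimum capacity of 16 and the load thresholds are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative chained hash set; NOT the LightWeightHashSet from the patch. */
final class SketchHashSet<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private static final int MIN_CAPACITY = 16; // assumed; capacity stays a power of two
    private Node<T>[] buckets = newTable(MIN_CAPACITY);
    private int size;

    @SuppressWarnings("unchecked")
    private static <T> Node<T>[] newTable(int capacity) {
        return (Node<T>[]) new Node[capacity];
    }

    private static int indexFor(Object o, int capacity) {
        int h = o.hashCode();
        h ^= (h >>> 16);               // mix high bits into the low bits
        return h & (capacity - 1);
    }

    boolean contains(T value) {
        for (Node<T> n = buckets[indexFor(value, buckets.length)]; n != null; n = n.next) {
            if (n.value.equals(value)) return true;
        }
        return false;
    }

    boolean add(T value) {
        if (contains(value)) return false;
        // Grow when the load factor would exceed 3/4 (assumed threshold).
        if (size + 1 > buckets.length * 3 / 4) resize(buckets.length * 2);
        int i = indexFor(value, buckets.length);
        buckets[i] = new Node<>(value, buckets[i]);
        size++;
        return true;
    }

    boolean remove(T value) {
        int i = indexFor(value, buckets.length);
        Node<T> prev = null;
        for (Node<T> n = buckets[i]; n != null; prev = n, n = n.next) {
            if (n.value.equals(value)) {
                if (prev == null) buckets[i] = n.next; else prev.next = n.next;
                size--;
                shrinkIfSparse();
                return true;
            }
        }
        return false;
    }

    /** Removes and returns up to n elements in a single pass (get+remove). */
    List<T> pollN(int n) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < buckets.length && out.size() < n; i++) {
            while (buckets[i] != null && out.size() < n) {
                out.add(buckets[i].value);
                buckets[i] = buckets[i].next;
                size--;
            }
        }
        shrinkIfSparse();
        return out;
    }

    int size() { return size; }
    int capacity() { return buckets.length; }

    private void shrinkIfSparse() {
        // Halve the table when it is less than 1/8 full (assumed threshold).
        if (buckets.length > MIN_CAPACITY && size < buckets.length / 8) {
            resize(buckets.length / 2);
        }
    }

    private void resize(int newCapacity) {
        Node<T>[] old = buckets;
        buckets = newTable(newCapacity);
        for (Node<T> head : old) {
            for (Node<T> n = head; n != null; n = n.next) {
                int i = indexFor(n.value, newCapacity);
                buckets[i] = new Node<>(n.value, buckets[i]);
            }
        }
    }
}
```

A queue consumer would call pollN(k) to take a batch of k blocks in one call instead of iterating and removing them one at a time. Unlike a TreeSet, this structure gives no ordering guarantee on the polled elements.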

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-11-07 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Attachment: hashStructures.patch-8

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5, 
> hashStructures.patch-6, hashStructures.patch-7, hashStructures.patch-8
>
>





[jira] [Updated] (HDFS-2495) Increase granularity of write operations in ReplicationMonitor thus reducing contention for write lock

2011-11-04 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2495:


Attachment: replicationMon.patch-1

> Increase granularity of write operations in ReplicationMonitor thus reducing 
> contention for write lock
> --
>
> Key: HDFS-2495
> URL: https://issues.apache.org/jira/browse/HDFS-2495
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: replicationMon.patch, replicationMon.patch-1
>
>
> For processing blocks in ReplicationMonitor 
> (BlockManager.computeReplicationWork), we first obtain a list of blocks to be 
> replicated by calling chooseUnderReplicatedBlocks, and then, for each block 
> found, we call computeReplicationWorkForBlock. The latter processes 
> a block in three stages, acquiring the write lock twice per call:
> 1. obtaining block-related info (livenodes, srcnode, etc.) under the lock
> 2. choosing a target for replication
> 3. scheduling replication (under the lock)
> We would like to change this behaviour and decrease contention for the write 
> lock by batching blocks and executing stages 1, 2 and 3 for sets of blocks, 
> rather than for each block separately. This would decrease the number of 
> write lock acquisitions to 2, from 2*numberofblocks.
> Also, the info-level logging can be pushed outside the write lock.
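
The change in lock acquisitions described above can be sketched as follows. This is only an illustration of the batching idea, not the BlockManager code: the class BatchedReplicationSketch and the helpers gatherInfo, chooseTarget and schedule are hypothetical stand-ins for the three stages, and the counter exists only to make the 2*numberofblocks versus 2 difference visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Illustrative only: hypothetical names, not the actual BlockManager code. */
final class BatchedReplicationSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    int writeLockAcquisitions = 0;   // instrumentation for the comparison

    private void writeLock() { lock.writeLock().lock(); writeLockAcquisitions++; }
    private void writeUnlock() { lock.writeLock().unlock(); }

    /** Per-block processing: the write lock is taken twice for every block. */
    List<String> scheduleOneByOne(List<Long> blocks) {
        List<String> scheduled = new ArrayList<>();
        for (long b : blocks) {
            String src;
            writeLock();
            try { src = gatherInfo(b); } finally { writeUnlock(); }        // stage 1
            String target = chooseTarget(b);                               // stage 2, no lock
            writeLock();
            try { scheduled.add(schedule(b, src, target)); } finally { writeUnlock(); } // stage 3
        }
        return scheduled;
    }

    /** Batched processing: the write lock is taken twice for the whole set. */
    List<String> scheduleBatched(List<Long> blocks) {
        List<String> sources = new ArrayList<>();
        writeLock();
        try {
            for (long b : blocks) sources.add(gatherInfo(b));              // stage 1 for all
        } finally { writeUnlock(); }

        List<String> targets = new ArrayList<>();
        for (long b : blocks) targets.add(chooseTarget(b));                // stage 2, no lock

        List<String> scheduled = new ArrayList<>();
        writeLock();
        try {
            for (int i = 0; i < blocks.size(); i++) {                      // stage 3 for all
                scheduled.add(schedule(blocks.get(i), sources.get(i), targets.get(i)));
            }
        } finally { writeUnlock(); }
        return scheduled;
    }

    // Hypothetical stand-ins for the real per-stage work.
    private static String gatherInfo(long b) { return "src-" + (b % 3); }
    private static String chooseTarget(long b) { return "dst-" + (b % 5); }
    private static String schedule(long b, String src, String dst) {
        return b + ":" + src + "->" + dst;
    }
}
```

Because the lock is released between stage 1 and stage 3, a real implementation would presumably need to re-check in stage 3 that each block still needs replication. The sketch omits that validation.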





[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-11-04 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Attachment: hashStructures.patch-7

Synced with the trunk.

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5, 
> hashStructures.patch-6, hashStructures.patch-7
>
>





[jira] [Updated] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

2011-10-25 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2477:


Attachment: reportDiff.patch-5

> Optimize computing the diff between a block report and the namenode state.
> --
>
> Key: HDFS-2477
> URL: https://issues.apache.org/jira/browse/HDFS-2477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: reportDiff.patch, reportDiff.patch-2, 
> reportDiff.patch-3, reportDiff.patch-4, reportDiff.patch-5
>
>
> When a block report is processed at the NN, BlockManager.reportDiff 
> traverses all blocks contained in the report, and each block that 
> is also present in the corresponding datanode descriptor is moved 
> to the head of the list of blocks in this datanode descriptor.
> With HDFS-395, the vast majority of the blocks in the report are also present 
> in the datanode descriptor, which means that almost every block in the report 
> has to be moved to the head of the list.
> Currently this operation is performed by DatanodeDescriptor.moveBlockToHead, 
> which removes a block from the list and then re-inserts it. In this process, we 
> call findDatanode several times (as far as I recall, 6 times for each 
> moveBlockToHead call). findDatanode is relatively expensive, since it scans 
> the triplets linearly to locate the given datanode.
> With this patch, we memoize some findDatanode results, which saves 2 
> findDatanode calls. Our experiments show that this can speed up reportDiff 
> (which is executed under the write lock) by around 15%. Currently, with 
> HDFS-395, reportDiff is responsible for almost 100% of the block report 
> processing time.
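
The memoization idea above can be sketched with a simplified model. This is not the code from the patch: BlockSketch, its per-datanode prev/next arrays (a stand-in for the triplets), and the five-argument moveToHead overload are assumptions for illustration. The call counts in this sketch (5 without cached indices, 2 with them, for a block in the middle of the list) are specific to the sketch and differ from the 6 and 2 quoted in the description.

```java
/** Illustrative only: simplified stand-in for a block and its triplets. */
final class BlockSketch {
    static int findCalls = 0;          // instrumentation for the comparison
    final long id;
    final String[] nodes;              // datanodes holding a replica of this block
    final BlockSketch[] prev;          // prev[i]: previous block in the list of nodes[i]
    final BlockSketch[] next;          // next[i]: next block in the list of nodes[i]

    BlockSketch(long id, String... nodes) {
        this.id = id;
        this.nodes = nodes;
        this.prev = new BlockSketch[nodes.length];
        this.next = new BlockSketch[nodes.length];
    }

    /** Linear scan for the slot of the given datanode; -1 if absent. */
    int findDatanode(String dn) {
        findCalls++;
        for (int i = 0; i < nodes.length; i++) {
            if (nodes[i].equals(dn)) return i;
        }
        return -1;
    }

    /** Unlinks this block from dn's list and returns the new head. */
    BlockSketch listRemove(BlockSketch head, String dn) {
        int i = findDatanode(dn);
        BlockSketch p = prev[i], n = next[i];
        prev[i] = null;
        next[i] = null;
        if (p != null) p.next[p.findDatanode(dn)] = n;
        if (n != null) n.prev[n.findDatanode(dn)] = p;
        return head == this ? n : head;
    }

    /** Links this block in front of head and returns it as the new head. */
    BlockSketch listInsert(BlockSketch head, String dn) {
        int i = findDatanode(dn);
        prev[i] = null;
        next[i] = head;
        if (head != null) head.prev[head.findDatanode(dn)] = this;
        return this;
    }

    /** Remove-then-insert, looking up every index again. */
    static BlockSketch moveToHead(BlockSketch head, BlockSketch b, String dn) {
        if (head == b) return head;
        return b.listInsert(b.listRemove(head, dn), dn);
    }

    /** Same move, but the caller passes the already-known indices of b and head. */
    static BlockSketch moveToHead(BlockSketch head, BlockSketch b, String dn,
                                  int bIdx, int headIdx) {
        if (head == b) return head;
        BlockSketch p = b.prev[bIdx], n = b.next[bIdx];
        p.next[p.findDatanode(dn)] = n;            // b is not the head, so p is non-null
        if (n != null) n.prev[n.findDatanode(dn)] = p;
        b.prev[bIdx] = null;
        b.next[bIdx] = head;
        head.prev[headIdx] = b;
        return b;
    }

    /** Comma-separated block ids in list order for the given datanode. */
    static String order(BlockSketch head, String dn) {
        StringBuilder sb = new StringBuilder();
        for (BlockSketch x = head; x != null; x = x.next[x.findDatanode(dn)]) {
            if (sb.length() > 0) sb.append(',');
            sb.append(x.id);
        }
        return sb.toString();
    }
}
```

In a loop over a block report, the index of the block just moved becomes the head index for the next move, so the caller can carry it forward instead of looking it up again.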





[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-10-25 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Attachment: hashStructures.patch-6

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5, 
> hashStructures.patch-6
>
>





[jira] [Updated] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

2011-10-25 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2477:


Attachment: reportDiff.patch-4

> Optimize computing the diff between a block report and the namenode state.
> --
>
> Key: HDFS-2477
> URL: https://issues.apache.org/jira/browse/HDFS-2477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: reportDiff.patch, reportDiff.patch-2, 
> reportDiff.patch-3, reportDiff.patch-4
>
>





[jira] [Updated] (HDFS-2495) Increase granularity of write operations in ReplicationMonitor thus reducing contention for write lock

2011-10-24 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2495:


Attachment: replicationMon.patch

> Increase granularity of write operations in ReplicationMonitor thus reducing 
> contention for write lock
> --
>
> Key: HDFS-2495
> URL: https://issues.apache.org/jira/browse/HDFS-2495
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: replicationMon.patch
>
>





[jira] [Updated] (HDFS-2362) More Improvements on NameNode Scalability

2011-10-24 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2362:


Description: 
This jira acts as an umbrella jira to track all the improvements we've made 
recently to improve the Namenode's performance, responsiveness, and hence 
scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. BlockManager.reportDiff optimization for processing block reports (HDFS-2477)
3. Upgradable lock to allow simultaneous read operations while reportDiff is in 
progress when processing block reports (HDFS-2490)
4. More CPU efficient data structure for 
under-replicated/over-replicated/invalidate blocks (HDFS-2476)
5. Increase granularity of write operations in ReplicationMonitor thus reducing 
contention for write lock (HDFS-2495)
6. Support variable block sizes
7. Release RPC handlers while waiting for the edit log to be synced to disk
8. Reduce network traffic pressure on the master rack where the NN is located by 
lowering the read priority of the replicas on that rack
9. A standalone KeepAlive heartbeat thread
10. Reduce multiple traversals of the path directory to one for most namespace 
manipulations
11. Move logging out of the write lock section.



  was:
This jira acts as an umbrella jira to track all the improvements we've made 
recently to improve the Namenode's performance, responsiveness, and hence 
scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. BlockManager.reportDiff optimization for processing block reports (HDFS-2477)
3. Upgradable lock to allow simultaneous read operations while reportDiff is in 
progress when processing block reports (HDFS-2490)
4. More CPU efficient data structure for 
under-replicated/over-replicated/invalidate blocks (HDFS-2476)
5. Increase granularity of write operations in ReplicationMonitor thus reducing 
contention for write lock
6. Support variable block sizes
7. Release RPC handlers while waiting for the edit log to be synced to disk
8. Reduce network traffic pressure on the master rack where the NN is located by 
lowering the read priority of the replicas on that rack
9. A standalone KeepAlive heartbeat thread
10. Reduce multiple traversals of the path directory to one for most namespace 
manipulations
11. Move logging out of the write lock section.




> More Improvements on NameNode Scalability
> -
>
> Key: HDFS-2362
> URL: https://issues.apache.org/jira/browse/HDFS-2362
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Hairong Kuang
>





[jira] [Updated] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

2011-10-23 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2477:


Attachment: reportDiff.patch-3

> Optimize computing the diff between a block report and the namenode state.
> --
>
> Key: HDFS-2477
> URL: https://issues.apache.org/jira/browse/HDFS-2477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: reportDiff.patch, reportDiff.patch-2, reportDiff.patch-3
>
>





[jira] [Updated] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

2011-10-23 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2477:


Status: Patch Available  (was: Open)

> Optimize computing the diff between a block report and the namenode state.
> --
>
> Key: HDFS-2477
> URL: https://issues.apache.org/jira/browse/HDFS-2477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: reportDiff.patch, reportDiff.patch-2, reportDiff.patch-3
>
>





[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-10-22 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Attachment: hashStructures.patch-5

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3, hashStructures.patch-4, hashStructures.patch-5
>
>





[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-10-21 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Attachment: hashStructures.patch-4

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3, hashStructures.patch-4
>
>





[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-10-21 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Status: Patch Available  (was: Open)

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3
>
>





[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-10-21 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Attachment: hashStructures.patch-3

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch, hashStructures.patch-2, 
> hashStructures.patch-3
>
>





[jira] [Updated] (HDFS-2362) More Improvements on NameNode Scalability

2011-10-21 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2362:


Description: 
This jira acts as an umbrella jira to track all the improvements we've done 
recently to improve Namenode's performance, responsiveness, and hence 
scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. BlockManager.reportDiff optimization for processing block reports (HDFS-2477)
3. Upgradable lock to allow simultaneous read operation while reportDiff is in 
progress in processing block reports (HDFS-2490)
4. More CPU efficient data structure for 
under-replicated/over-replicated/invalidate blocks (HDFS-2476)
5. Increase granularity of write operations in ReplicationMonitor thus reducing 
contention for write lock
6. Support variable block sizes
7. Release RPC handlers while waiting for the edit log to be synced to disk
8. Reduce network traffic pressure to the master rack where NN is located by 
lowering read priority of the replicas on the rack
9. A standalone KeepAlive heartbeat thread
10. Reduce multiple traversals of path directory to one for most namespace 
manipulations
11. Move logging out of write lock section.



  was:
This jira acts as an umbrella jira to track all the improvements we've done 
recently to improve Namenode's performance, responsiveness, and hence 
scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. BlockManager.reportDiff optimization for processing block reports (HDFS-2477)
3. Upgradable lock to allow simultaneous read operation while reportDiff is in 
progress in processing block reports
4. More CPU efficient data structure for 
under-replicated/over-replicated/invalidate blocks (HDFS-2476)
5. Increase granularity of write operations in ReplicationMonitor thus reducing 
contention for write lock
6. Support variable block sizes
7. Release RPC handlers while waiting for the edit log to be synced to disk
8. Reduce network traffic pressure to the master rack where NN is located by 
lowering read priority of the replicas on the rack
9. A standalone KeepAlive heartbeat thread
10. Reduce multiple traversals of path directory to one for most namespace 
manipulations
11. Move logging out of write lock section.




> More Improvements on NameNode Scalability
> -
>
> Key: HDFS-2362
> URL: https://issues.apache.org/jira/browse/HDFS-2362
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Hairong Kuang
>
> This jira acts as an umbrella jira to track all the improvements we've done 
> recently to improve Namenode's performance, responsiveness, and hence 
> scalability. Those improvements include:
> 1. Incremental block reports (HDFS-395)
> 2. BlockManager.reportDiff optimization for processing block reports 
> (HDFS-2477)
> 3. Upgradable lock to allow simultaneous read operation while reportDiff is in 
> progress in processing block reports (HDFS-2490)
> 4. More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks (HDFS-2476)
> 5. Increase granularity of write operations in ReplicationMonitor thus 
> reducing contention for write lock
> 6. Support variable block sizes
> 7. Release RPC handlers while waiting for the edit log to be synced to disk
> 8. Reduce network traffic pressure to the master rack where NN is located by 
> lowering read priority of the replicas on the rack
> 9. A standalone KeepAlive heartbeat thread
> 10. Reduce multiple traversals of path directory to one for most namespace 
> manipulations
> 11. Move logging out of write lock section.





[jira] [Updated] (HDFS-2490) Upgradable lock to allow simutaleous read operation while reportDiff is in progress in processing block reports

2011-10-21 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2490:


Attachment: FSNamesystemLock.java

> Upgradable lock to allow simutaleous read operation while reportDiff is in 
> progress in processing block reports
> ---
>
> Key: HDFS-2490
> URL: https://issues.apache.org/jira/browse/HDFS-2490
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: FSNamesystemLock.java
>
>
> Currently, FSNamesystem operations are protected by a single 
> ReentrantReadWriteLock, which allows for having multiple concurrent readers 
> to perform reads, and a single writer to perform writes. There are, however, 
> operations that are primarily reads but occasionally need to write.
> The finest example is processing block reports - currently the entire 
> processing is done under writeLock(). With HDFS-395 (explicit deletion acks), 
> processing a block report is primarily a read operation (reportDiff()) after 
> which only very few blocks need to be updated. In fact, we noticed this 
> number to be very low, or even zero blocks.
> It would be desirable to have an upgradeable read lock, which would allow for 
> performing other reads during the first "read" part of reportDiff() (and 
> possibly other operations).
> We implemented such a mechanism, which provides writeLock(), readLock(), 
> upgradeableReadLock(), upgradeLock(), and downgradeLock(). I achieved this by 
> employing two ReentrantReadWriteLocks: one protects writes (lock1), the 
> other one reads (lock2).
> Hence, we have:
> writeLock()
>   lock1.writeLock().lock()
>   lock2.writeLock().lock()
> readLock()
>   lock2.readLock().lock()
> upgradeableReadLock()
>   lock1.writeLock().lock()
> upgrade()
>   lock2.writeLock().lock()
> --
> Hence a writeLock() is essentially equivalent to upgradeableLock()+upgrade()
> - two writeLocks are mutually exclusive because of lock1.writeLock
> - a writeLock and upgradeableLock are mutually exclusive as above
> - readLock is mutually exclusive with upgradeableLock()+upgrade() OR 
> writeLock because of lock2.writeLock
> - readLock() + writeLock() causes a deadlock, the same as currently
> - writeLock() + readLock() does not cause deadlocks
> --
> I am convinced of the soundness of this mechanism.
> The overhead comes from having two locks, and in particular, writes need to 
> acquire both of them.
> We deployed this feature and used the upgradeableLock() ONLY for processing 
> reports.
> Our initial, though not exhaustive, experiments have shown that it had a very 
> detrimental effect on the NN throughput - writes were taking up to twice as 
> long.
> This is very unexpected, and hard to explain by only the overhead of 
> acquiring an additional lock for writes.
> I would like to ask for input, as maybe I am missing some fundamental problem 
> here.
> I am attaching a java class which implements this locking mechanism.
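
For readers without the attachment, the two-lock scheme described above can
be sketched roughly as follows. This is not the attached FSNamesystemLock.java:
the class name UpgradeableLockSketch, the unlock methods and tryReadLock() are
assumptions added so that the example is complete, and the sketch says nothing
about the throughput regression reported above.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the scheme in the description; not the attached class.
final class UpgradeableLockSketch {
  // lock1 serializes writers and upgradeable readers against each other.
  private final ReentrantReadWriteLock lock1 = new ReentrantReadWriteLock();
  // lock2 separates plain readers from anyone who is actually writing.
  private final ReentrantReadWriteLock lock2 = new ReentrantReadWriteLock();

  void writeLock() {
    lock1.writeLock().lock();
    lock2.writeLock().lock();
  }

  void writeUnlock() {
    // Release in the reverse order of acquisition.
    lock2.writeLock().unlock();
    lock1.writeLock().unlock();
  }

  void readLock() { lock2.readLock().lock(); }

  // Non-blocking variant, added here only to make the behavior observable.
  boolean tryReadLock() { return lock2.readLock().tryLock(); }

  void readUnlock() { lock2.readLock().unlock(); }

  // Excludes writers and other upgradeable readers, but not plain readers.
  void upgradeableReadLock() { lock1.writeLock().lock(); }

  void upgradeableReadUnlock() { lock1.writeLock().unlock(); }

  // Waits for plain readers to drain, then excludes them.
  void upgrade() { lock2.writeLock().lock(); }

  // Lets plain readers in again while keeping writers out.
  void downgrade() { lock2.writeLock().unlock(); }
}
```

In this sketch a block report would take upgradeableReadLock() for the
reportDiff() phase and call upgrade() only if some blocks actually need to be
updated, followed by downgrade() and upgradeableReadUnlock().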





[jira] [Updated] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

2011-10-20 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2477:


Attachment: reportDiff.patch-2

> Optimize computing the diff between a block report and the namenode state.
> --
>
> Key: HDFS-2477
> URL: https://issues.apache.org/jira/browse/HDFS-2477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: reportDiff.patch, reportDiff.patch-2
>
>
> When a block report is processed at the NN, BlockManager.reportDiff 
> traverses all blocks contained in the report, and each block that is also 
> present in the corresponding datanode descriptor is moved to the head of the 
> list of blocks in this datanode descriptor.
> With HDFS-395, the vast majority of the blocks in the report are also present 
> in the datanode descriptor, which means that almost every block in the report 
> will have to be moved to the head of the list.
> Currently this operation is performed by DatanodeDescriptor.moveBlockToHead, 
> which removes a block from a list and then inserts it. In this process, we 
> call findDatanode several times (afair 6 times for each moveBlockToHead 
> call). findDatanode is relatively expensive, since it linearly scans 
> the triplets to locate the given datanode.
> With this patch, we do some memoization of findDatanode, so we can save 2 
> findDatanode calls. Our experiments show that this can speed up reportDiff 
> (which is executed under the write lock) by around 15%. Currently with 
> HDFS-395, reportDiff is responsible for almost 100% of the block report 
> processing time.
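
The memoization idea can be illustrated with a small sketch. It is not the
code from reportDiff.patch: the class BlockSketch, the triplet layout shown
here and the method names are simplified assumptions. The point is only that
the moving block's own index is looked up once with findDatanode and then
reused for both the unlink and the re-insert.

```java
// Hypothetical, simplified model of a block with per-datanode triplets.
final class BlockSketch {
  final long id;
  // For replica i: [3*i] = datanode, [3*i+1] = previous block, [3*i+2] = next
  // block in that datanode's list of blocks.
  private final Object[] triplets;

  BlockSketch(long id, Object... datanodes) {
    this.id = id;
    this.triplets = new Object[3 * datanodes.length];
    for (int i = 0; i < datanodes.length; i++) {
      triplets[3 * i] = datanodes[i];
    }
  }

  // Linear scan over the triplets; this is the call worth avoiding.
  int findDatanode(Object dn) {
    for (int i = 0; i < triplets.length / 3; i++) {
      if (triplets[3 * i] == dn) return i;
    }
    return -1;
  }

  BlockSketch prev(int i) { return (BlockSketch) triplets[3 * i + 1]; }
  BlockSketch next(int i) { return (BlockSketch) triplets[3 * i + 2]; }
  private void setPrev(int i, BlockSketch b) { triplets[3 * i + 1] = b; }
  private void setNext(int i, BlockSketch b) { triplets[3 * i + 2] = b; }

  // Inserts this block at the head of dn's list; cur is this block's index
  // for dn, already known to the caller. Returns the new head.
  BlockSketch insertAtHead(BlockSketch head, Object dn, int cur) {
    setPrev(cur, null);
    setNext(cur, head);
    if (head != null) head.setPrev(head.findDatanode(dn), this);
    return this;
  }

  // Moves this block to the head of dn's list and returns the new head.
  BlockSketch moveToHead(BlockSketch head, Object dn) {
    if (head == this) return this;
    int cur = findDatanode(dn);          // looked up once, reused below
    BlockSketch p = prev(cur);           // non-null because this is not the head
    BlockSketch n = next(cur);
    p.setNext(p.findDatanode(dn), n);    // unlink from the current position
    if (n != null) n.setPrev(n.findDatanode(dn), p);
    return insertAtHead(head, dn, cur);  // no second lookup for this block
  }
}
```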





[jira] [Updated] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

2011-10-20 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2477:


Attachment: (was: hashStructures.patch-2)

> Optimize computing the diff between a block report and the namenode state.
> --
>
> Key: HDFS-2477
> URL: https://issues.apache.org/jira/browse/HDFS-2477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: reportDiff.patch, reportDiff.patch-2
>
>





[jira] [Updated] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

2011-10-20 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2477:


Attachment: hashStructures.patch-2

> Optimize computing the diff between a block report and the namenode state.
> --
>
> Key: HDFS-2477
> URL: https://issues.apache.org/jira/browse/HDFS-2477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch-2, reportDiff.patch
>
>





[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-10-20 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Attachment: hashStructures.patch-2

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch, hashStructures.patch-2
>
>





[jira] [Updated] (HDFS-2362) More Improvements on NameNode Scalability

2011-10-19 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2362:


Description: 
This jira acts as an umbrella jira to track all the improvements we've done 
recently to improve Namenode's performance, responsiveness, and hence 
scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. BlockManager.reportDiff optimization for processing block reports (HDFS-2477)
3. Upgradable lock to allow simultaneous read operation while reportDiff is in 
progress in processing block reports
4. More CPU efficient data structure for 
under-replicated/over-replicated/invalidate blocks (HDFS-2476)
5. Increase granularity of write operations in ReplicationMonitor thus reducing 
contention for write lock
6. Support variable block sizes
7. Release RPC handlers while waiting for the edit log to be synced to disk
8. Reduce network traffic pressure to the master rack where NN is located by 
lowering read priority of the replicas on the rack
9. A standalone KeepAlive heartbeat thread
10. Reduce multiple traversals of path directory to one for most namespace 
manipulations
11. Move logging out of write lock section.



  was:
This jira acts as an umbrella jira to track all the improvements we've done 
recently to improve Namenode's performance, responsiveness, and hence 
scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. Upgradable lock to allow simultaneous read operation while reportDiff is in 
progress in processing block reports
3. More CPU efficient data structure for 
under-replicated/over-replicated/invalidate blocks (HDFS-2476)
4. Increase granularity of write operations in ReplicationMonitor thus reducing 
contention for write lock
5. Support variable block sizes
6. Release RPC handlers while waiting for the edit log to be synced to disk
7. Reduce network traffic pressure to the master rack where NN is located by 
lowering read priority of the replicas on the rack
8. A standalone KeepAlive heartbeat thread
9. Reduce multiple traversals of path directory to one for most namespace 
manipulations
10. Move logging out of write lock section.



> More Improvements on NameNode Scalability
> -
>
> Key: HDFS-2362
> URL: https://issues.apache.org/jira/browse/HDFS-2362
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Hairong Kuang
>
> This jira acts as an umbrella jira to track all the improvements we've done 
> recently to improve Namenode's performance, responsiveness, and hence 
> scalability. Those improvements include:
> 1. Incremental block reports (HDFS-395)
> 2. BlockManager.reportDiff optimization for processing block reports 
> (HDFS-2477)
> 3. Upgradable lock to allow simultaneous read operation while reportDiff is in 
> progress in processing block reports
> 4. More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks (HDFS-2476)
> 5. Increase granularity of write operations in ReplicationMonitor thus 
> reducing contention for write lock
> 6. Support variable block sizes
> 7. Release RPC handlers while waiting for the edit log to be synced to disk
> 8. Reduce network traffic pressure to the master rack where NN is located by 
> lowering read priority of the replicas on the rack
> 9. A standalone KeepAlive heartbeat thread
> 10. Reduce multiple traversals of path directory to one for most namespace 
> manipulations
> 11. Move logging out of write lock section.





[jira] [Updated] (HDFS-2477) Optimize computing the diff between a block report and the namenode state.

2011-10-19 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2477:


Attachment: reportDiff.patch

> Optimize computing the diff between a block report and the namenode state.
> --
>
> Key: HDFS-2477
> URL: https://issues.apache.org/jira/browse/HDFS-2477
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: reportDiff.patch
>
>





[jira] [Updated] (HDFS-2476) More CPU efficient data structure for under-replicated/over-replicated/invalidate blocks

2011-10-19 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2476:


Attachment: hashStructures.patch

> More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks
> 
>
> Key: HDFS-2476
> URL: https://issues.apache.org/jira/browse/HDFS-2476
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Reporter: Tomasz Nykiel
>Assignee: Tomasz Nykiel
> Attachments: hashStructures.patch
>
>





[jira] [Updated] (HDFS-2362) More Improvements on NameNode Scalability

2011-10-19 Thread Tomasz Nykiel (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Nykiel updated HDFS-2362:


Description: 
This jira acts as an umbrella jira to track all the improvements we've done 
recently to improve Namenode's performance, responsiveness, and hence 
scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. Upgradable lock to allow simultaneous read operation while reportDiff is in 
progress in processing block reports
3. More CPU efficient data structure for 
under-replicated/over-replicated/invalidate blocks (HDFS-2476)
4. Increase granularity of write operations in ReplicationMonitor thus reducing 
contention for write lock
5. Support variable block sizes
6. Release RPC handlers while waiting for the edit log to be synced to disk
7. Reduce network traffic pressure to the master rack where NN is located by 
lowering read priority of the replicas on the rack
8. A standalone KeepAlive heartbeat thread
9. Reduce multiple traversals of path directory to one for most namespace 
manipulations
10. Move logging out of write lock section.


  was:
This jira acts as an umbrella jira to track all the improvements we've done 
recently to improve Namenode's performance, responsiveness, and hence 
scalability. Those improvements include:
1. Incremental block reports (HDFS-395)
2. Upgradable lock to allow simultaneous read operation while reportDiff is in 
progress in processing block reports
3. More CPU efficient data structure for 
under-replicated/over-replicated/invalidate blocks
4. Increase granularity of write operations in ReplicationMonitor thus reducing 
contention for write lock
5. Support variable block sizes
6. Release RPC handlers while waiting for the edit log to be synced to disk
7. Reduce network traffic pressure to the master rack where NN is located by 
lowering read priority of the replicas on the rack
8. A standalone KeepAlive heartbeat thread
9. Reduce multiple traversals of path directory to one for most namespace 
manipulations
10. Move logging out of write lock section.



> More Improvements on NameNode Scalability
> -
>
> Key: HDFS-2362
> URL: https://issues.apache.org/jira/browse/HDFS-2362
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Reporter: Hairong Kuang
>
> This jira acts as an umbrella jira to track all the improvements we've done 
> recently to improve Namenode's performance, responsiveness, and hence 
> scalability. Those improvements include:
> 1. Incremental block reports (HDFS-395)
> 2. Upgradable lock to allow simultaneous read operation while reportDiff is in 
> progress in processing block reports
> 3. More CPU efficient data structure for 
> under-replicated/over-replicated/invalidate blocks (HDFS-2476)
> 4. Increase granularity of write operations in ReplicationMonitor thus 
> reducing contention for write lock
> 5. Support variable block sizes
> 6. Release RPC handlers while waiting for the edit log to be synced to disk
> 7. Reduce network traffic pressure to the master rack where NN is located by 
> lowering read priority of the replicas on the rack
> 8. A standalone KeepAlive heartbeat thread
> 9. Reduce multiple traversals of path directory to one for most namespace 
> manipulations
> 10. Move logging out of write lock section.
