[ https://issues.apache.org/jira/browse/HBASE-16393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15416309#comment-15416309 ]
Ted Yu commented on HBASE-16393:
--------------------------------

Do you have a performance comparison with and without the patch?
Thanks

> Improve computeHDFSBlocksDistribution
> -------------------------------------
>
>                 Key: HBASE-16393
>                 URL: https://issues.apache.org/jira/browse/HBASE-16393
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: binlijin
>         Attachments: HBASE-16393.patch
>
>
> When our cluster is big, I can see that the balancer is slow from time to time.
> The balancer is also called on master startup, so startup is slow as well.
> The first idea is to compute the HDFSBlocksDistribution of different regions
> in parallel.
> The second idea is to speed up computing a single region's
> HDFSBlocksDistribution.
> To compute a storefile's HDFSBlocksDistribution, we first call
> FileSystem#getFileStatus(path) and then
> FileSystem#getFileBlockLocations(status, start, length), so there are two
> namenode RPC calls for every storefile. Instead, we can use
> FileSystem#listLocatedStatus to get a LocatedFileStatus with the information
> we need, which reduces the namenode RPC calls to one per storefile. This
> speeds up computeHDFSBlocksDistribution and also sends fewer RPC calls to the
> namenode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
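To make the RPC-count argument in the issue description concrete, here is a minimal toy model. It is not Hadoop or HBase code and not the attached patch: `FakeNameNode`, `Block`, and every method name below are hypothetical stand-ins, and the comments only note which real `FileSystem` call each one is meant to imitate. The per-host weight is assumed to be the sum of block lengths stored on that host.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Hadoop or HBase classes. */
final class BlockDistributionSketch {

    /** One block: its length in bytes and the hosts holding a replica. */
    static final class Block {
        final long length;
        final List<String> hosts;

        Block(long length, String... hosts) {
            this.length = length;
            this.hosts = Arrays.asList(hosts);
        }
    }

    /** Hypothetical namenode stand-in that counts round trips. */
    static final class FakeNameNode {
        private final Map<String, List<Block>> files = new HashMap<>();
        int rpcCount = 0;

        void addFile(String path, Block... blocks) {
            files.put(path, Arrays.asList(blocks));
        }

        // Imitates FileSystem#getFileStatus: one RPC, returns only the length.
        long getFileLength(String path) {
            rpcCount++;
            long total = 0;
            for (Block b : files.get(path)) {
                total += b.length;
            }
            return total;
        }

        // Imitates FileSystem#getFileBlockLocations(status, start, length).
        // The byte range is ignored here because callers ask for the whole file.
        List<Block> getBlockLocations(String path, long start, long length) {
            rpcCount++;
            return files.get(path);
        }

        // Imitates FileSystem#listLocatedStatus: status and locations in one RPC.
        List<Block> listLocated(String path) {
            rpcCount++;
            return files.get(path);
        }
    }

    /** Adds each block's length to the weight of every host that stores it. */
    static void addWeights(Map<String, Long> dist, List<Block> blocks) {
        for (Block b : blocks) {
            for (String host : b.hosts) {
                dist.merge(host, b.length, Long::sum);
            }
        }
    }

    /** Current approach from the description: two RPCs per storefile. */
    static Map<String, Long> computeWithTwoCalls(FakeNameNode nn, List<String> storeFiles) {
        Map<String, Long> dist = new HashMap<>();
        for (String path : storeFiles) {
            long len = nn.getFileLength(path);
            addWeights(dist, nn.getBlockLocations(path, 0, len));
        }
        return dist;
    }

    /** Proposed approach from the description: one RPC per storefile. */
    static Map<String, Long> computeWithOneCall(FakeNameNode nn, List<String> storeFiles) {
        Map<String, Long> dist = new HashMap<>();
        for (String path : storeFiles) {
            addWeights(dist, nn.listLocated(path));
        }
        return dist;
    }

    public static void main(String[] args) {
        FakeNameNode nn = new FakeNameNode();
        nn.addFile("/f1", new Block(128, "hostA", "hostB"), new Block(64, "hostB", "hostC"));
        nn.addFile("/f2", new Block(128, "hostA", "hostC"));
        List<String> files = Arrays.asList("/f1", "/f2");

        Map<String, Long> twoCall = computeWithTwoCalls(nn, files);
        int twoCallRpcs = nn.rpcCount;
        nn.rpcCount = 0;
        Map<String, Long> oneCall = computeWithOneCall(nn, files);

        System.out.println("RPCs with two calls per file: " + twoCallRpcs);
        System.out.println("RPCs with one call per file:  " + nn.rpcCount);
        System.out.println("Same distribution: " + twoCall.equals(oneCall));
    }
}
```

In this model both approaches produce the same per-host weights, and the namenode round trips drop from two per storefile to one. It says nothing about wall-clock gains on a real cluster, which is what the comment above asks the reporter to measure.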