[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482516#comment-13482516 ]
Gregory Chanan commented on HBASE-7018:
---------------------------------------

No problem, rajeshbabu. Going to commit to trunk and 0.94 later today if I hear no objections.

> Fix and Improve TableDescriptor caching for bulk assignment
> -----------------------------------------------------------
>
>                 Key: HBASE-7018
>                 URL: https://issues.apache.org/jira/browse/HBASE-7018
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Gregory Chanan
>            Assignee: Gregory Chanan
>             Fix For: 0.94.3, 0.96.0
>
>         Attachments: 7018-trunk.v2, HBASE-7018-94.patch,
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch,
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization
> by using a temporary cache for table descriptors when receiving an open
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
>     htd = this.tableDescriptors.get(region.getTableName());
>     htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we read from the map by tableName but write by regionName, so the
> temporary cache never gets a hit.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode
> each time through to check we have the latest copy). May as well combine
> these two caches; this might be a performance win as well, since we don't
> need to write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new
> table.
> So the total number of RPCs necessary for a bulk assign is:
> without caching:
> #regions + #tables
> with caching (every table has at least one region, so #tables <= #regions):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC per new table, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where
> the number of regions being bulk assigned approaches the number of tables
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.
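
To illustrate the key mismatch described above, here is a minimal, self-contained sketch. It is not the HBase code and not the committed patch: `DescriptorCacheSketch`, `Region`, and `countStoreLookups` are hypothetical names, descriptors are plain strings, and each cache miss is counted as a single lookup. It does not model the two NameNode RPCs that FSTableDescriptors makes for a new table. It only shows how many times a per-request cache falls through to the backing store when entries are written under the region name versus the table name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DescriptorCacheSketch {

    /** Hypothetical stand-in for a region: only the two names the cache logic needs. */
    public static final class Region {
        final String regionName;
        final String tableName;

        public Region(String regionName, String tableName) {
            this.regionName = regionName;
            this.tableName = tableName;
        }
    }

    /**
     * Simulates one open-regions request and returns how many times the
     * backing descriptor store was consulted (one per cache miss).
     */
    public static int countStoreLookups(List<Region> regions, boolean keyByTableName) {
        Map<String, String> htds = new HashMap<>(); // temporary per-request cache
        int lookups = 0;
        for (Region region : regions) {
            // The read side always uses the table name.
            String htd = htds.get(region.tableName);
            if (htd == null) {
                lookups++; // stands in for this.tableDescriptors.get(...)
                htd = "descriptor-of-" + region.tableName;
                // Writing under the region name (the reported bug) means the
                // read above can never find the entry again.
                String key = keyByTableName ? region.tableName : region.regionName;
                htds.put(key, htd);
            }
        }
        return lookups;
    }

    public static void main(String[] args) {
        // Two tables, five regions in total (names are made up).
        List<Region> regions = new ArrayList<>();
        regions.add(new Region("t1,a", "t1"));
        regions.add(new Region("t1,b", "t1"));
        regions.add(new Region("t1,c", "t1"));
        regions.add(new Region("t2,a", "t2"));
        regions.add(new Region("t2,b", "t2"));

        System.out.println("keyed by region name: " + countStoreLookups(regions, false));
        System.out.println("keyed by table name:  " + countStoreLookups(regions, true));
    }
}
```

With the sample data, keying by region name falls through to the store once per region (5 lookups), while keying by table name falls through once per table (2 lookups). That is the #regions versus #tables difference the description refers to.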