[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490365#comment-13490365 ] Hudson commented on HBASE-7018: --- Integrated in HBase-0.94-security-on-Hadoop-23 #9 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/9/]) HBASE-7018 Fix and Improve TableDescriptor caching for bulk assignment (Revision 1401526) Result = FAILURE gchanan : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483172#comment-13483172 ] Hudson commented on HBASE-7018: --- Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #233 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/233/]) HBASE-7018 Fix and Improve TableDescriptor caching for bulk assignment (Revision 1401525) Result = FAILURE gchanan : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482919#comment-13482919 ] Hudson commented on HBASE-7018: --- Integrated in HBase-TRUNK #3478 (See [https://builds.apache.org/job/HBase-TRUNK/3478/]) HBASE-7018 Fix and Improve TableDescriptor caching for bulk assignment (Revision 1401525) Result = FAILURE gchanan : Files : * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java * /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482910#comment-13482910 ] Hudson commented on HBASE-7018: --- Integrated in HBase-0.94 #550 (See [https://builds.apache.org/job/HBase-0.94/550/]) HBASE-7018 Fix and Improve TableDescriptor caching for bulk assignment (Revision 1401526) Result = FAILURE gchanan : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482537#comment-13482537 ] Lars Hofhansl commented on HBASE-7018: -- +1 > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482516#comment-13482516 ] Gregory Chanan commented on HBASE-7018: --- no problem rajeshbabu. Going to commit to trunk and 94 later today if I hear no objections. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482167#comment-13482167 ] rajeshbabu commented on HBASE-7018: --- bq. we get the tableName from the map but write the regionName. Sorry for this. I have prepared patch internally and tested thoroughly but made mistake when it was rebased for community version. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482159#comment-13482159 ] Gregory Chanan commented on HBASE-7018: --- Latest version doesn't have refresh. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482151#comment-13482151 ] Lars Hofhansl commented on HBASE-7018: -- Should we bother with checking the mod time if refresh is false. We could just load the file again, since getting the mod stamp is a NN roundtrip anyway (just avoid the getting the data from a DN). > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482042#comment-13482042 ] Hadoop QA commented on HBASE-7018: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550386/HBASE-7018-94-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3119//console This message is automatically generated. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482003#comment-13482003 ] Hadoop QA commented on HBASE-7018: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550373/HBASE-7018-v4-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 82 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3116//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3116//console This message is automatically generated. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, > HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481944#comment-13481944 ] Ted Yu commented on HBASE-7018: --- Thanks for the quick turnaround. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch, HBASE-7018-v3-trunk.patch, > HBASE-7018-v4-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481921#comment-13481921 ] Ted Yu commented on HBASE-7018: --- Patch v3 looks good. {code} + // If refresh, check mod time has not changed (this is trip to NN). + if (getTableInfoModtime(this.fs, this.rootdir, tablename) <= cachedtdm.getModtime()) { {code} Was the above comment leftover from previous patch ? {code} +return tdmt == null? null: tdmt.getTableDescriptor(); {code} nit: insert space following the words 'null'. I ran TestFromClientSide3#testAdvancedConfigOverride, TestConstraint and TestMasterObserver They passed with patch v3. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch, HBASE-7018-v3-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481889#comment-13481889 ] Gregory Chanan commented on HBASE-7018: --- Test failures seem real. I think that previously, we would load the latest version the HTableDescriptor once on a call openRegion, but now, if the HTableDescriptor is already in the cache, we won't load the latest version. This seems like a real problem. We could just store which tables we've re-loaded, and use the current "refresh" code, but I think it's probably clearer to just blow that away and only keep the optimization about not making multiple roundtrips to the NN. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481035#comment-13481035 ] Lars Hofhansl commented on HBASE-7018: -- Patch looks good. I wonder whether we should even bother with checking the mod time if refresh is false. We could just load the file again, since getting the mod stamp is a NN roundtrip anyway. +1 otherwise > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481033#comment-13481033 ] Lars Hofhansl commented on HBASE-7018: -- @Ted: Good find. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481001#comment-13481001 ] Hadoop QA commented on HBASE-7018: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550189/7018-trunk.v2 against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 82 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.client.TestFromClientSide3 org.apache.hadoop.hbase.coprocessor.TestClassLoading org.apache.hadoop.hbase.client.TestAdmin org.apache.hadoop.hbase.coprocessor.TestMasterObserver org.apache.hadoop.hbase.constraint.TestConstraint org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3108//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3108//console This message is automatically generated. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: 7018-trunk.v2, HBASE-7018-94.patch, > HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administra
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480861#comment-13480861 ] stack commented on HBASE-7018: -- Nice numbers G. Minor, FileNotFoundException is an IOException so the FNFE redundant? +1 on patch (after figuring the test fail above). Good stuff. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch, > HBASE-7018-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480600#comment-13480600 ] Ted Yu commented on HBASE-7018: --- The test failure in TestHRegionInfo#testGetSetOfHTD is reproducible. It is on this line in testGetSetOfHTD(): {code} assertTrue(htd.equals(htd2)); {code} > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch, > HBASE-7018-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480587#comment-13480587 ] Hadoop QA commented on HBASE-7018: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550120/HBASE-7018-trunk.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 82 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestHRegionInfo Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3098//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3098//console This message is automatically generated. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch, > HBASE-7018-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480581#comment-13480581 ] Gregory Chanan commented on HBASE-7018: --- Benchmarks for trunk: First test: Bulk assign 500 empty regions in one table on my local box. Run each trial 3 times. Without patch: 1.299 s With patch:1.194 s Second test: Bulk assign 400 regions, all belonging to separate tables (so 400 tables). Same setup otherwise. This is really to test the RPC improvements. Without patch: 3.338 s With patch:2.648 s > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch, > HBASE-7018-trunk.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480552#comment-13480552 ] Gregory Chanan commented on HBASE-7018: --- Testing a patch for trunk now. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480528#comment-13480528 ] Ted Yu commented on HBASE-7018: --- v2 looks good to me. Is patch for trunk necessary ? > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480518#comment-13480518 ] Hadoop QA commented on HBASE-7018: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550107/HBASE-7018-94-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3094//console This message is automatically generated. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480495#comment-13480495 ] Ted Yu commented on HBASE-7018: --- {code} + * @throws TableExistsException {code} I don't see the above exception declared to be thrown by the new get() methods. Where is true passed for the refresh parameter in the new get() methods ? > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: HBASE-7018-94.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480474#comment-13480474 ] Hadoop QA commented on HBASE-7018: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12550099/HBASE-7018-94.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3093//console This message is automatically generated. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > Attachments: HBASE-7018-94.patch > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480432#comment-13480432 ] Ted Yu commented on HBASE-7018: --- Nice finding and nice improvement. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480418#comment-13480418 ] Gregory Chanan commented on HBASE-7018: --- Second test: Bulk assign 400 regions, all belonging to separate tables (so 400 tables). Same setup otherwise. This is really to test the RPC improvements. Results a) N/A (Didn't test) b) 3.725 seconds c) 2.848 seconds d) 2.860 seconds Talking off almost an entire second here isn't too shabby. > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment
[ https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480414#comment-13480414 ] Gregory Chanan commented on HBASE-7018: --- First test: Bulk assign 500 empty regions in one table on my local box. Run each trial 3 times. Versions: a) 0.94 today (broken cache) b) 0.94 w/ fixed "extra" cache (the cache introduced in HBASE-6214) c) 0.94 w/ fixed "extra" cache + faster existing cache (don't go to NN each time) + reduce RPC overhead d) Same as c) but remove "extra cache" Results: a) 14 seconds (only ran test once) b) 1.458 seconds c) 1.379 seconds d) 1.227 seconds > Fix and Improve TableDescriptor caching for bulk assignment > --- > > Key: HBASE-7018 > URL: https://issues.apache.org/jira/browse/HBASE-7018 > Project: HBase > Issue Type: Bug > Components: regionserver >Reporter: Gregory Chanan >Assignee: Gregory Chanan > Fix For: 0.94.3, 0.96.0 > > > HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization > by using a temporary cache for table descriptors when receiving an open > regions request), but it's buggy on 0.94 (0.96 appears correct): > {code} > HTableDescriptor htd = null; > if (htds == null) { > htd = this.tableDescriptors.get(region.getTableName()); > } else { > htd = htds.get(region.getTableNameAsString()); > if (htd == null) { > htd = this.tableDescriptors.get(region.getTableName()); > htds.put(region.getRegionNameAsString(), htd); > } > } > {code} > i.e. we get the tableName from the map but write the regionName. > Even fixing this, it looks like there are areas for improvement: > 1) FSTableDescriptors already has a cache (though it goes to the NameNode > each time through to check we have the latest copy. May as well combine > these two caches, might be a performance win as well since we don't need to > write to multiple caches. > 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new > table. So the total number of RPCs necessary for a bulk assign (without > caching is): > #regions + #tables > (with caching): > min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables > We can make this only one RPC, yielding: > #tables > Probably not a big deal for most users, but in a multi-tenant situation where > the number of regions being bulk assigned approaches the number of tables > being bulk assigned, this could be a nice performance win. > Benchmarks coming. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira