subject:"\[jira\] \[Commented\] \(HBASE\-7018\) Fix and Improve TableDescriptor caching for bulk assignment"

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-11-04 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490365#comment-13490365
 ] 

Hudson commented on HBASE-7018:
---

Integrated in HBase-0.94-security-on-Hadoop-23 #9 (See 
[https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/9/])
HBASE-7018 Fix and Improve TableDescriptor caching for bulk assignment 
(Revision 1401526)

 Result = FAILURE
gchanan : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java


> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-24 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483172#comment-13483172
 ] 

Hudson commented on HBASE-7018:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #233 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/233/])
HBASE-7018 Fix and Improve TableDescriptor caching for bulk assignment 
(Revision 1401525)

 Result = FAILURE
gchanan : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java


> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482919#comment-13482919
 ] 

Hudson commented on HBASE-7018:
---

Integrated in HBase-TRUNK #3478 (See 
[https://builds.apache.org/job/HBase-TRUNK/3478/])
HBASE-7018 Fix and Improve TableDescriptor caching for bulk assignment 
(Revision 1401525)

 Result = FAILURE
gchanan : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java


> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482910#comment-13482910
 ] 

Hudson commented on HBASE-7018:
---

Integrated in HBase-0.94 #550 (See 
[https://builds.apache.org/job/HBase-0.94/550/])
HBASE-7018 Fix and Improve TableDescriptor caching for bulk assignment 
(Revision 1401526)

 Result = FAILURE
gchanan : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/TableDescriptors.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/FSTableDescriptors.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java


> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-23 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482537#comment-13482537
 ] 

Lars Hofhansl commented on HBASE-7018:
--

+1

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-23 Thread Gregory Chanan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482516#comment-13482516
 ] 

Gregory Chanan commented on HBASE-7018:
---

no problem rajeshbabu.

Going to commit to trunk and 94 later today if I hear no objections.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-23 Thread rajeshbabu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482167#comment-13482167
 ] 

rajeshbabu commented on HBASE-7018:
---

bq. we get the tableName from the map but write the regionName.
Sorry for this. I have prepared patch internally and tested thoroughly but made 
mistake when it was rebased for community version.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-22 Thread Gregory Chanan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482159#comment-13482159
 ] 

Gregory Chanan commented on HBASE-7018:
---

Latest version doesn't have refresh.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-22 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482151#comment-13482151
 ] 

Lars Hofhansl commented on HBASE-7018:
--

Should we bother with checking the mod time if refresh is false. We could just 
load the file again, since getting the mod stamp is a NN roundtrip anyway (just 
avoid the getting the data from a DN).

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482042#comment-13482042
 ] 

Hadoop QA commented on HBASE-7018:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12550386/HBASE-7018-94-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3119//console

This message is automatically generated.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-22 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482003#comment-13482003
 ] 

Hadoop QA commented on HBASE-7018:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12550373/HBASE-7018-v4-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
82 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3116//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3116//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3116//console

This message is automatically generated.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-94-v3.patch, HBASE-7018-trunk.patch, 
> HBASE-7018-v3-trunk.patch, HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-22 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481944#comment-13481944
 ] 

Ted Yu commented on HBASE-7018:
---

Thanks for the quick turnaround.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch, HBASE-7018-v3-trunk.patch, 
> HBASE-7018-v4-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-22 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481921#comment-13481921
 ] 

Ted Yu commented on HBASE-7018:
---

Patch v3 looks good.
{code}
+  // If refresh, check mod time has not changed (this is trip to NN).
+  if (getTableInfoModtime(this.fs, this.rootdir, tablename) <= 
cachedtdm.getModtime()) {
{code}
Was the above comment leftover from previous patch ?
{code}
+return tdmt == null? null: tdmt.getTableDescriptor();
{code}
nit: insert space following the words 'null'.

I ran TestFromClientSide3#testAdvancedConfigOverride, TestConstraint and 
TestMasterObserver
They passed with patch v3.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch, HBASE-7018-v3-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-22 Thread Gregory Chanan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481889#comment-13481889
 ] 

Gregory Chanan commented on HBASE-7018:
---

Test failures seem real.  I think that previously, we would load the latest 
version the HTableDescriptor once on a call openRegion, but now, if the 
HTableDescriptor is already in the cache, we won't load the latest version.  
This seems like a real problem.

We could just store which tables we've re-loaded, and use the current "refresh" 
code, but I think it's probably clearer to just blow that away and only keep 
the optimization about not making multiple roundtrips to the NN.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-21 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481035#comment-13481035
 ] 

Lars Hofhansl commented on HBASE-7018:
--

Patch looks good. I wonder whether we should even bother with checking the mod 
time if refresh is false. We could just load the file again, since getting the 
mod stamp is a NN roundtrip anyway.

+1 otherwise


> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-21 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481033#comment-13481033
 ] 

Lars Hofhansl commented on HBASE-7018:
--

@Ted: Good find.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-21 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481001#comment-13481001
 ] 

Hadoop QA commented on HBASE-7018:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12550189/7018-trunk.v2
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
82 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFromClientSide3
  org.apache.hadoop.hbase.coprocessor.TestClassLoading
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.coprocessor.TestMasterObserver
  org.apache.hadoop.hbase.constraint.TestConstraint
  
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
  org.apache.hadoop.hbase.master.TestZKBasedOpenCloseRegion

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3108//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3108//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3108//console

This message is automatically generated.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 7018-trunk.v2, HBASE-7018-94.patch, 
> HBASE-7018-94-v2.patch, HBASE-7018-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administra

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-20 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480861#comment-13480861
 ] 

stack commented on HBASE-7018:
--

Nice numbers G.

Minor, FileNotFoundException is an IOException so the FNFE redundant?

+1 on patch (after figuring the test fail above).  Good stuff.



> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch, 
> HBASE-7018-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480600#comment-13480600
 ] 

Ted Yu commented on HBASE-7018:
---

The test failure in TestHRegionInfo#testGetSetOfHTD is reproducible.
It is on this line in testGetSetOfHTD():
{code}
assertTrue(htd.equals(htd2));
{code}

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch, 
> HBASE-7018-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480587#comment-13480587
 ] 

Hadoop QA commented on HBASE-7018:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12550120/HBASE-7018-trunk.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
82 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 4 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestHRegionInfo

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3098//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3098//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3098//console

This message is automatically generated.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch, 
> HBASE-7018-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Gregory Chanan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480581#comment-13480581
 ] 

Gregory Chanan commented on HBASE-7018:
---

Benchmarks for trunk:

First test: Bulk assign 500 empty regions in one table on my local box. Run 
each trial 3 times.
Without patch: 1.299 s
With patch:1.194 s

Second test: Bulk assign 400 regions, all belonging to separate tables (so 400 
tables). Same setup otherwise. This is really to test the RPC improvements.
Without patch: 3.338 s
With patch:2.648 s

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch, 
> HBASE-7018-trunk.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Gregory Chanan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480552#comment-13480552
 ] 

Gregory Chanan commented on HBASE-7018:
---

Testing a patch for trunk now.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480528#comment-13480528
 ] 

Ted Yu commented on HBASE-7018:
---

v2 looks good to me.
Is patch for trunk necessary ?

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480518#comment-13480518
 ] 

Hadoop QA commented on HBASE-7018:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12550107/HBASE-7018-94-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3094//console

This message is automatically generated.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-7018-94.patch, HBASE-7018-94-v2.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480495#comment-13480495
 ] 

Ted Yu commented on HBASE-7018:
---

{code}
+   * @throws TableExistsException
{code}
I don't see the above exception declared to be thrown by the new get() methods.

Where is true passed for the refresh parameter in the new get() methods ?

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-7018-94.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480474#comment-13480474
 ] 

Hadoop QA commented on HBASE-7018:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12550099/HBASE-7018-94.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3093//console

This message is automatically generated.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
> Attachments: HBASE-7018-94.patch
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480432#comment-13480432
 ] 

Ted Yu commented on HBASE-7018:
---

Nice finding and nice improvement.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Gregory Chanan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480418#comment-13480418
 ] 

Gregory Chanan commented on HBASE-7018:
---

Second test: Bulk assign 400 regions, all belonging to separate tables (so 400 
tables).  Same setup otherwise.  This is really to test the RPC improvements.

Results
a) N/A (Didn't test)
b) 3.725 seconds
c) 2.848 seconds
d) 2.860 seconds

Talking off almost an entire second here isn't too shabby.

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

2012-10-19 Thread Gregory Chanan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480414#comment-13480414
 ] 

Gregory Chanan commented on HBASE-7018:
---

First test: Bulk assign 500 empty regions in one table on my local box.  Run 
each trial 3 times.

Versions: a) 0.94 today (broken cache)
  b) 0.94 w/ fixed "extra" cache (the cache introduced in HBASE-6214)
  c) 0.94 w/ fixed "extra" cache + faster existing cache (don't go to 
NN each time) + reduce RPC overhead
  d) Same as c) but remove "extra cache"

Results:
a) 14 seconds (only ran test once)
b) 1.458 seconds
c) 1.379 seconds
d) 1.227 seconds

> Fix and Improve TableDescriptor caching for bulk assignment
> ---
>
> Key: HBASE-7018
> URL: https://issues.apache.org/jira/browse/HBASE-7018
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Fix For: 0.94.3, 0.96.0
>
>
> HBASE-6214 backported HBASE-5998 (Bulk assignment: regionserver optimization 
> by using a temporary cache for table descriptors when receiving an open 
> regions request), but it's buggy on 0.94 (0.96 appears correct):
> {code}
> HTableDescriptor htd = null;
> if (htds == null) {
>   htd = this.tableDescriptors.get(region.getTableName());
> } else {
>   htd = htds.get(region.getTableNameAsString());
>   if (htd == null) {
> htd = this.tableDescriptors.get(region.getTableName());
> htds.put(region.getRegionNameAsString(), htd);
>   }
> }
> {code}
> i.e. we get the tableName from the map but write the regionName.
> Even fixing this, it looks like there are areas for improvement:
> 1) FSTableDescriptors already has a cache (though it goes to the NameNode 
> each time through to check we have the latest copy.  May as well combine 
> these two caches, might be a performance win as well since we don't need to 
> write to multiple caches.
> 2) FSTableDescriptors makes two RPCs to the NameNode when it encounters a new 
> table.  So the total number of RPCs necessary for a bulk assign (without 
> caching is):
> #regions + #tables
> (with caching):
> min(#regions,#tables) + #tables = #tables + #tables = 2 * #tables
> We can make this only one RPC, yielding:
> #tables
> Probably not a big deal for most users, but in a multi-tenant situation where 
> the number of regions being bulk assigned approaches the number of tables 
> being bulk assigned, this could be a nice performance win.
> Benchmarks coming.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

[jira] [Commented] (HBASE-7018) Fix and Improve TableDescriptor caching for bulk assignment

29 matches

Site Navigation

Mail list logo

Footer information