[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112374#comment-13112374
 ] 

jirapos...@reviews.apache.org commented on HBASE-4455:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2007/#review2020
---



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


If the location in ZK is the most recent data, do we still need to check the 
AM address? If the ZK data is null, we could then check the AM address. Just a 
thought.


- ramkrishna


On 2011-09-22 00:38:16, Ming Ma wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2007/
bq.  ---
bq.  
bq.  (Updated 2011-09-22 00:38:16)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  1. Add more logging.
bq.  2. Clean up CatalogTracker. waitForMeta waits for the "timeout" value. When 
waitForMetaServerConnectionDefault is called by MetaNodeTracker, the timeout 
value is large, so it doesn't retry in case -ROOT- is updated. Add the proper 
implementation for CatalogTracker.verifyMetaRegionLocation.
bq.  4. Check for the latest -ROOT- and .META. region location during the 
handling of server shutdown.
bq.  5. Right after assigning the -ROOT- or .META. in ServerShutdownHandler, 
don't block and wait for .META. availability. Resubmit another 
ServerShutdownHandler for regular regions.
bq.  
bq.  
bq.  This addresses bug HBASE-4455.
bq.  https://issues.apache.org/jira/browse/HBASE-4455
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1172205
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java 1172205
bq.  
bq.  Diff: https://reviews.apache.org/r/2007/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Keep Master up all the time, do rolling restart of RSs like this - stop 
RS1, wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, 
start RS2, wait for 2 seconds, etc. The program can run for a couple of hours 
until it stops. -ROOT- and .META. are available during that time.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ming
bq.  
bq.



> Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
> AssignmentManager
> --
>
> Key: HBASE-4455
> URL: https://issues.apache.org/jira/browse/HBASE-4455
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
>
> Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
> wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
> RS2, wait for 2 seconds, etc. After a while, you will find that the -ROOT- and 
> .META. regions aren't in "regions in transition" from the AssignmentManager's 
> point of view, but they aren't assigned to any region server. Here are the issues.
> 1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is 
> invoked to check if it c

[jira] [Updated] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-22 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-4131:


Attachment: replicationInterface1.txt

This patch creates an interface called ReplicationInterface. This is the 
interface that should be implemented by any Replication Module.

The class name should be specified via the config variable 
hbase.replication.service. If this is not set, then the default name of the 
Replication class is 
org.apache.hadoop.hbase.replication.regionserver.Replication to maintain 
backward compatibility.
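
A minimal sketch of the config-driven lookup described above, not the code from the attached patch. It assumes a plain Map in place of the HBase Configuration, and the interface and both implementations (ReplicationService, DefaultReplication, CustomReplication) are hypothetical stand-ins for ReplicationInterface and the built-in Replication class. Only the key name hbase.replication.service comes from the comment.

```java
import java.util.HashMap;
import java.util.Map;

/** Loads a replication implementation by class name, falling back to a default. */
final class ReplicationLoader {
  static final String CONF_KEY = "hbase.replication.service";
  // Hypothetical default; the real default named in the comment is
  // org.apache.hadoop.hbase.replication.regionserver.Replication.
  static final String DEFAULT_CLASS = DefaultReplication.class.getName();

  static ReplicationService load(Map<String, String> conf) {
    String className = conf.getOrDefault(CONF_KEY, DEFAULT_CLASS);
    try {
      // asSubclass rejects classes that do not implement the interface.
      Class<? extends ReplicationService> clazz =
          Class.forName(className).asSubclass(ReplicationService.class);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalStateException("Cannot load replication class " + className, e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(load(conf).name());           // key unset: default is used
    conf.put(CONF_KEY, CustomReplication.class.getName());
    System.out.println(load(conf).name());           // key set: plugged-in class is used
  }
}

/** Hypothetical stand-in for the ReplicationInterface described in the comment. */
interface ReplicationService {
  String name();
}

/** Hypothetical default implementation. */
class DefaultReplication implements ReplicationService {
  public String name() { return "default"; }
}

/** Hypothetical alternative that a deployment might plug in. */
class CustomReplication implements ReplicationService {
  public String name() { return "custom"; }
}
```

Falling back to the default when the key is unset is what keeps existing deployments working without a config change.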

Stack: does this answer the question you asked earlier?

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: replicationInterface1.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-4458) HBase should give actionable information when a region is compressed with a codec that is not available.

2011-09-22 Thread Jonathan Hsieh (JIRA)
HBase should give actionable information when a region is compressed with a 
codec that is not available.


 Key: HBASE-4458
 URL: https://issues.apache.org/jira/browse/HBASE-4458
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.3
Reporter: Jonathan Hsieh


A cluster that previously used LZO codec was upgraded with the intent of moving 
away from the codec to another.  Several regions failed to deploy because the 
LZO codec was no longer present.  However, there was little indication that 
this was the problem.

Ideally, the master web UI or hbck would detect these problems, explain why 
the region fails to deploy, and provide an actionable error message.
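
One possible shape for such a check, as a sketch only: it tests whether a codec class can be loaded and, if not, builds a message naming the region, the missing class, and a next step. The class CodecCheck, the method describeProblem, the message wording, and the class name com.example.MissingCodec are all hypothetical; this is not how the master UI or hbck actually works.

```java
/** Hypothetical helper that turns a missing codec into an actionable message. */
final class CodecCheck {

  /**
   * Returns null when the codec class can be loaded, otherwise a message that
   * names the region, the missing class, and what the operator can do next.
   */
  static String describeProblem(String regionName, String codecClassName) {
    try {
      Class.forName(codecClassName);
      return null;
    } catch (ClassNotFoundException | LinkageError e) {
      // LinkageError also covers a class that is present but whose native
      // library or static initializer fails to load.
      return "Region " + regionName + " cannot be deployed: compression codec class "
          + codecClassName + " is not available on this server (" + e + "). "
          + "Install the codec on every region server, or move the data to a codec that is available.";
    }
  }

  public static void main(String[] args) {
    // com.example.MissingCodec is a made-up name used to trigger the failure path.
    System.out.println(describeProblem("t1,,123.abc", "com.example.MissingCodec"));
  }
}
```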







[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112454#comment-13112454
 ] 

ramkrishna.s.vasudevan commented on HBASE-4153:
---

I have fixed the test case and will submit the addendum.  Ted, thanks for 
tracking this. 

> Handle RegionAlreadyInTransitionException in AssignmentManager
> --
>
> Key: HBASE-4153
> URL: https://issues.apache.org/jira/browse/HBASE-4153
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, 
> HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, HBASE-4153_6.patch
>
>
> Comment from Stack over in HBASE-3741:
> {quote}
> Question: Looking at this patch again, if we throw a 
> RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
> though RegionAlreadyInTransitionException in at least one case here is saying 
> that the region is already open on this regionserver?
> {quote}
> Indeed looking at the code it's going to be handled the same way other 
> exceptions are. Need to add special cases for assign and unassign.





[jira] [Updated] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4452:
--

Attachment: HBASE-4452.patch

Ted, the test cases are running fine with this patch.
I hope it does not cause any problems like HBASE-4153 did :)

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: HBASE-4452.patch
>
>
> Consider the following code
> {code}
> long period = Math.max(1, assignmentTimeout / 3);
> long lastUpdate = now;
> while (!signaller.get() && t.isAlive() && !this.server.isStopped() &&
>     !this.rsServices.isStopping() && (endTime > now)) {
>   long elapsed = now - lastUpdate;
>   if (elapsed > period) {
>     // Only tickle OPENING if postOpenDeployTasks is taking some time.
>     lastUpdate = now;
>     tickleOpening("post_open_deploy");
>   }
> {code}
> Whenever the postOpenDeployTasks take considerable time, we call 
> tickleOpening so that no timeout is detected.  But if the TimeoutMonitor 
> assigns the node to another RS before that happens, the other RS will move 
> the node from OFFLINE to OPENING.  Hence when the first RS tries to do 
> tickleOpening, the operation will fail. Now here lies the problem:
> {code}
> String encodedName = this.regionInfo.getEncodedName();
> try {
>   this.version =
>     ZKAssign.retransitionNodeOpening(server.getZooKeeper(),
>       this.regionInfo, this.server.getServerName(), this.version);
> } catch (KeeperException e) {
> {code}
> Now this.version becomes -1 because the operation failed.
> Because the first code snippet does not capture the return value of 
> tickleOpening(), we go on to move the node to OPENED even when it fails.  
> Here again there is no check for this condition, as the version has already 
> been changed to -1.  Hence the OPENING to OPENED transition succeeds, and 
> there is a chance of double assignment.
> {noformat}
> 2011-09-22 00:57:29,930 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempt to transition the unassigned 
> node for 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENING failed, the node existed but was version 5 not the 
> expected version 2
> 2011-09-22 00:57:33,494 WARN 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
> refreshing OPENING; region=69797d064f773d1aa9adba56e7ff90a3, 
> context=post_open_deploy
> 2011-09-22 00:58:02,356 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempting to transition node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:11,853 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Successfully transitioned node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:13,956 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
> t9,,1316633193606.69797d064f773d1aa9adba56e7ff90a3.
> {noformat}
> Correct me if this analysis is wrong.
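
The race described above can be reproduced with a small in-memory model. This is a sketch under stated assumptions, not HBase code: VersionedNode and OpenSketch are hypothetical names, and the model assumes, as the description implies, that an expected version of -1 skips the version check (ZooKeeper's own setData treats -1 as "match any version").

```java
/** Two open flows against a versioned node: one ignores a failed tickle, one checks it. */
final class OpenSketch {

  /** Buggy flow: the failed tickle leaves version == -1, which then matches anything. */
  static boolean openIgnoringTickle(VersionedNode node, int knownVersion) {
    int version = node.transition("OPENING", "OPENING", knownVersion); // tickle
    // On a version mismatch, version is now -1, but nothing checks it.
    return node.transition("OPENING", "OPENED", version) != -1;
  }

  /** Checked flow: give up as soon as the tickle reports a mismatch. */
  static boolean openCheckingTickle(VersionedNode node, int knownVersion) {
    int version = node.transition("OPENING", "OPENING", knownVersion); // tickle
    if (version == -1) {
      return false; // another server owns the node now; do not move it to OPENED
    }
    return node.transition("OPENING", "OPENED", version) != -1;
  }

  public static void main(String[] args) {
    // The node is at version 5 (another RS re-opened it); this RS still expects 2.
    System.out.println(openIgnoringTickle(new VersionedNode("OPENING", 5), 2));
    System.out.println(openCheckingTickle(new VersionedNode("OPENING", 5), 2));
  }
}

/** Hypothetical in-memory stand-in for an unassigned znode with a data version. */
final class VersionedNode {
  private String state;
  private int version;

  VersionedNode(String state, int version) {
    this.state = state;
    this.version = version;
  }

  /** Compare-and-set: returns the new version, or -1 if state or version does not match. */
  synchronized int transition(String from, String to, int expectedVersion) {
    if (!state.equals(from)) {
      return -1;
    }
    if (expectedVersion != -1 && expectedVersion != version) {
      return -1;
    }
    state = to;
    return ++version;
  }

  synchronized String state() {
    return state;
  }
}
```

With the stale version 2 against a node at version 5, the first flow still ends in OPENED, while the second leaves the node in OPENING for its new owner.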





[jira] [Updated] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4452:
--

Status: Patch Available  (was: Open)

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: HBASE-4452.patch
>
>





[jira] [Updated] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-22 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4153:
--

Attachment: HBASE-4153_addendum.patch

Fix for testcase and also interface change in RegionServerServices. 

> Handle RegionAlreadyInTransitionException in AssignmentManager
> --
>
> Key: HBASE-4153
> URL: https://issues.apache.org/jira/browse/HBASE-4153
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, 
> HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, 
> HBASE-4153_6.patch, HBASE-4153_addendum.patch
>
>
> Comment from Stack over in HBASE-3741:
> {quote}
> Question: Looking at this patch again, if we throw a 
> RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
> though RegionAlreadyInTransitionException in at least one case here is saying 
> that the region is already open on this regionserver?
> {quote}
> Indeed looking at the code it's going to be handled the same way other 
> exceptions are. Need to add special cases for assign and unassign.





[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112530#comment-13112530
 ] 

Ted Yu commented on HBASE-4153:
---

I ran Ramkrishna's addendum locally and 
TestZKBasedOpenCloseRegion#testRSAlreadyProcessingRegion passed.

Integrated the addendum to 0.92 and TRUNK.

Thanks for the quick response, Ramkrishna.

> Handle RegionAlreadyInTransitionException in AssignmentManager
> --
>
> Key: HBASE-4153
> URL: https://issues.apache.org/jira/browse/HBASE-4153
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, 
> HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, 
> HBASE-4153_6.patch, HBASE-4153_addendum.patch
>
>
> Comment from Stack over in HBASE-3741:
> {quote}
> Question: Looking at this patch again, if we throw a 
> RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
> though RegionAlreadyInTransitionException in at least one case here is saying 
> that the region is already open on this regionserver?
> {quote}
> Indeed looking at the code it's going to be handled the same way other 
> exceptions are. Need to add special cases for assign and unassign.





[jira] [Commented] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112535#comment-13112535
 ] 

Ted Yu commented on HBASE-4452:
---

Minor comment:
{code}
+// InterruptedException too.  If so, we failed.  Even if tickle opening fails
+// then it is a failure.
{code}
I think we don't need 'Even' above.

Also, I would initialize the new boolean with false.

Running test suite.

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: HBASE-4452.patch
>
>





[jira] [Commented] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112539#comment-13112539
 ] 

ramkrishna.s.vasudevan commented on HBASE-4452:
---

@Ted, the reason for initializing it to true is that tickleOpening may not be 
invoked at all unless the postOpenDeployTasks take a long time.  That is why I 
initialized it to true.

Do you want me to change it?
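
The point about the initial value can be shown with a small sketch. All names here (TickleFlagSketch, waitAndTickle, tickleEvery) are hypothetical and the loop is a simplification, not the patched OpenRegionHandler: the flag starts as true so that a wait which never needs to tickle is not reported as a failure.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical model of a wait loop that only tickles on some rounds. */
final class TickleFlagSketch {

  /**
   * Runs the given number of polling rounds and tickles every tickleEvery-th
   * round (never, if tickleEvery is 0). Returns false only if a tickle failed.
   */
  static boolean waitAndTickle(int rounds, int tickleEvery, BooleanSupplier tickle) {
    boolean tickleOk = true; // stays true when no tickle was ever needed
    for (int i = 1; i <= rounds; i++) {
      if (tickleEvery > 0 && i % tickleEvery == 0) {
        tickleOk = tickle.getAsBoolean();
        if (!tickleOk) {
          break; // a failed tickle means the open must be treated as failed
        }
      }
    }
    return tickleOk;
  }

  public static void main(String[] args) {
    // Fast deploy: the tickle never runs, so the result must not be a failure.
    System.out.println(waitAndTickle(2, 0, () -> false));
    // Slow deploy with a failing tickle: the failure is reported.
    System.out.println(waitAndTickle(5, 2, () -> false));
  }
}
```

Had the flag started as false, the fast path in which the deploy tasks finish before the first tickle would look like a failure.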

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: HBASE-4452.patch
>
>





[jira] [Updated] (HBASE-4457) Starting region server on non-default info port is resulting in broken URL's in master UI

2011-09-22 Thread Praveen Patibandla (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Praveen Patibandla updated HBASE-4457:
--

Attachment: 4457-V1.patch

> Starting region server on non-default info port is resulting in broken URL's 
> in master UI
> -
>
> Key: HBASE-4457
> URL: https://issues.apache.org/jira/browse/HBASE-4457
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: Praveen Patibandla
>Priority: Minor
>  Labels: newbie
> Fix For: 0.92.0
>
> Attachments: 4457-V1.patch, 4457.patch
>
>
> When "hbase.regionserver.info.port" is set to a non-default port, the Master UI 
> has broken URLs in the region server table because the port is hard-coded to 
> the default.
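
A sketch of the general fix: build the link from the configured value instead of a constant. It is not taken from the attached patches. It assumes a plain Map in place of the HBase Configuration and a default info port of 60030; InfoUrlSketch and regionServerUrl are hypothetical names. In a real cluster the master would also need each region server's own info port, which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that builds a region server info URL from configuration. */
final class InfoUrlSketch {
  static final String INFO_PORT_KEY = "hbase.regionserver.info.port";
  static final int DEFAULT_INFO_PORT = 60030; // assumed default for this sketch

  static String regionServerUrl(String hostname, Map<String, String> conf) {
    String raw = conf.get(INFO_PORT_KEY);
    // Fall back to the default only when the key is absent.
    int port = (raw == null) ? DEFAULT_INFO_PORT : Integer.parseInt(raw.trim());
    return "http://" + hostname + ":" + port + "/";
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(INFO_PORT_KEY, "60031");
    System.out.println(regionServerUrl("rs1.example.com", conf));
  }
}
```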





[jira] [Commented] (HBASE-4457) Starting region server on non-default info port is resulting in broken URL's in master UI

2011-09-22 Thread Praveen Patibandla (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112548#comment-13112548
 ] 

Praveen Patibandla commented on HBASE-4457:
---

@Ted, Thank you for reviewing the patch. I've fixed the formatting.

> Starting region server on non-default info port is resulting in broken URL's 
> in master UI
> -
>
> Key: HBASE-4457
> URL: https://issues.apache.org/jira/browse/HBASE-4457
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: Praveen Patibandla
>Priority: Minor
>  Labels: newbie
> Fix For: 0.92.0
>
> Attachments: 4457-V1.patch, 4457.patch
>
>
> When "hbase.regionserver.info.port" is set to a non-default port, the Master UI 
> has broken URLs in the region server table because the port is hard-coded to 
> the default.





[jira] [Commented] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112558#comment-13112558
 ] 

Ted Yu commented on HBASE-4452:
---

@Ramkrishna:
Your consideration makes sense.
There is no need to change.

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: HBASE-4452.patch
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112682#comment-13112682
 ] 

Ted Yu commented on HBASE-4452:
---

Test suite passed with patch.
+1.

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: HBASE-4452.patch
>
>
> Consider the following code
> {code}
> long period = Math.max(1, assignmentTimeout / 3);
> long lastUpdate = now;
> while (!signaller.get() && t.isAlive() && !this.server.isStopped() &&
>     !this.rsServices.isStopping() && (endTime > now)) {
>   long elapsed = now - lastUpdate;
>   if (elapsed > period) {
>     // Only tickle OPENING if postOpenDeployTasks is taking some time.
>     lastUpdate = now;
>     tickleOpening("post_open_deploy");
>   }
> {code}
> Whenever the postOpenDeployTasks take considerable time, we call 
> tickleOpening so that no timeout is detected.  But if the TimeoutMonitor 
> assigns the node to another RS before this happens, the other RS will move 
> the node from OFFLINE to OPENING.  Hence when the first RS tries to do 
> tickleOpening, the operation will fail.  Here lies the problem:
> {code}
> String encodedName = this.regionInfo.getEncodedName();
> try {
>   this.version =
>       ZKAssign.retransitionNodeOpening(server.getZooKeeper(),
>           this.regionInfo, this.server.getServerName(), this.version);
> } catch (KeeperException e) {
> {code}
> Now this.version becomes -1 because the operation failed.
> As the first code snippet shows, the return value of tickleOpening() is not 
> checked, so after it fails we still go on to move the node to OPENED.  There 
> is no check at that point either, and since the version has already been 
> changed to -1, the OPENING to OPENED transition succeeds.  This creates a 
> chance of double assignment.
> {noformat}
> 2011-09-22 00:57:29,930 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempt to transition the unassigned 
> node for 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENING failed, the node existed but was version 5 not the 
> expected version 2
> 2011-09-22 00:57:33,494 WARN 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
> refreshing OPENING; region=69797d064f773d1aa9adba56e7ff90a3, 
> context=post_open_deploy
> 2011-09-22 00:58:02,356 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempting to transition node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:11,853 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Successfully transitioned node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:13,956 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
> t9,,1316633193606.69797d064f773d1aa9adba56e7ff90a3.
> {noformat}
> Correct me if this analysis is wrong.
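
For anyone following along, the race in the description can be modelled in a few lines. This is only a sketch, not HBase code and not the attached patch: VersionedNode, tickleOpening, openIgnoringTickle and openCheckingTickle are hypothetical stand-ins, and the znode is reduced to a version counter. It assumes what the description states (a failed retransition leaves the version at -1) and relies on the ZooKeeper rule that an expected version of -1 matches any version.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of the OPENING -> OPENED race. Hypothetical names; not HBase code. */
final class TickleOpeningSketch {

  /** Stand-in for a znode: every successful write bumps the version. */
  static final class VersionedNode {
    private final AtomicInteger version = new AtomicInteger(0);

    /**
     * Writes only if expected equals the current version; -1 matches any
     * version. Returns the new version, or -1 on a mismatch.
     */
    int setData(int expected) {
      while (true) {
        int current = version.get();
        if (expected != -1 && expected != current) {
          return -1;
        }
        if (version.compareAndSet(current, current + 1)) {
          return current + 1;
        }
      }
    }
  }

  private final VersionedNode node;
  private int version;

  TickleOpeningSketch(VersionedNode node, int version) {
    this.node = node;
    this.version = version;
  }

  /** Refreshes OPENING; false means another writer changed the node. */
  boolean tickleOpening() {
    version = node.setData(version);
    return version != -1;
  }

  /** Ignores the tickle result, so the stale -1 version still gets through. */
  boolean openIgnoringTickle() {
    tickleOpening();
    return node.setData(version) != -1;
  }

  /** Gives up as soon as a tickle fails. */
  boolean openCheckingTickle() {
    return tickleOpening() && node.setData(version) != -1;
  }

  public static void main(String[] args) {
    VersionedNode node = new VersionedNode();
    TickleOpeningSketch rs1 = new TickleOpeningSketch(node, node.setData(0));
    node.setData(-1); // another RS takes over the node
    System.out.println("ignoring tickle result: " + rs1.openIgnoringTickle());

    VersionedNode node2 = new VersionedNode();
    TickleOpeningSketch rs2 = new TickleOpeningSketch(node2, node2.setData(0));
    node2.setData(-1); // same takeover
    System.out.println("checking tickle result: " + rs2.openCheckingTickle());
  }
}
```

In this model the unguarded path reports success after the takeover, while the guarded path stops. Whether the real fix should bail out, retry, or abort the open is a separate design question that the sketch does not answer.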





[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112683#comment-13112683
 ] 

Hudson commented on HBASE-4153:
---

Integrated in HBase-TRUNK #2243 (See 
[https://builds.apache.org/job/HBase-TRUNK/2243/])
HBASE-4153 Ramkrishna's fix for 
TestZKBasedOpenCloseRegion#testRSAlreadyProcessingRegion

tedyu : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestZKBasedOpenCloseRegion.java


> Handle RegionAlreadyInTransitionException in AssignmentManager
> --
>
> Key: HBASE-4153
> URL: https://issues.apache.org/jira/browse/HBASE-4153
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, 
> HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, 
> HBASE-4153_6.patch, HBASE-4153_addendum.patch
>
>
> Comment from Stack over in HBASE-3741:
> {quote}
> Question: Looking at this patch again, if we throw a 
> RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
> though RegionAlreadyInTransitionException in at least one case here is saying 
> that the region is already open on this regionserver?
> {quote}
> Indeed looking at the code it's going to be handled the same way other 
> exceptions are. Need to add special cases for assign and unassign.





[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4344:
--

Attachment: 4344-v5.txt

Patch v5.

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Commented] (HBASE-4345) Ensure that Scanners that read from the storefiles respect MVCC

2011-09-22 Thread Amitanand Aiyer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112693#comment-13112693
 ] 

Amitanand Aiyer commented on HBASE-4345:


@stack: I'm going to redo this patch.  I gave a walkthrough of this code 
internally last week, and one of the comments I got was that the filter 
approach is *not* going to ignore delete-markers that occur after the read 
point, i.e. they will take effect even though they should not.

I've been in Austin for a recruiting trip, so I don't have the patch ready. 
I will try to get it done and update asap.

> Ensure that Scanners that read from the storefiles respect MVCC
> ---
>
> Key: HBASE-4345
> URL: https://issues.apache.org/jira/browse/HBASE-4345
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: patch-3
>
>
> Currently, the key-values written to the disk do not include the MVCC (RWCC) 
> version information. Once we add that information and persist it to disk, 
> let us make the scanners respect the MVCC mechanism by ignoring "newer" 
> writes.
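
A small sketch of what "ignoring newer writes" means, including the delete-marker point raised in the comment above. This is a toy model under stated assumptions, not HBase code: Write, read and readFilteringPutsOnly are hypothetical names, a single cell is modelled as an ordered list of writes, and a null value stands for a delete marker.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of read-point filtering. Hypothetical names; not HBase code. */
final class ReadPointSketch {

  /** One write to a single cell: a put of a value, or a delete marker. */
  static final class Write {
    final long memstoreTS;
    final String value; // null stands for a delete marker

    Write(long memstoreTS, String value) {
      this.memstoreTS = memstoreTS;
      this.value = value;
    }

    boolean isDelete() {
      return value == null;
    }
  }

  /** Applies, in order, only the writes at or below the read point. */
  static String read(List<Write> writesInOrder, long readPoint) {
    String result = null;
    for (Write w : writesInOrder) {
      if (w.memstoreTS > readPoint) {
        continue; // newer than the reader's snapshot: puts and deletes alike
      }
      result = w.value;
    }
    return result;
  }

  /** Flawed variant: skips newer puts but lets newer delete markers through. */
  static String readFilteringPutsOnly(List<Write> writesInOrder, long readPoint) {
    String result = null;
    for (Write w : writesInOrder) {
      if (!w.isDelete() && w.memstoreTS > readPoint) {
        continue;
      }
      result = w.value;
    }
    return result;
  }

  public static void main(String[] args) {
    List<Write> writes = Arrays.asList(new Write(1, "v1"), new Write(3, null));
    System.out.println(read(writes, 2));                  // prints "v1"
    System.out.println(readFilteringPutsOnly(writes, 2)); // prints "null"
  }
}
```

A reader at read point 2 should still see v1, because the delete at memstoreTS 3 has not happened yet from its point of view. The second method shows how a filter that only drops newer puts would hide the value too early.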





[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112708#comment-13112708
 ] 

Ted Yu commented on HBASE-4344:
---

This is the error for testRollback based on patch v5:
{code}
testRollback(org.apache.hadoop.hbase.regionserver.TestSplitTransaction)  Time 
elapsed: 1.166 sec  <<< ERROR!
java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at 
org.apache.hadoop.hbase.regionserver.MemStore.maybeCloneWithAllocator(MemStore.java:246)
at org.apache.hadoop.hbase.regionserver.MemStore.add(MemStore.java:213)
at org.apache.hadoop.hbase.regionserver.Store.add(Store.java:312)
at 
org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:2079)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:2041)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1572)
at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1521)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.loadRegion(HBaseTestingUtility.java:834)
at 
org.apache.hadoop.hbase.regionserver.TestSplitTransaction.testRollback(TestSplitTransaction.java:213)
{code}

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4344:
--

Attachment: 4344-v6.txt

Patch v6 fixes ArrayIndexOutOfBoundsException mentioned above.
Still need to investigate 'Already used this rwcc. Too late to initialize' 
issue.

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 
> patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4344:
--

Attachment: (was: 4344-v6.txt)

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 
> patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4344:
--

Attachment: 4344-v6.txt

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 
> patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112736#comment-13112736
 ] 

Lars Hofhansl commented on HBASE-4344:
--

Hmm... See this code in SplitTransaction.java:
{code}
  case CLOSED_PARENT_REGION:
    // So, this returns a seqid but if we just closed and then reopened, we
    // should be ok. On close, we flushed using sequenceid obtained from
    // hosting regionserver so no need to propagate the sequenceid returned
    // out of initialize below up into regionserver as we normally do.
    // TODO: Verify.
    this.parent.initialize();
    break;
{code}

This is where it fails. Moving initialization of rwcc to HRegion.initialize 
rather than at construction time fixes this for me.
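
To illustrate one reading of why the rollback path trips, here is a toy model. It is not the HBase classes and not the patch: ConsistencyControl and the two Region* classes are hypothetical, and the guard condition (refuse to initialize once any write has gone through) is an assumption made for the sketch. Only the error text is taken from this thread.

```java
/** Toy model of an init-once guard. Hypothetical names; not HBase code. */
final class RwccInitSketch {

  /** Refuses to be initialized again once a write has gone through it. */
  static final class ConsistencyControl {
    private long writePoint;
    private boolean used;

    void initialize(long startPoint) {
      if (used) {
        throw new IllegalStateException(
            "Already used this rwcc. Too late to initialize");
      }
      writePoint = startPoint;
    }

    long beginWrite() {
      used = true;
      return ++writePoint;
    }
  }

  /** Creates the control once, at construction time. */
  static final class RegionCreatingAtConstruction {
    private final ConsistencyControl rwcc = new ConsistencyControl();

    void initialize(long seqId) {
      rwcc.initialize(seqId);
    }

    long put() {
      return rwcc.beginWrite();
    }
  }

  /** Creates a fresh control on every initialize() call. */
  static final class RegionCreatingAtInitialize {
    private ConsistencyControl rwcc;

    void initialize(long seqId) {
      rwcc = new ConsistencyControl();
      rwcc.initialize(seqId);
    }

    long put() {
      return rwcc.beginWrite();
    }
  }

  public static void main(String[] args) {
    RegionCreatingAtInitialize ok = new RegionCreatingAtInitialize();
    ok.initialize(10);
    ok.put();
    ok.initialize(11); // reopen after close: fine, the control is new
    System.out.println("reopened, next write point: " + ok.put());

    RegionCreatingAtConstruction bad = new RegionCreatingAtConstruction();
    bad.initialize(10);
    bad.put();
    try {
      bad.initialize(11); // reopen after close: the control was already used
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getMessage());
    }
  }
}
```

In the model, a region that is closed and then initialized again only works if the control object is recreated in initialize(), which matches the behaviour described above for the CLOSED_PARENT_REGION rollback step.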


> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 
> patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112739#comment-13112739
 ] 

Lars Hofhansl commented on HBASE-4344:
--

It is also interesting to note that TestSplitTransaction finishes in 3-4s 
without this patch but takes > 30s with it!!


> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 
> patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112747#comment-13112747
 ] 

Lars Hofhansl commented on HBASE-4344:
--

It's probably all that extra logging, but need to make sure.

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 
> patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Commented] (HBASE-4457) Starting region server on non-default info port is resulting in broken URL's in master UI

2011-09-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112754#comment-13112754
 ] 

stack commented on HBASE-4457:
--

@Praveen I thought about going this route.  I couldn't convince myself that 
webport was important enough info to make it into the ServerName fundamental 
type (it's used everywhere: in fs, persisted to zk, serialized over rpc) and 
truth be told, I punted on what to do about it; thanks for raising it again.

So I still think the servername is not the right place for the webui port.

What about adding webui param to HMasterRegionInterface#regionServerStartup?

Pass it through to ServerManager.

OnlineServers is ServerName->HServerLoad Map.  Maybe the value needs to change 
to be a datastructure that has HSL and webui?  That would be pretty disruptive? 
 Perhaps a new ServerInfo or ServerVitals that has HSL and webui?  Maybe later 
we'll add more 'server info' to this data structure?

I'd almost be ok with a new map of SN->webui that sat beside the onlineServers 
and was expired when the onlineServers entry expired.

Sorry if this is more than you bargained for.
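
A minimal sketch of the last option (a side map of SN->webui that expires together with onlineServers). None of this is existing HBase API: the class, the method signatures and the string-keyed maps are hypothetical simplifications, and the server load value is a placeholder.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of a side map from server name to info port. Not HBase code. */
final class InfoPortRegistrySketch {
  // Stand-in for onlineServers; in HBase the value would be the server load.
  private final Map<String, String> onlineServers = new ConcurrentHashMap<>();
  // The proposed side map: server name -> web UI port.
  private final Map<String, Integer> infoPorts = new ConcurrentHashMap<>();

  /** Called when a region server reports in, passing its web UI port. */
  void regionServerStartup(String serverName, int infoPort) {
    onlineServers.put(serverName, "load-placeholder");
    infoPorts.put(serverName, infoPort);
  }

  /** Expires both entries together so the two maps cannot drift apart. */
  void expireServer(String serverName) {
    onlineServers.remove(serverName);
    infoPorts.remove(serverName);
  }

  /** Returns the UI link for an online server, or null if it is unknown. */
  String infoUrl(String serverName, String hostname) {
    Integer port = infoPorts.get(serverName);
    return port == null ? null : "http://" + hostname + ":" + port + "/";
  }

  public static void main(String[] args) {
    InfoPortRegistrySketch registry = new InfoPortRegistrySketch();
    String sn = "rs1.example.org,60020,1316633193606";
    registry.regionServerStartup(sn, 60031);
    System.out.println(registry.infoUrl(sn, "rs1.example.org"));
    registry.expireServer(sn);
    System.out.println(registry.infoUrl(sn, "rs1.example.org"));
  }
}
```

The master UI would then build its links from infoUrl instead of the hard-coded default port. A combined ServerInfo/ServerVitals value would remove the need to keep two maps in step, at the cost of touching every user of onlineServers.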

> Starting region server on non-default info port is resulting in broken URL's 
> in master UI
> -
>
> Key: HBASE-4457
> URL: https://issues.apache.org/jira/browse/HBASE-4457
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.92.0
>Reporter: Praveen Patibandla
>Priority: Minor
>  Labels: newbie
> Fix For: 0.92.0
>
> Attachments: 4457-V1.patch, 4457.patch
>
>
> When "hbase.regionserver.info.port" is set to a non-default port, the Master 
> UI has broken URLs in the region server table because the port is hard-coded 
> to the default.





[jira] [Commented] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112812#comment-13112812
 ] 

Hudson commented on HBASE-4153:
---

Integrated in HBase-0.92 #12 (See 
[https://builds.apache.org/job/HBase-0.92/12/])
HBASE-4153 Ramkrishna's fix for 
TestZKBasedOpenCloseRegion#testRSAlreadyProcessingRegion

tedyu : 
Files : 
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/RegionServerServices.java
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/master/TestZKBasedOpenCloseRegion.java


> Handle RegionAlreadyInTransitionException in AssignmentManager
> --
>
> Key: HBASE-4153
> URL: https://issues.apache.org/jira/browse/HBASE-4153
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, 
> HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, 
> HBASE-4153_6.patch, HBASE-4153_addendum.patch
>
>
> Comment from Stack over in HBASE-3741:
> {quote}
> Question: Looking at this patch again, if we throw a 
> RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
> though RegionAlreadyInTransitionException in at least one case here is saying 
> that the region is already open on this regionserver?
> {quote}
> Indeed looking at the code it's going to be handled the same way other 
> exceptions are. Need to add special cases for assign and unassign.





[jira] [Commented] (HBASE-4438) Speed the build test phase

2011-09-22 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112816#comment-13112816
 ] 

Jesse Yates commented on HBASE-4438:


Right now it's not too hard to get the integration tests named differently to 
run in a separate phase (basically add in the failsafe plugin and rename them 
all to **/IT*.java, or whatever regex we want). However, with the split proposed 
in HBASE-4336, everything is going to move around anyway, so we might as well 
just get it all done together. 

Granted there is going to be some serious pain when we break out into modules.

Adding ticket (HBASE-4454) so we can just do the integration tests with naming, 
for the moment.

> Speed the build test phase
> --
>
> Key: HBASE-4438
> URL: https://issues.apache.org/jira/browse/HBASE-4438
> Project: HBase
>  Issue Type: Umbrella
>  Components: test
>Reporter: stack
>
> We're up to 2hrs 15mins running tests on jenkins which is kinda crazy.  Akash 
> started a thread over in dev on fixing this: 
> http://search-hadoop.com/m/cZjDH1ykGIA/Running+UnitTests+before+submitting+a+patch&subj=Running+UnitTests+before+submitting+a+patch
> Jesse brings up http://maven.apache.org/plugins/maven-failsafe-plugin/ which 
> looks like something we need to start using; having a tiering of our testing 
> would make sense to me.





[jira] [Created] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes

2011-09-22 Thread Jonathan Gray (JIRA)
HbaseObjectWritable code is a byte, we will eventually run out of codes
---

 Key: HBASE-4459
 URL: https://issues.apache.org/jira/browse/HBASE-4459
 Project: HBase
  Issue Type: Bug
  Components: io
Reporter: Jonathan Gray
Priority: Critical
 Fix For: 0.94.0


There are about 90 classes/codes in HbaseObjectWritable currently and 
Byte.MAX_VALUE is 127.  In addition, anyone wanting to add custom classes but 
not break compatibility might want to leave a gap before using codes and that's 
difficult in such limited space.

Eventually we should get rid of this pattern that makes compatibility difficult 
(better client/server protocol handshake) but we should probably at least bump 
this to a short for 0.94.
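
To make the size limit concrete, the sketch below round-trips a class code through java.io data streams as a byte and as a short. It does not use HbaseObjectWritable; the class and method names are hypothetical and the code values are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Shows how a class code survives a byte-wide versus a short-wide field. */
final class ClassCodeWidthSketch {

  /** Writes the code as one signed byte and reads it back. */
  static int roundTripAsByte(int code) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      new DataOutputStream(buf).writeByte(code);
      return new DataInputStream(
          new ByteArrayInputStream(buf.toByteArray())).readByte();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Writes the code as a two-byte signed short and reads it back. */
  static int roundTripAsShort(int code) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      new DataOutputStream(buf).writeShort(code);
      return new DataInputStream(
          new ByteArrayInputStream(buf.toByteArray())).readShort();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(roundTripAsByte(90));   // prints 90
    System.out.println(roundTripAsByte(200));  // prints -56
    System.out.println(roundTripAsShort(200)); // prints 200
  }
}
```

Any code above Byte.MAX_VALUE wraps to a negative number when written as a byte, so it no longer identifies the class it was assigned to. A short raises the ceiling to Short.MAX_VALUE (32767) at the cost of one extra byte per encoded object and a change to the wire format.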





[jira] [Commented] (HBASE-4454) Add failsafe plugin to build and rename integration tests

2011-09-22 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112825#comment-13112825
 ] 

Jesse Yates commented on HBASE-4454:


Patch up for this at https://reviews.apache.org/r/2022/

> Add failsafe plugin to build and rename integration tests
> -
>
> Key: HBASE-4454
> URL: https://issues.apache.org/jira/browse/HBASE-4454
> Project: HBase
>  Issue Type: Improvement
>Reporter: Jesse Yates
>
> Add the maven-failsafe-plugin to the build process so we can run integration 
> tests with "mvn verify". This will also involve a renaming of integration 
> tests to conform to a new integration test regex.
> This is a stopgap measure until we break them out into their own module.





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112849#comment-13112849
 ] 

Jonathan Gray commented on HBASE-4131:
--

Talking with Dhruba I think it doesn't make sense to have the 
replicateLogEntries() method in the ReplicationService API.  There are really 
two replication service types, a source and a sink.  HBase replication uses the 
same service for both, but other services may only want to be a source or a 
sink.  I will let Dhruba propose the specific interfaces.

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: replicationInterface1.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Updated] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-22 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4448:
-

Attachment: hbase_hbaseTestingUtility_uses_2011_09_22.xlsx

Attaching spreadsheet documenting HBaseTestingUtility configurations by package

> HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
> instances across unit tests
> -
>
> Key: HBASE-4448
> URL: https://issues.apache.org/jira/browse/HBASE-4448
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: HBaseTestingUtilityFactory.java, 
> hbase_hbaseTestingUtility_uses_2011_09_22.xlsx
>
>
> Setting up and tearing down HBaseTestingUtility instances in unit tests is 
> very expensive.  On my MacBook it takes about 10 seconds to set up a 
> MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
> test classes that use this facility, that's a lot of time in the build.
> This factory assumes that the JVM is being re-used across test classes in the 
> build, otherwise this pattern won't work. 
> I don't think this is appropriate for every use, but I think it can be 
> applicable in a great many cases - especially where developers just want a 
> simple MiniCluster with 1 slave.
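
The re-use pattern described above can be sketched without any HBase types. This is not the attached HBaseTestingUtilityFactory.java: the class below is a hypothetical, generic cache keyed by a configuration string, and it assumes (as the description does) that the JVM is shared across test classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Toy per-JVM cache of expensive test fixtures. Hypothetical; not the attachment. */
final class SharedFixtureCacheSketch<T> {
  private final Map<String, T> running = new HashMap<>();

  /** Returns the fixture for configKey, starting it only on the first request. */
  synchronized T acquire(String configKey, Supplier<T> starter) {
    T fixture = running.get(configKey);
    if (fixture == null) {
      fixture = starter.get();
      running.put(configKey, fixture);
    }
    return fixture;
  }

  /** Stops every cached fixture, for example from a JVM shutdown hook. */
  synchronized void shutdownAll(Consumer<T> stopper) {
    for (T fixture : running.values()) {
      stopper.accept(fixture);
    }
    running.clear();
  }

  /** Number of fixtures currently cached. */
  synchronized int size() {
    return running.size();
  }

  public static void main(String[] args) {
    SharedFixtureCacheSketch<StringBuilder> cache = new SharedFixtureCacheSketch<>();
    StringBuilder first = cache.acquire("1-slave", () -> new StringBuilder("cluster"));
    StringBuilder second = cache.acquire("1-slave", () -> new StringBuilder("cluster"));
    System.out.println("same instance: " + (first == second));
    cache.shutdownAll(f -> f.append(" stopped"));
    System.out.println(first);
  }
}
```

With this shape, each test class would call acquire with the configuration it needs and pay the start-up cost only when no matching fixture exists yet. Tests that mutate the cluster configuration or inject faults would need their own uncached instance.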





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-22 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112869#comment-13112869
 ] 

Jesse Yates commented on HBASE-4448:


Doug, I had the same thoughts you did on whether or not all the classes really 
need to have all the region servers, etc. 

On reusability, I have some doubts about what is actually practical. In some 
cases (e.g. coproc), the tests do a bunch of configuration on the table, which 
has implications for whether or not the table/cluster can be 
immediately/concurrently reused.

There are a couple things we could do to make usage easier.

(1) A lot of times people are setting the configuration statically, so we want 
to remove that if people are going to reuse it (though different jvms solves 
that for the moment).

(2) Create unique clusters - these would not be cached and really good for 
situations where people are injecting faults, etc.

(3) Add some modularization for table configuration etc - a lot of tests seem 
to be setting up properties for the db, when really they are making properties 
that are just tested on some table. This may be a little pie in the sky...

(4) Reset properties method - you are allowed to make changes to a table while 
you use it, but when done/released, we just reset the properties/config.

> HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility 
> instances across unit tests
> -
>
> Key: HBASE-4448
> URL: https://issues.apache.org/jira/browse/HBASE-4448
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Assignee: Doug Meil
>Priority: Minor
> Attachments: HBaseTestingUtilityFactory.java, 
> hbase_hbaseTestingUtility_uses_2011_09_22.xlsx
>
>
> Setting up and tearing down HBaseTestingUtility instances in unit tests is 
> very expensive.  On my MacBook it takes about 10 seconds to set up a 
> MiniCluster, and 7 seconds to tear it down.  When multiplied by the number of 
> test classes that use this facility, that's a lot of time in the build.
> This factory assumes that the JVM is being re-used across test classes in the 
> build, otherwise this pattern won't work. 
> I don't think this is appropriate for every use, but I think it can be 
> applicable in a great many cases - especially where developers just want a 
> simple MiniCluster with 1 slave.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4344:
--

Attachment: 4344-v7.txt

Patch version 7 passes all unit tests.
I used Lars' patch for TestStoreFile.

> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 
> 4344-v7.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-22 Thread Doug Meil (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112884#comment-13112884
 ] 

Doug Meil commented on HBASE-4448:
--

I agree that some tests will need special configuration for special situations.

But based on the sheet, I think there is still a benefit in going this route.  
There are still 39 MiniCluster instances that could be shared between tests, 
and that's not even considering MiniZk instances which could also be shared.  
That's roughly 10 or 11 minutes of extra startup & teardown time right there, 
and we're just getting started.  
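
For reference, the estimate can be reproduced from the figures already quoted on this issue (about 10 seconds to set up a MiniCluster and 7 seconds to tear it down, times 39 instances). The class and method names below are made up for this illustration.

```java
/** Back-of-the-envelope cost of starting and stopping mini clusters. */
final class SetupCostEstimate {
  /** Total seconds spent setting up and tearing down the given number of clusters. */
  static int totalSeconds(int clusters, int setupSeconds, int teardownSeconds) {
    return clusters * (setupSeconds + teardownSeconds);
  }
}
```

With totalSeconds(39, 10, 7) the result is 663 seconds, which is about 11 minutes. It is an upper bound on the saving, since any shared cluster still has to be started and stopped at least once.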

I think it might be worth reviewing the tests in terms of what the tests need 
vs. what they were coded for, especially with Client and REST packages.  Does 
the REST unit test really need a MiniCluster with 3 slaves?  I would hazard a 
guess that there was some copy-paste going on.

There isn't any single thing that can fix the build, but I still think this 
approach seems like a reasonable start.





[jira] [Created] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)
Support running an embedded ThriftServer within a RegionServer
--

 Key: HBASE-4460
 URL: https://issues.apache.org/jira/browse/HBASE-4460
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray


Rather than a separate process, it can be advantageous in some situations for 
each RegionServer to embed its own ThriftServer.  This allows each embedded 
ThriftServer to short-circuit any queries that should be executed on the local 
RS and skip the extra hop.  This then enables the building of fat Thrift 
clients that cache region locations and avoid extra hops altogether.

This JIRA is just about the embedded ThriftServer.  Will open others for the 
rest.





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-22 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112889#comment-13112889
 ] 

Jesse Yates commented on HBASE-4448:


Definitely agree that doing this still makes sense. 

I was just thinking about adding some functionality to help people testing down 
the road.





[jira] [Commented] (HBASE-4448) HBaseTestingUtilityFactory - pattern for re-using HBaseTestingUtility instances across unit tests

2011-09-22 Thread Doug Meil (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112891#comment-13112891
 ] 

Doug Meil commented on HBASE-4448:
--

Agreed.







[jira] [Updated] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-22 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-4131:


Attachment: replicationInterface2.txt

Addressed Stack's comments. Split up the replication interface into two parts, 
the ReplicationSourceService and ReplicationSinkService interfaces.

{code}
public interface ReplicationService {

  public void initialize(Server rs, FileSystem fs, Path logdir,
                         Path oldLogDir) throws IOException;

  public void startReplicationService() throws IOException;

  public void stopReplicationService();
}

public interface ReplicationSourceService extends ReplicationService {

  public WALActionsListener getWALActionsListener();
}

public interface ReplicationSinkService extends ReplicationService {

  public void replicateLogEntries(HLog.Entry[] entries) throws IOException;
}
{code}
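
To illustrate what "pluggable" could mean for the interfaces above, here is a rough sketch of loading an implementation by class name. Everything in it is an assumption made for illustration: SimpleReplicationService is a cut-down stand-in without the HBase types, and ReplicationServiceLoader is a hypothetical helper, not code from the attached patch.

```java
/** Cut-down stand-in for the proposed service interface (no HBase types). */
interface SimpleReplicationService {
  void startReplicationService();
  void stopReplicationService();
  boolean isRunning();
}

/** Trivial implementation used only to demonstrate loading by class name. */
final class NoOpReplicationService implements SimpleReplicationService {
  private boolean running;

  NoOpReplicationService() {}

  public void startReplicationService() { running = true; }
  public void stopReplicationService() { running = false; }
  public boolean isRunning() { return running; }
}

/** Hypothetical loader: instantiates whichever implementation is configured. */
final class ReplicationServiceLoader {
  static SimpleReplicationService load(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      Object instance = clazz.getDeclaredConstructor().newInstance();
      // cast() throws ClassCastException if the class does not implement the interface
      return SimpleReplicationService.class.cast(instance);
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load replication service " + className, e);
    }
  }
}
```

The region server would then only depend on the interface, and the concrete class name could come from configuration.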



> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: replicationInterface1.txt, replicationInterface2.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Updated] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4460:
-

Attachment: HBASE-4460-v1.patch

Adds {{HRegionThriftServer}}, a RegionServer-hosted ThriftServer.  It is off by 
default and can be turned on by setting "hbase.regionserver.export.thrift" to true.

> Support running an embedded ThriftServer within a RegionServer
> --
>
> Key: HBASE-4460
> URL: https://issues.apache.org/jira/browse/HBASE-4460
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Attachments: HBASE-4460-v1.patch
>
>
> Rather than a separate process, it can be advantageous in some situations for 
> each RegionServer to embed its own ThriftServer.  This allows each embedded 
> ThriftServer to short-circuit any queries that should be executed on the 
> local RS and skip the extra hop.  This then enables the building of fat 
> Thrift clients that cache region locations and avoid extra hops altogether.
> This JIRA is just about the embedded ThriftServer.  Will open others for the 
> rest.





[jira] [Created] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-22 Thread Jonathan Gray (JIRA)
Expose getRowOrBefore via Thrift


 Key: HBASE-4461
 URL: https://issues.apache.org/jira/browse/HBASE-4461
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Jonathan Gray
Assignee: Jonathan Gray


For fat Thrift-based clients to look up region locations, they need access to 
the getRowOrBefore method.
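
As a toy illustration of why an "at or before" lookup is what a region-locating client needs: regions are identified by their start key, so the region holding a row is the one with the greatest start key that is less than or equal to that row. The sketch below models this with a sorted map. RegionLocator is a hypothetical class, and the data model is simplified; real catalog row keys also encode the table name and a region id.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy region locator: maps region start keys to server names. */
final class RegionLocator {
  private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

  void addRegion(String startKey, String server) {
    startKeyToServer.put(startKey, server);
  }

  /** Returns the server for the region whose start key is at or before the row. */
  String locate(String row) {
    Map.Entry<String, String> entry = startKeyToServer.floorEntry(row);
    return entry == null ? null : entry.getValue();
  }
}
```

A client that caches these entries can send each request straight to the right server, which is the extra hop the fat client is trying to avoid.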





[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112924#comment-13112924
 ] 

Andrew Purtell commented on HBASE-4460:
---

This is possibly a first step toward replacing HRPC with Thrift. Your thoughts 
there, please.

If so, we should consider bringing the capability to wrap sockets with SASL 
into the ThriftServer.





[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112931#comment-13112931
 ] 

Ted Yu commented on HBASE-4461:
---

@Jonathan:
HBASE-4296 is related to this issue.

> Expose getRowOrBefore via Thrift
> 
>
> Key: HBASE-4461
> URL: https://issues.apache.org/jira/browse/HBASE-4461
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
>
> In order for fat Thrift-based clients to locate region locations they need to 
> utilize the getRowOrBefore method.





[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112933#comment-13112933
 ] 

Jonathan Gray commented on HBASE-4460:
--

Replacing HRPC is another story but I think many of us are in agreement that 
we'd like to do that eventually.  The scope here is much smaller and I'm 
working on a set of changes to allow fat Thrift-based clients, not necessarily 
replacing normal HRPC.

Open to your feedback on what I can do to better integrate with security stuff 
but not sure what I can do at this point.





[jira] [Commented] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112936#comment-13112936
 ] 

Jonathan Gray commented on HBASE-4461:
--

Thanks Ted.





[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112935#comment-13112935
 ] 

Jonathan Gray commented on HBASE-4296:
--

Over in HBASE-4461 I am exposing this method to Thrift to enable building fat 
Thrift-based clients.  Rather than deprecating this, could we just note that 
it is an expensive operation and not meant for normal operations?  Or even only 
allow it to work on ROOT and META?

> Deprecate HTable[Interface].getRowOrBefore(...)
> ---
>
> Key: HBASE-4296
> URL: https://issues.apache.org/jira/browse/HBASE-4296
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: 4296.txt
>
>
> HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. 
> That method was created to allow our scanning of .META. (see HBASE-2600).
> Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be 
> performant that a user of HTable will not be aware of.
> I propose deprecating this in the public interface in 0.92 and removing it 
> from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it 
> will still remain as internal interface for scanning meta.
> Comments?





[jira] [Created] (HBASE-4462) Properly treating SocketTimeoutException

2011-09-22 Thread Jean-Daniel Cryans (JIRA)
Properly treating SocketTimeoutException


 Key: HBASE-4462
 URL: https://issues.apache.org/jira/browse/HBASE-4462
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.0


SocketTimeoutException is currently treated like any IOE inside of 
HCM.getRegionServerWithRetries and I think this is a problem. This method 
should only do retries in cases where we are pretty sure the operation will 
complete, but with STE we already waited for (by default) 60 seconds and 
nothing happened.

I found this while debugging Douglas Campbell's problem on the mailing list 
where it seemed like he was using the same scanner from multiple threads, but 
actually it was just the same client doing retries while the first run hadn't 
even finished yet (that's another problem). You could see the first scanner, then 
up to two other handlers waiting for it to finish in order to run (because of 
the synchronization on RegionScanner).

So what should we do? We could treat STE as a DoNotRetryIOException and let the 
client deal with it, or we could retry only once.

There's also the option of having a different behavior for get/put/icv/scan. 
The issue with operations that modify a cell is that you don't know if the 
operation completed or not (same as when a RS dies hard after completing, let's 
say, a Put but just before returning to the client).
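
A rough sketch of the two options (no retry on timeout, or a single retry), written as a generic retry helper rather than against the real HCM code. TimeoutAwareRetry and its parameters are hypothetical names; only the java.io and java.net exception types are real.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

final class TimeoutAwareRetry {
  /**
   * Runs op up to maxAttempts times. Ordinary IOExceptions use the normal retry
   * budget; SocketTimeoutException is retried at most maxTimeoutRetries times.
   */
  static <T> T call(Callable<T> op, int maxAttempts, int maxTimeoutRetries) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    int timeoutRetries = 0;
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.call();
      } catch (SocketTimeoutException e) {
        // The server may still be working on the earlier request, so give up early.
        if (timeoutRetries >= maxTimeoutRetries) {
          throw e;
        }
        timeoutRetries++;
        last = e;
      } catch (IOException e) {
        last = e;  // other I/O failures keep the normal retry budget
      }
    }
    throw last;
  }
}
```

Passing maxTimeoutRetries = 0 corresponds to the "do not retry" option and 1 to "retry only once". A per-operation policy (get/put/icv/scan) could be layered on top by choosing the value per call.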





[jira] [Updated] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4449:
--

Summary: LoadIncrementalHFiles should be able to handle CFs with blooms  
(was: LoadIncrementalHFiles can't handle CFs with blooms)

> LoadIncrementalHFiles should be able to handle CFs with blooms
> --
>
> Key: HBASE-4449
> URL: https://issues.apache.org/jira/browse/HBASE-4449
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: David Revell
>Assignee: David Revell
> Fix For: 0.90.5
>
> Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
> HBASE-4449.patch
>
>
> When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
> it will split the file at the boundary to create two store files. If the 
> store file is for a column family that has a bloom filter, then a 
> "java.lang.ArithmeticException: / by zero" will be raised because 
> ByteBloomFilter() is called with maxKeys of 0.
> The included patch assumes that the number of keys in each split child will 
> be equal to the number of keys in the parent's bloom filter (instead of 0). 
> This is an overestimate, but it's safe and easy.





[jira] [Created] (HBASE-4463) Run more aggressive compactions during off peak hours

2011-09-22 Thread Karthik Ranganathan (JIRA)
Run more aggressive compactions during off peak hours
-

 Key: HBASE-4463
 URL: https://issues.apache.org/jira/browse/HBASE-4463
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Karthik Ranganathan
Assignee: Karthik Ranganathan


The number of IOPS on the disk and the top-of-rack bandwidth utilization at 
off-peak hours are much lower than at peak hours, depending on the application 
usage pattern. We can utilize this knowledge to improve the performance of the 
HBase cluster by increasing the compaction selection ratio to a much larger value 
during off-peak hours than otherwise - increasing hbase.hstore.compaction.ratio 
(1.3 default) to hbase.hstore.compaction.ratio.offpeak (5 default). This will 
help reduce the average number of files per store.
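
One possible shape for the selection logic, assuming the off-peak window is given as a start and end hour of the day. The window parameters and the CompactionRatioSelector class are assumptions made for this sketch; the issue itself only names the two ratio settings.

```java
final class CompactionRatioSelector {
  /**
   * Returns true if hour (0-23) falls in [startHour, endHour). The window may
   * wrap past midnight, for example 22 to 6.
   */
  static boolean isOffPeak(int hour, int startHour, int endHour) {
    if (startHour == endHour) {
      return false;  // empty window: never off-peak
    }
    if (startHour < endHour) {
      return hour >= startHour && hour < endHour;
    }
    return hour >= startHour || hour < endHour;  // window wraps around midnight
  }

  /** Picks the off-peak ratio inside the window and the normal ratio otherwise. */
  static double ratioFor(int hour, int startHour, int endHour,
                         double normalRatio, double offPeakRatio) {
    return isOffPeak(hour, startHour, endHour) ? offPeakRatio : normalRatio;
  }
}
```

With a 22-to-6 window and the values quoted above, ratioFor(23, 22, 6, 1.3, 5.0) selects 5.0 and ratioFor(12, 22, 6, 1.3, 5.0) selects 1.3.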





[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112947#comment-13112947
 ] 

Ted Yu commented on HBASE-4449:
---

HBASE-4449-trunk-testsonly.patch has been integrated to TRUNK and 0.92

Thanks for the patch David.





[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112949#comment-13112949
 ] 

Ted Yu commented on HBASE-4463:
---

This is a great idea.
How do we determine the off-peak hours?

> Run more aggressive compactions during off peak hours
> -
>
> Key: HBASE-4463
> URL: https://issues.apache.org/jira/browse/HBASE-4463
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Karthik Ranganathan
>Assignee: Karthik Ranganathan
>
> The number of IOPS on the disk and the top-of-rack bandwidth utilization 
> at off-peak hours are much lower than at peak hours, depending on the 
> application usage pattern. We can utilize this knowledge to improve the 
> performance of the HBase cluster by increasing the compaction selection ratio to 
> a much larger value during off-peak hours than otherwise - increasing 
> hbase.hstore.compaction.ratio (1.3 default) to 
> hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the 
> average number of files per store.





[jira] [Created] (HBASE-4464) Make region balancing parallel with balancer.balanceCluster()

2011-09-22 Thread Ted Yu (JIRA)
Make region balancing parallel with balancer.balanceCluster()
-

 Key: HBASE-4464
 URL: https://issues.apache.org/jira/browse/HBASE-4464
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu


balancer.balanceCluster() generates RegionPlans for HMaster.balance() to 
execute.
We don't retract any RegionPlan in balancer.balanceCluster().
In the near future, a more complex algorithm will be introduced to try to achieve 
maximum block location affinity for the regions to be moved. This means 
balancer.balanceCluster() will take longer to return.

This JIRA makes region balancing run in parallel with balancer.balanceCluster(), 
meaning that regions would be moved while balancer.balanceCluster() is 
still running.
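
Because plans are never retracted once generated, they can be handed to the executor as soon as they exist. A minimal producer/consumer sketch of that idea follows; plans are plain strings here, and ParallelBalanceSketch is a hypothetical class that does not reflect the actual HMaster or balancer code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ParallelBalanceSketch {
  private static final String DONE = "__DONE__";  // sentinel marking the end of planning

  /** Streams plans to an executor thread and returns them in execution order. */
  static List<String> run(List<String> plansToGenerate) {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    List<String> executed = new ArrayList<>();

    // Consumer: executes each plan as soon as the planner publishes it.
    Thread executor = new Thread(() -> {
      try {
        for (String plan = queue.take(); !DONE.equals(plan); plan = queue.take()) {
          executed.add(plan);  // stand-in for actually moving a region
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    executor.start();

    try {
      // Producer: the (possibly slow) planner hands over plans one at a time.
      for (String plan : plansToGenerate) {
        queue.put(plan);
      }
      queue.put(DONE);
      executor.join();  // join() makes the consumer's writes visible to this thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while balancing", e);
    }
    return executed;
  }
}
```

The queue decouples the two sides, so a slow planner no longer delays the first region move.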





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112965#comment-13112965
 ] 

Jean-Daniel Cryans commented on HBASE-4131:
---

Looks good to me, but even patch v2 is incomplete, right? I'm pretty sure some 
tests don't even compile.





[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112971#comment-13112971
 ] 

Gary Helmling commented on HBASE-4460:
--

bq. Open to your feedback on what I can do to better integrate with security 
stuff but not sure what I can do at this point.

For the current patch on HBASE-4099, I think not much other than making sure we 
have a way of flagging that the ThriftServer is embedded so we skip the login.  
Though in that case I can't picture wanting to do embedded thrift + security at 
the same time, since all thrift clients would have effective access as the 
region server process user (circumventing security).

The embedded thrift server + login + security might all work together if we:
* add a User.loginAndReturnUser() variant that delegates to 
UserGroupInformation.loginUserFromKeytabAndReturnUGI(), then returns a wrapping 
User instance
* call this method on startup for the embedded thrift server to get the thrift 
user instance
* use User.runAs() to execute the body of HRegionThriftServer.run() as the 
logged in thrift user

In any case, all of that seems like it should go in a separate JIRA.





[jira] [Commented] (HBASE-4427) It would help to run a standalone HBase's ZK on a different port

2011-09-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112977#comment-13112977
 ] 

Jean-Daniel Cryans commented on HBASE-4427:
---

For a moment I liked the embeddedClientPort idea, but then if you have an 
external client it means you have to change its default config in order to talk 
to a cluster that has a managed ZK, and that port wouldn't be easy to find. 
There's also the problem of those already running that kind of setup that all 
of a sudden would need to change their configs.

I'm -1.

Something to consider would be refusing to start HBase if, while starting its 
own ZK, it gets a port binding exception. I'm not sure what the current behavior 
is.
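
For illustration, a fail-fast check of this kind could look roughly like the sketch below. PortCheck is a hypothetical helper built only on java.net.ServerSocket; it does not describe how HBase or ZooKeeper currently handle the bind.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortCheck {
  /** Returns true if a server socket can currently be bound to the port on this host. */
  static boolean isPortFree(int port) {
    try (ServerSocket probe = new ServerSocket(port)) {
      return true;
    } catch (IOException e) {
      // Typically a java.net.BindException ("Address already in use").
      return false;
    }
  }

  /** Fails fast instead of starting an embedded server on an occupied port. */
  static void requireFreePort(int port) {
    if (!isPortFree(port)) {
      throw new IllegalStateException("Port " + port + " is already in use; refusing to start");
    }
  }
}
```

Probing first and binding later leaves a small race window, so in practice it is more robust to let the embedded server attempt the bind and abort startup when the bind exception is raised.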

> It would help to run a standalone HBase's ZK on a different port
> 
>
> Key: HBASE-4427
> URL: https://issues.apache.org/jira/browse/HBASE-4427
> Project: HBase
>  Issue Type: Improvement
>  Components: zookeeper
>Affects Versions: 0.90.4
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
>
> It would be extremely helpful to have standalone HBase default to a 
> non-standard port for running its embedded ZK. This would help to run HBase 
> on the same host where a legitimate fully distributed ZK server is running, etc.
> It seems that the following addition to hbase-default.xml would be enough to 
> make it happen:
> {noformat}
> +  <property>
> +    <name>hbase.zookeeper.property.clientPort</name>
> +    <value>4181</value>
> +  </property>
> {noformat}
> This will take care of the master/client for HBase and can be overridden in 
> hbase-site if needed.
> Thoughts?





[jira] [Commented] (HBASE-4427) It would help to run a standalone HBase's ZK on a different port

2011-09-22 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112983#comment-13112983
 ] 

Roman Shaposhnik commented on HBASE-4427:
-

@Jean-Daniel Cryans 

> but then if you have an external client it means you have to 
> change its default config in order to talk to a cluster that has a managed 
> ZK, 

How "external" a client are we talking about here? One running on the same 
host, or a remote one?

> Something to consider would be refusing to start HBase if while starting it's 
> own ZK it gets a port binding exception.

I believe that's the current behavior, which is, in fact, part of the problem
with the out-of-the-box experience on nodes where there's an already running ZK.



> It would help to run a standalone HBase's ZK on a different port
> 
>
> Key: HBASE-4427
> URL: https://issues.apache.org/jira/browse/HBASE-4427
> Project: HBase
>  Issue Type: Improvement
>  Components: zookeeper
>Affects Versions: 0.90.4
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
>





[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112984#comment-13112984
 ] 

Jonathan Gray commented on HBASE-4460:
--

Gary, want to open another JIRA and link it here?

> Support running an embedded ThriftServer within a RegionServer
> --
>
> Key: HBASE-4460
> URL: https://issues.apache.org/jira/browse/HBASE-4460
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Attachments: HBASE-4460-v1.patch
>
>
> Rather than a separate process, it can be advantageous in some situations for 
> each RegionServer to embed its own ThriftServer.  This allows each embedded 
> ThriftServer to short-circuit any queries that should be executed on the 
> local RS and skip the extra hop.  This then enables the building of fat 
> Thrift clients that cache region locations and avoid extra hops altogether.
> This JIRA is just about the embedded ThriftServer.  Will open others for the 
> rest.





[jira] [Commented] (HBASE-4460) Support running an embedded ThriftServer within a RegionServer

2011-09-22 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112985#comment-13112985
 ] 

Gary Helmling commented on HBASE-4460:
--

Also just to second what Andy said for longer-term, we'll still want to somehow 
provide SASL auth for thrift clients, with the thrift server acting as a proxy 
on their behalf, but that seems a much bigger project.

> Support running an embedded ThriftServer within a RegionServer
> --
>
> Key: HBASE-4460
> URL: https://issues.apache.org/jira/browse/HBASE-4460
> Project: HBase
>  Issue Type: New Feature
>  Components: regionserver, thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Attachments: HBASE-4460-v1.patch
>
>





[jira] [Commented] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112988#comment-13112988
 ] 

stack commented on HBASE-4452:
--

+1

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Attachments: HBASE-4452.patch
>
>
> Consider the following code
> {code}
> long period = Math.max(1, assignmentTimeout / 3);
> long lastUpdate = now;
> while (!signaller.get() && t.isAlive() && !this.server.isStopped() &&
>     !this.rsServices.isStopping() && (endTime > now)) {
>   long elapsed = now - lastUpdate;
>   if (elapsed > period) {
>     // Only tickle OPENING if postOpenDeployTasks is taking some time.
>     lastUpdate = now;
>     tickleOpening("post_open_deploy");
>   }
> {code}
> Whenever the postOpenDeployTasks take considerable time, we call 
> tickleOpening so that no timeout is detected.  But if, before that happens, 
> the TimeoutMonitor assigns the node to another RS, the other RS will move 
> the node from OFFLINE to OPENING.  Hence, when the first RS tries to do 
> tickleOpening, the operation will fail. Here lies the problem:
> {code}
> String encodedName = this.regionInfo.getEncodedName();
> try {
>   this.version =
>     ZKAssign.retransitionNodeOpening(server.getZooKeeper(),
>       this.regionInfo, this.server.getServerName(), this.version);
> } catch (KeeperException e) {
> {code}
> Now this.version becomes -1 because the operation failed.
> As the first code snippet shows, the return value of tickleOpening() is not 
> checked, so after it fails we still go on to move the node to OPENED.  Here 
> again there is no check for this condition, and the version has already been 
> changed to -1.  Hence the OPENING to OPENED transition succeeds. There is a 
> chance of double assignment.
> {noformat}
> 2011-09-22 00:57:29,930 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempt to transition the unassigned 
> node for 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENING failed, the node existed but was version 5 not the 
> expected version 2
> 2011-09-22 00:57:33,494 WARN 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
> refreshing OPENING; region=69797d064f773d1aa9adba56e7ff90a3, 
> context=post_open_deploy
> 2011-09-22 00:58:02,356 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempting to transition node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:11,853 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Successfully transitioned node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:13,956 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
> t9,,1316633193606.69797d064f773d1aa9adba56e7ff90a3.
> {noformat}
> Correct me if this analysis is wrong.
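
One way to close the gap described above is to have the handler remember that a tickle failed and refuse the final transition. The sketch below is a toy model, not the attached patch and not the ZKAssign API: `VersionedNode` and `OpenRegionSketch` are hypothetical classes. The only ZooKeeper behavior it borrows is that an expected version of -1 matches any version, which would explain why storing -1 after a failure lets the later OPENED transition go through.

```java
// Toy stand-in for a versioned znode; not the ZooKeeper or ZKAssign API.
final class VersionedNode {
  private int version = 0;

  /**
   * Conditional update: succeeds when expected equals the current version,
   * or when expected is -1 (which matches any version).
   * Returns the new version, or -1 on a mismatch.
   */
  synchronized int update(int expected) {
    if (expected != -1 && expected != version) {
      return -1;
    }
    return ++version;
  }
}

// Hypothetical handler: a failed tickle blocks the transition to OPENED.
final class OpenRegionSketch {
  private final VersionedNode node;
  private int version;      // last version this server wrote
  private boolean lostNode; // set once a tickle fails

  OpenRegionSketch(VersionedNode node, int version) {
    this.node = node;
    this.version = version;
  }

  /** Refreshes OPENING; returns false if another server changed the node. */
  boolean tickleOpening() {
    int updated = node.update(version);
    if (updated == -1) {
      lostNode = true; // do not overwrite the stored version with -1
      return false;
    }
    version = updated;
    return true;
  }

  /** Moves to OPENED only if no earlier tickle failed and the version still matches. */
  boolean transitionToOpened() {
    if (lostNode) {
      return false;
    }
    int updated = node.update(version);
    if (updated == -1) {
      return false;
    }
    version = updated;
    return true;
  }

  public static void main(String[] args) {
    VersionedNode node = new VersionedNode();
    OpenRegionSketch first = new OpenRegionSketch(node, 0);
    first.tickleOpening();  // first RS refreshes OPENING
    node.update(1);         // a second RS takes over the node
    System.out.println("stale tickle ok? " + first.tickleOpening());
    System.out.println("opened? " + first.transitionToOpened());
  }
}
```

The design choice here is to keep the failure as explicit state instead of encoding it in the version field, so a later conditional update can never be turned into an unconditional one by accident.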





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112987#comment-13112987
 ] 

Jonathan Gray commented on HBASE-4131:
--

@JD, everything compiles for me.  What are you seeing?

Running test suite now.

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: replicationInterface1.txt, replicationInterface2.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one HBase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13112990#comment-13112990
 ] 

Jean-Daniel Cryans commented on HBASE-4131:
---

TestReplicationSourceManager instantiates a Replication object and that patch 
changes its constructor.

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: replicationInterface1.txt, replicationInterface2.txt
>
>





[jira] [Updated] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4452:
-

Fix Version/s: 0.92.0

lgtm.  nice catch.  pulling in to 0.92

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.92.0
>
> Attachments: HBASE-4452.patch
>
>





[jira] [Updated] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4452:
--

Fix Version/s: (was: 0.92.0)
   0.90.5

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: HBASE-4452.patch
>
>





[jira] [Updated] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4452:
--

Attachment: 4452.90

Patch for 0.90 branch.

TestOpenRegionHandler passes.

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4452.90, HBASE-4452.patch
>
>





[jira] [Commented] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113010#comment-13113010
 ] 

Ted Yu commented on HBASE-4452:
---

Integrated to 0.90, 0.92 and TRUNK.

Thanks for the patch Ramkrishna.

Thanks for the review Stack and Jonathan.

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4452.90, HBASE-4452.patch
>
>





[jira] [Updated] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4452:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4452.90, HBASE-4452.patch
>
>





[jira] [Commented] (HBASE-4462) Properly treating SocketTimeoutException

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113011#comment-13113011
 ] 

Jonathan Gray commented on HBASE-4462:
--

+1 on treating STE differently.  I think we should treat it as DNRE and kick it 
back to the client.  There could be a configurable policy for socket timeouts 
(or network level errors in general?) if some people want the HBase client to 
retry once or something.

> Properly treating SocketTimeoutException
> 
>
> Key: HBASE-4462
> URL: https://issues.apache.org/jira/browse/HBASE-4462
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
> Fix For: 0.92.0
>
>
> SocketTimeoutException is currently treated like any IOE inside of 
> HCM.getRegionServerWithRetries and I think this is a problem. This method 
> should only do retries in cases where we are pretty sure the operation will 
> complete, but with STE we already waited for (by default) 60 seconds and 
> nothing happened.
> I found this while debugging Douglas Campbell's problem on the mailing list 
> where it seemed like he was using the same scanner from multiple threads, but 
> actually it was just the same client doing retries while the first run didn't 
> even finish yet (that's another problem). You could see the first scanner, 
> then up to two other handlers waiting for it to finish in order to run 
> (because of the synchronization on RegionScanner).
> So what should we do? We could treat STE as a DoNotRetryException and let the 
> client deal with it, or we could retry only once.
> There's also the option of having different behavior for get/put/icv/scan. 
> The issue with operations that modify a cell is that you don't know whether 
> the operation completed or not (the same is true when a RS dies hard after 
> completing, let's say, a Put but just before returning to the client).
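
The two options discussed here (hand the timeout straight back to the client, or retry it only a limited number of times) can be sketched as a small retry helper. This is an illustration in plain Java, not HCM.getRegionServerWithRetries: `RetrySketch`, `callWithRetries`, and the `timeoutRetries` parameter are hypothetical.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical retry helper, not HConnectionManager code.
final class RetrySketch {
  /**
   * Runs op up to maxTries times. Ordinary IOExceptions are retried;
   * a SocketTimeoutException is retried at most timeoutRetries times
   * (0 means it is handed straight back to the caller).
   */
  static <T> T callWithRetries(Callable<T> op, int maxTries, int timeoutRetries)
      throws Exception {
    int timeouts = 0;
    IOException last = null;
    for (int attempt = 0; attempt < maxTries; attempt++) {
      try {
        return op.call();
      } catch (SocketTimeoutException e) {
        // We already waited a full timeout; retrying blindly may pile up
        // duplicate requests behind the one that is still running.
        if (timeouts++ >= timeoutRetries) {
          throw e;
        }
        last = e;
      } catch (IOException e) {
        last = e; // other I/O failures keep the usual retry behavior
      }
    }
    throw last != null ? last : new IOException("no attempts made");
  }

  public static void main(String[] args) throws Exception {
    // An operation that always times out is handed back after the first attempt.
    try {
      callWithRetries(() -> { throw new SocketTimeoutException("simulated"); }, 10, 0);
    } catch (SocketTimeoutException e) {
      System.out.println("gave up after timeout: " + e.getMessage());
    }
  }
}
```

Passing `timeoutRetries = 0` corresponds to treating the timeout like a DoNotRetryException, and `timeoutRetries = 1` to the "retry only once" alternative. The sketch does not address the separate question of whether a timed-out mutation actually completed.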





[jira] [Commented] (HBASE-4462) Properly treating SocketTimeoutException

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113015#comment-13113015
 ] 

Ted Yu commented on HBASE-4462:
---

+1 on treating socket timeout as DoNotRetryException.

> Properly treating SocketTimeoutException
> 
>
> Key: HBASE-4462
> URL: https://issues.apache.org/jira/browse/HBASE-4462
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.4
>Reporter: Jean-Daniel Cryans
> Fix For: 0.92.0
>
>
> SocketTimeoutException is currently treated like any IOE inside of 
> HCM.getRegionServerWithRetries and I think this is a problem. This method 
> should only do retries in cases where we are pretty sure the operation will 
> complete, but with STE we already waited for (by default) 60 seconds and 
> nothing happened.
> I found this while debugging Douglas Campbell's problem on the mailing list 
> where it seemed like he was using the same scanner from multiple threads, but 
> actually it was just the same client doing retries while the first run didn't 
> even finish yet (that's another problem). You could see the first scanner, 
> then up to two other handlers waiting for it to finish in order to run 
> (because of the synchronization on RegionScanner).
> So what should we do? We could treat STE as a DoNotRetryException and let the 
> client deal with it, or we could retry only once.
> There's also the option of having a different behavior for get/put/icv/scan, 
> the issue with operations that modify a cell is that you don't know if the 
> operation completed or not (same when a RS dies hard after completing let's 
> say a Put but just before returning to the client).
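
To make the first option in the description concrete, here is a minimal Java sketch of a retry loop that gives up immediately on a SocketTimeoutException while still retrying other IOExceptions. The class `TimeoutAwareRetry` and its `call` method are hypothetical names used only for illustration; this is not the code in HCM.getRegionServerWithRetries, and the sketch simply rethrows the timeout instead of wrapping it in a do-not-retry exception.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Hypothetical helper for illustration; not HBase's actual retry code.
final class TimeoutAwareRetry {
    // Runs op up to maxAttempts times. A SocketTimeoutException is rethrown
    // at once; any other IOException is retried until attempts run out.
    static <T> T call(Callable<T> op, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (SocketTimeoutException ste) {
                // The server may still be working on the first request,
                // so do not queue another one behind it.
                throw ste;
            } catch (IOException ioe) {
                last = ioe; // other I/O failures stay retriable
            }
        }
        throw last; // non-null here: every attempt failed with an IOException
    }
}
```

The "retry only once" variant from the description would replace the immediate rethrow with a separate, smaller attempt budget for timeouts.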





[jira] [Updated] (HBASE-4463) Run more aggressive compactions during off peak hours

2011-09-22 Thread Karthik Ranganathan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Ranganathan updated HBASE-4463:
---

Description: The number of iops on the disk and the top of the rack 
bandwidth utilization at off peak hours is much lower than at peak hours 
depending on the application usage pattern. We can utilize this knowledge to 
improve the performance of the HBase cluster by increasing the compact 
selection ratio to a much larger value during off-peak hours than otherwise - 
increasing hbase.hstore.compaction.ratio (1.2 default) to 
hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the 
average number of files per store.  (was: The number of iops on the disk and 
the top of the rack bandwidth utilization at off peak hours is much lower than 
at peak hours depending on the application usage pattern. We can utilize this 
knowledge to improve the performance of the HBase cluster by increasing the 
compact selection ratio to a much larger value during off-peak hours than 
otherwise - increasing hbase.hstore.compaction.ratio (1.3 default) to 
hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the 
average number of files per store.)

> Run more aggressive compactions during off peak hours
> -
>
> Key: HBASE-4463
> URL: https://issues.apache.org/jira/browse/HBASE-4463
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Karthik Ranganathan
>Assignee: Karthik Ranganathan
>
> The number of iops on the disk and the top of the rack bandwidth utilization 
> at off peak hours is much lower than at peak hours depending on the 
> application usage pattern. We can utilize this knowledge to improve the 
> performance of the HBase cluster by increasing the compact selection ratio to 
> a much larger value during off-peak hours than otherwise - increasing 
> hbase.hstore.compaction.ratio (1.2 default) to 
> hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the 
> average number of files per store.





[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

2011-09-22 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113017#comment-13113017
 ] 

Karthik Ranganathan commented on HBASE-4463:


Initially we are going to specify a start and stop for off peak hours... a more 
automatic detection based on response latencies and data read/transferred could 
be done, but is much harder to get right.
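
A fixed start/stop window could be sketched roughly as follows. Only the two ratio values (1.2 and 5, as in the updated description) come from the issue; the class name, method names, and the hour-based parameters are assumptions made for this example, and the window is allowed to wrap past midnight.

```java
// Hypothetical sketch; only the two ratio values come from the issue text.
final class OffPeakRatio {
    static final double PEAK_RATIO = 1.2;     // hbase.hstore.compaction.ratio
    static final double OFF_PEAK_RATIO = 5.0; // hbase.hstore.compaction.ratio.offpeak

    // True when hour (0-23) falls in [startHour, endHour). The window may
    // wrap past midnight, for example start 22 and end 6.
    static boolean isOffPeak(int hour, int startHour, int endHour) {
        if (startHour == endHour) {
            return false; // empty window: treat the feature as disabled
        }
        if (startHour < endHour) {
            return hour >= startHour && hour < endHour;
        }
        return hour >= startHour || hour < endHour;
    }

    // Picks the compaction selection ratio for the given hour of day.
    static double ratioFor(int hour, int startHour, int endHour) {
        return isOffPeak(hour, startHour, endHour) ? OFF_PEAK_RATIO : PEAK_RATIO;
    }

    public static void main(String[] args) {
        System.out.println(ratioFor(23, 22, 6)); // inside the wrapped window
        System.out.println(ratioFor(12, 22, 6)); // outside the window
    }
}
```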

> Run more aggressive compactions during off peak hours
> -
>
> Key: HBASE-4463
> URL: https://issues.apache.org/jira/browse/HBASE-4463
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Karthik Ranganathan
>Assignee: Karthik Ranganathan
>
> The number of iops on the disk and the top of the rack bandwidth utilization 
> at off peak hours is much lower than at peak hours depending on the 
> application usage pattern. We can utilize this knowledge to improve the 
> performance of the HBase cluster by increasing the compact selection ratio to 
> a much larger value during off-peak hours than otherwise - increasing 
> hbase.hstore.compaction.ratio (1.3 default) to 
> hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the 
> average number of files per store.





[jira] [Commented] (HBASE-4427) It would help to run a standalone HBase's ZK on a different port

2011-09-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113021#comment-13113021
 ] 

Jean-Daniel Cryans commented on HBASE-4427:
---

bq. how "external" of a client are we talking here? running on the same host or 
remote?

Anything that doesn't use HBase's own configuration. Actually even a local 
client wouldn't know which port to look for since the config for whether HBase 
manages zk is in hbase-env.sh

bq. I believe that's the current behavior, which is, in fact, part of the 
problem with out-of-the-box experience on nodes where there's an already 
running ZK

You should use it if it's already there :)

> It would help to run a standalone HBase's ZK on a different port
> 
>
> Key: HBASE-4427
> URL: https://issues.apache.org/jira/browse/HBASE-4427
> Project: HBase
>  Issue Type: Improvement
>  Components: zookeeper
>Affects Versions: 0.90.4
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
>
> It would be extremely helpful to have standalone HBase default to a 
> non-standard port for running its embedded ZK. This would help to run HBase 
> on the same host as a legitimate, fully distributed ZK server, etc.
> It seems that the following addition to hbase-default.xml would be enough to 
> make it happen:
> {noformat}
> +  <property>
> +    <name>hbase.zookeeper.property.clientPort</name>
> +    <value>4181</value>
> +  </property>
> {noformat}
> This will take care of the master/client for HBase and can be overridden in 
> hbase-site if needed.
> Thoughts?





[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113023#comment-13113023
 ] 

stack commented on HBASE-4131:
--

nit: Have replication constants in the replication package rather than up in 
HConstants (do they need to be there?)

nit: Looks like you are importing a bunch of crud you are not using in 
Interfaces at least.

nit: Do you need to pass the filesystem on initialize of ReplicationService?  
You can do:

FileSystem fs = FileSystem.get(server.getConfiguration());

Else +1 on patch if tests pass (Above nits can be fixed on commit)

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: replicationInterface1.txt, replicationInterface2.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.
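
As a rough picture of what "pluggable" could mean here, the sketch below loads an implementation by class name from configuration. Everything in it is hypothetical: the `ReplicationPlugin` interface, the `replication.plugin.class` key, and the loader are invented for this example and do not reflect the interface in the attached patches.

```java
import java.util.Properties;

// Hypothetical interface; the real one is defined in the attached patches.
interface ReplicationPlugin {
    void initialize(Properties conf);
    String name();
}

// Default used when no implementation is configured.
final class NoOpReplication implements ReplicationPlugin {
    public void initialize(Properties conf) { /* nothing to set up */ }
    public String name() { return "noop"; }
}

final class ReplicationLoader {
    // "replication.plugin.class" is an assumed key, not an HBase property.
    static ReplicationPlugin load(Properties conf) throws ReflectiveOperationException {
        String className = conf.getProperty(
            "replication.plugin.class", NoOpReplication.class.getName());
        ReplicationPlugin plugin = Class.forName(className)
            .asSubclass(ReplicationPlugin.class)
            .getDeclaredConstructor()
            .newInstance();
        plugin.initialize(conf);
        return plugin;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(load(new Properties()).name());
    }
}
```

Keeping the loader dependent only on the interface is what lets another cross-data-center service be swapped in without touching the caller.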





[jira] [Updated] (HBASE-4070) [Coprocessors] Improve region server metrics to report loaded coprocessors to master

2011-09-22 Thread Eugene Koontz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated HBASE-4070:
-

Status: Patch Available  (was: Open)

This patch requires HBASE-4014 (as noted in "Issue Links")

> [Coprocessors] Improve region server metrics to report loaded coprocessors to 
> master
> 
>
> Key: HBASE-4070
> URL: https://issues.apache.org/jira/browse/HBASE-4070
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Mingjie Lai
>Assignee: Eugene Koontz
> Attachments: HBASE-4070.patch
>
>
> HBASE-3512 is about listing loaded cp classes at shell. To make it more 
> generic, we need a way to report this piece of information from region to 
> master (or just at region server level). So later on, we can display the 
> loaded class names at shell as well as web console. 





[jira] [Updated] (HBASE-4070) [Coprocessors] Improve region server metrics to report loaded coprocessors to master

2011-09-22 Thread Eugene Koontz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koontz updated HBASE-4070:
-

Attachment: HBASE-4070.patch

> [Coprocessors] Improve region server metrics to report loaded coprocessors to 
> master
> 
>
> Key: HBASE-4070
> URL: https://issues.apache.org/jira/browse/HBASE-4070
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Mingjie Lai
>Assignee: Eugene Koontz
> Attachments: HBASE-4070.patch
>
>
> HBASE-3512 is about listing loaded cp classes at shell. To make it more 
> generic, we need a way to report this piece of information from region to 
> master (or just at region server level). So later on, we can display the 
> loaded class names at shell as well as web console. 





[jira] [Commented] (HBASE-4427) It would help to run a standalone HBase's ZK on a different port

2011-09-22 Thread Roman Shaposhnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113035#comment-13113035
 ] 

Roman Shaposhnik commented on HBASE-4427:
-

@Jean-Daniel Cryans 

If by local client you mean things like hbase shell -- that works quite nicely 
with the proposed change.

> It would help to run a standalone HBase's ZK on a different port
> 
>
> Key: HBASE-4427
> URL: https://issues.apache.org/jira/browse/HBASE-4427
> Project: HBase
>  Issue Type: Improvement
>  Components: zookeeper
>Affects Versions: 0.90.4
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
>
> It would be extremely helpful to have standalone HBase default to a 
> non-standard port for running its embedded ZK. This would help to run HBase 
> on the same host as a legitimate, fully distributed ZK server, etc.
> It seems that the following addition to hbase-default.xml would be enough to 
> make it happen:
> {noformat}
> +  <property>
> +    <name>hbase.zookeeper.property.clientPort</name>
> +    <value>4181</value>
> +  </property>
> {noformat}
> This will take care of the master/client for HBase and can be overridden in 
> hbase-site if needed.
> Thoughts?





[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)

2011-09-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113034#comment-13113034
 ] 

stack commented on HBASE-4296:
--

@Jon We want to replace it with something more performant, but we don't have 
the alternative at the moment.  The deprecation was meant to frighten folks 
away so that this becomes an internal-only method.  Then in 0.94 we could swap 
in the alternative.  Can you shim this in your thrift?  I suppose you can't if 
you want to do the region-locating logic over in your client on the other side 
of thrift.  And if you are building clients that come to depend on this, 
they'll have to be moved to the new method when it shows up.

This issue is for 0.92.  Will your fat thrift client ship against 0.92 or 0.94? 
 If 0.92, it's there.  If 0.94, perhaps HBASE-2600 needs to happen sooner rather 
than later? (Lars?)

> Deprecate HTable[Interface].getRowOrBefore(...)
> ---
>
> Key: HBASE-4296
> URL: https://issues.apache.org/jira/browse/HBASE-4296
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: 4296.txt
>
>
> HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. 
> That method was created to allow our scanning of .META. (see HBASE-2600).
> Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be 
> performant that a user of HTable will not be aware of.
> I propose deprecating this in the public interface in 0.92 and removing it 
> from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it 
> will still remain as internal interface for scanning meta.
> Comments?





[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)

2011-09-22 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113037#comment-13113037
 ] 

Jonathan Gray commented on HBASE-4296:
--

We are already using the fat thrift client on our 0.90 branch.  I'm in the 
process of pushing this all out into open source so we can then pull it back in 
to our 0.92 based branch.  I'm happy to put this stuff into 0.92 in Apache as 
well but it's somewhat featurish :)

Was the method removed in 0.94 already?  Can we just hold off on removing it 
until 2600 happens?  That way it won't matter and we can commit it anywhere.  
Following 2600 we can modify how it works and just use a normal scanner then?

> Deprecate HTable[Interface].getRowOrBefore(...)
> ---
>
> Key: HBASE-4296
> URL: https://issues.apache.org/jira/browse/HBASE-4296
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: 4296.txt
>
>
> HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. 
> That method was created to allow our scanning of .META. (see HBASE-2600).
> Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be 
> performant that a user of HTable will not be aware of.
> I propose deprecating this in the public interface in 0.92 and removing it 
> from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it 
> will still remain as internal interface for scanning meta.
> Comments?





[jira] [Commented] (HBASE-4463) Run more aggressive compactions during off peak hours

2011-09-22 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113040#comment-13113040
 ] 

dhruba borthakur commented on HBASE-4463:
-

Can we add something like the ability to throttle the max bandwidth per server 
allowed for compaction? (A similar philosophy is used in HDFS to ensure that 
background replication does not swamp the network.)
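
One simple way to picture such a throttle: after each chunk is written, compute how long the writer must pause so the average rate stays under a cap. The sketch below is an assumption-laden illustration; `CompactionThrottle`, `pauseMillis`, and the byte-per-second cap are invented names, and it is not the HDFS throttler.

```java
// Hypothetical bandwidth cap for a compaction writer; names are invented.
final class CompactionThrottle {
    // Returns how many milliseconds to pause so that bytesSoFar, spread over
    // elapsedMillis plus the pause, does not exceed maxBytesPerSec.
    static long pauseMillis(long bytesSoFar, long elapsedMillis, long maxBytesPerSec) {
        if (maxBytesPerSec <= 0) {
            throw new IllegalArgumentException("maxBytesPerSec must be positive");
        }
        // Minimum time these bytes may take at the capped rate, rounded up.
        long minMillis = (bytesSoFar * 1000L + maxBytesPerSec - 1) / maxBytesPerSec;
        return Math.max(0L, minMillis - elapsedMillis);
    }

    public static void main(String[] args) {
        // 10 MB written in 0.5 s against a 10 MB/s cap: pause for the rest of the second.
        System.out.println(pauseMillis(10_000_000L, 500L, 10_000_000L));
    }
}
```

A caller would sleep for the returned duration before writing the next chunk. The multiplication can overflow for extremely large byte counts, which a real implementation would need to guard against.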

> Run more aggressive compactions during off peak hours
> -
>
> Key: HBASE-4463
> URL: https://issues.apache.org/jira/browse/HBASE-4463
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Karthik Ranganathan
>Assignee: Karthik Ranganathan
>
> The number of iops on the disk and the top of the rack bandwidth utilization 
> at off peak hours is much lower than at peak hours depending on the 
> application usage pattern. We can utilize this knowledge to improve the 
> performance of the HBase cluster by increasing the compact selection ratio to 
> a much larger value during off-peak hours than otherwise - increasing 
> hbase.hstore.compaction.ratio (1.2 default) to 
> hbase.hstore.compaction.ratio.offpeak (5 default). This will help reduce the 
> average number of files per store.





[jira] [Updated] (HBASE-4461) Expose getRowOrBefore via Thrift

2011-09-22 Thread Jonathan Gray (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4461:
-

Attachment: HBASE-4461-v2.patch

Adds getRowOrBefore() exposed to Thrift.  Also adds server name and port to 
TRegionInfo so we can get assignment info through existing APIs in Thrift.

> Expose getRowOrBefore via Thrift
> 
>
> Key: HBASE-4461
> URL: https://issues.apache.org/jira/browse/HBASE-4461
> Project: HBase
>  Issue Type: Improvement
>  Components: thrift
>Reporter: Jonathan Gray
>Assignee: Jonathan Gray
> Attachments: HBASE-4461-v2.patch
>
>
> In order for fat Thrift-based clients to locate region locations they need to 
> utilize the getRowOrBefore method.





[jira] [Commented] (HBASE-2742) Provide strong authentication with a secure RPC engine

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113044#comment-13113044
 ] 

Ted Yu commented on HBASE-2742:
---

When I ran the test suite, I got:
{code}
initializationError(org.apache.hadoop.hbase.security.token.TestTokenAuthentication)  Time elapsed: 0.005 sec  <<< ERROR!
java.lang.NoClassDefFoundError: org/apache/hadoop/security/token/SecretManager
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
{code}

> Provide strong authentication with a secure RPC engine
> --
>
> Key: HBASE-2742
> URL: https://issues.apache.org/jira/browse/HBASE-2742
> Project: HBase
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Gary Helmling
>
> The HBase RPC code (org.apache.hadoop.hbase.ipc.*) was originally forked off 
> of Hadoop RPC classes, with some performance tweaks added.  Those 
> optimizations have come at a cost in keeping up with Hadoop RPC changes 
> however, both bug fixes and improvements/new features.  
> In particular, this impacts how we implement security features in HBase (see 
> HBASE-1697 and HBASE-2016).  The secure Hadoop implementation (HADOOP-4487) 
> relies heavily on RPC changes to support client authentication via kerberos 
> and securing and mutual authentication of client/server connections via SASL. 
>  Making use of the built-in Hadoop RPC classes will gain us these pieces for 
> free in a secure HBase.
> So, I'm proposing that we drop the HBase forked version of RPC and convert to 
> direct use of Hadoop RPC, while working to contribute important fixes back 
> upstream to Hadoop core.  Based on a review of the HBase RPC changes, the key 
> divergences seem to be:
> HBaseClient:
>  - added use of TCP keepalive (HBASE-1754)
>  - made connection retries and sleep configurable (HBASE-1815)
>  - prevent NPE if socket == null due to creation failure (HBASE-2443)
> HBaseRPC:
>  - mapping of method names <-> codes (removed in HBASE-2219)
> HBaseServer:
>  - use of TCP keep alives (HBASE-1754)
>  - OOME in server does not trigger abort (HBASE-1198)
> HbaseObjectWritable:
>  - allows List<> serialization
>  - includes its own class <-> code mapping (HBASE-328)
> Proposed process is:
> 1. open issues with patches on Hadoop core for important fixes/adjustments 
> from HBase RPC (HBASE-1198, HBASE-1815, HBASE-1754, HBASE-2443, plus a 
> pluggable ObjectWritable implementation in RPC.Invocation to allow use of 
> HbaseObjectWritable).
> 2. ship a Hadoop version with RPC patches applied -- ideally we should avoid 
> another copy-n-paste code fork, subject to ability to isolate changes from 
> impacting Hadoop internal RPC wire formats
> 3. if all Hadoop core patches are applied we can drop back to a plain vanilla 
> Hadoop version
> I realize there are many different opinions on how to proceed with HBase RPC, 
> so I'm hoping this issue will kick off a discussion on what the best approach 
> might be.  My own motivation is maximizing re-use of the authentication and 
> connection security work that's already gone into Hadoop core.  I'll put 
> together a set of patches around #1 and #2, but obviously we need some 
> consensus around this to move forward.  If I'm missing other differences 
> between HBase and Hadoop RPC, please list as well.  Discuss!





[jira] [Commented] (HBASE-4070) [Coprocessors] Improve region server metrics to report loaded coprocessors to master

2011-09-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113049#comment-13113049
 ] 

jirapos...@reviews.apache.org commented on HBASE-4070:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2029/
---

Review request for hbase and Mingjie Lai.


Summary
---

Proposed fix for HBASE-4070. Relies on functionality provided by HBASE-4014.


This addresses bug HBASE-4070.
https://issues.apache.org/jira/browse/HBASE-4070


Diffs
-

  src/main/java/org/apache/hadoop/hbase/HServerLoad.java 0c680e4 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java bff1f6c 
  src/test/java/org/apache/hadoop/hbase/coprocessor/TestCoprocessorEndpoint.java a8f2a9c 

Diff: https://reviews.apache.org/r/2029/diff


Testing
---

One new test : testServerManagerCoprocessorReport, added to 
src/test/java/o.a.h.h/coprocessor/TestCoprocessorEndpoint.java.


Thanks,

Eugene



> [Coprocessors] Improve region server metrics to report loaded coprocessors to 
> master
> 
>
> Key: HBASE-4070
> URL: https://issues.apache.org/jira/browse/HBASE-4070
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.3
>Reporter: Mingjie Lai
>Assignee: Eugene Koontz
> Attachments: HBASE-4070.patch
>
>
> HBASE-3512 is about listing loaded cp classes at shell. To make it more 
> generic, we need a way to report this piece of information from region to 
> master (or just at region server level). So later on, we can display the 
> loaded class names at shell as well as web console. 





[jira] [Commented] (HBASE-4427) It would help to run a standalone HBase's ZK on a different port

2011-09-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113050#comment-13113050
 ] 

Jean-Daniel Cryans commented on HBASE-4427:
---

I was referring to the embeddedClientPort idea. The idea of setting 
hbase.zookeeper.property.clientPort to a new default breaks about every setup I 
know of.

> It would help to run a standalone HBase's ZK on a different port
> 
>
> Key: HBASE-4427
> URL: https://issues.apache.org/jira/browse/HBASE-4427
> Project: HBase
>  Issue Type: Improvement
>  Components: zookeeper
>Affects Versions: 0.90.4
>Reporter: Roman Shaposhnik
>Assignee: Roman Shaposhnik
>Priority: Minor
>
> It would be extremely helpful to have standalone HBase default to a 
> non-standard port for running its embedded ZK. This would help to run HBase 
> on the same host as a legitimate, fully distributed ZK server, etc.
> It seems that the following addition to hbase-default.xml would be enough to 
> make it happen:
> {noformat}
> +  <property>
> +    <name>hbase.zookeeper.property.clientPort</name>
> +    <value>4181</value>
> +  </property>
> {noformat}
> This will take care of the master/client for HBase and can be overridden in 
> hbase-site if needed.
> Thoughts?





[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113052#comment-13113052
 ] 

jirapos...@reviews.apache.org commented on HBASE-4455:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2007/
---

(Updated 2011-09-23 00:15:11.441661)


Review request for hbase.


Changes
---

Thanks folks for the review. I have fixed all the issues raised. The fix for 
the test failure is to make sure the deadserver-in-progress count is correct by 
calling this.deadServers.add(serverName) before MetaServerShutdownHandler 
schedules ServerShutdownHandler.

 


Summary
---

1. Add more logging.
2. Clean up CatalogTracker. waitForMeta waits for the "timeout" value. When 
waitForMetaServerConnectionDefault is called by MetaNodeTracker, the timeout 
value is large, so it doesn't retry in case -ROOT- is updated. Add the proper 
implementation for CatalogTracker.verifyMetaRegionLocation.
4. Check for the latest -ROOT- and .META. region location during the handling 
of server shutdown.
5. Right after assigning the -ROOT- or .META. in ServerShutdownHandler, don't 
block and wait for .META. availability. Resubmit another ServerShutdownHandler 
for regular regions.


This addresses bug HBASE-4455.
https://issues.apache.org/jira/browse/HBASE-4455


Diffs (updated)
-

  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java 1172205 
  http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java 1172205 

Diff: https://reviews.apache.org/r/2007/diff


Testing
---

Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
RS2, wait for 2 seconds, etc. The program can run for a couple of hours until 
it stops. -ROOT- and .META. are available during that time.


Thanks,

Ming



> Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
> AssignmentManager
> --
>
> Key: HBASE-4455
> URL: https://issues.apache.org/jira/browse/HBASE-4455
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
>
> Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
> wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
> RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. 
> regions aren't in "regions in transition" from AssignmentManager's point of 
> view, but they aren't assigned to any region server. Here are the issues.
> 1. The -ROOT- or .META. location is stale when MetaServerShutdownHandler is 
> invoked to check if it contains the -ROOT- region. That is due to the long 
> delay in ZK notification and the async nature of the system. Here is an 
> example: even though the new root region server 
> sea-lab-1,60020,1316380133656 is set at T2, at T3, during the shutdown 
> process for sea-lab-1,60020,1316380133656, the root location still points to 
> the old server sea-lab-3,60020,1316380037898.
> T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:6
> -0x1327e43175e Retrieved 29 byte(s) of data from znode 
> /hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898
>

[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113053#comment-13113053
 ] 

Hudson commented on HBASE-4449:
---

Integrated in HBase-0.92 #14 (See 
[https://builds.apache.org/job/HBase-0.92/14/])
HBASE-4449  LoadIncrementalHFiles should be able to handle CFs with blooms
   (David Revell)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java


> LoadIncrementalHFiles should be able to handle CFs with blooms
> --
>
> Key: HBASE-4449
> URL: https://issues.apache.org/jira/browse/HBASE-4449
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: David Revell
>Assignee: David Revell
> Fix For: 0.90.5
>
> Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
> HBASE-4449.patch
>
>
> When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
> it will split the file at the boundary to create two store files. If the 
> store file is for a column family that has a bloom filter, then a 
> "java.lang.ArithmeticException: / by zero" will be raised because 
> ByteBloomFilter() is called with maxKeys of 0.
> The included patch assumes that the number of keys in each split child will 
> be equal to the number of keys in the parent's bloom filter (instead of 0). 
> This is an overestimate, but it's safe and easy.
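
For readers unfamiliar with the failure mode, the sketch below reproduces it in isolation. It is illustrative only: the exact expression inside ByteBloomFilter is not shown in this issue, so `BloomSizingSketch` and both of its methods are invented to demonstrate how a zero key count breaks an integer division and how reusing the parent's key count avoids it.

```java
// Illustrative only; not the ByteBloomFilter code, whose internals are not
// shown in the issue.
final class BloomSizingSketch {
    // Integer division: throws ArithmeticException when maxKeys is 0.
    static int bitsPerKey(int totalBits, int maxKeys) {
        return totalBits / maxKeys;
    }

    // The patch's idea: size each split child for as many keys as the
    // parent's bloom filter held. This overestimates but is never zero for
    // a non-empty parent.
    static int childMaxKeys(int parentKeyCount) {
        return parentKeyCount;
    }

    public static void main(String[] args) {
        int parentKeys = 1000;
        System.out.println(bitsPerKey(8000, childMaxKeys(parentKeys)));
        try {
            bitsPerKey(8000, 0); // what happened with maxKeys of 0
        } catch (ArithmeticException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```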





[jira] [Updated] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-22 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-4131:


Attachment: replicationInterface3.txt

I removed redundant imports and rearranged the order of the constructors in 
Replication.java.

I left the constants in HConstants.java because they are needed by HBase even 
if no replication module is configured. But please feel free to put them 
somewhere else (at time of commit) if you so desire.

I also left the FileSystem object as a parameter into the initialize call, 
otherwise more code changes will be required in the Replication module. I can 
do that as a separate jira if you so desire (keeps this jira easier to review).

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: replicationInterface1.txt, replicationInterface2.txt, 
> replicationInterface3.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Updated] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-22 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-4131:


Release Note: The Replication Interface can be used to plug in external 
software for the purpose of cluster-to-cluster HBase replication.
  Status: Patch Available  (was: Open)

> Make the Replication Service pluggable via a standard interface definition
> --
>
> Key: HBASE-4131
> URL: https://issues.apache.org/jira/browse/HBASE-4131
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: replicationInterface1.txt, replicationInterface2.txt, 
> replicationInterface3.txt
>
>
> The current HBase code supports a replication service that can be used to 
> sync data from one hbase cluster to another. It would be nice to make it 
> a pluggable interface so that other cross-data-center replication services 
> can be used in conjunction with HBase.





[jira] [Updated] (HBASE-3833) ability to support includes/excludes list in Hbase

2011-09-22 Thread dhruba borthakur (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HBASE-3833:


Status: Open  (was: Patch Available)

Not ready yet.

> ability to support includes/excludes list in Hbase
> --
>
> Key: HBASE-3833
> URL: https://issues.apache.org/jira/browse/HBASE-3833
> Project: HBase
>  Issue Type: Improvement
>  Components: client, regionserver
>Affects Versions: 0.90.2
>Reporter: dhruba borthakur
>Assignee: dhruba borthakur
> Attachments: excl-patch.txt, excl-patch.txt
>
>
> An HBase cluster currently does not have the ability to specify that the 
> master should accept regionservers only from a specified list. This helps 
> prevent administrative errors where the same machine could be included in 
> two clusters. It also allows the administrator to easily remove un-ssh-able 
> machines from the cluster.





[jira] [Commented] (HBASE-4296) Deprecate HTable[Interface].getRowOrBefore(...)

2011-09-22 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113083#comment-13113083
 ] 

Lars Hofhansl commented on HBASE-4296:
--

I wonder if we should just rename it to scanMeta or something and force it to 
work on .META. only.
Then nobody would use it unless it's for .META. scanning, and we can then 
change the implementation. 




> Deprecate HTable[Interface].getRowOrBefore(...)
> ---
>
> Key: HBASE-4296
> URL: https://issues.apache.org/jira/browse/HBASE-4296
> Project: HBase
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.92.0
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Trivial
> Fix For: 0.92.0
>
> Attachments: 4296.txt
>
>
> HTable's getRowOrBefore(...) internally calls into Store.getRowKeyAtOrBefore. 
> That method was created to allow our scanning of .META. (see HBASE-2600).
> Store.getRowKeyAtOrBefore(...) lists a bunch of requirements for this to be 
> performant that a user of HTable will not be aware of.
> I propose deprecating this in the public interface in 0.92 and removing it 
> from the public interface in 0.94. If we don't get to HBASE-2600 in 0.94 it 
> will still remain as internal interface for scanning meta.
> Comments?





[jira] [Updated] (HBASE-4153) Handle RegionAlreadyInTransitionException in AssignmentManager

2011-09-22 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4153:
--

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

> Handle RegionAlreadyInTransitionException in AssignmentManager
> --
>
> Key: HBASE-4153
> URL: https://issues.apache.org/jira/browse/HBASE-4153
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.0
>Reporter: Jean-Daniel Cryans
>Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.0
>
> Attachments: 4153-v3.txt, HBASE-4153_1.patch, HBASE-4153_2.patch, 
> HBASE-4153_3.patch, HBASE-4153_4.patch, HBASE-4153_5.patch, 
> HBASE-4153_6.patch, HBASE-4153_addendum.patch
>
>
> Comment from Stack over in HBASE-3741:
> {quote}
> Question: Looking at this patch again, if we throw a 
> RegionAlreadyInTransitionException, won't we just assign the region elsewhere 
> though RegionAlreadyInTransitionException in at least one case here is saying 
> that the region is already open on this regionserver?
> {quote}
> Indeed looking at the code it's going to be handled the same way other 
> exceptions are. Need to add special cases for assign and unassign.





[jira] [Commented] (HBASE-4014) Coprocessors: Flag the presence of coprocessors in logged exceptions

2011-09-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113098#comment-13113098
 ] 

jirapos...@reviews.apache.org commented on HBASE-4014:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/969/
---

(Updated 2011-09-23 01:39:08.251129)


Review request for hbase, Gary Helmling and Mingjie Lai.


Changes
---

Updated patch that addresses Gary's remarks:
-use fail() rather than assertFalse(..,true);
-use RegionServer.isAborted() rather than Zookeeper Watches.
-use X.class.getName() to get coprocessor's names to improve test robustness.


Summary
---

https://issues.apache.org/jira/browse/HBASE-4014 Coprocessors: Flag the 
presence of coprocessors in logged exceptions

The general gist here is to wrap each of {Master,RegionServer}CoprocessorHost's 
coprocessor call inside a 

"try { ... } catch (Throwable e) { handleCoprocessorThrowable(e) }"

block. 

handleCoprocessorThrowable() is responsible for either passing 'e' along to the 
client (if 'e' is an IOException) or, otherwise, aborting the service 
(Regionserver or Master).

The abort message contains a list of the loaded coprocessors for crash analysis.
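
The wrapping pattern described above could look roughly like the following sketch. Only the name handleCoprocessorThrowable comes from the summary; the class CoprocessorGuard, the Abortable and Hook interfaces, and the abort message format are assumptions made for illustration and are not taken from the patch.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch; only handleCoprocessorThrowable is named in the review summary. */
final class CoprocessorGuard {
  /** Stand-in for the hosting service (RegionServer or Master). */
  interface Abortable {
    void abort(String why, Throwable cause);
  }

  /** Stand-in for a single coprocessor call. */
  interface Hook {
    void run() throws Exception;
  }

  private final Abortable service;
  private final List<String> loadedCoprocessors;

  CoprocessorGuard(Abortable service, List<String> loadedCoprocessors) {
    this.service = service;
    this.loadedCoprocessors = loadedCoprocessors;
  }

  /** Runs one coprocessor call inside the try/catch described above. */
  void call(Hook hook) throws IOException {
    try {
      hook.run();
    } catch (Throwable e) {
      handleCoprocessorThrowable(e);
    }
  }

  /** IOExceptions go back to the client; anything else aborts the service. */
  void handleCoprocessorThrowable(Throwable e) throws IOException {
    if (e instanceof IOException) {
      throw (IOException) e;
    }
    // Include the loaded coprocessors in the abort message for crash analysis.
    service.abort("Coprocessor threw " + e + "; loaded coprocessors: "
        + loadedCoprocessors, e);
  }
}
```

Keeping the IOException path separate means an ordinary client-visible failure does not take the whole service down.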


This addresses bug HBASE-4014.
https://issues.apache.org/jira/browse/HBASE-4014


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java 
4e492e1 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java 06bf814 
  src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java 
0c95017 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java bff1f6c 
  src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java 
a6cf6a8 
  src/main/resources/hbase-default.xml 2c8f44b 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithAbort.java
 PRE-CREATION 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterCoprocessorExceptionWithRemove.java
 PRE-CREATION 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
 PRE-CREATION 
  
src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/969/diff


Testing
---

patch includes two tests:

TestMasterCoprocessorException.java
TestRegionServerCoprocessorException.java

both tests pass in my build environment.


Thanks,

Eugene



> Coprocessors: Flag the presence of coprocessors in logged exceptions
> 
>
> Key: HBASE-4014
> URL: https://issues.apache.org/jira/browse/HBASE-4014
> Project: HBase
>  Issue Type: Improvement
>  Components: coprocessors
>Reporter: Andrew Purtell
>Assignee: Eugene Koontz
> Fix For: 0.92.0
>
> Attachments: HBASE-4014.patch, HBASE-4014.patch, HBASE-4014.patch, 
> HBASE-4014.patch, HBASE-4014.patch
>
>
> For some initial triage of bug reports for core versus for deployments with 
> loaded coprocessors, we need something like the Linux kernel's taint flag, 
> and list of linked in modules that show up in the output of every OOPS, to 
> appear above or below exceptions that appear in the logs.





[jira] [Commented] (HBASE-4449) LoadIncrementalHFiles should be able to handle CFs with blooms

2011-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113108#comment-13113108
 ] 

Hudson commented on HBASE-4449:
---

Integrated in HBase-TRUNK #2244 (See 
[https://builds.apache.org/job/HBase-TRUNK/2244/])
HBASE-4449  LoadIncrementalHFiles should be able to handle CFs with blooms
   (David Revell)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java


> LoadIncrementalHFiles should be able to handle CFs with blooms
> --
>
> Key: HBASE-4449
> URL: https://issues.apache.org/jira/browse/HBASE-4449
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.4
>Reporter: David Revell
>Assignee: David Revell
> Fix For: 0.90.5
>
> Attachments: HBASE-4449-trunk-testsonly.patch, HBASE-4449-v2.patch, 
> HBASE-4449.patch
>
>
> When LoadIncrementalHFiles loads a store file that crosses region boundaries, 
> it will split the file at the boundary to create two store files. If the 
> store file is for a column family that has a bloom filter, then a 
> "java.lang.ArithmeticException: / by zero" will be raised because 
> ByteBloomFilter() is called with maxKeys of 0.
> The included patch assumes that the number of keys in each split child will 
> be equal to the number of keys in the parent's bloom filter (instead of 0). 
> This is an overestimate, but it's safe and easy.





[jira] [Commented] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113109#comment-13113109
 ] 

Hudson commented on HBASE-4452:
---

Integrated in HBase-TRUNK #2244 (See 
[https://builds.apache.org/job/HBase-TRUNK/2244/])
HBASE-4452  Possibility of RS opening a region though tickleOpening fails 
due to
   znode version mismatch (Ramkrishna)

tedyu : 
Files : 
* /hbase/trunk/CHANGES.txt
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4452.90, HBASE-4452.patch
>
>
> Consider the following code
> {code}
> long period = Math.max(1, assignmentTimeout / 3);
> long lastUpdate = now;
> while (!signaller.get() && t.isAlive() && !this.server.isStopped() &&
>     !this.rsServices.isStopping() && (endTime > now)) {
>   long elapsed = now - lastUpdate;
>   if (elapsed > period) {
>     // Only tickle OPENING if postOpenDeployTasks is taking some time.
>     lastUpdate = now;
>     tickleOpening("post_open_deploy");
>   }
> {code}
> Whenever the postOpenDeploy tasks take considerable time, we try to 
> tickleOpening so that no timeout is detected.  But before it can do 
> this, if the TimeoutMonitor tries to assign the node to another RS, then the 
> other RS will move the node from OFFLINE to OPENING.  Hence when the first RS 
> tries to do tickleOpening the operation will fail. Now here lies the problem,
> {code}
> String encodedName = this.regionInfo.getEncodedName();
> try {
>   this.version =
>     ZKAssign.retransitionNodeOpening(server.getZooKeeper(),
>       this.regionInfo, this.server.getServerName(), this.version);
> } catch (KeeperException e) {
> {code}
> Now this.version becomes -1 as the operation failed.
> Now, as in the first code snippet, the return value is not captured after 
> tickleOpening() fails, so we go on with moving the node to OPENED.  Here again we 
> don't have any check for this condition, as the version has already been 
> changed to -1.  Hence the OPENING to OPENED transition succeeds. Chances of 
> double assignment.
> {noformat}
> 2011-09-22 00:57:29,930 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempt to transition the unassigned 
> node for 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENING failed, the node existed but was version 5 not the 
> expected version 2
> 2011-09-22 00:57:33,494 WARN 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
> refreshing OPENING; region=69797d064f773d1aa9adba56e7ff90a3, 
> context=post_open_deploy
> 2011-09-22 00:58:02,356 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempting to transition node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:11,853 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Successfully transitioned node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:13,956 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
> t9,,1316633193606.69797d064f773d1aa9adba56e7ff90a3.
> {noformat}
> Correct me if this analysis is wrong.
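
The sequence described above can be reproduced with a toy in-memory model. This is a sketch under stated assumptions, not the ZKAssign or OpenRegionHandler code: VersionedNode and OpenRegionSketch are hypothetical classes, and the model adopts the ZooKeeper convention that an expected version of -1 matches any version, which is what lets the final transition succeed once the stored version has been overwritten with -1.

```java
/** Toy in-memory model of a versioned znode; not the ZKAssign API. */
final class VersionedNode {
  private int version = 0;
  private String state = "OFFLINE";

  /**
   * Sets the state if expectedVersion matches the current version.
   * An expected version of -1 matches any version.
   * Returns the new version, or -1 on a version mismatch.
   */
  synchronized int transition(String newState, int expectedVersion) {
    if (expectedVersion != -1 && expectedVersion != version) {
      return -1;
    }
    state = newState;
    return ++version;
  }

  synchronized String state() {
    return state;
  }
}

/** Hypothetical region-open flows built on the toy node above. */
final class OpenRegionSketch {
  /** Buggy flow: ignores the failed tickle and reuses the -1 it returned. */
  static boolean openIgnoringTickleResult(VersionedNode node, int myVersion) {
    myVersion = node.transition("OPENING", myVersion); // tickle; may return -1
    return node.transition("OPENED", myVersion) != -1; // -1 matches any version
  }

  /** Guarded flow: stop as soon as the tickle fails. */
  static boolean openCheckingTickleResult(VersionedNode node, int myVersion) {
    int refreshed = node.transition("OPENING", myVersion);
    if (refreshed == -1) {
      return false; // another server has taken over the node
    }
    return node.transition("OPENED", refreshed) != -1;
  }
}
```

In the buggy flow the first server moves the node to OPENED even though a second server had already re-claimed it, which is the double assignment the report warns about. The guarded flow gives up instead.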





[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113123#comment-13113123
 ] 

jirapos...@reviews.apache.org commented on HBASE-4455:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2007/#review2033
---

Ship it!


One minor comment. Can be fixed at time of commit.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java


addressFromZK != null can be omitted - it is the condition of if block.


- Ted


On 2011-09-23 00:15:11, Ming Ma wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2007/
bq.  ---
bq.  
bq.  (Updated 2011-09-23 00:15:11)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  1. Add more logging.
bq.  2. Clean up CatalogTracker. waitForMeta waits for "timeout" value. When 
waitForMetaServerConnectionDefault is called by MetaNodeTracker, the timeout 
value is large. So it doesn't retry in case -ROOT- is updated; add the proper 
implementation for CatalogTracker.verifyMetaRegionLocation
bq.  4. Check for the latest -ROOT- and .META. region location during the 
handling of server shutdown.
bq.  5. Right after assigning the -ROOT- or .META. in ServerShutdownHandler, 
don't block and wait for .META. availability. Resubmit another 
ServerShutdownHandler for regular regions.
bq.  
bq.  
bq.  This addresses bug HBASE-4455.
bq.  https://issues.apache.org/jira/browse/HBASE-4455
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/MetaServerShutdownHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperNodeTracker.java
 1172205 
bq.  
bq.  Diff: https://reviews.apache.org/r/2007/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Keep Master up all the time, do rolling restart of RSs like this - stop 
RS1, wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, 
start RS2, wait for 2 seconds, etc. The program can run for a couple of hours until 
it stops. -ROOT- and .META. are available during that time.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Ming
bq.  
bq.



> Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
> AssignmentManager
> --
>
> Key: HBASE-4455
> URL: https://issues.apache.org/jira/browse/HBASE-4455
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
>
> Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
> wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
> RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. 
> regions aren't in "regions in transition" from AssignmentManager point of 
> view, but they aren't assigned to any region server. Here are the issues.
> 1. -ROOT- or .META. location is stale when MetaServerShutdownHandler is 
> invoked to check if it contains -ROOT- region. That is due to 

[jira] [Created] (HBASE-4465) Lazy-seek optimization for StoreFile scanners

2011-09-22 Thread Mikhail Bautin (JIRA)
Lazy-seek optimization for StoreFile scanners
-

 Key: HBASE-4465
 URL: https://issues.apache.org/jira/browse/HBASE-4465
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
 Fix For: 0.92.0, 0.94.0, 0.89.20100924


Previously, if we had several StoreFiles for a column family in a region, we 
would seek in each of them and only then merge the results, even though the 
row/column we are looking for might only be in the most recent (and the 
smallest) file. Now we prioritize our reads from those files so that we check 
the most recent file first. This is done by doing a "lazy seek" which pretends 
that the next value in the StoreFile is (seekRow, seekColumn, 
lastTimestampInStoreFile), which is earlier in the KV order than anything that 
might actually occur in the file. So if we don't find the result in earlier 
files, that fake KV will bubble up to the top of the KV heap and a real seek 
will be done. This is expected to significantly reduce the amount of disk IO 
(as of 09/22/2011 we are doing dark launch testing and measurement).

This is joint work with Liyin Tang -- huge thanks to him for many helpful 
discussions on this and the idea of putting fake KVs with the highest timestamp 
of the StoreFile in the scanner priority queue.
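
As a rough illustration of the lazy-seek idea described above, the following toy model handles a single (row, column) lookup. It is a sketch only: LazySeekSketch, StoreFile, Entry and get are hypothetical names and do not correspond to the HBase scanner classes. Each file enters the heap with a fake entry carrying the file's highest timestamp, and a real seek happens only when that fake entry reaches the top of the heap.

```java
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Toy model of lazy seek for one (row, column); all names are hypothetical. */
final class LazySeekSketch {
  /** One store file: timestamp -> value for the single key being read. */
  static final class StoreFile {
    final TreeMap<Long, String> cells = new TreeMap<>();
    final long maxTimestamp; // highest timestamp of any cell in the file
    int realSeeks = 0;

    StoreFile(long maxTimestamp) {
      this.maxTimestamp = maxTimestamp;
    }

    /** The expensive operation: newest timestamp for the key, or null if absent. */
    Long realSeek() {
      realSeeks++;
      return cells.isEmpty() ? null : cells.lastKey();
    }
  }

  /** Heap entry: a fake key (no seek done yet) or a real one. */
  private static final class Entry {
    final StoreFile file;
    final long timestamp;
    final boolean fake;

    Entry(StoreFile file, long timestamp, boolean fake) {
      this.file = file;
      this.timestamp = timestamp;
      this.fake = fake;
    }
  }

  /** Returns the newest value, seeking in a file only when its entry is on top. */
  static String get(List<StoreFile> files) {
    PriorityQueue<Entry> heap = new PriorityQueue<>((a, b) -> {
      int byTs = Long.compare(b.timestamp, a.timestamp); // newer first
      if (byTs != 0) {
        return byTs;
      }
      return Boolean.compare(b.fake, a.fake); // fake before real on ties
    });
    for (StoreFile f : files) {
      heap.add(new Entry(f, f.maxTimestamp, true));
    }
    while (!heap.isEmpty()) {
      Entry top = heap.poll();
      if (!top.fake) {
        return top.file.cells.get(top.timestamp);
      }
      Long ts = top.file.realSeek(); // the fake entry bubbled up: seek for real
      if (ts != null) {
        heap.add(new Entry(top.file, ts, false));
      }
    }
    return null;
  }
}
```

If the newest file holds the key, the older files are never touched. If it does not, its fake entry is discarded and the next file's fake entry surfaces and triggers a real seek there.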






[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113135#comment-13113135
 ] 

Ted Yu commented on HBASE-4455:
---

All unit tests passed.

> Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
> AssignmentManager
> --
>
> Key: HBASE-4455
> URL: https://issues.apache.org/jira/browse/HBASE-4455
> Project: HBase
>  Issue Type: Bug
>Reporter: Ming Ma
>Assignee: Ming Ma
>
> Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
> wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
> RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. 
> regions aren't in "regions in transition" from AssignmentManager point of 
> view, but they aren't assigned to any region server. Here are the issues.
> 1. -ROOT- or .META. location is stale when MetaServerShutdownHandler is 
> invoked to check if it contains -ROOT- region. That is due to long delay from 
> ZK notification and async nature of the system. Here is an example, even 
> though new root region server sea-lab-1,60020,1316380133656 is set at T2, at 
> T3 the shutdown process for sea-lab-1,60020,1316380133656, the root location 
> still points to old server sea-lab-3,60020,1316380037898.
> T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:6
> -0x1327e43175e Retrieved 29 byte(s) of data from znode 
> /hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898
> T2: 2011-09-18 14:08:57,173 INFO 
> org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region 
> location in ZooKeeper as sea-lab-1,60020,1316380133656
> T3: 2011-09-18 14:10:26,393 DEBUG 
> org.apache.hadoop.hbase.master.ServerManager: 
> Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler 
> to be executed, root=false, meta=true, current Root Location: 
> sea-lab-3,60020,1316380037898
> T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
> master:6
> -0x1327e43175e Retrieved 29 byte(s) of data from znode 
> /hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656
> 2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or 
> .META. availability could be blocked. If, meanwhile, the new server to which 
> -ROOT- or .META. is being assigned restarts, another instance of 
> MetaServerShutdownHandler is queued. Eventually, all 
> MetaServerShutdownHandler worker threads are filled up. It looks like 
> HBASE-4245.





[jira] [Commented] (HBASE-4459) HbaseObjectWritable code is a byte, we will eventually run out of codes

2011-09-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113136#comment-13113136
 ] 

Ted Yu commented on HBASE-4459:
---

+1 on making code of type short.

> HbaseObjectWritable code is a byte, we will eventually run out of codes
> ---
>
> Key: HBASE-4459
> URL: https://issues.apache.org/jira/browse/HBASE-4459
> Project: HBase
>  Issue Type: Bug
>  Components: io
>Reporter: Jonathan Gray
>Priority: Critical
> Fix For: 0.94.0
>
>
> There are about 90 classes/codes in HbaseObjectWritable currently and 
> Byte.MAX_VALUE is 127.  In addition, anyone wanting to add custom classes but 
> not break compatibility might want to leave a gap before using codes and 
> that's difficult in such limited space.
> Eventually we should get rid of this pattern that makes compatibility 
> difficult (better client/server protocol handshake) but we should probably at 
> least bump this to a short for 0.94.
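
A small sketch of why the width of the code matters. This does not use HbaseObjectWritable; ClassCodeWidth and its two methods are hypothetical helpers that simply round-trip a numeric code through a one-byte and a two-byte encoding.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of byte-sized versus short-sized class codes. */
final class ClassCodeWidth {
  /** Encodes the code in one signed byte and reads it back. */
  static int roundTripAsByte(int code) {
    return ByteBuffer.allocate(1).put((byte) code).get(0);
  }

  /** Encodes the code in two bytes (a signed short) and reads it back. */
  static int roundTripAsShort(int code) {
    return ByteBuffer.allocate(2).putShort((short) code).getShort(0);
  }
}
```

Codes up to Byte.MAX_VALUE (127) survive the one-byte encoding, while a code such as 200 wraps around to a negative value. The two-byte encoding leaves room up to Short.MAX_VALUE (32767), which also makes it practical to reserve gaps for custom classes.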





[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk

2011-09-22 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113141#comment-13113141
 ] 

Lars Hofhansl commented on HBASE-4344:
--

I ran TestSplitTransaction again with the patch but with the various high-volume 
LOG.debugs removed. It is back to the original speed.

So IMHO, we cannot ship this with all the LOG.debug statements, because it'd be 
so slow that nobody could switch on debug logging to diagnose a production 
problem.


> Persist memstoreTS to disk
> --
>
> Key: HBASE-4344
> URL: https://issues.apache.org/jira/browse/HBASE-4344
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Amitanand Aiyer
>Assignee: Amitanand Aiyer
> Fix For: 0.89.20100924
>
> Attachments: 4344-v2.txt, 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 
> 4344-v7.txt, patch-2
>
>
> Atomicity can be achieved in two ways -- (i) by using  a multiversion 
> concurrency system (MVCC), or (ii) by ensuring that "new" writes do not 
> complete, until the "old" reads complete.
> Currently, Memstore uses something along the lines of MVCC (called RWCC for 
> read-write-consistency-control). But, this mechanism is not incorporated for 
> the key-values written to the disk, as they do not include the memstore TS.
> Let us make the two approaches be similar, by persisting the memstoreTS along 
> with the key-value when it is written to the disk.





[jira] [Commented] (HBASE-4452) Possibility of RS opening a region though tickleOpening fails due to znode version mismatch

2011-09-22 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113145#comment-13113145
 ] 

Hudson commented on HBASE-4452:
---

Integrated in HBase-0.92 #15 (See 
[https://builds.apache.org/job/HBase-0.92/15/])
HBASE-4452  Possibility of RS opening a region though tickleOpening fails 
due to
   znode version mismatch (Ramkrishna)

tedyu : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java


> Possibility of RS opening a region though tickleOpening fails due to znode 
> version mismatch
> ---
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4452.90, HBASE-4452.patch
>
>
> Consider the following code
> {code}
> long period = Math.max(1, assignmentTimeout / 3);
> long lastUpdate = now;
> while (!signaller.get() && t.isAlive() && !this.server.isStopped() &&
>     !this.rsServices.isStopping() && (endTime > now)) {
>   long elapsed = now - lastUpdate;
>   if (elapsed > period) {
>     // Only tickle OPENING if postOpenDeployTasks is taking some time.
>     lastUpdate = now;
>     tickleOpening("post_open_deploy");
>   }
> {code}
> Whenever the postOpenDeploy tasks take considerable time, we try to 
> tickleOpening so that no timeout is detected.  But before it can do 
> this, if the TimeoutMonitor tries to assign the node to another RS, then the 
> other RS will move the node from OFFLINE to OPENING.  Hence when the first RS 
> tries to do tickleOpening the operation will fail. Now here lies the problem,
> {code}
> String encodedName = this.regionInfo.getEncodedName();
> try {
>   this.version =
>     ZKAssign.retransitionNodeOpening(server.getZooKeeper(),
>       this.regionInfo, this.server.getServerName(), this.version);
> } catch (KeeperException e) {
> {code}
> Now this.version becomes -1 as the operation failed.
> Now, as in the first code snippet, the return value is not captured after 
> tickleOpening() fails, so we go on with moving the node to OPENED.  Here again we 
> don't have any check for this condition, as the version has already been 
> changed to -1.  Hence the OPENING to OPENED transition succeeds. Chances of 
> double assignment.
> {noformat}
> 2011-09-22 00:57:29,930 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempt to transition the unassigned 
> node for 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENING failed, the node existed but was version 5 not the 
> expected version 2
> 2011-09-22 00:57:33,494 WARN 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed 
> refreshing OPENING; region=69797d064f773d1aa9adba56e7ff90a3, 
> context=post_open_deploy
> 2011-09-22 00:58:02,356 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Attempting to transition node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:11,853 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
> regionserver:60020-0x1328ceaa1ff000d Successfully transitioned node 
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to 
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:13,956 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
> t9,,1316633193606.69797d064f773d1aa9adba56e7ff90a3.
> {noformat}
> Correct me if this analysis is wrong.





[jira] [Created] (HBASE-4466) Fix accounting of the number of blocks read for the one-level index case

2011-09-22 Thread Kannan Muthukkaruppan (JIRA)
Fix accounting of the number of blocks read for the one-level index case


 Key: HBASE-4466
 URL: https://issues.apache.org/jira/browse/HBASE-4466
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan
Assignee: Mikhail Bautin


In HFileBlockIndex.seekToDataBlock, if the current block is the same as the 
requested block, then in the 1-level index case, we were reading the block from cache 
again. Although this would be a cache hit, it unnecessarily increases the block 
cache read count and block cache hit count, and also makes it harder to keep 
the accounting straight for tests like TestBlocksRead (introduced in 
HBASE-4450).

Basically, even in the 1-level index case, in 
HFileBlockIndex.seekToDataBlock(), if currentBlock.getOffset() == 
currentOffset, we can avoid looking up the block in cache. 
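
A minimal sketch of the check described above, assuming a simplified model 
rather than the real HFileBlockIndex code. The Block and CountingCache classes 
and the seekToDataBlock signature here are hypothetical stand-ins; only the 
offset comparison comes from the description. It shows that returning the 
current block directly leaves the cache lookup counter untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model; this is NOT the real HFileBlockIndex API.
class SeekSketch {
    /** Minimal stand-in for a data block, identified by its file offset. */
    static final class Block {
        private final long offset;
        Block(long offset) { this.offset = offset; }
        long getOffset() { return offset; }
    }

    /** Toy block cache that counts every lookup, like the read/hit metrics. */
    static final class CountingCache {
        private final Map<Long, Block> blocks = new HashMap<>();
        int lookups = 0;
        void put(Block b) { blocks.put(b.getOffset(), b); }
        Block get(long offset) {
            lookups++; // every lookup is counted, even when it is a hit
            return blocks.get(offset);
        }
    }

    /** Returns the block at requestedOffset, skipping the cache if we already hold it. */
    static Block seekToDataBlock(CountingCache cache, Block currentBlock, long requestedOffset) {
        if (currentBlock != null && currentBlock.getOffset() == requestedOffset) {
            return currentBlock; // already positioned here: no lookup, no counter bump
        }
        return cache.get(requestedOffset);
    }

    public static void main(String[] args) {
        CountingCache cache = new CountingCache();
        Block first = new Block(0L);
        cache.put(first);
        cache.put(new Block(64L));
        seekToDataBlock(cache, first, 0L);  // same block: cache is not consulted
        seekToDataBlock(cache, first, 64L); // different block: one cache lookup
        System.out.println("cache lookups = " + cache.lookups);
    }
}
```

With this shape, a test that counts block reads only sees lookups for blocks 
the reader was not already holding.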

Assigning to Mikhail (he's already fixed this in our internal branch).







[jira] [Commented] (HBASE-4434) Don't do HFile Scanner next() unless the next KV is needed:

2011-09-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113160#comment-13113160
 ] 

jirapos...@reviews.apache.org commented on HBASE-4434:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2034/
---

Review request for Michael Stack, Jonathan Gray and Mikhail Bautin.


Summary
---

The increase in blocks read in one sub-test case is due to HBASE-4466, which 
should be fixed soon. It is primarily an accounting issue, not an extra cache 
miss.


This addresses bug HBASE-4434.
https://issues.apache.org/jira/browse/HBASE-4434


Diffs
-

  
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
 1174514 
  
http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
 1174514 

Diff: https://reviews.apache.org/r/2034/diff


Testing
---

Ran TestBlocksRead. Will run the entire suite as well. On our 89-based internal 
branch, the tests passed without any issues.


Thanks,

Kannan



> Don't do HFile Scanner next() unless the next KV is needed:
> ---
>
> Key: HBASE-4434
> URL: https://issues.apache.org/jira/browse/HBASE-4434
> Project: HBase
>  Issue Type: Improvement
>Reporter: Kannan Muthukkaruppan
>Assignee: Kannan Muthukkaruppan
>
> When a seek/reseek is done on StoreFileScanner, in addition to setting the 
> current KV, it also does an HFileScanner-level next() ahead of time even if 
> the next KV is never actually required. This inefficiency can potentially 
> result in additional disk seeks and sub-optimal use of the block cache 
> (primarily for cases where the KVs are large and each occupies an HFile block 
> of its own).
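
The idea in the description above can be sketched as a scanner that defers the 
underlying next() until another KV is actually requested. This is an 
illustration under stated assumptions, not the actual patch: CountingSource 
and LazyScanner are hypothetical names, and strings stand in for KVs.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of deferring the underlying next(); not the real StoreFileScanner.
class LazyNextSketch {
    /** Stand-in for the underlying scanner; counts how often next() is called. */
    static final class CountingSource {
        private final Iterator<String> it;
        int nextCalls = 0;
        CountingSource(List<String> kvs) { this.it = kvs.iterator(); }
        String next() {
            nextCalls++;
            return it.hasNext() ? it.next() : null;
        }
    }

    /** Scanner that only advances the source when another KV is really needed. */
    static final class LazyScanner {
        private final CountingSource source;
        private String current;
        private boolean advancePending = false;
        LazyScanner(CountingSource source) { this.source = source; }

        /** Positions on the first KV without reading ahead. */
        void seek() {
            current = source.next();
            advancePending = false;
        }

        /** Returns the current KV, performing any deferred advance first. */
        String peek() {
            if (advancePending) {
                current = source.next();
                advancePending = false;
            }
            return current;
        }

        /** Returns the current KV and records that the source must advance later. */
        String next() {
            String result = peek();
            if (result != null) {
                advancePending = true; // defer the read instead of doing it eagerly
            }
            return result;
        }
    }

    public static void main(String[] args) {
        CountingSource source = new CountingSource(Arrays.asList("kv1", "kv2"));
        LazyScanner scanner = new LazyScanner(source);
        scanner.seek();
        String first = scanner.next();
        // An eager scanner would already have made a second underlying call here.
        System.out.println(first + " read with " + source.nextCalls + " underlying call(s)");
    }
}
```

If the caller stops after the first KV, the second underlying read (and any 
block load it would trigger) never happens.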





[jira] [Commented] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-22 Thread jirapos...@reviews.apache.org (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113168#comment-13113168
 ] 

jirapos...@reviews.apache.org commented on HBASE-4455:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2007/#review2035
---

Ship it!


Really good stuff.  I just need a bit of it explained to me below because I'm 
being a little slow.  I think you've also nailed HBASE-3809 with this patch, 
Ming (what do you think?).  Good stuff.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


Why make this public?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


Why this change? If we did not ask to refresh, why not return what we found?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


Oh, this is a dumb bug!  Good one, Ming.  Ugh, this is bad.  I'm in here 
because of HBASE-3446 and I did not see this.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


Good



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


Good



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java


nit: No need of the parens



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java


When would this be false?  I don't see it in your patch.  (NVM -- I see it 
later)



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java


Will we always have split logs before we get here? (NVM, I see the point below.)



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java


This looks like a good change.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java


Oh, here I see the don't split logs flag.

Funny.

We used to do something like this in the old days.

So, we come back in again, we don't split logs, but we are still a server 
that was carrying root or meta -- we reassign again?  I don't get how it works 
here.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/MetaNodeTracker.java


What's this?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/RootRegionTracker.java


Good.


- Michael


On 2011-09-23 00:15:11, Ming Ma wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2007/
bq.  ---
bq.  
bq.  (Updated 2011-09-23 00:15:11)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  1. Add more logging.
bq.  2. Clean up CatalogTracker. waitForMeta waits for "timeout" value. When 
waitForMetaServerConnectionDefault is called by MetaNodeTracker, the timeout 
value is large. So it doesn't retry in case -ROOT- is updated; add the proper 
implementation for CatalogTracker.verifyMetaRegionLocation
bq.  4. Check for the latest -ROOT- and .META. region location during the 
handling of server shutdown.
bq.  5. Right after assigning the -ROOT- or .META. in ServerShutdownHandler, 
don't block and wait for .META. availability. Resubmit another 
ServerShutdownHandler for regular regions.
bq.  
bq.  
bq.  This addresses bug HBASE-4455.
bq.  https://issues.apache.org/jira/browse/HBASE-4455
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/catalog/CatalogTracker.java
 1172205 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
 1172205 
bq.
http://svn.apache.org/
