[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-09 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12930285#action_12930285
 ] 

HBase Review Board commented on HBASE-3112:
---

Message from: st...@duboce.net


bq.  On 2010-11-09 11:05:46, Jean-Daniel Cryans wrote:
bq.   trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 
444
bq.   http://review.cloudera.org/r/1187/diff/5/?file=17034#file17034line444
bq.  
bq.   I gave you my comments in person. Short version, I think that those 
methods' method shouldn't change and that we should have methods clearly marked 
as async, and then do a job of educating people towards using them.
bq.  
bq.  Jean-Daniel Cryans wrote:
bq.  I meant method's behavior

Yeah, I agree with you after chatting.  Will fix (And you spotted prob. w/ way 
async was running anyways).


bq.  On 2010-11-09 11:05:46, Jean-Daniel Cryans wrote:
bq.   
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java,
 line 135
bq.   http://review.cloudera.org/r/1187/diff/5/?file=17043#file17043line135
bq.  
bq.   Looks an awful lot like BulkDisabler

I disagree.  The overrides each differ substantially (They look similar if you 
don't look close -- smile).


- stack


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1187/#review1866
---





 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0

 Attachments: 3112-v2.txt, 3112-v3.txt, 3112.txt


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-08 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929689#action_12929689
 ] 

HBase Review Board commented on HBASE-3112:
---

Message from: st...@duboce.net

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1187/
---

Review request for hbase and Jonathan Gray.


Summary
---

Renamed ZKTableDisable as ZKTable, making it a generic zk util for managing 
'tables'.
Added enabing/disabling states to table the current set of enabled/disabled 
only.

M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
  (createSetData): Added.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  Removed offlining region utility methods no longer used.
  (We do it now over in MetaEditor)
M src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
  Javadoc.
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Add a base abstract class to do 'bulk assignments'.  Redo
  assignAllUserRegions to use subclass of new bulk assigner class.
  Added isTableEnabled, disablingTable, enablingTable.
M src/main/java/org/apache/hadoop/hbase/master/handler/DisableTableHandler.java
  Redid to use new bulk assigner class.
M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
A Added TestZKTable


This addresses bug hbase-3112.
http://issues.apache.org/jira/browse/hbase-3112


Diffs
-

  trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1032652 
  
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
 1032652 
  
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DisableTableHandler.java
 1032652 
  
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
 1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTableDisable.java 
1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1032652 
  trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 
1032652 
  trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java 
PRE-CREATION 

Diff: http://review.cloudera.org/r/1187/diff


Testing
---


Thanks,

stack




 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0

 Attachments: 3112-v2.txt, 3112-v3.txt, 3112.txt


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-08 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929700#action_12929700
 ] 

HBase Review Board commented on HBASE-3112:
---

Message from: Jonathan Gray jg...@apache.org

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1187/#review1840
---


Going to continue review on v2 patch


trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
http://review.cloudera.org/r/1187/#comment5962

whitespace (and we don't have IOException in javadoc but that's not your 
fault)



trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
http://review.cloudera.org/r/1187/#comment5964

What does this mean?  done vs. not done?  I think we should be more 
descriptive in the logging (if done, then we've completed assignment of regions 
on cluster startup).  But if not done, on startup, what does this mean?  
There's comment later that RIT timeouts should fix it up, so should be in log 
message here?  Or on startup case of bulk assign, should we fail startup here 
if this doesn't pass?



trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
http://review.cloudera.org/r/1187/#comment5961

Can we move all this stuff into a separate class?  AssignmentManager is 
getting huge.  Maybe BulkAssign could be class that contains all these other 
class definitions?

Also gives good opportunity in class comment to describe in general how 
this stuff works.


- Jonathan





 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0

 Attachments: 3112-v2.txt, 3112-v3.txt, 3112.txt


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-08 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929703#action_12929703
 ] 

HBase Review Board commented on HBASE-3112:
---

Message from: st...@duboce.net


bq.  On 2010-11-08 11:50:55, Jonathan Gray wrote:
bq.   
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 
1143
bq.   http://review.cloudera.org/r/1187/diff/1/?file=16872#file16872line1143
bq.  
bq.   What does this mean?  done vs. not done?  I think we should be more 
descriptive in the logging (if done, then we've completed assignment of regions 
on cluster startup).  But if not done, on startup, what does this mean?  
There's comment later that RIT timeouts should fix it up, so should be in log 
message here?  Or on startup case of bulk assign, should we fail startup here 
if this doesn't pass?

Removed it.  It confuses (see above for exhibit A).

If problem doing bulk assign, we'll crash out master.  Otherwise, timeout of 
RIT should fix bulk assign stragglers.


bq.  On 2010-11-08 11:50:55, Jonathan Gray wrote:
bq.   
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 
1149
bq.   http://review.cloudera.org/r/1187/diff/1/?file=16872#file16872line1149
bq.  
bq.   Can we move all this stuff into a separate class?  AssignmentManager 
is getting huge.  Maybe BulkAssign could be class that contains all these other 
class definitions?
bq.   
bq.   Also gives good opportunity in class comment to describe in general 
how this stuff works.

BulkAssigner class comment says -- perhaps a little curtly -- what it does?  
I'll move it out.  The implementations though I'll leave beside where they are 
used -- in class.


- stack


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1187/#review1840
---





 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0

 Attachments: 3112-v2.txt, 3112-v3.txt, 3112.txt


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-08 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929709#action_12929709
 ] 

HBase Review Board commented on HBASE-3112:
---

Message from: Jonathan Gray jg...@apache.org

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1187/#review1841
---


Looks pretty good.  Should be much improved.

What about client-side semantics and checks?  I thought there was going to be 
some change there?  An async trigger and then a fast way to see if it's done or 
not?  Should be clear on javadoc since enable/disable is always thing we get 
questions on.

I think most of my issues with patch is that it adds huge amount of new code 
into already big AssignmentManager class.  Should move these classes out and 
not sure if we should have these methods which just touch ZK.  I think having 
all logic about the state transitions, usage of ZK, etc more explicitly 
controlled in the handlers themselves might be more clear to follow how this 
works.  Maybe not but was a little hard to follow (that, for example, we don't 
have enabling/disabling states in memory... where we were talking about 
in-memory state vs zk states, etc).

Also, ZKTable can be simplified (and made more future proof) with an enum.

Now, for failover, I know we said we would punt to 0.92 on handling of this 
case, but we should at least make it so we don't get into broken state if we 
have a master failover.  What will happen if we are in disabling up in zk but 
not all closes have been done?

Lastly... I think we definitely need some unit tests on this stuff.  
TestMasterFailover does a little but not really relevant to these changes (only 
deals with regions already RIT).  TestRollingRestart does an enable/disable of 
table.  But none of this stuff takes into account failure of stuff in handlers, 
failure of RS, etc... Don't need to hold up this patch on unit tests if it's 
working in cluster testing on large tables, but it's kind of thing that would 
be good to test at some point.


trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
http://review.cloudera.org/r/1187/#comment5966

I thought we had this method somewhere already?



trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
http://review.cloudera.org/r/1187/#comment5967

nevermind, you just moved it :)



trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
http://review.cloudera.org/r/1187/#comment5968

Whitespace, and should make note that this is checking ZK not in-memory 
state?



trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
http://review.cloudera.org/r/1187/#comment5969

Are these methods necessary?  Seems like unnecessary stuff in 
AssignmentManager.  The Handlers can just use the ZK methods directly?  (keep 
the AM methods as check/set of it's internal state?)



trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
http://review.cloudera.org/r/1187/#comment5970

This is for repeated runs of enable?  Should we log if this actually 
removes regions (table has X total regions, already Y online, assigning Z still 
offline)?

It's okay that this operation is not done under any locks?  Couldn't 
regions be coming online/offline concurrent with this operation?



trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
http://review.cloudera.org/r/1187/#comment5972

whitespace



trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
http://review.cloudera.org/r/1187/#comment5973

enum?



trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java
http://review.cloudera.org/r/1187/#comment5975

yeah, instead of all these permutations on isEnabledOrDisabling and what 
not, should just use an enum and pass that in as argument to method which 
checks node state.



trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
http://review.cloudera.org/r/1187/#comment5976

What is this method's policy on watches?  Please note it in javadoc.  This 
method is not at all strict and is multiple operations so is not safe to use on 
nodes where we rely on watches / monitoring of state transitions, so let's make 
some kind of note.


- Jonathan





 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0

 Attachments: 3112-v2.txt, 3112-v3.txt, 3112.txt


 The tools are in 

[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-08 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929843#action_12929843
 ] 

HBase Review Board commented on HBASE-3112:
---

Message from: st...@duboce.net

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1187/
---

(Updated 2010-11-08 16:09:44.239145)


Review request for hbase and Jonathan Gray.


Changes
---

Added shell is_enabled, is_disabled. Did some fixup because of issues revealed 
testing.  Implemented jgray suggestions.


Summary
---

Renamed ZKTableDisable as ZKTable, making it a generic zk util for managing 
'tables'.
Added enabing/disabling states to table the current set of enabled/disabled 
only.

M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
  (createSetData): Added.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  Removed offlining region utility methods no longer used.
  (We do it now over in MetaEditor)
M src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
  Javadoc.
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Add a base abstract class to do 'bulk assignments'.  Redo
  assignAllUserRegions to use subclass of new bulk assigner class.
  Added isTableEnabled, disablingTable, enablingTable.
M src/main/java/org/apache/hadoop/hbase/master/handler/DisableTableHandler.java
  Redid to use new bulk assigner class.
M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
A Added TestZKTable


This addresses bug hbase-3112.
http://issues.apache.org/jira/browse/hbase-3112


Diffs (updated)
-

  trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java 
PRE-CREATION 
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1032652 
  
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
 1032652 
  
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DisableTableHandler.java
 1032652 
  
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
 1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java 
PRE-CREATION 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTableDisable.java 
1032652 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1032652 
  trunk/src/main/ruby/hbase/admin.rb 1032778 
  trunk/src/main/ruby/shell.rb 1032778 
  trunk/src/main/ruby/shell/commands/disable.rb 1032778 
  trunk/src/main/ruby/shell/commands/enable.rb 1032778 
  trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java 1032652 
  trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 
1032652 
  trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java 
PRE-CREATION 

Diff: http://review.cloudera.org/r/1187/diff


Testing
---


Thanks,

stack




 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0

 Attachments: 3112-v2.txt, 3112-v3.txt, 3112.txt


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-08 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929841#action_12929841
 ] 

HBase Review Board commented on HBASE-3112:
---

Message from: st...@duboce.net


bq.  On 2010-11-08 12:13:11, Jonathan Gray wrote:
bq.   Looks pretty good.  Should be much improved.
bq.   
bq.   What about client-side semantics and checks?  I thought there was going 
to be some change there?  An async trigger and then a fast way to see if it's 
done or not?  Should be clear on javadoc since enable/disable is always thing 
we get questions on.
bq.   
bq.   I think most of my issues with patch is that it adds huge amount of new 
code into already big AssignmentManager class.  Should move these classes out 
and not sure if we should have these methods which just touch ZK.  I think 
having all logic about the state transitions, usage of ZK, etc more explicitly 
controlled in the handlers themselves might be more clear to follow how this 
works.  Maybe not but was a little hard to follow (that, for example, we don't 
have enabling/disabling states in memory... where we were talking about 
in-memory state vs zk states, etc).
bq.   
bq.   Also, ZKTable can be simplified (and made more future proof) with an 
enum.
bq.   
bq.   Now, for failover, I know we said we would punt to 0.92 on handling of 
this case, but we should at least make it so we don't get into broken state if 
we have a master failover.  What will happen if we are in disabling up in zk 
but not all closes have been done?
bq.   
bq.   Lastly... I think we definitely need some unit tests on this stuff.  
TestMasterFailover does a little but not really relevant to these changes (only 
deals with regions already RIT).  TestRollingRestart does an enable/disable of 
table.  But none of this stuff takes into account failure of stuff in handlers, 
failure of RS, etc... Don't need to hold up this patch on unit tests if it's 
working in cluster testing on large tables, but it's kind of thing that would 
be good to test at some point.

Added to shell is_enabled and is_disabled.  Added javadoc to enableTable and 
disableTable explaining how these methods return immediately now but that 
process can take a while to complete.

On methods that just touch zk, they are of a class of which one member -- 
isTableDisabled -- does checks of AM data structures... so I can't really move 
them out.

Change ZKTable to use enums.

If disabling and master failover, when new master comes online, will be broke. 
But enable/disable are idempotent and a rerun of enable/disable will start up 
the process again and it should finish off properly.  Thats the theory anyways.

Regards unit tests, enable/disable is used all over unit test code base.  If 
you are looking for a test that simulates big table enable/disable, that's hard 
to do in a unit test.


bq.  On 2010-11-08 12:13:11, Jonathan Gray wrote:
bq.   
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 
1302
bq.   http://review.cloudera.org/r/1187/diff/2/?file=16884#file16884line1302
bq.  
bq.   I thought we had this method somewhere already?

there is a similarly named one for zk that you added -- is that what you are 
referring to?


bq.  On 2010-11-08 12:13:11, Jonathan Gray wrote:
bq.   
trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, line 
1573
bq.   http://review.cloudera.org/r/1187/diff/2/?file=16884#file16884line1573
bq.  
bq.   Are these methods necessary?  Seems like unnecessary stuff in 
AssignmentManager.  The Handlers can just use the ZK methods directly?  (keep 
the AM methods as check/set of it's internal state?)

See note above where argument is that these are of a set and that since one of 
the set members does AM machinations, then all belong in AM.  For now I think 
its fine.


bq.  On 2010-11-08 12:13:11, Jonathan Gray wrote:
bq.   
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java,
 line 97
bq.   http://review.cloudera.org/r/1187/diff/2/?file=16888#file16888line97
bq.  
bq.   This is for repeated runs of enable?  Should we log if this actually 
removes regions (table has X total regions, already Y online, assigning Z still 
offline)?
bq.   
bq.   It's okay that this operation is not done under any locks?  Couldn't 
regions be coming online/offline concurrent with this operation?

Added logging to enable/disable.

Regards 'locks', these are my own local copies of these Lists.  Also notion 
that regions in table count can change is sort of allowed..  Each time through 
loop we'll recheck .META. for enabling and we'll re-look at the list of table 
regions in AssignmentManager


bq.  On 2010-11-08 12:13:11, Jonathan Gray wrote:
bq.   trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java, line 
41
bq.   http://review.cloudera.org/r/1187/diff/2/?file=16890#file16890line41
bq. 

[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-08 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12929983#action_12929983
 ] 

HBase Review Board commented on HBASE-3112:
---

Message from: st...@duboce.net

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1187/
---

(Updated 2010-11-08 22:35:19.30)


Review request for hbase and Jonathan Gray.


Changes
---

Addressed Jon comments.   Still to do is finish up synchronizations on cache 
passed into ZKTable, cache of table states to save on trips to zk ensemble.   
Also, need to run tests an put in place calls to synchronized enable/disable 
table methods that this patch adds so tests start passing again.  Will do in 
morning.


Summary
---

Renamed ZKTableDisable as ZKTable, making it a generic zk util for managing 
'tables'.
Added enabing/disabling states to table the current set of enabled/disabled 
only.

M src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java
  (createSetData): Added.
M src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  Removed offlining region utility methods no longer used.
  (We do it now over in MetaEditor)
M src/main/java/org/apache/hadoop/hbase/HRegionInfo.java
  Javadoc.
M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
  Add a base abstract class to do 'bulk assignments'.  Redo
  assignAllUserRegions to use subclass of new bulk assigner class.
  Added isTableEnabled, disablingTable, enablingTable.
M src/main/java/org/apache/hadoop/hbase/master/handler/DisableTableHandler.java
  Redid to use new bulk assigner class.
M src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
A Added TestZKTable


This addresses bug hbase-3112.
http://issues.apache.org/jira/browse/hbase-3112


Diffs (updated)
-

  trunk/src/main/java/org/apache/hadoop/hbase/HRegionInfo.java 1032841 
  trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1032841 
  trunk/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java 
1032841 
  trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
1032841 
  trunk/src/main/java/org/apache/hadoop/hbase/master/BulkAssigner.java 
PRE-CREATION 
  trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1032841 
  trunk/src/main/java/org/apache/hadoop/hbase/master/LoadBalancer.java 1032841 
  
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DeleteTableHandler.java
 1032841 
  
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/DisableTableHandler.java
 1032841 
  
trunk/src/main/java/org/apache/hadoop/hbase/master/handler/EnableTableHandler.java
 1032841 
  trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1032841 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTable.java 
PRE-CREATION 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKTableDisable.java 
1032841 
  trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1032841 
  trunk/src/main/ruby/hbase/admin.rb 1032841 
  trunk/src/main/ruby/shell.rb 1032841 
  trunk/src/main/ruby/shell/commands/disable.rb 1032841 
  trunk/src/main/ruby/shell/commands/enable.rb 1032841 
  trunk/src/main/ruby/shell/commands/is_disabled.rb PRE-CREATION 
  trunk/src/main/ruby/shell/commands/is_enabled.rb PRE-CREATION 
  trunk/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java 1032841 
  trunk/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java 
1032841 
  trunk/src/test/java/org/apache/hadoop/hbase/zookeeper/TestZKTable.java 
PRE-CREATION 

Diff: http://review.cloudera.org/r/1187/diff


Testing
---


Thanks,

stack




 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0

 Attachments: 3112-v2.txt, 3112-v3.txt, 3112.txt


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- 

[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928876#action_12928876
 ] 

stack commented on HBASE-3112:
--

So, its possible for the 'enabled' flag to be set in zk but all regions get 
offlined (e.g. server crashed while disabling some regions).  The way we online 
is that we get the list of all the regions in a table and regardless, we go 
assign WHETHER OR NOT THE REGION HAS BEEN OFFLINED OR NOT.  Can double the 
regions on your cluster.

Currently also, the client will not go to the server if it sees that the table 
state is disabled up in zk.  This is not helpful for the case where not all 
regions were offlined (see above).

So, need to think about it some.  Could be as simple as checking state of 
region before we enable/disable (assign/unassign) since we got its state from 
.META. (if not in RIT that is).

Could introduce a disabling/enabling state and disabling a table, you make 
entry in zk, one for each region.

Need to add to hbck note as to whether table is enabled or disabled.

Could do minimum for now and come back to this later after 0.90.0.

 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-05 Thread Jonathan Gray (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928881#action_12928881
 ] 

Jonathan Gray commented on HBASE-3112:
--

I don't think we need disabling/enabling at region granularity.  When master 
acknowledges opens/closes of regions, he already checks if a table is disabled. 
 I think that stuff is mostly good.

I think adding enabling/disabling to the disabled table node that we've added 
should be sufficient.  Just need a master to have an internal state about an 
enabling or disabling table.  If failover, master will rebuild this state for 
any table in a enabling/disabling state (he'll have to actually ask everyone 
what they serve) and will issue assign/unassign as necessary.

Can you describe the case you've seen a little better?  I'd like to add a test 
of it.  It's when enabling a table?  Not quite sure I understand exactly at 
which point who is dying?

 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928929#action_12928929
 ] 

stack commented on HBASE-3112:
--

On way home was thinking of how I need a disabling/enabling state for a table.  
I need at least for case where a split is happening when a region is being 
disabled.  When splits go to open on the regionserver, they should check zk if 
table is disabled or disabling and pass on the open (splits don't do state up 
in zk).

@Jon Disable and enable are run from executor.  In each case, we do a get of 
all regions in a table and then per region we run unassign inline inside in the 
executor.  Any failure of a server while this process is going on makes for a 
table that is partially disabled/enabled.



 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928932#action_12928932
 ] 

stack commented on HBASE-3112:
--

Currently enable/disable runs synchronously and serially in that the 
enable/disable handler runs each table regions unassign one after the other.  
This is at a minimum slow when it doesn't have to be.

What if we changed it so it was async.

Here is how it might work.

 * Client invokes disable table on master
 ** Disable invocation returns immediately
 ** We add a new is_disabled/is_enabled call to client so client can check on 
state of enable/disable.
 * Master queues a DisableTableHandler.
 ** DTH checks if table already disabled, if so returns
 ** Otherwise, sets disabling flag up in zk then loops on
 *** Getting all regions in a table from meta
 *** Per region, checks if in RIT or not already offline, and if not queues 
close region executor
 *** Waits around till all table regions clear RIT
 ** On exit from the loop, it sets up in zk that table is disabled.

Same for ETH

If master dies midway, new master will start up a DTH or ETH per table that has 
disabling or enabling up in zk.

DTH/ETH must be made idempotent because I think there are situations even still 
in which they might fail. If they fail, user might reschedule the 
disable/enable which would start up a new DTH (if already a DTH queued and 
running, they won't clash since only one thread for this exectuor and the 
second will just early out because when it checks flag in zk it'll be disabled 
so it'll have no work to do).

 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-11-04 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12928294#action_12928294
 ] 

stack commented on HBASE-3112:
--

Shell waits on enable/disable.  Thats probably fine except I see this on enable 
of 600 regions that are taking a little while:

{code}

ERROR: java.net.SocketTimeoutException: Call to sv2borg180/10.20.20.180:6 
failed on socket timeout exception: java.net.SocketTimeoutException: 6 
millis timeout while waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/10.20.20.180:40901 
remote=sv2borg180/10.20.20.180:6]
{code}

... which would be a little disorientating.

St.Ack

 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
 Fix For: 0.90.0


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-3112) Enable and disable of table needs a bit of loving in new master

2010-10-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12924075#action_12924075
 ] 

Jean-Daniel Cryans commented on HBASE-3112:
---

While testing if HBASE-2522 was still there with the new master, I found out 
that we assign the parent regions when we re-enable the table (but disabling 
seems to work ok).

 Enable and disable of table needs a bit of loving in new master
 ---

 Key: HBASE-3112
 URL: https://issues.apache.org/jira/browse/HBASE-3112
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Critical
 Fix For: 0.90.0


 The tools are in place to do a more reliable enable/disable of tables.  Some 
 work has been done to hack in a basic enable/disable but its not enough -- 
 see the test avro/thrift tests where a disable/enable/disable switchback can 
 confuse the table state (and has been disabled until this issue addressed).
 This issue is about finishing off enable/disable in the new master.   I think 
 we need to add to the table znode an enabling/disabling state rather than 
 have them binary with a watcher that will stop an enable (or disable) 
 starting until the previous completes (Currently we atomically switch the 
 state though the region close/open lags -- some work in enable/disable 
 handlers helps in that they won't complete till all regions have 
 transitioned.. but its not enough).
 Need to add tests too.
 Marking issue critical bug because loads of the questions we get on lists are 
 about enable/disable probs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.