[ https://issues.apache.org/jira/browse/HBASE-5209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223994#comment-13223994 ]

[email protected] commented on HBASE-5209:
------------------------------------------------------



bq.  On 2012-03-07 00:53:15, jmhsieh wrote:
bq.  > Should probably file follow-on issues to update AvroUtil.csToACS (convert ClusterStatus to Avro for the avro server) and likely the thrift/rest clients as well.

OK.


bq.  On 2012-03-07 00:53:15, jmhsieh wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 234
bq.  > <https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line234>
bq.  >
bq.  >     Would a copy be safer?

backupMasters has no way of being modified after the instance is created, so I 
don't think a copy is any safer, if you are thinking about "safer" == 
"immutable".  If someone added a mutator method for backupMasters, then yes we 
would have a problem, but I find that unlikely in this case given the mission 
of ClusterStatus being a snapshot of the cluster state.
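As a general Java illustration of the copy question (not the code in the patch): a snapshot class can copy the incoming collection once in its constructor and hand out an unmodifiable view, so neither the caller who built the snapshot nor a client of the getter can change it later. StatusSnapshot is a hypothetical name, and server names are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a snapshot class such as ClusterStatus; not HBase code.
final class StatusSnapshot {
  private final List<String> backupMasters;

  StatusSnapshot(Collection<String> backupMasters) {
    // Copy on the way in: later changes to the caller's collection cannot leak in.
    this.backupMasters = Collections.unmodifiableList(new ArrayList<>(backupMasters));
  }

  // Returning the unmodifiable view needs no further copy: callers cannot
  // mutate it, and this class has no mutator for the field.
  List<String> getBackupMasters() {
    return backupMasters;
  }
}
```

With this idiom the getter needs no per-call copy; the single copy in the constructor guards against a caller that keeps a reference to the collection it passed in.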


bq.  On 2012-03-07 00:53:15, jmhsieh wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 180
bq.  > <https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line180>
bq.  >
bq.  >     Since the type is Collection<xxx>, I have a feeling that there may be equality and hash issues here. This should probably be:
bq.  >     
bq.  >     this.backupMasters.containsAll(((ClusterStatus)o).backupMasters)
bq.  >     
bq.  >     Alternately (if not present), add unit tests to confirm that equality and hash operations work?

Easier to just change this to containsAll() rather than add unit tests, since I 
think another patch is needed here.
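A minimal sketch of the equality concern, assuming backup masters are modeled as strings. Collection does not define equals() for arbitrary implementations; for example, Collections.unmodifiableCollection falls back to identity equality, so two wrappers over equal lists compare unequal. A single a.containsAll(b) also accepts any subset, so an order-insensitive check needs a size comparison or the reverse containsAll, and hashCode() must then be order-insensitive too. BackupMasterEquality and its methods are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;

final class BackupMasterEquality {
  private BackupMasterEquality() {}

  // Order-insensitive membership check built on containsAll(). The size check
  // and the reverse containsAll are needed because a.containsAll(b) alone is
  // also true when b is merely a subset of a.
  static boolean sameMembers(Collection<String> a, Collection<String> b) {
    return a.size() == b.size() && a.containsAll(b) && b.containsAll(a);
  }

  // A hash consistent with sameMembers(): it depends only on the distinct
  // elements, not on their order.
  static int membersHash(Collection<String> c) {
    return new HashSet<>(c).hashCode();
  }

  // Shows the pitfall: Collections.unmodifiableCollection does not pass
  // equals() through to the backing collection, so this returns false even
  // though both wrappers hold equal lists.
  static boolean wrappersCompareEqual() {
    List<String> masters = new ArrayList<>(Arrays.asList("m2,60000,1", "m3,60000,1"));
    Collection<String> x = Collections.unmodifiableCollection(masters);
    Collection<String> y = Collections.unmodifiableCollection(new ArrayList<>(masters));
    return x.equals(y);
  }
}
```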


bq.  On 2012-03-07 00:53:15, jmhsieh wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/ClusterStatus.java, line 70
bq.  > <https://reviews.apache.org/r/3892/diff/3/?file=75401#file75401line70>
bq.  >
bq.  >     Unlikely, but someone could possibly name their host "unknown".  Maybe change this to something that is normally illegal as a host name (e.g. start with a symbol, maybe "#unknown#").
bq.  >

OK.
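A sketch of why a symbol-delimited sentinel cannot collide with a real host, using a simplified RFC 1123 host-name check (letters, digits and hyphens in each dot-separated label; trailing-dot FQDNs are not handled). HostnameSentinel, UNKNOWN_SERVERNAME and isValidHostname are hypothetical names, not HBase APIs.

```java
import java.util.regex.Pattern;

final class HostnameSentinel {
  private HostnameSentinel() {}

  // Hypothetical sentinel for "no master known". '#' is not allowed in a
  // DNS host name, so no real host can produce this value.
  static final String UNKNOWN_SERVERNAME = "#unknown#";

  // Simplified label rule: 1-63 letters, digits or hyphens, not starting
  // or ending with a hyphen.
  private static final Pattern LABEL =
      Pattern.compile("[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

  static boolean isValidHostname(String host) {
    if (host.isEmpty() || host.length() > 253) {
      return false;
    }
    // The -1 limit keeps empty labels (e.g. "a..b") so they are rejected.
    for (String label : host.split("\\.", -1)) {
      if (!LABEL.matcher(label).matches()) {
        return false;
      }
    }
    return true;
  }
}
```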


bq.  On 2012-03-07 00:53:15, jmhsieh wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1387
bq.  > <https://reviews.apache.org/r/3892/diff/3/?file=75403#file75403line1387>
bq.  >
bq.  >     Is this guaranteed to read/return values in the same order every time?  This may be important for the equality and hashcode comparisons. (It doesn't seem to be used in hbase core code, but it is exposed to clients.)

I'll add a sort with an appropriate Comparator, as the backing function from 
ZooKeeper does not guarantee ordering.
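A sketch of a deterministic ordering for the backup-master list, assuming each child znode is named host,port,startcode (the ServerName string form); that naming and the helper names are assumptions. ZooKeeper's getChildren() documents that the returned list is not sorted, so the result is sorted before being exposed. Comparing port and startcode numerically avoids a plain string sort placing port 60010 before port 9000.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BackupMasterOrdering {
  private BackupMasterOrdering() {}

  // Assumes names of the form "host,port,startcode": host compared
  // lexicographically, then port and startcode numerically.
  static final Comparator<String> SERVER_NAME_ORDER =
      Comparator.comparing((String s) -> s.split(",")[0])
          .thenComparingInt(s -> Integer.parseInt(s.split(",")[1]))
          .thenComparingLong(s -> Long.parseLong(s.split(",")[2]));

  // getChildren() makes no ordering guarantee, so sort a copy before
  // handing the list to clients.
  static List<String> sorted(List<String> children) {
    List<String> copy = new ArrayList<>(children);
    copy.sort(SERVER_NAME_ORDER);
    return copy;
  }
}
```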


bq.  On 2012-03-07 00:53:15, jmhsieh wrote:
bq.  > src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java, line 143
bq.  > <https://reviews.apache.org/r/3892/diff/3/?file=75402#file75402line143>
bq.  >
bq.  >     Would this create a small race? Why not have all masters write into a "potential-masters" zdir instead of a "backup-masters" zdir, and then just have a node for the active master?
bq.  >     
bq.  >     Then we don't have to worry about someone accidentally reading an in-between state where a master is active and still in backup-masters, or is not the active master and not in backup-masters.

I think you are suggesting having a list of potential masters, and deriving the 
list of backup masters from that list by just subtracting the active master.

The problem is that there is then a race whenever you derive the backup 
masters, between when you read the potential masters and when you read the 
active master: those are separate operations, so something can change in 
between them.  You're just moving where the race happens.

What is really needed here is a read-write lock that protects this area.  There 
are no read-write (or any) ZK-based locks currently in HBase, so I'm not sure 
if we want to add an implementation of that just for this case.
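To illustrate the property a read-write lock would provide, here is an in-process analogy using java.util.concurrent's ReentrantReadWriteLock. It does not coordinate across processes and is not a ZooKeeper lock; MasterRegistry is a hypothetical class. Promotion updates both the active master and the backup list under the write lock, and readers take both values under one read lock, so no reader sees a master that is both active and a backup, or neither.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// In-process analogy only; not a distributed lock.
final class MasterRegistry {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private String active; // null when no master is active
  private final List<String> backups = new ArrayList<>();

  void addBackup(String master) {
    lock.writeLock().lock();
    try {
      backups.add(master);
    } finally {
      lock.writeLock().unlock();
    }
  }

  // "Leave backups" and "become active" happen under one write lock, so
  // readers observe them as a single step. Any previous active master is
  // assumed to be gone (failover).
  boolean promote(String master) {
    lock.writeLock().lock();
    try {
      if (!backups.remove(master)) {
        return false;
      }
      active = master;
      return true;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Both values are read under the same read lock, so the pair is consistent.
  Snapshot snapshot() {
    lock.readLock().lock();
    try {
      return new Snapshot(active, new ArrayList<>(backups));
    } finally {
      lock.readLock().unlock();
    }
  }

  static final class Snapshot {
    final String active;
    final List<String> backups;

    Snapshot(String active, List<String> backups) {
      this.active = active;
      this.backups = backups;
    }
  }
}
```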


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3892/#review5673
-----------------------------------------------------------


On 2012-02-16 06:30:31, David Wang wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3892/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-02-16 06:30:31)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Problem:
bq.  There is no method in the HBase client-facing APIs to determine which of the masters is currently active.  Such a method would be especially useful in setups with multiple backup masters.
bq.  
bq.  Solution:
bq.  Augment ClusterStatus to return the currently active master and the list of backup masters.
bq.  
bq.  Notes:
bq.  * I uncovered a race condition in ActiveMasterManager, between when it determines that it did not win the original race to be the active master, and when it reads the ServerName of the active master.  If the active master goes down in that time, the read to determine the active master's ServerName will fail ungracefully and the candidate master will abort.  The solution incorporated in this patch is to check whether the read of the ServerName succeeded before trying to use it.
bq.  * I fixed some minor formatting issues while going through the code.  I can take these changes out if it is considered improper to commit such unrelated changes along with the main changes.
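The check described in the first note can be sketched as follows. readActiveMaster is a hypothetical stand-in for the ZooKeeper read of the active-master znode, assumed to return null when the znode no longer exists; ActiveMasterCheck and Outcome are likewise illustrative names, not the patch's code.

```java
import java.util.function.Supplier;

// Hypothetical sketch; not the ActiveMasterManager code from the patch.
final class ActiveMasterCheck {
  private ActiveMasterCheck() {}

  enum Outcome { ANOTHER_MASTER_ACTIVE, RETRY_ELECTION }

  // Called after this candidate failed to create the active-master znode.
  static Outcome afterLosingElection(Supplier<String> readActiveMaster) {
    String active = readActiveMaster.get();
    if (active == null) {
      // The winner vanished between the failed create and this read.
      // Rather than using the missing value and aborting, try again to
      // become the active master.
      return Outcome.RETRY_ELECTION;
    }
    // Another master is active: stay a backup and keep watching the znode.
    return Outcome.ANOTHER_MASTER_ACTIVE;
  }
}
```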
bq.  
bq.  
bq.  This addresses bug HBASE-5209.
bq.      https://issues.apache.org/jira/browse/HBASE-5209
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/ClusterStatus.java b849429 
bq.    src/main/java/org/apache/hadoop/hbase/master/ActiveMasterManager.java 2f60b23
bq.    src/main/java/org/apache/hadoop/hbase/master/HMaster.java 9d21903
bq.    src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java f6f3f71
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 111f76e
bq.    src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java 3e3d131
bq.    src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java 16e4744
bq.    src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java bc98fb0
bq.  
bq.  Diff: https://reviews.apache.org/r/3892/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  * Ran mvn -P localTests test multiple times - no new tests fail
bq.  * Ran mvn -P localTests -Dtest=TestActiveMasterManager test multiple runs - no failures
bq.  * Ran mvn -P localTests -Dtest=TestMasterFailover test multiple runs - no failures
bq.  * Started active and multiple backup masters, then killed active master, then brought it back up (will now be a backup master)
bq.    * Did the following before and after killing
bq.      * hbase hbck -details - checked output to see that active and backup masters are reported properly
bq.      * zk_dump - checked that active and backup masters are reported properly
bq.  * Started cluster with no backup masters to make sure change operates correctly that way
bq.  * Tested build with this diff vs. build without this diff, in all combinations of client and server
bq.    * Verified that new client can run against old servers without incident and with the defaults applied.
bq.    * Note that old clients get an error when running against new servers, because the old readFields() code in ClusterStatus does not handle exceptions of any kind.  This is not solvable, at least in the scope of this change.
bq.  
bq.  12/02/15 15:15:38 INFO zookeeper.ClientCnxn: Session establishment complete on server haus02.sf.cloudera.com/172.29.5.33:30181, sessionid = 0x135834c75e20008, negotiated timeout = 5000
bq.  12/02/15 15:15:39 ERROR io.HbaseObjectWritable: Error in readFields
bq.  A record version mismatch occured. Expecting v2, found v3
bq.          at org.apache.hadoop.io.VersionedWritable.readFields(VersionedWritable.java:46)
bq.          at org.apache.hadoop.hbase.ClusterStatus.readFields(ClusterStatus.java:247)
bq.          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readObject(HbaseObjectWritable.java:583)
bq.          at org.apache.hadoop.hbase.io.HbaseObjectWritable.readFields(HbaseObjectWritable.java:297)
bq.  
bq.  * Ran dev-support/test-patch.sh - no new issues introduced:
bq.  
bq.  -1 overall.
bq.  
bq.      +1 @author.  The patch does not contain any @author tags.
bq.  
bq.      +1 tests included.  The patch appears to include 7 new or modified tests.
bq.  
bq.      -1 javadoc.  The javadoc tool appears to have generated -136 warning messages.
bq.  
bq.      +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
bq.  
bq.      +1 findbugs.  The patch does not introduce any new Findbugs (version ) warnings.
bq.  
bq.      +1 release audit.  The applied patch does not increase the total number of release audit warnings.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  David
bq.  
bq.
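The compatibility behavior in the testing notes can be illustrated with a hand-rolled versioned record using only java.io (not Hadoop's VersionedWritable). A reader that understands version 3 also accepts version 2 and applies defaults for the master fields that version 2 did not send, while a strict reader that rejects any other version byte fails on newer data, much like the "Expecting v2, found v3" error in the log. MasterInfoRecord, its fields and the "#unknown#" default are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record: one field common to both versions (hbaseVersion)
// plus the master fields that only version 3 writes.
final class MasterInfoRecord {
  static final byte VERSION = 3;
  static final String UNKNOWN_MASTER = "#unknown#"; // default when not sent

  String hbaseVersion = "";
  String master = UNKNOWN_MASTER;
  List<String> backupMasters = new ArrayList<>();

  void write(DataOutput out) throws IOException {
    out.writeByte(VERSION);
    out.writeUTF(hbaseVersion);
    out.writeUTF(master);
    out.writeInt(backupMasters.size());
    for (String b : backupMasters) {
      out.writeUTF(b);
    }
  }

  void readFields(DataInput in) throws IOException {
    byte version = in.readByte();
    if (version > VERSION) {
      // Data from a newer writer: this reader cannot know its layout.
      throw new IOException("Unsupported record version " + version);
    }
    hbaseVersion = in.readUTF();
    if (version < 3) {
      // Older writers did not send master information: apply defaults.
      master = UNKNOWN_MASTER;
      backupMasters = new ArrayList<>();
      return;
    }
    master = in.readUTF();
    int count = in.readInt();
    backupMasters = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      backupMasters.add(in.readUTF());
    }
  }

  static byte[] toBytes(MasterInfoRecord record) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    record.write(out);
    out.flush();
    return bytes.toByteArray();
  }

  static MasterInfoRecord fromBytes(byte[] data) throws IOException {
    MasterInfoRecord record = new MasterInfoRecord();
    record.readFields(new DataInputStream(new ByteArrayInputStream(data)));
    return record;
  }
}
```

Putting the version check in the reader, with an explicit defaults branch, is what lets the newer side stay compatible in one direction; the older side has no such branch to fall back on.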


                
> HConnection/HMasterInterface should allow for way to get hostname of 
> currently active master in multi-master HBase setup
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5209
>                 URL: https://issues.apache.org/jira/browse/HBASE-5209
>             Project: HBase
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.90.5, 0.92.0, 0.94.0
>            Reporter: Aditya Acharya
>            Assignee: David S. Wang
>             Fix For: 0.92.1, 0.94.0
>
>         Attachments: 5209.addendum, HBASE_5209_v5.diff
>
>
> I have a multi-master HBase set up, and I'm trying to programmatically 
> determine which of the masters is currently active. But the API does not 
> allow me to do this. There is a getMaster() method in the HConnection class, 
> but it returns an HMasterInterface, whose methods do not allow me to find out 
> which master won the last race. The API should have a 
> getActiveMasterHostname() or something to that effect.

--
This message is automatically generated by JIRA.