[ https://issues.apache.org/jira/browse/HBASE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098605#comment-13098605 ]
jirapos...@reviews.apache.org commented on HBASE-4015: ------------------------------------------------------ ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1668/#review1778 ----------------------------------------------------------- I have some feedback below. On how to write a test, I'm not sure how -- so many systems are touched. We'd need to break out AM so it standalone but that is too much to ask of this patch. Good work Ram. Thanks for persevering with this one. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4056> I see. You have full path here to distingush from o.a.h.h.executor.ExecutorService. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4057> I wonder about passing this in rather than create it in AM on construction? If you need to do a close on the ExecutorService, add a close to AM and have the host of the AM class call the close on the way out? Otherwise, you are exposing the implementation. If you need configuration setting up the ExecutorService, you have the conf object by doing master.getConfiguration? There may be other reasons for passing in the ExecutorService that I'm not seeing... Above is just a thought. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4058> How do we know this will not result in a double assignment? What if the .META. is OPENING and the master crashes. A new master comes up but meantime the .META. has moved to OPENED and here we will assign the new .META. out without checking if .META. successfully OPENED? Should we at least check the znode again before going ahead with assign? Should we actually try grabbing the znode completely before we reassign so if .META. is opening, it will fail to complete? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4059> Thanks for making this little method disableRegionIfInRIT http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4060> If you make a new patch, can you fix formatting of this comment? Its strange the way its broke over a number of lines. So, the flag here is 'reassign'. Does that mean 'called from timeout monitor'? If so, its not clear (Change name of flag or at least fixup javadoc). Else this comment is hard to understand. But even so, do we care where its called from? All thats important is if we are to reassign or not (What if we figure we need to set reassign to true elsewhere than in timeout monitor? Then this comment would be extra confusing?) http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4062> This is great. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4063> Should this method be called isDisabledOrDisabling? Then a reader will expect the boolean return. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4064> This javadoc is better in that it says the flag 'reassign' is set if we were invoked by timeout monitor. Is it important that we're called by the timeout monitor? Do we need to know about the caller when this method is run? What if we decide to call this method from elsewhere than from timeout monitor? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4065> Again is it important that we know we are being called from timeout monitor? What does it mean when we are called from timeout monitor? Is it that we are trying to interrupt normal processing because it seems to have failed? If so, 'reassign' may be the wrong name for this flag? Would 'hijack' be a better flag name? Or 'appropriate' or some such? Sorry for harping on the flag name Ram. If you have good names that convey intent/purpose, it makes it easier on the reader. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4066> Can you fix formatting of this comment? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4067> Should we change znode state here? Read it, get the znode id and try to change it. If we fail, then RS still has control? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4068> Making a new method is good here. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4069> This is nice. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4070> Try not to make these changes going forward (you are moving the + from end of line to start of next line and thats all -- it bulks up your patch unnecessarily). http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4071> Nice. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4072> Does doing this clear the watch on the znode such that we will not notice if it moves from OPENING to OPENED? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java <https://reviews.apache.org/r/1668/#comment4073> Is there something wrong here? The comment says do not make znode OFFLINE, wait on timeout monitor, but we are queuing up an ASSIGN it looks like? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java <https://reviews.apache.org/r/1668/#comment4074> Yeah, I wonder if this even belongs out here. Just have it inside in AM? Do we even call close on it? Should we? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java <https://reviews.apache.org/r/1668/#comment4075> No biggie but you could have written this as: return (versionOfflineNode == -1)? openRegion(region): openRegion(region, versionOffilneNode); http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java <https://reviews.apache.org/r/1668/#comment4077> I'd suggest you move the enum here from AM rather than have this strange import of an inner type. Maybe you'd have two classes in here. AssignCallable and UnassignCallable then no need of the enum? For this class name, do you think we need TimeOutManager in its name? In fact, is there a TimeoutManager? It seems to be the ExcecutorService? Or is it the method invokeTimeoutManager? Seems like method should be something like invoke and you queue AssignCallable or UnassignCallable? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java <https://reviews.apache.org/r/1668/#comment4076> misspelling http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java <https://reviews.apache.org/r/1668/#comment4078> I think I asked this already and I should check the response above, but is there any duplication here with current openRegion? Should we factor out common code? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java <https://reviews.apache.org/r/1668/#comment4079> For sure this method should not know about timeout monitor. This is a class in another package from master AM code. Should this flag be named 'hijack' or 'appropriate' or 'preempt' http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java <https://reviews.apache.org/r/1668/#comment4082> You do not javadoc your return. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java <https://reviews.apache.org/r/1668/#comment4080> Can you clean up the formatting on this comment? Do we need to know about timeout monitor in here? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java <https://reviews.apache.org/r/1668/#comment4081> Should this check be done up in timeout monitor or in a callable so its not inline? If it still is in OPENING state up in timeout monitor, then come in here? Just a suggestion (May not be a good one). http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java <https://reviews.apache.org/r/1668/#comment4083> Is it correct adding 1? http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java <https://reviews.apache.org/r/1668/#comment4084> This is a strange location for a comment... I'd say put it into the else clause. - Michael On 2011-08-29 12:11:23, ramkrishna vasudevan wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/1668/ bq. ----------------------------------------------------------- bq. bq. (Updated 2011-08-29 12:11:23) bq. bq. bq. Review request for Ted Yu, Michael Stack, Jean-Daniel Cryans, and Jonathan Gray. bq. bq. bq. Summary bq. ------- bq. bq. HBASE-4015 - updated patch. bq. bq. bq. invokeTimeOutManager(regionState.getRegion(), bq. TimeOutOperationType.ASSIGN); bq. bq. Instead of storing the state when timeout was deducted we process it bq. using future task. bq. ================================================================ bq. bq. case RS_ZK_REGION_OPENING: bq. // TODO: Could check if it was on deadServers. If it was, then we could bq. // do what happens in TimeoutMonitor when it sees this condition. bq. bq. // Just insert region into RIT bq. // If this never updates the timeout will trigger new assignment bq. if (regionInfo.isMetaRegion() || regionInfo.isRootRegion()) { bq. regionsInTransition.put(encodedRegionName, new RegionState( bq. regionInfo, RegionState.State.OPENING, data.getStamp(), data bq. .getOrigin())); bq. processOpeningState(regionInfo); bq. break; bq. } bq. regionsInTransition.put(encodedRegionName, new RegionState(regionInfo, bq. RegionState.State.OPENING, data.getStamp(), data.getOrigin())); bq. break; bq. bq. This change is for HBASE-4203. META and ROOT table need not wait till timeout bq. ====================================================================== bq. bq. In forceRegionStateToOffline() bq. bq. } else { bq. // If invoked from timeout monitor donot force it to OFFLINE. Based on the bq. // state we will decide if to change in-memory state to OFFLINE or not. It will bq. // be done before setting the znode to OFFLINE state. bq. if (!timeOutMonitorReAllocate) { bq. LOG.debug("Forcing OFFLINE; was=" + state); bq. state.update(RegionState.State.OFFLINE); bq. } bq. If the timeout monitor tries to reallcoate the node then dont make bq. the inmemory state to OFFLINE. bq. But the noraml assign flow doesnot expect the inmemory state to OFFLINE. bq. Hence the above change. This is continued with the check in bq. assign() bq. int setOfflineInZooKeeper(final RegionState state, bq. boolean timeOutMonitorReAllocate) { bq. // If invoked from timeoutmonitor the current state in memory need not be bq. // OFFLINE. bq. if (!timeOutMonitorReAllocate && !state.isClosed() && !state.isOffline()) { bq. this.master.abort("Unexpected state trying to OFFLINE; " + state, bq. new IllegalStateException()); bq. return -1; bq. } bq. ====================================================================== bq. bq. boolean allowCreation = false; bq. // If the isReAllocate is true and the current state is PENDING_OPEN bq. // or OPENING then update the inmemory state to PENDING_OPEN. This is bq. // important because bq. // if timeoutmonitor deducts that a region was in OPENING state for a long bq. // time but by the bq. // time timeout monitor tranits the node to OFFLINE the RS would have opened bq. // the node and the bq. // state in znode will be RS_ZK_REGION_OPENED. Inorder to invoke the bq. // OpenedRegionHandler bq. // we expect the inmemeory state to be PENDING_OPEN or OPENING. bq. // For all other cases we can change the inmemory state to OFFLINE. bq. if (timeOutMonitorReAllocate bq. && (state.getState().equals(RegionState.State.PENDING_OPEN) || state bq. .getState().equals(RegionState.State.OPENING))) { bq. state.update(RegionState.State.PENDING_OPEN); bq. allowCreation = false; bq. } else { bq. state.update(RegionState.State.OFFLINE); bq. allowCreation = true; bq. } bq. This change is quite tricky. bq. In normal assign flow the unassigned node for the region will not be present bq. Hence we need to allow the creation of the node newly. bq. But in timeout monitor case we will have the node present in some state hence bq. we decide whether to create node newly or not inside ZKAssign.createOrForceNodeOffline bq. bq. The above code also updates the inmemory state of OFFLINE or PENDING_OPEN bq. which was not update in the previous forceRegionStateToOffline() call. bq. ============================================================================== bq. In ZKAssign.java() bq. if (version == -1) { bq. // If timeoutmonitor deducts a node to be in OPENING state but before it bq. // could bq. // transit to OFFLINE state if RS had opened the region then the Master bq. // deletes the bq. // assigned region znode. In that case the znode will not exist. So we bq. // should not bq. // create the znode again which will lead to double assignment. bq. if (timeOutMonitorReAllocate && !allowCreation) { bq. return -1; bq. } bq. this part prevents double assignment bq. If timeoutmonitor tries to force to OFFLINE state an existing region which was in RIT bq. but before it could do that if the node was opened then openedregionhandler will delete bq. the node hence we should not create the node again. bq. bq. ======================================================================================= bq. In ZkAssign.java() bq. } else { bq. RegionTransitionData curDataInZNode = ZKAssign.getDataNoWatch(zkw, region bq. .getEncodedName(), stat); bq. // Do not move the node to OFFLINE if znode is in any of the following bq. // state. bq. // Because these are already executed states. bq. if (timeOutMonitorReAllocate && null != curDataInZNode) { bq. EventType eventType = curDataInZNode.getEventType(); bq. if (eventType.equals(EventType.RS_ZK_REGION_CLOSING) bq. || eventType.equals(EventType.RS_ZK_REGION_CLOSED) bq. || eventType.equals(EventType.RS_ZK_REGION_OPENED)) { bq. return -1; bq. } bq. } bq. This check prevents from moving the node to OFFLINE state if just before bq. the node is tried to force to OFFLINE the RS would have changed the state bq. to either CLOSING or CLOSED or OPENED. now again moving to OFFLINE bq. will lead to douoble assignment and will an additional operaition. bq. bq. ==================================================================================== bq. In ZKassign.java() bq. boolean setData = false; bq. try { bq. setData = ZKUtil.setData(zkw, node, data.getBytes(), version); bq. // Setdata throws KeeperException which aborts the Master. So we are bq. // catching it here. bq. // If just before setting the znode to OFFLINE if the RS has made any bq. // change to the bq. // znode state then we need to return -1. bq. } catch (KeeperException kpe) { bq. LOG.info("Version mismatch while setting the node to OFFLINE state."); bq. return -1; bq. } bq. This change is actually to avoid the master from abortng. If the forceful OFFLINE is in bq. progrss but just before setting the RS has changed the state. bq. Then the setData will fail leading to master abort. bq. Hence we are catching the exception. bq. ==================================================================================== bq. bq. In assignmentManager.java bq. + if (setOfflineInZK && versionOfOfflineNode == -1) bq. + return; bq. This is nothing but the refactoring done in the existing code. bq. - if (setOfflineInZK && !setOfflineInZooKeeper(state)) return; bq. So if setting the version is unsuccessful return. bq. ===================================================================================== bq. In ZKassign.java() bq. // the below check ensure that double assignment doesnot happen. bq. // When the node is created for the first time then the expected version bq. // that is bq. // passed will be -1 and the version in znode will be 0. bq. // In all other cases the version in znode will be > 0. bq. else if (beginState.equals(EventType.M_ZK_REGION_OFFLINE) bq. && endState.equals(EventType.RS_ZK_REGION_OPENING) bq. && expectedVersion == -1 && stat.getVersion() != 0) { bq. LOG.warn(zkw.prefix("Attempt to transition the " + "unassigned node for " bq. + encoded + " from " + beginState + " to " + endState + " failed, " bq. + "the node existed but was version " + stat.getVersion() bq. + " not the expected version " + expectedVersion)); bq. return -1; bq. } bq. As the comment explains when the node is created for first time the expectedversion will bq. be -1 but the actual version will be 0. Here the scenario is bq. If RS1 has not tranitioned the node from OFFLINE to OPENING if bq. RS2 gets the call from Master after forcefully chaning to OFFLINE the bq. Rs2 will take the control of the node. bq. At that time if RS1 starts transmitting the node then we should not allow bq. it. bq. =============================================================================== bq. bq. bq. This addresses bug HBASE-4015. bq. https://issues.apache.org/jira/browse/HBASE-4015 bq. bq. bq. Diffs bq. ----- bq. bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1161985 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/ipc/HRegionInterface.java 1161985 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1161985 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java 1161985 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/TimeOutManagerCallable.java PRE-CREATION bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 1161985 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenMetaHandler.java 1161985 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java 1161985 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRootHandler.java 1161985 bq. http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1161985 bq. bq. Diff: https://reviews.apache.org/r/1668/diff bq. bq. bq. Testing bq. ------- bq. bq. Yes. But could not add new test case. bq. TestMasterFailOver is passing with the current changes also. bq. bq. bq. Thanks, bq. bq. ramkrishna bq. bq. > Refactor the TimeoutMonitor to make it less racy > ------------------------------------------------ > > Key: HBASE-4015 > URL: https://issues.apache.org/jira/browse/HBASE-4015 > Project: HBase > Issue Type: Sub-task > Affects Versions: 0.90.3 > Reporter: Jean-Daniel Cryans > Assignee: ramkrishna.s.vasudevan > Priority: Blocker > Fix For: 0.92.0 > > Attachments: HBASE-4015_1_trunk.patch, HBASE-4015_2_trunk.patch, > HBASE-4015_reprepared_trunk_2.patch, Timeoutmonitor with state diagrams.pdf > > > The current implementation of the TimeoutMonitor acts like a race condition > generator, mostly making things worse rather than better. It does it's own > thing for a while without caring for what's happening in the rest of the > master. > The first thing that needs to happen is that the regions should not be > processed in one big batch, because that sometimes can take minutes to > process (meanwhile a region that timed out opening might have opened, then > what happens is it will be reassigned by the TimeoutMonitor generating the > never ending PENDING_OPEN situation). > Those operations should also be done more atomically, although I'm not sure > how to do it in a scalable way in this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira