What state would follow RE_ALLOCATE? The solution depends on RSx seeing RE_ALLOCATE before the next state of the region.
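To make the ordering concern concrete, here is a minimal sketch of the check-before-transition idea from the proposal quoted below. It is not HBase or ZooKeeper code: `RegionNodeSketch`, `Versioned`, `masterTimeout` and `rsAdvance` are hypothetical names, and an in-memory `AtomicReference` stands in for the znode and its version check. It only illustrates that whichever side moves the node first wins; it does not settle what state should follow RE_ALLOCATE.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory stand-in for the region's znode; not HBase or ZooKeeper API.
final class RegionNodeSketch {
    enum State { OFFLINE, PENDING_OPEN, OPENING, OPENED, RE_ALLOCATE }

    // Immutable (state, version) pair, playing the role of znode data plus its version.
    static final class Versioned {
        final State state;
        final int version;

        Versioned(State state, int version) {
            this.state = state;
            this.version = version;
        }
    }

    private final AtomicReference<Versioned> node =
            new AtomicReference<>(new Versioned(State.OFFLINE, 0));

    Versioned read() {
        return node.get();
    }

    // Succeeds only if nobody has changed the node since `expected` was read.
    boolean transition(Versioned expected, State next) {
        return node.compareAndSet(expected, new Versioned(next, expected.version + 1));
    }

    // Master side: on timeout, mark RE_ALLOCATE only if the RS has not moved the node.
    boolean masterTimeout(Versioned seenAtAssign) {
        return transition(seenAtAssign, State.RE_ALLOCATE);
    }

    // RS side: back off if the master has already taken control of the node.
    boolean rsAdvance(State next) {
        Versioned current = read();
        if (current.state == State.RE_ALLOCATE) {
            return false;
        }
        return transition(current, next);
    }
}
```

In this sketch, if the master's timeout lands first, a later `rsAdvance(State.OPENING)` returns false and the RS knows to drop the region. If the RS advances first, `masterTimeout` fails because the node is no longer the one the master saw at assignment time.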
Cheers

On Thu, Aug 4, 2011 at 7:37 AM, Ramkrishna S Vasudevan <[email protected]> wrote:

> Hi
>
> I was able to identify why PENDING_OPEN happens and under what
> scenario. The defect HBASE-3937 is the one where we can identify why
> there are race conditions in TimeoutMonitor.
>
> The following is the scenario. A few people may already be aware of
> it, but I would like to highlight a few things in it:
>
> 1) Master says Region A is to be opened in RS1. While it was about to
> open, the NN went down, hence instantiating Region A was a failure. So
> now the state of the znode is OPENING. (JD was interested to know why
> opening did not happen in the defect HBASE-3937.)
> But for the above scenario to happen it took some time as RS1 was busy,
> and so the master deduced that too much time was taken in PENDING_OPEN
> state and added it to the assign list.
> 2) By the time the znode state is changed to OPENING, the master gets
> a callback and the current state in master is OPENING. But we don't do
> much about it.
> 3) Now the list of regions in the assign list populated in step 1 is
> taken and Region A is allocated to RS2. Before this, the state in
> master memory is updated to PENDING_OPEN.
> 4) It tries to open but is not able to do it as Region A is already
> hijacked by RS1.
> 5) Now again the PENDING_OPEN timeout happens and again Region A is
> tried to be assigned to RS1.
> 6) Here again a version mismatch occurs and the state continues to be
> PENDING_OPEN. The existing code handles version mismatches, but the
> version is created by the RS and only the RS is aware of the version.
>
> Points to be noted:
> ==================
> -> Assignment is done in batch.
> -> Though the master memory state for Region A is updated to OPENING,
> we are not able to make use of it as we have already populated the
> assign list.
> -> Other than that, we have actually handled other scenarios, like
> what if the timeout happens when it is in OPENING state. In that case
> we try to OFFLINE the state in the znode so that a fresh allocation
> can happen. We also check the current state before handling it.
>
> Our soln 1:
> ========
> -> Do not add it to the assign list. Instead invoke the future task
> then and there when we deduce that a timeout has happened for the new
> assignment.
> -> Add one more state, RE_ALLOCATE, set whenever the master deduces
> that the previous assignment has timed out.
> -> Before changing to RE_ALLOCATE, check if the state has been altered
> by the RS; if not, change it to RE_ALLOCATE.
> -> A similar change is to be done in the RS so that before it changes
> the node from OFFLINE->OPENING->OPENED it will check if the state is
> RE_ALLOCATE. If so, the RS is for sure aware that the master has taken
> control of the node because the RS was too slow in processing the
> region assignment.
> -> If the master finds, before changing the state to RE_ALLOCATE, that
> the state has changed, it means the RS has done its job correctly and
> so the master stops from changing to RE_ALLOCATE.
>
> This new state RE_ALLOCATE will help both MASTER and RS to know about
> the state.
> This is a first cut solution.
> Reviews and suggestions are welcome. If you find any problems in this
> soln pls do specify.
>
> If you have any other solution pls feel free to share.
>
> Regards
> Ram
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Thursday, August 04, 2011 7:49 AM
> To: [email protected]
> Subject: Re: HBASE-4060 - TimeOutMonitor refactoring
>
> Bring the following discussion to public.
> HBASE-4015 is in the critical path of 0.92
>
> Cheers
>
> On Wed, Aug 3, 2011 at 8:12 AM, Ramkrishna S Vasudevan <
> [email protected]> wrote:
>
> > Hi JD
> >
> > I was working on finalising a strategy to avoid the TimeoutMonitor
> > race condition.
> > I have a few queries from when I tried reproducing the issue and
> > while going through the code.
> > The scenario mentioned in the defect is where the region is left in
> > PENDING_OPEN state when RS1, which at first was not opening the
> > region, moved the state from OFFLINE to OPENING just when RS2
> > started opening the same region.
> >
> > When I tried to reproduce it and went through the code, I found that
> > when the RS tries to make the state changes
> > OFFLINE->OPENING->OPENED, we always check the version of the znode
> > before proceeding with the state update.
> > So for the above mentioned scenario I get a log saying
> > "Region already hijacked? "
> >
> > Pls correct me if I am wrong. Could you brief me more on the problem
> > that causes this race condition?
> >
> > We are working on a strategy so that every RS is made aware whether
> > it should take up the assignment or not, by implementing some STATEs
> > which are visible to both master and RS.
> >
> > Once I am clear on the real root cause I will upload our idea for
> > overcoming the race condition.
> >
> > Thanks & Regards
> > Ram
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of
> > Jean-Daniel Cryans
> > Sent: Tuesday, August 02, 2011 3:52 AM
> > To: [email protected]; [email protected]
> > Cc: stack; Ted Yu
> > Subject: Re: HBASE-4060 - TimeOutMonitor refactoring
> >
> > I've not started working on this yet, happy to review your
> > ideas/code Ram.
> >
> > Thanks,
> >
> > J-D
> >
> > On Fri, Jul 29, 2011 at 7:54 AM, Ted Yu <[email protected]> wrote:
> > > Copying J-D.
> > >
> > > On Fri, Jul 29, 2011 at 7:38 AM, Ramkrishna S Vasudevan
> > > <[email protected]> wrote:
> > >>
> > >> Hi Ted/Stack,
> > >>
> > >> We analyzed and found that similar issues have occurred even in
> > >> our cluster a couple of times.
> > >>
> > >> So we are very much interested in taking it up, though we have
> > >> not yet analyzed it or started the ground work on it. I would
> > >> also like to know if anyone is currently working on it.
> > >> Particularly JD was very keen on this issue.
> > >>
> > >> Even if you guys have a plan or solution for that, I would like
> > >> to take part in it, or am even ready to implement a few things as
> > >> part of it.
> > >>
> > >> I would like to know your comments and suggestions on this.
> > >>
> > >> Regards
> > >>
> > >> Ram
> > >>
> > >> P.S: Pls do reply to the id in CC also as I will be travelling
> > >> over the weekend.
