Re: HBASE-4060 - TimeOutMonitor refactoring

Ted Yu Thu, 04 Aug 2011 09:11:09 -0700

What state would follow RE_ALLOCATE?
The solution depends on the fact that RSx would see RE_ALLOCATE before the
next state of the region.


Cheers

On Thu, Aug 4, 2011 at 7:37 AM, Ramkrishna S Vasudevan <
[email protected]> wrote:

> Hi
>
> I was able to identify why a region gets stuck in PENDING_OPEN and under
> what scenario.  HBASE-3937 is the defect where we can see why there are
> race conditions in TimeoutMonitor.
>
>
> The following is the scenario.  A few people may already be aware of it,
> but I would like to highlight a few things in it.
>
> 1) Master asks RS1 to open Region A.  While RS1 was about to open it, the
> NN went down, so instantiating Region A failed.  The state of the znode is
> now OPENING.  (JD was interested to know why the open did not happen in
> the defect HBASE-3937.)
> But it took some time to reach this point because RS1 was busy, so the
> master deduced that the region had spent too long in PENDING_OPEN and
> added it to the assign list.
> 2) When the znode state is changed to OPENING, the master gets a callback
> and the current state in the master becomes OPENING.  But we don't do much
> about it.
> 3) The assign list populated in step 1 is now processed and Region A is
> allocated to RS2.  Before this, the state in master memory is updated to
> PENDING_OPEN.
> 4) RS2 tries to open the region but cannot, because Region A is already
> hijacked by RS1.
> 5) Now the PENDING_OPEN timeout happens again, and the master again tries
> to assign Region A to RS1.
> 6) Here again a version mismatch occurs and the state remains
> PENDING_OPEN.  The existing code handles version mismatches, but the
> version is created by the RS and only the RS is aware of it.
>
> Points to be noted:
> ==================
> -> Assignment is done in batch.
> -> Though the master's in-memory state for Region A is updated to OPENING,
> we cannot make use of it because the assign list is already populated.
> -> Other than that, we have handled other scenarios, such as a timeout
> that happens while the region is in OPENING state.  In that case we try to
> set the znode state to OFFLINE so that a fresh allocation can happen, and
> we also check the current state before handling it.
>
> Our soln 1:
> ========
> -> Do not add the region to the assign list.  Instead, invoke the future
> task as soon as we detect that a timeout has happened for a new
> assignment.
> -> Add one more state, RE_ALLOCATE, set whenever the master detects that
> the previous assignment has timed out.
> -> Before changing to RE_ALLOCATE, check whether the state has been
> altered by the RS; if not, change it to RE_ALLOCATE.
> -> A similar change is needed in the RS: before it moves the node through
> OFFLINE->OPENING->OPENED, it checks whether the state is RE_ALLOCATE.  If
> so, the RS knows for sure that the master has taken control of the node
> because the RS was too slow in processing the region assignment.
> -> If the master finds, before changing the state to RE_ALLOCATE, that the
> state has already changed, the RS has done its job correctly, so the
> master does not change the state to RE_ALLOCATE.
>
> This new state RE_ALLOCATE will help both master and RS know about the
> state.
> This is a first-cut solution.
> Reviews and suggestions are welcome.  If you find any problems in this
> solution, please point them out.
>
> If you have any other solution, please feel free to share.
>
>
> Regards
> Ram
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Thursday, August 04, 2011 7:49 AM
> To: [email protected]
> Subject: Re: HBASE-4060 - TimeOutMonitor refactoring
>
> Bringing the following discussion public.
> HBASE-4015 is in the critical path of 0.92
>
> Cheers
>
> On Wed, Aug 3, 2011 at 8:12 AM, Ramkrishna S Vasudevan <
> [email protected]> wrote:
>
> > Hi JD
> >
> > I was working on finalising a strategy to avoid the TimeoutMonitor race
> > condition.  I have a few queries from trying to reproduce the issue and
> > going through the code.
> > The scenario mentioned in the defect is the one where the region is left
> > in PENDING_OPEN state when RS1, which at first was not opening the
> > region, moved the state from OFFLINE to OPENING when RS2 started opening
> > the same region.
> >
> >
> > When I tried to reproduce it and went through the code, I found that
> > when an RS makes the state changes OFFLINE->OPENING->OPENED, we always
> > check the version of the znode before proceeding with the state update.
> > So for the above-mentioned scenario I get a log saying
> > "Region already hijacked? "
> >
> > Please correct me if I am wrong.  Could you brief me more on the problem
> > that causes this race condition?
> >
> > We are working on a strategy so that every RS is made aware whether it
> > should take up the assignment or not, by implementing some states that
> > are visible to both master and RS.
> >
> > Once I am clear about the real root cause, I will upload our idea for
> > overcoming the race condition.
> >
> > Thanks & Regards
> > Ram
> >
> >
> >
> >
> > -----Original Message-----
> > From: [email protected] [mailto:[email protected]] On Behalf Of
> > Jean-Daniel Cryans
> > Sent: Tuesday, August 02, 2011 3:52 AM
> > To: [email protected]; [email protected]
> > Cc: stack; Ted Yu
> > Subject: Re: HBASE-4060 - TimeOutMonitor refactoring
> >
> > I've not started working on this yet, happy to review your ideas/code,
> > Ram.
> >
> > Thanks,
> >
> > J-D
> >
> > On Fri, Jul 29, 2011 at 7:54 AM, Ted Yu <[email protected]> wrote:
> > > Copying J-D.
> > >
> > > On Fri, Jul 29, 2011 at 7:38 AM, Ramkrishna S Vasudevan
> > > <[email protected]> wrote:
> > >>
> > >> Hi Ted/Stack,
> > >>
> > >> We analyzed this and found that similar issues have occurred in our
> > >> cluster a couple of times.
> > >>
> > >> So we are very much interested in taking it up, though we have not yet
> > >> analyzed it or started the groundwork.  I would also like to know if
> > >> anyone is currently working on it.  JD in particular was very keen on
> > >> this issue.
> > >>
> > >> Even if you already have a plan or solution, I would like to take
> > >> part in it, or to implement a few things as part of it.
> > >>
> > >> I would like to know your comments and suggestions on this.
> > >>
> > >> Regards
> > >>
> > >> Ram
> > >>
> > >> P.S.: Please also reply to the id in CC, as I will be travelling over
> > >> the weekend.
> > >>
> > >
> >
> >
>
>

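For readers following the thread, the RE_ALLOCATE handshake proposed in Ram's message can be sketched roughly as below. This is only an illustration under stated assumptions, not HBase code: the class ReAllocateSketch, the RegionNode stand-in for the znode, and the methods masterOnTimeout and regionServerAdvance are all hypothetical names, and an in-memory AtomicReference replaces the version-checked znode update. It also does not answer the question at the top of the thread about which state should follow RE_ALLOCATE.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the proposed handshake; none of these names exist in HBase.
class ReAllocateSketch {
    enum State { OFFLINE, OPENING, OPENED, RE_ALLOCATE }

    /** In-memory stand-in for the region's znode (assumption: a real znode
     *  would be updated with a version-checked write instead). */
    static final class RegionNode {
        private final AtomicReference<State> state =
                new AtomicReference<>(State.OFFLINE);

        State get() { return state.get(); }

        /** Atomically move expected -> next; fails if someone else moved first. */
        boolean transition(State expected, State next) {
            return state.compareAndSet(expected, next);
        }
    }

    /** Master side: on timeout, mark RE_ALLOCATE only if the RS has not
     *  altered the state the master last saw. */
    static boolean masterOnTimeout(RegionNode node, State lastSeenByMaster) {
        return node.transition(lastSeenByMaster, State.RE_ALLOCATE);
    }

    /** RS side: before each step of OFFLINE->OPENING->OPENED, back off if the
     *  master has taken control by setting RE_ALLOCATE. */
    static boolean regionServerAdvance(RegionNode node, State from, State to) {
        if (node.get() == State.RE_ALLOCATE) {
            return false; // master took over; this RS was too slow
        }
        // The compare-and-set also fails if RE_ALLOCATE lands after the check.
        return node.transition(from, to);
    }

    public static void main(String[] args) {
        // Case 1: RS is slow, the master times out first and takes control.
        RegionNode slow = new RegionNode();
        System.out.println("master took over: "
                + masterOnTimeout(slow, State.OFFLINE));
        System.out.println("late RS may proceed: "
                + regionServerAdvance(slow, State.OFFLINE, State.OPENING));

        // Case 2: RS moved the node before the master's timeout fired.
        RegionNode fast = new RegionNode();
        regionServerAdvance(fast, State.OFFLINE, State.OPENING);
        System.out.println("master took over: "
                + masterOnTimeout(fast, State.OFFLINE));
        System.out.println("state after timeout: " + fast.get());
    }
}
```

The point of the compare-and-set in both directions is that whichever side writes first wins, and the other side can tell from the failed write that it lost, which is what the extra state is meant to make visible to both master and RS.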