Regarding this duplicated assignment issue:
In my view, both the interim fix and the persistence fix may not be
robust. The following MSC charts are my proposal.
I am not familiar with the latest Hypertable code (I had studied
0.9.0.7), so if I am wrong, please point it out.
Chart 1: successful assignment case; we should design an
acknowledgment mechanism.
origRS                  Master                  RS1
  |--split range notify-->|
  |                   select a RS
  |                        |----assign to RS1---->|
  |<-------succ ack--------|<-------succ ack------|
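
To sketch what I mean by the acknowledgment mechanism in chart 1, here
is a rough C++ fragment. Every name in it (AssignmentTable and so on)
is hypothetical and mine; I am not claiming this matches the real
Master code.

  // Rough sketch of chart 1 bookkeeping on the Master side.  All names
  // are hypothetical; this is not the actual Hypertable Master code.
  #include <iostream>
  #include <map>
  #include <string>

  enum class AssignState { ASSIGNED, ACKED };

  struct Assignment {
    std::string range_server;   // e.g. "RS1"
    AssignState state;
  };

  class AssignmentTable {
    std::map<std::string, Assignment> m_map;   // range -> assignment
  public:
    // Record the assignment before invoking the load on RS1.
    void assign(const std::string &range, const std::string &rs) {
      m_map[range] = { rs, AssignState::ASSIGNED };
    }
    // "succ ack" from RS1: mark complete, then ack back to origRS.
    void on_succ_ack(const std::string &range, const std::string &rs) {
      auto it = m_map.find(range);
      if (it != m_map.end() && it->second.range_server == rs) {
        it->second.state = AssignState::ACKED;
        std::cout << "succ ack -> origRS for " << range << "\n";
      }
    }
  };

  int main() {
    AssignmentTable table;
    table.assign("mytable[a..m]", "RS1");
    table.on_succ_ack("mytable[a..m]", "RS1");
  }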
Chart 2: failure/timeout assignment case

origRS                  Master                  RS1            RS2
  |--split range notify-->|
  |                   select a RS
  |                        |----assign to RS1---->|
  |                   timeout or failed
  |                        |-retry 2 times assign>|
  |                   still timeout or failed
  |                   select another RS
  |                        |------deassign------->|
  |                        |--------assign to another RS2-------->|
  |                   still timeout or failed
  |<---report failure-----|
  .................(another round).................
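
Chart 2 in code form might look like the toy fragment below.
assign_range() is a stand-in for the real RangeServer::load_range()
RPC, and the failure pattern is faked just to drive the loop.

  // Toy model of chart 2: retry the same server twice, then deassign
  // and move to the next candidate; if all candidates fail, report
  // failure back to origRS.
  #include <iostream>
  #include <string>
  #include <vector>

  // Stand-in RPC: pretend RS1 always times out and RS2 succeeds.
  bool assign_range(const std::string &rs, const std::string &range) {
    return rs == "RS2";
  }

  bool assign_with_retry(const std::vector<std::string> &candidates,
                         const std::string &range) {
    for (const std::string &rs : candidates) {
      for (int attempt = 0; attempt < 3; ++attempt)  // first try + 2 retries
        if (assign_range(rs, range))
          return true;                               // succ ack -> origRS
      // Still timing out: deassign before trying another server, so the
      // range can never end up loaded on two servers at once.
      std::cout << "deassign " << range << " from " << rs << "\n";
    }
    return false;                                    // report failure
  }

  int main() {
    if (!assign_with_retry({"RS1", "RS2"}, "mytable[a..m]"))
      std::cout << "another round needed\n";
  }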
Chart 3: a mechanism to avoid duplicated or wrong assignment

origRS                  Master                  RS1
  |                        |<-------succ ack------|
  |                   check, but find
  |                   the range is in RS2
  |                        |------deassign------->|
  |                        |<-------succ ack------|
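
And chart 3, the duplicate check on ack, as a toy fragment; again the
map and function names are mine, not Hypertable's.

  // Toy model of chart 3: on a success ack the Master consults its own
  // range-to-server map; an ack from the wrong server triggers a
  // deassign instead of being recorded.
  #include <iostream>
  #include <map>
  #include <string>

  std::map<std::string, std::string> g_range_owner;  // range -> owning RS

  void deassign(const std::string &rs, const std::string &range) {
    // Expect a "succ ack" for the deassign as well, per the chart.
    std::cout << "deassign " << range << " from " << rs << "\n";
  }

  void on_succ_ack(const std::string &rs, const std::string &range) {
    auto it = g_range_owner.find(range);
    if (it != g_range_owner.end() && it->second != rs) {
      deassign(rs, range);        // check, but find the range is in RS2
      return;
    }
    g_range_owner[range] = rs;    // record (or confirm) the owner
  }

  int main() {
    g_range_owner["mytable[a..m]"] = "RS2";
    on_succ_ack("RS1", "mytable[a..m]");   // wrong server -> deassign
  }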
On Mar 21, 1:41 am, Doug Judd <[email protected]> wrote:
> P.S. The memory exhaustion problem will be fixed in the 0.9.2.4 release.
>
> On Fri, Mar 20, 2009 at 10:37 AM, Doug Judd <[email protected]> wrote:
> > With the help of Earle Ady, we've found and fixed the large load corruption
> > problem with the 0.9.2.2 release. To get the fixed version, please pull the
> > latest code from the git repository
> > <http://code.google.com/p/hypertable/wiki/SourceCode?tm=4>.
> > We'll be releasing 0.9.2.3 soon.
>
> > Here's a summary of the problem:
>
> > With the fix of issue 246
> > <http://code.google.com/p/hypertable/issues/detail?id=246>,
> > compactions are now happening regularly as they should. However, this has
> > added substantial load on the system. When a range split occurred and
> > the Master was notified of the newly split-off range, the Master
> > selected (round-robin) a new RangeServer to own the range. However, due
> > to the increased load on the system and a 30 second hardcoded timeout in
> > the Master, the RangeServer::load_range() command was timing out (it was
> > taking 32 to 37 seconds). This timeout was reported back to the
> > originating RangeServer, which paused for fifteen seconds and tried
> > again. But on the second
> > attempt to notify the Master of the newly split-off range, the Master would
> > (round-robin) select another RangeServer and invoke
> > RangeServer::load_range() on that (different) server. This had the effect
> > of the same range being loaded by three different RangeServers, which was
> > wreaking havoc with the system. There were two fixes for this problem:
>
> > 1. The hardcoded timeout was removed and (almost) all timeouts in the
> > system are based on the "Hypertable.Request.Timeout" property which now has
> > a default value of 180 seconds.
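
To make fix 1 concrete for myself: I believe this corresponds to a
hypertable.cfg line like the one below, assuming the property takes
milliseconds. Please correct me if it actually takes seconds.

  Hypertable.Request.Timeout=180000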
>
> > 2. An interim fix was put in place in the Master where upon
> > RangeServer::load_range() failure, the Master will remember what RangeServer
> > it attempted to do the load on. The next time it gets notified and attempts
> > to load the same range, it will choose the same RangeServer. If it gets an
> > error message back, RANGE_ALREADY_LOADED, it will interpret that as
> > success. The reason this fix is interim is because it does not persist the
> > Range-to-RangeServer mapping information, so if it were to fail at an
> > inopportune time and come back up, we'd be subject to the same failure.
> > This will get fixed with Issue 74 - Master directed RangeServer recovery
> > <http://code.google.com/p/hypertable/issues/detail?id=79>, since the
> > Master will have a meta-log and will be able to persist this mapping as
> > re-constructible state information.
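
My reading of the interim fix, as a hypothetical fragment (the names
are mine, not the actual Master code):

  // Sketch of the interim fix as I understand it: remember which server
  // a failed load went to, retry against that same server on the next
  // split notification, and treat RANGE_ALREADY_LOADED as success.  The
  // map is in memory only, which is why the fix is interim.
  #include <iostream>
  #include <map>
  #include <string>

  enum Error { OK, TIMEOUT, RANGE_ALREADY_LOADED };

  // Stand-in for the RangeServer::load_range() RPC.
  Error load_range(const std::string &rs, const std::string &range) {
    return RANGE_ALREADY_LOADED;  // pretend the earlier load actually stuck
  }

  std::map<std::string, std::string> g_attempted;  // range -> last RS tried

  bool handle_split_notify(const std::string &range,
                           const std::string &round_robin_choice) {
    // Reuse the previously attempted server instead of round-robin, so a
    // timed-out load is never duplicated onto a second server.
    auto it = g_attempted.find(range);
    std::string rs = (it != g_attempted.end()) ? it->second
                                               : round_robin_choice;
    g_attempted[range] = rs;

    Error err = load_range(rs, range);
    if (err == RANGE_ALREADY_LOADED)
      return true;         // the earlier "failed" load actually succeeded
    return err == OK;
  }

  int main() {
    g_attempted["mytable[a..m]"] = "RS1";  // a load timed out here earlier
    std::cout << (handle_split_notify("mytable[a..m]", "RS2")
                  ? "loaded\n" : "failed\n");
  }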
>
> > After we fixed this problem, the next problem that Earle ran into was that
> > the RangeServer was exhausting memory and crashing. To fix this, we added
> > the following property to the hypertable.cfg file on the machine that was
> > doing the LOAD DATA INFILE:
>
> > Hypertable.Lib.Mutator.FlushDelay=100
>
> > Keep this in mind if you encounter the same problem.
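
In case it helps to see why that property relieves memory pressure,
here is a toy, Hypertable-independent sketch of the idea: the writer
sleeps between buffer flushes so the servers get time to drain.

  // Toy flush-delay throttle (NOT the Hypertable mutator API): pausing
  // between flushes slows the writer so RangeServers can keep up.
  #include <chrono>
  #include <string>
  #include <thread>
  #include <vector>

  class ToyMutator {
    std::vector<std::string> m_buffer;
    std::chrono::milliseconds m_flush_delay;
  public:
    explicit ToyMutator(int flush_delay_ms) : m_flush_delay(flush_delay_ms) {}

    void set(const std::string &cell) {
      m_buffer.push_back(cell);
      if (m_buffer.size() >= 1000)   // auto-flush on a full buffer
        flush();
    }

    void flush() {
      // ... ship m_buffer to the servers here ...
      m_buffer.clear();
      std::this_thread::sleep_for(m_flush_delay);  // FlushDelay=100 -> 100 ms
    }
  };

  int main() {
    ToyMutator mutator(100);
    for (int i = 0; i < 5000; ++i)
      mutator.set("row" + std::to_string(i));
    mutator.flush();
  }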
>
> > - Doug