We're actively working on the failover implementation now. It took us a little longer to work out the kinks with the load balancer, so that date may slip a little. RangeServer failover is our #1 priority and will be available as soon as possible.
- Doug

On Tue, Oct 25, 2011 at 6:53 AM, Yingfeng <[email protected]> wrote:

According to the roadmap, it seems this feature will be available in version 0.9.6.0 by Nov 15. Is that right?

I think this feature will be one of the major issues determining whether big-data companies choose to use Hypertable.

On Sep 8, 12:40 am, Doug Judd <[email protected]> wrote:

Hi Sanjit,

Here's some feedback about the RangeServer changes...

1. I think you'll probably need a phantom "load" to load each range and a "commit" to flip the phantom ranges live. You'll have to handle a possible race condition with the commit API. If the Master issues commit, but the RangeServer dies before it sends a response back, there will need to be some way to determine whether or not the phantom ranges got flipped live.

2. I recommend dropping the word "fragment" from the API names for the receiving RangeServer. Conceptually, the APIs don't deal with fragments; they just load ranges and receive updates. For example:

    phantom_load
    phantom_update
    phantom_cancel
    phantom_commit

3. There's another race condition that you'll need to address. To flip a set of phantom ranges live, the RangeServer needs to 1) write the Ranges to the RSML, and 2) link the recovery log into the Commit log. A simple approach might be to link the recovery log first and then write the RSML.

- Doug

On Thu, Sep 1, 2011 at 4:03 PM, Sanjit Jhala <[email protected]> wrote:

Since all data in Hypertable is persisted in an underlying DFS (with replication), when a RangeServer dies its state can be recovered from the filesystem. Here is a design proposal for RangeServer failover:

*High Level Failover Algorithm*

1. Master receives a server-left notification for RangeServer X, waits for some time, and then declares the server dead and starts recovery.
2. Master looks at the RangeServer MetaLog (RSML) and Master MetaLog (MML) and figures out which ranges were on the failed RangeServer and in what state.
3. Master looks at X's CommitLog (CL) fragments to see which RangeServers have local copies. Master assigns CL "players", biased towards RangeServers with a local copy of the fragment.
4. Master re-assigns ranges (round robin for now).
5. Master sends lists of ranges and new locations to the players and issues play.
6. Players replay CL fragments to the new range locations. Say we have ranges R1 .. RM and players P1 .. PN. For each recovered fragment RiPj, all writes are stored in a CellCache only. Once the RangeServer receives all data from Pj for range Ri, it writes the entire contents of the CellCache RiPj to a recovery log under /servers/rsY/recovery_rsX/range_i, merges RiPj into a CellCache for Ri, and deletes the CellCache RiPj (see the sketch after this list).
7. The destination RangeServer tells the Master it has committed data from Pj in its recovery logs.
8. When the Master knows that all data for a range has been committed, it tells the destination RangeServer to flip the range live.
9. The RangeServer links its range recovery log for Ri into its CL, flips the CellCache for Ri live, schedules a major compaction for Ri, and sends confirmation to the Master. If the range was in the middle of a split, the new location reads the split log and proceeds with the split.
10. Steps 5-9 are repeated for Root, Metadata, System, and User ranges (in that order) until all ranges are recovered.
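To make the data flow in steps 6-9 concrete, here is a minimal, self-contained C++ sketch of the destination-side phantom handling. It is not Hypertable code: the class and member names (PhantomRange, phantom_update, phantom_commit), the tab-separated recovery-log format, and the use of file streams in place of the DFS and commit log are all illustrative assumptions.

    // Sketch (not Hypertable code) of the phantom CellCache flow from
    // steps 6-9: updates for range Ri arriving from player Pj are
    // buffered, flushed to a per-range recovery log on EOS, and only
    // linked into the commit log when the Master issues a commit.

    #include <fstream>
    #include <map>
    #include <string>
    #include <vector>

    struct Cell { std::string key, value; };

    class PhantomRange {
    public:
      explicit PhantomRange(const std::string &recovery_log_path)
          : recovery_log_path_(recovery_log_path) {}

      // Step 6: buffer updates from one player's fragment in a
      // per-fragment cache (CellCache RiPj).
      void phantom_update(int fragment, const std::vector<Cell> &cells, bool eos) {
        auto &cache = fragment_cache_[fragment];
        cache.insert(cache.end(), cells.begin(), cells.end());
        if (eos) {
          // One write + sync of the fragment's cache to the recovery
          // log, then merge it into the range's main phantom cache.
          std::ofstream log(recovery_log_path_, std::ios::app);
          for (const Cell &c : cache)
            log << c.key << '\t' << c.value << '\n';
          log.flush();                      // stand-in for a real fsync
          range_cache_.insert(range_cache_.end(), cache.begin(), cache.end());
          fragment_cache_.erase(fragment);  // CellCache RiPj is discarded
        }
      }

      // Step 9 (plus Doug's point 3): on the Master-issued commit, the
      // recovery log is "linked" into the commit log (here, appended to
      // a manifest file) before the range is flipped live.
      void phantom_commit(std::ofstream &commit_log_manifest) {
        commit_log_manifest << recovery_log_path_ << '\n';
        commit_log_manifest.flush();
        live_ = true;                       // phantom cache becomes live
      }

      bool live() const { return live_; }

    private:
      std::string recovery_log_path_;
      std::map<int, std::vector<Cell>> fragment_cache_;  // CellCache RiPj
      std::vector<Cell> range_cache_;                    // CellCache Ri
      bool live_ = false;
    };

Keeping the phantom cache separate until the Master's commit is what allows a failed recovery to simply discard the cache and replay the fragments from the beginning, as the proposal describes below.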
*Master Changes*

The Master will have a RecoverServer operation with four sub-operations:

1. RecoverServerRoot (obstructions: RecoverServerRoot/Root)
2. RecoverServerMetadata (dependencies: RecoverServerRoot; obstructions: RecoverServerMetadata)
3. RecoverServerSystem (dependencies: RecoverServerRoot, RecoverServerMetadata; obstructions: RecoverServerSystem)
4. RecoverServerUser (dependencies: RecoverServerRoot, RecoverServerMetadata, RecoverServerSystem; obstructions: RecoverServerUser)

The logic for the "execute" step is the same for all of them and can live in a base class called RecoverServerBase. Meta operations such as create table and alter table will be dependent on RecoverServer operations.

Steps 1-4 above are done in the RecoverServer operation. As part of step 4, the RecoverServer operation creates the four sub-operations to recover Root, Metadata, System, and User ranges respectively, which are dependencies for the overall RecoverServer operation.

*Range Server changes*

New commands/APIs (a rough interface sketch appears at the end of this message):

1. play_fragment(failed server id (X) + fragment id, mapping of ranges to new locations): the RangeServer starts reading this fragment and plays updates to the destination RangeServers. [Maybe buffer 200K per call, with cumulative as well as per-range buffer limits.] If a send fails, it stops sending updates to the failed range and continues.

2. cancel_play(failed server id X + fragment id, locations): the Master will call this method to tell the player not to send any updates to a location. This will be called in case one of the destination RangeServers dies during recovery.

3. phantom_fragment_update(table, range, fragment, update_list, eos): receive updates and write them to a phantom CellCache. When eos==true, append the CellCache to the recovery log in one write + sync.

4. phantom_fragment_cancel(...): called by the Master in case a player dies and the CellCaches from Pj need to be tossed away.

No changes are needed for the RSML, since the recovered range is either in a phantom state or a live state. If it's in the phantom state and the RangeServer dies, the Master reassigns the recovery ranges to a new location and replays the CL fragments from the beginning.

*Recovery failures:*

- If a destination RangeServer fails, potentially all players have to replay to a new destination (all play operations get serialized behind the Root, Metadata, and System replays). Players inform the Master of any failed range updates, and the Master will later tell the player to replay the fragment either to the same or to another RangeServer. The Master maintains maps of (X, fragment id) --> players and (X, range) --> new location (see the bookkeeping sketch at the end of this message).
- If a player dies, the Master assigns a new player. R1Pj .. RMPj are tossed away and the new player replays the fragment.

Any thoughts?
-Sanjit
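For readers skimming the thread, here is a hypothetical C++ interface that pulls together the player-side calls from Sanjit's proposal with the fragment-free receiver names Doug suggests above. The signatures and helper structs (RangeSpec, Location) are assumptions made for illustration, not the actual Hypertable RPC definitions.

    // Hypothetical recovery-related RangeServer interface; names follow
    // the thread, signatures are guesses.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <vector>

    struct RangeSpec { std::string table, start_row, end_row; };
    struct Location  { std::string rangeserver; };  // destination address

    class RangeServerRecovery {
    public:
      virtual ~RangeServerRecovery() = default;

      // Player side: replay one commit-log fragment of failed server X,
      // routing each update to the range's new location.
      virtual void play_fragment(const std::string &failed_server, int32_t fragment,
                                 const std::map<std::string, Location> &range_locations) = 0;

      // Player side: stop sending updates to the given locations (e.g. a
      // destination RangeServer died during recovery).
      virtual void cancel_play(const std::string &failed_server, int32_t fragment,
                               const std::vector<Location> &locations) = 0;

      // Receiver side (Doug's fragment-free naming): load a phantom
      // range, buffer updates, discard on cancel, flip live on commit.
      virtual void phantom_load(const RangeSpec &range) = 0;
      virtual void phantom_update(const RangeSpec &range,
                                  const std::vector<std::string> &updates, bool eos) = 0;
      virtual void phantom_cancel(const RangeSpec &range) = 0;
      virtual void phantom_commit(const std::vector<RangeSpec> &ranges) = 0;
    };

Making phantom_commit take a batch of ranges also gives the Master a single call whose outcome it can re-check after a timeout, which is relevant to the commit race condition raised in Doug's first point.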
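Finally, a rough sketch of the Master-side bookkeeping described under "Recovery failures": tracking which player owns each commit-log fragment of the failed server and where each range was reassigned, so replays can be re-issued when a player or a destination dies. All types and method names here are hypothetical, not Master code.

    // Sketch of the Master's recovery maps:
    //   (X, fragment id) --> player
    //   (X, range)       --> new location

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    using ServerId   = std::string;
    using FragmentId = int;
    using RangeId    = std::string;

    class RecoveryBookkeeping {
    public:
      void assign_player(const ServerId &failed, FragmentId frag, const ServerId &player) {
        fragment_player_[{failed, frag}] = player;
      }
      void assign_range(const ServerId &failed, const RangeId &range, const ServerId &dest) {
        range_location_[{failed, range}] = dest;
      }

      // A destination died: move its ranges to a new server and return
      // the players whose fragments must be replayed to the new location.
      std::vector<ServerId> destination_failed(const ServerId &failed, const ServerId &dead_dest,
                                               const ServerId &new_dest) {
        for (auto &entry : range_location_)
          if (entry.first.first == failed && entry.second == dead_dest)
            entry.second = new_dest;
        std::vector<ServerId> players;
        for (const auto &entry : fragment_player_)
          if (entry.first.first == failed)
            players.push_back(entry.second);
        return players;                    // each replays its fragment again
      }

      // A player died: hand its fragments to a new player, which replays
      // them from the beginning (receivers toss R1Pj .. RMPj).
      void player_failed(const ServerId &dead_player, const ServerId &new_player) {
        for (auto &entry : fragment_player_)
          if (entry.second == dead_player)
            entry.second = new_player;
      }

    private:
      std::map<std::pair<ServerId, FragmentId>, ServerId> fragment_player_;
      std::map<std::pair<ServerId, RangeId>, ServerId>    range_location_;
    };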
