According to the roadmap, it seems this feature will be available in version 0.9.6.0 by Nov 15 — is that right?
I think this feature will be one of the major factors in whether big-data companies choose to adopt it.

On Sep 8, 12:40 am, Doug Judd <[email protected]> wrote:
> Hi Sanjit,
>
> Here's some feedback about the RangeServer changes...
>
> 1. I think you'll probably need a phantom "load" to load each range and a "commit" to flip the phantom ranges live. You'll have to handle a possible race condition with the commit API. If the Master issues the commit but the RangeServer dies before it sends a response back, there will need to be some way to determine whether or not the phantom ranges got flipped live.
>
> 2. I recommend dropping the word "fragment" from the API names for the receiving RangeServer. Conceptually, the APIs don't deal with fragments; they just load ranges and receive updates. For example:
>
> phantom_load
> phantom_update
> phantom_cancel
> phantom_commit
>
> 3. There's another race condition that you'll need to address. To flip a set of phantom ranges live, the RangeServer needs to 1) write the Ranges to the RSML, and 2) link the recovery log into the Commit log. A simple approach might be to link the recovery log first and then write the RSML.
>
> - Doug
>
> On Thu, Sep 1, 2011 at 4:03 PM, Sanjit Jhala <[email protected]> wrote:
>
> > Since all data in Hypertable is persisted in an underlying DFS (with replication), when a RangeServer dies its state can be recovered from the filesystem. Here is a design proposal for RangeServer failover:
> >
> > *High Level Failover Algorithm*
> >
> > 1. Master receives a server-left notification for RangeServer X, waits for some time, after which it declares the server dead and starts recovery
> > 2. Master looks at the RangeServer MetaLog (RSML) and Master MetaLog (MML) and figures out which ranges were on the failed RS and in what state
> > 3. Master looks at X's CommitLog (CL) fragments to see which RangeServers have local copies.
> > Master assigns CL "players" biased towards RangeServers with a local copy of the fragment
> > 4. Master re-assigns ranges (round robin for now)
> > 5. Master sends lists of ranges and new locations to the players and issues play.
> > 6. Players replay CL fragments to the new range locations. Say we have ranges R1 .. RM and players P1 .. PN. For each recovered fragment RiPj, all writes are stored in a CellCache only. Once the RangeServer receives all data from Pj for range Ri, it writes the entire contents of the CellCache RiPj to the recovery log under /servers/rsY/recovery_rsX/range_i, merges RiPj into a CellCache for Ri, and deletes the CellCache RiPj.
> > 7. The destination RangeServer tells the master it has committed the data from Pj in its recovery logs
> > 8. When the Master knows that all data for a range has been committed, it tells the destination RangeServer to flip the range live.
> > 9. The RangeServer links its range recovery log for Ri into its CL, flips the CellCache for Ri live, schedules a major compaction for Ri, and sends confirmation to the Master. If the range was in the middle of a split, the new location reads the split log and proceeds with the split.
> > 10. Steps 5-9 are repeated for the Root, Metadata, System and User ranges (in that order) until all ranges are recovered
> >
> > *Master Changes*
> >
> > The Master will have a RecoverServer operation with 4 sub-operations:
> >
> > 1. RecoverServerRoot (obstructions: RecoverServerRoot/Root)
> > 2. RecoverServerMetadata (dependencies: RecoverServerRoot; obstructions: RecoverServerMetadata)
> > 3. RecoverServerSystem (dependencies: RecoverServerRoot, RecoverServerMetadata; obstructions: RecoverServerSystem)
> > 4. RecoverServerUser (dependencies: RecoverServerRoot, RecoverServerMetadata, RecoverServerSystem; obstructions: RecoverServerUser)
> >
> > The logic for the "execute" step is the same for all four and can live in a base class called RecoverServerBase.
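The dependency/obstruction chain among the four sub-operations described above could be sketched roughly as follows. This is a minimal illustration, not Hypertable's actual Master code: the `dependencies()`/`obstructions()` method names and the string tokens are assumptions, keeping only the RecoverServer* names from the proposal.

```cpp
#include <set>
#include <string>

// Illustrative stand-in for the RecoverServer sub-operation hierarchy;
// method names and token strings are assumptions, not Hypertable's API.
struct RecoverServerBase {
  virtual ~RecoverServerBase() = default;
  // Operations that must complete before this one can run.
  virtual std::set<std::string> dependencies() const = 0;
  // Tokens this operation obstructs while it is outstanding.
  virtual std::set<std::string> obstructions() const = 0;
  // Shared "execute" logic (assign players, replay, commit) lives here.
  void execute() { /* common recovery logic shared by all four sub-ops */ }
};

struct RecoverServerRoot : RecoverServerBase {
  std::set<std::string> dependencies() const override { return {}; }
  std::set<std::string> obstructions() const override {
    return {"RecoverServerRoot", "Root"};
  }
};

struct RecoverServerMetadata : RecoverServerBase {
  std::set<std::string> dependencies() const override {
    return {"RecoverServerRoot"};
  }
  std::set<std::string> obstructions() const override {
    return {"RecoverServerMetadata"};
  }
};

struct RecoverServerSystem : RecoverServerBase {
  std::set<std::string> dependencies() const override {
    return {"RecoverServerRoot", "RecoverServerMetadata"};
  }
  std::set<std::string> obstructions() const override {
    return {"RecoverServerSystem"};
  }
};

struct RecoverServerUser : RecoverServerBase {
  std::set<std::string> dependencies() const override {
    return {"RecoverServerRoot", "RecoverServerMetadata", "RecoverServerSystem"};
  }
  std::set<std::string> obstructions() const override {
    return {"RecoverServerUser"};
  }
};
```

Declaring the dependencies this way means the scheduler can serialize the four recoveries (Root, then Metadata, then System, then User) without any sub-operation having to know about the others explicitly.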
> > Meta operations such as create table/alter table will be dependent on RecoverServer operations.
> >
> > Steps 1-4 above are done in the RecoverServer operation. As part of step 4, the RecoverServer operation creates 4 sub-operations to recover the root, metadata, system and user ranges respectively, which are dependencies for the overall RecoverServer operation.
> >
> > *Range Server changes*
> >
> > New commands/APIs:
> >
> > 1. play_fragment(failed server id (X) + fragment id, mapping of ranges to new locations): the RangeServer starts reading this fragment and plays updates to the destination RangeServers. [Maybe buffer 200K per call, with cumulative as well as per-range buffer limits.] If a send fails, it stops sending updates to the failed range and continues.
> >
> > 2. cancel_play(failed server id X + fragment id, locations): the master will call this method to inform the player not to send any updates to a location. This will be called in case one of the destination RangeServers dies during recovery.
> >
> > 3. phantom_fragment_update(table, range, fragment, update_list, eos): receives updates and writes them to a phantom CellCache. When eos == true, the CellCache is appended to the recovery log in one write + sync.
> >
> > 4. phantom_fragment_cancel(...): called by the master in case a player dies and the CellCaches from Pj need to be tossed away.
> >
> > No changes are needed for the RSML, since a recovered range is either in a phantom state or a live state. If it's in the phantom state and the RangeServer dies, then the master reassigns the recovery ranges to a new location and replays the CL fragments from the beginning.
> >
> > *Recovery failures:*
> >
> > - If a destination RangeServer fails, potentially all players have to replay to the new destination (all play operations get serialized behind the root, metadata and system replays).
> > Players inform the master of any failed range updates, and the master will later tell the player to replay the fragment either to the same or another RangeServer. The Master maintains maps of (X, fragment id) --> players and (X, range) --> new location.
> > - If a player dies, then the master re-assigns a new player. R1Pj .. RMPj are tossed away and the new player replays the fragment.
> >
> > Any thoughts?
> > -Sanjit

--
You received this message because you are subscribed to the Google Groups "Hypertable Development" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/hypertable-dev?hl=en.
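The per-fragment CellCache bookkeeping that the proposal's phantom_fragment_update describes — accumulate updates for (range Ri, player Pj) in a side cache, and on eos flush the whole cache to a recovery log in one write before merging it into the range's main CellCache — can be sketched in miniature as below. All class names and the in-memory stand-ins for the DFS recovery log are assumptions for exposition, not Hypertable's actual classes.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative sketch of the phantom_fragment_update flow; all types and
// names here are assumptions, not Hypertable's actual implementation.
class PhantomRange {
public:
  // Accumulate replayed updates for this range from one CL fragment in a
  // per-fragment cache (the "CellCache RiPj" of the proposal).
  void fragment_update(int fragment, const std::vector<std::string> &updates,
                       bool eos) {
    auto &cache = m_fragment_caches[fragment];
    cache.insert(cache.end(), updates.begin(), updates.end());
    if (eos) {
      // One write + sync of the whole per-fragment cache to the recovery
      // log, then merge into the range's phantom CellCache and drop RiPj.
      append_to_recovery_log(cache);
      m_cell_cache.insert(m_cell_cache.end(), cache.begin(), cache.end());
      m_fragment_caches.erase(fragment);
    }
  }

  // Data is "committed" once it has reached the recovery log.
  size_t committed_updates() const { return m_recovery_log.size(); }
  size_t open_fragments() const { return m_fragment_caches.size(); }
  const std::vector<std::string> &cell_cache() const { return m_cell_cache; }

private:
  void append_to_recovery_log(const std::vector<std::string> &cache) {
    m_recovery_log.insert(m_recovery_log.end(), cache.begin(), cache.end());
  }
  std::map<int, std::vector<std::string>> m_fragment_caches; // RiPj caches
  std::vector<std::string> m_cell_cache;   // merged phantom CellCache for Ri
  std::vector<std::string> m_recovery_log; // stand-in for the DFS recovery log
};
```

Because nothing is durable until the eos flush, a phantom range that dies mid-replay can simply be reassigned and replayed from the beginning, which is why the proposal needs no RSML changes for the phantom state.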
