[hypertable-dev] Re: Hyperspace recovery design proposal

Sanjit Jhala Fri, 04 Sep 2009 11:06:35 -0700

Definitely. With respect to BDB usage & client retries, there are basically
4 types of Hyperspace operations:


1. read only (readdir, attrget, etc.)
2. single write txn (open, lock, etc.)
3. multi write txn (close, release)
4. remove expired sessions (multi write txn, not client initiated)

Of these, only 2 and 3 need request logging, since the client may retry them
(or send genuinely new requests) while the master is still trying to complete
a previously interrupted operation of that kind.
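
As a rough sketch of that classification (Python for brevity; the enum, the
table and needs_request_log() are made-up names for illustration, not actual
Hyperspace code):

```python
from enum import Enum


class OpKind(Enum):
    READ_ONLY = 1         # readdir, attrget, ...
    SINGLE_WRITE_TXN = 2  # open, lock, ...
    MULTI_WRITE_TXN = 3   # close, release
    EXPIRE_SESSIONS = 4   # multi write txn, not client initiated


# Hypothetical lookup table; only the operations named in this thread.
OP_KINDS = {
    "readdir": OpKind.READ_ONLY,
    "attrget": OpKind.READ_ONLY,
    "open": OpKind.SINGLE_WRITE_TXN,
    "lock": OpKind.SINGLE_WRITE_TXN,
    "close": OpKind.MULTI_WRITE_TXN,
    "release": OpKind.MULTI_WRITE_TXN,
}


def needs_request_log(kind):
    """Only client-initiated write operations (types 2 and 3) are logged."""
    return kind in (OpKind.SINGLE_WRITE_TXN, OpKind.MULTI_WRITE_TXN)
```

Reads and the internal session-expiry pass skip the request log entirely in
this sketch, which keeps the extra BDB writes off the read/keepalive path.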

-Sanjit


On Thu, Sep 3, 2009 at 4:56 PM, Luke <[email protected]> wrote:

>
> Is it possible to persist only the modifying requests, and not the
> majority of read/keepalive requests?
>
> On Thu, Sep 3, 2009 at 4:10 PM, Sanjit Jhala<[email protected]> wrote:
> > This is required to ensure that the state remains consistent whether or
> > not the client retries. On a retry the client provides the master with the
> > initial request id, allowing the master to synchronize the response with
> > the completion of the initial request.
> >
> > There could be some requests (e.g. close()) which can cause multiple BDB
> > transactions (release lock, grant pending lock requests, delete if
> > ephemeral, delete handle) along with having to persist notifications (to
> > ensure they are not lost in a crash). A crash in the middle of these
> > operations leaves the system in an inconsistent state. The idea is to
> > resume incomplete operations while simultaneously handling client retries.
> >
> > -Sanjit
> >
> > On Thu, Sep 3, 2009 at 3:35 PM, Luke <[email protected]> wrote:
> >>
> >> Why does the master need to persist a list of requests if the client can
> >> already retry? I think we should store as little as possible in BDB,
> >> as it's the bottleneck.
> >>
> >> On Thu, Sep 3, 2009 at 1:39 PM, Sanjit Jhala<[email protected]> wrote:
> >> > Recovery algorithm:
> >> >
> >> > 1. Set Master state to recovering and respond to any client
> >> > requests/keepalives with "master_recovering" status (clients move into
> >> > a recovery state where they don't send new requests but continue
> >> > sending keepalives).
> >> > 2. Read the session data persisted in BDB into memory and recreate the
> >> > session map.
> >> > 3. Identify in-progress operations, add them to the "completion map" and
> >> > enqueue them in the worker queue.
> >> > 4. Create another thread to complete the session expiration for any
> >> > sessions marked for expiry.
> >> > 5. Set Master state to ready and resume normal operations.
> >> >
> >> > In order to ensure correct completion of requests interrupted by the
> >> > master crash and ensure the client can safely retry these operations we
> >> > need request ids (to uniquely identify requests) and a completion map.
> >> >
> >> > Request ids:
> >> > Clients maintain an increasing 64-bit request id and a min heap of
> >> > outstanding request ids. Before sending a request to the server, the
> >> > client inserts the new id into the heap, and deletes it after it
> >> > receives a server response. On each keepalive request the client sends
> >> > the top of the heap (or 0 if the heap is empty), which is its lowest
> >> > in-progress request id. (Requires a mutex lock on insert and delete
> >> > from the heap.)
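
A minimal sketch of the client-side bookkeeping described above, in Python
for brevity. RequestIdTracker and its method names are hypothetical. Since
Python's heapq has no direct "delete arbitrary element", the sketch marks
finished ids in a set and pops them lazily once they reach the top of the
heap; that is a substitution for illustration, not something the proposal
specifies.

```python
import heapq
import threading


class RequestIdTracker:
    """Tracks outstanding request ids for one client session (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()  # guards heap inserts and deletes
        self._next_id = 1              # 0 is reserved for "nothing outstanding"
        self._heap = []                # min heap of outstanding ids
        self._done = set()             # finished ids not yet popped from the heap

    def begin(self):
        """Allocate a new id and record it before the request is sent."""
        with self._lock:
            rid = self._next_id
            self._next_id += 1
            heapq.heappush(self._heap, rid)
            return rid

    def finish(self, rid):
        """Record that the server response for rid has been received."""
        with self._lock:
            self._done.add(rid)
            # Lazily drop finished ids that have reached the top of the heap.
            while self._heap and self._heap[0] in self._done:
                self._done.discard(heapq.heappop(self._heap))

    def lowest_outstanding(self):
        """Value to send with each keepalive: top of heap, or 0 if empty."""
        with self._lock:
            return self._heap[0] if self._heap else 0
```

Usage would be begin() before sending a request, finish() on its response,
and lowest_outstanding() when building each keepalive.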
> >> >
> >> > The server maintains a list of in-progress request ids per session (in
> >> > BDB) as well as the most recently purged request id. New ids are added
> >> > as part of processing the request. The result of processing the request
> >> > is also persisted (the state of the processing might also be stored if
> >> > needed). When the server receives a keepalive request it checks the
> >> > client-reported in-progress request id. If this value is non-zero and
> >> > different from the most recently purged request id (stored at the
> >> > server), the server updates the purged id to the new id and deletes
> >> > info on all previously stored requests with smaller ids.
> >> > (Shouldn't require BDB ops/locks in the common case, i.e. no
> >> > in-progress requests.)
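
The server-side purge rule above could be sketched roughly as follows. This
is Python with an in-memory dict standing in for the per-session BDB records;
SessionRequestLog, record() and on_keepalive() are hypothetical names, and
nothing here reflects the real Hyperspace storage layout.

```python
class SessionRequestLog:
    """Per-session request records (a dict stands in for BDB here)."""

    def __init__(self):
        self.results = {}     # request id -> result (None while in progress)
        self.last_purged = 0  # most recently purged request id

    def record(self, rid, result=None):
        """Add a new id, or store the result once processing finishes."""
        self.results[rid] = result

    def on_keepalive(self, lowest_in_progress):
        """Purge records older than the client's lowest in-progress id.

        Returns the number of records deleted.
        """
        # Common case: nothing outstanding, or nothing new to purge,
        # so no storage operations are needed.
        if lowest_in_progress == 0 or lowest_in_progress == self.last_purged:
            return 0
        stale = [rid for rid in self.results if rid < lowest_in_progress]
        for rid in stale:
            del self.results[rid]
        self.last_purged = lowest_in_progress
        return len(stale)
```

The early return is what keeps the common keepalive path free of storage
work in this sketch.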
> >> >
> >> > Completion map:
> >> > During master recovery, all incomplete requests will be enqueued and
> >> > resumed from where they left off. In addition, an entry in a completion
> >> > map containing "session id+request id" --> "completion object" will be
> >> > created. Clients will attempt to retry these requests (with a retry
> >> > flag). When the master sees the retry flag it will check the completion
> >> > map and wait for the completion object to signal that the request
> >> > processing is complete. The master then looks up the request result
> >> > (stored in BDB and possibly in the completion object) and sends it back
> >> > to the client.
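
One way the completion map could look, as a sketch only: Python, with
threading.Event playing the role of the "completion object" and a dict
standing in for results persisted in BDB. CompletionMap, register(),
complete() and wait_for_retry() are hypothetical names, and the timeout is an
addition for the example rather than part of the proposal.

```python
import threading


class CompletionMap:
    """Maps (session id, request id) to a completion object (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # (session_id, request_id) -> threading.Event
        self._results = {}  # stand-in for results persisted in BDB

    def register(self, session_id, request_id):
        """Called during recovery for each incomplete request found."""
        with self._lock:
            self._entries[(session_id, request_id)] = threading.Event()

    def complete(self, session_id, request_id, result):
        """Called by the worker once the resumed request has finished."""
        key = (session_id, request_id)
        with self._lock:
            self._results[key] = result
            event = self._entries.get(key)
        if event is not None:
            event.set()

    def wait_for_retry(self, session_id, request_id, timeout=None):
        """Handle a client retry: wait for completion, then return the result."""
        key = (session_id, request_id)
        with self._lock:
            event = self._entries.get(key)
        if event is not None and not event.wait(timeout):
            raise TimeoutError("request still in progress")
        with self._lock:
            return self._results.get(key)
```

A retry for a request with no entry in the map returns whatever result is
already stored (None here if there is none), without blocking.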
> >> >
> >> >
> >> > -Sanjit
> >> >

