This looks like a nice way to add eventual consistency to Hypertable. I like the fact that once a write makes it into the proxy log, it is guaranteed to eventually make it into the system. The only issue I see is that updates to a cell could get written out of order: the client could end up writing a newer version of a cell before the proxy writer gets a chance to write the older version. The application can solve this by writing self-ordering entries tagged with a monotonically increasing sequence number.
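One way to read the sequence-number suggestion, as a minimal Python sketch. Nothing here is Hypertable API: `VersionedCell`, `next_seq`, and the in-process counter are hypothetical stand-ins, and a real deployment with several clients would need a shared source of sequence numbers. Each write is stored under its sequence number, and a read returns the entry with the highest one, so arrival order stops mattering.

```python
import itertools


class VersionedCell:
    """Toy stand-in for one cell: every write is kept under its sequence number."""

    def __init__(self):
        self._versions = {}  # sequence number -> value

    def write(self, seq, value):
        # Arrival order is irrelevant; the sequence number carries the ordering.
        self._versions[seq] = value

    def latest(self):
        # The entry with the highest sequence number wins.
        if not self._versions:
            return None
        return self._versions[max(self._versions)]


# Hypothetical sequence source: a single in-process counter.
_counter = itertools.count(1)


def next_seq():
    return next(_counter)


cell = VersionedCell()
older = (next_seq(), "1 book")   # diverted to the proxy log, replayed late
newer = (next_seq(), "2 books")  # written directly by the client

cell.write(*newer)  # the newer version lands first
cell.write(*older)  # the proxy writer delivers the older version afterwards
latest_value = cell.latest()
```

Even though the older version is written last, `latest_value` is the newer one, because the reader orders entries by sequence number rather than by arrival time.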
I do question the need for eventual consistency. I feel that this "concern" is theoretical. The problem is that people do not have a well-built Bigtable implementation to try out, and I suspect that this perceived problem is much less of an issue than people think. Amazon developed this concept for their shopping cart. If one out of every 1000 shopping cart updates made the system spin for 30 seconds with a "System busy" message, would you really care? If the shopping cart updated instantly 999 times out of 1000, you would perceive the system as highly available. I think we should wait on this until it is shown to be a real problem, not a theoretical one. It might also be a worthy exercise to do a back-of-the-envelope calculation based on failure rate data to determine the real impact of failures on availability.

- Doug

On Sat, Sep 12, 2009 at 1:37 PM, Luke <[email protected]> wrote:
>
> One of the biggest "concerns" from potential "real-time" users of
> Hypertable is the write latency spike when some nodes are down and
> being recovered. Read latency/availability are usually masked by the
> caching layer.
>
> Cassandra tries to solve the problem by using "hinted handoff" (write
> data tagged with a destination to an alternative node when the
> destination node is down). Of course this mandates relaxing the
> consistency guarantee to "eventual", which is a trade-off many are
> willing to make.
>
> I just thought that it's not that hard to implement something similar
> in Hypertable and give users a choice between immediate and eventual
> consistency:
>
> When a mutator is created with a BEST_EFFORT/EVENTUAL_OK flag, instead
> of retrying writes in the client when a destination node is down, it
> tries to write to an alternative range server with a special update
> flag, which persists the writes to a proxy log. The maintenance
> threads on the alternative range server will try to empty the proxy
> log by retrying the writes.
> Alternative range servers can be picked using a random scheme (sort
> the server list by the md5 of their IP addresses; the alternatives
> are the next n servers) or a location-aware (data center/rack)
> scheme. Note that this approach works even when the alternative node
> dies before its proxy logs are cleared.
>
> Thoughts?
>
> __Luke
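
For illustration, the md5-based selection described in the quoted message could be sketched in Python roughly as follows. The function names, the wrap-around at the end of the sorted list, and the sample addresses are assumptions made for this sketch, not Hypertable code.

```python
import hashlib


def md5_key(ip):
    """Sort key: hex md5 digest of the server's IP address string."""
    return hashlib.md5(ip.encode("utf-8")).hexdigest()


def pick_alternatives(servers, down_server, n):
    """Return up to n alternatives: the servers that follow down_server
    in the md5-sorted list, wrapping around at the end (an assumption)."""
    ring = sorted(set(servers), key=md5_key)
    if down_server not in ring:
        raise ValueError("unknown server: " + down_server)
    start = ring.index(down_server)
    count = min(n, len(ring) - 1)  # never hand back the down server itself
    return [ring[(start + i) % len(ring)] for i in range(1, count + 1)]


# Hypothetical range server addresses.
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]
alternatives = pick_alternatives(servers, "10.0.0.2", 2)
```

Because every client sorts by the same hash, each one computes the same alternatives for a given destination without any coordination, and the hashing spreads the proxy load for different failed servers across the cluster.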
