Seems nice to have the ability to control the consistency behavior per client/app instead of it being system wide. I think it's a good idea to have a design for eventual consistency in mind for now and implement it as required post 1.0.
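A rough sketch of what such a per-mutator consistency knob and the write-path decision it drives might look like; all names below (flags, types, route_update) are hypothetical, invented for illustration, and not the existing Hypertable client API:

```cpp
// Hypothetical sketch of a per-mutator consistency flag (not the Hypertable API).
#include <cstdint>

enum MutatorFlags : uint32_t {
  MUTATOR_DEFAULT     = 0,      // strong consistency: retry the primary until it recovers
  MUTATOR_BEST_EFFORT = 1 << 0  // eventual consistency: allow redirect to a proxy log
};

enum class WriteRoute { Primary, RetryPrimary, ProxyLogOnAlternative };

// Decide where an update goes; each application picks its own flags when it
// creates its mutator, so the choice is per client/app rather than system wide.
WriteRoute route_update(bool primary_up, uint32_t mutator_flags) {
  if (primary_up)
    return WriteRoute::Primary;
  if (mutator_flags & MUTATOR_BEST_EFFORT)
    return WriteRoute::ProxyLogOnAlternative;  // persist now, replay later
  return WriteRoute::RetryPrimary;             // block/retry until recovery completes
}

int main() {
  // A latency-sensitive logger opts in; a billing writer on the same cluster does not.
  WriteRoute logger  = route_update(/*primary_up=*/false, MUTATOR_BEST_EFFORT);
  WriteRoute billing = route_update(/*primary_up=*/false, MUTATOR_DEFAULT);
  return (logger == WriteRoute::ProxyLogOnAlternative &&
          billing == WriteRoute::RetryPrimary) ? 0 : 1;
}
```
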
-Sanjit

On Sun, Sep 13, 2009 at 3:54 PM, Luke <[email protected]> wrote:
>
> On Sun, Sep 13, 2009 at 2:13 PM, Doug Judd <[email protected]> wrote:
> > This looks like a nice way to add eventual consistency to Hypertable. I
> > like the fact that once it makes it into the proxy log, it guarantees that
> > the write will eventually make it into the system. The only issue I see is
> > that updates for a cell could get written out of order. The client could
> > end up writing a newer version of a cell before the proxy writer gets a
> > chance to write the older version. The application can solve this problem
> > by writing self-ordering entries using a monotonically increasing sequence
> > number.
>
> Yeah, the client or the proxy (when writing to the proxy log) can fill out
> the revision/timestamp field of the cells.
>
> > I do question the need for eventual consistency. I feel that this "concern"
> > is theoretical. The problem is that people do not have a well-implemented
> > Bigtable system to try out. I suspect that this perceived problem is much
> > less of an issue than people think. Amazon developed this concept for
> > their shopping cart. If once every 1000th shopping-cart update the system
> > spun for 30 seconds with a "System busy" message, would you really care?
> > If 999 times out of 1000 the shopping cart updated instantly, you would
> > perceive the system as highly available.
>
> I'm with you on this one (shopping cart); I personally would suspect my net
> connection first :) OTOH, if I'm a front-end/application programmer who
> wants to log stuff directly into Hypertable and doesn't really care about
> consistency (I must log the transactions but won't read them until batch
> processing later), having to make sure the call doesn't time out and lose
> the transaction in the log is very annoying. I'd choose a back-end that
> makes my life easier.
>
> > I think we should wait on this until it is determined to be a real problem,
> > not a theoretical one. It might also be a worthy exercise to do a
> > back-of-the-envelope calculation based on failure-rate data to determine
> > the real impact of failures on availability.
>
> I think the choice really belongs to the users. I'd suggest that we add a
> "multiple path write proxy" (MPWP) feature (easy to implement and TBD, of
> course) to the slides to assuage people's irrational (or not) fear of write
> latency under recovery :)
>
> __Luke
>
> > - Doug
> >
> > On Sat, Sep 12, 2009 at 1:37 PM, Luke <[email protected]> wrote:
> >>
> >> One of the biggest "concerns" from potential "real-time" users of
> >> Hypertable is the write-latency spike when some nodes are down and being
> >> recovered. Read latency/availability are usually masked by the caching
> >> layer.
> >>
> >> Cassandra tries to solve the problem with "hinted handoff" (when the
> >> destination node is down, the write is tagged with its destination and
> >> handed to an alternative node). Of course this requires relaxing the
> >> consistency guarantee to "eventual", which is a trade-off many are
> >> willing to make.
> >>
> >> I just thought that it's not that hard to implement something similar in
> >> Hypertable and give users a choice between immediate and eventual
> >> consistency:
> >>
> >> When a mutator is created with a BEST_EFFORT/EVENTUAL_OK flag, instead of
> >> the client retrying the write indefinitely when the destination node is
> >> down, it writes to an alternative range server with a special update
> >> flag, which persists the writes to a proxy log. The maintenance threads
> >> on the alternative range server then try to empty the proxy log by
> >> retrying the writes. Alternative range servers can be picked with either
> >> a random scheme (sort the server list by the md5 of their IP addresses;
> >> the alternatives are the next n servers) or a location-aware (data
> >> center/rack) scheme. Note that this approach works even if the
> >> alternative node dies before its proxy logs are cleared.
> >>
> >> Thoughts?
> >>
> >> __Luke
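
To make the out-of-order concern above concrete: if the writer (client or proxy) stamps each cell with a monotonically increasing revision, a store that keeps the highest revision per cell cannot be clobbered by an older write replayed later from the proxy log. A minimal, self-contained sketch; the CellStore type is invented for illustration, this is not Hypertable code:

```cpp
// Last-writer-wins by writer-assigned revision, so a delayed older write
// replayed from a proxy log cannot clobber a newer one. Illustration only.
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

struct Versioned {
  uint64_t    revision;  // monotonically increasing, assigned by the writer
  std::string value;
};

class CellStore {
 public:
  void apply(const std::string &row, uint64_t revision, const std::string &value) {
    auto it = cells_.find(row);
    if (it == cells_.end() || revision > it->second.revision)
      cells_[row] = Versioned{revision, value};
    // else: stale write (e.g. replayed from a proxy log) is ignored
  }
  const std::string &get(const std::string &row) const { return cells_.at(row).value; }

 private:
  std::map<std::string, Versioned> cells_;
};

int main() {
  CellStore store;
  store.apply("cart:42", /*revision=*/2, "3 items");  // newer write lands first
  store.apply("cart:42", /*revision=*/1, "2 items");  // older write replayed later
  assert(store.get("cart:42") == "3 items");          // newest value wins
  return 0;
}
```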
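The alternative-server selection Luke describes can be sketched as a simple hash ring: sort the servers by a hash of their address and take the next n after the primary. The sketch below uses std::hash as a dependency-free stand-in for MD5 of the IP address, and the addresses are made up:

```cpp
// "Next n servers on a hash ring" selection of alternative range servers.
// std::hash stands in for MD5 of the IP address; the ordering idea is the same.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

std::vector<std::string> pick_alternatives(const std::vector<std::string> &servers,
                                           const std::string &primary, size_t n) {
  std::vector<std::string> ring(servers);
  std::sort(ring.begin(), ring.end(), [](const std::string &a, const std::string &b) {
    return std::hash<std::string>{}(a) < std::hash<std::string>{}(b);  // MD5 in practice
  });
  auto it = std::find(ring.begin(), ring.end(), primary);
  std::vector<std::string> alts;
  for (size_t i = 1; i <= n && i < ring.size(); ++i)
    alts.push_back(ring[(std::distance(ring.begin(), it) + i) % ring.size()]);
  return alts;
}

int main() {
  std::vector<std::string> servers{"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"};
  for (const auto &s : pick_alternatives(servers, "10.0.0.2", 2))
    std::cout << s << "\n";  // the two servers following 10.0.0.2 on the ring
  return 0;
}
```

A location-aware variant would simply filter or reorder the ring by data center/rack before taking the next n.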
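And the back-of-the-envelope availability calculation Doug suggests, with every input explicitly an assumption (one failure per server per year, about a minute to reassign ranges, 100 servers), not measured failure-rate data:

```cpp
// Back-of-the-envelope availability estimate; all inputs are assumptions.
#include <cstdio>

int main() {
  const double mtbf_seconds     = 365.0 * 24 * 3600;  // assume one failure per server per year
  const double recovery_seconds = 60.0;               // assume ranges reassigned in ~1 minute
  const int    num_servers      = 100;                // assumed cluster size

  // Fraction of time any given server's ranges are unavailable.
  double per_server_downtime = recovery_seconds / (mtbf_seconds + recovery_seconds);

  // With writes spread uniformly across servers, this is also roughly the
  // fraction of writes that would block waiting on a recovery.
  std::printf("per-server downtime fraction:       %.2e\n", per_server_downtime);
  std::printf("expected stalled writes per million: %.1f\n", per_server_downtime * 1e6);

  // Expected number of servers in recovery at any instant across the cluster.
  std::printf("servers in recovery at any instant:  %.4f\n", num_servers * per_server_downtime);
  return 0;
}
```

Under these made-up numbers only about two writes per million would hit a recovering range, which supports Doug's "wait until it's a real problem" position; whether even that is acceptable is exactly the per-application choice discussed above.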
