On Sun, Sep 13, 2009 at 2:13 PM, Doug Judd <[email protected]> wrote:
> This looks like a nice way to add eventual consistency to Hypertable. I
> like the fact that once it makes it into the proxy log it guarantees that
> the write will eventually make it into the system. The only issue I see is
> that updates for a cell could get written out-of-order. The client could
> end up writing a newer version of a cell before the proxy writer gets a
> chance to write the older version. The application can just write
> self-ordering entries using a monotonically increasing sequence number to
> solve this problem.
Yeah, the client or the proxy (when writing to the proxy log) can fill out
the revision/timestamp field of the cells.

> I do question the need for eventual consistency. I feel that this "concern"
> is theoretical. The problem is that people do not have a well implemented
> Bigtable implementation to try out. I suspect that this perceived problem
> is much less of an issue than people think. Amazon developed this concept
> for their shopping cart. If once every 1000th shopping cart update the
> system spun for 30 seconds with a message "System busy", would you really
> care? If 999 times out of 1000, the shopping cart updated instantly, you
> would perceive the system as highly available.

I'm with you on this one (shopping cart); I personally would suspect my net
connection first :) OTOH, if I'm a front-end/application programmer who
wants to log stuff directly into Hypertable and doesn't really care about
consistency (must log the transactions but wouldn't read them until batch
processing later), having to make sure the call doesn't time out and lose
the transaction in the log is very annoying. I'd choose a back-end that
makes my life easier.

> I think we should wait on this until it is determined to be a real problem,
> not a theoretical one. It might also be a worthy exercise to do a back of
> the envelope calculation based on failure rate data to determine the real
> impact of failures on availability.

I think the choice really belongs to the users. I'd suggest that we add a
"multiple path write proxy" (MPWP) feature (easy to implement and TBD of
course) to the slides to assuage people's irrational (or not) fear about
write latency under recovery :)

__Luke

> - Doug
>
> On Sat, Sep 12, 2009 at 1:37 PM, Luke <[email protected]> wrote:
>>
>> One of the biggest "concerns" from potential "real-time" users of
>> Hypertable is the write latency spike when some nodes are down and being
>> recovered. Read latency/availability are usually masked by the caching
>> layer.
>>
>> Cassandra tries to solve the problem by using "hinted handoff" (write
>> data tagged with a destination to an alternative node when the
>> destination node is down). Of course this mandates relaxing the
>> consistency guarantee to "eventual", which is a trade-off many are
>> willing to make.
>>
>> I just thought that it's not that hard to implement something similar
>> in Hypertable and give the user a choice between immediate and eventual
>> consistency:
>>
>> When a mutator is created with the BEST_EFFORT/EVENTUAL_OK flag, instead
>> of continuing to retry writes in the client when a destination node is
>> down, it tries to write to an alternative range server with a special
>> update flag, which persists the writes to a proxy log. The maintenance
>> threads on the alternative range server will try to empty the proxy log
>> by retrying the writes. Alternative range servers can be picked using a
>> random scheme (sort the server list by the md5 of their IP addresses;
>> the alternatives are the next n servers) or a location-aware (data
>> center/rack) scheme. Note that this approach works even when the
>> alternative node dies while its proxy logs are not yet cleared.
>>
>> Thoughts?
>>
>> __Luke
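
For illustration only, the "random" alternative-server scheme from the original proposal (sort the server list by the md5 of each IP address, then take the next n servers) might look roughly like the Python sketch below. The function name `pick_alternatives`, the wrap-around at the end of the sorted list, and the assumption that IP addresses are unique strings are all hypothetical choices made for this sketch; none of this is Hypertable code.

```python
import hashlib


def pick_alternatives(servers, destination, n):
    """Return up to n alternative servers for a destination that is down.

    Hypothetical helper: servers is a list of unique IP address strings
    and destination must be one of them (otherwise ValueError is raised).
    """
    # Sorting by the MD5 digest of the IP gives every client the same
    # pseudo-random ordering, independent of the input list order.
    ring = sorted(
        servers,
        key=lambda ip: hashlib.md5(ip.encode("ascii")).hexdigest(),
    )
    start = ring.index(destination)
    # Never return more servers than exist besides the destination.
    count = min(n, len(ring) - 1)
    # Take the next servers after the destination, wrapping around the
    # end of the list (the wrap-around is an assumption of this sketch).
    return [ring[(start + i) % len(ring)] for i in range(1, count + 1)]


if __name__ == "__main__":
    servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]
    print(pick_alternatives(servers, "10.0.0.3", 2))
```

Because the ordering depends only on the hashes, any client holding the same server list would pick the same alternatives for a given destination, so no coordination is needed to find where proxied writes went. A location-aware scheme would replace the sort key with rack or data center information.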
