Sounds reasonable. File an issue and we can put together a slide. But let's get Namespaces (dataspaces) in place first. The system feels a bit hokey having one big flat namespace. It feels like the ancient non-hierarchical filesystems.
- Doug

On Sun, Sep 13, 2009 at 7:48 PM, Luke <[email protected]> wrote:
>
> On Sun, Sep 13, 2009 at 7:14 PM, Sanjit Jhala <[email protected]> wrote:
> > Seems nice to have the ability to control the consistency behavior on a per-client/app basis instead of it being system wide.
>
> Yeah, the behavior is controllable per mutator, which is already finer granularity than per client/app.
>
> > I think it's a good idea to have a design for eventual consistency in mind for now and implement as required post 1.0.
>
> I was just sick of people picking on the write availability issue, which was brought up in about every conversation about Hypertable :) Eventual consistency is easier to build on top of real consistency, not vice versa.
>
> __Luke
>
> > -Sanjit
> >
> > On Sun, Sep 13, 2009 at 3:54 PM, Luke <[email protected]> wrote:
> >>
> >> On Sun, Sep 13, 2009 at 2:13 PM, Doug Judd <[email protected]> wrote:
> >> > This looks like a nice way to add eventual consistency to Hypertable. I like the fact that once it makes it into the proxy log, it guarantees that the write will eventually make it into the system. The only issue I see is that updates for a cell could get written out of order. The client could end up writing a newer version of a cell before the proxy writer gets a chance to write the older version. The application can just write self-ordering entries using a monotonically increasing sequence number to solve this problem.
> >>
> >> Yeah, the client or the proxy (when writing to the proxy log) can fill out the revision/timestamp field of the cells.
> >>
> >> > I do question the need for eventual consistency. I feel that this "concern" is theoretical. The problem is that people do not have a well-implemented Bigtable implementation to try out. I suspect that this perceived problem is much less of an issue than people think. Amazon developed this concept for their shopping cart. If once every 1000th shopping cart update the system spun for 30 seconds with a message "System busy", would you really care? If 999 times out of 1000 the shopping cart updated instantly, you would perceive the system as highly available.
> >>
> >> I'm with you on this one (shopping cart), I personally would suspect my net connection first :) OTOH, if I'm a front-end/application programmer who wants to log stuff directly into Hypertable and doesn't really care about consistency (must log the transactions but wouldn't read until batch processing later), having to make sure the call doesn't time out and lose the transaction in the log is very annoying. I'd choose a back-end that makes my life easier.
> >>
> >> > I think we should wait on this until it is determined to be a real problem, not a theoretical one. It might also be a worthy exercise to do a back-of-the-envelope calculation based on failure rate data to determine the real impact of failures on availability.
> >>
> >> I think the choice really belongs to the users.
> >> I'd suggest that we add a "multiple path write proxy" (MPWP) feature (easy to implement and TBD of course) to the slides to assuage people's irrational (or not) fear about write latency under recovery :)
> >>
> >> __Luke
> >>
> >> > - Doug
> >> >
> >> > On Sat, Sep 12, 2009 at 1:37 PM, Luke <[email protected]> wrote:
> >> >>
> >> >> One of the biggest "concerns" from potential "real-time" users of Hypertable is the write latency spike when some nodes are down and being recovered. Read latency/availability are usually masked by the caching layer.
> >> >>
> >> >> Cassandra tries to solve the problem by using "hinted handoff" (write data tagged with a destination to an alternative node when the destination node is down). Of course this mandates relaxing the consistency guarantee to "eventual", which is a trade-off many are willing to make.
> >> >>
> >> >> I just thought that it's not that hard to implement something similar in Hypertable and give users a choice between immediate and eventual consistency:
> >> >>
> >> >> When a mutator is created with a BEST_EFFORT/EVENTUAL_OK flag, instead of retrying writes in the client when a destination node is down, it tries to write to an alternative range server with a special update flag, which persists the writes to a proxy log. The maintenance threads on the alternative range server will try to empty the proxy log by retrying the writes. Alternative range servers can be picked using a random scheme (sort the server list by the md5 of their IP addresses and take the next n servers as the alternatives) or a location-aware (data center/rack) scheme. Note that this approach works even if the alternative node dies before its proxy logs are cleared.
> >> >>
> >> >> Thoughts?
> >> >>
> >> >> __Luke
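For anyone skimming the proposal above, here is a rough, hypothetical sketch of the alternative-server selection Luke describes: sort the server list by a hash of each server's address and take the next n servers after the primary as write-proxy candidates. None of the names below are Hypertable code, and std::hash stands in for the md5 mentioned in the thread (a real deployment would need a hash that is consistent across processes, which is why md5 is suggested).

// Illustrative only: pick_alternatives() is a made-up helper, not a Hypertable API.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

std::vector<std::string> pick_alternatives(const std::vector<std::string> &servers,
                                           const std::string &primary,
                                           std::size_t n) {
  // Order servers deterministically by a hash of their address.
  // std::hash is a stand-in; the proposal uses md5 so every node computes the same order.
  std::vector<std::string> ring(servers);
  std::hash<std::string> hasher;
  std::sort(ring.begin(), ring.end(),
            [&](const std::string &a, const std::string &b) {
              return hasher(a) < hasher(b);
            });

  // Find the primary and take the next n servers in the ordering, wrapping around.
  auto it = std::find(ring.begin(), ring.end(), primary);
  std::vector<std::string> alts;
  if (it == ring.end())
    return alts;
  std::size_t start = std::distance(ring.begin(), it);
  for (std::size_t i = 1; i <= n && i < ring.size(); ++i)
    alts.push_back(ring[(start + i) % ring.size()]);
  return alts;
}

int main() {
  std::vector<std::string> servers = {"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"};
  for (const auto &s : pick_alternatives(servers, "10.0.0.2", 2))
    std::cout << "proxy candidate: " << s << "\n";
}

A location (data center/rack) aware scheme would just replace the hash ordering with one that prefers candidates off the failed server's rack.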
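And a small sketch of the self-ordering idea from Doug's and Luke's exchange: the client (or the proxy, when writing to the proxy log) stamps each cell with a monotonically increasing revision, so that proxy-log replays arriving out of order still converge on the newest value. The Cell struct and apply_write() here are illustrative assumptions, not Hypertable's actual cell format or API.

// Illustrative only: shows why a monotonically increasing revision makes
// out-of-order delivery harmless.
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

struct Cell {
  std::string row;
  std::string value;
  std::uint64_t revision;  // monotonically increasing, assigned by the writer
};

// Keep a cell only if its revision is newer than what is already stored.
void apply_write(std::map<std::string, Cell> &table, const Cell &c) {
  auto it = table.find(c.row);
  if (it == table.end() || c.revision > it->second.revision)
    table[c.row] = c;
}

int main() {
  std::map<std::string, Cell> table;
  // Newer write arrives first (direct path), older one later (proxy-log replay).
  apply_write(table, {"cart:42", "3 items", 7});
  apply_write(table, {"cart:42", "2 items", 6});  // stale revision; ignored
  std::cout << table["cart:42"].value << "\n";    // prints "3 items"
}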
