Hello,

Currently, if a ubik site becomes unavailable during a write transaction, all reads are blocked for quite some time (we need to wait for timeouts, etc.). This can temporarily grind a cell to a halt with the loss of a single ubik site.

To alleviate this, we've been thinking about allowing reads to occur on a ubik database even when there's a conflicting write lock. When the db is write-locked, we know that the data is consistent, since no actual changes occur until the commit. So, a site would allow reads until a write transaction was committed, and reads would be blocked only while we commit the changes to the local db.

The problem with doing this is that it lets clients see different data from the ubik db at the same time (some sites could have committed data, while others are waiting for a commit and are serving old data). Is that horrible? Will that break everything?

If that is out of the question, another idea I had was to return some error code to clients if the current site notices that the db is write-locked from another site (before a commit arrives). The error would tell clients to retry other sites until they reach one with fresh information. That approach would prevent sites from serving old information, and would still allow db information to be available (clients should at least eventually hit the sync site, which would always have fresh info).

This has the downside of additional load on the dbservers, though, possibly on the sync site in particular, when another site fails. At least with pthreaded ubik, we could also do something like "return an error after time X if the write hasn't been committed/aborted yet" as a load/responsiveness tradeoff.

--
Andrew Deason
[email protected]
_______________________________________________
OpenAFS-devel mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-devel
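
To make the two ideas easier to compare, here is a toy model in Python. It is only a sketch and not ubik code: `Site`, `WriteInProgress`, `client_read`, and the `serve_stale` flag are all hypothetical names made up for illustration, and the model ignores quorum, versions, and timeouts entirely.

```python
from dataclasses import dataclass


class WriteInProgress(Exception):
    """Hypothetical error code: a remote write is pending on this site."""


@dataclass
class Site:
    name: str
    value: str                    # contents of the locally committed db
    is_sync_site: bool = False
    write_pending: bool = False   # write-locked from another site, no commit yet

    def read(self, serve_stale: bool = False) -> str:
        # First idea: keep serving the last committed data (serve_stale=True).
        # Second idea: refuse the read so the client tries another site.
        if self.write_pending and not self.is_sync_site and not serve_stale:
            raise WriteInProgress(self.name)
        return self.value


def client_read(sites: list[Site]) -> str:
    """Try each site in order until one answers."""
    for site in sites:
        try:
            return site.read()
        except WriteInProgress:
            continue
    raise RuntimeError("no site could serve the read")


# The sync site and site "b" have committed; site "c" is still waiting.
sites = [
    Site("c", value="old", write_pending=True),
    Site("b", value="new"),
    Site("sync", value="new", is_sync_site=True),
]
```

In this model, `client_read(sites)` skips site "c" and gets the committed data from "b", at the cost of one extra request. Calling `sites[0].read(serve_stale=True)` instead returns the old data while "b" already serves the new data, which is the mixed-visibility case the first idea would allow.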
