> > *Re: > Update: One of the developers of PNUTS > commented<http://glinden.blogspot.com/2009/02/details-on-yahoos-distributed-database.html?showComment=1233884340000#c1254841206330803677> > on > this post, pointing out that PNUTS performance is much better in practice > (1-10ms/request) when caching layers are in place and making a few > comparisons to Bigtable.* >
Note this very interesting conversation (below) which IMHO has potential for STM implementation in clojure <http://clojure.org/refs> (excerpt: "PNUTs supports row-level transactions too-- they are done through a get followed by test-and-set. This is nothing new: it is called optimistic concurrency control and has been around in database literature for ages, and is also used by BigTable."). ........................... Can and would clojure help and simplify the construction of a framework for high-volume, high-availability, distributed web apps? Has something like PNUTs been implemented for Clojure and Java Clouds? ............................ <http://www.blogger.com/profile/07702915403582319413> Brian Cooper <http://www.blogger.com/profile/07702915403582319413> said... Hi folks, I'm another member of the PNUTS group at Yahoo! This has been a very interesting discussion; I'm glad you folks are as interested in this stuff as we are. Just to reiterate a few points that Utkarsh brought up: - There's no free lunch for performance, and if you want consistency some writes will have to go cross-datacenter, increasing the average latency. Cross-datacenter communication is required because of our mastership protocol. Consider a user who's record is mastered in California. If she flies to Europe, or a network problem causes her request to be redirected to a datacenter on the East coast, then her write will originate in a non-master datacenter and be forwarded back to the master for that record. In practice this happens 10-20% of the time, just because of the way web requests happen. Even if you had Paxos, and managed to put the "local" participants in the same datacenter, occasionally a write would originate in the non-"local" datacenter and pay the cross-datacenter latency to find enough members of the quorum. So this cost is really unavoidable. - You could weaken the consistency, to something like "eventual consistency" (write anywhere and resolve conflicts later) or even "best effort" (write anywhere and don't worry about conflicts) and avoid ever paying the cross-datacenter cost. And in fact it is possible to turn off mastership in PNUTS. But then you need a resolution protocol, and until conflicts are resolved inconsistent copies of the data are visible to readers. So again there is no free lunch. - Anonymous is write that this system is not as optimized for scans a la MapReduce. In fact, you make different architectural decisions if you are optimizing for low-latency updates to a geographic database than if you are optimizing for scanning for MapReduce within a single datacenter. We have been able to run Hadoop jobs over PNUTS and get performance that is pretty good, just not as good as a native store (HDFS) optimized for MapReduce. So if you want to transactionally update data with very low latency and occasionally run MapReduce, you can use PNUTS; if you want to always run MapReduce but don't need a lot of high performance updates, use HDFS. - As Utkarsh has said, the hardware used for the paper is not as good as production hardware (e.g. no battery-backed write cache, and other limitations). We hope to publish some numbers from live workloads on the production system soon. .................. -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en