>
> *Re:
> Update: One of the developers of PNUTS commented
> <http://glinden.blogspot.com/2009/02/details-on-yahoos-distributed-database.html?showComment=1233884340000#c1254841206330803677>
> on this post, pointing out that PNUTS performance is much better in practice
> (1-10 ms/request) when caching layers are in place and making a few
> comparisons to Bigtable.*
>

Note this very interesting conversation (below), which IMHO could be relevant
to the STM implementation in Clojure <http://clojure.org/refs> (excerpt: "PNUTs
supports row-level transactions too-- they are done through a get followed
by test-and-set. This is nothing new: it is called optimistic concurrency
control and has been around in database literature for ages, and is also
used by BigTable.").
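For readers less familiar with the pattern in the excerpt, here is a minimal Python sketch of optimistic concurrency control: read a row together with its version, compute the new value without holding any lock, then write it back with a test-and-set that succeeds only if the version is unchanged, retrying on conflict. `VersionedStore`, `get`, `test_and_set` and `update` are hypothetical names for illustration, not the PNUTS or Bigtable client API; the in-process lock only stands in for whatever makes the server-side test-and-set atomic.

```python
import threading


class VersionedStore:
    """Toy in-memory store where each key holds (value, version).

    Hypothetical API for illustration only; not the PNUTS or Bigtable client.
    """

    def __init__(self):
        self._data = {}
        # Stands in for the server-side atomicity of test-and-set.
        self._lock = threading.Lock()

    def get(self, key):
        # Returns (value, version); version 0 means the key is absent.
        with self._lock:
            return self._data.get(key, (None, 0))

    def test_and_set(self, key, expected_version, new_value):
        # Write only if nobody else has written since expected_version was read.
        with self._lock:
            _, current = self._data.get(key, (None, 0))
            if current != expected_version:
                return False
            self._data[key] = (new_value, current + 1)
            return True


def update(store, key, fn, max_retries=10):
    """Optimistic read-modify-write: read, compute, test-and-set, retry on conflict."""
    for _ in range(max_retries):
        value, version = store.get(key)
        # fn runs without any lock held; conflicts are detected at write time.
        if store.test_and_set(key, version, fn(value)):
            return True
    return False
```

A stale writer simply gets `False` back and retries with a fresh read. Clojure's atoms expose the same primitive as `compare-and-set!`, while refs coordinate updates to several refs inside `dosync` transactions.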

...........................

Could Clojure help simplify the construction of a framework for
high-volume, high-availability, distributed web apps? Has something like
PNUTS been implemented for Clojure or for Java clouds?
............................

Brian Cooper <http://www.blogger.com/profile/07702915403582319413> said...

Hi folks,

I'm another member of the PNUTS group at Yahoo! This has been a very
interesting discussion; I'm glad you folks are as interested in this stuff
as we are.

Just to reiterate a few points that Utkarsh brought up:

- There's no free lunch for performance, and if you want consistency some
writes will have to go cross-datacenter, increasing the average latency.
Cross-datacenter communication is required because of our mastership
protocol. Consider a user whose record is mastered in California. If she
flies to Europe, or a network problem causes her request to be redirected to
a datacenter on the East coast, then her write will originate in a
non-master datacenter and be forwarded back to the master for that record.
In practice this happens 10-20% of the time, just because of the way web
requests happen.

Even if you had Paxos, and managed to put the "local" participants in the
same datacenter, occasionally a write would originate in the non-"local"
datacenter and pay the cross-datacenter latency to find enough members of
the quorum. So this cost is really unavoidable.

- You could weaken the consistency, to something like "eventual consistency"
(write anywhere and resolve conflicts later) or even "best effort" (write
anywhere and don't worry about conflicts) and avoid ever paying the
cross-datacenter cost. And in fact it is possible to turn off mastership in
PNUTS. But then you need a resolution protocol, and until conflicts are
resolved inconsistent copies of the data are visible to readers. So again
there is no free lunch.

- Anonymous is right that this system is not as optimized for scans à la
MapReduce. In fact, you make different architectural decisions if you are
optimizing for low-latency updates to a geographic database than if you are
optimizing for scanning for MapReduce within a single datacenter. We have
been able to run Hadoop jobs over PNUTS and get performance that is pretty
good, just not as good as a native store (HDFS) optimized for MapReduce. So
if you want to transactionally update data with very low latency and
occasionally run MapReduce, you can use PNUTS; if you want to always run
MapReduce but don't need a lot of high performance updates, use HDFS.

- As Utkarsh has said, the hardware used for the paper is not as good as
production hardware (e.g. no battery-backed write cache, and other
limitations). We hope to publish some numbers from live workloads on the
production system soon.
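The per-record mastership rule from the first point above can be sketched in a few lines of Python: a write that arrives at the record's master datacenter commits locally, while a write arriving anywhere else is forwarded to the master and pays a cross-datacenter round trip. The datacenter names, the `Record`/`route_write` helpers and the latency figures are hypothetical; only the forwarding rule and the 10-20% share of non-master writes come from the comment, and real PNUTS latencies are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    master_dc: str  # datacenter that currently masters this record


def route_write(record, origin_dc):
    """Datacenters a write visits before committing (simplified mastership).

    Only the master may commit, so a write landing elsewhere is forwarded.
    """
    if origin_dc == record.master_dc:
        return [origin_dc]
    return [origin_dc, record.master_dc]


def write_latency_ms(record, origin_dc, local_ms, cross_dc_ms):
    # One local commit at the master, plus a cross-DC trip if forwarded.
    forwarded = len(route_write(record, origin_dc)) > 1
    return local_ms + (cross_dc_ms if forwarded else 0)


def expected_latency_ms(non_master_fraction, local_ms, cross_dc_ms):
    # Average over all writes: only the non-master share pays the cross-DC cost.
    return local_ms + non_master_fraction * cross_dc_ms
```

With an assumed 2 ms local commit and 80 ms cross-datacenter round trip, `expected_latency_ms(0.15, 2, 80)` gives 14.0, so a 15% share of forwarded writes accounts for most of the average latency.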
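The Paxos remark in the first point above comes down to a counting argument: with majority quorums, a write can commit without leaving its datacenter only if that datacenter holds a majority of the record's replicas, so a write from any other datacenter must reach a remote replica. This Python sketch only counts replicas and ignores leader election and message rounds; the replica placement in the test is hypothetical.

```python
from collections import Counter


def needs_cross_dc(replica_dcs, origin_dc):
    """True if a majority quorum cannot be formed inside origin_dc alone.

    replica_dcs lists the datacenter of each replica (one entry per replica).
    """
    majority = len(replica_dcs) // 2 + 1
    local_replicas = Counter(replica_dcs)[origin_dc]
    return local_replicas < majority
```

With five replicas placed three in "west", one in "east" and one in "europe", writes originating in "west" can assemble a quorum locally, while writes from "east" or "europe" cannot, which is the unavoidable cost the comment describes.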
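To make the second point concrete, here is a minimal Python sketch of "write anywhere and resolve conflicts later", using last-writer-wins on a (timestamp, replica name) pair as the resolution protocol. Last-writer-wins is just one possible rule chosen for illustration, not necessarily what PNUTS does with mastership turned off; the `Replica` class and caller-supplied timestamps are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    # key -> (timestamp, writer_name, value); timestamps are supplied by the caller
    data: dict = field(default_factory=dict)

    def write(self, key, value, ts):
        # Accept the write locally with no coordination ("write anywhere").
        self._apply(key, (ts, self.name, value))

    def read(self, key):
        entry = self.data.get(key)
        return entry[2] if entry else None

    def _apply(self, key, entry):
        # Last-writer-wins: keep the entry with the larger (timestamp, writer) pair.
        current = self.data.get(key)
        if current is None or entry[:2] > current[:2]:
            self.data[key] = entry

    def merge_from(self, other):
        # Anti-entropy pass: pull every entry from another replica and resolve.
        for key, entry in other.data.items():
            self._apply(key, entry)
```

Until `merge_from` runs in both directions, the two replicas return different values for the same key, and afterwards the older concurrent write is silently discarded: readers either see inconsistent copies for a while or lose an update.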
..................

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
