[The Java Posse] Re: Rethinking persistence

Eric Newcomer Sun, 19 Jul 2009 07:42:07 -0700

Yes, Ok.  I can agree it's a valid comparison at the application level, and
a very important one.

I don't equate ACID with RDBMS.  ACID is an entertaining acronym for a set
of properties that can be (and are) achieved using a wide variety of
mechanisms.  I agree a lot of BASE style implementations use non-SQL
databases, or file systems.  But this does not mean they are non-ACID.

I think the updates are still atomic (A), isolated (I), and durable (D) - at
least at some point, if not immediately.  (Consistency is actually an
application responsibility more than a storage system's responsibility.)

I would also say that any data storage system implements these properties in
some way. Otherwise there are pretty big risks of ending up with partial
results, interleaved (overwritten) updates, and lost data.  Using SQL and
RDBMS style products is definitely not the only way these properties can be
achieved.  I didn't mean to imply that at all.

I think BigTable is used mostly for query analysis, isn't it?  And of course
inconsistencies in query results are not as big a problem as inconsistencies
in updates. But what about the writes to stable storage (disk or flash
memory) against which BigTable queries are performed?  I would argue these
follow AID, despite the fact they don't use an RDBMS.

I suppose my objection is somewhat picky, since I completely agree BASE
style systems have a lot of advantages, and I would actually recommend that
anyone with large IT systems understand what's going on there.  I also think
these concepts are among those that will enable the low-cost commodity data
centers to be used in more types of transaction processing systems.

But ACID does not equal RDBMS, nor does it equal 2PC. It is a set of
properties that are used to evaluate the capabilities of storage systems.  I
would say instead that BASE is really just another way to achieve the ACID
properties, mostly by introducing asynchronicity between volatile and
persistent storage devices.  The end goal however remains AID.
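
To make the asynchronicity between volatile and persistent storage concrete,
here is a minimal Java sketch of a write-behind store. Everything in it is an
assumption made for illustration: the class name WriteBehindStore, its methods,
and the in-memory map standing in for stable storage are hypothetical and not
taken from any product. It only shows the shape of the idea: put() returns as
soon as memory is updated, and flush() applies the queued updates to the
"stable" tier later, one whole update at a time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical write-behind store; names are illustrative only. */
public class WriteBehindStore {
    // Volatile tier: updated synchronously, visible to readers at once.
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    // Stand-in for stable storage (a disk or database in a real system).
    private final Map<String, String> stable = new ConcurrentHashMap<>();
    // Updates already acknowledged to the caller but not yet persisted.
    private final ConcurrentLinkedQueue<String[]> pending = new ConcurrentLinkedQueue<>();

    /** Updates memory and returns immediately; persistence happens later. */
    public void put(String key, String value) {
        memory.put(key, value);
        pending.add(new String[] {key, value});
    }

    /** Reads the volatile tier, which may be ahead of stable storage. */
    public String read(String key) {
        return memory.get(key);
    }

    /** Reads the stand-in for stable storage. */
    public String readStable(String key) {
        return stable.get(key);
    }

    /** Applies queued updates in arrival order; returns how many were flushed. */
    public int flush() {
        int flushed = 0;
        String[] update;
        while ((update = pending.poll()) != null) {
            stable.put(update[0], update[1]);
            flushed++;
        }
        return flushed;
    }

    public static void main(String[] args) {
        WriteBehindStore store = new WriteBehindStore();
        store.put("order-1", "PAID");
        System.out.println(store.read("order-1"));       // PAID
        System.out.println(store.readStable("order-1")); // null (not yet persisted)
        store.flush();
        System.out.println(store.readStable("order-1")); // PAID
    }
}
```

The window between put() and flush() is where the inconsistency lives: a crash
in that window loses the queued updates, which is the latency-for-risk trade
being discussed.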

Phil and I cover the fundamentals of ACID in our TP book, and in the
second edition (which came out last month) we also cover many of the
alternatives to classic commit protocols that are used to implement the
properties, such as sagas, compensations, queuing, and replicated memory. We
also cover the CAP theorem, but unfortunately for the timing of the book the
entertaining BASE acronym was not yet widely adopted.  Although the "BASE"
acronym isn't there, the concepts in it are covered, and we did research the
papers and conferences on the topic - I suppose up until about Nov/Dec 2008
or so, when the final manuscript had to be submitted.  But I think it's all
there, starting from the principles, and ending with implementation examples
such as EJB3, JPA, .NET Entities, REST/HTTP, etc.
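
As one illustration of those alternatives, the following is a minimal Java
sketch of a saga with compensations. It is not taken from the book; the
SagaSketch class, the Step interface, and the step names are hypothetical.
The idea shown is only the general one: each step commits on its own, and if a
later step fails, the already completed steps are undone in reverse order by
their compensating actions instead of being held open under one long
transaction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical saga runner; names are illustrative only. */
public class SagaSketch {
    /** One saga step: a forward action plus the compensation that undoes it. */
    public interface Step {
        void run() throws Exception;
        void compensate();
    }

    /**
     * Runs the steps in order. If one fails, compensates the completed steps
     * in reverse order and returns false; returns true if all steps succeed.
     */
    public static boolean execute(List<Step> steps) {
        Deque<Step> done = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.run();
                done.push(step); // most recently completed step ends up on top
            } catch (Exception e) {
                while (!done.isEmpty()) {
                    done.pop().compensate();
                }
                return false;
            }
        }
        return true;
    }

    /** Builds a demo step that records what it did in the given log. */
    public static Step step(String name, boolean fails, List<String> log) {
        return new Step() {
            public void run() throws Exception {
                if (fails) {
                    throw new Exception(name + " failed");
                }
                log.add("do " + name);
            }

            public void compensate() {
                log.add("undo " + name);
            }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        boolean ok = execute(List.of(
                step("reserve-stock", false, log),
                step("charge-card", false, log),
                step("book-shipping", true, log)));
        // false [do reserve-stock, do charge-card, undo charge-card, undo reserve-stock]
        System.out.println(ok + " " + log);
    }
}
```

Note that between a step and its compensation other readers can see the
intermediate state, so isolation is relaxed in exchange for not holding locks
across the whole business operation.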

Because the focus of the book is on widely adopted current practice, we
could not use product examples to illustrate all of the "BASE" concepts,
since everyone seems to be implementing it in a different way, primarily
using custom code, and products are only now starting to emerge around these
concepts.  Nati Shalom of GigaSpaces takes the position that the new
generation of products is far enough along to be considered mainstream in
his comments on my blog post:
http://ericnewcomer.wordpress.com/2009/06/19/second-edition-of-tp-book-out-today/

I don't think BASE concepts are very well integrated into products yet,
certainly not sufficiently for use throughout the multiple tiers typically
used in a large scale TP application, but hopefully it won't be long.
However, as with any change as significant as the move toward BASE and
eventual consistency models, it is likely to take a long time.

Eric

On Sat, Jul 18, 2009 at 6:56 PM, Peter Becker <peter.becker...@gmail.com> wrote:

>
> While I agree with your description of what BASE is and that usually
> ACID will be used in the lower layers, I do think the BASE vs. ACID
> question makes sense as long as you apply it to the application layer,
> not the whole stack. Traditionally enterprise systems are built with
> ACID assumptions on the top layer. In many cases that has been replaced
> with BASE, which on that layer is an either-or decision. Choosing BASE on
> the top layer does not imply not having ACID anywhere, though.
>
> Depending on your replication mechanisms in the storage layer the BASE
> can go very deep. You seem to imply that all applications use a
> traditional RDBMS at the bottom, but that is not necessarily true. AFAIK
> BigTable, CouchDB and the like do not provide you with ACID transactions
> at all. Data written into one node will eventually appear on the others,
> but if your webserver hits two different backends it can get
> inconsistent data. If you build your stack on top of these databases,
> you can not assume ACID anywhere.
>
> But it is important to make sure you ask the ACID vs. BASE question on
> every layer separately.
>
>  Peter
>
>
> Eric Newcomer wrote:
> > Yes, BASE and ACID are different concepts, that's why I suggested a
> > direct comparison isn't really accurate.  BASE systems would use ACID
> > for persistence to stable storage.
> >
> > To me the difference really seems to apply more to the application
> > level relationship to persistence than the database's.  The database
> > is still going to use ACID transactions for updates to stable storage.
> >
> > The idea of BASE systems is to allow the application to update
> > volatile storage and receive control back immediately, without waiting
> > for the update to be written to stable storage.  This opens the door
> > to potential inconsistencies when the data is later written to stable
> > storage, but improves latency.
> >
> > A lot of the system designs that fall into the BASE category also rely
> > on replicated memory for failure handling and load balancing, but this
> > opens another window to potential inconsistencies among replicas
> > (since it's impossible to update all replicas simultaneously).
> >
> > The concept of "eventual consistency" says that at some point these
> > kinds of inconsistencies  will be reconciled, and there are a variety
> > of techniques for doing so. None of them guarantee consistency of the
> > stable storage, since there's always a window of time between the
> > update to volatile storage and the update to persistent storage in
> > which interleaving updates can occur, or volatile data can be lost.
> > However, the risk of inconsistency, and the additional effort to
> > resolve them "eventually" once they occur is seen to be worth the
> > improved latency for the majority of cases.
> >
> > The point I was trying to make is that even in the BASE style systems
> > I'm familiar with, ACID transactions are used when the data is
> > (eventually) persisted to stable storage. The concept of BASE just
> > returns control to the application after the update to volatile storage,
> > and doesn't wait for the additional time it takes to also perform the
> > update to persistent storage, therefore reducing latency for the
> > application/user.
> >
> > It's a classic trade-off, but its discussion seems to have created an
> > oversimplification by positioning one against the other, as if BASE
> > were a potential replacement for ACID, which I don't think it is. Good
> > marketing maybe, but not very accurate.
> >
> > Eric
> >
> > On Fri, Jul 17, 2009 at 9:32 PM, Peter Becker
> > <peter.becker.de@gmail.com> wrote:
> >
> >
> >     Eric Newcomer wrote:
> >     > ORM to me is like one of those impossible tasks, like automatically
> >     > converting data types between Java and XML.
> >     I think the latter is actually easier :-)
> >     >
> >     > I think EJB3 is a big improvement over EJB2 and JPA a big improvement
> >     > over entity beans. We are in the middle of mapping JDBC and JPA to
> >     > OSGi BTW and hopefully this will result in more pluggability for
> >     > persistence providers.
> >     >
> >     > BASE is the kind of thing I was referring to in the earlier post in
> >     > that it represents a persistence design based on a different set of
> >     > assumptions.  I would not really agree however with a characterization
> >     > of BASE vs ACID, since even in the BASE style systems I'm aware of,
> >     > ACID is still used by the databases when persistence happens. The
> >     > difference seems much more about the decision and timing of
> >     > persistence to stable storage than whether BASE is used in place of
> >     > ACID.  AFAIK ACID is still used - if what's meant is 2PC then that is
> >     > probably a more correct comparison, i.e. BASE vs 2PC.
> >     From what I understand BASE and ACID are different concepts. If it is
> >     BASE, it is not ACID -- it doesn't matter if something underneath uses
> >     ACID semantics. If you want ACID at the top, you have to control it all
> >     the way down. BASE is about giving up some of that control in favour of
> >     weaker assumptions. Once you did that, you lost ACID from that layer
> >     upwards.
> >
> >     Here is the relevant paper: http://queue.acm.org/detail.cfm?id=1394128
> >
> >      Peter
> >
> >
> >     > On Fri, Jul 17, 2009 at 4:34 AM, Peter Becker
> >     > <peter.becker.de@gmail.com> wrote:
> >     >
> >     >
> >     >     Rick wrote:
> >     >     > I think one of the reasons that relational databases are popular as
> >     >     > compared to other solutions is that they map well to the theoretical
> >     >     > tools, such as relational algebra/calculus.
> >     >     >
> >     >     My problem is that relational databases map to most of the theory only
> >     >     in theory. E.g. SQL does not map to relational algebra, it is more a
> >     >     "Based upon a true story" type of thing. I've done this rant a few
> >     >     times before (including on this forum), but one of the things I really
> >     >     miss is a true implementation of the relational algebra, which includes
> >     >     having a proper notion of domains (which could easily be mapped to
> >     >     OO-classes).
> >     >     > For an upcoming e-commerce project I suggested trying out couchDB (as
> >     >     > promoted by the posse) and sCouchDB (the Scala version of same?)....
> >     >     > and a friend with an architectural leaning asked something along the
> >     >     > lines of:
> >     >     >
> >     >     > "but can you guarantee atomicity?"
> >     >     >
> >     >     > which shut me up pretty quickly.
> >     >     >
> >     >     I believe the ACID vs. BASE question will become more dominant in the
> >     >     near future, though. I am somehow afraid that many projects will
> >     >     pick the BASE option when they really need ACID.
> >     >     > Disclaimer: I'm a fan of EJB 3.0
> >     >     I've used only JPA, which is really not too bad. They certainly
> >     >     seem to have learned from the experiences of other products in the
> >     >     area, which is unfortunately not that common with these standards. You
> >     >     still need to like ORM to like EJB3, though :-) I just find the ORM idea
> >     >     to be too much of a neither here nor there thing.
> >     >
> >     >      Peter
