[The Java Posse] Re: Rethinking persistence

Alex Turner Wed, 22 Jul 2009 21:53:11 -0700

http://en.wikipedia.org/wiki/ACID


Think maybe you should read this and rethink that position.

On Mon, Jul 20, 2009 at 9:59 AM, Eric Newcomer <enewco...@gmail.com> wrote:
> Hi Peter,
>
> The classic definition of consistency as the C in ACID refers to application
> level consistency. Storage consistency is ensured using mostly A and I (i.e.
> writes are atomic and isolated, or locked).
>
> The description of "eventual consistency" does indeed refer to what happens
> on the storage end of things, but to achieve eventual consistency A and I
> are still required when writing to disk (again using the classic definition
> of ACID). That's what I meant by saying that the major change in BASE
> compared to classic ACID systems is to allow temporary updates to memory,
> which can introduce inconsistencies either among replicated memory or
> between the memory and persistent storage. I think this is the way the term
> "consistency" is used in the BASE model, but the C in ACID refers to
> application level consistency.
>
> I certainly agree BASE style systems are sufficiently proven to work to be
> considered mainstream from one point of view, i.e. they are a valid
> implementation choice.  But the focus of our book is on describing how large
> TP systems work, and we could not really say that the majority of large TP
> systems in production today use BASE style designs.  We do say that BASE
> style systems are a future trend influencing TP products and applications,
> but it will be a while before the influence is fully realized.
>
> I also agree with what I consider the main point here, which you describe
> as:
>
>    But the idea behind BASE is that you don't need ACID all the time. The
>    question is "what kind of guarantees do we need for application X?", the
>    traditional answer (and the one you give here) is "ACID", but sometimes
>    it should be "BASE", which allows for more performance.
>
> What I am saying is that using BASE doesn't mean you are not using ACID at
> some level, or at some time (or more correctly using the AID properties),
> since these properties are used when writing data to stable storage. The big
> distinction seems to be at what point in time these properties are used,
> i.e. at what point in time the data is persisted.  The big difference I see
> in BASE style designs is the ability for the application to work with data
> in memory without having to persist every update immediately, thus
> gaining a performance advantage at the cost of the potential
> inconsistencies.
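
The trade-off described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real BASE product works: `WriteBehindStore` and its `write`, `flush` and `crash` methods are hypothetical names, and a plain dict stands in for stable storage. The write returns as soon as memory is updated, the flush applies the queued updates as one all-or-nothing batch, and a crash in between loses whatever was acknowledged but not yet persisted.

```python
class WriteBehindStore:
    """Toy BASE-style store: acknowledge writes from memory, persist later."""

    def __init__(self):
        self.memory = {}    # volatile copy the application works with
        self.disk = {}      # stands in for stable storage (assumption)
        self.pending = []   # updates acknowledged but not yet persisted

    def write(self, key, value):
        # Control returns right after the in-memory update.
        self.memory[key] = value
        self.pending.append((key, value))

    def flush(self):
        # Apply all queued updates as one batch: build the new state on
        # the side, then swap it in, so the "disk" never shows half a batch.
        staged = dict(self.disk)
        for key, value in self.pending:
            staged[key] = value
        self.disk = staged
        self.pending = []

    def crash(self):
        # Losing volatile state drops every update that was not flushed.
        self.memory = dict(self.disk)
        self.pending = []


store = WriteBehindStore()
store.write("balance", 100)
store.flush()
store.write("balance", 80)   # acknowledged, not yet durable
print(store.memory["balance"], store.disk["balance"])   # 80 100
store.crash()
print(store.memory["balance"])                          # 100
```

The window between `write` and `flush` is where the latency gain and the potential inconsistency both come from.
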
>
> Eric
>
> On Sun, Jul 19, 2009 at 7:31 PM, Peter Becker <peter.becker...@gmail.com>
> wrote:
>>
>> Eric Newcomer wrote:
>> > Yes, Ok.  I can agree it's a valid comparison at the application
>> > level, and a very important one.
>> >
>> > I don't equate ACID with RDBMS.  ACID is an entertaining acronym for
>> > a set of properties that can be (and are) achieved using a wide variety
>> > of mechanisms.  I agree a lot of BASE style implementations use non
>> > SQL databases, or file systems.  But this does not mean they are
>> > non-ACID.
>> Of course there are storage systems that are non-RDBMS but still ACID. Most modern VCS
>> and things like JCR fall into that category.
>> >
>> > I think the updates are still atomic (A), isolated (I), and durable
>> > (D) - at least at some point, if not immediately.  (Consistency is
>> > actually an application responsibility more than a storage system's
>> > responsibility.)
>> But the application has to rely on the storage to be consistent first.
>> If the value of some property depends on how I query (e.g. which node in
>> a cluster I pick), then my application will never be consistent.
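
The "which node do I pick" problem can be illustrated with a small Python sketch. Everything here is a hypothetical simplification: `Cluster`, `Replica` and `replicate` are made-up names, replication is a manual step rather than a background process, and conflicting writes are resolved naively by log order. It only shows that two nodes can answer the same query differently until replication has caught up.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self):
        self.data = {}


class Cluster:
    """Toy cluster with asynchronous (here: manually triggered) replication."""

    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.log = []   # (source node, key, value) not yet propagated

    def write(self, node, key, value):
        # The write lands on one node only and is queued for the others.
        self.replicas[node].data[key] = value
        self.log.append((node, key, value))

    def read(self, node, key):
        return self.replicas[node].data.get(key)

    def replicate(self):
        # Push every queued update to all other nodes, in log order.
        for source, key, value in self.log:
            for index, replica in enumerate(self.replicas):
                if index != source:
                    replica.data[key] = value
        self.log = []


cluster = Cluster(2)
cluster.write(0, "price", 10)
print(cluster.read(0, "price"), cluster.read(1, "price"))   # 10 None
cluster.replicate()
print(cluster.read(0, "price"), cluster.read(1, "price"))   # 10 10
```
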
>> >
>> > I would also say that any data storage system implements these
>> > properties in some way. Otherwise there are pretty big risks to ending
>> > up with partial results, interleaved (overwritten) updates, and lost
>> > data.  Using SQL and RDBMS style products is definitely not the only
>> > way these properties can be achieved.  I didn't mean to imply that at
>> > all.
>> But the idea behind BASE is that you don't need ACID all the time. The
>> question is "what kind of guarantees do we need for application X?", the
>> traditional answer (and the one you give here) is "ACID", but sometimes
>> it should be "BASE", which allows for more performance.
>> >
>> > I think BigTable is used mostly for query analysis, isn't it?
>> AFAIK BigTable is the only storage you get if you use Google
>> App Engine.
>> > And of course inconsistencies in query results are not as big a problem
>> > as inconsistencies for updates. But what about the writes to stable
>> > storage (disk or flash memory) against which BigTable queries are
>> > performed?  I would argue these follow AID, despite the fact they
>> > don't use an RDBMS.
>> Assuming a journaled filesystem they do, but is that relevant?
>> >
>> > I suppose my objection is somewhat picky, since I completely agree
>> > BASE style systems have a lot of advantages, and I would actually
>> > recommend that anyone with large IT systems understand what's going on
>> > there.  I also think these concepts are among those that will enable
>> > the low-cost commodity data centers to be used in more types of
>> > transaction processing systems.
>> >
>> > But ACID does not equal RDBMS, nor does it equal 2PC. It is a set of
>> > properties that are used to evaluate the capabilities of storage
>> > systems.
>> I agree.
>> > I would say instead that BASE is really just another way to achieve
>> > the ACID properties, mostly by introducing asynchronicity between
>> > volatile and persistent storage devices.
>> I disagree since a BASE system is not guaranteed to be entirely consistent at any
>> point in time. Although it is true if you assume changes to your data
>> set of interest stop.
>> > The end goal however remains AID.
>> >
>> > Phil and I cover the fundamentals of ACID in our TP book, and in
>> > the second edition (which came out last month) we also cover many of
>> > the alternatives to classic commit protocols that are used to
>> > implement the properties, such as sagas, compensations, queuing, and
>> > replicated memory. We also cover the CAP theorem, but unfortunately
>> > for the timing of the book the entertaining BASE acronym was not yet
>> > widely adopted.
>> It is amusing, isn't it :-) I am not really sure the paper I cited had
>> any other real contribution, but they deserve the credit just for coming
>> up with the catchy name. I'm one of these weird people who think Martin
>> Fowler's biggest contribution was being one of the inventors of the
>> "POJO" term :-) Not that his books are bad, but that name outshines them.
>> > Although the "BASE" acronym isn't there, the concepts in it are
>> > covered, and we did research the papers and conferences on the topic -
>> > I suppose up until about Nov/Dec 2008 or so, when the final manuscript
>> > had to be submitted.  But I think it's all there, starting from the
>> > principles, and ending with implementation examples such as EJB3, JPA,
>> > .NET Entities, REST/HTTP, etc.
>> >
>> > Because the focus of the book is on widely adopted current practice,
>> > we could not use product examples to illustrate all of the "BASE"
>> > concepts, since everyone seems to be implementing it in a different
>> > way, primarily using custom code, and products are only now starting
>> > to emerge around these concepts.  Nati Shalom of GigaSpaces takes the
>> > position that the new generation of products is far enough along to be
>> > considered mainstream in his comments to my blog post:
>> > http://ericnewcomer.wordpress.com/2009/06/19/second-edition-of-tp-book-out-today/
>> AFAIK a lot of Amazon's storage is based on the idea, all of GAE is
>> based on BigTable (which might imply GMail and others) and there are
>> quite a few other large users of CouchDB. Sounds mainstream enough for
>> me :-)
>>
>>  Peter
>>
>>
>> >
>> > I don't think BASE concepts are very well integrated into products
>> > yet, certainly not sufficiently for use throughout the multiple tiers
>> > typically used in a large scale TP application, but hopefully it won't
>> > be long. However, as with any change as significant as the move toward
>> > BASE and eventual consistency models, it is likely to take a long time.
>> >
>> > Eric
>> >
>> >
>> >
>> > On Sat, Jul 18, 2009 at 6:56 PM, Peter Becker
>> > <peter.becker...@gmail.com> wrote:
>> >
>> >
>> >     While I agree with your description of what BASE is and that usually
>> >     ACID will be used in the lower layers, I do think the BASE vs. ACID
>> >     question makes sense as long as you apply it to the application layer,
>> >     not the whole stack. Traditionally enterprise systems are built with
>> >     ACID assumptions on the top layer. In many cases that has been replaced
>> >     with BASE, which on that layer is an either-or decision. Choosing BASE
>> >     on the top layer does not imply not having ACID anywhere, though.
>> >
>> >     Depending on your replication mechanisms in the storage layer the BASE
>> >     can go very deep. You seem to imply that all applications use a
>> >     traditional RDBMS at the bottom, but that is not necessarily true. AFAIK
>> >     BigTable, CouchDB and the like do not provide you with ACID transactions
>> >     at all. Data written into one node will eventually appear on the others,
>> >     but if your webserver hits two different backends it can get
>> >     inconsistent data. If you build your stack on top of these databases,
>> >     you cannot assume ACID anywhere.
>> >
>> >     But it is important to make sure you ask the ACID vs. BASE question on
>> >     every layer separately.
>> >
>> >      Peter
>> >
>> >
>> >     Eric Newcomer wrote:
>> >     > Yes, BASE and ACID are different concepts, that's why I suggested a
>> >     > direct comparison isn't really accurate.  BASE systems would use ACID
>> >     > for persistence to stable storage.
>> >     >
>> >     > To me the difference really seems to apply more to the application
>> >     > level relationship to persistence than the database's.  The database
>> >     > is still going to use ACID transactions for updates to stable storage.
>> >     >
>> >     > The idea of BASE systems is to allow the application to update
>> >     > volatile storage and receive control back immediately, without waiting
>> >     > for the update to be written to stable storage.  This opens the door
>> >     > to potential inconsistencies when the data is later written to stable
>> >     > storage, but improves latency.
>> >     >
>> >     > A lot of the system designs that fall into the BASE category also rely
>> >     > on replicated memory for failure handling and load balancing, but this
>> >     > opens another window to potential inconsistencies among replicas
>> >     > (since it's impossible to update all replicas simultaneously).
>> >     >
>> >     > The concept of "eventual consistency" says that at some point these
>> >     > kinds of inconsistencies will be reconciled, and there are a variety
>> >     > of techniques for doing so. None of them guarantee consistency of the
>> >     > stable storage, since there's always a window of time between the
>> >     > update to volatile storage and the update to persistent storage in
>> >     > which interleaving updates can occur, or volatile data can be lost.
>> >     > However, the risk of inconsistencies, and the additional effort to
>> >     > resolve them "eventually" once they occur, is seen to be worth the
>> >     > improved latency for the majority of cases.
>> >     >
>> >     > The point I was trying to make is that even in the BASE style systems
>> >     > I'm familiar with, ACID transactions are used when the data is
>> >     > (eventually) persisted to stable storage. The concept of BASE just
>> >     > returns control to the application after the update to volatile
>> >     > storage, and doesn't wait for the additional time it takes to also
>> >     > perform the update to persistent storage, therefore reducing latency
>> >     > for the application/user.
>> >     >
>> >     > It's a classic trade off, but its discussion seems to have created an
>> >     > oversimplification of positioning one against the other, as if BASE
>> >     > were a potential replacement for ACID, which I don't think it is. Good
>> >     > marketing maybe, but not very accurate.
>> >     >
>> >     > Eric
>> >     >
>> >     > On Fri, Jul 17, 2009 at 9:32 PM, Peter Becker
>> >     > <peter.becker...@gmail.com> wrote:
>> >     >
>> >     >
>> >     >     Eric Newcomer wrote:
>> >     >     > ORM to me is like one of those impossible tasks, like
>> >     >     > automatically converting data types between Java and XML.
>> >     >     I think the latter is actually easier :-)
>> >     >     >
>> >     >     > I think EJB3 is a big improvement over EJB2 and JPA a big
>> >     >     > improvement over entity beans. We are in the middle of mapping
>> >     >     > JDBC and JPA to OSGi BTW and hopefully this will result in more
>> >     >     > pluggability for persistence providers.
>> >     >     >
>> >     >     > BASE is the kind of thing I was referring to in the earlier post
>> >     >     > in that it represents a persistence design based on a different
>> >     >     > set of assumptions.  I would not really agree however with a
>> >     >     > characterization of BASE vs ACID, since even in the BASE style
>> >     >     > systems I'm aware of, ACID is still used by the databases when
>> >     >     > persistence happens. The difference seems much more about the
>> >     >     > decision and timing of persistence to stable storage than
>> >     >     > whether BASE is used in place of ACID.  AFAIK ACID is still used
>> >     >     > - if what's meant is 2PC then that is probably a more correct
>> >     >     > comparison, i.e. BASE vs 2PC.
>> >     >     From what I understand BASE and ACID are different concepts. If it
>> >     >     is BASE, it is not ACID -- it doesn't matter if something
>> >     >     underneath uses ACID semantics. If you want ACID at the top, you
>> >     >     have to control it all the way down. BASE is about giving up some
>> >     >     of that control in favour of weaker assumptions. Once you did
>> >     >     that, you lost ACID from that layer upwards.
>> >     >
>> >     >     Here is the relevant paper:
>> >     >     http://queue.acm.org/detail.cfm?id=1394128
>> >     >
>> >     >      Peter
>> >     >
>> >     >
>> >     >     > On Fri, Jul 17, 2009 at 4:34 AM, Peter Becker
>> >     >     > <peter.becker...@gmail.com> wrote:
>> >     >     >
>> >     >     >
>> >     >     >     Rick wrote:
>> >     >     >     > I think one of the reasons that relational databases are
>> >     >     >     > popular as compared to other solutions is that they map
>> >     >     >     > well to the theoretical tools, such as relational
>> >     >     >     > algebra/calculus.
>> >     >     >     >
>> >     >     >     My problem is that relational databases map to most of the
>> >     >     >     theory only in theory. E.g. SQL does not map to relational
>> >     >     >     algebra, it is more a "Based upon a true story" type of
>> >     >     >     thing. I've done this rant a few times before (including on
>> >     >     >     this forum), but one of the things I really miss is a true
>> >     >     >     implementation of the relational algebra, which includes
>> >     >     >     having a proper notion of domains (which could easily be
>> >     >     >     mapped to OO-classes).
>> >     >     >     > For an upcoming e-commerce project I suggested trying out
>> >     >     >     > CouchDB (as promoted by the posse) and sCouchDB (the Scala
>> >     >     >     > version of same?)....
>> >     >     >     > and a friend with an architectural leaning asked something
>> >     >     >     > along the lines of:
>> >     >     >     >
>> >     >     >     > "but can you guarantee atomicity?"
>> >     >     >     >
>> >     >     >     > which shut me up pretty quickly.
>> >     >     >     >
>> >     >     >     I believe the ACID vs. BASE question will become more
>> >     >     >     dominant in the near future, though. I am somehow afraid
>> >     >     >     that many projects will pick the BASE option when they
>> >     >     >     really need ACID.
>> >     >     >     > Disclaimer: I'm a fan of EJB 3.0
>> >     >     >     I've used only JPA, which is really not too bad. They
>> >     >     >     certainly seem to have learned from the experiences of other
>> >     >     >     products in the area, which is unfortunately not that common
>> >     >     >     with these standards. You still need to like ORM to like
>> >     >     >     EJB3, though :-) I just find the ORM idea to be too much of
>> >     >     >     a neither here nor there thing.
>> >     >     >
>> >     >     >      Peter
