[The Java Posse] Re: Rethinking persistence

Peter Becker Mon, 20 Jul 2009 21:31:25 -0700

Eric,

I think we mostly agree. What caused me to jump in was your initial 
statement that any BASE type application will use ACID underneath. It 
can, but IMO applications built on some of the cloud-style, lazily 
replicated databases don't.
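
To make "lazily replicated" concrete, here is a minimal Java sketch of the
behaviour I have in mind. The names (LazyReplicationSketch, Node, LazyStore,
write, replicate) are made up for illustration and are not the API of
BigTable, CouchDB or any other product. It is single-threaded and purely
in-memory, so it only shows the visibility gap between nodes and says nothing
about durability.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration only: not the API of any real data store.
class LazyReplicationSketch {

    /** One storage node holding its own copy of the data. */
    static final class Node {
        private final Map<String, String> data = new HashMap<>();

        String read(String key) {
            return data.get(key);
        }

        void apply(String key, String value) {
            data.put(key, value);
        }
    }

    /** A two-node store that acknowledges a write before the replica has seen it. */
    static final class LazyStore {
        final Node primary = new Node();
        final Node replica = new Node();
        // Updates accepted by the primary but not yet shipped to the replica.
        private final Queue<String[]> pending = new ArrayDeque<>();

        /** Returns as soon as the primary has the value; the replica lags behind. */
        void write(String key, String value) {
            primary.apply(key, value);
            pending.add(new String[] {key, value});
        }

        /** Ships queued updates to the replica in order (the "eventually" part). */
        void replicate() {
            String[] update;
            while ((update = pending.poll()) != null) {
                replica.apply(update[0], update[1]);
            }
        }
    }

    public static void main(String[] args) {
        LazyStore store = new LazyStore();
        store.write("cart:42", "1 book");

        // Two backends can answer differently until replication catches up.
        System.out.println("primary: " + store.primary.read("cart:42"));
        System.out.println("replica: " + store.replica.read("cart:42"));

        store.replicate();
        System.out.println("replica after replication: " + store.replica.read("cart:42"));
    }
}
```

Between write() and replicate() a reader that happens to hit the replica
sees nothing for the key, and no transaction spans the two nodes. That
window is what I mean when I say ACID is not there underneath.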


Maybe you didn't mean to say that there's always ACID at the bottom, 
maybe you didn't actually say it -- sometimes I read too much into 
people's posts ;-) But let's agree to at least mostly agree since I feel 
we did a good enough job exploring the topic.

Ok?

  Peter



Eric Newcomer wrote:
> Hi Peter,
>
> The classic definition of consistency as the C in ACID refers to 
> application level consistency. Storage consistency is ensured using 
> mostly A and I (i.e. writes are atomic and isolated, or locked). 
>
> The description of "eventual consistency" does indeed refer to what 
> happens on the storage end of things, but to achieve eventual 
> consistency A and I are still required when writing to disk (again 
> using the classic definition of ACID). That's what I meant by saying 
> that the major change in BASE compared to classic ACID systems is to 
> allow temporary updates to memory, which can introduce inconsistencies 
> either among replicated memory or between the memory and persistent 
> storage. I think this is the way the term "consistency" is used in the 
> BASE model, but the C in ACID refers to application level consistency. 
>
> I certainly agree BASE style systems are sufficiently proven to work 
> to be considered mainstream from one point of view, i.e. they are a 
> valid implementation choice.  But the focus of our book is on 
> describing how large TP systems work, and we could not really say that 
> the majority of large TP systems in production today use BASE style 
> designs.  We do say that BASE style systems are a future trend 
> influencing TP products and applications, but it will be a while 
> before the influence is fully realized.
>
> I also agree with what I consider the main point here, which you 
> describe as:
>   
>    But the idea behind BASE is that you don't need ACID all the time. The
>    question is "what kind of guarantees do we need for application X?", the
>    traditional answer (and the one you give here) is "ACID", but sometimes
>    it should be "BASE", which allows for more performance.
>
> What I am saying is that using BASE doesn't mean you are not using 
> ACID at some level, or at some time (or more correctly using the AID 
> properties), since these properties are used when writing data to 
> stable storage. The big distinction seems to be at what point in time 
> these properties are used, i.e. at what point in time the data is 
> persisted.  The big difference I see in BASE style designs is the 
> ability for the application to work with data in memory without having 
> to persist every update immediately, thus gaining a performance 
> advantage at the cost of potential inconsistencies.
>
> Eric
>
> On Sun, Jul 19, 2009 at 7:31 PM, Peter Becker <peter.becker.de@gmail.com> wrote:
>
>
>     Eric Newcomer wrote:
>     > Yes, Ok.  I can agree it's a valid comparison at the application
>     > level, and a very important one.
>     >
>     > I don't equate ACID with RDBMS.  ACID is an entertaining acronym for
>     > a set of properties that can be (and are) achieved using a wide variety
>     > of mechanisms.  I agree a lot of BASE style implementations use non
>     > SQL databases, or file systems.  But this does not mean they are
>     > non-ACID.
>     Of course there are non-RDBMS, but ACID storage systems. Most modern VCS
>     and things like JCR fall into that category.
>     >
>     > I think the updates are still atomic (A), isolated (I), and durable
>     > (D) - at least at some point, if not immediately.  (Consistency is
>     > actually an application responsibility more than a storage system's
>     > responsibility.)
>     But the application has to rely on the storage to be consistent first.
>     If the value of some property depends on how I query (e.g. which node in
>     a cluster I pick), then my application will never be consistent.
>     >
>     > I would also say that any data storage system implements these
>     > properties in some way. Otherwise there are pretty big risks of ending
>     > up with partial results, interleaved (overwritten) updates, and lost
>     > data.  Using SQL and RDBMS style products is definitely not the only
>     > way these properties can be achieved.  I didn't mean to imply that at all.
>     But the idea behind BASE is that you don't need ACID all the time. The
>     question is "what kind of guarantees do we need for application X?", the
>     traditional answer (and the one you give here) is "ACID", but sometimes
>     it should be "BASE", which allows for more performance.
>     >
>     > I think BigTable is used mostly for query analysis, isn't it?
>     AFAIK BigTable is the only storage you get if you use Google App
>     Engine.
>     > And of course inconsistencies in query results are not as big a problem
>     > as inconsistencies for updates. But what about the writes to stable
>     > storage (disk or flash memory) against which BigTable queries are
>     > performed?  I would argue these follow AID, despite the fact they
>     > don't use an RDBMS.
>     Assuming a journaled filesystem they do, but is that relevant?
>     >
>     > I suppose my objection is somewhat picky, since I completely agree
>     > BASE style systems have a lot of advantages, and I would actually
>     > recommend that anyone with large IT systems understand what's going on
>     > there.  I also think these concepts are among those that will enable
>     > the low-cost commodity data centers to be used in more types of
>     > transaction processing systems.
>     >
>     > But ACID does not equal RDBMS, nor does it equal 2PC. It is a set of
>     > properties that are used to evaluate the capabilities of storage
>     > systems.
>     I agree.
>     > I would say instead that BASE is really just another way to achieve
>     > the ACID properties, mostly by introducing asynchronicity between
>     > volatile and persistent storage devices.
>     I disagree since a BASE system is not guaranteed to be entirely consistent
>     at any point in time. Although it is true if you assume changes to your
>     data set of interest stop.
>     > The end goal however remains AID.
>     >
>     > Phil and I cover the fundamentals of ACID in our TP book, and in
>     > the second edition (which came out last month) we also cover many of
>     > the alternatives to classic commit protocols that are used to
>     > implement the properties, such as sagas, compensations, queuing, and
>     > replicated memory. We also cover the CAP theorem, but unfortunately
>     > for the timing of the book the entertaining BASE acronym was not yet
>     > widely adopted.
>     It is amusing, isn't it :-) I am not really sure the paper I cited had
>     any other real contribution, but they deserve the credit just for coming
>     up with the catchy name. I'm one of these weird people who think Martin
>     Fowler's biggest contribution was being one of the inventors of the
>     "POJO" term :-) Not that his books are bad, but that name outshines them.
>     > Although the "BASE" acronym isn't there, the concepts in it are
>     > covered, and we did research the papers and conferences on the topic -
>     > I suppose up until about Nov/Dec 2008 or so, when the final manuscript
>     > had to be submitted.  But I think it's all there, starting from the
>     > principles, and ending with implementation examples such as EJB3, JPA,
>     > .NET Entities, REST/HTTP, etc.
>     >
>     > Because the focus of the book is on widely adopted current practice,
>     > we could not use product examples to illustrate all of the "BASE"
>     > concepts, since everyone seems to be implementing it in a different
>     > way, primarily using custom code, and products are only now starting
>     > to emerge around these concepts.  Nati Shalom of GigaSpaces takes the
>     > position that the new generation of products is far enough along to be
>     > considered mainstream in his comments to my blog post:
>     > http://ericnewcomer.wordpress.com/2009/06/19/second-edition-of-tp-book-out-today/
>     AFAIK a lot of Amazon's storage is based on the idea, all of GAE is
>     based on BigTable (which might imply GMail and others) and there are
>     quite a few other large users of CouchDB. Sounds mainstream enough for
>     me :-)
>
>      Peter
>
>
>     >
>     > I don't think BASE concepts are very well integrated into products
>     > yet, certainly not sufficiently for use throughout the multiple tiers
>     > typically used in a large scale TP application, but hopefully it won't
>     > be long. However, as with any change as significant as the move toward
>     > BASE and eventual consistency models, it is likely to take a long time.
>     >
>     > Eric
>     >
>     >
>     >
>     > On Sat, Jul 18, 2009 at 6:56 PM, Peter Becker <peter.becker.de@gmail.com> wrote:
>     >
>     >
>     >     While I agree with your description of what BASE is and that usually
>     >     ACID will be used in the lower layers, I do think the BASE vs. ACID
>     >     question makes sense as long as you apply it to the application layer,
>     >     not the whole stack. Traditionally enterprise systems are built with
>     >     ACID assumptions on the top layer. In many cases that has been replaced
>     >     with BASE, which on that layer is an either-or decision. Choosing BASE on
>     >     the top layer does not imply not having ACID anywhere, though.
>     >
>     >     Depending on your replication mechanisms in the storage layer the BASE
>     >     can go very deep. You seem to imply that all applications use a
>     >     traditional RDBMS at the bottom, but that is not necessarily true. AFAIK
>     >     BigTable, CouchDB and the like do not provide you with ACID transactions
>     >     at all. Data written into one node will eventually appear on the others,
>     >     but if your webserver hits two different backends it can get
>     >     inconsistent data. If you build your stack on top of these databases,
>     >     you cannot assume ACID anywhere.
>     >
>     >     But it is important to make sure you ask the ACID vs. BASE question on
>     >     every layer separately.
>     >
>     >      Peter
>     >
>     >
>     >     Eric Newcomer wrote:
>     >     > Yes, BASE and ACID are different concepts, that's why I suggested a
>     >     > direct comparison isn't really accurate.  BASE systems would use ACID
>     >     > for persistence to stable storage.
>     >     >
>     >     > To me the difference really seems to apply more to the application
>     >     > level relationship to persistence than the database's.  The database
>     >     > is still going to use ACID transactions for updates to stable storage.
>     >     >
>     >     > The idea of BASE systems is to allow the application to update
>     >     > volatile storage and receive control back immediately, without waiting
>     >     > for the update to be written to stable storage.  This opens the door
>     >     > to potential inconsistencies when the data is later written to stable
>     >     > storage, but improves latency.
>     >     >
>     >     > A lot of the system designs that fall into the BASE category also rely
>     >     > on replicated memory for failure handling and load balancing, but this
>     >     > opens another window to potential inconsistencies among replicas
>     >     > (since it's impossible to update all replicas simultaneously).
>     >     >
>     >     > The concept of "eventual consistency" says that at some point these
>     >     > kinds of inconsistencies will be reconciled, and there are a variety
>     >     > of techniques for doing so. None of them guarantee consistency of the
>     >     > stable storage, since there's always a window of time between the
>     >     > update to volatile storage and the update to persistent storage in
>     >     > which interleaving updates can occur, or volatile data can be lost.
>     >     > However, the risk of inconsistencies, and the additional effort to
>     >     > resolve them "eventually" once they occur, is seen to be worth the
>     >     > improved latency for the majority of cases.
>     >     >
>     >     > The point I was trying to make is that even in the BASE style systems
>     >     > I'm familiar with, ACID transactions are used when the data is
>     >     > (eventually) persisted to stable storage. The concept of BASE just
>     >     > returns control to the application after the update to volatile storage,
>     >     > and doesn't wait for the additional time it takes to also perform the
>     >     > update to persistent storage, therefore reducing latency for the
>     >     > application/user.
>     >     >
>     >     > It's a classic trade off, but its discussion seems to have created an
>     >     > oversimplification of positioning one against the other, as if BASE
>     >     > were a potential replacement for ACID, which I don't think it is. Good
>     >     > marketing maybe, but not very accurate.
>     >     >
>     >     > Eric
>     >     >
>     >     > On Fri, Jul 17, 2009 at 9:32 PM, Peter Becker <peter.becker.de@gmail.com> wrote:
>     >     >
>     >     >
>     >     >     Eric Newcomer wrote:
>     >     >     > ORM to me is like one of those impossible tasks, like automatically
>     >     >     > converting data types between Java and XML.
>     >     >     I think the latter is actually easier :-)
>     >     >     >
>     >     >     > I think EJB3 is a big improvement over EJB2 and JPA a big improvement
>     >     >     > over entity beans. We are in the middle of mapping JDBC and JPA to
>     >     >     > OSGi BTW and hopefully this will result in more pluggability for
>     >     >     > persistence providers.
>     >     >     >
>     >     >     > BASE is the kind of thing I was referring to in the earlier post in
>     >     >     > that it represents a persistence design based on a different set of
>     >     >     > assumptions.  I would not really agree however with a characterization
>     >     >     > of BASE vs ACID, since even in the BASE style systems I'm aware of,
>     >     >     > ACID is still used by the databases when persistence happens. The
>     >     >     > difference seems much more about the decision and timing of
>     >     >     > persistence to stable storage than whether BASE is used in place of
>     >     >     > ACID.  AFAIK ACID is still used - if what's meant is 2PC then that is
>     >     >     > probably a more correct comparison, i.e. BASE vs 2PC.
>     >     >     From what I understand BASE and ACID are different concepts. If it is
>     >     >     BASE, it is not ACID -- it doesn't matter if something underneath uses
>     >     >     ACID semantics. If you want ACID at the top, you have to control it all
>     >     >     the way down. BASE is about giving up some of that control in favour of
>     >     >     weaker assumptions. Once you did that, you lost ACID from that layer
>     >     >     upwards.
>     >     >
>     >     >     Here is the relevant paper: http://queue.acm.org/detail.cfm?id=1394128
>     >     >
>     >     >      Peter
>     >     >
>     >     >
>     >     >     > On Fri, Jul 17, 2009 at 4:34 AM, Peter Becker <peter.becker.de@gmail.com> wrote:
>     >     >     >
>     >     >     >
>     >     >     >     Rick wrote:
>     >     >     >     > I think one of the reasons that relational databases are popular as
>     >     >     >     > compared to other solutions is that they map well to the theoretical
>     >     >     >     > tools, such as relational algebra/calculus.
>     >     >     >     >
>     >     >     >     My problem is that relational databases map to most of the theory only
>     >     >     >     in theory. E.g. SQL does not map to relational algebra, it is more a
>     >     >     >     "Based upon a true story" type of thing. I've done this rant a few
>     >     >     >     times before (including on this forum), but one of the things I really
>     >     >     >     miss is a true implementation of the relational algebra, which includes
>     >     >     >     having a proper notion of domains (which could easily be mapped to
>     >     >     >     OO-classes).
>     >     >     >     > For an upcoming e-commerce project I suggested trying out couchDB (as
>     >     >     >     > promoted by the posse) and sCouchDB (the Scala version of same?)....
>     >     >     >     > and a friend with an architectural leaning asked something along the
>     >     >     >     > lines of:
>     >     >     >     >
>     >     >     >     > "but can you guarantee atomicity?"
>     >     >     >     >
>     >     >     >     > which shut me up pretty quickly.
>     >     >     >     >
>     >     >     >     I believe the ACID vs. BASE question will become more dominant in the
>     >     >     >     near future, though. I am somehow afraid that many projects will pick
>     >     >     >     the BASE option when they really need ACID.
>     >     >     >     > Disclaimer: I'm a fan of EJB 3.0
>     >     >     >     I've used only JPA, which is really not too bad. They certainly seem
>     >     >     >     to have learned from the experiences of other products in the area,
>     >     >     >     which is unfortunately not that common with these standards. You still
>     >     >     >     need to like ORM to like EJB3, though :-) I just find the ORM idea to
>     >     >     >     be too much of a neither here nor there thing.
>     >     >     >
>     >     >     >      Peter


