[The Java Posse] Re: Rethinking persistence

Eric Newcomer Thu, 16 Jul 2009 06:08:03 -0700

This is definitely one of those "right tool for the job" situations.  There
are a lot of requirements, especially for large scale web sites, that
relational databases (and application servers for that matter) can't meet.
A lot of this is due to the different assumptions behind "traditional"
application infrastructure and web based infrastructures.

Before databases came along we basically had a choice among different file
types: structured, indexed, indexed sequential, etc. Databases basically
aggregated those file types into a single storage mechanism and added an
abstract programming language for interacting with it. SQL has proven the
most successful of those languages, for a variety of reasons, high among them
its ability to change the storage structure definition (i.e. the schema)
dynamically and to keep the relationships among tables separate from the
stored records (unlike hierarchical databases, which embedded pointers in
the records).

Throughout this evolution of programming and storage systems, the core
design assumption did not change significantly: the major goal of a storage
system was to persist the data safely and securely before returning control
to the application. When the transaction abstraction was introduced in the
early to mid 80s, storage systems were also able to protect developers from
having to manually back out partial updates that could occur during a crash.

Newer systems like Google's Bigtable, and many variations developed by other
large web companies, changed the assumptions.  At HPTS in 2007, for example,
quite a few of the web companies presented details of their infrastructures,
none of which used relational databases and application servers as core
elements (see http://www.hpts.ws/papers/2007/agenda.html for the agenda and
links to some of the presentations).

Andrew Fikes from Google said that the basis of their "scale out"
infrastructure is the assumption that all systems are going to fail,
regardless of how much money you spend, how big a box you get etc. So if
they are going to fail, why not design for failure and use the cheapest
possible hardware?

The Amazon folks published a paper on their "Dynamo" architecture in which
they describe the trade off between latency and persistence (see
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html). (Strangely,
the Amazon folks were prevented from presenting at HPTS at the last minute,
which was pretty embarrassing for them.) The core design assumption of
"traditional" relational databases, and other traditional storage systems
for that matter, has been that getting the data onto persistent storage as
quickly and reliably as possible is the top priority. As Amazon and other
web companies are saying, this is not true for web based businesses - the
customer experience is the priority, i.e. the latency of the response to
the HTTP GET etc.

Because of the different trade offs and different assumptions behind REST/HTTP
based systems, which need to put priority on giving the user a good
experience, people started developing memory based systems.
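
To make the trade off concrete, here is a minimal Java sketch contrasting
the two assumptions. It is only an illustration: the class and method names
(WriteBehindSketch, DurableStore, WriteThroughCache, WriteBehindCache) are
hypothetical and do not come from Dynamo, Bigtable, or any product mentioned
in this thread. The write-through variant persists before returning control
to the caller; the write-behind variant answers from memory and leaves
persistence for later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch; all names here are illustrative assumptions. */
public class WriteBehindSketch {

    /** Stand-in for slow durable storage (a disk or a remote database). */
    public interface DurableStore {
        void persist(String key, String value);
    }

    /** "Traditional" assumption: persist first, then return control. */
    public static final class WriteThroughCache {
        private final Map<String, String> memory = new ConcurrentHashMap<>();
        private final DurableStore store;

        public WriteThroughCache(DurableStore store) { this.store = store; }

        public void put(String key, String value) {
            store.persist(key, value); // the caller waits for durability
            memory.put(key, value);
        }

        public String get(String key) { return memory.get(key); }
    }

    /** Latency-first assumption: acknowledge from memory, persist later. */
    public static final class WriteBehindCache {
        private final Map<String, String> memory = new ConcurrentHashMap<>();
        private final BlockingQueue<String[]> pending = new LinkedBlockingQueue<>();
        private final DurableStore store;

        public WriteBehindCache(DurableStore store) { this.store = store; }

        public void put(String key, String value) {
            memory.put(key, value);                 // visible to readers at once
            pending.add(new String[] {key, value}); // durability is deferred
        }

        public String get(String key) { return memory.get(key); }

        /** Writes queued but not yet durable; these are lost if the node fails. */
        public int pendingWrites() { return pending.size(); }

        /** Drains queued writes to durable storage; returns how many were flushed. */
        public int flush() {
            int flushed = 0;
            String[] entry;
            while ((entry = pending.poll()) != null) {
                store.persist(entry[0], entry[1]);
                flushed++;
            }
            return flushed;
        }
    }

    public static void main(String[] args) {
        Map<String, String> disk = new HashMap<>();
        WriteBehindCache cache = new WriteBehindCache(disk::put);
        cache.put("cart:42", "book");
        System.out.println("readable from memory: " + cache.get("cart:42"));
        System.out.println("on disk before flush: " + disk.containsKey("cart:42"));
        cache.flush();
        System.out.println("on disk after flush: " + disk.containsKey("cart:42"));
    }
}
```

The window between put() and flush() is exactly where the trade off lives:
the response is fast, but a failure in that window loses the queued writes
unless they have also been replicated somewhere else.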

I would have loved to have based the second edition of "Principles of
Transaction Processing" on these new designs, but as Phil and I discovered,
everyone does something different right now. Some common patterns are
starting to emerge in products such as Oracle Coherence, GigaSpaces, and IBM
ExtremeScale, as Nati Shalom of GigaSpaces argues in comments to my blog
post about the book (see
http://ericnewcomer.wordpress.com/2009/06/19/second-edition-of-tp-book-out-today),
but things have not really settled down to the point where we could talk
about coherent designs, systems, and products.

Nonetheless this is a really fascinating area, and I think it's not an
exaggeration to say that Google is reinventing computing.  They are basing
their infrastructure designs on commodity data centers, which James Hamilton
documents pretty well in many presentations on his web page
http://www.mvdirona.com/jrh/work/ (this is the one he presented at HPTS
http://mvdirona.com/jrh/talksAndPapers/JamesRH_CommodityDataCenterDesign.ppt
).

I can see the potential for commodity data centers overtaking mainframes in
the future, but of course this will require application redesigns and system
rearchitectures.

This all suggests to me a somewhat oversimplified comparison between
mainframe based "scale up" designs and commodity data center "scale out"
designs. Current middleware and storage system products are still designed
around what I call mainframe style assumptions.

Eric

On Thu, Jul 16, 2009 at 8:21 AM, Martin Wildam <mwil...@gmail.com> wrote:

>
> On 16 Jul., 12:43, Steven Herod <steven.he...@gmail.com> wrote:
> > XML would be the
> > 'modern' equivalent.  To do it in a relational database with Java
> > would involve a lot more work and a lot less flexibility.
>
> I am working in the DMS and ECM field, and here there is a wave of products
> using alternatives (although sometimes just as an addition). What I
> observe is that "stores" that are not structured as tables - stores
> that allow the storage of any data (be it an object, XML or whatever) -
> increase the "data mess". Many people have difficulties keeping
> their data structured and organized (just as many have problems keeping
> their household in order ;-) ). A relational database might
> limit your freedom in how you archive your data, but my
> observation is that relational databases produce less chaos. - YMMV.
>
>
> On 16 Jul., 12:59, Christian Catchpole <christ...@catchpole.net>
> wrote:
> > I am more so questioning how the SQL ecosystem
> > has evolved as the only serious choice for persistence.
>
> Are you thinking of "persistence" basically in relation to "objects"
> or in general, in the sense of making your data persistent?
>
>
> > [...] products which try to offer alternatives to relational
> > SQL.  But I guess none of these ever gain any real traction because it
> > all seems so risky to trust any of them with your most valuable
> > asset.. your data.  Especially if they don't support Crystal Reports.
>
> I think it is similar to people using lists - or in IT the Excel type
> applications. People think in lists and tables when it comes to the
> need of getting an overview. Next step is creating graphical views on
> the data. So I think, SQL is still a good choice in general for
> storing and querying data...
> >
>
