I think this problem derives from the fact that database vendors have
integrated locking schemes into the DB in a way that is transparent to
the user.

Whenever a CMP EJB is deployed with the ISOLATION_SERIALIZABLE attribute,
data integrity must be assured by whatever means necessary, but also
transparently to the user.

For performance reasons, this implies moving locking schemes out of the DB
and into
the Persistence Manager, EJB server or EJB container.

How do DBs implement this kind of locking? Mainly by using timestamps or
version numbers. But isn't CMP table composition the responsibility of the
server?

Then why not do this? Whenever a bean must be ISOLATION_SERIALIZABLE, have
the server add a timestamp/version field to the table and use it to provide
optimistic yet consistent locking.

If you go for a timestamp/version column, the scheme (a rough JDBC sketch
follows below) would be:

- when SELECTing, read the timestamp/version and store it in a private
  (or semantically private) member;
- before UPDATing, SELECT FOR UPDATE the same row and compare the
  timestamp/version; if it is unchanged, perform the UPDATE against the
  row just locked;
- when UPDATing, also update the timestamp (with the current time, to the
  nanosecond) or bump the version number.
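
A rough JDBC sketch of the version variant, reusing the SomeTable /
theKey / theValue example from jkw's posting below (the "version" column,
the loadedVersion member and the verifiedStore method are invented here
purely for illustration):

        // remembered at load time, together with the data:
        //   SELECT theValue, version FROM SomeTable WHERE theKey = ?
        private long loadedVersion;

        private void verifiedStore(Connection con) throws SQLException {
            // re-read and lock the row, then compare against the version we loaded
            PreparedStatement lock = con.prepareStatement(
                "SELECT version FROM SomeTable WHERE theKey = ? FOR UPDATE");
            lock.setString(1, theKey);
            ResultSet rs = lock.executeQuery();
            rs.next();
            long current = rs.getLong(1);
            rs.close();
            lock.close();
            if (current != loadedVersion) {
                // somebody committed a change since we loaded: caller rolls back
                throw new SQLException("optimistic lock failure on key " + theKey);
            }
            PreparedStatement upd = con.prepareStatement(
                "UPDATE SomeTable SET theValue = ?, version = ? WHERE theKey = ?");
            upd.setInt(1, theValue);
            upd.setLong(2, current + 1);   // or a fresh nanosecond timestamp
            upd.setString(3, theKey);
            upd.executeUpdate();
            upd.close();
        }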

What do you think guys?

Rifle


-----Original Message-----
From: Jonathan K. Weedon [mailto:[EMAIL PROTECTED]]
Sent: Jueves, 28 de Septiembre de 2000 18:08
To: [EMAIL PROTECTED]
Subject: Real-world limitations of simplistic optimistic concurrency
schemes


Real-world limitations of simplistic optimistic concurrency schemes

(or How entity bean based applications are corrupting your database)

First off, I want to thank both Evan and Rickard for the positive
feedback.  It is so much nicer than what I am used to, which is
writing a many hundred line posting and coming back the next day to
find multiple people have picked out a sentence fragment and decided
to beat the s*** out of me with it.  Again, thanks!

I have been struggling with how to implement optimistic concurrency
and provide serializable isolation on the major databases, without
sacrificing too much performance.

As many people are aware (and many more are not) a number of the major
databases do not provide "true" serializable isolation (as if there is
really any other kind!) using standard SQL.  For example, if you use
Oracle and you want serializable isolation, you need to use their
"SELECT FOR UPDATE" syntax.  So, a simple-minded way to get
serializability is to use proprietary SQL in your ejbLoad
implementation (or the CMP engine's implementation, hopefully).
However, doing a SELECT FOR UPDATE (or the equivalent, on a different
database) acquires an exclusive lock on that row (or page, or table)
for the duration of your transaction, which means you are using
pessimistic concurrency.  That is, it means your performance will
suffer substantially.
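
For concreteness, a hand-written (BMP-style) ejbLoad that takes this
pessimistic route might look roughly like the sketch below, using the
SomeTable / theKey / theValue example that appears later in this message
(the ctx and dataSource field names are illustrative, not any vendor's
API):

        public void ejbLoad() {
            String pk = (String) ctx.getPrimaryKey();
            Connection con = null;
            try {
                con = dataSource.getConnection();
                PreparedStatement ps = con.prepareStatement(
                    "SELECT theValue FROM SomeTable WHERE theKey = ? FOR UPDATE");
                ps.setString(1, pk);
                ResultSet rs = ps.executeQuery();
                if (!rs.next()) {
                    throw new NoSuchEntityException("no row for key " + pk);
                }
                this.theKey = pk;
                this.theValue = rs.getInt(1);  // row now exclusively locked until commit
                rs.close();
                ps.close();
            } catch (SQLException e) {
                throw new EJBException(e);
            } finally {
                if (con != null) {
                    try { con.close(); } catch (SQLException ignored) {}
                }
            }
        }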

The alternative is to load the data using a normal (non locking)
SELECT, and then double-check when the transaction commits that the
data you read is still valid.

To understand the problem (and some of the solutions), consider a
trivial transaction whereby we increment a value.  So, for this bean,
we have two CMP (or BMP) fields:

        public String theKey;
        public int theValue;

In our ejbLoad, one would execute:

        SELECT theValue FROM SomeTable WHERE theKey = :theKey

(I am using embedded SQL syntax to indicate that we have to substitute
the Java value of "theKey" for ":theKey".)

Now, consider we have the following method:

        public int increment() {
          return ++this.theValue;
        }

Now, our last step is to implement the ejbStore method (or have the
CMP engine do it for us).  The standard implementation of this method
will be:

        UPDATE SomeTable SET theValue = :theValue WHERE theKey = :theKey

So, let's now assume that we have two clients calling the method
increment() on the same entity bean simultaneously.  (Now, I know some
AppServers will serialize calls to a given entity bean if both calls
go to the same Container, but assume that you are running a
cluster, so that you have multiple Containers hosting the bean and
the calls go to different Containers simultaneously.)

Let's say that we have two transactions, t1 and t2.  It is possible to
interleave the above SQL calls as follows:

t1:     SELECT theValue FROM SomeTable WHERE theKey = :theKey
t2:     SELECT theValue FROM SomeTable WHERE theKey = :theKey
t1:       return ++this.theValue
t2:       return ++this.theValue
t1:     UPDATE SomeTable SET theValue = :theValue WHERE theKey = :theKey
t2:     UPDATE SomeTable SET theValue = :theValue WHERE theKey = :theKey

Now, let's rewrite this with some real data.  Assume
SomeTable.theValue starts with a value of 1, and SomeTable.theKey is
"test".  Then we have:

t1:     SELECT theValue FROM SomeTable WHERE theKey = 'test' => result: 1
t2:     SELECT theValue FROM SomeTable WHERE theKey = 'test' => result: 1
t1:       return ++this.theValue => result 2
t2:       return ++this.theValue => result 2
t1:     UPDATE SomeTable SET theValue = 2 WHERE theKey = 'test'
t2:     UPDATE SomeTable SET theValue = 2 WHERE theKey = 'test'

Now, we will have executed these two transactions, and both clients
will see the result 2, and the record will be updated to 2.  Obviously
this is invalid: not only has one of your clients obtained the wrong
value, but you have corrupted your database.

(Note: if you think that one of the above transactions should have
rolled back, if you set your transaction isolation to serializable,
then you are correct.  However, if you execute the above code against
Oracle, Sybase, etc., you will see that both transactions do in fact
commit, and again, you have corrupted your database.)

Of course, in some applications you may not be concerned if a counter
is off by one, but obviously other applications are more sensitive to
any sort of data corruption.  Consider if this is a medical database,
and the business logic is "if he has had one dose, give him another,
but if he has already had two doses, don't give him a third because it
might kill him."

<vendor>

The way that we fix this problem is to use verified, or searched
updates.  That is, instead of doing what is shown in every BMP example
I have ever seen (and what is done by most EJB Containers today), we
execute a different update statement:

        UPDATE SomeTable SET theValue = :theValue WHERE
          theKey = :theKey AND theValue = :oldValue

Now, with this verified update, only t1 will be able to commit, not
t2, and you are saved from getting a lethal dose.

</vendor>
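
In JDBC terms, a hand-written ejbStore (or the generated CMP code) can
detect the conflict simply by checking the update count; this is only a
sketch under the same illustrative names as before, with oldValue
assumed to have been remembered at ejbLoad time:

        public void ejbStore() {
            Connection con = null;
            try {
                con = dataSource.getConnection();
                PreparedStatement ps = con.prepareStatement(
                    "UPDATE SomeTable SET theValue = ? " +
                    "WHERE theKey = ? AND theValue = ?");
                ps.setInt(1, this.theValue);
                ps.setString(2, this.theKey);
                ps.setInt(3, this.oldValue);    // value read in ejbLoad
                int rows = ps.executeUpdate();
                ps.close();
                if (rows == 0) {
                    // another transaction changed (or removed) the row since we
                    // loaded it; roll back rather than silently overwrite
                    ctx.setRollbackOnly();
                }
            } catch (SQLException e) {
                throw new EJBException(e);
            } finally {
                if (con != null) {
                    try { con.close(); } catch (SQLException ignored) {}
                }
            }
        }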

The EJB 2.0 specification discusses these problems briefly in "9.6.10
Concurrent access from multiple transactions":

> When writing the entity bean business methods, the Bean Provider
> does not have to worry about concurrent access from multiple
> transactions. The Bean Provider may assume that the container and
> persistence manager will ensure appropriate synchronization for
> entity objects that are accessed concurrently from multiple
> transactions.  The container typically uses one of the following
> implementation strategies to achieve proper synchronization.  (These
> strategies are illustrative, not prescriptive.)
>
> • The container activates multiple instances of the entity bean, one
> for each transaction in which the entity object is being
> accessed. The transaction synchronization is performed automatically
> by the underlying Persistence Manager during the accessor method
> calls performed by the business methods, and by the ejbLoad,
> ejbCreate<METHOD>, ejbStore, and ejbRemove methods. The Persistence
> Manager, together with the database system, provides all the
> necessary transaction synchronization; the container does not have
> to perform any synchronization logic. The commit-time options B and
> C in Subsection 9.12.4 apply to this type of container.
>
> With this strategy, the type of lock acquired by ejbLoad or get
> accessor method (if a lazy loading cache management strategy is
> used) leads to a trade-off. If ejbLoad or the accessor method
> acquires an exclusive lock on the instance's state in the database,
> the throughput of read-only transactions could be impacted. If
> ejbLoad or the accessor method acquires a shared lock and the
> instance is updated, then either ejbStore or a set accessor method
> will need to promote the lock to an exclusive lock (which may cause
> a deadlock if it happens concurrently under multiple transactions),
> or, if the Persistence Manager uses an optimistic cache concurrency
> control strategy, the Persistence Manager will need to validate the
> state of the cache against the database at transaction commit (which
> may result in a rollback of the transaction).

So far, I think I have a good grasp of the issues involved.  The
problem I am struggling with is what about cross-bean dependencies.
That is, the above "increment()" transactions involved only a single
bean.  But what if it involved multiple beans?  That is, what if the
first transaction were "give him a dose of this if he has never had a
dose of that" and the second transaction were "give him a dose of that
if he has never had a dose of this."  Here, to implement both of these
transactions in parallel, and to make sure we never give a dose of
both this and that, we need to verify both records.
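
To make that concrete, here is the analogue of the earlier interleaving
(the two tables and their columns are invented for illustration, one row
per patient in each):

t1:     SELECT doses FROM ThatMedicine WHERE patient = 'p1' => result: 0
t2:     SELECT doses FROM ThisMedicine WHERE patient = 'p1' => result: 0
t1:     UPDATE ThisMedicine SET doses = 1 WHERE patient = 'p1'
t2:     UPDATE ThatMedicine SET doses = 1 WHERE patient = 'p1'

Both verified updates succeed, because each transaction modified a row
the other one never touched; the conflict involves only rows that were
merely read, so the patient ends up with a dose of both.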

But this gets us back to where we started.  If, at the end of the
transaction, we go back and reload the state, using a standard SELECT
statement, we will still not have a guarantee that this is the value
at commit time.  Instead, we need to go back and use a SELECT FOR
UPDATE.  Of course, things are better than in the simplistic solution
(where we did the SELECT FOR UPDATE in the ejbLoad).  At least now we
are only acquiring the exclusive lock in the ejbStore, which is at the
end of the transaction, not the beginning.  So our locks are held for
much less time than previously.
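
Under the same illustrative names as above, the commit-time check for a
bean that was only read could be sketched as:

        private void verifyReadOnly(Connection con) throws SQLException {
            // Re-check, just before commit, that a row we only *read* still
            // holds the value our business logic was based on, and lock it so
            // it cannot change between this check and the commit.
            PreparedStatement ps = con.prepareStatement(
                "SELECT theValue FROM SomeTable WHERE theKey = ? FOR UPDATE");
            ps.setString(1, theKey);
            ResultSet rs = ps.executeQuery();
            rs.next();
            int current = rs.getInt(1);
            rs.close();
            ps.close();
            if (current != this.theValue) {
                // the data we read has changed under us: force a rollback
                ctx.setRollbackOnly();
            }
        }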

Again, this seems fine, unless you really care about performance.  I
think this means that if I have a transaction that modifies one bean,
and also uses 100 other beans, I need to go and lock 101 rows (or
pages or tables).  Is this really needed, or could I get away with
only verifying the modified row(s)?  If I really have to double-check
(and lock) all objects accessed (even those which were read-only), is this
ever going to let me have decent performance?  Or are people
willing to give up on cross-bean dependencies?  That is, are you
willing to risk getting a dose of both "this" and "that"?  Probably not.  The
solution here is probably to provide an escape hatch.  That is, the user
should be able to configure which beans need "bullet proof"
synchronization, and which beans don't.  So, if my transaction is
"give him a dose of this if he has never had a dose of that, and send
the prescription to his mailing address" I may not be so concerned
about verifying the consistency of my access to the Address entity.
That is, if the address was modified concurrently by another
transaction, I may not care: the shipment will probably be forwarded
to the new address.

Although I opened by objecting to criticism, I would in fact love to
learn that I am wrong.  That is, I would love to learn that the major
databases either have, or are planning to provide better support for
serializable isolation, meaning support for serializable isolation in
conjunction with optimistic concurrency.  As I understand it, that
means providing what the EJB specification requires, which is the
ability "to promote the [shared read] lock to
an exclusive [write] lock", while simultaneously providing
serializable isolation for other records read in that transaction.  If
the major databases were to provide (or already had) this capability,
it would certainly make life simpler for us AppServer implementors.

Thoughts?

-jkw

Rickard Oberg wrote:
>
> On Wed, 27 Sep 2000 21:25:18 +1200, Evan Ireland <[EMAIL PROTECTED]>
> wrote:
>
> >"Jonathan K. Weedon" wrote:
> >>
> >> This is a good question: is there any AppServer that has "done it
> >> right".  I know of at least one (not us), but unfortunately they
> >> do it by providing a proprietary CMP model.  Certainly none of the
> >> "brand name" products provide this all correctly.
> >> ...
> >
> >Thanks for taking the time to respond. It would be nice to hear from
> >other
> >people on the list of their opinions in this area, especially with regard
> >to unexpected update inconsistencies when using either BMP or CMP without
> >optimistic concurrency control.
>
> I was just about to send a "Thank you!" to jkw privately when I saw your
> response Evan.
>
> I agree completely with jkw's assessment of the state of affairs and the
> possible problems that go along with that. What is distressing is not
> really that it is the way it is (after all, this is largely "virgin
> land", even though some people likes to say "we've been doing this for
> decades"), but that noone (/not that many) have raised these flags
> before. The effects of what jkw outline are both subtle and can affect
> the system during extended production use.
>
> EJB 2.0 CMP makes things better (although at the same time being a heck
> of a lot more complex *and* simpler for the developer... but that's
> another topic..), but only really gives better hooks for containers to
> "do it right". Then there's the issue of actually "doing it right"....
>
> No solution to provide, just wanted to say "yes, jkw is on target" and
> "yes, this needs to be discussed".
>
> /Rickard

===========================================================================
To unsubscribe, send email to [EMAIL PROTECTED] and include in the body
of the message "signoff EJB-INTEREST".  For general help, send email to
[EMAIL PROTECTED] and include in the body of the message "help".
