I think this problem derives from the fact that Database vendors have integrated locking schemes into the DB, in a user transparent way. Whenever a CMP EJB is deployed with an ISOLATION_SERIALIZABLE attribute, data integrity must be assured by any means necessary, but also, transparent to the user. For performance reasons, this implies moving locking schemes out of the DB and into the Persistence Manager, EJB server or EJB container. How do DBs implement locking? mainly by using to timestamps or version numbers. But isn't CMP table composition responsability of the Server? then why not do this? whenever a bean must be ISOLATION_SERIALIZABLE, add a timestamp/version field to the table, and use it to provide the optimistic yet consistent locking. If you go for timestamp, use the when SELECTING, read and store in private/semantically-private member timestamp/version before UPDATING, SELECT FOR UPDATE the same ROW compare if the Timestamp/version is the same, if so, UPDATE (the SELECT FOR UPDATE row) when UPDATING, also update the timestamp(with current time, to the nanosecond) What do you think guys? Rifle -----Original Message----- From: Jonathan K. Weedon [mailto:[EMAIL PROTECTED]] Sent: Jueves, 28 de Septiembre de 2000 18:08 To: [EMAIL PROTECTED] Subject: Real-world limitations of simplistic optimistic concurrency schemes Real-world limitations of simplistic optimistic concurrency schemes (or How entity bean based applications are corrupting your database) First off, I want to thank both Evan and Rickard for the positive feedback. It is so much nicer than what I am used to, which is writing a many hundred line posting and coming back the next day to find multiple people have picked out a sentence fragment and decided to beat the s*** out of me with it. Again, thanks! I have been struggling with how to implement optimistic concurrency and provide serializable isolation on the major databases, without sacrificing too much performance. As many people are aware (and many more are not) a number of the major databases do not provide "true" serializable isolation (as if there is really any other kind!) using standard SQL. For example, if you use Oracle and you want serializable isolation, you need to use their "SELECT FOR UPDATE" syntax. So, a simple-minded way to get serializability is to use proprietary SQL in your ejbLoad implementation (or the CMP engine's implementation, hopefully). However, doing a SELECT FOR UPDATE (or the equivalent, on a different database) acquires an exclusive lock on that row (or page, or table) for the duration of your transaction, which means you are using pessimistic concurrency. That is, it means your performance will suffer substantially. The alternative is to load the data using a normal (non locking) SELECT, and then double-check when the transaction commits that the data you read is still valid. To understand the problem (and some of the solutions), consider a trivial transaction whereby we increment a value. So, for this bean, we have two CMP (or BMP) fields: public String theKey; public int theValue; In our ejbLoad, one would execute: SELECT theValue FROM SomeTable WHERE theKey = :theKey (I am using embedded SQL syntax to indicate that we have to substitute the Java value of "theKey" for ":theKey".) Now, consider we have the following method: public int increment() { return ++this.theValue; } Now, our last step is to implement the ejbStore method (or have the CMP engine do it for us). The standard implementation of this method will be: UPDATE SomeTable SET (theValue = :theValue) WHERE theKey = :theKey So, let's now assume that we have two clients calling the method increment() on the same entity bean simultaneously. (Now I know some AppServers will serialize calls to a given entity bean if both calls go to the same Containers, but then assume that you are running a cluster, so that you have multiple Containers hosting the bean, and the calls go to different containers simultaneously. Let's say that we have two transaction, t1 and t2. It is possible to interleave the above SQL calls as follows: t1: SELECT theValue FROM SomeTable WHERE theKey = :theKey t2: SELECT theValue FROM SomeTable WHERE theKey = :theKey t1: return ++this.theValue t2: return ++this.theValue t1: UPDATE SomeTable SET (theValue = :theValue) WHERE theKey = :theKey t2: UPDATE SomeTable SET (theValue = :theValue) WHERE theKey = :theKey Now, if let's rewrite this with some real data. Assume SomeTable.theKey starts with a value of 1, and SomeTable.theKey is "test". Then we have: t1: SELECT theValue FROM SomeTable WHERE theKey = 'test' => result: 1 t2: SELECT theValue FROM SomeTable WHERE theKey = 'test' => result: 1 t1: return ++this.theValue => result 2 t2: return ++this.theValue => result 2 t1: UPDATE SomeTable SET (theValue = 2) WHERE theKey = 'test' t2: UPDATE SomeTable SET (theValue = 2) WHERE theKey = 'test' Now, we will have executed these two transactions, and both clients will see the result 2, and the record will be update to 2. Obviously this is invalid: not only has one of your clients obtained the wrong value, but you have corrupted your database. (Note: if you think that one of the above transactions should have rolled back, if you set your transaction isolation to serializable, then you are correct. However, if you execute the above code against Oracle, Sybase, etc., you will see that both transaction do in fact commit, and again, you have corrupted your database.) Of course, in some applications you may not be concerned if a counter is off by one, but obviously other applications are more sensitive to any sort of data corruption. Consider if this is a medical database, and the business logic is "if he has had one dose, give him another, but if he has already had two doses, don't give him a third because it might kill him." <vendor> The way that we fix this problem is to use verified, or searched updates. That is, instead of doing what is shown in every BMP example I have ever seen (and what is done by most EJB Containers today), we execute a different update statement: UPDATE SomeTable SET (theValue = :theValue) WHERE theKey = :theKey AND theValue = :oldValue Now, with this verified update, only t1 will be able to commit, not t2, and you are saved from getting a lethal dose. </vendor> The EJB 2.0 specification discusses these problems briefly in "9.6.10 Concurrent access from multiple transactions": > When writing the entity bean business methods, the Bean Provider > does not have to worry about concurrent access from multiple > transactions. The Bean Provider may assume that the container and > persistence manager will ensure appropriate synchronization for > entity objects that are accessed concurrently from multiple > transactions. The container typically uses one of the following > implementation strategies to achieve proper synchronization. (These > strategies are illustrative, not prescriptive.) > > � The container activates multiple instances of the entity bean, one > for each transaction in which the entity object is being > accessed. The transaction synchronization is performed automatically > by the underlying Persistence Manager during the accessor method > calls performed by the business methods, and by the ejbLoad, > ejbCreate<METHOD>, ejbStore, and ejbRemove methods. The Persistence > Manager, together with the database system, provides all the > necessary transaction synchronization; the container does not have > to perform any synchronization logic. The commit-time options B and > C in Subsection 9.12.4 apply to this type of container. > > With this strategy, the type of lock acquired by ejbLoad or get > accessor method (if a lazy loading cache management strategy is > used) leads to a trade-off. If ejbLoad or the accessor method > acquires an exclusive lock on the instance's state in the database, > the throughput of read-only transactions could be impacted. If > ejbLoad or the accessor method acquires a shared lock and the > instance is updated, then either ejbStore or a set accessor method > will need to promote the lock to an exclusive lock (which may cause > a deadlock if it happens concurrently under multiple transactions), > or, if the Persistence Manager uses an optimistic cache concurrency > control strategy, the Persistence Manager will need to validate the > state of the cache against the database at transaction commit (which > may result in a rollback of the transaction). So far, I think I have a good grasp of the issues involved. The problem I am struggling with is what about cross-bean dependencies. That is, the above "increment()" transactions involved only a single bean. But what if it involved multiple beans? That is, what if the first transaction were "give him a dose of this if he has never had a dose of that" and the second transaction were "give him a dose of that if he has never had a dose of this." Here, to implement both of these transactions in parallel, and to make sure we never give a dose of both this and that, we need to verify both records. But this gets us back to where we started. If, at the end of the transaction, we go back and reload the state, using a standard SELECT statement, we will still not have a guarantee that this is the value at commit time. Instead, we need to go back and use a SELECT FOR UPDATE. Of course, things are better than in the simplistic solution (where we did the SELECT FOR UPDATE in the ejbLoad). At least now we are only acquiring the exclusive lock in the ejbStore, which is at the end of the transaction, not the beginning. So our locks are held for much less time that previously. Again, this seems fine, unless you really care about performance. I think this means that if I have a transaction that modifies one bean, and also uses 100 other beans, I need to go and lock 101 rows (or pages or tables). Is this really needed, or could I get away with only verifying the modified row(s)? If I really have to double-check (and lock) all objects accessed (even those which were read-only) is this ever going to let me have decent performance. Or are people willing to give up on cross-bean dependencies? That is, are you willing to have a dose of "this" and "that". Probably not. The solution here is probably to escape to the user. That is, the user should be able to configure which beans need "bullet proof" synchronization, and which beans don't. So, if my transaction is "give him a dose of this if he has never had a dose of that, and send the prescription to his mailing address" I may not be so concerned about verifying the consistency of my accesss to the Address entity. That is, if the address was modified concurrently by another transaction, I may not care: the shipment will probably be forwarded to the new address. Although I opened by objecting to criticism, I would in fact love to learn that I am wrong. That is, I would love to learn that the major databases either have, or are planning to provide better support for serializable isolation, meaning support for serializable isolation in conjunction with optimistic concurrency. As I understand it, that means are they planning to provide what the EJB specification requires, which is the ability "to promote the [shared read] lock to an exclusive [write] lock", while simultaneously providing serializable isolation for other records read in that transaction. If the major databases were to provide (or already had) this capability, it would certainly make life simpler for us AppServer implementors. Thoughts? -jkw Rickard Oberg wrote: > > On Wed, 27 Sep 2000 21:25:18 +1200, Evan Ireland <[EMAIL PROTECTED]> > wrote: > > >"Jonathan K. Weedon" wrote: > >> > >> This is a good question: is there any AppServer that has "done it > >> right". I know of at least one (not us), but unfortunately they > >> do it by providing a proprietary CMP model. Certainly none of the > >> "brand name" products provide this all correctly. > >> ... > > > >Thanks for taking the time to respond. It would be nice to hear from other > >people on the list of their opinions in this area, especially with regard > >to unexpected update inconsistencies when using either BMP or CMP without > >optimistic concurrency control. > > I was just about to send a "Thank you!" to jkw privately when I saw your > response Evan. > > I agree completely with jkw's assessment of the state of affairs and the > possible problems that go along with that. What is distressing is not > really that it is the way it is (after all, this is largely "virgin > land", even though some people likes to say "we've been doing this for > decades"), but that noone (/not that many) have raised these flags > before. The effects of what jkw outline are both subtle and can affect > the system during extended production use. > > EJB 2.0 CMP makes things better (although at the same time being a heck > of a lot more complex *and* simpler for the developer... but that's > another topic..), but only really gives better hooks for containers do > "do it right". Then there's the issue of actually "doing it right".... > > No solution to provide, just wanted to say "yes, jkw is on target" and > "yes, this needs to be discussed". > > /Rickard =========================================================================== To unsubscribe, send email to [EMAIL PROTECTED] and include in the body of the message "signoff EJB-INTEREST". For general help, send email to [EMAIL PROTECTED] and include in the body of the message "help". =========================================================================== To unsubscribe, send email to [EMAIL PROTECTED] and include in the body of the message "signoff EJB-INTEREST". For general help, send email to [EMAIL PROTECTED] and include in the body of the message "help".
