On 1/13/06, Jules Gosnell <[EMAIL PROTECTED]> wrote:
[EMAIL PROTECTED] wrote:

>>[EMAIL PROTECTED] wrote:
>>Okay. I will ask you a question then. What are you doing as far as caching
>>entity beans?
In terms of replication or some form of distributed invalidation, I'm
not aware that this has been discussed yet.

This is another one for the forthcoming doc - briefly:

If you cluster an entity bean on two nodes naively, you lose many of the
benefits of caching. This is because neither node, at the beginning of a
transaction, knows whether the other node has changed the bean's contents
since it was last loaded into cache, so the cache must be assumed
invalid. Thus, you find yourself going to the DB much more frequently
than you would like, and the number of trips increases linearly with the
number of clients - i.e. you are no longer scalable.

It depends on your transaction isolation level - i.e. whether or not you are willing to do a dirty read. You should be able to enable dirty reads to get scalability & performance.

The only way to really and truly know if the cache is up to date is to use a pessimistic read lock; but that's what databases are great for - so you might as well use the DB and not the cache in those circumstances. I.e. you always use caches for dirty reads.
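To make that trade-off concrete, here's a rough JDBC-level sketch (the ACCOUNT table, balance column and class names are invented for illustration, not anything in the codebase): a dirty read is served from the local cache, while a genuinely current read bypasses the cache and takes a pessimistic row lock in the database with SELECT ... FOR UPDATE.

import java.sql.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only - table/column names are made up.
public class AccountReader {

    private final Map<Long, Integer> cache = new ConcurrentHashMap<Long, Integer>();

    // Dirty read: trust whatever is in the local cache; fall back to a
    // READ_UNCOMMITTED query if the entry is missing. Scales well, but the
    // value may be stale if another node has since changed the row.
    public Integer readDirty(Connection conn, long id) throws SQLException {
        Integer cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
        PreparedStatement ps = conn.prepareStatement(
                "SELECT balance FROM ACCOUNT WHERE id = ?");
        try {
            ps.setLong(1, id);
            ResultSet rs = ps.executeQuery();
            rs.next();
            int balance = rs.getInt(1);
            cache.put(id, balance);
            return balance;
        } finally {
            ps.close();
        }
    }

    // Pessimistic read: skip the cache entirely and let the database hold a
    // row lock until the transaction ends - the only way to be sure the
    // value is current across the whole cluster.
    public Integer readLocked(Connection conn, long id) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
                "SELECT balance FROM ACCOUNT WHERE id = ? FOR UPDATE");
        try {
            ps.setLong(1, id);
            ResultSet rs = ps.executeQuery();
            rs.next();
            return rs.getInt(1);
        } finally {
            ps.close();
        }
    }
}

The locked read is only as scalable as the database lets it be, which is exactly why it's the fallback rather than the default.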

If you can arrange for the cache on one node to notify the caches on
other nodes whenever an entity is changed, then the other caches can
optimise their behaviour: rather than assuming that all beans are
invalid, they can pinpoint the ones that actually are invalid and
reload only those.

You could go one step further and send not an invalidation but a
replication message. This would contain the entity's new value and head
off any reloading from the DB at all.
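To show the difference on the receiving side, here's a rough sketch - the message classes and the cache below are invented for this example, not any particular container's API. An invalidation just evicts the entry (so the next reader pays one DB trip); a replication installs the new state so no reload is needed:

import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative message types - not part of any real container API.
class InvalidationMessage implements Serializable {
    final Object entityId;
    InvalidationMessage(Object entityId) { this.entityId = entityId; }
}

class ReplicationMessage implements Serializable {
    final Object entityId;
    final Object newState;   // the entity's new value, serialized by the sender
    ReplicationMessage(Object entityId, Object newState) {
        this.entityId = entityId;
        this.newState = newState;
    }
}

class EntityCache {
    private final Map<Object, Object> entries = new ConcurrentHashMap<Object, Object>();

    // Invalidation: drop the stale entry; the next reader reloads from the DB.
    void onMessage(InvalidationMessage msg) {
        entries.remove(msg.entityId);
    }

    // Replication: install the new value directly; no DB reload needed at all,
    // at the cost of shipping the full state to every node.
    void onMessage(ReplicationMessage msg) {
        entries.put(msg.entityId, msg.newState);
    }

    Object get(Object entityId) {
        return entries.get(entityId);
    }
}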

The two main strategies here are:

* invalidation: as each entity changes, it sends an invalidation message - which is really simple & doesn't need total ordering, just something lightweight & fast. (Actually pure multicast is fine for invalidation stuff, since messages are tiny & reliability is not that big a deal, particularly if coupled with a cache timeout/reload policy - see the sketch below.)
 
* broadcasting the new data to interested parties (say everyone else in the cluster). This typically requires either (i) a global publisher (maybe listening to the DB transaction log) or (ii) total ordering if each entity bean server sends its changes.

The former is good for very high update rates or very sparse caches; the latter is good when everyone in the cluster needs to cache mostly the same stuff & the cache size is sufficient that most nodes have all the same data in their cache. The former is more lightweight and simpler & a good first step :)
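For the invalidation route, something as small as a raw MulticastSocket is enough to carry the messages, since each one is just an entity id. A rough sketch only - the group address, port and wire format are arbitrary choices for illustration:

import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Sends tiny, best-effort invalidation datagrams to the whole cluster.
// Group address, port and message format are arbitrary choices for this sketch.
public class MulticastInvalidator {

    private static final String GROUP = "230.0.0.1";
    private static final int PORT = 4446;

    private final MulticastSocket socket;
    private final InetAddress group;

    public MulticastInvalidator() throws IOException {
        group = InetAddress.getByName(GROUP);
        socket = new MulticastSocket(PORT);
        socket.joinGroup(group);   // also listen, so every node sees every invalidation
    }

    // Called after a local transaction commits a change to the entity.
    public void invalidate(String entityId) throws IOException {
        byte[] payload = entityId.getBytes("UTF-8");
        socket.send(new DatagramPacket(payload, payload.length, group, PORT));
    }

    // Receive loop: each datagram names one entity to evict from the local cache.
    public void receiveLoop(EntityCacheEvictor cache) throws IOException {
        byte[] buf = new byte[256];
        while (true) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);
            String entityId = new String(packet.getData(), 0, packet.getLength(), "UTF-8");
            cache.evict(entityId);
        }
    }

    // Minimal callback so the sketch stays self-contained.
    public interface EntityCacheEvictor {
        void evict(String entityId);
    }
}

Since delivery is best-effort, each cache entry would also carry a timeout, so a lost datagram just means serving slightly stale data until the next scheduled reload.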

James

--

James
-------
http://radio.weblogs.com/0112098/
