RE: [orion-interest]CMP/BMP and standard JDBC, speed is of essence

2002-04-07 Thread The elephantwalker

You will find that if you want the _enterprise_ features offered by cmp with
straight jdbc calls, your classes for jdbc calls will be slower than cmp,
and _much more_ difficult to develop.

I am not a smarter developer than Karl or Magnus. Their algorithms for
caching, transaction management, pooling (connection, threads, objects, you
name it), are among the best in the business. If I were to _attempt_ to
reproduce these features, I would certainly _fail_ to reproduce these
features in a reasonable time.

There will always be a place for jdbc in an enterprise application. But in
an enterprise application where the transaction is as important as the data,
and the data structure itself may change from time to time, cmp ejbs are the
way to go.

A resultset relies on datastructure strings, and is closely coupled with
the underlying database structure. All my experience indicates that the
datastructure will change many times during a project, after the project is
finished, and before the application is retired.

jdbc is tightly linked with the datastructure. This link goes to a low
level, as many text strings need to be modified when the datastructure
changes. Argh! Don't get caught in the performance trap where your
application performance increases by 2%, but it's almost impossible to change
your application without completely rewriting it.


regards,

the elephantwalker
www.elephantwalker.com

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]]On Behalf Of Jeff Schnitzer
Sent: Saturday, April 06, 2002 10:41 PM
To: Orion-Interest
Subject: RE: [orion-interest]CMP/BMP and standard JDBC, speed is of
essence


 From: Hani Suleiman [mailto:[EMAIL PROTECTED]]
 Sent: Saturday, April 06, 2002 6:20 PM
 To: Orion-Interest
 Subject: Re: [orion-interest]CMP/BMP and standard JDBC, speed is of
 essence

 CMP will load in all the entities in one go (in orion at least).

 There will be a performance difference between straight JDBC and EJB,
 since there's more involved with an EJB query. Transactions,
 constructing entities and so on are extra overhead that just getting a
 resultset back will not have.

If you use JDBC from a connection obtained from a DataSource in a
Session bean, all queries should have the transaction attribute defined
for the Session bean method.  You do not need entity beans to have
transactions.
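A minimal sketch of that pattern, written against a current JDK rather than the 2002-era APIs. The class name, the IMAGE table and its columns are hypothetical, and the DataSource is assumed to be the container-managed one that a session bean method would look up and pass in. The helper never commits or rolls back, on the assumption that the container demarcates the transaction around the calling session bean method.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

// Hypothetical read-only helper, meant to be called from a session bean method.
class ImageTitleLister {
    private final DataSource dataSource;

    ImageTitleLister(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Lists titles for one owner. IMAGE, TITLE and OWNER_ID are assumed names.
    List<String> listTitles(long ownerId) {
        String sql = "SELECT TITLE FROM IMAGE WHERE OWNER_ID = ? ORDER BY TITLE";
        List<String> titles = new ArrayList<>();
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setLong(1, ownerId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    titles.add(rs.getString(1));
                }
            }
        } catch (SQLException e) {
            // Unchecked on purpose: a container would normally treat this as a
            // system exception and roll back the surrounding transaction.
            throw new IllegalStateException("Listing image titles failed", e);
        }
        // No commit or rollback here: the container is assumed to own the transaction.
        return titles;
    }
}
```

Closing the Connection only returns it to the pool; whether the work commits is decided by the transaction attribute of the session bean method that called the helper.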

 So if done right, CMP will be close in speed to straight JDBC, plus you
 have the potential for more goodies like container caching of finders
 and entities and so on, that you'd get for 'free' in some new version
 of your favourite container, if it doesn't do so already!

There's that "if done right" problem.

If Orion (or any other app server) backed the Collection returned by a
finder with the ResultSet itself (instead of producing ArrayLists), then
it seems like performance wouldn't be too much different from JDBC.  Of
course, you're going to have all the overhead of constructing entity
bean objects and loading them with all the data (ok, maybe part of the
data if you're using WebLogic or other container with field groups), but
it shouldn't be too dramatic compared to the remote database call.
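To make the idea concrete, here is a rough sketch of an Iterator that reads from an open ResultSet on demand instead of copying every row into an ArrayList first. It is not how Orion or any other container implements finders; RowMapper and ResultSetIterator are hypothetical names, and the code targets a current JDK.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical callback that turns the current row into an object.
interface RowMapper<T> {
    T mapRow(ResultSet rs) throws SQLException;
}

// Walks an open ResultSet lazily: one row is fetched and mapped per next() call.
class ResultSetIterator<T> implements Iterator<T> {
    private final ResultSet rs;
    private final RowMapper<T> mapper;
    private boolean advanced; // true once rs.next() has been called for the pending row
    private boolean hasRow;   // result of that rs.next() call

    ResultSetIterator(ResultSet rs, RowMapper<T> mapper) {
        this.rs = rs;
        this.mapper = mapper;
    }

    @Override
    public boolean hasNext() {
        if (!advanced) {
            try {
                hasRow = rs.next();
            } catch (SQLException e) {
                throw new IllegalStateException("Could not advance the result set", e);
            }
            advanced = true;
        }
        return hasRow;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        advanced = false; // the pending row is consumed by this call
        try {
            return mapper.mapRow(rs);
        } catch (SQLException e) {
            throw new IllegalStateException("Could not map the current row", e);
        }
    }
}
```

The catch is that the ResultSet, and therefore its statement and connection, has to stay open for as long as the caller keeps iterating, which may be part of why a container would prefer to copy the rows up front.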

I don't know if there is a technical problem with doing this or if other
containers do this, but Orion doesn't seem to be that smart.  So yeah,
entities are going to be slower, especially if you're listing 1000's of
them.

EJB-QL, even if Orion supported it, is only going to make things worse.
It's an abstraction above SQL.  All the hints and other
database-specific goodies that you can normally encode in a SQL
statement cannot be used.  And you can't order the result set with
EJB-QL, so sorting has to be done outside the database (yeah, right!).

I have found that a good approach to data access is to model your data
using CMP entity beans, use them for write access, and code in JDBC
whenever you need listing behavior that CMP is too slow or too
inflexible to support.

By far the biggest problem is that "too inflexible" part, IMHO...
modeling relationship attributes, performing aggregation queries, or
querying for data that spans multiple objects simply does not fit into
the world of entity beans.  This criticism seems to apply to just about
any O/R mapping scheme... which is why I think they are useful, but
should not be taken too seriously.

A good example of this hybrid approach is the Punk Image Gallery
sample application for the Maverick MVC framework:
http://mav.sourceforge.net/pig.  It is a comprehensive sample J2EE
application which runs on Orion.  It's not a trivial sample; my friends
and I actually use a live instance to archive (and annotate, wiki-like)
our images.  There is *no* way it could perform reasonably with pure
entity beans.

Jeff Schnitzer
[EMAIL PROTECTED]
Consulting & Contracting - J2EE, WebLogic, Orion/OC4J

 On 6/4/02 7:15 pm, Duffey, Kevin [EMAIL PROTECTED] wrote:

  Hi all,
 
  Kinda curious about one thing. We use BMP, and tried CMP. Both seem to
  load one

RE: [orion-interest]CMP/BMP and standard JDBC, speed is of essence

2002-04-07 Thread Jeff Schnitzer

 From: The elephantwalker [mailto:[EMAIL PROTECTED]]
 
 You will find that if you want the _enterprise_ features offered by cmp
 with straight jdbc calls, your classes for jdbc calls will be slower
 than cmp, and _much more_ difficult to develop.
 
 I am not a smarter developer than Karl or Magnus. Their algorithms for
 caching, transaction management, pooling (connection, threads, objects,
 you name it), are among the best in the business. If I were to
 _attempt_ to reproduce these features, I would certainly _fail_ to
 reproduce these features in a reasonable time.

Eh?

Connection pooling and transaction management are provided to session
beans through the data source.  No CMP or entity beans necessary.
Furthermore, if you want sane transaction demarcation, you already have
to wrap all your entity bean access in session bean methods.

Entity bean caching I have found to be remarkably useless.  First of
all, it depends on a pessimistic locking strategy, which is both hard to
use (gotta love those deadlock exceptions!) and not applicable to a
clustered environment or any environment in which the database table can
be modified from an external source.  Furthermore, finder methods are
not cached - and with an eager loading strategy, I really have to wonder
what the great advantage of the caching is... bringing all the bean data
back usually isn't that much more expensive than bringing just the PK
data back, and if it is (because of large data fields) then your cache
is going to have size problems anyways.

Irrespective of who may be a smarter developer, I can guarantee you that
I know a *lot* more about *my* specific business logic than Karl or
Magnus.  Furthermore, Karl and Magnus are for the most part just
implementing a specification produced by a committee of labcoats
dedicated to a lowest-common-denominator set of features that IBM, BEA,
Borland, Sybase, and the rest of the implementers can agree to.  The
absence of ORDER BY in EJB-QL and the lack of a standard PK generation
mechanism make me seriously wonder if any of the people writing the EJB
spec have ever used it to implement a real-world application.

 There will always be a place for jdbc in an enterprise application. But
 in an enterprise application where the transaction is as important as
 the data, and the data structure itself may change from time to time,
 cmp ejbs are the way to go.

Entity beans can be useful for modeling your data.  For some business
cases, they might actually be all you need.  I, however, have yet to see
one of those cases in the real world.  I will reiterate the three
failings I mentioned before:  you cannot model relationship attributes,
you cannot perform aggregation queries, and you cannot query for data
that spans multiple objects.  I'll add that if you're restricting
yourself to EJB-QL, you're going to have even bigger problems, like the
inability to order the result set without sorting it yourself in Java.
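For illustration, this is roughly what sorting outside the database looks like once a finder has handed back an unordered collection. ImageSummary and FinderSorting are hypothetical names, the value object is assumed to have been copied out of the entity beans, and the code uses a current JDK.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical value object copied out of an entity bean.
class ImageSummary {
    final String title;
    final long uploadedAt;

    ImageSummary(String title, long uploadedAt) {
        this.title = title;
        this.uploadedAt = uploadedAt;
    }
}

class FinderSorting {
    // Copies the unordered finder result and sorts it newest first, then by title.
    static List<ImageSummary> newestFirst(Collection<ImageSummary> found) {
        List<ImageSummary> sorted = new ArrayList<>(found);
        sorted.sort(Comparator.comparingLong((ImageSummary s) -> s.uploadedAt)
                .reversed()
                .thenComparing(s -> s.title));
        return sorted;
    }
}
```

Every row has to be loaded into memory before the first one can be shown, whereas an ORDER BY in plain SQL lets the database do the ordering, possibly straight off an index.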

Relational databases have evolved over the last thirty years to provide
us with a very refined solution for storing and accessing enterprise
data.  Entity beans are a big square peg that can only be brutally
forced into the multifaceted hole that is our problem space.
 
 A resultset relies on datastructure strings, and is closely coupled
 with the underlying database structure. All my experience indicates
 that the datastructure will change many times during a project, after
 the project is finished, and before the application is retired.
 
 jdbc is tightly linked with the datastructure. This link goes to a low
 level, as many text strings need to be modified when the datastructure
 changes. Argh! Don't get caught in the performance trap where your
 application performance increases by 2%, but it's almost impossible to
 change your application without completely rewriting it.

If the primary argument for using entity beans is that they offer a
layer of indirection so that you can rename columns, there are plenty of
alternative approaches which are *much* simpler.
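One such approach, sketched here as an assumption rather than anything prescribed in this thread, is to keep the physical column names in a single mapping and build the SQL from logical property names. ColumnMap and the table and column names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical mapping from logical property names to physical column names.
class ColumnMap {
    private final String table;
    private final Map<String, String> columns = new LinkedHashMap<>();

    ColumnMap(String table) {
        this.table = table;
    }

    // Registers one property-to-column mapping; returns this for chaining.
    ColumnMap map(String property, String column) {
        columns.put(property, column);
        return this;
    }

    // Resolves a logical property name, failing loudly on typos.
    String column(String property) {
        String column = columns.get(property);
        if (column == null) {
            throw new IllegalArgumentException("Unmapped property: " + property);
        }
        return column;
    }

    // Builds "SELECT c1, c2 FROM table" from logical property names.
    String select(String... properties) {
        StringJoiner cols = new StringJoiner(", ");
        for (String property : properties) {
            cols.add(column(property));
        }
        return "SELECT " + cols + " FROM " + table;
    }
}
```

With `new ColumnMap("IMAGE").map("title", "IMG_TITLE").map("owner", "OWNER_ID")`, renaming a column means changing one `map(...)` call instead of hunting through every query string.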

I'm not saying don't use entity beans (although I would *definitely*
recommend that a team which does not already have a lot of EJB expertise
should avoid them), but I am saying that it is a huge mistake to
consider entity beans your only means of data access.  To use the square
peg analogy again, reduce the size of the square to something that fits
smoothly through the hole, and fill in around the edges with other
mechanisms (like JDBC).
 
Jeff Schnitzer
[EMAIL PROTECTED]
Consulting & Contracting - J2EE, WebLogic, Orion/OC4J




RE: [orion-interest]CMP/BMP and standard JDBC, speed is of essence

2002-04-07 Thread The elephantwalker


 Entity bean caching I have found to be remarkably useless.  First of
 all, it depends on a pessimistic locking strategy, which is both hard to
 use (gotta love those deadlock exceptions!) and not applicable to a
 clustered environment or any environment in which the database table can
 be modified from an external source.  Furthermore, finder methods are
 not cached - and with an eager loading strategy, I really have to wonder
 what the great advantage of the caching is... bringing all the bean data
 back usually isn't that much more expensive than bringing just the PK
 data back, and if it is (because of large data fields) then your cache
 is going to have size problems anyways.

The latest oc4j release has four different load-locking strategies to choose
from, and you can be sure these will make it into Orion.

pessimistic, old-pessimistic, optimistic, and read-only.

There are also strategies for timing out ejb instances.

 Irrespective of who may be a smarter developer, I can guarantee you that
 I know a *lot* more about *my* specific business logic than Karl or
 Magnus.  Furthermore, Karl and Magnus are for the most part just
 implementing a specification produced by a committee of labcoats
 dedicated to a lowest-common-denominator set of features that IBM, BEA,
 Borland, Sybase, and the rest of the implementers can agree to.  The
 absence of ORDER BY in EJB-QL and the lack of a standard PK generation
 mechanism make me seriously wonder if any of the people writing the EJB
 spec have ever used it to implement a real-world application.

I believe they are also on some of these committees. They have also
implemented a far better finder language than ejb-ql, and you can use ORDER
BY in the orion-ejb-jar.xml. Brett McLaughlin has published a truly
excellent strategy for producing pk's. Go to flashline.com to see Brett's
column on this.


regards,

the elephantwalker
www.elephantwalker.com