Title: [jdjlist] Another DB design (was: Interview Question: DB Design from Objects)
Greg Nudelman wrote:
[EMAIL PROTECTED]">

OK, here is ANOTHER DB design scheme I have just heard yesterday, so I want to run this by everybody.  My acquaintance tried to model his DB tables according to the "classic" accepted way, that is how Tomm Carr and many others have suggested.  Shared attributes in a "root" table and all the object-specific child-class attributes in other tables, joined by FKs.

Unfortunately, his schema was quite extensive and was changing rapidly during development.  So, my acquaintance run head-first into exactly the same problems I have mentioned: 7-8 join queries were the norm and SQL base quickly became unmaintainable due to constant changes in the schema.

A "rapidly changing" schema is an indication of a problem, and not one that can be solved simply by database design.  How would you like to build a house from "rapidly changing" floor plans?  Your acquaintance has not adequately interviewed the information owners, or the information owners have yet to identify what they really want.  Either way, to attempt to design the database is that environment is an exercise in futility.
[EMAIL PROTECTED]">

So my acquaintance implemented a MODIFIED 3-table design:

CustomersSuperclass
----------------
id
objectType
----------------
----------------

2 tables:

Customer
----------------
id
name
address
favorites
comments
----------------
----------------

WholesaleCustomer
----------------
id
name
address
limit
pin
department
----------------
----------------

As you can see, this results in some field duplication (name, address), but the tables are still pretty well normalized. 

But why?  It solves no problem.  It makes nothing easier.  On the contrary, working with these tables are difficult and prone to errors.  Just look at the steps that must take place if a Customer should become a WholesaleCustomer (or vice versa):

1. The ObjectType field in the Superclass table must be updated.
2. The name and address fields must be read from Customer and inserted into WholesaleCustomer (or vice versa).
3. The rest of the fields in the new record must be updated if the information is available.
4. The old record must be deleted.

Some quick questions:
Q1: is the order of the above steps critical?  
A1: Yes.  Obviously step 4 cannot be performed before step 2.
Q2: Suppose the process is interrupted, how easy will it be to recover or is there a chance some information will be lost?
A2: As the answer to Q1 is yes, then the answer here is also yes.  If the process is interrupted, we must verify step 2 was completed before we proceed with step 4.  So the procedure must be carefully logged or data could be lost.
Q3: In step 3, if the new information is not yet available, the fields will be null -- will this be a problem?
A3: Possibly.  At the very least, the application must be able to handle the fact that, say, a record exists in the WholesaleCustomer table but with nulls in the limit, pin and department fields.

I will not duplicate here the arrangement I and some others have suggested.  But let's consider that arrangement and perform the same operation.

1) The ObjectType field in the main customer table (Table1, in my design) must be updated.
2) A new record containing the new information (if available) is inserted into the Customer or WholesaleCustomer table.
3) The old record must be deleted.

Let's ask the same questions:
Q1: is the order of the above steps critical?  
A1: No.  They may be performed in any order.
Q2: Suppose the process is interrupted, how easy will it be to recover or is there a chance some information will be lost?
A2: As the order is not important, an interrupted process can just be restarted.  Steps 2 & 3 will simply fail if already successfully performed.
Q3: In step 2, if the new information is not yet available, a record will not be inserted -- will this be a problem?
A3: No.  General db design allows for this.  (That is why we have such things as inner joins.)
[EMAIL PROTECTED]">

A huge advantage of this scheme, is that you can still perform reports on the entire set of various Customer subtypes, but in order to read an individual customer, you no longer need to perform a join. Instead, we can just read it directly from the WholesaleCustomer table. 

I don't understand.  If all the customer information is in the WholesaleCustomer table, what is in the Customer table?  If it contains duplicate information, why have it at all?
[EMAIL PROTECTED]">

The added bonus is that you end up with no null fields and additional data integrity that you do not get with a 1-table design.

Absolutely wrong.  See step 2 above.  Granted that steps 2 & 3 can be merged (read the name and address fields from Customer, copy to a new WholesaleCustomer record, insert new WholesaleCustomer info, insert record into WholesaleCustomer table), there would be no nulls *if you have all the information* when you perform the operation.  But a record must be inserted into WholesaleCustomer even if you have only some or even none of the WholesaleCustomer-specific information.
[EMAIL PROTECTED]">

Any thoughts?

Sure.  I will repeat my statement from a previous post that there is not necessarily a one to one correlation with a table being created for every subclass in the object design.  One may have a single table handling multiple levels of object inheritance.  The simple example we worked with here *happened* to work out that way.  For one thing, we were considering only the state of the objects and not behaviors.  If Customer and WholesaleCustomer differed only by behavior, they could exist in one table.

It can be very tempting, especially in the early stages of development, to take various shortcuts to more quickly bring about a prototype or early release.  After all, design is nice, but the real fun is in the coding.

This is where teamwork is important.  If a programmer should give in to temptation and start taking shortcuts, his partner should whump him upside the head with a 2x4.

I have been programming for about 22 years.  This is not, of course, proof that I am right and you are wrong.  However, if I had to make one statement that encapsulates the sum total of all the lessons I have learned, I would have to say: "Live by the shortcut, die by the shortcut."

This is not just applicable to database design.  When designing a GUI, the temptation is to drop a JListbox on the frame and put in some quick code to make it do what you want it to do.  It's quick and it works ... until you need the same behavior on a different frame and/or in a different application.

You may think, "If I ever need it again, *then* I will design a subclass."  If you find yourself thinking such thoughts, you should immediately whump yourself upside the head with the nearest large, heavy object.  It will ultimately be less painful than working with shortcuts.


Tomm

[EMAIL PROTECTED]">


To change your membership options, refer to:
http://www.sys-con.com/java/list.cfm

Reply via email to