On Jan 4, 2009, at 11:00 PM, Jason L DeLooze wrote in sum:

Sometimes de-normalization is a good thing.

Clearly the weariness is affecting Jason tonight. While his assertion that the portal may be meaningless has merit (after all, a shared address does not define a family unit) this does not imply de- normalization is the cure.

Indeed the structure Richard suggested - a single Address record with an association table linking individuals to that address - provides exactly the same opportunities a de-normalized structure might. The trick then lies in the presentation. Will the user be changing the address data directly or will there be an intermediary interface to manage the creation and updating of address data? The first option, direct address record manipulation, is subject to the flaws Jason raises. The second option need not be.

However, let's be clear: when you associate people to an "address" what we're really talking about here is a location. Harvey Summers can be reached by post at location "2 GE Road Building 37B, Schenectady, NY 12345". The "address" serves as a label for the location, which in turn is a hierarchy of locations.

What Jason alludes to is a larger problem which is primarily an issue of semantic carelessness and ill-considered application design. For the first part, when someone provides us with an "address" we often do not stop to consider what the nature of the address is. This properly begins with the address request: why did we solicit the address in the first place? Is it a residential address? An address at which one receives their post? The type of address being solicited is more important than the address itself because the type will impart meaning to the data which, lacking such, is nothing more than a location (Box 1234 in the 06776 Post Office in New Milford CT). When we receive an "address change" simply having the new address does not generally provide enough actionable information. Perhaps we can infer the type of action to take through comparative analysis of old and new addresses ("PO Box 1284 New Milford CT 06776" might indicate a data entry error) but it is more often the case that we simply need more information. What type of address is this for Harvey (obviously not a residential address)? Has his residence moved? Does he have any family members that receive their mail (or reside as the case may be) at the new address with him? The simple raising of these questions illuminates the insufficiency of the simple data model originally proposed. It also implies that there is a thing called "family members" which may or may not be independent of the residential and correspondence addresses.

The database designer, not having seriously considered the domain, often errs in the presentation of address-related information and does not put in the effort to create the data entry mechanisms that might invite the above questions. As a result he colludes with the organization in soliciting and storing information that is of dubious value even before entering the database.

That is to say, normalization has little to do with any of it. I can envision the same type of interface necessary for proper location updates stored in a non-normal form and reliant upon scripted updates to modify the location information for related persons. In fact, the "DRY" principle doesn't much apply to "person x is located at location y" information" and storing locations proper may not be within the scope of the system design. Ultimately the data architecture question needs be answered by a thorough investigation of the requirements, not an a priori presumption of how to properly structure and store "addresses."

Food for thought...


Cheers,
-corn



Corn Walker
The Proof Group
http://proofgroup.com/

Reply via email to