> Relational model databases by definition are not supposed to use keys
that have no meaning.

On Sun, Aug 6, 2017 at 12:08 PM, npdennis via 4D_Tech <4d_tech@lists.4d.com>
wrote:
> You may have stated that backwards…

Nope, that's exactly what I meant. But apologies for not making the point
more carefully. I'll make up for that now.

> relational model databases by design should never, never, never use
> business data for keys. This was the number one rule drilled into me
> by my college relational database classes. I found this to be very
> wise advice in the real world.

I agree! But there are two different pieces to this: design, where you use
no synthetic keys, and implementation, where you do. When you did your RDBMS
classes, they probably discussed formal design versus system
implementation. When you're creating a normalized *design*, you don't have
artificial/synthetic keys - because your focus is on the pure data itself
and such keys have no meaning. (They're a database artifact, not part of
the real world entity you're modeling.) The "real" primary key gets sorted
out when you're putting your design into second or third "normal form."
(Even if you don't think of it in those terms. I mean the actual design
step, whatever name you use - if you even use a name.) When you go to
*implement* your tables, it's sensible to use a synthetic key. So, in my
example, I used

Building + Room

...because that's enough to uniquely identify each room with no spare
values in the field, no redundant rows, and nothing that belongs in another
table. (Well, you would have a unique set of Building records, as noted in
the earlier post.) So, that's a correct and sensible *design* and is how
I'd think about the actual data. But implementation? Another story. The
building is renamed and I have to go back and update all of the related
room records? PITA.
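To make the implementation side concrete, here's a small sketch using
Python's sqlite3 (purely for illustration - the table and column names
are my own, not anything from 4D). The rooms point at a stable synthetic
building key, so renaming the building is a single one-row UPDATE:

```python
# Sketch of the implementation side, using sqlite3 for illustration.
# Table and column names are illustrative assumptions, not from the post.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE building (
        building_id INTEGER PRIMARY KEY,   -- synthetic key
        name        TEXT NOT NULL
    );
    CREATE TABLE room (
        room_id     INTEGER PRIMARY KEY,   -- synthetic key
        building_id INTEGER NOT NULL REFERENCES building(building_id),
        room        TEXT NOT NULL
    );
""")
con.execute("INSERT INTO building (building_id, name) VALUES (1, 'Science Hall')")
con.execute("INSERT INTO room (building_id, room) VALUES (1, '101')")
con.execute("INSERT INTO room (building_id, room) VALUES (1, '102')")

# The building gets renamed: one UPDATE on one row. The room records are
# untouched, because they point at the stable synthetic key, not the name.
con.execute("UPDATE building SET name = 'Curie Hall' WHERE building_id = 1")

rows = con.execute("""
    SELECT b.name, r.room FROM room r
    JOIN building b ON b.building_id = r.building_id
    ORDER BY r.room
""").fetchall()
print(rows)   # → [('Curie Hall', '101'), ('Curie Hall', '102')]
```

Had the rooms carried the building *name* as part of their key, that
rename would have meant updating every related room record instead.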

It's 100% normal to think about the formal table design and the physical
implementation as distinct steps with different rules. Your VIN example,
like Social Security Number, is a classic case of a "natural key" that
turns out not to be a great idea. I use synthetic keys for *all* of my
tables (well, there might be a rare exception where I use a natural key,
but that's super rare...and I probably end up regretting it most of the
time).

In my earlier post I was just assuming that the distinction between design
and implementation was clear, but those steps aren't distinct in 4D. (They
are distinct in design-oriented RDBMS development tools, where treating
them as separate steps used to be a pretty common concept - and still is.)

There's a natural tension here: two competing tasks that require work to
handle. You only get one of them for free; the other you have to take care
of on your own.

* Natural keys and multi-field keys that truly define a row are a *pain* in
the real world, where "unique" values aren't always entered correctly or
kept stable. To avoid hassles with updates, synthetic keys are super
helpful: they give you _stable keys_. (Mutable keys are absolutely a
pain.)

     Synthetic keys take away the pain of related table changes on UPDATE.

* Synthetic keys have _nothing_ to do with the real data in the row. That's
kind of the point. Which means that you get *no help* in preventing
duplicate rows of data. Like my example of the duplicate building + room
with different IDs.

     Synthetic keys give you the pain of having to hand-check for
duplicates.
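Here's that second bullet made concrete, again as an illustrative sqlite3
sketch (names are mine). With nothing but synthetic keys, the engine
happily accepts the same building + room twice, and you're left writing
your own "scanner" query to find the duplicates after the fact:

```python
# The flip side: with only synthetic keys, nothing stops a duplicate row.
# Illustrative sqlite3 sketch; table and column names are assumptions.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE room (
        room_id     INTEGER PRIMARY KEY,   -- synthetic key
        building_id INTEGER NOT NULL,
        room        TEXT NOT NULL
    )
""")
# The same building + room entered twice, each with its own unique ID:
con.execute("INSERT INTO room (building_id, room) VALUES (1, '101')")
con.execute("INSERT INTO room (building_id, room) VALUES (1, '101')")  # silently accepted

# A hand-rolled scanner query to audit for duplicates after the fact:
dupes = con.execute("""
    SELECT building_id, room, COUNT(*) AS n
    FROM room
    GROUP BY building_id, room
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)   # → [(1, '101', 2)]
```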

The problem shows up when you have "unique" rows with duplicate data. That
gets ugly in a big fat hurry when it's the 1 table in a 1:N relation and
children link to different rows that aren't really "different." Note that I'm
not talking here about the kind of messy data - like customer names and
addresses - where you need to merge records. That's a related but different
question. My building + room example is nice and clean. The data is
identical, it's just that somehow a duplicate got entered. (I don't want to
muddy the waters with messy data, it's not necessary or helpful for the
current discussion.)

Here's a possibly apocryphal tangent from my first years in the work world.
Back when things were typed. On typewriters. It was customary to put at the
bottom of each page 1/5, 2/5 and so on. I heard tell of a report that had a
sixth page added. Rather than retype all of it, here's what the boss got
back at the bottom of the six pages:

1/5
2/5
3/5
4/5
5/5
6/6
Not the same thing, but I always thought of it as a good example ;-)

In 4D, you get a benefit from using synthetic keys - you get stable keys
for update for free. I'll take that deal! But you get no automatic help
with duplicate control. You can add this by implementing a compound index
marked as unique. Or you can write custom scanner code to audit your data.
The exact options depend on the database, but the work is required either
way. In 4D databases, what I most often see is that people aren't checking
for duplicate rows either at the engine level (compound index with unique)
or via record audits. 4D itself has no referential integrity scanning tools
built in. (It could, for example, check in the MSC for relations with
arrows drawn. But then it would need a way of distinguishing required N:1
versus optional N:1 relations - what's called a "dependent" versus
"independent" entity on the child side.) So, mostly I see creeping bad
data.
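The engine-level fix looks like this - one more illustrative sqlite3
sketch (the same idea applies in any SQL engine, and 4D's unique compound
index serves the same role). With a unique index across the natural key
fields, the duplicate from the earlier example gets rejected for you:

```python
# Engine-level duplicate control: a compound index marked unique over the
# natural-key fields. sqlite3 used for illustration; names are assumptions.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE room (
        room_id     INTEGER PRIMARY KEY,   -- synthetic key
        building_id INTEGER NOT NULL,
        room        TEXT NOT NULL
    );
    CREATE UNIQUE INDEX room_natural_key ON room (building_id, room);
""")
con.execute("INSERT INTO room (building_id, room) VALUES (1, '101')")

rejected = False
try:
    # Same building + room again - the engine refuses it this time:
    con.execute("INSERT INTO room (building_id, room) VALUES (1, '101')")
except sqlite3.IntegrityError:
    rejected = True

row_count = con.execute("SELECT COUNT(*) FROM room").fetchone()[0]
print(rejected, row_count)   # → True 1
```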

Since I'm diving into Postgres these days (mostly on the import side - my
table designs aren't even right yet - pacing myself), I see that they have
a wide range of tools for enforcing row uniqueness and referential
integrity. Nice. They've also got UUIDs as a field type because, well,
they're pretty darn handy.
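For the referential-integrity side, Postgres enforces this with FOREIGN
KEY constraints out of the box. Here's an analogous sketch in sqlite3
(which needs foreign-key checking switched on explicitly - names again
are my own for illustration). This is the "required N:1" case: a child
row can't point at a parent that doesn't exist:

```python
# Referential integrity at the engine level: an orphan child row is
# refused. sqlite3 sketch (FK checks are off by default there, hence the
# PRAGMA); Postgres enforces FOREIGN KEY constraints without extra setup.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE building (
        building_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE room (
        room_id     INTEGER PRIMARY KEY,
        building_id INTEGER NOT NULL REFERENCES building(building_id),
        room        TEXT NOT NULL
    );
""")
orphan_rejected = False
try:
    # Building 99 doesn't exist, so this orphan room is refused:
    con.execute("INSERT INTO room (building_id, room) VALUES (99, '101')")
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)   # → True
```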
**********************************************************************
4D Internet Users Group (4D iNUG)
FAQ:  http://lists.4d.com/faqnug.html
Archive:  http://lists.4d.com/archives.html
Options: http://lists.4d.com/mailman/options/4d_tech
Unsub:  mailto:4d_tech-unsubscr...@lists.4d.com
**********************************************************************
