Re: Accountability with MySQL

Rhino Wed, 15 Mar 2006 07:14:30 -0800

----- Original Message -----From: "Martijn Tonies" <[EMAIL PROTECTED]>

To: <mysql@lists.mysql.com>
Sent: Wednesday, March 15, 2006 2:53 AM
Subject: Re: Accountability with MySQL

>> > Basics of database design: store what you know.
>> >
>> > Given that NULLs basically means the absence of a value
>> > (eg: unknown), you shouldn't be storing NULLs.
>> >
>> Nonsense!!
>
> That's a bold statement ...
>
>> That's simply wrong. A null means "unknown or not applicable" and is a
>
> Ah, so now things become clear, a NULL actually can mean
> two things? So much for clarity then...
>
Come on; that's not fair. "Unknown" and "not applicable" are more like
different senses of the same thing, not two opposite things.


Read the literature on how to design databases. What you do
is storing "true propositions". That is, each attribute defines a
certain "true proposition". For example:

Employee "Martijn" has Employee# 14.

You should be able to derive these sentences from every row.
Putting a NULL or N/A in there fails to meet this requirement.

Perhaps my theory is a bit rusty but I have never heard of this "truepropositions" business before. Do you have a citation where I can read aboutthis?

Frankly, I'm skeptical about your interpretation of this idea. While youclearly don't want false information in a database, it isn't false to saythat student such-and-such's exam mark or such-and-such an employee'stermination date is unknown or not applicable. It _would_ be false to storea grade of 0 for a student who didn't take the exam because it implies thathe got every question wrong when in fact he didn't write the test at all;that would be an example of 0 having two contradicatory meanings. Given aphobia about nulls, it is a reasonable design to put students who don't takethe test in a separate table but it's not the ONLY way to handle thesituation: a null to indicate a student who didn't take the test isperfectly reasonable too.

>> perfectly valid value to use in many, many situations. Two classic
> examples:
>
> Let me first state that there's a difference between storing NULLs
> and handling NULLs in your result.
>
Yes, of  course there is: so what? You said NULLs should never be

_stored_;

that's what I'm responding to. Using NULLs is not very _difficult_anyway,
just a bit tedious because it is another case to handle.


There's no point in storing what you don't know.

But you _do_ know something: that the student didn't take the exam. That isa piece of true knowledge all by itself.

> So, let's debunk these two classic examples...
>
Please note that I did not say that you MUST use NULLs, just that they

were

perfectly valid to use in a design.


Depends on who you're asking ...

A few years ago, I was working at a company that had developed
an application with an Oracle database with about 400 tables, nothing
too large.

Plenty of tables had NULLs, not because of missing business data,
but rather used internally in the system.

I'm not sure what you mean by that phrase "used internally in the system".

The number of problems we had with them, I cannot count on my
two hands, nor can I count them on the hands of all other developers
(5) for that project.

How about enumerating a few of them here? I've dealt with relationaldatabases in several capacities for over 20 years and I've never found nullsto be a big problem.

Just last week, I spoke to a guy who used to be my collegue there
and we discussed the design a bit (he still does database design and
development for a large company, on Sybase with over 4000
deployed systems) and we both agreed that using NULLs in most
cases as part of the initial design was causing us more problems
then helping us.

Again, how about listing some of these problems?

>> 1. You have a table containing test scores. Some students get every
> question
>> on a given test wrong. Some students don't write the test at all,

perhaps

>> because they were sick. In this scenario, I think the correct thing to
> store
>> is 0 for the students who got every question wrong and NULL for the
> students
>> who didn't write the test.
>
> IMO, the better thing to do in this particular case is to NOT store a

test

> result for the students that did not make the test.
>
Okay, that might be acceptable, if it doesn't cause you to lose track of

the

student altogether. But if this table was the only one that even recorded
the _existence_ of the student, you'd have a problem;


Indeed, then I would have a problem, cause you cannot derive from
the "test results" table that a student exists. This is a rather silly
statement.
If the "test results" table should ALSO store the existence of a student,
you're design is wrong :-)

if someone tried to
verify that the student had attended this school, you wouldn't know that
they had.


In your own words: nonsense.

Each table should store what is was designed to store. If I would
have to know if a student was enlisted in a certain course or would
be attending this school, I would not be using the "test results" table.

Look, I agree that this scenario is not particularly likely to happen but itCOULD happen. Suppose you run a small company that runs certification exams:your only contact with the students is that they show up to take an exam andyou record the mark and send it to whoever awards the certification. In thatcase, you likely wouldn't have an elaborate set of tables containing avariety of student information the way you would if you were a university.In that case, it's quite reasonable to believe that the only table in thesystem is the one that records the test marks. That table may include thestudent name, address, test name, score, etc. etc. That table may only bepopulated as the student comes in the door: the 'greeter' welcomes thestudent to the test center, asks his name and other details, points him tothe exam room, and records the test result when the student hands in theexam. In that scenario, if student Joe Blow does not show up for the testbecause his car broke down, he may never be entered in the system at all andyou have therefore lost sight of the fact that he ever existed. Again, thisis not a very likely scenario but it is not an impossible one and that's thekey point: if it COULD happen, you will have a problem.

Or in a more probable case, if that was the only test for that
course and the student missed it and then had no row in the table, you

might

not have any way of knowing that they took the course!


See above.

Dismissing the less probable case does not dismiss this more probablescenario.

And if they later
wanted to write the exam, having recovered from their illness, your query
might have the effect of keeping them from taking the second exam: your
query would report that they had never taken the first exam so an

adminstrator might refuse to let him/her take the second exam becausethey

(apparently) had never been scheduled to take the first one.


Different problem, see above.

Again, you can't dismiss the second scenario by dismissing the first one;the scenarios are different and the second one is more probable than thefirst.

Now, you could have a second table to record people who were scheduled to
take tests but failed to take them to cover that situation but I think it


Sounds like a decent design to me.

Not surprisingly since it is the one you suggested as the "correct" way tohandle the situation. But the design that uses nulls is NOT invalid and youhave not demonstrated that it is.

would be easier to record all students in one table and then simply store

null for any test that they fail to take and a zero for every case wherea
student got every answer wrong. When you compute the class average, the
avg() function would ensure that the students who got every questionwrong
would pull down the class average but that students who failed to write

the

test at all would NOT skew the average because the avg() function ignores
nulls.


Ah, an excellent example of why NULLs are tedious: they are ignored
by some functions, but not by others.

_ALL_ of the column functions ignore nulls: it is NOT inconsistent. In fact,it is the only logical thing that these functions can do, which you'drealize if you gave it a bit of open-minded thought.

Suppose you had a table with a primary key of employee number and a columncontaining bonuses. Some of the rows contain nulls in the bonus columnbecause the employee didn't earn a bonus yet. Now, you need to answer thesequestions: what is the largest bonus amount? What is the smallest bonusamount? What is the sum of the bonuses? What is the average bonus? Since anull, by definition, is unknown, it isn't bigger or smaller or equal to anyother value, including another null so it can't qualify as the biggest valueor the smallest value. A number plus (or minus, or times, or divided by) anull is undefined so it makes no sense to add it to the sum of the bonuses.A null is NOT a zero so treating it as a zero in an average would undulyskew the result. So, in each case, the function does the logical thing andIGNORES the null.

Without having the record there in the first place, there would have
to be no rule of AVG ignoring NULLs. Problem solved.

That is the other way of the solving the problem. But it is NOT the ONLYway.

> Why store a date column if you don't know?
>
> Why not use:
>
> EMPLOYEES
> EmployeeID int,
> StartingDate Date,
> ...
>
> TERMINATED_EMPLOYEES (albeit a bit agressive ;) )
> EmployeeID
> TerminationDate
>
> Once more: perfectly valid design.
>

Yes, that is also a valid design but it means you have to have yetanother

table that you could have avoided simply by permitting a null in the
employee table. Hey, if you really want your tables to proliferate like
this, that's up to you.


It seems you have a fear of creating new tables ;-)

No, I don't. It seems to me that you have an undue fear of nulls.

This is what database systems are designed to do...

Database systems are designed to accomodate large quantities of tables butany designer with any real world experience will advise you not to createtables simply for the sake of creating tables: that is an unnecessaryproliferation of tables. Look at normalization. As you probably know, eachphase of normalization tends to increase the number of tables in thedatabase. I know for a fact that the theorists have identified at least upto 17NF (17th Normal Form) but how many levels of normalization dobusinesses typically use in their normalizations? Answer: Three.(Occasionally four). And I think you'll find that the main reason is thatthe benefits of normalizing beyond 3NF are outweighed by the increasingnumber of tables to manage.

The key point is that NULLs _do_ work and are a legitimate design

decision.

You don't _have_ to use them but they can save you some work and reduce

the

number of tables you need. (They can also be a bit more work on the
programming side.)

I took your remarks to mean that NULLs were always a bad idea and were a
symptom of bad design and I strongly disagree with that.


So I've noticed.

I'm trying to be gracious here and accept that I may just have misunderstoodthe emphasis in what you are saying.

The message I'm trying to get across is that:
1 - according to (proper) design literature, you should not use NULLs
 if you don't have to

Perhaps you can cite some of this "proper" literature. I'm interested inseeing their definition of what an appropriate time to use nulls is.

2 - NULLs can cause you more problems than you can think of

I'd like to hear what these problems are. I've never had any big problemsbecause of nulls.

3 - its sometimes much easier to avoid storing NULLs AND to be
 able to refactor your database because of it

I agree that nulls can frequently be avoided and that avoiding them willsimplify _some_ situations.

4 - the meaning of NULL can change, so why store it in the first place

Simple: because it happens to be true when it is stored. In the case of thehypothetical employee, I store a null termination date when I hire himbecause I don't know when he is going to leave. If he laters gives hisnotice, then I know when he is leaving and can store that date for histermination date instead of a null; then his row of the table is true again,based on the new facts.

If you are simply saying that you don't like them and prefer to use
different designs to avoid them, then I don't have any problem with that.


I have seen that when I avoid storing NULLs, my applications
became more clear and easier to understand.

Beauty is in the eye of the beholder as we all know. I have no problem withhaving nulls in my tables and consider that a better design than a separatetable for special cases most of the time.

Again, if you are saying that you don't like nulls and prefer to avoid themin your designs, I have no problem with that; that's just your personalpreference. It's the same as if we sat down to eat a meal and you asked forchocolate ice cream for dessert and I had vanilla; neither choice is"wrong", they are just personal preference.

But if you are stating categorically that nulls are always "bad" or "wrong",I disagree strongly. That's like saying only chocolate ice cream isacceptable and that all other flavours are evil. That's just wrong.

--

Rhino



--
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.385 / Virus Database: 268.2.3/281 - Release Date: 14/03/2006


--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Accountability with MySQL

Reply via email to