Re: Accountability with MySQL

Martijn Tonies Tue, 14 Mar 2006 23:54:16 -0800

> >> > Basics of database design: store what you know.
> >> >
> >> > Given that NULLs basically means the absence of a value
> >> > (eg: unknown), you shouldn't be storing NULLs.
> >> >
> >> Nonsense!!
> >
> > That's a bold statement ...
> >
> >> That's simply wrong. A null means "unknown or not applicable" and is a
> >
> > Ah, so now things become clear, a NULL actually can mean
> > two things? So much for clarity then...
> >
> Come on; that's not fair. "Unknown" and "not applicable" are more like
> different senses of the same thing, not two opposite things.


Read the literature on how to design databases. What you do
is storing "true propositions". That is, each attribute defines a
certain "true proposition". For example:

Employee "Martijn" has Employee# 14.

You should be able to derive these sentences from every row.
Putting a NULL or N/A in there fails to meet this requirement.

> >> perfectly valid value to use in many, many situations. Two classic
> > examples:
> >
> > Let me first state that there's a difference between storing NULLs
> > and handling NULLs in your result.
> >
> Yes, of  course there is: so what? You said NULLs should never be
_stored_;
> that's what I'm responding to. Using NULLs is not very _difficult_ anyway,
> just a bit tedious because it is another case to handle.

There's no point in storing what you don't know.

> > So, let's debunk these two classic examples...
> >
> Please note that I did not say that you MUST use NULLs, just that they
were
> perfectly valid to use in a design.

Depends on who you're asking ...

A few years ago, I was working at a company that had developed
an application with an Oracle database with about 400 tables, nothing
too large.

Plenty of tables had NULLs, not because of missing business data,
but rather used internally in the system.

The number of problems we had with them, I cannot count on my
two hands, nor can I count them on the hands of all other developers
(5) for that project.

Just last week, I spoke to a guy who used to be my collegue there
and we discussed the design a bit (he still does database design and
development for a large company, on Sybase with over 4000
deployed systems) and we both agreed that using NULLs in most
cases as part of the initial design was causing us more problems
then helping us.

> >> 1. You have a table containing test scores. Some students get every
> > question
> >> on a given test wrong. Some students don't write the test at all,
perhaps
> >> because they were sick. In this scenario, I think the correct thing to
> > store
> >> is 0 for the students who got every question wrong and NULL for the
> > students
> >> who didn't write the test.
> >
> > IMO, the better thing to do in this particular case is to NOT store a
test
> > result for the students that did not make the test.
> >
> Okay, that might be acceptable, if it doesn't cause you to lose track of
the
> student altogether. But if this table was the only one that even recorded
> the _existence_ of the student, you'd have a problem;

Indeed, then I would have a problem, cause you cannot derive from
the "test results" table that a student exists. This is a rather silly
statement.
If the "test results" table should ALSO store the existence of a student,
you're design is wrong :-)

>if someone tried to
> verify that the student had attended this school, you wouldn't know that
> they had.

In your own words: nonsense.

Each table should store what is was designed to store. If I would
have to know if a student was enlisted in a certain course or would
be attending this school, I would not be using the "test results" table.

>Or in a more probable case, if that was the only test for that
> course and the student missed it and then had no row in the table, you
might
> not have any way of knowing that they took the course!

See above.

>And if they later
> wanted to write the exam, having recovered from their illness, your query
> might have the effect of keeping them from taking the second exam: your
> query would report that they had never taken the first exam so an
> adminstrator might refuse to let him/her take the second exam because they
> (apparently) had never been scheduled to take the first one.

Different problem, see above.

> Now, you could have a second table to record people who were scheduled to
> take tests but failed to take them to cover that situation but I think it

Sounds like a decent design to me.

> would be easier to record all students in one table and then simply store
a
> null for any test that they fail to take and a zero for every case where a
> student got every answer wrong. When you compute the class average, the
> avg() function would ensure that the students who got every question wrong
> would pull down the class average but that students who failed to write
the
> test at all would NOT skew the average because the avg() function ignores
> nulls.

Ah, an excellent example of why NULLs are tedious: they are ignored
by some functions, but not by others.

Without having the record there in the first place, there would have
to be no rule of AVG ignoring NULLs. Problem solved.

> > Why store a date column if you don't know?
> >
> > Why not use:
> >
> > EMPLOYEES
> > EmployeeID int,
> > StartingDate Date,
> > ...
> >
> > TERMINATED_EMPLOYEES (albeit a bit agressive ;) )
> > EmployeeID
> > TerminationDate
> >
> > Once more: perfectly valid design.
> >
> Yes, that is also a valid design but it means you have to have yet another
> table that you could have avoided simply by permitting a null in the
> employee table. Hey, if you really want your tables to proliferate like
> this, that's up to you.

It seems you have a fear of creating new tables ;-)

This is what database systems are designed to do...

> The key point is that NULLs _do_ work and are a legitimate design
decision.
> You don't _have_ to use them but they can save you some work and reduce
the
> number of tables you need. (They can also be a bit more work on the
> programming side.)
>
> I took your remarks to mean that NULLs were always a bad idea and were a
> symptom of bad design and I strongly disagree with that.

So I've noticed.

The message I'm trying to get across is that:
1 - according to (proper) design literature, you should not use NULLs
  if you don't have to
2 - NULLs can cause you more problems than you can think of
3 - its sometimes much easier to avoid storing NULLs AND to be
  able to refactor your database because of it
4 - the meaning of NULL can change, so why store it in the first place

> If you are simply saying that you don't like them and prefer to use
> different designs to avoid them, then I don't have any problem with that.

I have seen that when I avoid storing NULLs, my applications
became more clear and easier to understand.


Martijn Tonies
Database Workbench - development tool for MySQL, and more!
Upscene Productions
http://www.upscene.com


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Re: Accountability with MySQL

Reply via email to