RE: Does Null == ""?

Jon Frisby Tue, 16 Sep 2003 11:50:39 -0700

> Even if I grant you that, the DB designer should have produced the 
> documentation.  I'm willing to place the blame on both people; either 
> one could have avoided the problem.  But, the DBD (data 
> administrator, 
> if you prefer) is more likely to have been *aware* of the 
> problem in the 
> first place; it's their job to understand the data and serve as a 
> reference to it.  Even if the programmer caught the


So in this example we have a situation where neither party practiced
sufficient due-diligence, resulting in a potentially costly mistake for
the company when proper care on the part of either party would have
prevented the situation entirely.  That sort of procedural error is a
risk no matter how cautious you are about the design.


> When you think you understand the system, the tendency is to 
> go no deeper.

True.  But that's a habit that should be beaten out of every
engineer/admin.  It's the worst kind of laziness and arrogant
presumption.


> A good programmer would do that, certainly.  But again, in 
> doing so, the 
> programmer is working around the deficiencies of the DBD.

All parties should plan on working around all other parties, and be
pleasantly surprised when they don't have to.  It's sort of like
"defensive driving" -- yes, the other twits on the road should be paying
attention to the road rather than talking on the cell phone while
searching through the glove box for something while preening in the
rear-view mirror, but do you really want to take the chance and assume
that they are infallible?  Let's call it "defensive coding" / "defensive
DB administration" / "defensive network administration".


> Why?  If it's documented, it's a reasonably intuitive, clear, and 
> probably efficient way.

Marks (nulls) are a separate domain from strings.  The empty string is
just another value out of the set of possible strings but you're
arbitrarily redefining it into the domain of marks.  Yes, that's a
nitpicky detail.  

However, consider the potential scenario of having a *fixed length*
column for middle name -- CHAR(20) instead of VARCHAR(20).  MySQL
doesn't pad CHAR columns with spaces, but it's my understanding that the
"correct" behavior (according to the ANSI spec) is to pad with spaces.
I could be wrong about the spec, but I know that some databases such as
Sybase *do* pad with spaces.

This creates the *practical* problem of someone coming from MySQL (or
any other DB that doesn't pad) trying to do the following on a DB that
*does* pad:
SELECT * FROM whatever WHERE middle_name='';

By assigning a semantic meaning to the *value* of the data (mixing
meta-data with data in essence) you've now created another potential
pitfall -- just like the NULL situation you describe.  Given a choice
between the two pitfalls I'd tend to prefer the "cleaner" option.

Again, some caution on the part of the programmer, or the DB designer
would render this moot via documentation or a different schema design.  


> So in the example I gave you'd set up a separate middle name table, 
> making the middle name in it NULL (indicating A-mark) and 
> have lack of a 
> row in the middle name table be the I-mark?  (I like the A/I-mark 

That's one way of making the distinction, yes.  Another way might be to
have another column describing the meaning of a NULL in a particular
column:  has_middle_name BOOL NOT NULL.  I would tend to avoid that
option on databases that don't have multi-column constraints though to
minimize the possibility that I'll write some buggy code and wind up
with conflicting data (middle_name = 'Foo', yet has_middle_name =
false)...


> notation, by the way -- I hadn't seen it before.)  I don't see the 

I believe the notation comes from E.F. Codd.


> advantage, although logically it would do the job.  It'd be nasty for 
> performance and programmatic complexity, though, especially 
> for a minor 
> value such as middle name.

I'll grant you that the middle name scenario is awfully trivial for such
a solution.  A better example might be a patient's medical records at a
hospital.  The medical chart might contain fields such as
"num_pregnancies", "post_menopause", "when_prostate_last_examined", and
so forth.  In this scenario, distinguishing between A-mark and I-mark
could be extremely important and you have the benefit that I-mark will
apply to numerous columns in aggregate (I.E. either "num_pregnancies"
and "post_menopause" are both applicable, or they are both inapplicable
-- you can't have a situation where one is applicable and the other is
not.).  Having a patient table, and gender-specific child tables becomes
a much more attractive option in this scenario.


> Interesting!  I hadn't been aware of any DBMSs that did that. 

According to their site, FirstSQL is currently the only one.


>  I'm not 
> sure that it's a good idea, actually, because I suspect that 
> there are 
> other shades in meaning that might cause bugs as well.  For 

And here we come to the big controversy.  E.F. Codd liked the idea of
having two possible NULLs (A-mark and I-mark).  C.J. Date started a
ruckus over the issue of whether or not NULL is a good idea back in 1988
or so.  The debate has raged since.


> instance, is 
> "no value" really the same as "inapplicable" in all cases?  I have 
> trouble finding an example of this; here's the best I can do -- a 
> password system.  A password might be unknown to the system, 
> it might be 
> an empty string (the user needs to explictly enter the fact that the 
> password is '' somehow), or there might be no password, 
> indicating that 
> the user doesn't even want to be prompted for one.  I get very 
> suspicious of scenarios where there are more than two 
> possibilities -- 
> it suggests that there might be an unlimited, or at least a large, 
> number, and that not all possibilities have been clearly identified.

I seem to recall reading that an ANSI committee was drafted to answer
the question of "how many NULLs are there", and that they identified at
least 13 distinct meanings that NULL can be used to represent.  I can't
testify to the accuracy of that particular anecdote, but it does seem to
fit with your intuition.

One solution I've seen proposed is to allow the programmer to define
NULL as an enumeration.  Basically "this value can contain a *value*, or
it can contain one of N specific NULL values that I define".


-JF


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

RE: Does Null == ""?

Reply via email to