> Even if I grant you that, the DB designer should have produced the > documentation. I'm willing to place the blame on both people; either > one could have avoided the problem. But, the DBD (data > administrator, > if you prefer) is more likely to have been *aware* of the > problem in the > first place; it's their job to understand the data and serve as a > reference to it. Even if the programmer caught the
So in this example we have a situation where neither party practiced sufficient due-diligence, resulting in a potentially costly mistake for the company when proper care on the part of either party would have prevented the situation entirely. That sort of procedural error is a risk no matter how cautious you are about the design. > When you think you understand the system, the tendency is to > go no deeper. True. But that's a habit that should be beaten out of every engineer/admin. It's the worst kind of laziness and arrogant presumption. > A good programmer would do that, certainly. But again, in > doing so, the > programmer is working around the deficiencies of the DBD. All parties should plan on working around all other parties, and be pleasantly surprised when they don't have to. It's sort of like "defensive driving" -- yes, the other twits on the road should be paying attention to the road rather than talking on the cell phone while searching through the glove box for something while preening in the rear-view mirror, but do you really want to take the chance and assume that they are infallible? Let's call it "defensive coding" / "defensive DB administration" / "defensive network administration". > Why? If it's documented, it's a reasonably intuitive, clear, and > probably efficient way. Marks (nulls) are a separate domain from strings. The empty string is just another value out of the set of possible strings but you're arbitrarily redefining it into the domain of marks. Yes, that's a nitpicky detail. However, consider the potential scenario of having a *fixed length* column for middle name -- CHAR(20) instead of VARCHAR(20). MySQL doesn't pad CHAR columns with spaces, but it's my understanding that the "correct" behavior (according to the ANSI spec) is to pad with spaces. I could be wrong about the spec, but I know that some databases such as Sybase *do* pad with spaces. This creates the *practical* problem of someone coming from MySQL (or any other DB that doesn't pad) trying to do the following on a DB that *does* pad: SELECT * FROM whatever WHERE middle_name=''; By assigning a semantic meaning to the *value* of the data (mixing meta-data with data in essence) you've now created another potential pitfall -- just like the NULL situation you describe. Given a choice between the two pitfalls I'd tend to prefer the "cleaner" option. Again, some caution on the part of the programmer, or the DB designer would render this moot via documentation or a different schema design. > So in the example I gave you'd set up a separate middle name table, > making the middle name in it NULL (indicating A-mark) and > have lack of a > row in the middle name table be the I-mark? (I like the A/I-mark That's one way of making the distinction, yes. Another way might be to have another column describing the meaning of a NULL in a particular column: has_middle_name BOOL NOT NULL. I would tend to avoid that option on databases that don't have multi-column constraints though to minimize the possibility that I'll write some buggy code and wind up with conflicting data (middle_name = 'Foo', yet has_middle_name = false)... > notation, by the way -- I hadn't seen it before.) I don't see the I believe the notation comes from E.F. Codd. > advantage, although logically it would do the job. It'd be nasty for > performance and programmatic complexity, though, especially > for a minor > value such as middle name. I'll grant you that the middle name scenario is awfully trivial for such a solution. A better example might be a patient's medical records at a hospital. The medical chart might contain fields such as "num_pregnancies", "post_menopause", "when_prostate_last_examined", and so forth. In this scenario, distinguishing between A-mark and I-mark could be extremely important and you have the benefit that I-mark will apply to numerous columns in aggregate (I.E. either "num_pregnancies" and "post_menopause" are both applicable, or they are both inapplicable -- you can't have a situation where one is applicable and the other is not.). Having a patient table, and gender-specific child tables becomes a much more attractive option in this scenario. > Interesting! I hadn't been aware of any DBMSs that did that. According to their site, FirstSQL is currently the only one. > I'm not > sure that it's a good idea, actually, because I suspect that > there are > other shades in meaning that might cause bugs as well. For And here we come to the big controversy. E.F. Codd liked the idea of having two possible NULLs (A-mark and I-mark). C.J. Date started a ruckus over the issue of whether or not NULL is a good idea back in 1988 or so. The debate has raged since. > instance, is > "no value" really the same as "inapplicable" in all cases? I have > trouble finding an example of this; here's the best I can do -- a > password system. A password might be unknown to the system, > it might be > an empty string (the user needs to explictly enter the fact that the > password is '' somehow), or there might be no password, > indicating that > the user doesn't even want to be prompted for one. I get very > suspicious of scenarios where there are more than two > possibilities -- > it suggests that there might be an unlimited, or at least a large, > number, and that not all possibilities have been clearly identified. I seem to recall reading that an ANSI committee was drafted to answer the question of "how many NULLs are there", and that they identified at least 13 distinct meanings that NULL can be used to represent. I can't testify to the accuracy of that particular anecdote, but it does seem to fit with your intuition. One solution I've seen proposed is to allow the programmer to define NULL as an enumeration. Basically "this value can contain a *value*, or it can contain one of N specific NULL values that I define". -JF -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]