Re: RFC 263 (v1) Add null() keyword and fundamental data type

Glenn Linderman Thu, 21 Sep 2000 13:52:24 -0700
Buddha Buck wrote:

> Ok, let's see if I can make some sense of this...

Thanks for trying.  I think you have.

> You want a singleton scalar datatype in addition to the existing scalar
> datatypes of strings, numbers, references, filehandles, and undef that
> represents an unknown value, similar in semantics to the SQL notion of "NULL".
>
> I'm going to call this prototypical datatype/value "unknown", in order to
> represent its meaning in a more perlish way, as well as to avoid the
> overloaded semantics of NULL (and its related near-homonyms: SQL's unknown
> NULL, C's NULL invalid pointer, Lisp's NIL, ASCII's NUL, the null string,
> the null list, etc).  Calling anything NULL these days is likely to be
> confusing, so I'll avoid it.

Correct description; I don't really care what it's called; the concept is what is
important.

> This doesn't matter as far as database work goes because DBI can convert
> between SQL NULL and perl unknown just as easily as it can convert between
> SQL NULL and perl undef.

Correct.

> I just reread RFC 263, and I do have some unanswered questions.  How
> pervasive is this "unknown" value?  If $a is unknown, what about $a
>
> Given:
>
> $a = unknown;
> print "\$a is ", ($a ? "true" : "false"), "\n";
>
> What should print?

Well, that's an interesting question, and gives me an inspiration.  A "ternary"
operator is nicely useful with a binary logic system.  It would seem that a
"quaternary" operator would be more useful with a ternary logic system.  Yet SQL
suffers along with binary operators and a ternary logic system.  Buddha, I think
you've hit the nail on the head about why so many people find SQL NULL so hard to
deal with.

So first, I'll answer your question as stated, and then I'll expound on the
inspiration above.

For your example, "false" should print.  This is because $a is not true, so the
"true" branch should not be taken, so the else branch is taken instead.  However,
the text printed is misleading, because the value isn't false, but unknown.  The
text should be changed to "not true", or "false or unknown", to be useful in a
ternary logic system.
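
A minimal Perl 5 sketch of that answer, not the RFC's implementation: the
Unknown package and the describe() helper are hypothetical names, and making
unknown evaluate as "not true" in boolean context is an assumption taken from
the paragraph above.

```perl
use strict;
use warnings;

{
    package Unknown;    # hypothetical stand-in for the RFC's unknown value

    use overload
        'bool' => sub { 0 },            # assumption: unknown is "not true"
        '""'   => sub { 'unknown' };

    my $singleton;
    sub value { $singleton //= bless( {}, __PACKAGE__ ) }
}

# Label a value without collapsing unknown into false.
sub describe {
    my ($v) = @_;
    return 'unknown' if ref($v) eq 'Unknown';
    return $v ? 'true' : 'false';
}

my $val = Unknown::value();
print "\$val is ", ( $val ? "true" : "false" ), "\n";    # prints "$val is false"
print "\$val is ", describe($val), "\n";                 # prints "$val is unknown"
```

The first print takes the else branch exactly as described, and its label is
the misleading one; the second needs a three-way test to say what it means.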

The exposition on the inspiration:  The basic problem that results from ternary
logic is that there are three possible results: true, false, and unknown.
Structured programming based on binary logic has given us operations that consider
only two possible results, true and false, that being all there is in binary
systems.

So the basic if/then/else construct that perl, SQL, and most other programming
languages have, and the ternary operator, really don't allow simple expression of
conditionals when using a ternary logic system.

So one could extend the logic programming construct, say with "otherwise" (I
hesitate to reuse "unknown" for both the data value and the keyword), resulting in

    if ( ternary_condition )
    # then
    { # true part
    }
    else
    { # false part
    }
    otherwise
    { # unknown part
    }

Or, a switch-like statement could be similarly used/extended.  Such constructs
would quite possibly make it easier to write programs based on ternary logic.  The
historical problem with using binary constructs with ternary logic is the need to
repeat part of the condition in one leg of the binary structure to further
subdivide it into the remaining two clauses needed, and with complex conditions,
that becomes more complex quickly.  Ternary structured constructs could simplify
this.
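
Perl has no "otherwise" leg, so the following is only a rough Perl 5
approximation of the idea using code references.  UNKNOWN, is_unknown(),
tri_gt(), tri_if() and classify() are all hypothetical names invented for this
sketch.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical sentinel: one unique reference stands for "unknown".
use constant UNKNOWN => \'unknown';

sub is_unknown {
    my ($v) = @_;
    return ( ref($v) && refaddr($v) == refaddr(UNKNOWN) ) ? 1 : 0;
}

# Greater-than that propagates unknown instead of guessing.
sub tri_gt {
    my ( $left, $right ) = @_;
    return UNKNOWN if is_unknown($left) || is_unknown($right);
    return $left > $right ? 1 : 0;
}

# Three-way branch: runs exactly one of the then / else / otherwise legs.
sub tri_if {
    my ( $cond, %leg ) = @_;
    my $which = is_unknown($cond) ? 'otherwise'
              : $cond             ? 'then'
              :                     'else';
    my $code = $leg{$which};
    return $code ? $code->() : ();
}

sub classify {
    my ($age) = @_;
    return tri_if(
        tri_gt( $age, 17 ),
        'then'      => sub { 'adult' },
        'else'      => sub { 'minor' },
        'otherwise' => sub { 'age not recorded' },
    );
}

print classify($_), "\n" for 21, 10, UNKNOWN;
```

The condition is written once; with a plain if/else the unknown case would need
a second test that repeats part of it.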

> I think the example of:
>
>    die "Fatal, \$name is unset!" if ($name == null);
>
> is flawed.  It will never die, because as you said, two unknown values
> won't compare as equal.  Besides, the test would try to convert both $name
> and null to numbers before doing the numerical comparison, so it would
> depend on what the numerical value of unknown is.  You really need a
> "known()" built-in to go with this, such that known($a) is true if $a is
> NOT unknown.

Yes, known(), that's consistent with "unknown", and would substitute for
"isnull()" which was mentioned by others.  I agree that example is flawed.
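
To make the flaw concrete, here is a sketch in Perl 5 under some loud
assumptions: the Unknown package is a hypothetical stand-in for the proposed
value, its "==" is overloaded so that any comparison with unknown is itself
unknown, and known() is written as an ordinary sub rather than a built-in.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

{
    package Unknown;    # hypothetical stand-in for the RFC's unknown value

    use overload
        '=='   => sub { $_[0] },    # comparing anything with unknown yields unknown
        'bool' => sub { 0 };        # assumption: unknown is "not true"

    my $singleton;
    sub value { $singleton //= bless( {}, __PACKAGE__ ) }
}

# known($v) is true unless $v is the unknown value.
sub known {
    my ($v) = @_;
    return ( blessed($v) && $v->isa('Unknown') ) ? 0 : 1;
}

# The flawed test: the comparison itself is unknown, so the die never fires.
sub flawed_check {
    my ($name) = @_;
    die "Fatal, \$name is unset!\n" if $name == Unknown::value();
    return 'passed';
}

# The repaired test asks about knownness directly.
sub known_check {
    my ($name) = @_;
    die "Fatal, \$name is unset!\n" unless known($name);
    return 'passed';
}
```

Under these assumptions flawed_check() returns 'passed' even for an unknown
$name, while known_check() dies for it and passes for any ordinary value.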

> What gets me is that the implementation of this would require virtually
> every operator, function, etc in core perl to be special-cased to deal with
> the unknown value, yet the RFC makes no mention of this.

Yes, it is true that virtually every operator would have to change.  This is not a
trivial matter.  The closest the RFC comes to mentioning it is "that have the
proper semantics" in the implementation section.

> With undef, it's simply a matter of having the internal representation of
> undef return 0 or "" when asked for a numerical or string value.  This
> makes most things deal with undef nicely -- even booleans.  With unknown,
> since it is specifically designed to propagate, everything would have to
> deal with unknown values, not just integers, or strings,  or booleans.

Yes.

> Importantly, unless you decide something arbitrary like "unknown is false"
> like the way that it was decided that "undef is false", then you throw out
> the law of excluded middle (every expression is either true or false), and
> make things like the ?:, ||, or && operators go all wiggly (not to mention
> if, while, unless, and so forth).  And if you do arbitrarily say "unknown
> is false", how do you deal with the cases where you want to say "I don't
> know if it's true or false"?
>
> SQL gets away with this by saying that boolean contexts require a boolean
> value, which you get by using a relational operator.  People don't go "IF
> variable THEN..." unless they know that variable will be boolean -- and
> can't be NULL.  And they then decided that using a relational operator on
> NULL will always yield FALSE.  That works for them.

I think you've described some of the issues with dealing with ternary logic in a
binary structure accurately.

> But that's not Perl.

Well, that's not today's Perl, for sure.  The real question here is whether we
can implement the RFC in such a way that, although it perturbs the implementation
of all the Perl ops, everything stays compatible when there are no unknown values
in the data, and unknown is handled properly when there are.

> Perl programmers like functions that return a useful
> but true value on success, or undef on failure, and are quite comfortable
> going:
>
> $var = f();
> if ($var) { g($var) };   # f was a success
>
> That won't work if $var can end up being unknown.

Sure.  Nothing precludes that.  It's up to the interface definition of f(), and if
f() chooses to only return true or undef, there is nothing in this RFC that alters
it or prevents it.

> Worse, fixing it by saying
>
> if (defined($var)) { g($var) };
>
> doesn't help, because the RFC says that defined(unknown) is unknown!

If f() is altered to return unknown, then the correct "fix" would involve your
"known()" function, not "defined()".

> Your unknown seems to be very special-case for doing SQL-based DB work.

Yes, that is the exact application.  DBI is a great module collection.  It could
be built into a wonderful DB language.  Especially if Perl could deal with SQL
NULL.

> In all my time programming it, I can't remember wanting it.  It doesn't seem
> to integrate with the rest of Perl all that well, requiring massive changes
> under the hood to integrate it and the potential for messing up lots of
> long-standing Perl idioms, for a relatively small benefit.

Yes, most Perl idioms individually provide only a small benefit.
Collectively, they become a powerful language.  No doubt someone will take my
recently posted sample demonstrating a real albeit small benefit, and conclude
that it just isn't worth the effort.  My sample was small, because I didn't want
to post whole scripts, and therefore the benefit is small.  And maybe it isn't
worth it.

Proper implementation of this RFC would seem to require (1) significant work under
the hood and (2) careful design choices to avoid perturbing lots of long-standing
Perl idioms.  Or maybe it could be implemented as a module that produces objects
with correctly overloaded semantics.  Damien Neil has suggested something along
that line.  I haven't yet had time to pursue it fully, but it looks extremely
interesting, and perhaps could even be implemented in Perl5 for basic operators,
if the overloading semantics are strong enough.  I haven't yet experimented with
the perl overload capabilities enough to know.
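
As a starting point for such an experiment, here is a small untuned sketch
using the standard Perl 5 overload pragma.  It is not anyone's posted code: the
Unknown package, _propagate() and value() are hypothetical names, and only the
operators listed in the qw() are covered.

```perl
use strict;
use warnings;

{
    package Unknown;    # hypothetical class name

    # Any listed operation involving unknown just hands back unknown.
    sub _propagate { $_[0] }

    use overload
        'bool' => sub { 0 },            # assumption: unknown is "not true"
        '""'   => sub { 'unknown' },
        map { ( $_ => \&_propagate ) }
            qw( + - * / % ** == != < <= > >= eq ne lt le gt ge ! neg );

    my $singleton;
    sub value { $singleton //= bless( {}, __PACKAGE__ ) }
}

my $price = Unknown::value();
my $total = 10 + $price * 2;            # unknown propagates through both ops

print "total: ", $total, "\n";          # prints "total: unknown"
print "ordinary arithmetic still works: ", 10 + 4 * 2, "\n";
```

Code that never touches an Unknown object runs exactly as before, which is the
compatibility property asked for earlier in this message.  As far as I know,
defined(), &&, || and ?: cannot be overloaded in Perl 5, so this approach can
only cover part of what the RFC describes.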

> Damian mentioned that his Q::S package and RFC would/could provide
> something with similar semantics, and his RFC would also likely result in
> massive changes under the hood, but it also provides a large generally
> useful functionality (and in CONSTANT TIME, too ;-).  It is unclear as of
> yet if the benefit of Q::S will outweigh the probable cost of Q::S.  Can
> you make as strong a case for unknown?

I'm not sure how Damian achieves that CONSTANT TIME business, but he apparently
can do some/all of it in a module, so I'm not sure that massive changes under the
hood are needed.  But maybe it is a tradeoff between massive changes under the
hood vs.  massive changes above the hood using (today's slow) Perl objects.

Damian's code seems to require much additional logic beyond this RFC; I really
don't know how massive the cost is.  He already has a perl5 module that does at
least some of it, though I haven't measured its size or completeness.  If his is
simple, maybe this RFC is even simpler.

--
Glenn
=====
Even if you're on the right track,
you'll get run over if you just sit there.
                       -- Will Rogers

