Re: RFC 213 (v1) rindex and index should return undef on failure

Glenn Linderman Sun, 17 Sep 2000 15:53:26 -0700
Chaim Frenkel wrote:

> But you would still have to catch the exception. Not a nice thing to
> terminate the program just because an expected mismatch occurred.

Sometimes it is, sometimes it isn't.

> Not finding something is not exceptional.

Sometimes it is, sometimes it isn't.  Why were you looking for it, if you didn't
expect to find it?

> Neither is EOF on a file, or working with an empty list. Adding all these
> exceptions for non-exceptional and quite common scenarios is bothersome.

This is truly one of the conundrums of programming.  Let me use sequential file
reading as the example scenario.  The "normal" thing to do in such a program is to
read the next record, and process it somehow.  The processing sometimes gets quite
involved, of course, but let's not dwell on that here.  So we have the following
loop, not fully structured programming, and no error handling, exception handling,
or any such thing.

readloop:
  { $length = read ( FILE, $buffer, $want );
    if ( $lawyer  ||  $bill_collector )
    { & due_process ( $buffer, $length );
    } else
    { & do_process ( $buffer, $length );
    }
    goto readloop;
  }

Now the loop works fine, but somehow, we need to get out of the loop when we
encounter an error, or end-of-file.  Many programs, including many perl programs,
actually do this incorrectly, but it is "good enough" most of the time: they treat
_all_ errors during file reading as end-of-file.  This is _not_ good enough all of
the time.  We're not writing airplane controls most of the time, but we should
stay aware of the issue, even though we usually ignore it.

So there are alternatives:

(1) we could check $length for zero, and decide that there was nothing left to
read.
(2) we could modify the goto to have a modifier: "if ! eof ( FILE )".
(3) we could check $! just after the call to read to determine if there was an
error [should do this in any case], and if it was EOF, could branch out of the
loop.

I posit that the "correct" solution should be (2)... that you _should not_, as a
matter of course, interpret error codes as a way of choosing the control flow of
the program.  Error codes (and exceptions, when used to report errors) should
alter the control flow of the program in the sense of reporting the error so that
appropriate, but unusual, action can be taken.

So we should check $length, and we should check $!, but finding $length != $want,
and $! != 0 should both be treated as out-of-the-ordinary conditions -- errors
should not happen in well-designed programs.

Of course, most people would write the loop

   while ( $length = read ( ... ))
   { ... }

or, for line oriented stuff (rather than record oriented)

   while ( <FILE> ) { ... }

Such techniques build in a check for errors, but inappropriately mistreat _any_
error as EOF.  To avoid mistreating errors that way, such a loop should be
followed by

   if ( $!  != 0 ) { #< handle error ># }

Do you often see that coded?  Admittedly, non-EOF disk file errors are rare in these
days of reliable storage.  But not all files are disk files, and not all errors
are EOF.
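
As a minimal sketch of keeping errors and EOF apart in the record-oriented case:
Perl's read returns the number of bytes read, 0 at end-of-file, and undef on
error, so the two can be told apart without consulting $! after the fact.  The
helper name read_records and its callback argument are hypothetical, made up
for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: read records of up to $want bytes from $fh and hand
# each one to the $process callback.  A genuine read error is reported via
# die instead of being silently treated as end-of-file.
sub read_records {
    my ($fh, $want, $process) = @_;
    while (1) {
        my $length = read($fh, my $buffer, $want);
        die "read failed: $!\n" unless defined $length;  # undef means error
        last if $length == 0;                             # 0 means end-of-file
        $process->($buffer, $length);
    }
    return;
}
```

A short final record still reaches the callback with its actual length, so the
caller can decide whether $length != $want is acceptable.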

So there is quite a bit of sloppiness in most code, regarding error handling.
That's not a nice thing either.

So if (optionally, pragma Throw, similar to pragma Fatal, see RFC 119) all errors
could be thrown as exceptions, it would allow programmers to force themselves to
be less sloppy about error handling when (like airplane controls) they really
should be precise.

And actually, once you get the habit of coding that way, it really isn't even any
harder.  Of the three choices above, none are particularly hard to code.  (2) is
not harder than (1), or harder than (3).  In fact, it might be easier.  But of
course (2) doesn't add error checking, but neither does (1) or (3)... they all add
"if something goes wrong, then pretend it was EOF and exit the loop".  Clearly, if DASD
were less reliable, we'd see fewer programs written those ways, and we'd get
better diagnostics when something goes wrong, rather than just less output from
our programs.

To modify the above loop to do error handling properly would take several
additional lines of code, no matter what technique was used to code it.  Using the
syntax of RFC 119, and assuming that all errors turn into thrown exceptions, you
can separate the normal logic flow and the error logic flow as follows:

   while ( ! eof ( FILE ))
   { $length = read ( FILE, $buffer, $want );
      if ( $lawyer  ||  $bill_collector )
      { & due_process ( $buffer, $length );
      } else
      { & do_process ( $buffer, $length );
      }
   }
   catch FileError { #< handle error ># };

(N.B.  I didn't write the several lines, just used the placeholder: #< handle
error >#)
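
The catch syntax above is the proposed RFC 119 form.  As a rough sketch of the
same separation in Perl 5 as it stands, assuming die and eval as the exception
mechanism and a line-oriented file, something like the following keeps the eof
precheck in the normal flow and the error handling outside it.  The names
process_lines and process_lines_safely are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: process every line of $fh.  eof is prechecked, so an
# undef from readline inside the loop can only mean an error, never EOF.
sub process_lines {
    my ($fh, $process) = @_;
    until (eof($fh)) {
        my $line = readline($fh);
        die "readline failed: $!\n" unless defined $line;
        $process->($line);
    }
    return;
}

# The normal logic flow runs inside eval; the error flow is handled after it.
sub process_lines_safely {
    my ($fh, $process) = @_;
    my $ok = eval { process_lines($fh, $process); 1 };
    warn "file error: $@" unless $ok;   # placeholder for real error handling
    return $ok ? 1 : 0;
}
```

Note that eval catches anything thrown inside it, including errors raised by
the callback, so it is broader than a catch restricted to FileError.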

So really, the point is that it is nice to precheck the conditions and avoid error
handling, whether it be via error codes, or via exception handling.

File handling is pretty obvious: usually you get data, not EOF, so it is pretty
obvious where the exception handling should go.  But regarding the "index" method,
I agree that some of the time you are searching for the existence of a string, and
you don't really expect to find it -- that neither finding it, nor not finding it
is unexpected.  In the example John coded, with the result of "index" being passed
directly to "substr", it would be better if index threw an exception on error.
But in other situations where index might be used, that wouldn't always be true.
"index" is one of those functions that has multiple outcomes some or all of which
might be acceptable in some circumstances, and some or all of which might be
unacceptable in some circumstances.  And that is the true conundrum: the function
as written is not as convenient to use in all circumstances as it could be.
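
To illustrate the hazard of feeding index straight into substr (a made-up
example, not John's actual code; the name tail_from is hypothetical): Perl 5's
index returns -1 when nothing is found, and substr treats a negative offset as
counting from the end of the string, so the failure passes silently.

```perl
use strict;
use warnings;

# Hypothetical example: return everything from $marker to the end of $text.
# There is deliberately no check on the result of index.
sub tail_from {
    my ($text, $marker) = @_;
    return substr($text, index($text, $marker));
}

print tail_from("key=value", "="), "\n";   # prints "=value"
print tail_from("keyvalue",  "="), "\n";   # prints "e": index returned -1
```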

The only solutions I have to this matter are:

(A) to add an additional parameter to index (and similar functions) and declare
the types of results which are acceptable in the circumstances, so that index can
then throw exceptions if the expected sorts of results are not found, and return
"normal" information in the cases which are acceptable.

or (B) to make independent functions one of which treats not finding the string as
success, one of which treats finding the string as success, one of which treats
both as success.  "index" is written as the third case, at the moment, but is
often used (per John's example) as in the second case.

Given "index" as written, wrappers could be written to handle the other cases, but
instead of doing so, and instead of fully checking for all the possible errors,
programmers are lazy and make unchecked assumptions about the content of the data
they will be processing, and then wonder what went wrong when they get data that
doesn't meet those assumptions.
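
As a sketch of what such wrappers might look like under option (B), assuming
die as the way to throw and Perl 5's convention that index returns -1 when
nothing is found: the names index_found and index_absent are hypothetical, and
plain index remains the "both outcomes are fine" case.

```perl
use strict;
use warnings;

# Hypothetical wrapper: finding the substring is the only acceptable outcome.
sub index_found {
    my ($text, $substr, $start) = @_;
    my $pos = index($text, $substr, defined $start ? $start : 0);
    die "substring '$substr' not found\n" if $pos < 0;
    return $pos;
}

# Hypothetical wrapper: NOT finding the substring is the only acceptable outcome.
sub index_absent {
    my ($text, $substr) = @_;
    my $pos = index($text, $substr);
    die "substring '$substr' unexpectedly found at $pos\n" if $pos >= 0;
    return;
}
```

With index_found, an expression such as substr($text, index_found($text, "="))
fails loudly on unexpected data instead of quietly returning the wrong piece of
the string.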

I agree with your concern that exception handling is (generally) more expensive
than error codes.  However, I see it as a good expenditure of the fast CPUs of
today, as a tradeoff towards reliable processing.  And maybe in Perl6 exception
handling could be less expensive than it is (by comparison to error codes) in
other languages?  That's a question for the internals guys, of course.

--
Glenn
=====
Even if you're on the right track,
you'll get run over if you just sit there.
                       -- Will Rogers


