Multiple result sets and the $sth->more_results method (Mark 3)

Tim Bunce Tue, 04 May 2004 02:04:31 -0700

Here's a further update of what I'm thinking of with respect
to $sth->more_results (many thanks for the valuable feedback).


This is quite long because I've tried to consider all the issues.


=head1 Multiple result sets and the $sth->more_results method

There are 3 kinds of batches of SQL statements:

 o Arrays of parameters - where an array of values is bound to
   placeholders in a single statement.
   [This kind of batch is supported via the bind_param_array() and
   execute_array() methods and won't be discussed further here.]

 o Stored Procedures - where a procedure that contains multiple
   statements is executed AND the database is one that returns
   back to the client the results from those embedded statements.

 o Explicit Batches - where the $sql passed to prepare() contains
   multiple statements. The statements are typically delimited by
   semicolons but database specific and that's not relevant here.
   Some databases, such as Sybase, don't make a distinction between
   Explicit Batches and Stored Procedures.

In the text below I'll refer to the individual statements within
a batch as I<sub-statements>.

When a batch is first executed the statement handle appears exactly
as if the batch only contained the first sub-statement.
The only exception is that the $sth->{NUM_OF_PARAMS} attribute
equals the total number of placeholders in the entire statement.

The $sth->more_results method is used to move to the results of the
next sub-statement in the batch. If more_results() returns true then
the statement handle appears exactly as if the batch only contained
the next sub-statement. And so on. The $sth->more_results() method
I<appears> to have the effect of preparing and executing the next
sub-statement within the batch.

Some errors will obviously be detected and reported by the original
prepare() and execute() but others will not be detected till later.
As far as possible more_results() should report errors that would
have been detected and reported by a corresponding prepare() and
execute() of the individual sub-statement.  Even if the driver has
to call it's own databases' more_results() function at the end of
the previous result set, an error relating to the next sub-statement
should be stored internally and not reported till the $sth->more_results()
method is called.

The more_results() method returns false in these situations:
If there is another sub-statement and that sub-statement fails
then
    The error is recorded (which will trigger RaiseError if enabled).
    more_results() returns 0;
else
    # No more result sets. Either all succeeded or an error has caused
    # execution of later statement to be skipped
    more_results() returns undef; 
end

Because the statement handle always appears as if only one statement
has been executed it follows that:

  $sth->err can be used to tell if there was an error, and
  $sth->{NUM_OF_FIELDS} > 0 if the sub-statement is a SELECT, and
  $sth->rows can be used to get the row count if it isn't
             (but see SQL_BATCH_ROW_COUNT below)

Whether the entire batch of statements is executed before any results
are available, or whether calling more_results() actually triggers
the execution of the next sub-statement in the batch, is undefined.

The state of statement handle attributes after more_results() returns
undef is undefined.

Placeholders work as if the entire statement string was a single statement.
If execute() is called with parameter values then the number of values must
match the total number of placeholders in the entire statement string.


=head2 Skipping Non-SELECT Results

When executing stored procedures, for example, applications are
sometimes only interested in the results of SELECT statements executed
by the procedure and not in any INSERTs, DELETEs, or UPDATEs that
it may also have executed. So it would be helpful to make it easy
to skip those non-SELECT results (as DBD::ODBC currently does).

However, extending the DBI specification to support skipping non-SELECT
results from batches is problematic for several reasons, including
that execution of sub-statements is effectively split between the
initial execute() and subsequent calls to more_results().

A complete implementation of skipping non-SELECT results would require
that execute() should call more_results() whenever NUM_OF_FIELDS was 0.
(It can't be limited to apply only to "batches" because many drivers
can't tell if there are multiple results until they try to fetch them.)

So, at least for now, this feature won't be part of the DBI but,
as always, drivers are free to implement their own mechanisms
under the control of driver-specific attributes.


=head2 Driver Metadata via get_info()

Three get_info() types are relevant here: SQL_MULT_RESULT_SETS,
SQL_BATCH_SUPPORT and SQL_BATCH_ROW_COUNT. I'll use the symbolic
names available via the DBI::Const::GetInfoType module:

  use DBI::Const::GetInfoType;

=head3 SQL_MULT_RESULT_SETS

  my $multresults_support = $dbh->get_info($GetInfoType{SQL_MULT_RESULT_SETS});

returns 'Y' is the driver supports multiple result sets and 'N' otherwise.
The likes of MS SQL Server, Sybase and DB2 return 'Y'.

=head3 SQL_BATCH_SUPPORT

  my $batch_support = $dbh->get_info($GetInfoType{SQL_BATCH_SUPPORT});

This returns a bitmask indicating the drivers support for batches:

  SQL_BS_SELECT_EXPLICIT

The driver supports Explicit Batches that can have I<result-set> generating statements.

  SQL_BS_ROW_COUNT_EXPLICIT

The driver supports Explicit Batches that can have I<row-count> generating statements.

  SQL_BS_SELECT_PROC
  SQL_BS_ROW_COUNT_PROC

As above but for Stored Procedures

The likes of MS SQL Server sets all bits, DB2 sets SQL_BS_SELECT_EXPLICIT,
SQL_BS_ROW_COUNT_EXPLICIT and SQL_BS_SELECT_PROC but not SQL_BS_ROW_COUNT_PROC.


=head3 SQL_BATCH_ROW_COUNT

  my $batch_row_count = $dbh->get_info($GetInfoType{SQL_BATCH_ROW_COUNT});

This returns a bitmask indicating how row counts from I<consecutive>
INSERT, DELETE, or UPDATE statements within the batch are made
available:

  SQL_BRC_ROLLED_UP

Row counts for consecutive INSERT, DELETE, or UPDATE statements are
rolled up into one. If this bit is not set, then row counts are
available for each individual statement.

  SQL_BRC_PROCEDURES

Row counts, if any, are available when a batch is executed I<within> a
stored procedure. If row counts are available, they can be rolled
up or individually available, depending on the SQL_BRC_ROLLED_UP bit.

  SQL_BRC_EXPLICIT

Row counts, if any, are available when an Explicit Batch is executed.
If row counts are available, they can be rolled up or individually
available, depending on the SQL_BRC_ROLLED_UP bit.


=head2 Example Implementation via DBI Subclass

In this section I've provided an I<example> proof-of-concept
implementation (in the form of a DBI subclass) as a way of
expressing the behaviour and exploring the issues.

It is I<not> intended to be a model implementation for drivers to
copy because drivers for databases which support multiple results
should just let the database look after it.

(Drivers for other databases can just ignore this whole thread,
at least until the DBI has a working preparse() method :)

I have left a $sth->{SkipNonSelect} attribute in this example but,
as discussed above, I'm not proposing to include it in the spec.

Also, this example doesn't do-the-right-thing with NUM_OF_PARAMS
because it doesn't set it the the total number of params required.


sub prepare {
    my ($dbh, $Statement) = @_;
    my @statements = split /;/, $Statement; # (obviously dumb example)
    my $sth = $dbh->SUPER::prepare(shift @statements); # prepare the first
    $sth->{statements} = [EMAIL PROTECTED]                  # store the rest
        if @statements;
    return $sth;
}

sub execute {
    my $sth = shift;
    $sth->{parameters} = [ @_ ];
    my @params = splice @{ $sth->{parameters} }, 0, $sth->{NUM_OF_PARAMS};
    my $rv = $sth->SUPER::execute(@params);
    if ($sth->{SkipNonSelect} && $sth->{NUM_OF_FIELDS} == 0) {
        # Application wants to skip this non-SELECT statement,
        # so call more_results() to see if there any any SELECTS,
        # if so we'll return the result of more_results(), but if not
        # then we'll return the original $rv (so this is transparent)
        my $more_rv = $sth->more_results();
        return $more_rv if defined $more_rv;
    }
    retun $rv;
}

sub more_results {
    my ($sth) = @_;
    my $statements = $sth->{statements}
        or return undef; # doesn't have multiple statements
    next_results:
    my $Statement = shift @$statements
        or return undef; # no more statements left
    $sth->prepare($Statement)
        or return 0;    # prepare failure
    my @params = splice @{ $sth->{parameters} }, 0, $sth->{NUM_OF_PARAMS};
    my $rv = $sth->execute(@params)
        or return 0;    # execute failure
    if ($sth->{SkipNonSelect} && $sth->{NUM_OF_FIELDS} == 0) {
        goto next_results;
    }
    return $rv; # must be true but is otherwise undefined
}

sub finish {
    my ($sth) = @_;
    $sth->SUPER::finish; # discard any unfetched from current sub-statement
    $sth->{statements} = undef; # discard any unexecuted sub-statements
    $sth->{parameters} = undef; # discard any unused parameters
    return 1;
}

If a batch of statements update the values of placeholders which
are bound using bind_param_inout() the output values are only
guaranteed to be available after all results have been processed
(ie when more_results() returns undef).  Though some drivers may be
able to update the values sooner, or update them more than once.

[In practice a driver may have to call the 'more_results' function
of the underlying database API at the end of each result set
in order to correctly implement the desired behaviour. In that case
it would typically just set a flag and still defer the preparing
of the new statement attributes until $sth->more_results is called.]


=head2 Application Examples

The generic loop to fetch multiple result sets is like this:

  $sth->execute;
  do {
      if ($sth->err) {
          ...handle sub-statement error... (if RaiseError not set)
      }
      elsif ($sth->{NUM_OF_FIELDS}) {
          while (@data = $sth->fetchrow_array) {
              ...
          }
      }
      else {
          printf "Non-SELECT: %d rows\n", $sth->rows;
      }
  } while (defined $sth->more_results);

If the application is happy to exit the loop if any sub-statement
trigers an error then the first C<if> and the C<defined> can be removed.
Then the code reduces to:

  $sth->execute or ...;
  do {
      if ($sth->{NUM_OF_FIELDS}) {
          while (@data = $sth->fetchrow_array) {
              ...
          }
      }
      else {
          printf "Non-SELECT: %d rows\n", $sth->rows;
      }
  } while ($sth->more_results);

and, if you've no interest in any non-SELECT results that might be
returned, then that simplifies down to:

  $sth->execute or ...;
  do {
      next unless $sth->{NUM_OF_FIELDS} > 0;
      while (@data = $sth->fetchrow_array) {
          ...
      }
  } while ($sth->more_results};
 
or even, for the do-loop-phobic with no non-SELECT results, just:

  $sth->execute or ...;
  while ($data = $sth->fetchrow_arrayref
     or $sth->more_results && $data = $sth->fetchrow_arrayref
  ) {
      ...
  }
 
=head2 Active

The interaction between the Active attribute, the finish method,
and row fetch errors needs to be reconsidered in relation to batches.

The finish() method will discard I<all> pending result sets.
The more_results() method automatically discards any unfetched
rows from the current sub-statement.

The statement handle Active attribute is used to indicate that there
is active state on the database server associated with this handle.
In other words that there is more data to be fetched, and that
finish() should be called before the statement handle is destroyed.

The principles described at the start mean that $sth->{Active}
I<should> go false at the end of I<each> set of SELECT results.
However, that goes against the definition of what Active means
because in a batch there I<is> still active state on the server.

So Active will remain true while there is active state on the server.
This looser definition gives more flexibility to the driver and
should discourage applications from making inappropriate assumptions.

The original principle still applies: Active should automatically
go false after results from a statement handle are fully consumed,
and $sth->finish can be used to force it to false if required.

This means:

  Active should be false after execute() of a single non-SELECT
  statement, or a stored procedure that returns a single non-SELECT
  statement row count. Applications shouldn't need to call finish()
  in these cases.

  Active can be true for a non-SELECT sub-statement (within a batch),
  which isn't the case currently. But no code should be relying on that.

=head2 Non-terminal Row Errors

A non-terminal row fetch error occurs where a row has an error,
like a truncation or divide by zero, I<but more rows can still be fetched>.

Currently that can be distinguished from a terminal error (where
more rows I<can't> be fetched) because Active remains true. But now
that Active will stay true till the end of a batch that can't be used.

This isn't a big issue because (a) very few applications need to
distinguish recover from non-terminal row fetch errors, and
(b) it's always possible to just try to fetch the next row.
So I'm inclined to ignore it, at least for now.

=cut


Comments?

Tim.

p.s. Many thanks to Michael Peppler, Jeff Urlwin and Martin Evans for their
valuable research and assistance.

Multiple result sets and the $sth->more_results method (Mark 3)

Reply via email to