RFC 136 (v2) Implementation of hash iterators

Perl6 RFC Librarian Mon, 04 Sep 2000 14:32:15 -0700
This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Implementation of hash iterators

=head1 VERSION

  Maintainer: Tom Hughes <[EMAIL PROTECTED]>
  Date: 20 Aug 2000
  Last Modified: 4 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Version: 2
  Number: 136
  Status: Developing

=head1 ABSTRACT

Perl 5 makes very limited use of iterators internally. It has long
been suggested that certain common idioms could be implemented much
more efficiently using iterators and this RFC suggests some possible
mechanisms for achieving this goal.

=head1 DESCRIPTION

=head2 How iterators work in perl 5

In perl 5 every hash has exactly one iterator, which can be used
explicitly by the programmer using the each function and which is
also used by perl implicitly to implement the keys and values
functions.

This is confusing and dangerous as you can easily get an action
at a distance effect whereby using each/keys/values in a function
affects the iteration being performed by the caller.

=head2 How iterators might work in perl 6

In perl 6 the keys and values functions should no longer use the
same iterator as the each function - each use of keys and values
should use it's own private iterator instead.

In fact those iterators should probably be lazy so that a piece
of code like this:

  foreach (keys %x)
  {
    ...
  }

does not immediately iterate over the whole hash, pushing copies
of all the keys onto the stack; but should instead create an
iterator object of some sort that can be passed to the loop op
which will then pull off one value for each pass of the loop.

This is significantly more efficient for large hashes, especially
if the loop exits early.

=head2 Freezing state for keys and values efficiently

The only problem with this system is that the current implementation
of keys and values causes the result to be frozen so that changes to
the hash made in the loop body do not effect the values being iterated
over.

I believe this problem can be solved by using the vtable for the
hash to wrap any mutating functions with a completion routine that
will advance the iterator to completion, creating a temporary list
of copied keys/values that it can then continue to iterate over.

Obviously after this completion was done the wrapper routine would
call the original vtable routine to update the hash. It would also
remove itself from the vtable so you would only pay the cost of the
wrapper on the first change to the hash. Equally you would only pay
for building the temporary list when it was necessary - ie when you
change the hash inside the loop.

For keys the iterator would need to trap insertions and deletions
and for values it would need to trap these and also modifications of
existing values.

Because this mechanism can sit above the main hash implementation
it should work equally for both builtin hashes and for things like
tied hashes where implementation details are unknown to the core.

=head2 Handling tied hashes

The current tied hash interface is only capable of supporting a
single iterator via the FIRSTKEY and NEXTKEY functions.

In order to support the enhanced iteration described above it is
suggested that the tied hash interface be extended but the precise
details of that are a matter for a language RFC rather than this
document.

Certain underlying structures with which tied hashes are used (for
example some types of dbm file) only support a single iterator which
makes support of an enhanced tied hash interface difficult.

It is however possible to work around this limitation using some
form of buffering. This could be done either by the tied hash
implementation or, more helpfully, at the perl level if the tied
hash indicated that it could only support a single iterator. There
are basically three ways that this buffering could work:

=over 4

=item Perl 5 style - always buffer all the items

This technique requires that when creating an iterator for a hash
which only supports single iteration that the underlying iterator
should be reset and then advanced to completion, creating a list
of values which is saved and then iterated over by the perl level
iterator.

=item Buffer previously read values

This form of buffering works by having the first iterator save
each value as it iterates. If a second iterator is then required
it can begin work by traversing the saved values and then if
necessary moving on to use the underlying iterator - the most
advanced of all the iterators is responsible for reading and
saving values from the underlying iterator.

The buffer values can be discarded whenever the number of
iterators in use falls to zero.

=item Buffer pending values when necessary

The final form of buffering works by having the first iterator
work as normal, retrieving and returning values from the real
underlying iterator, until such time as a second iterator is
created.

At this point the underlying iterator will be advanced to the
end and the result values cached. The second logical iterator
can then reset the underlying iterator and start to use it while
the original logical iterator can continue by using the buffered
values.

It should be possible to share the buffering of the tail values
between multiple iterators as appropriate and to destroy the
buffer only when all the iterators are destroyed.

=back

Overall I believe that the third of these provided the best
results - it is free until a second iterator is created and
then only buffers as much as is necessary to support the goal
of multiple iteration.

=head1 CHANGES

=over 4

=item *

    Added a discussion of tied hash issues and ways of
    resolving them

=item *

    Set the status as "Developing"

=back

=head1 REFERENCES

RFC 123: Builtin: lazy

perlfunc manpage for discussion of each, keys and values
RFC 136 (v2) Implementation of hash iterators

Reply via email to