This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Add reshape() for multi-dimensional array reshaping

=head1 VERSION

   Maintainer: Nathan Wiger <[EMAIL PROTECTED]>
   Date: 24 Aug 2000
   Version: 1
   Mailing List: [EMAIL PROTECTED]
   Number: 148
   Status: Developing

=head1 ABSTRACT

Currently, there is no easy way to reshape existing arrays into multiple
arrays or matrices. This makes nifty array manipulation and complex math
hard.

Two RFCs, 90 and 91, describe highly specific solutions. However, these
are non-extensible. A more general-purpose tool that can do arbitrary
multi-dimensional array reshaping is a better choice for core. Other
functions can then simply be specialized forms of this builtin.

=head1 DESCRIPTION

Let's jump in. This RFC proposes a C<reshape> builtin with the following
syntax:

  @reshaped = reshape $x, $y, $i, @array [, @array ...]

Where C<$x> and C<$y> are the length and number of the arrays produced,
respectively (they can also be thought of as the x and y of a matrix,
hence the notation). The C<$i> specifies the interleave of the elements.
0 specifies no interleave, whereas 1 specifies to interleave across
lists. Values of C<$i> greater than 1 currently have no meaning, but one
may be added later.

If called in the one-list form, then that list is split into multiple
other lists. If called with more than one list, then those lists are
joined back together into a single list. In both cases, C<$x>, C<$y>,
and C<$i> are used in the same way to determine how the lists are
reshaped.

The dimensions are subject to the following properties:

  1. Less data than specified causes reshape to
     return undef

  2. More data than specified is silently discarded

If either the C<$x> or C<$y> dimension is undef or 0, then it is assumed
to be a wildcard. See below.

=head2 Single Array Form - SPLIT

When one array is passed in, it is split up. Here, the C<$x> and C<$y>
determine the dimensions of the resulting lists. The C<$i> determines
the interleave. For example, assume reshape is called with the list
(1..23) in the following forms:

   $x,$y,$i  Results
   --------  ------------------------------------------
   3, 2, 0   ( [1,2,3], [4,5,6] ) 
   2, 4, 0   ( [1,2],[3,4],[5,6],[7,8] )
   3, 2, 1   ( [1,3,5], [2,4,6] )
   3, 3, 1   ( [1,4,7], [2,5,8], [3,6,9] )
   14,20,1   undef - not enough data to fill

Notice how each dimension works together to C<reshape> the arrays. As
such, the combination of the arguments is more significant than the
individual arguments themselves. Also, note that any excess data left
over after the dimensions have been fulfilled is discarded. In the final
example, undef is returned, allowing you to easily check if you have
enough data:

   @matrix = reshape 14, 20, 1, @input or die "Not enough data!";
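
A note for anyone prototyping this in today's Perl 5 (an aside, not part
of the proposal): the C<or die> idiom above fires only if the function
returns an empty list. A list assignment in scalar context yields the
number of elements on its right-hand side, and a literal C<undef>
returned in list context counts as one element. The two helper subs
below are hypothetical stand-ins used only to show the difference.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a function that signals failure.
sub returns_undef { return undef }   # list context: the one-element list (undef)
sub returns_empty { return }         # list context: the empty list

my @m;
# A list assignment in scalar context yields the number of
# elements on its right-hand side.
my $count_undef = ( @m = returns_undef() );   # 1, so "or die" would not fire
my $count_empty = ( @m = returns_empty() );   # 0, so "or die" would fire
```

So a Perl 5 emulation would probably want a bare C<return> for the
"not enough data" case.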

In addition, wildcards can be used. With a fixed C<$y>, only that many
lists are returned.  However, with a wildcard C<$y>, any number of
C<$x>-long lists are returned:

   $x,$y,$i  Results
   --------  ------------------------------------------
   4, 0, 1   ( [1,6,11,16], [2,7,12,17],
               [3,8,13,18], [4,9,14,19],
               [5,10,15,20] )               # lose 21, 22, 23

Note that we lose data here because the input does not divide into a
whole number of lists of length C<$x>. With a fixed C<$x>, every list
returned I<must> have exactly that length.

However, with a wildcard C<$x>, lists will be expanded to fill the
number specified by C<$y>, even in mismatched sizes:

   $x,$y,$i  Results
   --------  ------------------------------------------
   0, 2, 1   ( [1,3,5,7,9,11,13,15,17,19,21,23],
               [2,4,6,8,10,12,14,16,18,20,22])   # unzip
   0, 6, 0   ( [1,2,3,4], [5,6,7,8], [9,10,11,12],
               [13,14,15,16], [17,18,19,20],
               [21,22,23] )                      # partition

Here, all the data is guaranteed to be preserved. It is simply split
into exactly the number of parts specified by C<$y>, even if that
results in some lists being smaller.
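
The split rules above can be sketched in plain Perl 5. This is only an
illustration under stated assumptions, not the proposed implementation:
the name C<reshape_split> is hypothetical, the result is a list of array
references, "not enough data" is reported as an empty list, a wildcard
C<$x> without interleave uses a chunk size of ceil(n / C<$y>), and the
case where both dimensions are wildcards (which this RFC does not define
for splitting) simply dies.

```perl
use strict;
use warnings;

# Hypothetical Perl 5 emulation of the single-list (split) form.
# Returns a list of array references, or an empty list when there
# is too little data.
sub reshape_split {
    my ($x, $y, $i, @data) = @_;
    my $n = @data;
    $x ||= 0;    # undef and 0 both mean "wildcard"
    $y ||= 0;
    die "reshape_split: \$x and \$y cannot both be wildcards\n"
        unless $x || $y;

    if (!$x) {
        # Wildcard $x: keep all data, spread over exactly $y lists.
        my $chunk = int(($n + $y - 1) / $y);    # assumed: ceil($n / $y)
        my @out = map { [] } 1 .. $y;
        for my $k (0 .. $n - 1) {
            my $row = $i ? $k % $y : int($k / $chunk);
            push @{ $out[$row] }, $data[$k];
        }
        return @out;
    }

    $y ||= int($n / $x);       # wildcard $y: as many full lists as fit
    return if $x * $y > $n;    # too little data

    my @out;
    for my $row (0 .. $y - 1) {
        for my $col (0 .. $x - 1) {
            # Interleaved: walk across the lists first.
            # Not interleaved: fill one list before starting the next.
            my $k = $i ? $col * $y + $row : $row * $x + $col;
            $out[$row][$col] = $data[$k];
        }
    }
    return @out;
}
```

For example, C<reshape_split(3, 2, 1, 1..23)> yields C<[1,3,5]> and
C<[2,4,6]>, matching the first table.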

=head2 Multiple Array Form - JOIN

In this form, multiple arrays are joined back together. Here, C<$x>,
C<$y>, and C<$i> specify the dimensions to use to rejoin the lists, not
to split them up. The dimensions simply work in reverse: Rather than
specifying how many lists to create, they specify which elements of the
input lists are joined back together.

So, we'll assume the input arrays are:

   ( [1,4,7,10], [2,5,8], [3,6,9] )

Passing these to C<reshape> with the following dimensions gives:

   $x,$y,$i  Results
   --------  ------------------------------------------
   0, 0, 1   ( 1,2,3,4,5,6,7,8,9,10 )      # zip
   0, 0, 0   ( 1,4,7,10,2,5,8,3,6,9 )      # simple concat
   3, 0, 1   ( 1,2,3,4,5,6,7,8,9 )         # 3 vals from all lists
   0, 2, 1   ( 1,2,4,5,7,8,10 )            # all vals from 2 lists
   3, 2, 1   ( 1,2,4,5,7,8 )               # 3 vals x 2 lists

Hopefully this is easy to understand. C<$x> controls how many elements
of each list are used, and C<$y> controls how many lists are used. This
is just like the splitting operation, but in reverse. C<$i> simply
controls whether the elements are interleaved or just concatenated (the
same as C<@a = (@b, @c)>). Again, wildcards (0) can be used here as well.
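
The join rules can be sketched the same way in Perl 5. Again, this is an
illustration only: C<reshape_join> is a hypothetical name, and because
Perl 5 flattens arrays in argument lists, the sketch takes array
references. Treating "fewer than C<$y> lists" or "a list shorter than a
fixed C<$x>" as not enough data (returning an empty list) is one reading
of the rules above, not something this RFC spells out.

```perl
use strict;
use warnings;

# Hypothetical Perl 5 emulation of the multi-list (join) form.
# Takes array references and returns one flat list.
sub reshape_join {
    my ($x, $y, $i, @lists) = @_;
    $y ||= scalar @lists;        # wildcard $y: use every list
    return if $y > @lists;       # assumed: too few lists is an error
    my @used = @lists[0 .. $y - 1];

    my $len = $x || 0;
    if ($x) {
        # Fixed $x: every list used must supply $x elements (assumed).
        for my $l (@used) { return if @$l < $x }
    }
    else {
        # Wildcard $x: take everything, so size by the longest list.
        for my $l (@used) { $len = @$l if @$l > $len }
    }

    my @out;
    if ($i) {
        # Interleave: one element from each list per round,
        # skipping lists that have run out.
        for my $k (0 .. $len - 1) {
            push @out, map { $_->[$k] } grep { $k < @$_ } @used;
        }
    }
    else {
        # No interleave: plain concatenation of the leading elements.
        for my $l (@used) {
            my $last = ($len < @$l ? $len : scalar @$l) - 1;
            push @out, @{$l}[0 .. $last];
        }
    }
    return @out;
}
```

Called as C<reshape_join(3, 2, 1, [1,4,7,10], [2,5,8], [3,6,9])>, this
returns C<(1,2,4,5,7,8)>, the last row of the table.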

=head2 zip, unzip, and partition

RFCs 90 and 91 can now be entirely explained and written as specialized
forms of C<reshape>:

   Function         reshape Equivalent
   ---------------- ----------------------------------
   zip @a, @b       reshape 0, 0, 1, @a, @b
   unzip $y, @c     reshape 0, $y, 1, @c
   part $y, @c      reshape 0, $y, 0, @c

This makes understanding what the operations are doing and how they
differ much easier. It also means that the functions can be extremely
compact, since they need only pass C<reshape> the appropriate arguments.

This also makes it obvious that the existing functions are only working
on the y axis of our imaginary matrix model. This means additional
functions that work just on the x axis may be useful as well. 

=head2 Matrix Calculations and Extensions

It is the opinion of the author that extensive matrix calculations and
manipulations should be left to external modules. A function such as
this should be able to take care of most of the basic functionality
needed to create N-dimensional matrices. However, true matrix functions
should be put in modules.

=head1 IMPLEMENTATION

We'll get to this in v10. :-)

=head1 MIGRATION

None. This introduces new functionality.

=head1 REFERENCES

RFC 81: Lazily evaluated list generation functions 

http://www.mail-archive.com/perl6-language%40perl.org/msg01910.html

Thanks to Uri Guttman for suggesting the APL "reshape" name
