This and other RFCs are available on the web at
http://dev.perl.org/rfc/
=head1 TITLE
Add reshape() for multi-dimensional array reshaping
=head1 VERSION
Maintainer: Nathan Wiger <[EMAIL PROTECTED]>
Date: 24 Aug 2000
Version: 1
Mailing List: [EMAIL PROTECTED]
Number: 148
Status: Developing
=head1 ABSTRACT
Currently, there is no easy way to reshape existing arrays into multiple
arrays or matrices. This makes nifty array manipulation and complex math
hard.
Two RFCs, 90 and 91, describe highly specific solutions. However, these
are not extensible. A more general-purpose tool that can do arbitrary
multi-dimensional array reshaping is a better choice for core. Other
functions can then simply be specialized forms of this builtin.
=head1 DESCRIPTION
Let's jump in. This RFC proposes a C<reshape> builtin with the following
syntax:
    @reshaped = reshape $x, $y, $i, @array [, @array ...]
Where C<$x> and C<$y> are the length and number of the arrays produced,
respectively (they can also be thought of as the x and y of a matrix,
hence the notation). C<$i> specifies the interleave of the elements:
0 means no interleave, whereas 1 means elements are interleaved across
lists. There is currently no meaning for C<< $i > 1 >>, but one may be
added later.
If called in the one-list form, then that list is split into multiple
other lists. If called with more than one list, then those lists are
joined back together into a single list. In both cases, C<$x>, C<$y>,
and C<$i> are used in the same way to determine how the lists are
reshaped.
The dimensions are subject to the following properties:
    1. Less data than specified causes C<reshape> to return undef
    2. More data than specified is silently discarded
If either the C<$x> or C<$y> dimension is undef or 0, then it is assumed
to be a wildcard. See below.
=head2 Single Array Form - SPLIT
When one array is passed in, it is split up. Here, the C<$x> and C<$y>
determine the dimensions of the resulting lists. The C<$i> determines
the interleave. For example, assume reshape is called with the list
(1..23) in the following forms:
    $x,$y,$i   Results
    --------   ------------------------------------------
    3, 2, 0    ( [1,2,3], [4,5,6] )
    2, 4, 0    ( [1,2], [3,4], [5,6], [7,8] )
    3, 2, 1    ( [1,3,5], [2,4,6] )
    3, 3, 1    ( [1,4,7], [2,5,8], [3,6,9] )
    14,20,1    undef - not enough data to fill
Notice how each dimension works together to C<reshape> the arrays. As
such, the combination of the arguments is more significant than the
individual arguments themselves. Also, note that any excess data left
over after the dimensions have been fulfilled is discarded. In the final
example, undef is returned, allowing you to easily check if you have
enough data:
    @matrix = reshape 14, 20, 1, @input or die "Not enough data!";
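The index arithmetic behind the fixed-dimension rows can be sketched in plain Perl 5. This is only an illustration, not the proposed implementation: C<reshape_fixed> is a hypothetical name, it handles only the one-list form with non-zero C<$x> and C<$y>, and it assumes the "undef" failure is an empty list (bare C<return>), since a one-element list holding undef would make the list assignment true and defeat the C<or die> idiom.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl stand-in for the proposed builtin; handles only
# the one-list form with fixed, non-zero $x and $y.
sub reshape_fixed {
    my ($x, $y, $i, @data) = @_;
    return if @data < $x * $y;      # too little data: empty list (assumption)
    my @out;
    for my $row (0 .. $y - 1) {
        for my $col (0 .. $x - 1) {
            # $i == 0: consecutive runs; $i == 1: deal round-robin across lists
            my $idx = $i ? $col * $y + $row : $row * $x + $col;
            push @{ $out[$row] }, $data[$idx];
        }
    }
    return @out;                    # excess data is simply never read
}

my @plain  = reshape_fixed(3, 2, 0, 1 .. 23);   # ( [1,2,3], [4,5,6] )
my @woven  = reshape_fixed(3, 3, 1, 1 .. 23);   # ( [1,4,7], [2,5,8], [3,6,9] )
my @matrix = reshape_fixed(14, 20, 1, 1 .. 23)
    or print "Not enough data!\n";
```

With C<$i> set to 1 the only change is which index is read: element C<$col * $y + $row> instead of C<$row * $x + $col>.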
In addition, wildcards can be used. With a fixed C<$y>, only that many
lists are returned. However, with a wildcard C<$y>, any number of
C<$x>-long lists are returned:
    $x,$y,$i   Results
    --------   ------------------------------------------
    4, 0, 1    ( [1,6,11,16], [2,7,12,17],
                 [3,8,13,18], [4,9,14,19],
                 [5,10,15,20] )   # lose 21, 22, 23
Note that we lose data here because we can't get an exact number of
lists of length C<$x>. With a fixed C<$x>, every list returned I<must>
have that fixed length.
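One way to read the wildcard C<$y> rule is that C<$y> becomes the number of complete C<$x>-long lists the data can fill. A minimal sketch under that assumption (C<reshape_wild_y> is a hypothetical helper, and C<$x> must be non-zero):

```perl
use strict;
use warnings;

# Hypothetical helper: fixed non-zero $x, wildcard $y (0 in the RFC).
sub reshape_wild_y {
    my ($x, $i, @data) = @_;
    my $y = int(@data / $x);        # only complete lists of length $x
    my @out;
    for my $row (0 .. $y - 1) {
        my @list;
        for my $col (0 .. $x - 1) {
            push @list, $data[ $i ? $col * $y + $row : $row * $x + $col ];
        }
        push @out, \@list;
    }
    return @out;                    # trailing elements beyond $x * $y are dropped
}

my @lists = reshape_wild_y(4, 1, 1 .. 23);
print scalar(@lists), " lists, first is (@{$lists[0]})\n";
```

For the 23-element input this yields five lists of four, so the last three elements are the ones discarded.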
However, with a wildcard C<$x>, lists will be expanded to fill the
number specified by C<$y>, even in mismatched sizes:
    $x,$y,$i   Results
    --------   ------------------------------------------
    0, 2, 1    ( [1,3,5,7,9,11,13,15,17,19,21,23],
                 [2,4,6,8,10,12,14,16,18,20,22] )   # unzip
    0, 6, 0    ( [1,2,3,4], [5,6,7,8], [9,10,11,12],
                 [13,14,15,16], [17,18,19,20],
                 [21,22,23] )   # partition
Here, all the data is guaranteed to be preserved. It is simply split
into exactly the number of parts specified by C<$y>, even if that
results in some lists being smaller.
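The wildcard C<$x> behaviour can be sketched the same way. The helper name C<reshape_wild_x> is hypothetical, and the choice of ceil(n / C<$y>) as the chunk length for the non-interleaved case is an assumption that happens to reproduce a 4,4,4,4,4,3 split of 23 elements into six parts:

```perl
use strict;
use warnings;

# Hypothetical helper: wildcard $x (0 in the RFC), fixed $y > 0.
sub reshape_wild_x {
    my ($y, $i, @data) = @_;
    my $len = int((@data + $y - 1) / $y);   # longest list: ceil(n / $y)
    my @out = map { [] } 1 .. $y;
    for my $k (0 .. $#data) {
        # $i == 1 deals elements round-robin; $i == 0 fills list by list
        my $row = $i ? $k % $y : int($k / $len);
        push @{ $out[$row] }, $data[$k];
    }
    return @out;                            # every element lands somewhere
}

my ($odd, $even) = reshape_wild_x(2, 1, 1 .. 23);   # unzip
my @parts        = reshape_wild_x(6, 0, 1 .. 23);   # partition
print "odd: @$odd\neven: @$even\nlast part: @{$parts[-1]}\n";
```

Because no length is imposed, nothing is discarded; the lists simply end up with unequal sizes when the data does not divide evenly.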
=head2 Multiple Array Form - JOIN
In this form, multiple arrays are joined back together. Here, C<$x>,
C<$y>, and C<$i> specify the dimensions to use to rejoin the lists, not
to split them up. The dimensions simply work in reverse: Rather than
specifying how many lists to create, they specify which elements of the
input lists are joined back together.
So, we'll assume input arrays of the form:
    ( [1,4,7,10], [2,5,8], [3,6,9] )
which are passed to C<reshape> with the following dimensions:
    $x,$y,$i   Results
    --------   ------------------------------------------
    0, 0, 1    ( 1,2,3,4,5,6,7,8,9,10 )   # zip
    0, 0, 0    ( 1,4,7,10,2,5,8,3,6,9 )   # simple concat
    3, 0, 1    ( 1,2,3,4,5,6,7,8,9 )      # 3 vals from all lists
    0, 2, 1    ( 1,2,4,5,7,8,10 )         # all vals from 2 lists
    3, 2, 1    ( 1,2,4,5,7,8 )            # 3 vals x 2 lists
Hopefully this is easy to understand. C<$x> controls how many elements
of each list are used, and C<$y> controls how many lists are used. This
is just like the splitting operation, but in reverse. C<$i> simply
controls whether they are interleaved or just concatenated (the same as
C<@a = (@b, @c)>). Again, wildcards (0) can be used here as well.
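A rough Perl 5 sketch of the join direction follows. C<reshape_join> is a hypothetical name, and it takes array references because an ordinary Perl 5 sub cannot tell flattened arrays apart; how the real builtin would receive its lists is not specified here. The sketch also skips the "less data than specified" check and simply uses whatever elements exist.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the multi-list form; lists are array references.
sub reshape_join {
    my ($x, $y, $i, @lists) = @_;
    splice @lists, $y if $y && $y < @lists;     # keep only the first $y lists
    my @rows;
    for my $list (@lists) {
        my @l = @$list;
        splice @l, $x if $x && $x < @l;         # keep only the first $x values
        push @rows, \@l;
    }
    return map { @$_ } @rows unless $i;         # $i == 0: plain concatenation
    my $longest = 0;
    for my $row (@rows) { $longest = @$row if @$row > $longest }
    my @out;
    for my $col (0 .. $longest - 1) {           # $i == 1: take one value from
        for my $row (@rows) {                   # each list in turn
            push @out, $row->[$col] if $col < @$row;
        }
    }
    return @out;
}

my @in = ([1, 4, 7, 10], [2, 5, 8], [3, 6, 9]);
print join(",", reshape_join(0, 0, 1, @in)), "\n";   # zip
print join(",", reshape_join(3, 2, 1, @in)), "\n";   # 3 vals x 2 lists
```

Here C<$y> trims the set of lists, C<$x> trims each list, and C<$i> only decides the order in which the surviving elements are emitted.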
=head2 zip, unzip, and partition
RFCs 90 and 91 can now be entirely explained and written as specialized
forms of C<reshape>:
    Function           C<reshape> Equivalent
    ----------------   ----------------------------------
    zip @a, @b         reshape 0, 0, 1, @a, @b
    unzip $y, @c       reshape 0, $y, 1, @c
    part $y, @c        reshape 0, $y, 0, @c
This makes understanding what the operations are doing and how they
differ much easier. It also means that the functions can be extremely
compact, since they need only pass C<reshape> the appropriate arguments.
This also makes it obvious that the existing functions are only working
on the y axis of our imaginary matrix model. This means additional
functions that work just on the x axis may be useful as well.
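To show how thin such wrappers could be, here is a sketch in which each function only supplies the three dimension arguments. Everything in it is an assumption made for illustration: C<reshape_sketch> is a hypothetical stand-in that covers only the wildcard-C<$x> cases, lists are passed as array references, and a single reference together with a non-zero C<$y> is taken to mean "split".

```perl
use strict;
use warnings;

# Hypothetical stand-in covering only the wildcard-$x cases the wrappers need.
sub reshape_sketch {
    my ($x, $y, $i, @lists) = @_;
    die "sketch only supports wildcard \$x" if $x;
    if (@lists == 1 && $y) {                    # one list: split into $y lists
        my @data = @{ $lists[0] };
        my $len  = int((@data + $y - 1) / $y);  # chunk length for $i == 0
        my @out  = map { [] } 1 .. $y;
        for my $k (0 .. $#data) {
            push @{ $out[ $i ? $k % $y : int($k / $len) ] }, $data[$k];
        }
        return @out;
    }
    splice @lists, $y if $y && $y < @lists;     # several lists: join them
    return map { @$_ } @lists unless $i;
    my $longest = 0;
    for my $l (@lists) { $longest = @$l if @$l > $longest }
    my @out;
    for my $col (0 .. $longest - 1) {
        for my $l (@lists) { push @out, $l->[$col] if $col < @$l }
    }
    return @out;
}

# Each wrapper is nothing more than a choice of ($x, $y, $i).
sub zip   { reshape_sketch(0, 0, 1, @_) }
sub unzip { my $y = shift; reshape_sketch(0, $y, 1, @_) }
sub part  { my $y = shift; reshape_sketch(0, $y, 0, @_) }

my @a = (1, 3, 5);
my @b = (2, 4, 6);
my @zipped = zip(\@a, \@b);            # 1 2 3 4 5 6
my @halves = unzip(2, \@zipped);       # [1,3,5], [2,4,6]
my @chunks = part(3, \@zipped);        # [1,2], [3,4], [5,6]
```

All three wrappers leave C<$x> as a wildcard, which matches the observation that they operate only along the y axis.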
=head2 Matrix Calculations and Extensions
It is the opinion of the author that extensive matrix calculations and
manipulations should be left to external modules. A function such as this
should be able to take care of most of the basic functionality needed to
create N-dimensional matrices. However, true matrix functions should be
put in modules.
=head1 IMPLEMENTATION
We'll get to this in v10. :-)
=head1 MIGRATION
None. This introduces new functionality.
=head1 REFERENCES
RFC 81: Lazily evaluated list generation functions
http://www.mail-archive.com/perl6-language%40perl.org/msg01910.html
Thanks to Uri Guttman for suggesting the APL "reshape" name