RFC 148 (v2) Add reshape() for multi-dimensional array reshaping

Perl6 RFC Librarian Mon, 18 Sep 2000 17:58:29 -0700

This and other RFCs are available on the web at
  http://dev.perl.org/rfc/

=head1 TITLE

Add reshape() for multi-dimensional array reshaping
                
=head1 VERSION

  Maintainer: Nathan Wiger <[EMAIL PROTECTED]>
  Date: 24 Aug 2000
  Last Modified: 18 Sep 2000
  Mailing List: [EMAIL PROTECTED]
  Number: 148
  Version: 2
  Status: Developing

=head1 CHANGES

   1. Altered syntax to increase flexibility

   2. Removed arbitrary interleaving feature

   3. Almost had a stroke trying to update all my RFCs

=head1 ABSTRACT

Currently, there is no easy way to reshape existing arrays into multiple
arrays or matrices. This makes nifty array manipulation and complex math
hard.

A general-purpose tool that can do arbitrary multi-dimensional array
reshaping, from which other array manipulation functions can be derived,
makes data manipulation easier.

=head1 DESCRIPTION

Let's jump in. This RFC proposes a C<reshape> builtin with the following
syntax:

  @reshaped = reshape [$x, $y, $z, ..], @a, @b ...

The prototype would look something like this:

  sub reshape (\@;\@\@\@\@\@\@...);

The first argument is an array of dimensions, where C<$x> and C<$y> are
the shape of the array to produce. The order and meaning of the
arguments are the same ones used by the C<:shape> array attribute
described in B<RFC 203>.

We only need one C<reshape> since it is a multipurpose tool that works
in any direction, serving as its own inverse.

The dimensions used are subject to the following properties:

  1. Less data than specified causes reshape to
     return undef

  2. More data than specified is silently discarded

If any of the dimensions is specified as C<-1>, that dimension is a
wildcard: it expands to take as much of the input data as the other
dimension allows. See below.

=head2 Single Array Form - SPLIT

When one array is passed in, it is split up. Here, C<$x> and C<$y>
determine the dimensions of the resulting lists: C<$x> is the length of
each list and C<$y> is the number of lists. For example, assume
C<reshape> is called with the list (1..23) in the following forms:

   @a = 1..23;
   @results = reshape [$x,$y], @a;

   $x,$y   @results
   -----   ------------------------------------------
   3, 2    ( [1,2,3], [4,5,6] ) 
   2, 4    ( [1,2],[3,4],[5,6],[7,8] )
   14,20   undef - not enough data to fill

Notice how both dimensions work together to C<reshape> the array. As
such, the combination of the arguments is more significant than the
individual arguments themselves. Also, note that any excess data left
over after the dimensions have been fulfilled is discarded. In the final
example, undef is returned, allowing you to easily check if you have
enough data:

   @matrix = reshape [14,20], @input or die "Not enough data!";
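
The fixed-dimension case above can be approximated in today's Perl 5.
This is only a sketch: C<reshape_fixed> is a hypothetical helper name, not
the proposed builtin, and it handles exactly two non-wildcard dimensions.
It also assumes that "undef" is best modelled by returning an empty list,
because in Perl 5 a list assignment such as C<@matrix = ...> is only false
when the right-hand side is empty.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed builtin (fixed dimensions only):
# produce $y lists of $x elements each.
sub reshape_fixed {
    my ($dims, @data) = @_;
    my ($x, $y) = @$dims;

    # Too little data: return an empty list (assumption, see lead-in),
    # so that "@m = reshape_fixed(...) or die" behaves as described.
    return if @data < $x * $y;

    # Take $y consecutive slices of length $x; leftover data is dropped.
    return map { [ @data[ $_ * $x .. ($_ + 1) * $x - 1 ] ] } 0 .. $y - 1;
}

my @a     = 1 .. 23;
my @rows  = reshape_fixed([3, 2], @a);     # ( [1,2,3], [4,5,6] )
my @pairs = reshape_fixed([2, 4], @a);     # ( [1,2],[3,4],[5,6],[7,8] )
my @none  = reshape_fixed([14, 20], @a);   # empty: 280 elements needed
```

With this return convention, C<@matrix = reshape_fixed([14,20], @input) or die "..."> dies as the text intends.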

In addition, wildcards can be used. With a fixed C<$y>, only that many
lists are returned.  However, with a wildcard C<$y>, any number of
C<$x>-long lists are returned:

   $x,$y   @results
   -----   ------------------------------------------
   4, -1   ( [1,2,3,4], [5,6,7,8],
             [9,10,11,12], [13,14,15,16],
             [17,18,19,20] )               # lose 21..23

Note that we lose data here because 23 elements do not divide evenly
into lists of length C<$x>. With a fixed C<$x>, every returned list
I<must> have exactly that length.

However, with a wildcard C<$x>, lists will be expanded to fill the
number specified by C<$y>, even in mismatched sizes:

   $x,$y   @results
   -----   ------------------------------------------
   -1, 2   ( [1,2,3,4,5,6,7,8,9,10,11,12],
             [13,14,15,16,17,18,19,20,21,22,23])

Here, all the data is guaranteed to be preserved. It is simply split
into exactly the number of parts specified by C<$y>, even if that
results in some lists being different sizes.
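
The two wildcard cases can be sketched in Perl 5 as well. C<reshape_wild>
is a hypothetical helper, not the proposed builtin. The RFC does not say
how uneven sizes are distributed when C<$x> is the wildcard, so the sketch
assumes the earliest lists receive the extra elements, which matches the
12/11 split shown above.

```perl
use strict;
use warnings;

# Hypothetical helper covering only the two wildcard cases.
sub reshape_wild {
    my ($dims, @data) = @_;
    my ($x, $y) = @$dims;

    if ($y == -1 && $x > 0) {
        # Fixed length, any number of lists: keep only complete lists.
        my $rows = int(@data / $x);
        return map { [ splice @data, 0, $x ] } 1 .. $rows;
    }

    if ($x == -1 && $y > 0) {
        # Fixed number of lists, any length: keep all of the data.
        # Assumption: the first (count % $y) lists get one extra element.
        my $base  = int(@data / $y);
        my $extra = @data % $y;
        return map { [ splice @data, 0, $base + ($_ <= $extra ? 1 : 0) ] }
               1 .. $y;
    }

    return;    # every other combination is outside this sketch
}

my @by_len   = reshape_wild([4, -1], 1 .. 23);   # 5 lists of 4; 21..23 lost
my @by_count = reshape_wild([-1, 2], 1 .. 23);   # lists of 12 and 11 elements
```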

=head2 Multiple Array Form - JOIN

In this form, multiple arrays are joined back together. Here, C<$x>
and C<$y> specify the dimensions to use to rejoin the lists, not
to split them up. The dimensions simply work in reverse: rather than
specifying how many lists to create, they specify which elements of the
input lists are joined back together.

So, we'll assume an input array of the form:

   ( [1,2,3,4], [5,6,7,8], [9,10,11,12] )

Calling C<reshape> on it with the following dimensions gives:

   $x,$y   @results
   -----   ------------------------------------------
   -1,-1   ( 1,2,3,4,5,6,7,8,9,10,11,12 )  # simple concat
   3, -1   ( 1,2,3,5,6,7,9,10,11 )         # 3 vals from all lists
   -1, 2   ( 1,2,3,4,5,6,7,8 )             # all vals from 2 lists
   3, 2    ( 1,2,3,5,6,7 )                 # 3 vals x 2 lists

Hopefully this is easy to understand. C<$x> controls how many elements
of each list are used, and C<$y> controls how many lists are used. This
is just like the splitting operation, but in reverse. As before, a
dimension of C<-1> acts as a wildcard.
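
A minimal Perl 5 sketch of the join form, applying the rule literally:
take the first C<$x> elements of each of the first C<$y> lists.
C<reshape_join> is a hypothetical helper that takes array references
rather than relying on a prototype. Returning an empty list when a list is
missing or too short is an assumption carried over from the
"less data than specified" rule for splitting.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the multiple-array (JOIN) form.
sub reshape_join {
    my ($dims, @lists) = @_;
    my ($x, $y) = @$dims;

    # $y limits how many input lists are used; -1 means all of them.
    return if $y != -1 && $y > @lists;
    my @used = $y == -1 ? @lists : @lists[ 0 .. $y - 1 ];

    # $x limits how many leading elements of each list are used.
    return if $x != -1 && grep { @$_ < $x } @used;
    return map {
        my @l = @$_;
        $x == -1 ? @l : @l[ 0 .. $x - 1 ];
    } @used;
}

my @in = ( [1,2,3,4], [5,6,7,8], [9,10,11,12] );

my @all   = reshape_join([-1, -1], @in);   # 1 .. 12
my @heads = reshape_join([3, -1],  @in);   # 1,2,3,5,6,7,9,10,11
my @two   = reshape_join([-1, 2],  @in);   # 1 .. 8
my @block = reshape_join([3, 2],   @in);   # 1,2,3,5,6,7
```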

=head2 Matrix Calculations and Extensions

It is the opinion of the author that extensive matrix calculations and
manipulations be left to external modules. A function such as this
should be able to take care of most of the basic functionality needed to
create N-dimensional matrices. However, true matrix functions should be
put in modules.

=head1 IMPLEMENTATION

We'll get to this in v10. :-)

=head1 MIGRATION

None. This introduces new functionality.

=head1 REFERENCES

RFC 81: Lazily evaluated list generation functions 

RFC 203: Arrays: Notation for declaring and creating arrays

http://www.mail-archive.com/perl6-language%40perl.org/msg01910.html

Thanks to Uri Guttman for suggesting the APL "reshape" name