[Numpy-discussion] structured arrays, recarrays, and record arrays

2015-01-18 Thread Allan Haldane
Hello all,

Documentation of recarrays is poor and I'd like to improve it. In order 
to do this I've been looking at core/records.py, and I would appreciate 
some feedback on my plan.

Let me start by describing what I see. The docs confuse 'structured 
arrays', 'record arrays' and 'recarrays', often using the terms 
interchangeably. They also refer to structured dtypes variously as 
'struct data types', 'record data types' or simply 'records' (e.g., see 
the reference/arrays.dtypes and reference/arrays.indexing doc pages).

But by my reading of the code there are really three (or four) distinct 
types of arrays with structure. Here's a possible nomenclature:
  * Structured arrays are simply ndarrays with structured dtypes. That
    is, the data type is subdivided into fields of different types.
  * recarrays are a subclass of ndarray that allows access to the
    fields by attribute.
  * Record arrays are recarrays whose elements have additionally
    been converted to 'numpy.core.records.record' type, such that each
    data element is an object with field attributes.
  * (It is also possible to create arrays whose dtype.type is
    numpy.core.records.record but which are not recarrays. However, I
    have never seen this done.)

Here's code demonstrating the creation of the different types of array 
(in order: structured array, recarray, ???, record array).

  >>> import numpy as np
  >>> arr = np.array([(1,'a'), (2,'b')],
  ...                dtype=[('foo', int), ('bar', 'S1')])
  >>> recarr = arr.view(type=np.recarray)
  >>> noname = arr.view(dtype=np.dtype((np.record, arr.dtype)))
  >>> recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)),
  ...                      type=np.recarray)

  >>> type(arr), arr.dtype.type
  (numpy.ndarray, numpy.void)
  >>> type(recarr), recarr.dtype.type
  (numpy.core.records.recarray, numpy.void)
  >>> type(recordarr), recordarr.dtype.type
  (numpy.core.records.recarray, numpy.core.records.record)
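
To make the nomenclature concrete, here is a minimal sketch of a helper 
that names an array according to the four cases above. The function 
classify() and its return strings are hypothetical, purely for 
illustration; they are not part of numpy. The plain recarray case is 
left out of the demonstration because whether view(type=np.recarray) 
keeps numpy.void as the dtype.type may depend on the numpy version.

```python
import numpy as np

def classify(a):
    """Hypothetical helper: name an array using the nomenclature above."""
    if a.dtype.names is None:
        return 'unstructured array'
    is_recarray = isinstance(a, np.recarray)
    has_record_elements = issubclass(a.dtype.type, np.record)
    if is_recarray and has_record_elements:
        return 'record array'
    if is_recarray:
        return 'recarray'
    if has_record_elements:
        return 'ndarray with record dtype (unnamed)'
    return 'structured array'

arr = np.array([(1, 'a'), (2, 'b')], dtype=[('foo', int), ('bar', 'S1')])
# Same data, but elements are numpy.record while the container stays ndarray.
noname = arr.view(dtype=np.dtype((np.record, arr.dtype)))
# Same data, recarray container and numpy.record elements.
recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)),
                     type=np.recarray)

for a in (arr, noname, recordarr):
    print(classify(a))
```

The check looks at both type(a) and a.dtype.type, since neither alone 
is enough to tell the cases apart.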

Note that the functions numpy.rec.array, numpy.rec.fromrecords, 
numpy.rec.fromarrays, and np.recarray.__new__ create record arrays. 
However, in the docs you can see examples of the creation of recarrays, 
e.g. in the recarray and ndarray.view docstrings and in 
http://www.scipy.org/Cookbook/Recarray. The files 
numpy/lib/recfunctions.py and numpy/lib/npyio.py (and possibly masked 
arrays, but I haven't looked yet) make extensive use of recarrays (but 
not record arrays).
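
As a quick check of the first claim, here is a small sketch that builds 
the same data through three of the numpy.rec constructors and inspects 
dtype.type. The field names and values are just the toy data from 
above.

```python
import numpy as np

rows = [(1, b'a'), (2, b'b')]

# Three of the constructors mentioned above, fed the same toy data.
from_array = np.rec.array(rows, dtype=[('foo', int), ('bar', 'S1')])
from_rows = np.rec.fromrecords(rows, names='foo,bar')
from_cols = np.rec.fromarrays([np.array([1, 2]), np.array([b'a', b'b'])],
                              names='foo,bar')

for r in (from_array, from_rows, from_cols):
    # Each result is a recarray whose elements are numpy.record objects,
    # so attribute access works on individual elements too.
    print(type(r).__name__, r.dtype.type.__name__, r[0].foo)
```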

The main functional difference between recarrays and record arrays is 
field access on individual elements:

  >>> recordarr[0].foo
  1
  >>> recarr[0].foo
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  AttributeError: 'numpy.void' object has no attribute 'foo'

Also, note that recarrays have a small performance penalty relative to 
structured arrays, and record arrays have another one relative to 
recarrays because of the additional python logic.
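
A rough way to measure this, as a sketch only: the array size and 
repeat count below are arbitrary choices, and the actual numbers will 
vary with the machine and the numpy version, so no timings are quoted 
here.

```python
import timeit
import numpy as np

arr = np.zeros(1000, dtype=[('foo', int), ('bar', 'S1')])
# Record array built explicitly, as in the example above.
recordarr = arr.view(dtype=np.dtype((np.record, arr.dtype)),
                     type=np.recarray)

n = 10000  # arbitrary repeat count
timings = {
    # Whole-field access: plain indexing vs. attribute lookup.
    "arr['foo']": timeit.timeit(lambda: arr['foo'], number=n),
    "recordarr.foo": timeit.timeit(lambda: recordarr.foo, number=n),
    # Single-element field access: numpy.void vs. numpy.record element.
    "arr[0]['foo']": timeit.timeit(lambda: arr[0]['foo'], number=n),
    "recordarr[0].foo": timeit.timeit(lambda: recordarr[0].foo, number=n),
}

for label, seconds in timings.items():
    print("%-18s %.4f s for %d calls" % (label, seconds, n))
```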

So my first goal in updating the docs is to use the right terms in the 
right place. In almost all cases, references to 'records' (eg 'record 
types') should be replaced with 'structured' (eg 'structured types'), 
with the exception of docs that deal specifically with record arrays. 
It's my guess that in the distant past structured datatypes were 
intended to always be of type numpy.core.records.record (thus the 
description in reference/arrays.dtypes) but that 
numpy.core.records.record became generally obsolete without updates to 
the docs. doc/records.rst.txt seems to document the transition.

I've made a preliminary pass of the docs, which you can see here
https://github.com/ahaldane/numpy/commit/d87633b228dabee2ddfe75d1ee9e41ba7039e715
Mostly I renamed 'record type' to 'structured type', and added a very 
rough draft to numpy/doc/structured_arrays.py.

I would love to hear from those more knowledgeable than myself on 
whether this works!

Cheers,
Allan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] structured arrays, recarrays, and record arrays

2015-01-18 Thread Allan Haldane

In light of my previous message I'd like to bring up 
https://github.com/numpy/numpy/issues/3581, as it is now clearer to me 
what is happening. In the example on that page the user creates a 
recarray and a record array (in my nomenclature) without realizing that 
they are slightly different types of beast. This is probably because the 
str() or repr() representations of these two objects are identical. To 
distinguish them you have to look at their dtype.type. Using the setup 
from my last message:

  >>> print repr(recarr)
  rec.array([(1, 'a'), (2, 'b')],
        dtype=[('foo', 'i8'), ('bar', 'S1')])
  >>> print repr(recordarr)
  rec.array([(1, 'a'), (2, 'b')],
        dtype=[('foo', 'i8'), ('bar', 'S1')])
  >>> print repr(recarr.dtype)
  dtype([('foo', 'i8'), ('bar', 'S1')])
  >>> print repr(recordarr.dtype)
  dtype([('foo', 'i8'), ('bar', 'S1')])
  >>> print recarr.dtype.type
  <type 'numpy.void'>
  >>> print recordarr.dtype.type
  <class 'numpy.core.records.record'>

Based on this, it occurs to me that the repr of a dtype should list 
dtype.type if it is not numpy.void. This might be nice to see:

  >>> print repr(recarr.dtype)
  dtype([('foo', 'i8'), ('bar', 'S1')])
  >>> print repr(recordarr.dtype)
  dtype((numpy.core.records.record, [('foo', 'i8'), ('bar', 'S1')]))

I could easily implement this by redefining __repr__ for the 
numpy.core.records.record class, but this does not solve the problem for 
any other cases of overridden base_dtype. So perhaps modifications 
should be made to the original repr function of dtype (in the functions 
arraydescr_struct_str and arraydescr_struct_repr in 
numpy/core/src/multiarray/descriptor.c). However, also note that the doc 
http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html says that 
creating dtypes using the form dtype((base_dtype, new_dtype)) is 
discouraged (near the bottom).
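
To prototype the idea without touching the C code, here is a rough 
pure-Python sketch. The function describe_dtype() is hypothetical and 
only illustrates the proposed output format; it is not how dtype's repr 
is implemented, and it only handles simple structured dtypes.

```python
import numpy as np

def describe_dtype(dt):
    """Hypothetical repr-like string that exposes a non-default dtype.type.

    Only meant for simple structured dtypes; dt.descr may contain extra
    padding entries for dtypes with unusual offsets.
    """
    if dt.names is not None and dt.type is not np.void:
        # Structured dtype whose scalar type was overridden (e.g. record).
        return "dtype((%s.%s, %r))" % (dt.type.__module__,
                                       dt.type.__name__, dt.descr)
    # Everything else: fall back to the ordinary repr.
    return repr(dt)

plain = np.dtype([('foo', int), ('bar', 'S1')])
rec = np.dtype((np.record, plain))

print(describe_dtype(plain))
print(describe_dtype(rec))
```

A real fix would presumably need to handle any overridden base type, 
not just numpy.record, which is why the sketch tests dtype.type against 
numpy.void rather than against the record class.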

Another possibility is to discourage recarrays and only document record 
arrays (or vice versa). However, many people's code already depends on 
both of these types.

Is any of this at all reasonable? It would require a change to dtype str 
and repr, which could affect a lot of things.

Cheers,
Allan


[Numpy-discussion] ANN: Scipy 0.15.1

2015-01-18 Thread Pauli Virtanen
Dear all,

We are pleased to announce the Scipy 0.15.1 release.

Scipy 0.15.1 contains only bugfixes. The module
``scipy.linalg.calc_lwork`` removed in Scipy 0.15.0 is restored.
This module is not a part of Scipy's public API, and although it is
available again in Scipy 0.15.1, using it is deprecated and it may be
removed again in a future Scipy release.

Source tarballs, binaries, and full release notes are available at
https://sourceforge.net/projects/scipy/files/scipy/0.15.1/

Best regards,
Pauli Virtanen


==========================
SciPy 0.15.1 Release Notes
==========================

SciPy 0.15.1 is a bug-fix release with no new features compared to 0.15.0.

Issues fixed
------------

* `#4413 <https://github.com/scipy/scipy/pull/4413>`__: BUG: Tests too
  strict, f2py doesn't have to overwrite this array
* `#4417 <https://github.com/scipy/scipy/pull/4417>`__: BLD: avoid
  using NPY_API_VERSION to check not using deprecated...
* `#4418 <https://github.com/scipy/scipy/pull/4418>`__: Restore and
  deprecate scipy.linalg.calc_work