I would be much more in favor of `copy` eliminating padding in the
dtype, if dtypes with different paddings were considered equivalent.
But they are not.
NumPy has always treated dtypes with different padding bytes as
not equal, and it prints them very differently:
>>> a = np.array([1], dtype={'names': ['f'],
...                          'formats': ['i4'],
...                          'offsets': [0]})
>>> b = np.array([1], dtype={'names': ['f'],
...                          'formats': ['i4'],
...                          'offsets': [4]})
>>> a.dtype == b.dtype
False
>>> a.dtype
dtype([('f', '<i4')])
>>> b.dtype
dtype({'names':['f'], 'formats':['<i4'], 'offsets':[4], 'itemsize':8})
That is unlike strides, which are hidden from the user.
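
For concreteness, here is a minimal sketch of the multi-field case from
Marten's mail, assuming numpy >= 1.16 (where multi-field indexing returns
a view) and using `repack_fields` from `numpy.lib.recfunctions` to get a
compact copy. The array and field names are made up for illustration:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical three-field array; 'b' sits between the fields we select.
arr = np.zeros(3, dtype=[('a', 'i4'), ('b', 'f8'), ('z', 'i4')])
arr['a'] = [1, 2, 3]
arr['z'] = [7, 8, 9]

# Multi-field index: a view whose dtype keeps the original offsets,
# so the bytes of 'b' show up as padding.
view = arr[['a', 'z']]

# Explicitly request a copy with the padding removed.
packed = rfn.repack_fields(view)

print(view.dtype.itemsize, packed.dtype.itemsize)
print(view.dtype == packed.dtype)
```

The field values are the same in `view` and `packed`; only the dtype
layout (and hence the memory used per element) differs.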
If we do a "dtype-overhaul", as has been discussed at length before,
there are many things we might change about structured dtypes, and
making padding irrelevant in most operations could be a good one.
On the other hand, one of the main purposes of structured arrays appears
to be for interpreting binary blobs and for interfacing with C code with
C structs, where padding could be very important. E.g., if someone is
reading a binary file, they might want to do
>>> np.fromfile('myfile', a.dtype, count=10)
and then it matters a great deal to them whether the dtype has padding
or not.
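
A minimal sketch of why this matters, using `np.frombuffer` on an
in-memory blob instead of `np.fromfile` so that it runs without a file.
The record layout (a little-endian int32 followed by 4 unused bytes) is
an assumption chosen for illustration:

```python
import struct
import numpy as np

# Three records: each is an int32 followed by 4 pad bytes ('4x').
raw = b''.join(struct.pack('<i4x', v) for v in (1, 2, 3))

# Same single field 'f', with and without the trailing padding.
padded = np.dtype({'names': ['f'], 'formats': ['<i4'],
                   'offsets': [0], 'itemsize': 8})
packed = np.dtype([('f', '<i4')])

good = np.frombuffer(raw, dtype=padded)  # one element per 8-byte record
bad = np.frombuffer(raw, dtype=packed)   # pad bytes misread as records

print(good['f'])
print(bad['f'])
```

With the padded dtype the three records are recovered; with the packed
dtype the same bytes are interpreted as six records, every other one
being the padding.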
Best,
Allan
PS. It is unfinished, but I would like to advertise an 'ArrayCollection'
ndarray ducktype I have worked a bit on. This ducktype behaves very much
like structured arrays for indexing and assignment, but avoids all these
padding issues and in other ways is more suitable for "pandas-like"
usage than structured arrays. See the "ArrayCollection" and
"MaskedArrayCollection" classes at
https://github.com/ahaldane/ndarray_ducktypes
See the tests and doc folders for some brief example usage.
On 4/11/19 10:07 PM, Nathaniel Smith wrote:
> My concern would be that to implement (2), I think .copy() has to
> either special-case certain dtypes, or else we have to add some kind
> of "simplify for copy" operation to the dtype protocol. These both add
> architectural complexity, so maybe it's better to avoid it unless we
> have a compelling reason?
>
> On Thu, Apr 11, 2019 at 6:51 AM Marten van Kerkwijk
> <[email protected]> wrote:
>>
>> Hi All,
>>
>> An issue [1] about the copying of arrays with structured dtype raised a
>> question about what the expected behaviour is: does copy always preserve the
>> dtype as is, or should it remove padding?
>>
>> Specifically, consider an array with a structure with many fields, say 'a'
>> to 'z'. Since numpy 1.16, if one does `a[['a', 'z']]`, a view will be
>> returned. In this case, its dtype will include a large offset. Now, if we
>> copy this view, should the result have exactly the same dtype, including the
>> large offset (i.e., the copy takes as much memory as the original full
>> array), or should the padding be removed? From the discussion so far, it
>> seems the logic has boiled down to a choice between:
>>
>> (1) Copy is a contract that the dtype will not vary (e.g., we also do not
>> change endianness);
>>
>> (2) Copy is a contract that any access to the data in the array will return
>> exactly the same result, without wasting memory and possibly optimized for
>> access with different strides. E.g., `array[::10].copy()` also compacts the
>> result.
>>
>> An argument in favour of (2) is that, before numpy 1.16, `a[['a',
>> 'z']].copy()` did return an array without padding. Of course, this relied on
>> `a[['a', 'z']]` already returning a copy without padding, but still this is
>> a regression.
>>
>> More generally, there should at least be a clear way to get the compact
>> copy. Also, it would make sense for things like `np.save` to remove any
>> padding (it currently does not).
>>
>> What do people think? All the best,
>>
>> Marten
>>
>> [1] https://github.com/numpy/numpy/issues/13299
>> _______________________________________________
>> NumPy-Discussion mailing list
>> [email protected]
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>
>
>