I would be much more in favor of `copy` eliminating padding in the dtype, if dtypes with different paddings were considered equivalent. But they are not.
Numpy has always treated dtypes with different padding bytes as not-equal, and prints them very differently:

    >>> a = np.array([1], dtype={'names': ['f'],
    ...                          'formats': ['i4'],
    ...                          'offsets': [0]})
    >>> b = np.array([1], dtype={'names': ['f'],
    ...                          'formats': ['i4'],
    ...                          'offsets': [4]})
    >>> a.dtype == b.dtype
    False
    >>> a.dtype
    dtype([('f', '<i4')])
    >>> b.dtype
    dtype({'names':['f'], 'formats':['<i4'], 'offsets':[4], 'itemsize':8})

That is unlike strides, which are hidden from the user.

If we do a "dtype-overhaul" as has been plentifully discussed before, there are many things we might change about structured dtypes, and making padding be irrelevant in most operations could be a good one.

On the other hand, one of the main purposes of structured arrays appears to be for interpreting binary blobs and for interfacing with C code with C structs, where padding could be very important. Eg, if someone is reading a binary file, they might want to do

    >>> np.fromfile('myfile', a.dtype, count=10)

and then it matters very greatly to them whether the dtype has padding or not.

Best,
Allan

PS. It is unfinished, but I would like to advertise an 'ArrayCollection' ndarray ducktype I have worked a bit on. This ducktype behaves very much like structured arrays for indexing and assignment, but avoids all these padding issues and in other ways is more suitable for "pandas-like" usage than structured arrays. See the "ArrayCollection" and "MaskedArrayCollection" classes at

https://github.com/ahaldane/ndarray_ducktypes

See the tests and doc folders for some brief example usage.

On 4/11/19 10:07 PM, Nathaniel Smith wrote:
> My concern would be that to implement (2), I think .copy() has to
> either special-case certain dtypes, or else we have to add some kind
> of "simplify for copy" operation to the dtype protocol. These both add
> architectural complexity, so maybe it's better to avoid it unless we
> have a compelling reason?
>
> On Thu, Apr 11, 2019 at 6:51 AM Marten van Kerkwijk
> <m.h.vankerkw...@gmail.com> wrote:
>>
>> Hi All,
>>
>> An issue [1] about the copying of arrays with structured dtype raised a
>> question about what the expected behaviour is: does copy always preserve the
>> dtype as is, or should it remove padding?
>>
>> Specifically, consider an array with a structure with many fields, say 'a'
>> to 'z'. Since numpy 1.16, if one does `a[['a', 'z']]`, a view will be
>> returned. In this case, its dtype will include a large offset. Now, if we
>> copy this view, should the result have exactly the same dtype, including the
>> large offset (i.e., the copy takes as much memory as the original full
>> array), or should the padding be removed? From the discussion so far, it
>> seems the logic has boiled down to a choice between:
>>
>> (1) Copy is a contract that the dtype will not vary (e.g., we also do not
>> change endianness);
>>
>> (2) Copy is a contract that any access to the data in the array will return
>> exactly the same result, without wasting memory and possibly optimized for
>> access with different strides. E.g., `array[::10].copy()` also compacts the
>> result.
>>
>> An argument in favour of (2) is that, before numpy 1.16, `a[['a',
>> 'z']].copy()` did return an array without padding. Of course, this relied on
>> `a[['a', 'z']]` already returning a copy without padding, but still this is
>> a regression.
>>
>> More generally, there should at least be a clear way to get the compact
>> copy. Also, it would make sense for things like `np.save` to remove any
>> padding (it currently does not).
>>
>> What do people think?
>>
>> All the best,
>>
>> Marten
>>
>> [1] https://github.com/numpy/numpy/issues/13299
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
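
On the question of a clear way to get a compact copy: a minimal sketch follows, assuming `numpy.lib.recfunctions.repack_fields` (available since NumPy 1.16) is an acceptable tool for the job. The three-field layout and the values are made up for illustration and do not come from the thread. It shows that the multi-field view keeps the original itemsize, and that repacking gives a dtype that compares unequal to the padded one.

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

# Hypothetical record layout: 'b' sits between 'a' and 'z' in memory.
full = np.zeros(4, dtype=[('a', 'i4'), ('b', 'f8'), ('z', 'i4')])
full['a'] = np.arange(4)
full['z'] = np.arange(4) * 10

# Multi-field indexing (NumPy >= 1.16) returns a view that keeps the
# original offsets and itemsize, so the skipped field becomes padding.
view = full[['a', 'z']]

# repack_fields copies the data into a dtype with the padding removed.
packed = repack_fields(view)

print(view.dtype.itemsize)         # 16: same as the full record
print(packed.dtype.itemsize)       # 8: two packed int32 fields
print(packed.dtype == view.dtype)  # False: padding is part of the dtype

# What .copy() does with the padding is the open question in this thread,
# so its itemsize is printed without stating an expected value.
print(view.copy().dtype.itemsize)
```

Whether `.copy()` itself should do this repacking is exactly the choice between (1) and (2); the sketch only shows that an explicit route exists and that the two dtypes are not interchangeable for something like `np.fromfile`.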