On Mon, Jan 22, 2018 at 10:53 AM, <josef.p...@gmail.com> wrote:

> On Sun, Jan 21, 2018 at 9:48 PM, Allan Haldane <allanhald...@gmail.com>
> wrote:
>
>> Hello all,
>>
>> We are making a decision (again) about what to do about the
>> behavior of multiple-field indexing of structured arrays: Should
>> it return a view or a copy, and on what release schedule?
>>
>> As a reminder, this refers to operations like (1.13 behavior):
>>
>>     >>> a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])
>>     >>> a[['a', 'c']]
>>     array([(0, 0.), (0, 0.), (0, 0.)],
>>           dtype=[('a', '<i4'), ('c', '<f4')])
>>
>> In numpy 1.14.0 we made this return a view instead of a copy, but
>> downstream test failures suggest we reconsider. In our current
>> implementation for 1.14.1, we have reverted this change, but
>> still plan to go through with it in 1.15.
>>
>> See here for our discussion of the problem and solutions:
>> https://github.com/numpy/numpy/pull/10411
>>
>> The two main options we have discussed are either to try to make
>> the change in 1.15, or never make the change at all and always
>> return a copy.
>>
>> Here are some pros and cons:
>>
>> Pros (change to view in 1.15)
>> =============================
>>
>> * Views are useful and convenient. Other forms of indexing also
>>   often return views so this is more consistent.
>> * This change has been planned since numpy 1.7 in 2009,
>>   and there have been visible FutureWarnings about it since
>>   then. Anyone whose code will break should have seen the
>>   warnings. It has been extensively warned about in recent
>>   release notes.
>> * Past discussions have supported the change. See my comment in
>>   the PR with many links to them and to other history.
>> * Users have requested the change on the list.
>> * Possibly a majority of the reported code failures were not
>>   actually caused by the change, but by another bug (#8100)
>>   involving np.load/np.save which this change exposed. If we
>>   push it off to 1.15, we will have time to fix this other bug.
>>   (There were no FutureWarnings for this breakage, of course).
>> * The code that really will break is of the form
>>       a[['a', 'c']].view('i8')
>>   because the returned itemsize is different. This has
>>   raised FutureWarnings since numpy 1.7, and no users reported
>>   failures due to this change. In the PR we still try to
>>   mitigate this breakage by introducing a new method
>>   `pack_fields`, which converts the result into the 1.13 form,
>>   so that
>>       np.pack_fields(a[['a', 'c']]).view('i8')
>>   will work.
>>
>>
>> Cons (keep returning a copy)
>> ============================
>>
>> * The extra convenience is not really that much, and fancy
>>   indexing also returns a copy instead of a view, so there is
>>   a precedent there.
>> * We want to minimize compatibility breaks with old behavior.
>>   We've had a fair amount of discussion and complaints about
>>   how we break things in general.
>> * We have lived with a "copy" for 8 years now. At some point the
>>   behavior gets set in stone for compatibility reasons.
>> * Users have written to the list and github about their code
>>   breaking in 1.14.0. As far as I am aware, they all refer
>>   to the #8100 problem.
>> * If a new function `pack_fields` is needed to guard against
>>   mishaps with the view behavior, that seems like a sign that
>>   keeping the copy behavior is the best option from an API
>>   perspective.
>>
>> My initial vote is go with the change in 1.15: The "view" code
>> that will ultimately break (not the code related to #8100) has
>> been sending FutureWarnings for many years, and I am not aware of
>> any user complaints involving it: All the complaints so far
>> would be fixed with #8100 in 1.15.
>>
>
> (Note: based on a linked mailing list thread, 2012 might be the last time I
> looked more closely at structured dtypes.
> So some of what I understand might be outdated.)
>
> views on structured dtypes are very important, but viewing them as
> standard arrays with standard dtypes is the main part that I had used.
> Essentially structured dtypes are useless for any computation, e.g. just
> some simple reduce operation. To work with them we need a standard view.
>
> I think the usecase that fails in statsmodels (except there is no test
> failure anymore because we switched to using pandas in the unit test)
>

To add a detail here: results is a recarray created from a csv file with

    results = genfromtxt(open(filename, "rb"), delimiter=",", names=True, dtype=float)

['acvar_lb','acvar_ub'] are the last two columns, so this corresponds to my
example below where, AFAIU, no padding is necessary to get a view.

> >       cls.confint_res = cls.results[['acvar_lb','acvar_ub']].view((float, 2))
> E       ValueError: Changing the dtype to a subarray type is only
> supported if the total itemsize is unchanged
>
> This is similar to the above example
>     a[['a', 'c']].view('i8')
> but it doesn't try to combine fields.
>
> In many examples where I used structured dtypes a long time ago, I switched
> between consistent views as either a standard array of subsets or as
> structured dtypes.
> For this use case it wouldn't matter whether a[['a', 'c']] returns a view
> or copy, as long as we can get the second view that is consistent with the
> selected part of the memory. This would also be independent of whether
> numpy pads internally and adjusts the strides if possible or not.
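
To make the itemsize point concrete, here is a minimal sketch (an illustration, not code from the PR) that builds both layouts by hand, so it does not depend on which numpy version is installed. The leading field name 'x' and the explicit offsets are assumptions chosen to mimic selecting the last two columns of a three-column float record.

```python
import numpy as np

# Hypothetical three-column record standing in for the genfromtxt result;
# the field name 'x' is made up for this sketch.
full = np.dtype([('x', 'f8'), ('acvar_lb', 'f8'), ('acvar_ub', 'f8')])

# "Copy" layout: only the two selected fields, packed next to each other.
packed = np.dtype([('acvar_lb', 'f8'), ('acvar_ub', 'f8')])

# "View" layout: the same two fields kept at their original offsets inside
# the full record, so the itemsize stays that of the full record.
padded = np.dtype({
    'names': ['acvar_lb', 'acvar_ub'],
    'formats': ['f8', 'f8'],
    'offsets': [full.fields['acvar_lb'][1], full.fields['acvar_ub'][1]],
    'itemsize': full.itemsize,
})

target = np.dtype((float, 2))

print(packed.itemsize)   # 16
print(padded.itemsize)   # 24
print(target.itemsize)   # 16

# Reinterpreting the full records with the padded dtype keeps the itemsize,
# so this is a plain view of the same memory.
results = np.ones(5, dtype=full)
as_view = results.view(padded)
```

The `(float, 2)` subarray dtype has the same itemsize as the packed layout but not as the padded one, which is the mismatch the ValueError above complains about.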
>
> >>> np.__version__
> '1.11.2'
>
> >>> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])
> >>> a
> array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0),
>        (1, 1.0, 1.0)],
>       dtype=[('a', '<i8'), ('b', '<f8'), ('c', '<f8')])
>
> >>> a.mean(0)
> Traceback (most recent call last):
>   File "<pyshell#15>", line 1, in <module>
>     a.mean(0)
>   File "C:\...\python-3.4.4.amd64\lib\site-packages\numpy\core\_methods.py",
> line 65, in _mean
>     ret = umr_sum(arr, axis, dtype, out, keepdims)
> TypeError: cannot perform reduce with flexible type
>
> >>> a[['b', 'c']].mean(0)
> Traceback (most recent call last):
>   File "<pyshell#16>", line 1, in <module>
>     a[['b', 'c']].mean(0)
>   File "C:\...\python-3.4.4.amd64\lib\site-packages\numpy\core\_methods.py",
> line 65, in _mean
>     ret = umr_sum(arr, axis, dtype, out, keepdims)
> TypeError: cannot perform reduce with flexible type
>
> >>> a[['b', 'c']].view(('f8', 2)).mean(0)
> array([ 1.,  1.])
> >>> a[['b', 'c']].view(('f8', 2)).dtype
> dtype('float64')
>
>
> Aside: The plan is that statsmodels will drop all usage and support for
> recarrays/structured dtypes in the following release (0.10).
> Then structured dtypes are free (from our perspective) to provide low
> level struct support instead of pretending to be dataframe_like.
>
> Josef
>
>> Feel free to also discuss the related proposed change, to make
>> np.diag return a view instead of a copy. That change has
>> not been implemented yet, only proposed.
>>
>> Cheers,
>> Allan
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
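
For reference, a minimal sketch (an illustration, not something proposed in the PR) of two ways to get a plain float array from the 'b' and 'c' fields of the example above without relying on whether a[['b', 'c']] returns a view or a copy. The first stacks single-field views, which always copies. The second uses as_strided and assumes that 'b' and 'c' share a dtype and sit next to each other in the record; that holds for this example but not in general, and as_strided does no bounds checking, so treat it as a demonstration rather than a recommendation.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])

# Option 1: single-field indexing returns a view of that field; stacking
# the fields then copies them into an ordinary (5, 2) float array.
stacked = np.stack([a[name] for name in ('b', 'c')], axis=-1)
print(stacked.mean(0))

# Option 2 (assumes 'b' and 'c' are adjacent fields of the same dtype):
# reinterpret the memory starting at field 'b' as rows of two float64
# values, without copying.
offset_b = a.dtype.fields['b'][1]
offset_c = a.dtype.fields['c'][1]
assert offset_c - offset_b == a['b'].itemsize  # adjacency check

strided = as_strided(a['b'], shape=(a.shape[0], 2),
                     strides=(a.strides[0], a['b'].itemsize))

# Writing through the strided array changes the structured array,
# which shows that it is a view and not a copy.
strided[0, 1] = 5.0
print(a['c'][0])
```

The stacked version is the safer choice when a copy is acceptable; the strided version only illustrates that a consistent second view of the selected memory is possible when no padding is involved.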