On Mon, Jan 29, 2018 at 1:22 PM, Eric Wieser <wieser.eric+nu...@gmail.com> wrote:
> I think that there's a lot of confusion going around about recarrays vs
> structured arrays.
>
> [`recarray`](https://github.com/numpy/numpy/blob/v1.13.0/numpy/core/records.py)
> are a wrapper around structured arrays that provide:
> * Attribute access to fields as `arr.field` in addition to the normal
>   `arr['field']`
> * Automatic datatype-guessing for nested lists of tuples (which needs a
>   little work, but seems like a justifiable feature)
> * An undocumented `field` method that behaves like the 1.14 indexing
>   behavior (!)
>
> Meanwhile, `recfunctions` is a collection of functions that work on normal
> structured arrays - so is misleadingly named.
> The only link to recarrays is that most of the functions have a
> `asrecarray` parameter which applies `.view(recarray)` to the result.
>
> > deprecate recarrays
>
> Given how thin an abstraction they are over structured arrays, I don't
> think you mean this.
> Are you advocating for deprecating structured arrays entirely, or just
> deprecating recfunctions?
>

First, statsmodels is in the pandas camp for dataframes, so I don't have
any vested interest in recarrays/structured dtypes anymore.

What I meant was that structured dtypes with implicit (hidden?) padding
become unintuitive for the recarray/dataframe use case.
(At least I won't try to update my intuition about having extra things in
there that are not specified by the main structured dtype.)

Also, the dataframe_like usage of structured dtypes doesn't seem to be much
under consideration anymore.

So, my **impression** is that the recent changes make the
recarray/dataframe use case for structured dtypes more difficult.

Given that there is pandas, xarray, dask and more, numpy could as well drop
any pretense of supporting dataframe_likes.
Or, adjust the recfunctions so we can still work dataframe_like with
structured dtypes/recarrays/recfunctions.
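To make the distinctions above concrete, here is a minimal sketch. It is not taken from the thread: the field names (`id`, `value`, `extra`, `a`, `c`) and the data are made up for illustration, and the padded dtype is built by hand with explicit offsets rather than produced by multi-field indexing, so it does not depend on which numpy version is installed.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A plain structured array: fields are reached with item syntax.
arr = np.array([(1, 2.5), (2, 3.5)], dtype=[('id', 'i4'), ('value', 'f8')])
by_key = arr['value']

# A recarray is a view of the same memory that adds attribute access.
rec = arr.view(np.recarray)
by_attr = rec.value

# recfunctions operate on ordinary structured arrays; asrecarray=True
# only applies .view(recarray) to the result.
out = rfn.append_fields(arr, 'extra', np.array([10, 20]),
                        usemask=False, asrecarray=True)

# A hand-built dtype with a hole (illustrative assumption): 'a' and 'c'
# sit at offsets 0 and 8 inside 16-byte items, so 4 bytes per item
# belong to no named field.
holed = np.dtype({'names': ['a', 'c'],
                  'formats': ['i4', 'f8'],
                  'offsets': [0, 8],
                  'itemsize': 16})
used_bytes = sum(holed[name].itemsize for name in holed.names)

print(holed.itemsize)  # 16
print(used_bytes)      # 12
```

The gap between `itemsize` and the bytes covered by the named fields is the kind of "extra things in there" that the field list alone does not reveal.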
Josef

>
> Eric
>
> On Mon, 29 Jan 2018 at 09:39 Chris Barker <chris.bar...@noaa.gov> wrote:
>
>> On Sat, Jan 27, 2018 at 8:50 PM, Allan Haldane <allanhald...@gmail.com>
>> wrote:
>>
>>> On 01/26/2018 06:01 PM, josef.p...@gmail.com wrote:
>>>
>>>> I thought recarrays were pretty cool back in the day, but pandas is
>>>> a much better option.
>>>>
>>>> So I pretty much only use structured arrays for data exchange with C
>>>> code....
>>>>
>>>> My impression is that this turns into a deprecate recarrays and
>>>> supporting recfunction issue.
>>>>
>>
>>> *should* we have any dataframe-like functionality in numpy?
>>>
>>> We get requests every once in a while about how to sort rows, or about
>>> adding a "groupby" function. I myself have used recarrays in a
>>> dataframe-like way, when I wanted a quick multiple-array object that
>>> supported numpy indexing. So there is some demand to have minimal
>>> "dataframe-like" behavior in numpy itself.
>>>
>>> recarrays play part of this role currently, though imperfectly due to
>>> padding and cache issues. I think I'm comfortable with supporting some
>>> minor use of structured/recarrays as dataframe-like, with a warning in docs
>>> that the user should really look at pandas/xarray, and that structured
>>> arrays are primarily for data exchange.
>>>
>>
>> Well, I think we should either:
>>
>> deprecate recarrays -- i.e. explicitly not support DataFrame-like
>> functionality in numpy, keeping only the data-exchange functionality as
>> maintained.
>>
>> or
>>
>> Properly support it -- which doesn't mean re-implementing Pandas or
>> xarray, but it would mean addressing any bug-like issues like not dealing
>> properly with padding.
>>
>> Personally, I don't need/want it enough to contribute, but if someone
>> does, great.
>>
>> This reminds me a bit of the old numpy.Matrix issue -- it was ALMOST
>> there, but not quite, with issues, and there was essentially no overlap
>> between the people that wanted it and the people that had the time and
>> skills to really make it work.
>>
>>> (If we want to dream, maybe one day we should make a minimal
>>> multiple-array container class. I imagine it would look pretty similar to
>>> recarray, but stored as a set of arrays instead of a structured array. But
>>> maybe recarrays are good enough, and let's not reimplement pandas either.)
>>>
>>
>> Exactly -- we really don't need to re-implement Pandas....
>>
>> (except it's CSV reading capability :-) )
>>
>> -CHB
>>
>> --
>>
>> Christopher Barker, Ph.D.
>> Oceanographer
>>
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>>
>> chris.bar...@noaa.gov
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@python.org
>> https://mail.python.org/mailman/listinfo/numpy-discussion
>>