On Fri, Mar 16, 2012 at 4:26 PM, Bryan Van de Ven <bry...@continuum.io> wrote:
> Hi all,
>
> I have spent some time thinking about things, and discussing them with folks
> nearby. I actually got to wondering whether we really need new dtypes for
> this. It seems like enumerated values or factor levels could be cast as an
> annotation or metadata that could be attached to any existing integral
> dtypes. It spells differently enough that I have put up an alternate version
> that reflects this notion. I'd like to see what folks think of this
> direction:
>
> https://github.com/bryevdv/numpy/blob/enum/doc/neps/enum_alt.rst
>
> So this would require adding machinery to existing dtypes to behave properly
> when there is factor metadata present. Perhaps that is not an acceptable
> trade-off, but it seems worth discussing.

I took a look at this, but I think something was lost in the translation
from your head to text :-). Your description here makes it sound like
what's different about this proposal is that there's very different
underlying mechanics, but the enum_alt file just seems to describe an
alternative and more-or-less equivalent user-level API. Unless you told
me, I would have assumed that it just created a new dtype, rather than
modified existing ones. What mechanism are you thinking of? Or did I miss
something?

> I think a very similar approach could be used to add categorical ranges to
> any numerical or string types (I think they are called "shingles" in R?)

A 'shingle' is a way of mapping (floating point) numbers into categories.
However, they generally allow a single number to fall into multiple
categories. So for example, you might take these data points:

  1 2 3 4 5 6 7 8 9 10 11

And divide them into categories A, B, C like this:

  1 2 3 4 5 6 7 8 9 10 11
  AAAAAAAAAAAAA
      BBBBBBBBBBBBB
          CCCCCCCCCCCCCCC

Which is why they're called "shingles" :-)
  http://www.floridadisaster.org/hrg/images/roofs/shingle_loose_tab_large.jpg

This can be a very convenient data structure for various sorts of
visualizations, but I'm not sure how it would make sense to integrate it
into basic numerical types.

R has a more basic function called 'cut' which takes a numerical array
plus some specified breakpoints, and returns a factor array. But that's a
simple utility function that doesn't need any special features in the
underlying representation.

-- Nathaniel
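
To make the difference between 'cut' and a shingle concrete, here is a minimal NumPy sketch. It is not part of the enum proposal. The breakpoints, labels, and interval endpoints are made up for illustration, and np.digitize only stands in for the binning step of R's cut; it is not a full equivalent.

```python
import numpy as np

data = np.arange(1, 12)  # the data points 1..11 from the example above

# Rough analogue of R's cut(): breakpoints send each value to exactly one
# category. Breakpoints and labels are hypothetical, chosen for illustration.
breaks = np.array([4, 8])                  # bins: x < 4, 4 <= x < 8, x >= 8
labels = np.array(["low", "mid", "high"])
codes = np.digitize(data, breaks)          # integer codes 0, 1, 2
factor = labels[codes]                     # one label per data point

# A shingle: intervals may overlap, so a value can fall into several
# categories. The endpoints are assumptions, picked only so A, B, C overlap.
shingle = {"A": (1, 7), "B": (3, 9), "C": (5, 11)}
membership = {
    name: (data >= lo) & (data <= hi) for name, (lo, hi) in shingle.items()
}

# Number of categories each data point belongs to.
n_categories = sum(membership.values())
```

The cut-style result is just an integer code per element plus a list of labels, so it fits a factor array directly. The shingle needs one boolean mask per category, which is the part that does not map onto a single per-element value.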
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion