> Forgive me if this has already been addressed, but my question is
> what happens when we have more than one "label" (not as in a labeled
> axis but an observation label -- but not a tick because they're not
> unique!) per, say, row axis and heterogeneous dtypes. This is really the
> problem that I would like to see addressed, and from the BoF comments
> I'm not sure this use case is going to be covered. I'm also not sure
> I expressed myself clearly enough or understood what's already
> available. For me, this is the single most common use case, and most
> of what we are talking about now is just convenient slicing while
> ignoring some basic and prominent concerns. Please correct me if I'm
> wrong. I need to play more with the DataArray implementation but haven't
> had time yet.
>
> I often have data that looks like this (not really, but it gives the
> idea in a general way I think).
>
> city, month, year, region, precipitation, temperature
> "Austin", "January", 1980, "South", 12.1, 65.4
> "Austin", "February", 1980, "South", 24.3, 55.4
> "Austin", "March", 1980, "South", 3, 69.1
> ....
> "Austin", "December", 2009, 1, 62.1
> "Boston", "January", 1980, "Northeast", 1.5, 19.2
> ....
> "Boston", "December", 2009, "Northeast", 2.1, 23.5
> ...
> "Memphis", "January", 1980, "South", 2.1, 35.6
> ...
> "Memphis", "December", 2009, "South", 1.2, 33.5
> ...
Your labels are unique if you look at them the right way. Here's how I
would represent that in a datarray:

* axis0 = 'city', ['Austin', 'Boston', ...]
* axis1 = 'month', ['January', 'February', ...]
* axis2 = 'year', [1980, 1981, ...]
* axis3 = 'region', ['Northeast', 'South', ...]
* axis4 = 'measurement', ['precipitation', 'temperature']

and then I'd make a 5-D datarray labeled with [axis0, axis1, axis2,
axis3, axis4].

Now I realize not everyone wants to represent their tabular data as a
big tensor that they index every which way, and I think this is one
thing that pandas is for.

Oh, and the other problem with the 5-D datarray is that you'd probably
want it to be sparse (each city sits in only one region, for instance,
so most cells would be empty). This is another discussion worth having.
I want to eventually replace the labeling code in Divisi with datarray,
but sparse matrices are largely the point of using Divisi. So how do we
make a sparse datarray?

One answer would be to have datarray be a wrapper that encapsulates any
sufficiently matrix-like type. This is approximately what I did in the
now-obsolete Divisi1. Nobody liked the fact that you had to wrap and
unwrap your arrays to do anything we hadn't anticipated when writing
Divisi. I would not recommend this route.

The other option, which is more like Divisi2, would be to provide the
functionality of datarray using a mixin. Then a standard dense datarray
could inherit from (np.ndarray, Datarray), while a sparse datarray could
inherit from (sparse.csr_matrix, Datarray), for example.

--
Rob
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
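The mixin option described in the reply can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not datarray's or Divisi's actual code: `LabeledMixin`, `DenseLabeledArray`, and `tick_index` are hypothetical names, and the short tick lists are toy stand-ins for the city/month/year/region/measurement axes. The sketch builds the dense 5-D case with the (np.ndarray, mixin) inheritance order and shows how few cells a single observation fills.

```python
import numpy as np


class LabeledMixin:
    """Hypothetical mixin that stores axis names and tick labels.

    It only relies on the host class exposing ``shape``, which is what
    would let it be combined with different array types.
    """

    def _set_labels(self, axes):
        # axes: one (axis_name, tick_labels) pair per dimension
        if len(axes) != len(self.shape):
            raise ValueError("need one (name, ticks) pair per dimension")
        for (name, ticks), size in zip(axes, self.shape):
            if len(ticks) != size:
                raise ValueError(
                    f"axis {name!r} has {len(ticks)} ticks but length {size}")
        self.axes = [(name, list(ticks)) for name, ticks in axes]

    def tick_index(self, axis_name, tick):
        """Return the integer position of `tick` along axis `axis_name`."""
        for name, ticks in self.axes:
            if name == axis_name:
                return ticks.index(tick)
        raise KeyError(axis_name)


class DenseLabeledArray(np.ndarray, LabeledMixin):
    """Hypothetical dense variant: inherits from (np.ndarray, mixin)."""

    def __new__(cls, data, axes):
        obj = np.asarray(data).view(cls)
        obj._set_labels(axes)
        return obj

    def __array_finalize__(self, obj):
        # Keep labels only when the shape is unchanged (e.g. elementwise
        # arithmetic); slicing would need real label bookkeeping, omitted here.
        axes = getattr(obj, "axes", None)
        same_shape = getattr(obj, "shape", None) == self.shape
        self.axes = axes if same_shape else None


# Toy ticks standing in for the city/month/year/region/measurement example.
axes = [
    ("city", ["Austin", "Boston", "Memphis"]),
    ("month", ["January", "February"]),
    ("year", [1980, 1981]),
    ("region", ["Northeast", "South"]),
    ("measurement", ["precipitation", "temperature"]),
]
shape = tuple(len(ticks) for _, ticks in axes)
arr = DenseLabeledArray(np.full(shape, np.nan), axes)  # NaN marks "no data"

# One observation: Austin, January 1980, South.
where = [("city", "Austin"), ("month", "January"), ("year", 1980),
         ("region", "South")]
idx = tuple(arr.tick_index(name, tick) for name, tick in where)
arr[idx] = [12.1, 65.4]  # fills both measurements for that observation

print(arr.size, np.count_nonzero(~np.isnan(arr)))  # → 48 2
```

Even with these tiny tick lists, one row of the table fills 2 of 48 cells, which is the sparsity concern in concrete form. Because the mixin touches only `shape` and plain attributes, it could in principle be paired with a sparse type as in the (sparse.csr_matrix, Datarray) suggestion; note, though, that `scipy.sparse.csr_matrix` is 2-D only, so a sparse 5-D datarray would need a different backing type.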