On Wed, Jul 13, 2011 at 7:15 PM, Craig Yoshioka <crai...@me.com> wrote: > Yup exactly. To enable this sort of tracking I needed to explicitly > reverse-engineer the effects of indexing on axes. I figure overriding > indexing catches most cases that modify axes, but other holes need to be > plugged as well... ie: tranpose, swapaxes. This probably means most C > functions that change array axes (np.mean(axis=), etc.) need to be covered as > well.... that sucks. > > BTW, it sounds like you're trying to track very similar data. I am trying to > load structural biology data formats, and I try to preserve as much of the > metadata as possible, ie: I am encoding unit cell length/angle information as > vectors, etc. > > Here is my implementation: > > def __getitem__(self,idxs): > idxs,trans,keep = idx_axes(self,idxs) > array = self.view(np.ndarray).transpose(trans) > array = array[idxs] > if isinstance(array,ndarray): > array = array.view(self.__class__) > array.axes = self.axes.transpose(keep) > return array > > > def idx_axes(array,idxs): > > # explicitly expand ellipsis > expanded_idxs = idx_expanded(array.ndim,idxs) > > # determine how the axes will be rearranged as a result of axes-based > indexing > # and the creation of newaxes > remapped_axes = idx_axes_remapped(array.ndim,array.axes,expanded_idxs) > > # determine numpy compatible transpose, before newaxes are created > transposed_axes = idx_axes_transposed(remapped_axes) > > # determine numpy compatible indexes with axes-based indexing removed > filtered_idxs = idx_filtered(expanded_idxs) > > # determine which axes will be kept after numpy indexing > kept_axes = idx_axes_kept(remapped_axes,filtered_idxs) > > return filtered_idxs,transposed_axes,kept_axes > > > def idx_expanded(ndim,idxs): > ''' > explicitly expands ellipsis taking into account newaxes > ''' > > if not isinstance(idxs,tuple): > return idx_expanded(ndim,(idxs,)) > > # how many dimensions we will end up having > ndim = ndim + idxs.count(newaxis) > > filler = slice(None) > > def skip_ellipsis(idxs): > return tuple([filler if isinstance(x,type(Ellipsis)) else x for x in > idxs]) > > def fill_ellipsis(ndim,l,r): > return (filler,)*(ndim-len(l)-len(r)) > > # expand first ellipsis, treat all other ellipsis as slices > if Ellipsis in idxs: > idx = idxs.index(Ellipsis) > llist = idxs[:idx] > rlist = skip_ellipsis(idxs[idx+1:]) > cfill = fill_ellipsis(ndim,llist,rlist) > idxs = llist + cfill + rlist > > return idxs > > > def idx_filtered(idxs): > ''' > replace indexes on axes with normal numpy indexes > ''' > def axisAsIdx(idx): > if isinstance(idx,str): > return slice(None) > elif isinstance(idx,slice): > if isinstance(idx.start,str): > return idx.stop > return idx > > return tuple([axisAsIdx(x) for x in idxs]) > > > def idx_axes_remapped(ndim,axes,idxs): > ''' > if given a set of array indexes that contain labeled axes, > return a tuple that maps axes in the source array to the axes > as they will end up in the destination array. Must take into > account the spaces created by newaxis indexes. > ''' > > # how many dims are we expecting? > ndim = ndim + idxs.count(newaxis) > > # new unique object for marking unassigned locations in mapping > unassigned = object() > > # by default all locations are unsassigned > mapping = [unassigned] * ndim > > # find labels in indexes and set the dims for those locations > for dim,label in enumerate(idxs): > if label == newaxis: > mapping[dim] = label > elif isinstance(label,str): > mapping[dim] = axes.dimForLabel(label) > elif isinstance(label,slice): > if isinstance(label.start,str): > mapping[dim] = axes.dimForLabel(label.start) > > # find unassigned dims, in order > unmapped = [d for d in range(ndim) if d not in set(mapping)] > > # fill in remaining unassigned locations with dims > for dst,src in enumerate(mapping): > if src == unassigned: > mapping[dst] = unmapped.pop(0) > > return tuple(mapping) > > > def idx_axes_transposed(mapping): > ''' > stripping out newaxes in mapping yields a tuple compatible with transpose > ''' > return tuple([x for x in mapping if x != newaxis]) > > > def idx_axes_kept(mapping,idxs): > ''' > remove axes from mapping that will not survive the indexing (ie: ints) > ''' > kept = [] > first_list = True > for dst,src in enumerate(mapping): > if dst < len(idxs): > idx = idxs[dst] > if isinstance(idx,int): > continue > elif isinstance(idx,list): > if not first_list: > continue > first_list = False > kept += [src] > return tuple(kept) > > > > > On Jul 13, 2011, at 3:13 PM, Sam Quinan wrote: > >> I'm currently working on interfacing ndarrays with a custom C-representation >> for n-dimensional arrays. My custom C code provides additional per-axis >> information (labeling, spacing between samples / range of sample positions >> along the axis, axis direction, cell vs.node centering, etc.) Subclassing >> ndarray to hold onto this info is fairly simple, but getting numpy's methods >> to intelligently modify that information when the array is sliced is >> something that I'm still trying to figure out. >> >> A robust way to attach per-axis info to a given ndarray, whether it just be >> a label or some more complex structure, would definitely be something I (and >> likely others) would find useful... >> >> That said, I'd love to know more about how the idx_axes() structure in your >> workaround works... >> >> - Sam >> >> >> >> On 7/13/11 12:00 PM, "numpy-discussion-requ...@scipy.org" >> <numpy-discussion-requ...@scipy.org> wrote: >> >>> Date: Tue, 12 Jul 2011 16:39:47 -0700 >>> From: Craig Yoshioka <crai...@me.com> >>> Subject: [Numpy-discussion] named ndarray axes >>> To: NumPy-Discussion@scipy.org >>> Message-ID: <0fc8b43e-26cd-40ed-a6fa-59dd8d641...@me.com> >>> Content-Type: text/plain; CHARSET=US-ASCII >>> >>> I brought up a while ago about how it would be nice if numpy arrays could >>> have >>> their axes 'labeled'. = I got an implementation that works pretty well >>> for >>> me and in the process learned quite a few things, and was hoping to foster >>> some more discussion on this topic, as I think I have found a >>> simple/flexible >>> solution to support this at the numpy level. >>> >>> Here are *some* examples code snippets from my unit tests on 'Array': >>> >>> a = Array((4,5,6)) >>> >>> # you can assign data to all axes by attribute: >>> a.Axes.Label = (0:'z',1:'y',2:'x'} >>> >>> # or add metadata to each individually: >>> a.Axes[1].Vector = [0,1,0] >>> a.Axes[2].Vector = [0,0,1] >>> >>> # simple case int indexing >>> b = a[0] >>> assert b.shape == (5,6) >>> assert b.Axes.Label == {0:'y',1:'x'} >>> assert b.Axes.Vector == {0:[0,1,0],1:[0,0,1]} >>> >>> # indexing with slices >>> b = a[:,0,:] >>> assert b.shape == (4,6) >>> assert b.Axes.Label == {0:'z',1:'x'} >>> assert b.Axes.Vector == {1:[0,0,1]} >>> >>> # indexing with ellipsis >>> b = a[...,0] >>> assert b.shape == (4,5) >>> assert b.Axes.Label == {0:'z',1:'y'} >>> >>> # indexing with ellipsis, newaxis, etc. >>> b = a[newaxis,...,2,newaxis] >>> assert b.shape == (1,4,5,1) >>> assert b.Axes.Label == {1:'z',2:'y'} >>> >>> # indexing with lists >>> b = a[[1,2],:,[1,2]] >>> assert b.shape == (2,5) >>> assert b.Axes.Label == {0:'z',1:'y'} >>> >>> # most interesting examples, indexing with axes labels---------------- >>> # I was a bit confused about how to handle indexing with mixed >>> axes/non-axes indexes >>> # IE: what does a['x',2:4] mean? on what axis is the 2:4 slice being >>> applied, the first? the first after 'x'? >>> # One option is to disallow mixing (simpler to implement, >>> understand?) >>> # Instead I chose to treat the axis indexing as a forced assignment >>> of an axis to a position. >>> >>> # axis indexing that transposes the first two dimensions, but otherwise >>> does nothing >>> b = a['y','z'] >>> assert b.shape == (5,4,6) >>> assert b.Axes.Label == {0:'y',1:'z',2:'x'} >>> >>> # abusing slices to allow specifying indexes for axes >>> b = a['y':0,'z'] >>> assert b.shape == (4,6) >>> assert b.Axes.Label == {0:'z',1:'x'} >>> >>> # unfortunately that means a slice index on an axis must be written like >>> so: >>> b = a['y':slice(0,2),'x','z'] >>> assert b.shape == (2,6,4) >>> assert b.Axes.Label == {0:'y',1:'x',2:'z'} >>> >>> b = a['y':[1,2,3],'x','z':slice(0:1)] >>> # or due to the forced transposition, this is the same as: >>> c = a['y','x','z'][[1,2,3],:,0:1] >>> >>> assert b.shape == (3,6,1) >>> assert b.Axes.Label == {0:'y',1:'x',2:'z'} >>> assert b.shape == c.shape >>> assert b.Axes == c.Axes >>> >>> >>> #----------------------------------------------------------------------------- >>> ----------- >>> >>> >>> To do all this I essentially had to recreate the effects of numpy indexing >>> on >>> axes.... This is not ideal, but so far I seem to have addressed most of the >>> indexing I've used, at least. Here is what __getitem__ looks like: >>> >>> def __getitem__(self,idxs): >>> filtered_idxs,transposed_axes,kept_axes = self.idx_axes(idxs) >>> array = self.view(ndarray).transpose(transposed_axes) >>> array = array[filtered_idxs] >>> if isinstance(array,ndarray): >>> array = array.view(Array) >>> array.Axes = self.Axes.keep(kept_axes) >>> return array >>> >>> As you can see idx_axes() essentially recreates a lot of ndarray indexing >>> behavior, so that its effects can be explicitly handled. >>> >>> Having done all this, I think the best way for numpy to support 'labeled' >>> axes >>> in the future is by having numpy itself keep track of a very simple tuple >>> attribute, like shape, and leave more complex axis naming/labeling to >>> subclasses on the python side. As an example, upon creating a new dimension >>> in an array, numpy assigns that dimension a semi-unique id, and this tuple >>> could be used in __array_finalize__. >>> >>> For example my __array_finalize__ could look like: >>> >>> def __array_finalize__(self,obj): >>> if hasattr(obj,'axesdata'): >>> for axesid in self.axes: >>> if axesid in obj.axes: >>> self.axesdata[axesid] = obj.axesdata[axesid] >>> >>> >>> This would cover a lot more situations and lead to much simpler code since >>> the >>> work required on the C side would be minimal, but still allow robust and >>> custom tracking and propagation of axes information. >>> Subclasses that tap into this data would react to the result of numpy >>> operations vs. having to predict/anticipate. >>> >>> For example, my __getitem__, relying on the __array_finalize__ above, could >>> look like: >>> >>> def __getitem__(self,idxs): >>> filtered_idxs,transposed_axes= self.idx_axes(idxs) >>> array = self.transpose(transposed_axes) >>> return array[filtered_idxs] >>> >>> Not shown is how much simpler and robust the code for idx_axes would then >>> be. >>> I estimate it would go from 130 loc to < 20 loc. >>> >>> Sorry for the extra long e-mail, >>> -Craig >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion >
Have you guys been following the DataArray discussions or project at all? I think it provides a nearly complete implementation of what you've been describing (named axes): https://github.com/fperez/datarray (no joke: 21 forks!) some links: http://inscight.org/2011/05/18/episode_13/ https://convore.com/python-scientific-computing/data-array-in-numpy/ I'm excited that many more people seem to be excited about making this kind of functionality available in the scientific Python stack so understanding every perspective and set of requirements makes a big difference =) best, Wes _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion