Re: [GRASS-dev] Compute mahalanobis distance using Scipy

Paulo van Breugel Fri, 13 Feb 2015 00:58:28 -0800

Hi Glyn,

Your suggestions to compute the mahalanobis distance work perfect. However,
I have a follow up question. For large data sets, I am getting a
MemoryError when running "mahdist = np.sum(np.sum(delta[None,:,:,:] *
VI[:,:,None,None],axis=1) * delta,axis=0)"


 # pro is list with raster names

s = len(pro)

dat_pro = None

layer = garray.array()

r, c = layer.shape

for i, map in enumerate(pro):

     layer.read(map, null=np.nan)

     if dat_pro is None:

         dat_pro = np.empty((s, r, c), dtype=np.double)

         dat_pro[i,:,:] = layer

del layer
delta = dat_pro - stat_mean[:,None,None]

mahdist = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) *
delta, axis=0)

stat_mahal = garray.array()

stat_mahal[...] = mahdist

I guess this is because the calculations are done in-memory? Any way to
avoid this memory problem when using large data sets (something like
working with memmap objects?)

Cheers,

Paulo


On Mon, Feb 2, 2015 at 7:19 PM, Glynn Clements <gl...@gclements.plus.com>
wrote:

>
> Paulo van Breugel wrote:
>
> > > The second version:
> > >
> > > > >     VI = np.linalg.inv(covar)
> > > > >     delta = dat_ref - stat_mean[:,None,None]
> > > > >     m = np.sum(np.sum(delta[:,:,:,None] * delta[:,:,None,:] *
> VI[None,None,:,:],axis=-1),axis=-1)
> > > > >     stat_mah = np.sqrt(m)
> > >
> > > should work with delta being a 3-D array.
> > >
> >
> > Yes, but when running the lines above, I am getting the same error
> message:
> >
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > ValueError: operands could not be broadcast together with shapes
> > (3,77,78,78) (1,1,3,3)
>
> Well, it's not quite the same; both operands now have the same number
> of dimensions, but they're not compatible (essentially, for each
> dimension either both values should be the same or one of them should
> be one).
>
> > >From what I can see from broadcasting rules, there is mismatch in the
> last
> > and fore-last dimensions of VI[None,None,:,:] compared to the other two?
> >
> > delta[:,:,:,None].shape
> > (3, 77, 78, 1)
> >
> > delta[:,:,None,:].shape
> > (3, 77, 1, 78)
> >
> > VI[None,None,:,:].shape
> > (1, 1, 3, 3)
>
> The problem was that delta is 3x77x78 whereas the code assumed that it
> would be 77x78x3.
>
> > I am really have to get a better grasp of working with these arrays, but
> in
> > any case, after a bit trial and error, I came up with the following to
> > compute m.
> >
> >  m = np.sum(np.sum(delta[None,:,:,:] * VI[:,:,None,None],axis=1) * delta,
> > axis=0)
> >
> >
> > This does, in my limited testing, gives the same result as using the loop
> > with
> >
> > pow(distance.mahalanobis(cell_ref, stat_mean, VI),2).
> >
> >
> > to be sure, does the above makes sense to you?
>
> I believe that's correct.
>
> --
> Glynn Clements <gl...@gclements.plus.com>
>

_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev

Re: [GRASS-dev] Compute mahalanobis distance using Scipy

Reply via email to