I am working on a memory-intensive experiment with very large
arrays so I must be careful when allocating memory. Numpy already
supports a number of in-place operations (+=, *=) making the task
much more manageable. However, it is not obvious to me how I can set
values based on a very simple condition.
The expression

y[y<0] = -1

generates a boolean index mask y<0 of the same size as the array
y, which is problematic when y is quite large.

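To make the cost concrete, here is a small sketch showing that the comparison allocates a temporary boolean array with one byte per element of y. The array size is an arbitrary assumption chosen for illustration.

```python
import numpy as np

# Arbitrary example size; a real experiment would use a far larger array.
y = np.random.normal(0, 10, 1_000_000)

mask = y < 0                 # temporary boolean array, same shape as y
mask_bytes = mask.nbytes     # one byte per element
y_bytes = y.nbytes           # eight bytes per element for float64

y[mask] = -1                 # same effect as y[y<0] = -1
```

For a float64 array the mask therefore adds roughly one eighth of the array's own footprint while the assignment runs.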
I was wondering if there was anything like a set_where(A, cmp,
B, setval, [optional elseval]) function where cmp would be a
comparison operator expressed as a string.
The code below illustrates what I want to do. Admittedly, it
needs to be cleaned up but it's a proof of concept. Does numpy
provide any functions that support the functionality of the code
below?
That's a good question, but I'm pretty sure it doesn't, apart from
numpy.clip(). The way I'd try to solve that problem would be with
the dreaded for loop. Don't iterate over single elements, but if
you have a gargantuan array, working in chunks of ten thousand (or
whatever) won't have too much overhead:

block = 100000
for n in range(0, len(y), block):
    yc = y[n:n+block]   # a slice is a view into y, so nothing is copied
    yc[yc<0] = -1       # the temporary mask is at most block elements long

It's a bit of a pain, but working with arrays that nearly fill RAM
*is* a pain, as I'm sure you are all too aware by now.
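If this comes up often, the loop could be wrapped in a small helper. The sketch below is only an illustration: set_below_chunked is a made-up name, not something NumPy provides, and it assumes a one-dimensional array and a "less than" comparison.

```python
import numpy as np

def set_below_chunked(y, threshold, setval, block=100_000):
    """Set y[y < threshold] = setval in place, one block at a time.

    Hypothetical helper, not a NumPy function. Assumes y is 1-D.
    """
    for start in range(0, len(y), block):
        chunk = y[start:start + block]      # basic slicing returns a view
        chunk[chunk < threshold] = setval   # mask has at most `block` elements
    return y

y = np.array([3.0, -2.5, 0.0, -0.1, 7.0])
set_below_chunked(y, 0, -1, block=2)
# y is now [3., -1., 0., -1., 7.]
```

The block size trades loop overhead against the size of the temporary mask, so the default here is a guess rather than a tuned value.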
You might look into numexpr, this is the sort of thing it does
(though I've never used it and can't say whether it can do this).
Well, Numexpr is designed to minimize the number of temporaries, and
can do what Damian wants without having to store the mask in a
temporary. However, the output will require new space. The usage
should be something like:
In [11]: y = numpy.random.normal(0, 10, 10)
In [12]: numexpr.evaluate('where(y<0, -1, y)')
Out[12]:
array([  7.11784295,  -1.        ,  10.92876842,  -1.        ,
         0.76092629,  -1.        ,  14.07021792,  -1.        ,
         5.67173405,  31.28631822])

Oops. I realised that, for this particular case, Numexpr's memory usage is
similar to that of its NumPy counterpart:

y[:] = numpy.where(y<0, -1, y)

So I think your best option is to work in chunks, as
Anne suggested.
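A quick way to see why the numpy.where form does not save memory is to check that its result is a separate full-size array. This is a sketch with an arbitrary small array, not a measurement of peak memory.

```python
import numpy as np

y = np.random.normal(0, 10, 1000)   # arbitrary example size

result = np.where(y < 0, -1, y)     # builds the mask and a new output array
shares = np.shares_memory(result, y)     # does the result reuse y's buffer?
same_size = result.nbytes == y.nbytes    # is the result as large as y?

y[:] = result                       # copy the values back into y's buffer
```

Here shares is False and same_size is True, so the expression temporarily needs a second array as large as y on top of the boolean mask.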


