On Wed, Jul 17, 2013 at 10:57 AM, Frédéric Bastien <[email protected]> wrote:
> On Wed, Jul 17, 2013 at 10:39 AM, Nathaniel Smith <[email protected]> wrote:
>> On Tue, Jul 16, 2013 at 7:53 PM, Frédéric Bastien <[email protected]> wrote:
>>> Hi,
>>>
>>> On Tue, Jul 16, 2013 at 11:55 AM, Nathaniel Smith <[email protected]> wrote:
>>>> On Tue, Jul 16, 2013 at 2:34 PM, Arink Verma <[email protected]> wrote:
>>>>> > Each ndarray does two mallocs, for the obj and buffer. These could be
>>>>> > combined into 1 - just allocate the total size and do some pointer
>>>>> > arithmetic, then set OWNDATA to false.
>>>>> So, that two mallocs has been mentioned in project introduction. I got
>>>>> that wrong.
>>>>
>>>> On further thought/reading the code, it appears to be more complicated
>>>> than that, actually.
>>>>
>>>> It looks like (for a non-scalar array) we have 2 calls to PyMem_Malloc:
>>>> 1 for the array object itself, and one for the shapes + strides. And one
>>>> call to regular-old malloc: for the data buffer.
>>>>
>>>> (Mysteriously, shapes + strides together have 2*ndim elements, but to
>>>> hold them we allocate a memory region sized to hold 3*ndim elements. I'm
>>>> not sure why.)
>>>>
>>>> And contrary to what I said earlier, this is about as optimized as it
>>>> can be without breaking ABI. We need at least 2 calls to
>>>> malloc/PyMem_Malloc, because the shapes+strides may need to be resized
>>>> without affecting the much larger data area. But it's tempting to
>>>> allocate the array object and the data buffer in a single memory region,
>>>> like I suggested earlier. And this would ALMOST work. But, it turns out
>>>> there is code out there which assumes (whether wisely or not) that you
>>>> can swap around which data buffer a given PyArrayObject refers to (hi
>>>> Theano!). And supporting this means that data buffers and PyArrayObjects
>>>> need to be in separate memory regions.
>>> Are you sure that Theano "swaps" the data ptr of an ndarray? When we
>>> play with that, it is on a newly created ndarray. So a node in our graph
>>> won't change the input ndarray structure. It will create a new ndarray
>>> structure with new shape/strides and pass a data ptr, and we flag the new
>>> ndarray with own_data correctly, to my knowledge.
>>>
>>> If Theano poses a problem here, I'll suggest that I fix Theano. But
>>> currently I don't see the problem. So if this makes you change your mind
>>> about this optimization, tell me. I don't want Theano to prevent
>>> optimization in NumPy.
>>
>> It's entirely possible I misunderstood, so let's see if we can work it
>> out. I know that you want to assign to the ->data pointer in a
>> PyArrayObject, right? That's what caused some trouble with the 1.7 API
>> deprecations, which were trying to prevent direct access to this
>> field? Creating a new array given a pointer to a memory region is no
>> problem, and obviously will be supported regardless of any
>> optimizations. But if that's all you were doing then you shouldn't
>> have run into the deprecation problem. Or maybe I'm misremembering!
>
> What is currently done in only one place is to create a new PyArrayObject
> with a given ptr, so NumPy doesn't do the allocation. We later change that
> ptr to another one.
>
> It is the change to the ptr of the just-created PyArrayObject that caused
> problems with the interface deprecation. I fixed all the other problems
> related to the deprecation (mostly just renames of functions/macros), but
> I haven't fixed this one yet. I would need to change the logic to compute
> the final ptr before creating the PyArrayObject, and create it with the
> final data ptr. But in all cases, NumPy didn't allocate the data memory
> for this object, so this case doesn't block your optimization.
> One thing on our optimization "wish list" is to reuse allocated
> PyArrayObjects between Theano function calls for intermediate results (so
> completely under Theano's control). This could be useful in particular for
> reshape/transpose/subtensor. Those functions are pretty fast, and from
> memory I already found the allocation time to be significant. But in those
> cases it is on PyArrayObjects that are views, so the metadata and the data
> would be in different memory regions in all cases.
>
> The other case on the optimization "wish list" is to reuse the
> PyArrayObject when the shape isn't the right one (but the number of
> dimensions is the same). If we do that for operations like addition, we
> will need to use PyArray_Resize(). This will be done on PyArrayObjects
> whose data memory was allocated by NumPy. So if you do one memory
> allocation for metadata and data, just make sure that PyArray_Resize()
> will handle that correctly.
>
> On the usefulness of doing only one memory allocation: in our old gpu
> ndarray, we were doing two allocations on the GPU, one for metadata and
> one for data. I removed this, as it was a bottleneck. Allocation on the
> CPU is faster than on the GPU, but it is still something that is slow,
> except if you reuse memory. Does PyMem_Malloc reuse previous small
> allocations?
>
> For those that read all this, the conclusion is that Theano shouldn't
> block this optimization. If you optimize the allocation of new
> PyArrayObjects, there will be less incentive to do the "wish list"
> optimization.
>
> One last thing to keep in mind is that you should keep the data segment
> aligned. I would argue that alignment on the datatype size isn't enough,
> so I would suggest the cache line size or something like that, but I
> don't have numbers to base this on. This would also help in the case of a
> resize that changes the number of dimensions.
There is a similar thing done in f2py, which is still keeping it from being
current with the 1.7 macro-replacement-by-functions work. I'd like to add a
'swap'-type function and would welcome discussion/implementation of such.

Chuck
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
