On Mon, May 14, 2012 at 5:31 PM, mark florisson <markflorisso...@gmail.com> wrote:
> On 12 May 2012 22:55, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> > On 05/11/2012 03:37 PM, mark florisson wrote:
> >> On 11 May 2012 12:13, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:
> >>> (NumPy devs: I know, I get too many ideas. But this time I *really* believe in it, I think this is going to be *huge*. And if Mark F. likes it, it's not going to be without manpower; and as his mentor I'd pitch in too here and there.)
> >>>
> >>> (Mark F.: I believe this is *very* relevant to your GSoC. I certainly don't want to micro-manage your GSoC, just have your take.)
> >>>
> >>> Travis, thank you very much for those good words in the "NA-mask interactions..." thread. It put most of my concerns to rest. If anybody is leaning towards opaqueness because of its OOP purity, I want to refer to C++ and its walled garden of ideological purity -- it has, what, 3-4 different OOP array libraries, none of which is able to out-compete the others. Meanwhile the rest of the world happily cooperates using pointers, strides, CSR and CSC.
> >>>
> >>> Now, there are limits to what you can do with strides and pointers. No one's denying the need for more. In my mind that's an API where you can do fetch_block and put_block of cache-sized, N-dimensional blocks on an array; but it might be something slightly different.
> >>>
> >>> Here's what I'm asking: DO NOT simply keep extending ndarray and the NumPy C API to deal with this issue.
> >>>
> >>> What we need is duck-typing/polymorphism at the C level. If you keep extending ndarray and the NumPy C API, what we'll have is a one-to-many relationship: one provider of array technology, multiple consumers (with hooks, I'm sure, but all implementations of the hook concept in the NumPy world I've seen so far are a total disaster!).
> >>>
> >>> What I think we need instead is something like PEP 3118 for the "abstract" array that is only available block-wise with getters and setters. On the Cython list we've decided that what we want for CEP 1000 (for boxing callbacks etc.) is to extend PyTypeObject with our own fields; we could create CEP 1001 to solve this issue and make any Python object an exporter of "block-getter/setter-arrays" (better name needed).
> >>>
> >>> What would be exported is (of course) a simple vtable:
> >>>
> >>> typedef struct {
> >>>     int (*get_block)(void *ctx, ssize_t *upper_left,
> >>>                      ssize_t *lower_right, ...);
> >>>     ...
> >>> } block_getter_setter_array_vtable;
> >>>
> >>> Let's please discuss the details *after* the fundamentals. But the reason I put void* there instead of PyObject* is that I hope this could be used beyond the Python world (say, Python<->Julia); the void* would be handed to you at the time you receive the vtable (however we handle that).
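> >>> To make the intent concrete, here is a purely hypothetical consumer of such an exporter, sketched in Python rather than C (DenseExporter, get_block, put_block and the block size are made-up names, not a settled API); in C this loop would go through the vtable:
> >>>
> >>> import numpy as np
> >>>
> >>> class DenseExporter:
> >>>     # Toy exporter backed by an ordinary dense array; a sparse or
> >>>     # blocked exporter would implement the same two methods.
> >>>     def __init__(self, arr):
> >>>         self.arr = arr
> >>>
> >>>     def get_block(self, upper_left, lower_right):
> >>>         # Dense copy of the half-open region [upper_left, lower_right).
> >>>         (i0, j0), (i1, j1) = upper_left, lower_right
> >>>         return self.arr[i0:i1, j0:j1].copy()
> >>>
> >>>     def put_block(self, block, upper_left, lower_right):
> >>>         (i0, j0), (i1, j1) = upper_left, lower_right
> >>>         self.arr[i0:i1, j0:j1] = block
> >>>
> >>> def scale(exporter, shape, factor, bs=64):
> >>>     # Consumer: visit the array in cache-sized blocks, assuming
> >>>     # nothing about the exporter's internal storage.
> >>>     for i0 in range(0, shape[0], bs):
> >>>         for j0 in range(0, shape[1], bs):
> >>>             i1, j1 = min(i0 + bs, shape[0]), min(j0 + bs, shape[1])
> >>>             block = exporter.get_block((i0, j0), (i1, j1))
> >>>             exporter.put_block(block * factor, (i0, j0), (i1, j1))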
> >> I suppose it would also be useful to have some way of predicting the output format polymorphically for the caller. E.g. dense * block_diagonal results in block_diagonal, but dense + block_diagonal results in dense, etc. It might be useful for the caller to know whether it needs to allocate a sparse, dense or block-structured array. Or maybe the polymorphic function could even do the allocation. This needs to happen recursively of course, to avoid intermediate temporaries. The compiler could easily handle that, and so could numpy when it gets lazy evaluation.
> >
> > Ah. But that depends on the computation to be performed, too: a) elementwise, b) axis-wise reductions, c) linear algebra...
> >
> > In my oomatrix code (please don't look at it, it's shameful) I do this using multiple dispatch.
> >
> > I'd rather ignore this for as long as we can, only implementing "a[:] = ..." -- I can't see how decisions here would trickle down to the API that's used in the kernel; it's more like a pre-phase, and better treated orthogonally.
> >
> >> I think if the heavy lifting of allocating output arrays and exporting these arrays works in numpy, then support in Cython could use that (I can already hear certain people object to more complicated array stuff in Cython :). Even better here would be an external project that each of our projects could use (I still think the nditer sorting functionality of arrays should be numpy-agnostic and externally available).
> >
> > I agree with the separate-project idea. It's trivial for NumPy to incorporate that as one of its methods for exporting arrays, and I don't think it makes sense to either build it into Cython or outright depend on NumPy.
> >
> > Here's what I'd like (working title: NumBridge?).
> >
> > - Mission: be the "double* + shape + strides" in a world where that is no longer enough, by providing tight, focused APIs/ABIs that are usable across C/Fortran/Python.
> >
> > I basically want something I can quickly acquire from a NumPy array, then pass it into my C code without dragging along all the cruft that I don't need.
> >
> > - Written in pure C + specs, usable without Python
> > - PEP 3118 "done right": basically semi-standardize the internal Cython memoryview ABI and get something that's passable on the stack
> > - Block get/put API
> > - Iterator APIs
> > - Utility code for exporters and clients (iteration code, axis reordering, etc.)
> >
> > Is the scope of that insane, or is it at least worth a shot to see how bad it is? Beyond figuring out a small subset that can be done first, and whether performance considerations must be taken or not, there are two complicating factors: pluggable dtypes and memory management. Perhaps you could come to Oslo for a couple of days to brainstorm...
> >
> > Dag
>
> The blocks are a good idea, but I think they become fairly complicated for complicated matrix layouts. It would be nice to have something that is reasonably efficient for at least most of the array storage mechanisms. I'm going to do a little brain dump below, let's see if anything is useful :)
>
> What if we basically take the CSR format and make it a little simpler, easier to handle, and better suited for other layouts? Basically, keep shape/strides for dense arrays, and for blocked arrays just "extend" your number of dimensions, i.e. a 2D blocked array becomes a 4D array, something like this:
>
> >>> a = np.arange(4).repeat(4).reshape(4, 4)
> >>> a
> array([[0, 0, 0, 0],
>        [1, 1, 1, 1],
>        [2, 2, 2, 2],
>        [3, 3, 3, 3]])
> >>> a.shape = (2, 2, 2, 2)
> >>> itemsize = a.dtype.itemsize
> >>> a.strides = (8 * itemsize, 2 * itemsize, 4 * itemsize, 1 * itemsize)
> >>> a
> array([[[[0, 0],
>          [1, 1]],
>
>         [[0, 0],
>          [1, 1]]],
>
>
>        [[[2, 2],
>          [3, 3]],
>
>         [[2, 2],
>          [3, 3]]]])
> >>> print a.flatten()
> [0 0 1 1 0 0 1 1 2 2 3 3 2 2 3 3]
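> The same stride arithmetic can be wrapped in a small helper built on numpy's stride tricks -- as_blocked below is not an existing numpy function, just a sketch:
>
> import numpy as np
> from numpy.lib.stride_tricks import as_strided
>
> def as_blocked(a, bm, bn):
>     # View an (M*bm, N*bn) array as an (M, N, bm, bn) array of bm-by-bn
>     # blocks, without copying: the outer two dimensions step over whole
>     # blocks, the inner two step within a block.
>     M, N = a.shape[0] // bm, a.shape[1] // bn
>     s0, s1 = a.strides
>     return as_strided(a, shape=(M, N, bm, bn),
>                       strides=(s0 * bm, s1 * bn, s0, s1))
>
> a = np.arange(4).repeat(4).reshape(4, 4)
> assert (as_blocked(a, 2, 2).flatten() ==
>         [0, 0, 1, 1, 0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3]).all()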
> Now, given some buffer flag (PyBUF_Sparse or something), use basically suboffsets with indirect dimensions, where only ever the last dimension is a row of contiguous memory (the entire thing may be contiguous, but the point is that you don't care). This row may
>
> - be variable-sized
> - have a single "column start" (e.g. diagonal, band/tri-diagonal, block diagonal, etc.), OR
> - have a list of column indices, the length of the row (like in the CSR format)
>
> The length of each innermost row is then determined by looking at, in order:
>
> - the extent as specified in the shape list
> - if -1, and some flag is set, the extent is determined as in CSR, i.e. (uintptr_t) row[i + 1] - (uintptr_t) row[i]
>   -> maybe indices are better than pointers here, for serialization, GPUs, etc.
> - otherwise, either a function pointer or perhaps a list of extents
>
> All these details will obviously be abstracted away, allowing easy iteration, but they can also be used directly by ufuncs operating on contiguous rows (often, since the actual storage is contiguous and stored in a 1D array, some flag could indicate contiguity as an optimization for unary ufuncs and flat iteration). Tiled nditer-ation could also work here without too big a hassle.
>
> When you slice, you add to the suboffset and manipulate the single extent or the entire list of extents in that dimension, and the flag that says to determine the length via pointer subtraction should be cleared (this should probably all happen through vtable functions).
>
> An exporter would also be free to use different malloc'ed pointers, allowing variable-sized array support with append/pop functionality in multiple dimensions (if there are no active buffer views).
>
> Random access in the case where a column start is provided is still constant time, done through a[i][j][k - colstart]; with discontiguous rows you are instead allowed a logarithmic search (if the size exceeds some threshold). I see scipy.sparse does a linear search, which is pretty slow in pathological cases (a logarithmic lookup is sketched after the timings below):
>
> from scipy import sparse
> a = np.random.random((1, 4000000))
> b = sparse.csr_matrix(a)
> %timeit a[0, 1000]
> %timeit b[0, 1000]
>
> 10000000 loops, best of 3: 190 ns per loop
> 10 loops, best of 3: 29.3 ms per loop
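> A minimal sketch of that logarithmic lookup, using the indptr/indices/data arrays scipy already stores (csr_getitem is a hypothetical helper, not scipy API):
>
> import numpy as np
> from scipy import sparse
>
> def csr_getitem(m, i, j):
>     # The column indices of row i live in m.indices[start:end] (sorted,
>     # as csr_matrix keeps them here), so bisect instead of scanning.
>     start, end = m.indptr[i], m.indptr[i + 1]
>     k = start + np.searchsorted(m.indices[start:end], j)
>     if k < end and m.indices[k] == j:
>         return m.data[k]
>     return 0.0
>
> b = sparse.csr_matrix(np.random.random((1, 4000000)))
> assert csr_getitem(b, 0, 1000) == b[0, 1000]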
> Now, depending on the density and operation, the caller may have some idea of how to allocate an output array. I'm not sure how to handle "insertions" of new elements, but I presume through vtable put/insert functions. I'm also not sure how to fit this in with linear algebra functionality, other than providing conversions of the view.
>
> I think a disadvantage of this scheme is that you can't reorder your axes anymore, and many operations that are easy in the dense case suddenly become harder, e.g. this scheme does not allow you to go from a CSR-like format into CSC. But I think what it gives is reasonable generality -- easy use from C/Fortran/Cython compiled code and numpy ufunc invocation, as well as reasonably efficient storage for various kinds of arrays.
>
> Does this make any sense?

I think it does, although it is not clear to me how it would generalize to more than 2 dimensions (i.e. how would you handle a 3D sparse array?). Would you add more levels of indirection?

I have been thinking a bit about related concepts in the context of generalized sparse arrays (which could be a first step toward numpy arrays on top of multiple malloc blocks instead of just one), and one of the solutions I have seen is generalized b-trees/b*-trees. The main appeal to me is that the underlying storage is still one-dimensional, and the "magic" happens "only" in the indexing. This has several advantages:

- getting the low-level block algorithms right is a solved problem
- it allows for growable arrays in O(log(N)) instead of O(N)
- one can hope to get near-optimal performance in the common cases (1D and 2D) with degenerate indexing functions

Two of the references I have been looking at:

- UB-tree: http://www.scholarpedia.org/article/B-tree_and_UB-tree
- "Storing Matrices on Disk: Theory and Practice Revisited" (Zhang, Yi, Munagala, Kamesh and Yang, Jun)

I have been meaning to implement some of those ideas and do some basic benchmarks (especially to compare against CSR and CSC in 2 dimensions, in which case one could imagine building the index function in such a way that range queries map to contiguous blocks of memory).
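As a purely illustrative toy, the kind of index function a UB-tree uses -- Z-order (Morton) bit interleaving -- could look like this for a 2D array with power-of-two extents (morton_index is my sketch, not code from the references above):

import numpy as np

def morton_index(i, j, nbits=32):
    # Interleave the bits of i and j: indices that are close in 2D stay
    # (mostly) close in the 1D storage order, so a 2D range query
    # decomposes into a small number of contiguous 1D ranges.
    idx = 0
    for bit in range(nbits):
        idx |= ((i >> bit) & 1) << (2 * bit + 1)
        idx |= ((j >> bit) & 1) << (2 * bit)
    return idx

# Storage offsets for a 4x4 array: each 2x2 block is contiguous.
print np.array([[morton_index(i, j) for j in range(4)] for i in range(4)])
# [[ 0  1  4  5]
#  [ 2  3  6  7]
#  [ 8  9 12 13]
#  [10 11 14 15]]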
cheers,

David