On 01/28/2011 12:37 PM, Dag Sverre Seljebotn wrote:
> On 01/28/2011 01:01 AM, Travis Oliphant wrote:
>
>> Just to start the conversation, and to find out who is interested, I would
>> like to informally propose generator arrays for NumPy 2.0. This concept
>> has as one use-case, the deferred arrays that Mark Wiebe has proposed. But,
>> it also allows for "compressed arrays", on-the-fly computed arrays, and
>> streamed or generated arrays.
>>
>> Basically, the modification I would like to make is to have an array flag
>> (MEMORY) that when set means that the data attribute of a numpy array is a
>> pointer to the address in memory where the data begins with the strides
>> attribute pointing to a C-array of integers (in other words, all current
>> arrays are MEMORY arrays)
>>
>> But, when the MEMORY flag is not set, the data attribute instead points to a
>> length-2 C-array of pointers to functions
>>
>> [read(N, output_address, self->index_iter, self->extra), write(N,
>> input_address, self->index_iter, self->extra)]
>>
>> Either of these could then be NULL (i.e. if write is NULL, then the array
>> must be read-only).
>>
>> When the MEMORY flag is not set, the strides member of the ndarray structure
>> is a pointer to the index_iter object (which could be anything that the
>> particular read and write methods need it to be).
>>
>> The array structure should also get a member to hold the "extra" argument
>> (which would hold any state that the array needed to hold on to in order to
>> correctly perform the read or write operations --- i.e. it could hold an
>> execution graph for deferred evaluation).
>>
>> The index_iter structure is anything that the read and write methods need to
>> correctly identify *where* to write. Now, clearly, we could combine
>> index_iter and extra into just one "structure" that holds all needed state
>> for read and write to work correctly.
>> The reason I propose two slots is
>> because at least mentally in the use case of having these structures be
>> calculation graphs, one of these structures is involved in "computing the
>> location to read/write" and the other is involved in "computing what to
>> read/write"
>>
>> The idea is fairly simple, but with some very interesting potential features:
>>
>> * lazy evaluation (of indexing, ufuncs, etc.)
>> * fancy indexing as views instead of copies (really just another
>>   example of lazy evaluation)
>> * compressed arrays
>> * generated arrays (from computation or streamed data)
>> * infinite arrays
>> * computed arrays
>> * missing-data arrays
>> * ragged arrays (shape would be the bounding box --- which makes me
>>   think of ragged arrays as examples of masked arrays).
>> * arrays that view PIL data.
>>
>> One could build an array with a (logically) infinite number of elements (we
>> could use -2 in the shape tuple to indicate that).
>>
>> We don't need examples of all of these features for NumPy 2.0 to be
>> released, because to really make this useful, we would need to modify all
>> "calculation" code to produce a NON MEMORY array. What to do here still
>> needs a lot of thought and experimentation.
>>
>> But, I can think about a situation where all NumPy calculations that produce
>> arrays provide the option that when they are done inside of a particular
>> context, a user-supplied behavior over-rides the default return. I want
>> to study what Mark is proposing and understand his new iterator at a deeper
>> level before providing more thoughts here.
>>
>> That's the gist of what I am thinking about. I would love feedback and
>> comments.
>>
>>
> I guess my reaction is along the lines of Charles': Why can't "a + b",
> where a and b are NumPy arrays, simply return an object of a different
> type that is lazily evaluated? Why can't infinite arrays simply be yet
> another type?
>
> Of course, much useful functionality should then be refactored into a
> new "abstract array" class, and iterators etc. be given an API that
> works with more than one type.
>
> A special-case flag and function pointers seems a bit like reinventing
> OO to me, and OO is already provided by Python.
>
Whoops. I spend too much time with Cython. Cython provides this kind of
(fast, C-level) OO, but not Python. Sorry!

Dag Sverre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
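
For readers following the thread, here is a rough Python sketch of the
read/write callback design quoted above. It is only a toy model and not
NumPy code: the proposal concerns C function pointers stored in the ndarray
structure, and the names GeneratorArray, computed_read and take are
hypothetical, made up for this illustration. A callback set to None stands
in for a NULL function pointer, index_iter carries the "where" state and
extra carries the "what" state.

```python
import itertools


class GeneratorArray:
    """Toy model of a non-MEMORY array whose data sits behind callbacks."""

    def __init__(self, shape, read=None, write=None, index_iter=None, extra=None):
        self.shape = shape            # -2 marks a logically infinite axis
        self.read = read              # None stands in for a NULL read pointer
        self.write = write            # None stands in for a NULL write pointer
        self.index_iter = index_iter  # state for "where to read/write"
        self.extra = extra            # state for "what to read/write"

    @property
    def writeable(self):
        # A missing write callback means the array must be read-only.
        return self.write is not None

    def take(self, n):
        """Ask the read callback for the next n elements."""
        if self.read is None:
            raise TypeError("array has no read callback")
        output = [None] * n
        self.read(n, output, self.index_iter, self.extra)
        return output


def computed_read(n, output, index_iter, extra):
    # Fill n output slots: index_iter supplies positions, extra computes values.
    for slot in range(n):
        output[slot] = extra(next(index_iter))


# A read-only, logically infinite array of squares, computed on demand.
squares = GeneratorArray(
    shape=(-2,),
    read=computed_read,
    index_iter=itertools.count(),
    extra=lambda i: i * i,
)
first = squares.take(5)
print(first)  # [0, 1, 4, 9, 16]
```

Written this way the sketch is an ordinary Python class holding callables,
which is roughly the point raised in the reply: the flag plus function
pointers behaves like object dispatch, and the open question in the thread
is whether that dispatch should live at the C level or in separate array
types.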