Re: [Numpy-discussion] Generator arrays

Dag Sverre Seljebotn Fri, 28 Jan 2011 03:43:33 -0800

On 01/28/2011 12:37 PM, Dag Sverre Seljebotn wrote:
> On 01/28/2011 01:01 AM, Travis Oliphant wrote:
>    
>> Just to start the conversation, and to find out who is interested, I would 
>> like to informally propose generator arrays for NumPy 2.0. One use case for 
>> this concept is the deferred arrays that Mark Wiebe has proposed, but 
>> it also allows for "compressed arrays", on-the-fly computed arrays, and 
>> streamed or generated arrays.
>>
>> Basically, the modification I would like to make is to have an array flag 
>> (MEMORY) that, when set, means that the data attribute of a numpy array is a 
>> pointer to the address in memory where the data begins, with the strides 
>> attribute pointing to a C-array of integers (in other words, all current 
>> arrays are MEMORY arrays).
>>
>> But, when the MEMORY flag is not set, the data attribute instead points to a 
>> length-2 C-array of pointers to functions
>>
>>      [read(N, output_address, self->index_iter, self->extra),
>>       write(N, input_address, self->index_iter, self->extra)]
>>
>> Either of these could then be NULL (i.e. if write is NULL, then the array 
>> must be read-only).
>>
>> When the MEMORY flag is not set, the strides member of the ndarray structure 
>> is a pointer to the index_iter object (which could be anything that the 
>> particular read and write methods need it to be).
>>
>> The array structure should also get a member to hold the "extra" argument 
>> (which would hold any state that the array needed to hold on to in order to 
>> correctly perform the read or write operations --- i.e. it could hold an 
>> execution graph for deferred evaluation).
>>
>> The index_iter structure is anything that the read and write methods need to 
>> correctly identify *where* to write.   Now, clearly, we could combine 
>> index_iter and extra into just one "structure" that holds all needed state 
>> for read and write to work correctly.   The reason I propose two slots is 
>> because at least mentally in the use case of having these structures be 
>> calculation graphs, one of these structures is involved in "computing the 
>> location to read/write" and the other is involved in "computing what to 
>> read/write"
>>
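A minimal sketch of the non-MEMORY layout described above, modelled in pure Python rather than C. Everything here is an assumption made for illustration: GeneratorArray, fetch, store and read_squares are hypothetical names, not part of NumPy or of the proposal, and a Python list stands in for the output buffer.

```python
from typing import Any, Callable, List, Optional

# Hypothetical signatures mirroring read(N, output_address, index_iter, extra)
# and write(N, input_address, index_iter, extra) from the proposal.
ReadFn = Callable[[int, List[float], Any, Any], None]
WriteFn = Callable[[int, List[float], Any, Any], None]


class GeneratorArray:
    """Illustrative stand-in for an ndarray whose MEMORY flag is not set."""

    def __init__(self, shape, read: Optional[ReadFn], write: Optional[WriteFn],
                 index_iter: Any, extra: Any) -> None:
        self.shape = shape
        self.read = read              # None means the array cannot be read
        self.write = write            # None means the array is read-only
        self.index_iter = index_iter  # state for "where to read/write"
        self.extra = extra            # state for "what to read/write"

    def fetch(self, n: int) -> List[float]:
        """Ask the read function to fill a buffer with the next n elements."""
        if self.read is None:
            raise ValueError("array is not readable")
        out = [0.0] * n
        self.read(n, out, self.index_iter, self.extra)
        return out

    def store(self, values) -> None:
        """Hand a buffer of values to the write function, if there is one."""
        if self.write is None:
            raise ValueError("array is read-only")
        self.write(len(values), list(values), self.index_iter, self.extra)


def read_squares(n, out, index_iter, extra):
    # index_iter yields the next flat index; extra computes the element.
    for k in range(n):
        out[k] = extra(next(index_iter))


# A read-only computed array: element i is i*i, generated on demand.
squares = GeneratorArray(shape=(5,), read=read_squares, write=None,
                         index_iter=iter(range(5)),
                         extra=lambda i: float(i * i))
first = squares.fetch(3)
```

Here index_iter decides which positions are visited and extra decides what value each position produces, which is the split between the two slots argued for above.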
>> The idea is fairly simple, but with some very interesting potential features:
>>
>>      * lazy evaluation (of indexing, ufuncs, etc.)
>>      * fancy indexing as views instead of copies (really just another 
>> example of lazy evaluation)
>>      * compressed arrays
>>      * generated arrays (from computation or streamed data)
>>      * infinite arrays
>>      * computed arrays
>>      * missing-data arrays
>>      * ragged arrays (shape would be the bounding box --- which makes me 
>> think of ragged arrays as examples of masked arrays).
>>      * arrays that view PIL data.
>>
>> One could build an array with a (logically) infinite number of elements (we 
>> could use -2 in the shape tuple to indicate that).
>>
>> We don't need examples of all of these features for NumPy 2.0 to be 
>> released, because to really make this useful, we would need to modify all 
>> "calculation" code to produce a NON MEMORY array.     What to do here still 
>> needs a lot of thought and experimentation.
>>
>> But I can imagine a situation where all NumPy calculations that produce 
>> arrays provide the option that, when they are done inside of a particular 
>> context, a user-supplied behavior overrides the default return.   I want 
>> to study what Mark is proposing and understand his new iterator at a deeper 
>> level before providing more thoughts here.
>>
>> That's the gist of what I am thinking about.   I would love feedback and 
>> comments.
>>
>>      
> I guess my reaction is along the lines of Charles': Why can't "a + b",
> where a and b are NumPy arrays, simply return an object of a different
> type that is lazily evaluated? Why can't infinite arrays simply be yet
> another type?
>
> Of course, much useful functionality should then be refactored into a
> new "abstract array" class, and iterators etc. be given an API that
> works with more than one type.
>
> A special-case flag and function pointers seem a bit like reinventing
> OO to me, and OO is already provided by Python.
>
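
A rough sketch of the "separate lazily evaluated type" alternative, assuming plain Python lists stand in for arrays. LazyArray and its methods are hypothetical names, not an existing NumPy API, and every element access goes through the Python interpreter, so this shows only the structure of the idea, not C-level performance.

```python
class LazyArray:
    """Hypothetical deferred array; plain Python lists stand in for ndarrays."""

    def __init__(self, length, getitem):
        self._length = length
        self._getitem = getitem   # callable: index -> element, run on demand

    @classmethod
    def from_list(cls, values):
        values = list(values)
        return cls(len(values), values.__getitem__)

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        if not 0 <= i < self._length:
            raise IndexError(i)
        return self._getitem(i)

    def __add__(self, other):
        if len(other) != len(self):
            raise ValueError("length mismatch")
        # No arithmetic happens here; each sum is computed when it is accessed.
        return LazyArray(self._length, lambda i: self[i] + other[i])

    def evaluate(self):
        """Force the whole expression into a concrete list."""
        return [self[i] for i in range(self._length)]


a = LazyArray.from_list([1, 2, 3])
b = LazyArray.from_list([10, 20, 30])
c = a + b              # builds a deferred expression, computes nothing yet
total = c.evaluate()   # the additions happen here
```

The result of a + b is simply another object of the lazy type, so no flag on the array itself is needed to distinguish deferred from in-memory data.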


Whoops. I spend too much time with Cython. Cython provides this kind of 
(fast, C-level) OO, but Python does not. Sorry!

Dag Sverre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
