[Numpy-discussion] Buffer interface PEP

2007-03-27 Thread Travis Oliphant

Hi all,

I'm having a good discussion with Carl Banks and Greg Ewing over on 
python-dev about the buffer interface. We are converging on a pretty 
good design, IMHO.   If anybody wants to participate in the discussion, 
please read the PEP (there are a few things that still need to change) 
and add your two cents over on python-dev (you can read it through the 
gmane gateway and participate without signing up).

The PEP is stored in numpy/doc/pep_buffer.txt in the NumPy SVN tree.

Best regards,

-Travis


___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Buffer interface PEP

2007-03-27 Thread Zachary Pincus
Hi,

I have a specific question and then a general question, and some  
minor issues for clarification.

Specifically, regarding the arguments to getbufferproc:
 166 format
 167    address of a format-string (following extended struct
 168    syntax) indicating what is in each element of
 169    of memory.  The number of elements is len / itemsize,
 170    where itemsize is the number of bytes implied by the format.
 171    NULL if not needed in which case format is "B" for
 172    unsigned bytes.  The memory for this string must not
 173    be freed by the consumer --- it is managed by the exporter.

Is this saying that either NULL or a pointer to "B" can be supplied
by getbufferproc to indicate to the caller that the array is unsigned  
bytes? If so, is there a specific reason to put the (minor)  
complexity of handling this case in the caller's hands, instead of  
dealing with it internally to getbufferproc? In either case, the  
wording is a bit unclear, I think.
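
As a rough illustration of the len / itemsize relationship and the "B" default quoted above, here is a minimal sketch using the struct module and today's memoryview object. memoryview postdates this thread, so this shows how the idea can be observed from Python, not the C-level getbufferproc signature under discussion. The variable names are only for illustration.

```python
import array
import struct

# A bytearray exports plain unsigned bytes, so its format is "B".
raw = memoryview(bytearray(8))
assert raw.format == "B" and raw.itemsize == 1

# An array of C doubles exports format "d"; itemsize follows from the format.
doubles = memoryview(array.array("d", [1.0, 2.0, 3.0]))
itemsize = struct.calcsize(doubles.format)

# "len / itemsize" in the PEP's wording: total bytes over bytes per element.
n_elements = doubles.nbytes // itemsize

print(doubles.format, itemsize, n_elements)
```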

The general question is that there are several other instances where  
getbufferproc is allowed to return ambiguous information which must  
be handled on the client side. For example, C-contiguous data can be  
indicated either by a NULL strides pointer or a pointer to a properly- 
constructed strides array. Clients that can't handle C-contiguous  
data (contrived example, I know there is a function to deal with  
that) would then need to check both for NULL *and* inside the strides  
array if not NULL, before properly deciding that the data isn't
usable by them. Similarly, the suboffsets can be either all negative or
NULL to indicate the same thing.

Might it be more appropriate to specify only one canonical behavior  
in these cases? Otherwise clients which don't do all the checks on  
the data might not properly interoperate with providers which format  
these values in the alternate manner.
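
To make the interoperability concern concrete, here is a minimal sketch of the normalization a careful client would need if both representations are allowed. None stands in for a C NULL pointer, and the helper names (c_contiguous_strides, normalize_strides, is_c_contiguous) are hypothetical, not part of the PEP.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array with this shape."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def normalize_strides(strides, shape, itemsize):
    """Map the 'NULL strides' encoding onto an explicit strides tuple."""
    if strides is None:  # None plays the role of a NULL pointer here
        return c_contiguous_strides(shape, itemsize)
    return tuple(strides)


def is_c_contiguous(strides, shape, itemsize):
    """Single check that accepts either encoding of contiguous data.

    Simplification: dimensions of extent 1 are compared strictly, although
    their stride does not affect the memory layout.
    """
    expected = c_contiguous_strides(shape, itemsize)
    return normalize_strides(strides, shape, itemsize) == expected


print(c_contiguous_strides((2, 3, 4), 8))
print(is_c_contiguous(None, (2, 3), 4), is_c_contiguous((12, 4), (2, 3), 4))
```

With one canonical encoding, the client would not need normalize_strides at all; it could branch on None alone.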


Also, some typos, and places additional clarification could help:

 253 PYBUF_STRIDES (strides and isptr)
Should 'isptr' be 'suboffsets'?

 75 of a larger array can be described without copying the data.   T
Dangling 'T'.

 279 Get the buffer and optional information variables about the buffer.
 280 Return an object-specific view object (which may be simply a
 281 borrowed reference to the object itself).
This phrasing (and similar phrasing elsewhere) is somewhat opaque to  
me. What's an object-specific view object?

 287 Call this function to tell obj that you are done with your view
Similarly, the 'view' concept and terminology should be defined more  
clearly in the preamble.

 333 The struct string-syntax is missing some characters to fully
 334 implement data-format descriptions already available elsewhere (in
 335 ctypes and NumPy for example).  Here are the proposed additions:
Is the following table just the additions? If so, it might be good to  
show the full spec, and flag the specific additions. If not, then the  
additions should be flagged.

 341 't'   bit (number before states how many bits)
vs.
 372 According to the struct-module, a number can preceed a character
 373 code to specify how many of that type there are.  The
I'm confused -- could this be phrased more clearly? Does '5t' refer
to one field 5 bits wide, or to five 1-bit fields? Is 't' allowed? If
so, is it equivalent to or different from '5t'?
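
For reference, the existing struct module already interprets a leading count in two different ways, which may be part of why the wording reads ambiguously. A short sketch follows; the 't' code is only proposed in the PEP, so it is not used here.

```python
import struct

# For most codes the count is a repeat count: "5B" is five separate bytes.
assert struct.calcsize("5B") == struct.calcsize("BBBBB")
five_bytes = struct.unpack("5B", b"hello")

# For "s" the count is a length: "5s" is ONE 5-byte string, not five items.
one_string = struct.unpack("5s", b"hello")

print(len(five_bytes), len(one_string))
```

Whichever reading the PEP intends for 't', stating it as explicitly as the struct documentation does for 's' would settle the question.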

 378 Functions should be added to ctypes to create a ctypes object from
 379 a struct description, and add long-double, and ucs-2 to ctypes.
Very cool.

In general, the logic of the 'locking mechanism' should be described  
at a high level at some point. It's described in nitty-gritty  
details, but at least I would have appreciated a bit more of a  
discussion about the general how and why -- this would be helpful to  
clients trying to use the locking mechanism properly.

Thanks to Travis and everyone else involved for your work on this  
cogent and sorely-needed PEP.

Zach



On Mar 27, 2007, at 12:42 PM, Travis Oliphant wrote:


> Hi all,
>
> I'm having a good discussion with Carl Banks and Greg Ewing over on
> python-dev about the buffer interface. We are converging on a pretty
> good design, IMHO.   If anybody wants to participate in the discussion,
> please read the PEP (there are a few things that still need to change)
> and add your two cents over on python-dev (you can read it through the
> gmane gateway and participate without signing up).
>
> The PEP is stored in numpy/doc/pep_buffer.txt on the SVN tree for NumPy
>
> Best regards,
>
> -Travis




Re: [Numpy-discussion] Buffer interface PEP

2007-03-27 Thread Travis Oliphant
Zachary Pincus wrote:
> Hi,
>
> Is this saying that either NULL or a pointer to "B" can be supplied
> by getbufferproc to indicate to the caller that the array is unsigned
> bytes? If so, is there a specific reason to put the (minor)
> complexity of handling this case in the caller's hands, instead of
> dealing with it internally to getbufferproc? In either case, the
> wording is a bit unclear, I think.

Yes, the wording could be more clear.  I'm trying to make it easy for
exporters to change to the new buffer interface.

The main idea I really want to see is that if the caller just passes
NULL instead of an address then it means they are assuming the data will
be unsigned bytes.  It is up to the exporter to either allow this or
raise an error.

The exporter should always be explicit if an argument for returning the
format is provided (I may have thought differently a few days ago).

> The general question is that there are several other instances where
> getbufferproc is allowed to return ambiguous information which must
> be handled on the client side. For example, C-contiguous data can be
> indicated either by a NULL strides pointer or a pointer to a
> properly-constructed strides array.

Here, I'm trying to be easy on the exporter and the consumer.  If the
data is contiguous, then neither the exporter nor the consumer will
likely care about the strides.  Allowing this to be NULL is like the
current array protocol convention which allows this to be None.

> Clients that can't handle C-contiguous
> data (contrived example, I know there is a function to deal with
> that) would then need to check both for NULL *and* inside the strides
> array if not NULL, before properly deciding that the data isn't
> usable by them.

Not really.  A client that cannot deal with strides will simply not pass
an address to a stride array to the buffer protocol (that argument will 
be NULL).  If the exporter cannot provide memory without stride 
information, then an error will be raised.
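
The negotiation described here can be modelled with a toy sketch. ToyExporter and its getbuffer method are hypothetical stand-ins written for this illustration, not the PEP's C API; want_strides=False plays the role of passing NULL for the strides argument, and filling in strides whenever they are requested is an assumption borrowed from the remark about the format argument.

```python
class ToyExporter:
    """Hypothetical stand-in for an exporter; not the PEP's C API."""

    def __init__(self, data, strides, contiguous):
        self.data = data
        self.strides = strides
        self.contiguous = contiguous

    def getbuffer(self, want_strides):
        if want_strides:
            # Consumer supplied an "address" for strides: fill it in.
            return self.data, self.strides
        if not self.contiguous:
            # Consumer cannot handle strides, and the data needs them.
            raise BufferError("data is strided but strides were not requested")
        return self.data, None


simple = ToyExporter(bytes(12), strides=(4,), contiguous=True)
sliced = ToyExporter(bytes(12), strides=(8,), contiguous=False)

# A stride-unaware consumer gets plain memory from the contiguous exporter...
data, strides = simple.getbuffer(want_strides=False)

# ...and an error, not silently wrong data, from the strided one.
try:
    sliced.getbuffer(want_strides=False)
    refused = False
except BufferError:
    refused = True

print(strides, refused)
```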

> Similarly, the suboffsets can be either all negative or
> NULL to indicate the same thing.

I think it's much easier to check if suboffsets is NULL rather than
checking all the entries to see if they are -1 for the very common case
(i.e. the NumPy case) of no dereferencing.  Also, if you can't deal
with suboffsets you would just not provide an address for them.

> Might it be more appropriate to specify only one canonical behavior
> in these cases? Otherwise clients which don't do all the checks on
> the data might not properly interoperate with providers which format
> these values in the alternate manner.

It's important to also be easy to use.  I don't think clients should be
required to ask for strides and suboffsets if they can't handle them.

> Also, some typos, and places additional clarification could help:
>
>  253 PYBUF_STRIDES (strides and isptr)
>
> Should 'isptr' be 'suboffsets'?

Yes, but I think we are going to take out the multiple locks.

>  75 of a larger array can be described without copying the data.   T
>
> Dangling 'T'.

Thanks,

   
>  279 Get the buffer and optional information variables about the buffer.
>  280 Return an object-specific view object (which may be simply a
>  281 borrowed reference to the object itself).
>
> This phrasing (and similar phrasing elsewhere) is somewhat opaque to
> me. What's an object-specific view object?

At the moment it's the buffer provider.  It is not defined because it
could be a different thing for each exporter.  We are still discussing
this particular point and may drop it.
   
>  333 The struct string-syntax is missing some characters to fully
>  334 implement data-format descriptions already available elsewhere (in
>  335 ctypes and NumPy for example).  Here are the proposed additions:
>
> Is the following table just the additions? If so, it might be good to
> show the full spec, and flag the specific additions. If not, then the
> additions should be flagged.

Yes, these are just the additions.  I don't want to do the full spec; it
is already available elsewhere in the Python docs.

   
>  341 't'   bit (number before states how many bits)
>
> vs.
>
>  372 According to the struct-module, a number can preceed a character
>  373 code to specify how many of that type there are.  The
>
> I'm confused -- could this be phrased more clearly? Does '5t' refer
> to one field 5 bits wide, or to five 1-bit fields? Is 't' allowed? If
> so, is it equivalent to or different from '5t'?

Yes, 't' is equivalent to '5t', and the difference between one field
5 bits wide and five 1-bit fields is a confusion based on thinking there
are fields at all.  Both of those are equivalent.  If you want fields
then you have to define names.

   
>  378 Functions should be added to ctypes to create a ctypes object from
>  379 a struct description, and add long-double, and ucs-2 to ctypes.
>
> Very cool.
>
> In general, the logic of

Re: [Numpy-discussion] Buffer interface PEP

2007-03-27 Thread Zachary Pincus
>> Is this saying that either NULL or a pointer to "B" can be supplied
>> by getbufferproc to indicate to the caller that the array is unsigned
>> bytes? If so, is there a specific reason to put the (minor)
>> complexity of handling this case in the caller's hands, instead of
>> dealing with it internally to getbufferproc? In either case, the
>> wording is a bit unclear, I think.
>
> Yes, the wording could be more clear.  I'm trying to make it easy for
> exporters to change to the new buffer interface.
>
> The main idea I really want to see is that if the caller just passes
> NULL instead of an address then it means they are assuming the data
> will be unsigned bytes.  It is up to the exporter to either allow this
> or raise an error.
>
> The exporter should always be explicit if an argument for returning
> the format is provided (I may have thought differently a few days ago).

Understood -- I'm for the exporters being as explicit as possible if  
the argument is provided.

>> The general question is that there are several other instances where
>> getbufferproc is allowed to return ambiguous information which must
>> be handled on the client side. For example, C-contiguous data can be
>> indicated either by a NULL strides pointer or a pointer to a
>> properly-constructed strides array.
>
> Here, I'm trying to be easy on the exporter and the consumer.  If the
> data is contiguous, then neither the exporter nor the consumer will
> likely care about the strides.  Allowing this to be NULL is like the
> current array protocol convention which allows this to be None.

See below. My comments here aren't suggesting that NULL should be  
disallowed. I'm basically wondering whether it is a good idea to  
allow NULL and something else to represent the same information.  
(E.g. as above, an exporter could choose to show C-contiguous data  
with a NULL returned to the client, or with a trivial strides array).

Otherwise two different exporters exporting identical data could  
provide different representations, which the clients would need to be  
able to handle. I'm not sure that this is a recipe for perfect  
interoperability.

>> Clients that can't handle C-contiguous
>> data (contrived example, I know there is a function to deal with
>> that) would then need to check both for NULL *and* inside the strides
>> array if not NULL, before properly deciding that the data isn't
>> usable by them.
>
> Not really.  A client that cannot deal with strides will simply not
> pass an address to a stride array to the buffer protocol (that argument
> will be NULL).  If the exporter cannot provide memory without stride
> information, then an error will be raised.

This doesn't really address my question, which I obscured with a  
poorly-chosen example. The PEP says (or at least that's how I read  
it) that if the client *does* provide an address for the stride  
array, then for un-strided arrays, the exporter may either choose to  
fill in NULL at that address, or provide a strides array.

Might it be easier for clients if the PEP required that NULL be  
returned if the array is C-contiguous? Or at least strongly suggested  
that? (I understand that there might be cases where a naive exporter
thinks it is dealing with a strided array but it really is  
contiguous, and the exporter shouldn't be required to do that  
detection.)

The use-case isn't too strong here, but I think it's clear in the  
suboffsets case (see below).

>> Similarly, the suboffsets can be either all negative or
>> NULL to indicate the same thing.
>
> I think it's much easier to check if suboffsets is NULL rather than
> checking all the entries to see if they are -1 for the very common
> case (i.e. the NumPy case) of no dereferencing.  Also, if you can't
> deal with suboffsets you would just not provide an address for them.

My point exactly! As written, the PEP allows an exporter to either  
return NULL, or an array of all negative numbers (in the case that  
the client requested that information), forcing a fully conforming
client to make *both* checks in order to decide what to do.

Especially in this case, it would make sense to require a NULL be  
returned in the case of no suboffsets. This makes things easier for  
both clients that can deal with both suboffsets or non-offsets (e.g.  
they can branch on NULL, not on NULL or all-are-negative), and also  
for clients that can *only* deal with suboffsets.
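
A minimal sketch of the double check a fully conforming client would need under the current wording. None stands in for a C NULL pointer, and has_indirection is a hypothetical helper name, not something the PEP defines.

```python
def has_indirection(suboffsets):
    """True if any dimension requires pointer dereferencing.

    Handles BOTH encodings of "no suboffsets" that the draft allows:
    None (standing in for NULL) and a sequence of all-negative values.
    """
    if suboffsets is None:
        return False
    return any(offset >= 0 for offset in suboffsets)


print(has_indirection(None), has_indirection((-1, -1)), has_indirection((0, -1)))
```

If the PEP required NULL whenever no suboffsets apply, the body would reduce to a single `suboffsets is not None` test.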

Now, in these two cases, the use-case is pretty narrow, I agree.  
Basically it makes things easier for savvy clients that can deal with  
different data types, by not forcing them to make two checks (strides  
== NULL or strides array is trivial; suboffsets == NULL or suboffsets  
are all negative) when one would do. Again, this PEP allows the same
information to be passed in two very different ways, when it really
doesn't seem like that ambiguity makes life that much easier for  
exporters.

Maybe I'm wrong about this last point, though. Then there comes the  
trade-off -- should savvy clients