[Numpy-discussion] Buffer interface PEP
Hi all,

I'm having a good discussion with Carl Banks and Greg Ewing over on python-dev about the buffer interface. We are converging on a pretty good design, IMHO. If anybody wants to participate in the discussion, please read the PEP (there are a few things that still need to change) and add your two cents over on python-dev (you can read it through the gmane gateway and participate without signing up).

The PEP is stored in numpy/doc/pep_buffer.txt on the SVN tree for NumPy.

Best regards,

-Travis

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Buffer interface PEP
Hi,

I have a specific question and then a general question, and some minor issues for clarification.

Specifically, regarding the arguments to getbufferproc:

    166 format
    167    address of a format-string (following extended struct
    168    syntax) indicating what is in each element of
    169    of memory. The number of elements is len / itemsize,
    170    where itemsize is the number of bytes implied by the format.
    171    NULL if not needed in which case format is "B" for
    172    unsigned bytes. The memory for this string must not
    173    be freed by the consumer --- it is managed by the exporter.

Is this saying that either NULL or a pointer to "B" can be supplied by getbufferproc to indicate to the caller that the array is unsigned bytes? If so, is there a specific reason to put the (minor) complexity of handling this case in the caller's hands, instead of dealing with it internally to getbufferproc? In either case, the wording is a bit unclear, I think.

The general question is that there are several other instances where getbufferproc is allowed to return ambiguous information which must be handled on the client side. For example, C-contiguous data can be indicated either by a NULL strides pointer or a pointer to a properly-constructed strides array.

Clients that can't handle C-contiguous data (contrived example, I know there is a function to deal with that) would then need to check both for NULL *and* inside the strides array if not NULL, before properly deciding that the data isn't usable by them.

Similarly, the suboffsets can be either all negative or NULL to indicate the same thing.

Might it be more appropriate to specify only one canonical behavior in these cases? Otherwise clients which don't do all the checks on the data might not properly interoperate with providers which format these values in the alternate manner.

Also, some typos, and places where additional clarification could help:

    253 PYBUF_STRIDES (strides and isptr)

Should 'isptr' be 'suboffsets'?
    75 of a larger array can be described without copying the data. T

Dangling 'T'.

    279 Get the buffer and optional information variables about the buffer.
    280 Return an object-specific view object (which may be simply a
    281 borrowed reference to the object itself).

This phrasing (and similar phrasing elsewhere) is somewhat opaque to me. What's an object-specific view object?

    287 Call this function to tell obj that you are done with your view

Similarly, the 'view' concept and terminology should be defined more clearly in the preamble.

    333 The struct string-syntax is missing some characters to fully
    334 implement data-format descriptions already available elsewhere (in
    335 ctypes and NumPy for example). Here are the proposed additions:

Is the following table just the additions? If so, it might be good to show the full spec, and flag the specific additions. If not, then the additions should be flagged.

    341 't' bit (number before states how many bits)

vs.

    372 According to the struct-module, a number can precede a character
    373 code to specify how many of that type there are. The

I'm confused -- could this be phrased more clearly? Does '5t' refer to a field 5-bits wide, or 5-one bit fields? Is 't' allowed? If so, is it equivalent to or different from '5t'?

    378 Functions should be added to ctypes to create a ctypes object from
    379 a struct description, and add long-double, and ucs-2 to ctypes.

Very cool.

In general, the logic of the 'locking mechanism' should be described at a high level at some point. It's described in nitty-gritty detail, but at least I would have appreciated a bit more of a discussion about the general how and why -- this would be helpful to clients trying to use the locking mechanism properly.

Thanks to Travis and everyone else involved for your work on this cogent and sorely-needed PEP.
Zach

On Mar 27, 2007, at 12:42 PM, Travis Oliphant wrote:
> Hi all,
>
> I'm having a good discussion with Carl Banks and Greg Ewing over on python-dev about the buffer interface. We are converging on a pretty good design, IMHO. If anybody wants to participate in the discussion, please read the PEP (there are a few things that still need to change) and add your two cents over on python-dev (you can read it through the gmane gateway and participate without signing up).
>
> The PEP is stored in numpy/doc/pep_buffer.txt on the SVN tree for NumPy
>
> Best regards,
>
> -Travis
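Editorial illustration of the format description discussed in this message (PEP draft lines 166-173): the number of elements is len / itemsize, where itemsize is implied by a struct-style format string, and a missing format is read as "B". This is only a sketch using Python's standard struct module. The helper name element_count is hypothetical, and treating None as the stand-in for a NULL format pointer is an assumption made for the example, not something the draft specifies.

```python
import struct


def element_count(nbytes, fmt=None):
    """Return how many elements a buffer of nbytes holds for a struct format.

    fmt=None models the NULL case from the draft and is read as "B"
    (unsigned bytes). This mapping is an assumption for illustration.
    """
    if fmt is None:
        fmt = "B"
    # itemsize is the number of bytes implied by the format string.
    itemsize = struct.calcsize(fmt)
    if nbytes % itemsize:
        raise ValueError("buffer length is not a multiple of itemsize")
    return nbytes // itemsize


print(element_count(16))         # no format given: 16 unsigned bytes
print(element_count(16, "<d"))   # 8-byte doubles: 2 elements
print(element_count(16, "<2h"))  # count prefix: each element is two int16 values, 4 elements
```

The last call also shows the existing struct convention that a number before a type code is a repeat count, which is the convention the 't' question above is measured against.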
Re: [Numpy-discussion] Buffer interface PEP
Zachary Pincus wrote:
> Hi,
>
> Is this saying that either NULL or a pointer to "B" can be supplied by getbufferproc to indicate to the caller that the array is unsigned bytes? If so, is there a specific reason to put the (minor) complexity of handling this case in the caller's hands, instead of dealing with it internally to getbufferproc? In either case, the wording is a bit unclear, I think.

Yes, the wording could be more clear. I'm trying to make it easy for exporters to change to the new buffer interface. The main idea I really want to see is that if the caller just passes NULL instead of an address, then it means they are assuming the data will be unsigned bytes. It is up to the exporter to either allow this or raise an error.

The exporter should always be explicit if an argument for returning the format is provided (I may have thought differently a few days ago).

> The general question is that there are several other instances where getbufferproc is allowed to return ambiguous information which must be handled on the client side. For example, C-contiguous data can be indicated either by a NULL strides pointer or a pointer to a properly-constructed strides array.

Here, I'm trying to be easy on the exporter and the consumer. If the data is contiguous, then neither the exporter nor the consumer will likely care about the strides. Allowing this to be NULL is like the current array protocol convention which allows this to be None.

> Clients that can't handle C-contiguous data (contrived example, I know there is a function to deal with that) would then need to check both for NULL *and* inside the strides array if not NULL, before properly deciding that the data isn't usable by them.

Not really. A client that cannot deal with strides will simply not pass an address to a stride array to the buffer protocol (that argument will be NULL). If the exporter cannot provide memory without stride information, then an error will be raised.
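Editorial illustration of the exporter-side rule described in the reply above: a consumer that does not ask for strides gets either contiguous memory or an error. This is a pure-Python model, not the draft's C API. The names c_contiguous_strides and export_buffer, the want_strides flag, and the use of BufferError are all assumptions made for the sketch.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides of a C-contiguous (row-major) array with this shape."""
    strides = []
    step = itemsize
    # The last axis moves by one item; each earlier axis moves by the
    # total size of everything after it.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def export_buffer(shape, itemsize, strides, want_strides):
    """Hypothetical exporter: return the strides to hand to the consumer.

    want_strides=False models a consumer that passed NULL for the
    strides argument and therefore expects contiguous memory.
    """
    contiguous = tuple(strides) == c_contiguous_strides(shape, itemsize)
    if not want_strides:
        if not contiguous:
            raise BufferError("memory is strided but strides were not requested")
        return None
    return tuple(strides)


print(c_contiguous_strides((3, 4), 8))                 # (32, 8)
print(export_buffer((3, 4), 8, (32, 8), False))        # None
```

Under this model a consumer that cannot handle strides never has to inspect a strides array at all; the open question in the thread is what an exporter may return when strides were requested.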
> Similarly, the suboffsets can be either all negative or NULL to indicate the same thing.

I think it's much easier to check if suboffsets is NULL rather than checking all the entries to see if they are -1 for the very common case (i.e. the NumPy case) of no dereferencing. Also, if you can't deal with suboffsets you would just not provide an address for them.

> Might it be more appropriate to specify only one canonical behavior in these cases? Otherwise clients which don't do all the checks on the data might not properly interoperate with providers which format these values in the alternate manner.

It's important to also be easy to use. I don't think clients should be required to ask for strides and suboffsets if they can't handle them.

> Also, some typos, and places where additional clarification could help:
>
> 253 PYBUF_STRIDES (strides and isptr)
>
> Should 'isptr' be 'suboffsets'?

Yes, but I think we are going to take out the multiple locks.

> 75 of a larger array can be described without copying the data. T
>
> Dangling 'T'.

Thanks,

> 279 Get the buffer and optional information variables about the buffer.
> 280 Return an object-specific view object (which may be simply a
> 281 borrowed reference to the object itself).
>
> This phrasing (and similar phrasing elsewhere) is somewhat opaque to me. What's an object-specific view object?

At the moment it's the buffer provider. It is not defined because it could be a different thing for each exporter. We are still discussing this particular point and may drop it.

> 333 The struct string-syntax is missing some characters to fully
> 334 implement data-format descriptions already available elsewhere (in
> 335 ctypes and NumPy for example). Here are the proposed additions:
>
> Is the following table just the additions? If so, it might be good to show the full spec, and flag the specific additions. If not, then the additions should be flagged.

Yes, these are just the additions. I don't want to do the full spec; it is already available elsewhere in the Python docs.
> 341 't' bit (number before states how many bits)
>
> vs.
>
> 372 According to the struct-module, a number can precede a character
> 373 code to specify how many of that type there are. The
>
> I'm confused -- could this be phrased more clearly? Does '5t' refer to a field 5-bits wide, or 5-one bit fields? Is 't' allowed? If so, is it equivalent to or different from '5t'?

Yes, 't' is equivalent to '5t' and the difference between one field 5-bits wide or 5-one bit fields is a confusion based on thinking there are fields at all. Both of those are equivalent. If you want fields then you have to define names.

> 378 Functions should be added to ctypes to create a ctypes object from
> 379 a struct description, and add long-double, and ucs-2 to ctypes.
>
> Very cool.
>
> In general, the logic of
Re: [Numpy-discussion] Buffer interface PEP
>> Is this saying that either NULL or a pointer to "B" can be supplied by getbufferproc to indicate to the caller that the array is unsigned bytes? If so, is there a specific reason to put the (minor) complexity of handling this case in the caller's hands, instead of dealing with it internally to getbufferproc? In either case, the wording is a bit unclear, I think.
>
> Yes, the wording could be more clear. I'm trying to make it easy for exporters to change to the new buffer interface. The main idea I really want to see is that if the caller just passes NULL instead of an address, then it means they are assuming the data will be unsigned bytes. It is up to the exporter to either allow this or raise an error.
>
> The exporter should always be explicit if an argument for returning the format is provided (I may have thought differently a few days ago).

Understood -- I'm for the exporters being as explicit as possible if the argument is provided.

>> The general question is that there are several other instances where getbufferproc is allowed to return ambiguous information which must be handled on the client side. For example, C-contiguous data can be indicated either by a NULL strides pointer or a pointer to a properly-constructed strides array.
>
> Here, I'm trying to be easy on the exporter and the consumer. If the data is contiguous, then neither the exporter nor the consumer will likely care about the strides. Allowing this to be NULL is like the current array protocol convention which allows this to be None.

See below. My comments here aren't suggesting that NULL should be disallowed. I'm basically wondering whether it is a good idea to allow NULL and something else to represent the same information. (E.g. as above, an exporter could choose to show C-contiguous data with a NULL returned to the client, or with a trivial strides array.)

Otherwise two different exporters exporting identical data could provide different representations, which the clients would need to be able to handle.
I'm not sure that this is a recipe for perfect interoperability.

>> Clients that can't handle C-contiguous data (contrived example, I know there is a function to deal with that) would then need to check both for NULL *and* inside the strides array if not NULL, before properly deciding that the data isn't usable by them.
>
> Not really. A client that cannot deal with strides will simply not pass an address to a stride array to the buffer protocol (that argument will be NULL). If the exporter cannot provide memory without stride information, then an error will be raised.

This doesn't really address my question, which I obscured with a poorly-chosen example. The PEP says (or at least that's how I read it) that if the client *does* provide an address for the stride array, then for un-strided arrays, the exporter may either choose to fill in NULL at that address, or provide a strides array.

Might it be easier for clients if the PEP required that NULL be returned if the array is C-contiguous? Or at least strongly suggested that? (I understand that there might be cases where a naive exporter thinks it is dealing with a strided array but it really is contiguous, and the exporter shouldn't be required to do that detection.)

The use-case isn't too strong here, but I think it's clear in the suboffsets case (see below).

>> Similarly, the suboffsets can be either all negative or NULL to indicate the same thing.
>
> I think it's much easier to check if suboffsets is NULL rather than checking all the entries to see if they are -1 for the very common case (i.e. the NumPy case) of no dereferencing. Also, if you can't deal with suboffsets you would just not provide an address for them.

My point exactly! As written, the PEP allows an exporter to either return NULL, or an array of all negative numbers (in the case that the client requested that information), forcing a fully-conforming client to make *both* checks in order to decide what to do.
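Editorial illustration of the double check described above, as a consumer would have to write it if both representations are allowed. This is a sketch in plain Python rather than the draft's C interface: None stands in for a NULL pointer, and the helper names has_suboffsets and is_c_contiguous are hypothetical.

```python
def has_suboffsets(suboffsets):
    """True only if at least one dimension needs pointer dereferencing.

    Both "no array at all" and "array of all negative entries" mean
    that no dereferencing is needed, so both must be tested.
    """
    return suboffsets is not None and any(s >= 0 for s in suboffsets)


def is_c_contiguous(shape, strides, itemsize):
    """True if strides is absent or equal to the trivial row-major strides."""
    if strides is None:
        return True
    expected = itemsize
    for extent, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a length-1 axis is never used, so it cannot
        # make the layout non-contiguous.
        if extent > 1 and stride != expected:
            return False
        expected *= extent
    return True


print(has_suboffsets(None), has_suboffsets((-1, -1)))  # False False
print(is_c_contiguous((3, 4), None, 8))                # True
print(is_c_contiguous((3, 4), (32, 8), 8))             # True
print(is_c_contiguous((3, 4), (8, 24), 8))             # False
```

If the PEP required the NULL form whenever it applies, each helper would shrink to a single comparison against None, which is the simplification being argued for here.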
Especially in this case, it would make sense to require that NULL be returned in the case of no suboffsets. This makes things easier for both clients that can deal with either suboffsets or non-offsets (e.g. they can branch on NULL, not on NULL or all-are-negative), and also for clients that can *only* deal with suboffsets.

Now, in these two cases, the use-case is pretty narrow, I agree. Basically it makes things easier for savvy clients that can deal with different data types, by not forcing them to make two checks (strides == NULL or strides array is trivial; suboffsets == NULL or suboffsets are all negative) when one would do. Again, this PEP allows the same information to be passed in two very different ways, when it really doesn't seem like that ambiguity makes life that much easier for exporters.

Maybe I'm wrong about this last point, though. Then there comes the trade-off -- should savvy clients