Bogdan,

After further checking, it does not look like a design decision, but a
strict implementation of the standard! See the OpenCL 1.1 specification,
page 163:

"6.1.5 Alignment of Types

For 3-component vector data types, the size of the data type is 4 *
sizeof(component). This means that a 3-component vector data type will
be aligned to a 4 * sizeof(component) boundary. The vload3 and vstore3
built-in functions can be used to read and write, respectively,
3-component vector data types from an array of packed scalar data
type."

So Apple's implementation looks correct...
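
To make the consequence for host code concrete, here is a minimal NumPy
sketch of the layout the quoted paragraph implies. It assumes the kernel
takes a plain float3 pointer argument (no vload3/vstore3), so each vector
occupies a 16-byte slot on the device. The helper name
pad_to_float3_layout is hypothetical and is not part of PyOpenCL.

```python
import numpy as np


def pad_to_float3_layout(packed):
    """Copy an (N, 3) array of packed floats into an (N, 4) float32 array.

    Each row then occupies 4 * sizeof(float) = 16 bytes, which is the
    float3 size given in section 6.1.5 of the OpenCL 1.1 specification.
    The fourth column is padding and is left at zero.
    """
    packed = np.asarray(packed, dtype=np.float32)
    if packed.ndim != 2 or packed.shape[1] != 3:
        raise ValueError("expected an array of shape (N, 3)")
    padded = np.zeros((packed.shape[0], 4), dtype=np.float32)
    padded[:, :3] = packed
    return padded


# Two packed 3-component vectors: (0, 1, 2) and (3, 4, 5).
packed = np.arange(6, dtype=np.float32).reshape(2, 3)
padded = pad_to_float3_layout(packed)

# Bytes between consecutive vectors: 12 when packed, 16 when padded.
print(packed.strides[0], padded.strides[0])
```

The padded array is what I would expect to upload for a float3 buffer. If
the data has to stay packed at 12 bytes per vector, the alternative
described in the standard is to declare the kernel argument as a float
pointer and use vload3/vstore3 instead.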

Andreas,

Which OpenCL implementation are you using on your 64-bit Linux machine?
It might have an alignment issue. Could you please try the code
proposed by Bogdan to check the size of float3?
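
For comparison with whatever your device reports, here is a small sketch
of the sizes the standard predicts. It only encodes the rule quoted above
(3-component vectors take 4 component slots) plus the assumption that the
other vector widths take exactly n slots. The function name
expected_vector_size is hypothetical.

```python
import numpy as np


def expected_vector_size(component_dtype, n):
    """Return the size in bytes of an n-component OpenCL vector type.

    Assumes the OpenCL 1.1 rule: a 3-component vector has the size of a
    4-component one, and every other width takes n * sizeof(component).
    """
    if n not in (2, 3, 4, 8, 16):
        raise ValueError("OpenCL 1.1 vector widths are 2, 3, 4, 8 and 16")
    slots = 4 if n == 3 else n  # float3 is padded to 4 component slots
    return slots * np.dtype(component_dtype).itemsize


# Size a conforming implementation should report for sizeof(float3).
print(expected_vector_size(np.float32, 3))
```

If your implementation reports 12 for float3 where this gives 16, that
would point to the alignment issue mentioned above.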

Thanks.

Regards,

David.

2010/12/14 Bogdan Opanchuk <[email protected]>:
> Hello David,
>
> On Tue, Dec 14, 2010 at 2:54 AM, David Libault <[email protected]> wrote:
>> Thank you for your answer. I tried your proposition, and, as you say,
>> it returns 16 bytes instead of 12.
>>
>> Strange bug in Apple's implementation...
>
> I would not call that a bug; most probably they decided that improved
> float3 fetching/storing speed (16 bytes can be transferred using a
> single instruction and properly coalesced, as opposed to 12 bytes) is
> worth the increased storage space. If you were using "new float3[...]" to
> allocate the buffer, it would pass unnoticed, but when you allocated
> memory in Python, you bumped into this design decision.
>
> Best regards,
> Bogdan
>

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
