Hi Andreas
On 3/8/12 1:55 AM, Andreas Kloeckner wrote:
<#part sign=pgpmime>
Hi Alex, Karsten,
On Wed, 7 Mar 2012 16:43:22 -0500, Alex Nitz<[email protected]> wrote:
This is what I am getting on an NVIDIA GTX580.
[snip]
:20:20: error: can't convert between vector values of different size
('cfloat_t __attribute__((address_space(1)))' (aka 'float2
__attribute__((address_space(1)))') and 'double')
z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
^~~~~~~~~~~~~~~~~~~
:20:33: note: instantiated from:
z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
I've committed what I think might be a workaround for this to git. Can
you please try it out and report back?
Andreas
I rebuild on commit:
* commit 9e925431b16a09a1e18794a30b95f820d56b037a (HEAD, origin/master,
origin/HEAD, mast
| Author: Andreas Kloeckner<[email protected]>
| Date: Wed Mar 7 19:52:08 2012 -0500
|
| Try and work around an Nvidia compiler issue in cfloat_cast().
|
| (Issue reported by Karsten Wiesner and Alex Nitz)
But still got the error at different GPUs:
(below you'll find a oclDeviceQuery output)
Build on<pyopencl.Device 'nForce 980a/780a SLI' on 'NVIDIA CUDA' at
0xe31f60>:
:20:20: error: can't convert between vector values of different size
('cfloat_t __attribute__((address_space(1)))' (aka 'float2
__attribute__((address_space(1)))') and 'double')
z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
^~~~~~~~~~~~~~~~~~~
:20:33: note: instantiated from:
z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
~^~~~~
Build on<pyopencl.Device 'Tesla C1060' on 'NVIDIA CUDA' at 0x1add190>:
:20:20: error: can't convert between vector values of different size
('cfloat_t __attribute__((address_space(1)))' (aka 'float2
__attribute__((address_space(1)))') and 'double')
z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
^~~~~~~~~~~~~~~~~~~
:20:33: note: instantiated from:
z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
~^~~~~
Build on<pyopencl.Device 'Tesla C2050' on 'NVIDIA CUDA' at 0x19214f0>:
:20:20: error: can't convert between vector values of different size
('cfloat_t __attribute__((address_space(1)))' (aka 'float2
__attribute__((address_space(1)))') and 'double')
z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
^~~~~~~~~~~~~~~~~~~
:20:33: note: instantiated from:
z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
~^~~~~
Cheers, Karsten
PS.:
[oclDeviceQuery] starting...
/usr/local/nvidia/OpenCL/bin/linux/release/oclDeviceQuery Starting...
OpenCL SW Info:
CL_PLATFORM_NAME: NVIDIA CUDA
CL_PLATFORM_VERSION: OpenCL 1.1 CUDA 4.1.1
OpenCL SDK Revision: 7027912
OpenCL Device Info:
3 devices found supporting OpenCL:
---------------------------------
Device Tesla C2050
---------------------------------
CL_DEVICE_NAME: Tesla C2050
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 290.10
CL_DEVICE_VERSION: OpenCL 1.1 CUDA
CL_DEVICE_OPENCL_C_VERSION: OpenCL C 1.1
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 14
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 1024 / 1024 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1147 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 671 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 2687 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: yes
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: denorms INF-quietNaNs round-to-nearest
round-to-zero round-to-inf fma
CL_DEVICE_IMAGE<dim> 2D_MAX_WIDTH 32768
2D_MAX_HEIGHT 32768
3D_MAX_WIDTH 2048
3D_MAX_HEIGHT 2048
3D_MAX_DEPTH 2048
CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
CL_DEVICE_COMPUTE_CAPABILITY_NV: 2.0
NUMBER OF MULTIPROCESSORS: 14
NUMBER OF CUDA CORES: 448
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 32768
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT
1, DOUBLE 1
---------------------------------
Device Tesla C1060
---------------------------------
CL_DEVICE_NAME: Tesla C1060
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 290.10
CL_DEVICE_VERSION: OpenCL 1.0 CUDA
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 30
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1296 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1023 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 4095 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: INF-quietNaNs round-to-nearest
round-to-zero round-to-inf fma
CL_DEVICE_IMAGE<dim> 2D_MAX_WIDTH 4096
2D_MAX_HEIGHT 16383
3D_MAX_WIDTH 2048
3D_MAX_HEIGHT 2048
3D_MAX_DEPTH 2048
CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_fp64
CL_DEVICE_COMPUTE_CAPABILITY_NV: 1.3
NUMBER OF MULTIPROCESSORS: 30
NUMBER OF CUDA CORES: 240
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 16384
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_TRUE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_FALSE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT
1, DOUBLE 1
---------------------------------
Device nForce 980a/780a SLI
---------------------------------
CL_DEVICE_NAME: nForce 980a/780a SLI
CL_DEVICE_VENDOR: NVIDIA Corporation
CL_DRIVER_VERSION: 290.10
CL_DEVICE_VERSION: OpenCL 1.0 CUDA
CL_DEVICE_TYPE: CL_DEVICE_TYPE_GPU
CL_DEVICE_MAX_COMPUTE_UNITS: 1
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 512 / 512 / 64
CL_DEVICE_MAX_WORK_GROUP_SIZE: 512
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1200 MHz
CL_DEVICE_ADDRESS_BITS: 32
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 128 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 253 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: local
CL_DEVICE_LOCAL_MEM_SIZE: 16 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 64 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 128
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 8
CL_DEVICE_SINGLE_FP_CONFIG: INF-quietNaNs round-to-nearest
round-to-zero round-to-inf fma
CL_DEVICE_IMAGE<dim> 2D_MAX_WIDTH 4096
2D_MAX_HEIGHT 16383
3D_MAX_WIDTH 2048
3D_MAX_HEIGHT 2048
3D_MAX_DEPTH 2048
CL_DEVICE_EXTENSIONS: cl_khr_byte_addressable_store
cl_khr_icd
cl_khr_gl_sharing
cl_nv_compiler_options
cl_nv_device_attribute_query
cl_nv_pragma_unroll
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
CL_DEVICE_COMPUTE_CAPABILITY_NV: 1.1
NUMBER OF MULTIPROCESSORS: 1
NUMBER OF CUDA CORES: 8
CL_DEVICE_REGISTERS_PER_BLOCK_NV: 8192
CL_DEVICE_WARP_SIZE_NV: 32
CL_DEVICE_GPU_OVERLAP_NV: CL_FALSE
CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV: CL_FALSE
CL_DEVICE_INTEGRATED_MEMORY_NV: CL_TRUE
CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t> CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT
1, DOUBLE 0
---------------------------------
2D Image Formats Supported (71)
---------------------------------
# Channel Order Channel Type
1 CL_R CL_FLOAT
2 CL_R CL_HALF_FLOAT
3 CL_R CL_UNORM_INT8
4 CL_R CL_UNORM_INT16
5 CL_R CL_SNORM_INT16
6 CL_R CL_SIGNED_INT8
7 CL_R CL_SIGNED_INT16
8 CL_R CL_SIGNED_INT32
9 CL_R CL_UNSIGNED_INT8
10 CL_R CL_UNSIGNED_INT16
11 CL_R CL_UNSIGNED_INT32
12 CL_A CL_FLOAT
13 CL_A CL_HALF_FLOAT
14 CL_A CL_UNORM_INT8
15 CL_A CL_UNORM_INT16
16 CL_A CL_SNORM_INT16
17 CL_A CL_SIGNED_INT8
18 CL_A CL_SIGNED_INT16
19 CL_A CL_SIGNED_INT32
20 CL_A CL_UNSIGNED_INT8
21 CL_A CL_UNSIGNED_INT16
22 CL_A CL_UNSIGNED_INT32
23 CL_RG CL_FLOAT
24 CL_RG CL_HALF_FLOAT
25 CL_RG CL_UNORM_INT8
26 CL_RG CL_UNORM_INT16
27 CL_RG CL_SNORM_INT16
28 CL_RG CL_SIGNED_INT8
29 CL_RG CL_SIGNED_INT16
30 CL_RG CL_SIGNED_INT32
31 CL_RG CL_UNSIGNED_INT8
32 CL_RG CL_UNSIGNED_INT16
33 CL_RG CL_UNSIGNED_INT32
34 CL_RA CL_FLOAT
35 CL_RA CL_HALF_FLOAT
36 CL_RA CL_UNORM_INT8
37 CL_RA CL_UNORM_INT16
38 CL_RA CL_SNORM_INT16
39 CL_RA CL_SIGNED_INT8
40 CL_RA CL_SIGNED_INT16
41 CL_RA CL_SIGNED_INT32
42 CL_RA CL_UNSIGNED_INT8
43 CL_RA CL_UNSIGNED_INT16
44 CL_RA CL_UNSIGNED_INT32
45 CL_RGBA CL_FLOAT
46 CL_RGBA CL_HALF_FLOAT
47 CL_RGBA CL_UNORM_INT8
48 CL_RGBA CL_UNORM_INT16
49 CL_RGBA CL_SNORM_INT16
50 CL_RGBA CL_SIGNED_INT8
51 CL_RGBA CL_SIGNED_INT16
52 CL_RGBA CL_SIGNED_INT32
53 CL_RGBA CL_UNSIGNED_INT8
54 CL_RGBA CL_UNSIGNED_INT16
55 CL_RGBA CL_UNSIGNED_INT32
56 CL_BGRA CL_UNORM_INT8
57 CL_BGRA CL_SIGNED_INT8
58 CL_BGRA CL_UNSIGNED_INT8
59 CL_ARGB CL_UNORM_INT8
60 CL_ARGB CL_SIGNED_INT8
61 CL_ARGB CL_UNSIGNED_INT8
62 CL_INTENSITY CL_FLOAT
63 CL_INTENSITY CL_HALF_FLOAT
64 CL_INTENSITY CL_UNORM_INT8
65 CL_INTENSITY CL_UNORM_INT16
66 CL_INTENSITY CL_SNORM_INT16
67 CL_LUMINANCE CL_FLOAT
68 CL_LUMINANCE CL_HALF_FLOAT
69 CL_LUMINANCE CL_UNORM_INT8
70 CL_LUMINANCE CL_UNORM_INT16
71 CL_LUMINANCE CL_SNORM_INT16
---------------------------------
3D Image Formats Supported (71)
---------------------------------
# Channel Order Channel Type
1 CL_R CL_FLOAT
2 CL_R CL_HALF_FLOAT
3 CL_R CL_UNORM_INT8
4 CL_R CL_UNORM_INT16
5 CL_R CL_SNORM_INT16
6 CL_R CL_SIGNED_INT8
7 CL_R CL_SIGNED_INT16
8 CL_R CL_SIGNED_INT32
9 CL_R CL_UNSIGNED_INT8
10 CL_R CL_UNSIGNED_INT16
11 CL_R CL_UNSIGNED_INT32
12 CL_A CL_FLOAT
13 CL_A CL_HALF_FLOAT
14 CL_A CL_UNORM_INT8
15 CL_A CL_UNORM_INT16
16 CL_A CL_SNORM_INT16
17 CL_A CL_SIGNED_INT8
18 CL_A CL_SIGNED_INT16
19 CL_A CL_SIGNED_INT32
20 CL_A CL_UNSIGNED_INT8
21 CL_A CL_UNSIGNED_INT16
22 CL_A CL_UNSIGNED_INT32
23 CL_RG CL_FLOAT
24 CL_RG CL_HALF_FLOAT
25 CL_RG CL_UNORM_INT8
26 CL_RG CL_UNORM_INT16
27 CL_RG CL_SNORM_INT16
28 CL_RG CL_SIGNED_INT8
29 CL_RG CL_SIGNED_INT16
30 CL_RG CL_SIGNED_INT32
31 CL_RG CL_UNSIGNED_INT8
32 CL_RG CL_UNSIGNED_INT16
33 CL_RG CL_UNSIGNED_INT32
34 CL_RA CL_FLOAT
35 CL_RA CL_HALF_FLOAT
36 CL_RA CL_UNORM_INT8
37 CL_RA CL_UNORM_INT16
38 CL_RA CL_SNORM_INT16
39 CL_RA CL_SIGNED_INT8
40 CL_RA CL_SIGNED_INT16
41 CL_RA CL_SIGNED_INT32
42 CL_RA CL_UNSIGNED_INT8
43 CL_RA CL_UNSIGNED_INT16
44 CL_RA CL_UNSIGNED_INT32
45 CL_RGBA CL_FLOAT
46 CL_RGBA CL_HALF_FLOAT
47 CL_RGBA CL_UNORM_INT8
48 CL_RGBA CL_UNORM_INT16
49 CL_RGBA CL_SNORM_INT16
50 CL_RGBA CL_SIGNED_INT8
51 CL_RGBA CL_SIGNED_INT16
52 CL_RGBA CL_SIGNED_INT32
53 CL_RGBA CL_UNSIGNED_INT8
54 CL_RGBA CL_UNSIGNED_INT16
55 CL_RGBA CL_UNSIGNED_INT32
56 CL_BGRA CL_UNORM_INT8
57 CL_BGRA CL_SIGNED_INT8
58 CL_BGRA CL_UNSIGNED_INT8
59 CL_ARGB CL_UNORM_INT8
60 CL_ARGB CL_SIGNED_INT8
61 CL_ARGB CL_UNSIGNED_INT8
62 CL_INTENSITY CL_FLOAT
63 CL_INTENSITY CL_HALF_FLOAT
64 CL_INTENSITY CL_UNORM_INT8
65 CL_INTENSITY CL_UNORM_INT16
66 CL_INTENSITY CL_SNORM_INT16
67 CL_LUMINANCE CL_FLOAT
68 CL_LUMINANCE CL_HALF_FLOAT
69 CL_LUMINANCE CL_UNORM_INT8
70 CL_LUMINANCE CL_UNORM_INT16
71 CL_LUMINANCE CL_SNORM_INT16
oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.1 CUDA
4.1.1, SDK Revision = 7027912, NumDevs = 3, Device = Tesla C2050, Device =
Tesla C1060, Device = nForce 980a/780a SLI
System Info:
Local Time/Date = 14:28:48, 03/08/2012
CPU Name: AMD Phenom(tm) 9750 Quad-Core Processor
# of CPU processors: 4
Linux version 2.6.32.28-atlas-generic (root@bob) (gcc version 4.3.2 (Debian
4.3.2-1.1) ) #1 SMP Wed Feb 2 09:00:17 CET 2011
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl