On Mon, Apr 9, 2012 at 5:46 PM, Pazzula, Dominic J
<dominic.j.pazz...@citi.com> wrote:
> This is a more general question. I was attempting to run the code below on
> my aforementioned OLD POS Cuda card. I get the following error:
>
> Traceback (most recent call last):
>   File "histo.py", line 54, in <module>
>     mod_grid = compiler.SourceModule(grid_gpu)
>   File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 283, in __init__
>     arch, code, cache_dir, include_dirs)
>   File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 273, in compile
>     return compile_plain(source, options, keep, nvcc, cache_dir)
>   File "/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 137, in compile_plain
>     cmdline, stdout=stdout, stderr=stderr)
> pycuda.driver.CompileError: nvcc said it demoted types in source code it
> compiled--this is likely not what you want.
> [command: nvcc --cubin -arch sm_11
>  -I/usr/local/lib/python2.7/site-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/../include/pycuda
>  kernel.cu]
> [stderr:
> ptxas /tmp/tmpxft_0000153b_00000000-2_kernel.ptx, line 95; warning : Double
> is not supported. Demoting to float
> ]
>
> Nowhere in the code is Double referenced. I'm guessing something behind the
> scenes is trying to declare a Double on my behalf. Is there a way to turn
> this off?
>
> Thanks
> Dominic
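A likely culprit, assuming the kernel below is the one being compiled: in C, the plain literal `1.0` in `temp_grid[id*interv+bin] += 1.0;` has type double, and sm_1x hardware has no double support, so ptxas demotes it and nvcc warns. One hedged way to avoid the warning without waiting for a PyCUDA fix would be to give every floating-point literal an `f` suffix before compiling the template; the regex below is only an illustrative sketch of that idea, not PyCUDA's own mechanism.

```python
import re

# Kernel line as posted; the C literal 1.0 has type double, which
# sm_1x cards cannot execute, hence the "Demoting to float" warning.
kernel_src = "temp_grid[id*interv+bin] += 1.0;"

# Append an 'f' suffix to any floating literal that lacks one, so the
# arithmetic stays in single precision from the start.
patched = re.sub(r'(\d+\.\d+)(?![f\d])', r'\1f', kernel_src)
print(patched)  # temp_grid[id*interv+bin] += 1.0f;
```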
I think this is fixed in the most recent git-development version.

Thomas

> -----Original Message-----
> From: pycuda-boun...@tiker.net [mailto:pycuda-boun...@tiker.net] On Behalf Of
> Francisco Villaescusa Navarro
> Sent: Friday, April 06, 2012 11:26 AM
> To: Thomas Wiecki
> Cc: pycuda@tiker.net
> Subject: Re: [PyCUDA] Histograms with PyCUDA
>
> Thanks for all the suggestions!
>
> Regarding removing sqrt: it seems that the code only gains about ~1%,
> and you lose the capacity to easily define linear intervals...
>
> I have tried with sqrt and sqrtf, but there is no difference in the
> total time (or it is very small).
>
> The code to find the histogram of an array with values between 0 and 1
> should be something like:
>
> import numpy as np
> import time
> import pycuda.driver as cuda
> import pycuda.autoinit
> import pycuda.gpuarray as gpuarray
> import pycuda.cumath as cumath
> from pycuda.compiler import SourceModule
> from pycuda import compiler
>
> grid_gpu_template = """
> __global__ void grid(float *values, int size, float *temp_grid)
> {
>     unsigned int id = threadIdx.x;
>     int i, bin;
>     const uint interv = %(interv)s;
>
>     for(i=id; i<size; i+=blockDim.x){
>         bin = (int)(values[i]*interv);
>         if (bin == interv){
>             bin = interv - 1;
>         }
>         temp_grid[id*interv+bin] += 1.0;
>     }
> }
> """
>
> reduction_gpu_template = """
> __global__ void reduction(float *temp_grid, float *his)
> {
>     unsigned int id = blockIdx.x*blockDim.x + threadIdx.x;
>     const uint interv = %(interv)s;
>     const uint threads = %(max_number_of_threads)s;
>
>     if(id < interv){
>         for(int i=0; i<threads; i++){
>             his[id] += temp_grid[id+interv*i];
>         }
>     }
> }
> """
>
> number_of_points = 100000000
> max_number_of_threads = 512
> interv = 1024
>
> blocks = interv/max_number_of_threads
> if interv%max_number_of_threads != 0:
>     blocks += 1
>
> values = np.random.random(number_of_points).astype(np.float32)
>
> grid_gpu = grid_gpu_template % {
>     'interv': interv,
> }
> mod_grid = compiler.SourceModule(grid_gpu)
> grid = mod_grid.get_function("grid")
>
> reduction_gpu = reduction_gpu_template % {
>     'interv': interv,
>     'max_number_of_threads': max_number_of_threads,
> }
> mod_redt = compiler.SourceModule(reduction_gpu)
> redt = mod_redt.get_function("reduction")
>
> values_gpu = gpuarray.to_gpu(values)
> temp_grid_gpu = gpuarray.zeros((max_number_of_threads,interv), dtype=np.float32)
> hist = np.zeros(interv, dtype=np.float32)
> hist_gpu = gpuarray.to_gpu(hist)
>
> start = time.clock()*1e3
> grid(values_gpu, np.int32(number_of_points), temp_grid_gpu,
>      grid=(1,1), block=(max_number_of_threads,1,1))
> redt(temp_grid_gpu, hist_gpu,
>      grid=(blocks,1), block=(max_number_of_threads,1,1))
> hist = hist_gpu.get()
> print 'Time used to grid with GPU:', time.clock()*1e3-start, ' ms'
>
> start = time.clock()*1e3
> bins_histo = np.linspace(0.0, 1.0, interv+1)
> hist_CPU = np.histogram(values, bins=bins_histo)[0]
> print 'Time used to grid with CPU:', time.clock()*1e3-start, ' ms'
>
> print 'max difference between methods=', np.max(hist_CPU-hist)
>
> ################
>
> Results:
>
> Time used to grid with GPU: 680.0 ms
> Time used to grid with CPU: 9320.0 ms
> max difference between methods= 0.0
>
> So it seems that with this algorithm we can't achieve factors larger
> than ~15x.
>
> Fran.
>
> _______________________________________________
> PyCUDA mailing list
> PyCUDA@tiker.net
> http://lists.tiker.net/listinfo/pycuda
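The two-stage scheme above (each thread accumulates a strided slice of the data into its own private histogram row, then a second kernel sums the rows) can be checked in pure NumPy. The sketch below mirrors the kernel's binning logic, including the clamp for values that land exactly on 1.0; the function name and sizes are illustrative, not part of the posted code.

```python
import numpy as np

def two_stage_histogram(values, interv=1024, threads=512):
    """NumPy sketch of the GPU scheme: each 'thread' bins a strided
    slice of the data into its own row, then the rows are summed."""
    temp_grid = np.zeros((threads, interv), dtype=np.float32)
    for tid in range(threads):
        chunk = values[tid::threads]              # strided slice, like for(i=id; i<size; i+=blockDim.x)
        bins = (chunk * interv).astype(np.int32)  # bin = (int)(values[i]*interv)
        bins[bins == interv] = interv - 1         # clamp a value of exactly 1.0 into the last bin
        np.add.at(temp_grid[tid], bins, 1.0)      # accumulate even when a bin repeats in the slice
    return temp_grid.sum(axis=0)                  # what the reduction kernel computes

values = np.random.random(100000).astype(np.float32)
hist = two_stage_histogram(values)
ref = np.histogram(values, bins=np.linspace(0.0, 1.0, 1025))[0]
print(np.max(np.abs(hist - ref)))
```

Because multiplying a float32 by 1024 (a power of two) is exact, the truncated product lands in the same half-open bin that np.histogram uses, which is consistent with the "max difference between methods= 0.0" result reported above.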