Thanks a lot for the replies!

I'm not sure I fully understand what you're saying, so please let me restate it in my own words (if I'm wrong, please let me know):

I transfer the array of numbers I want to histogram to the GPU. For each element of that array I overwrite the value with the index of the bin that element falls into, and I return the array (now containing integer bin indices) to the CPU, where I perform the reduction.
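If that is right, here is a minimal CPU-side sketch of the same pipeline in NumPy (the bin-index step would be the GPU kernel; data, n_bins, and bin_idx are hypothetical names, not from this thread):

```python
import numpy as np

# Hypothetical stand-ins for the real data and binning.
data = np.random.random(10000).astype(np.float32)
n_bins = 16

# Step 1 (the GPU kernel in the scheme above): map each value in [0, 1)
# to the index of the bin it falls into.
bin_idx = np.minimum((data * n_bins).astype(np.int32), n_bins - 1)

# Step 2 (back on the CPU): reduce the index array into per-bin counts.
hist = np.bincount(bin_idx, minlength=n_bins)
```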

Fran.


El 04/04/2012, a las 22:34, Pazzula, Dominic J escribió:

Exactly what I was about to propose. Doing the reduction would probably be faster on the CPU. NumPy + MKL would thread what is essentially a series of element-wise array additions.
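Concretely, summing the per-block histograms on the CPU is a single vectorized call; a sketch with made-up shapes (64 blocks, 16 bins are placeholders):

```python
import numpy as np

# Hypothetical per-block partial histograms, as they might come back
# from the device: one row per block, one column per bin.
rng = np.random.default_rng(0)
partial = rng.integers(0, 100, size=(64, 16))

# The CPU-side reduction: element-wise addition down the block axis,
# which NumPy (threaded via MKL when built against it) does in one call.
total = partial.sum(axis=0)
```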

From: pycuda-boun...@tiker.net [mailto:pycuda-boun...@tiker.net] On Behalf Of David Mertens
Sent: Wednesday, April 04, 2012 3:27 PM
To: Francisco Villaescusa Navarro
Cc: pycuda@tiker.net
Subject: Re: [PyCUDA] Histograms with PyCUDA

You could take a two-step approach: (1) Have each thread work on a subset of the data and create its own histogram, then (2) run the resulting collection of bins through a sum/reduction kernel. The time for the first step would be roughly N_data / N_simultaneous_blocks_per_device, and the running time for the second step would be roughly N_blocks * log(N_blocks).
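In NumPy pseudocode (data size, chunk count, and bin count are all placeholders), the two steps would look roughly like:

```python
import numpy as np

# Placeholder data and sizes; on the GPU each chunk would be one block's share.
data = np.random.random(100000).astype(np.float32)
n_bins, n_blocks = 32, 128

# Step 1: each "block" builds a private histogram over its slice of the data.
partial = np.stack([np.histogram(chunk, bins=n_bins, range=(0.0, 1.0))[0]
                    for chunk in np.array_split(data, n_blocks)])

# Step 2: sum/reduce the n_blocks partial histograms into the final one.
hist = partial.sum(axis=0)
```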

Not a very detailed answer, but I hope that helps. :-)

David

On Wed, Apr 4, 2012 at 2:07 PM, Francisco Villaescusa Navarro <villaescusa.franci...@gmail.com > wrote:
Thanks a lot for the quick reply!

I was wondering whether I could "count" the number of elements in a given interval by something such as:

moduleHistrogram = SourceModule("""
__global__ void H(float *pos, int size, float his, float lower_limit, float upper_limit)
{
   unsigned int idx = blockIdx.x*blockDim.x+threadIdx.x;
   unsigned int idy = blockIdx.y*blockDim.y+threadIdx.y;
   unsigned int id = idy*gridDim.x*blockDim.x+idx;

   if (id<size) {
       if (pos[id]<upper_limit && pos[id]>lower_limit){
               his=his+1.0;
       }
   }
}
""")

I have tried this but it doesn't work (because the value of the variable his is not "seen" across threads: each thread has its own local copy of his, and the host never sees the updates). I also tried with this kernel:

moduleHistrogram = SourceModule("""
__global__ void H(float *pos, int size, float his, float lower_limit, float upper_limit)
{
   unsigned int idx = blockIdx.x*blockDim.x+threadIdx.x;
   unsigned int idy = blockIdx.y*blockDim.y+threadIdx.y;
   unsigned int id = idy*gridDim.x*blockDim.x+idx;

   __shared__ float A;
   A=his;
  __syncthreads();

   if (id<size) {
       if (pos[id]<upper_limit && pos[id]>lower_limit){
               A=A+1.0;
               __syncthreads();
               his=A;
       }
   }
}
""")

but the problem isn't solved.

Probably I'm doing something very stupid and I would like to know what it is.
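One thing I have read about but not tried yet (so take it only as a sketch; it assumes compute capability 2.0 or later, which is needed for atomicAdd on floats) is passing his as a device pointer and incrementing it atomically:

```
__global__ void H(float *pos, int size, float *his,
                  float lower_limit, float upper_limit)
{
    unsigned int idx = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int idy = blockIdx.y*blockDim.y + threadIdx.y;
    unsigned int id  = idy*gridDim.x*blockDim.x + idx;

    if (id < size && pos[id] > lower_limit && pos[id] < upper_limit) {
        /* his now points to global device memory visible to all threads;
           atomicAdd serializes the conflicting increments. */
        atomicAdd(his, 1.0f);
    }
}
```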

Thanks a lot,

Fran.

El 04/04/2012, a las 20:32, Andreas Kloeckner escribió:


On Wed, 4 Apr 2012 19:47:08 +0200, Francisco Villaescusa Navarro <villaescusa.franci...@gmail.com > wrote:
Hi,

I have been writing some code for a project involving the management of
fairly large data sets. Since I was getting wrong results, I have tried
to simplify the problem as much as possible to see where it goes wrong.

The simplification of the problem is the following:

I have a pretty long array of data containing numbers in a given
interval (let's suppose between 0.0 and 1.0), for example

import numpy as np
total_numbers = 10000
data = np.random.random(total_numbers).astype(np.float32)

I would like to make a histogram of those data. I was wondering what
the best strategy would be to achieve this in PyCUDA.

http://lmgtfy.com/?q=cuda+histogram

:) (Nothing special about *Py*CUDA in this instance. In particular,
there's no canned functionality that will do this for you.)

HTH,
Andreas


_______________________________________________
PyCUDA mailing list
PyCUDA@tiker.net
http://lists.tiker.net/listinfo/pycuda



--
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan

