Hello Andreas,

thank you for your feedback. I have a few follow-up questions:

What prerequisites must a data structure meet to work well on the GPU?
Should I allocate the exact amount of memory for each array?

Is it OK to use numpy data structures, instead of the Python 'set' data
structure, to hold the arrays and execute the operations on the GPU?
e.g. np.intersect1d(['a','beta','gamma'],['gamma','delta','omega'])
Could it be parallelized?
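
To make the question concrete, here is a minimal CPU-only sketch of the
numpy route. It assumes every keyword is first mapped to an integer ID, so
that each set becomes a sorted fixed-width integer array instead of an array
of strings. The names encode_keywords and cosine are hypothetical helpers
made up for this illustration, not part of numpy or PyCUDA.

```python
import math
import numpy as np

def encode_keywords(keyword_sets):
    """Map each distinct keyword to an integer ID; return sorted ID arrays."""
    vocab = {}
    encoded = []
    for keywords in keyword_sets:
        # setdefault assigns the next free ID the first time a word is seen
        ids = {vocab.setdefault(word, len(vocab)) for word in keywords}
        encoded.append(np.array(sorted(ids), dtype=np.int32))
    return vocab, encoded

def cosine(ids_a, ids_b):
    """Set-based cosine similarity: |A & B| / sqrt(|A| * |B|)."""
    if len(ids_a) == 0 or len(ids_b) == 0:
        return 0.0
    # the ID arrays hold no duplicates, so assume_unique=True is safe here
    common = np.intersect1d(ids_a, ids_b, assume_unique=True)
    return len(common) / math.sqrt(len(ids_a) * len(ids_b))

vocab, (a, b) = encode_keywords([
    ['a', 'beta', 'gamma'],
    ['gamma', 'delta', 'omega'],
])
similarity = cosine(a, b)  # one shared keyword: 1 / sqrt(3 * 3)
```

This still runs on the CPU; the point is only that integer IDs give arrays
with a fixed element size, which is the kind of data a GPU kernel could
index directly.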

As an approach, is it better to try to parallelize the intersection
operation between the values of two keys in a dictionary,
or rather to import the whole dictionary (or a partition of it) into the GPU?
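
For the second option, one possible layout (a sketch under my own
assumptions, not something taken from the PyCUDA documentation) is to pack
all the sets into two flat integer arrays: one with the concatenated sorted
IDs and one with the offset where each set starts. Such arrays contain no
pointers, so they stay meaningful after being copied to the device. The
names flatten_sets and intersection_size are hypothetical, and the loop
below is plain Python that only mimics what a single GPU thread might do
for one pair of sets.

```python
import numpy as np

def flatten_sets(encoded_sets):
    """Pack variable-length sorted ID arrays into values + offsets."""
    lengths = [len(s) for s in encoded_sets]
    offsets = np.zeros(len(encoded_sets) + 1, dtype=np.int32)
    offsets[1:] = np.cumsum(lengths)
    values = (np.concatenate(encoded_sets).astype(np.int32)
              if encoded_sets else np.empty(0, dtype=np.int32))
    return values, offsets

def intersection_size(values, offsets, i, j):
    """Count common IDs of sets i and j with a two-pointer merge."""
    p, p_end = int(offsets[i]), int(offsets[i + 1])
    q, q_end = int(offsets[j]), int(offsets[j + 1])
    count = 0
    while p < p_end and q < q_end:
        if values[p] == values[q]:
            count += 1
            p += 1
            q += 1
        elif values[p] < values[q]:
            p += 1
        else:
            q += 1
    return count

# Three small sets of keyword IDs, each already sorted and duplicate-free
rows = [np.array([0, 1, 2]), np.array([2, 3, 4]), np.array([1, 2, 4])]
values, offsets = flatten_sets(rows)
shared_0_1 = intersection_size(values, offsets, 0, 1)
shared_1_2 = intersection_size(values, offsets, 1, 2)
```

With this layout, values.nbytes and offsets.nbytes give the exact buffer
sizes in bytes, which is what I understand a device allocation would need,
rather than sys.getsizeof, which measures the Python object.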






On Sun, Dec 21, 2014 at 9:44 PM, Andreas Kloeckner <li...@informa.tiker.net>
wrote:

> Luigi,
>
> here are a few problems with your approach:
>
> - The contents of your SourceModule are not valid C (as in, C the
>   programming language)
>
> - 'set' is a Python data structure. PyCUDA will not magically swap out
>   the code of 'set' and execute its operations on the GPU.
>
> - Working with arrays of variable-size objects (such as strings) on the
>   GPU is somewhat tricky. You'll have to come up with a good data
>   structure.  In particular, just copying over a Python data structure
>   will not help--if it succeeds, the pointers in the structure will
>   point to CPU memory and be entirely useless on the GPU.
>
> Andreas
>
>
> Luigi Assom <luigi.as...@gmail.com> writes:
> > I need to parallelize the computation of intersections of sets of
> > keywords on the GPU.
> >
> > As an example, I will take a cosine similarity computed from the
> > intersection of two sets.
> > (see also post:
> > http://stackoverflow.com/questions/22381939/python-calculate-cosine-similarity-of-two-dicts-faster
> > )
> >
> > I want to compute the similarity for each key-value pair of large
> > dictionaries.
> >
> > The value of each key is a set of thousands of elements, and they can
> > be strings.
> >
> > Using multiprocessing I was able to get a 4x speedup, but I would like
> > to try out the GPU to really speed up the computation.
> >
> > In the source module, I actually don't know how to declare my parameters
> > because they are not floats, and I haven't found a tutorial using data
> > structures other than numerical arrays with numpy.
> > That's why I converted my lists of keywords with np.asarray() and
> > tried the following:
> >
> >
> >
> > # convert list of strings into numpy array
> > key1 = 'key1'
> > array1 = np.asarray(D[key1])
> >
> > # convert list of strings into numpy array
> > array2 = np.asarray(D[key2])
> >
> > # assign memory to cuda
> >
> > array1_cuda = cuda.mem_alloc(sys.getsizeof(array1))
> > array2_cuda = cuda.mem_alloc(sys.getsizeof(array2))
> >
> > # and tried
> >
> > mod = SourceModule("""
> >   __global__ void cosine(*a, *b)
> >   {
> >     int idx = threadIdx.x + threadIdx.y*4;
> >     proxy = len(set(a[idx])&set(b[idx]))/math.sqrt(len(set(a[idx]))*len(set(b[idx])))
> >
> >   }
> >   """)
> >
> >
> >
> > a_gpu = gpuarray.to_gpu(array1)
> > b_gpu = gpuarray.to_gpu(array2)
> >
> > proxy = len(set(a_gpu)&set(b_gpu))/math.sqrt(len(set(a_gpu))*len(set(b_gpu)))
> >
> >
> >
> >
> > but I get
> >
> > TypeError: GPUArrays are not hashable.
> >
> >
> > Is it a problem of data structure, or am I making a conceptual
> > mistake?
> >
> >
> > With multiprocessing (without PyCUDA) my code is:
> >
> > ## Measuring Performance: 4x !
> > with Timer() as t:
> >     key = 'key1'
> >     setParent = D[key]
> >     ngbrProxy = set([])
> >     p = Pool()
> >     for ngbr in p.imap_unordered(cosine,setParent):
> >         ngbrProxy.add(ngbr)
> >
> > print "=> elapsed lpush: %s s" % t.secs
> >
> > I wonder how I could exploit the GPU for this type of computation: I am
> > not working with numerical matrices. In the PyCUDA documentation I read
> > that it is possible to use any type of data structure, even str, but I
> > couldn't find an example.
> >
> > Could you please help me work this out?
> > _______________________________________________
> > PyCUDA mailing list
> > PyCUDA@tiker.net
> > http://lists.tiker.net/listinfo/pycuda
>
>


-- 
Luigi Assom

Skype contact: oggigigi