Hi,
I’ve been using pyopencl for a while for various simulation/data processing
tasks. I recently upgraded to a new computer, and noticed things were
considerably slower.
After some experimentation, I tracked this down to the version of pyopencl I
was using. The updated version (2015.2.4; most recent on pypi) takes
significantly longer to queue a function call (~1.5 ms) than the old version
(2015.1, ~0.03 ms). Both times come from the same machine*. Profiling
indicates that the newer version is making lots of function calls the old
version did not. FYI, the code I used to test this is below (adapted from
the documentation).
For my purposes, this is slightly alarming: my code makes lots of kernel calls,
in which case the new version is 50x slower for small data sets!
Is this something that has been/will be fixed in newer versions of pyopencl?
Is there a workaround? Of course, for the time being I can use the old
version, but I’d rather not be stuck with it.
If needed, I can provide the profiler output.
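In case it helps, the profiler output can be condensed to a total call count plus the most expensive entries. What follows is only a minimal sketch using the standard cProfile and pstats modules. The name profile_summary is a hypothetical helper, and fn stands in for a zero-argument wrapper around the kernel call (for example, lambda: prg.sum(queue, a_np.shape, None, a_g, b_g, res_g)), so nothing in it is specific to pyopencl.

```python
import cProfile
import io
import pstats


def profile_summary(fn, n_calls=100, top=10):
    """Profile n_calls calls of fn; return (total function calls, report text).

    profile_summary is a hypothetical helper, not part of pyopencl.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(n_calls):
        fn()
    profiler.disable()

    # Write the report into a string instead of stdout so it can be saved
    # or compared between runs.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats('cumulative').print_stats(top)
    return stats.total_calls, buf.getvalue()


if __name__ == "__main__":
    # Stand-in workload; replace with a wrapper around the kernel call.
    n_function_calls, report = profile_summary(lambda: sum(range(100)))
    print('total function calls:', n_function_calls)
    print(report)
```

Running this under each pyopencl version and comparing the returned call counts would show how many extra Python-level calls each enqueue makes.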
Thanks,
Dustin Kleckner
*OS X w/ python 3.5 installed via Anaconda, pyopencl installed via pip. Code
was tested with GPU and CPU, with similar results.
Test Code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, print_function
import numpy as np
import pyopencl as cl
import time
import cProfile
a_np = np.random.rand(50000).astype(np.float32)
b_np = np.random.rand(50000).astype(np.float32)
# ctx = cl.create_some_context()
device = cl.get_platforms()[0].get_devices(cl.device_type.CPU)[0]
ctx = cl.Context([device])
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
prg = cl.Program(ctx, """
__kernel void sum(__global const float *a_g, __global const float *b_g,
                  __global float *res_g) {
    int gid = get_global_id(0);
    res_g[gid] = a_g[gid] + b_g[gid];
}
""").build()
res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)
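# cProfile.run executes the statement in the __main__ namespace, so `el` is
# available as a global afterwards when this file is run as a script.
cProfile.run('''
start = time.time()
for n in range(100): prg.sum(queue, a_np.shape, None, a_g, b_g, res_g)
el = time.time() - start
''')
print('cl version:', cl.VERSION)
# el is the total time in seconds for 100 calls; convert to ms per call.
print('kernel start time: %.3f ms' % (1E3 * el / 100))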
start = time.time()
res_np = np.empty_like(a_np)
cl.enqueue_copy(queue, res_np, res_g)
el = time.time() - start
print('copy time: %.3f ms' % (1E3*el))
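
One caveat on the test code: the loop is timed inside cProfile.run, and the profiler adds overhead to every Python function call, so a version that makes more calls per enqueue may look slower than it is in normal use. Below is a minimal sketch of timing the same loop without the profiler. The name per_call_ms is a hypothetical helper, and stand_in is a placeholder for the actual prg.sum(...) call.

```python
import time


def per_call_ms(fn, n_calls=100):
    """Return the mean wall-clock time per call of fn, in milliseconds.

    per_call_ms is a hypothetical helper, not part of pyopencl.
    """
    start = time.perf_counter()
    for _ in range(n_calls):
        fn()
    elapsed = time.perf_counter() - start  # total seconds for n_calls calls
    return 1e3 * elapsed / n_calls


def stand_in():
    # Placeholder workload. In the test code this would be:
    #   prg.sum(queue, a_np.shape, None, a_g, b_g, res_g)
    sum(range(100))


if __name__ == "__main__":
    print('per-call time: %.3f ms' % per_call_ms(stand_in))
```

As in the test code, this measures only the time to enqueue the call, not the time for the kernel to finish.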
_______________________________________________
PyOpenCL mailing list
https://lists.tiker.net/listinfo/pyopencl