Forgot the most important thing... are you sure nothing else is running on that GPU? Maybe an OpenCL, CUDA, or OpenGL application?
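
In case it helps narrow things down, here is a rough sketch (not tested against your setup) of how the lifetime of a single event could be split into phases. With PyOpenCL, the four timestamps would come from event.profile.queued, .submit, .start and .end, all in nanoseconds. The Timestamps container, the breakdown() helper and the numbers are made up purely for illustration, so that the snippet runs without a GPU. If most of the missing time were to show up between submit and start, that would be consistent with the device being busy with something else.

```python
from collections import namedtuple

# Hypothetical container so the helper can be tried without a GPU. With
# PyOpenCL the four values would be read from event.profile.queued, .submit,
# .start and .end (nanoseconds), on a queue created with profiling enabled.
Timestamps = namedtuple("Timestamps", "queued submit start end")


def breakdown(ts):
    """Split one event's lifetime into phases, converted to seconds."""
    return {
        "queued_to_submit": 1e-9 * (ts.submit - ts.queued),
        "submit_to_start": 1e-9 * (ts.start - ts.submit),
        "start_to_end": 1e-9 * (ts.end - ts.start),
    }


# Made-up numbers for illustration only; these are not measurements.
example = Timestamps(queued=0, submit=1000000, start=201000000, end=225000000)
for phase, seconds in breakdown(example).items():
    print("%s: %f" % (phase, seconds))
```

Only start_to_end corresponds to what the code below reports as kernel time; the other two phases are part of the wall clock time but not of the profiling time.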

On 12 June 2014 14:52, CRV§ADER//KY <[email protected]> wrote:

> You're right that something is wrong. There's no way to justify 0.2
> seconds per iteration of overhead.
> Could you confirm that you are not doing any buffer copy in the meantime?
> Did you retry with a different OpenCL Platform (e.g. AMD CPU or Intel CPU)?
>
> What happens if you pipeline the kernel execution?
>
> events = []
> kernel_total = 0
>
> t0 = time.time()
> for i in range(64):
>     events.append(prog.sha1( queue , shape , None , in_buf , out_buf ,
>                              ..<other buffers> ))
>
> t2 = time.time()
> print("Scheduling time: %f" % (t2 - t0))
> t1 = t2
>
> for event in events:
>     event.wait()
>     t2 = time.time()
>     kernel_elapsed = 1e-9 * ( event.profile.end - event.profile.start )
>     kernel_total += kernel_elapsed
>     print("Real run time: %f, Kernel time: %f" % (t2 - t1, kernel_elapsed))
>     t1 = t2
>
> print("Total real run time: %f, Total kernel time: %f" % (t2 - t0,
>       kernel_total))
>
>
> On 11 June 2014 23:15, Abhilash Dighe <[email protected]> wrote:
>
>> Hi,
>>
>> I was hoping to get some insight on my observations. I am using PyOpenCL
>> version 2 with an NVIDIA Tesla M2090 to run my kernel, which runs the SHA1
>> algorithm over variably sized data blocks. I'm running the same kernel
>> repeatedly and trying to find its execution time. But I'm getting
>> different readings when I use PyOpenCL's profiling tool and when I use
>> the standard Python time library. My code is structured as:
>>
>>
>> hash_start = time.time()
>> hash_event = prog.sha1( queue , shape , None , in_buf , out_buf ,
>>                         ..<other buffers> )
>> hash_event.wait()
>> hash_end = time.time()
>> add_hash_CPU_time( hash_end - hash_start )
>> add_hash_GPU_time( 1e-9 * ( hash_event.profile.end -
>>                             hash_event.profile.start ) )
>>
>> These are the results for a test case of size 3 GB. The kernel gets
>> called 64 times and runs 12288 threads each time.
>>
>> Total OpenCL profiling time = 1.56s
>> Total CPU wall clock time = 13.79s
>>
>> I needed some help understanding what the cause of this inconsistency
>> is, or whether I'm making a mistake in recording the data.
>>
>> Regards,
>> Abhilash Dighe
>>
>> _______________________________________________
>> PyOpenCL mailing list
>> [email protected]
>> http://lists.tiker.net/listinfo/pyopencl
>>
>>
>
