Dear devs,
I'm trying to profile some PyOpenCL scripts using NVIDIA's Compute
Visual Profiler. However, I always receive an error suggesting I might
not have released some resources properly.
Some people at NVIDIA's OpenCL forum have run into the same error and
reported that including clReleaseEvent(event-name) at the end of their
code solved the problem.
I can't find a binding for this function in the PyOpenCL documentation.
Is this feature implemented, or could anyone send me a script that
currently works with the profiler?
I have included a condensed version of my code plus a file that can be
used to run the script through the profiler (the script was made to
investigate how strided access of global memory affects memory bandwidth).
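For reference, here is a minimal host-side sketch in plain Python of what the kernel computes and of the bandwidth figure I intend to derive from the profiling timestamps. The helper names strided_copy and bandwidth_gb_per_s are made up for illustration and are not part of PyOpenCL; the sketch assumes the kernel time is given in nanoseconds, which is the unit OpenCL event profiling reports.

```python
def strided_copy(in_data, stride, count):
    """Host-side reference for the kernel: out[i] = in[i * stride]."""
    return [in_data[i * stride] for i in range(count)]


def bandwidth_gb_per_s(n_elements, t_kernel_ns, bytes_per_element=4):
    """Effective bandwidth, counting one read and one write per element.

    Bytes per nanosecond equals GB/s (taking 1 GB = 1e9 bytes).
    Assumes t_kernel_ns is the kernel duration in nanoseconds.
    """
    return n_elements * bytes_per_element * 2 / t_kernel_ns


# Small illustrative input; the real script uses far larger buffers.
data = list(range(64))
out = strided_copy(data, stride=4, count=8)
```

This only mirrors the access pattern on the host; it says nothing about how the GPU coalesces the reads, which is what the profiler run is meant to show.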
PS:
The exact error message reads: "Compute Visual Profiler Error. Profiler
data file "path".csv does not contain profiler output. This can happen
when: a) Profiling is disabled... b) The application does not invoke any
kernels or memory transfers. c) The application does not release
resources (contexts, events, etc.). The program needs to be modified to
properly free up all resources before termination."
PPS:
The PyOpenCL examples included in the download give similar errors.
Kind regards
---
- Andreas Reiten
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pyopencl as cl
import numpy as np
import time
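stride = 1          # distance (in elements) between consecutive reads
max_stride = 32     # largest stride the buffers are sized for
N = 2**19*256       # number of input elements
M = N//max_stride   # number of output elements; // keeps M an integer
block_size_V = 1    # work-group size along dimension 0
block_size_H = 256  # work-group size along dimension 1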
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)#, properties=cl.command_queue_properties.PROFILING_ENABLE)
# PUSH
mf = cl.mem_flags
in_data_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=np.linspace(0,N-1,N).astype(np.float32))
out_data_buf = cl.Buffer(ctx, mf.WRITE_ONLY| mf.COPY_HOST_PTR, hostbuf=np.zeros(M).astype(np.float32))
source = """__kernel
void aligned(
__global const float* in_data,
__global float* out_data,
int stride
)
{
int i = get_global_id(1);
out_data[i] = in_data[i*stride];
}
"""
prg = cl.Program(ctx, source).build(options="-cl-mad-enable")
# COMPUTE
evt = prg.aligned(queue,(1,int(M)),in_data_buf,out_data_buf,np.int32(stride),local_size=(block_size_V, block_size_H))
evt.wait()
#t_kernel = evt.profile.end - evt.profile.start
#in_data = np.zeros(N).astype(np.float32)
#results = np.zeros(M).astype(np.float32)
#cl.enqueue_read_buffer(queue, in_data_buf, in_data).wait()
#cl.enqueue_read_buffer(queue, out_data_buf, results).wait()
queue.flush()
in_data_buf.release()
out_data_buf.release()
#print str(M)+' vs N = '+str(N)
#print 'MBW = '+str(M*4*2/(t_kernel))+' [GB/s]'
#!/bin/bash
python global_access_strided.py