Dear devs,

I'm trying to profile some PyOpenCL scripts using NVIDIA's Compute Visual Profiler. However, I always receive an error suggesting that I have not released some resources properly. Some people on NVIDIA's OpenCL forum have run into the same error and reported that calling clReleaseEvent(event-name) at the end of their code solved the problem. I can't find a binding for this function in the PyOpenCL documentation. Is this feature implemented, or could anyone send me a script that currently works with their profiler?

I have included a condensed version of my code plus a file that can be used to run the script through the profiler (the script was made to investigate how strided access of global memory affects memory bandwidth).
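
For context, here is a minimal sketch of the bandwidth figure the script is meant to report (the commented-out 'MBW' line at the end of the script). The helper name effective_bandwidth_gb_s and its parameters are made up for illustration and are not part of PyOpenCL. It assumes each output element costs one 4-byte read and one 4-byte write, and that the elapsed time is given in nanoseconds, which is the unit OpenCL uses for event profiling timestamps.

```python
def effective_bandwidth_gb_s(n_elements, elapsed_ns,
                             bytes_per_element=4, accesses_per_element=2):
    """Return effective memory bandwidth in GB/s (hypothetical helper).

    n_elements           -- number of elements the kernel writes (M in the script)
    elapsed_ns           -- kernel run time in nanoseconds
    bytes_per_element    -- 4 for float32
    accesses_per_element -- 2, assuming one global read plus one global write
    """
    if elapsed_ns <= 0:
        raise ValueError("elapsed_ns must be positive")
    total_bytes = n_elements * bytes_per_element * accesses_per_element
    # bytes per nanosecond equals 1e9 bytes per second, i.e. GB/s
    return float(total_bytes) / elapsed_ns


if __name__ == "__main__":
    M = 2**19 * 256 // 32        # same M as in the script
    print(effective_bandwidth_gb_s(M, elapsed_ns=1000000))
```

In the script, elapsed_ns would be evt.profile.end - evt.profile.start, which is only available when the command queue is created with properties=cl.command_queue_properties.PROFILING_ENABLE.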

PS:
The exact error message reads: "Compute Visual Profiler Error. Profiler data file "path".csv does not contain profiler output. This can happen when: a) Profiling is disabled... b) The application does not invoke any kernels or memory transfers. c) The application does not release resources (contexts, events, etc.). The program needs to be modified to properly free up all resources before termination."

PPS:
The PyOpenCL examples included in the download give similar errors.

Kind regards
---
- Andreas Reiten
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pyopencl as cl
import numpy as np
import time

stride = 1
max_stride = 32

N = 2**19*256
M = N // max_stride  # integer division, so M stays an int under Python 3 as well
block_size_V = 1
block_size_H = 256

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)#, properties=cl.command_queue_properties.PROFILING_ENABLE)

# PUSH
mf = cl.mem_flags	
in_data_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=np.linspace(0,N-1,N).astype(np.float32))
out_data_buf = cl.Buffer(ctx, mf.WRITE_ONLY| mf.COPY_HOST_PTR, hostbuf=np.zeros(M).astype(np.float32))


source = """__kernel
void aligned(
__global const float* in_data,
__global float* out_data,
int stride
)
{
	int i = get_global_id(1);
	out_data[i] = in_data[i*stride];
}

"""

prg = cl.Program(ctx, source).build(options="-cl-mad-enable")	

# COMPUTE
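# Current PyOpenCL takes (queue, global_size, local_size, *kernel_args);
# the local_size= keyword is only accepted by very old releases.
evt = prg.aligned(queue, (1, int(M)), (block_size_V, block_size_H),
                  in_data_buf, out_data_buf, np.int32(stride))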

evt.wait()
#t_kernel = evt.profile.end - evt.profile.start

#in_data = np.zeros(N).astype(np.float32)
#results = np.zeros(M).astype(np.float32)
#cl.enqueue_read_buffer(queue, in_data_buf, in_data).wait() 	
#cl.enqueue_read_buffer(queue, out_data_buf, results).wait()

queue.flush()
in_data_buf.release()
out_data_buf.release()

#print str(M)+' vs N = '+str(N)
#print 'MBW = '+str(M*4*2/(t_kernel))+' [GB/s]'
#!/bin/bash
python global_access_strided.py
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
