[PyOpenCL] Using NVIDIA's Visual Profiler

Andreas Mon, 22 Nov 2010 16:21:06 -0800

Dear devs,

I'm trying to profile some PyOpenCL scripts using NVIDIA's Compute Visual Profiler. However, I always receive an error suggesting I might not have released some resources properly.

Some people at NVIDIA's OpenCL forum have run into the same error and reported that including clReleaseEvent(event-name) at the end of their code solved the problem.

I can't find a binding for this function in the PyOpenCL documentation, and I'm curious whether this feature is implemented, or whether anyone could send me a script that currently works with their profiler.

I have included a condensed version of my code plus a file that can be used to run the script through the profiler (the script was made to investigate how strided access of global memory affects memory bandwidth).
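For reference, the bandwidth figure the script is meant to report can be sketched as below. This is only a minimal sketch and not part of the attached script: the helper name effective_bandwidth_gb_s and its parameters are hypothetical. It assumes the elapsed time is given in nanoseconds (the unit of evt.profile.start and evt.profile.end when queue profiling is enabled) and counts one 4-byte read plus one 4-byte write per output element, as in the commented-out formula at the end of the script.

```python
def effective_bandwidth_gb_s(n_elements, elapsed_ns,
                             bytes_per_element=4, accesses_per_element=2):
    """Return the effective memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Hypothetical helper: n_elements is the number of output elements (M in
    the script), elapsed_ns the kernel time in nanoseconds. The defaults
    assume float32 data with one read and one write per element.
    """
    if elapsed_ns <= 0:
        raise ValueError("elapsed_ns must be positive")
    total_bytes = n_elements * bytes_per_element * accesses_per_element
    # Bytes per nanosecond equals 1e9 bytes per second, i.e. GB/s.
    return total_bytes / float(elapsed_ns)


if __name__ == "__main__":
    M = (2**19 * 256) // 32      # same M as in the script
    t_kernel = 1000000           # made-up timing of 1 ms, for illustration only
    print("MBW = %.3f [GB/s]" % effective_bandwidth_gb_s(M, t_kernel))
```

With profiling enabled on the queue, t_kernel would be evt.profile.end - evt.profile.start instead of the made-up value used here.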

PS:

The exact error message reads: "Compute Visual Profiler Error. Profiler data file "path".csv does not contain profiler output. This can happen when: a) Profiling is disabled... b) The application does not invoke any kernels or memory transfers. c) The application does not release resources (contexts, events, etc.). The program needs to be modified to properly free up all resources before termination."


PPS:
The PyOpenCL examples included in the download give similar errors.

Kind regards
---
- Andreas Reiten

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pyopencl as cl
import numpy as np
import time

stride = 1
max_stride = 32

N = 2**19*256
M = N // max_stride  # integer division, so M is a valid element count
block_size_V = 1
block_size_H = 256

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)#, properties=cl.command_queue_properties.PROFILING_ENABLE)

# PUSH
mf = cl.mem_flags	
in_data_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=np.linspace(0,N-1,N).astype(np.float32))
out_data_buf = cl.Buffer(ctx, mf.WRITE_ONLY| mf.COPY_HOST_PTR, hostbuf=np.zeros(M).astype(np.float32))


source = """__kernel
void aligned(
__global const float* in_data,
__global float* out_data,
int stride
)
{
	int i = get_global_id(1);
	out_data[i] = in_data[i*stride];
}

"""

prg = cl.Program(ctx, source).build(options="-cl-mad-enable")	

# COMPUTE
evt = prg.aligned(queue,(1,int(M)),in_data_buf,out_data_buf,np.int32(stride),local_size=(block_size_V, block_size_H))

evt.wait()
#t_kernel = evt.profile.end - evt.profile.start

#in_data = np.zeros(N).astype(np.float32)
#results = np.zeros(M).astype(np.float32)
#cl.enqueue_read_buffer(queue, in_data_buf, in_data).wait() 	
#cl.enqueue_read_buffer(queue, out_data_buf, results).wait()

queue.flush()
in_data_buf.release()
out_data_buf.release()

#print str(M)+' vs N = '+str(N)
#print 'MBW = '+str(M*4*2/(t_kernel))+' [GB/s]'

#!/bin/bash
python global_access_strided.py

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
