Another thing that came to my mind:

On Fri, 4 May 2018 15:32:58 +0800, Ruiling Song <ruiling.s...@intel.com> wrote:

> +static int tonemap_opencl_filter_frame(AVFilterLink *inlink, AVFrame *input)
> +{
> +    AVFilterContext *avctx = inlink->dst;
> +    AVFilterLink *outlink = avctx->outputs[0];
> +    TonemapOpenCLContext *ctx = avctx->priv;
> +    AVFrame *output = NULL;
> +    cl_int cle;
> +    int err;
> +    double peak = ctx->peak;
> +
> +    AVHWFramesContext *input_frames_ctx =
> +        (AVHWFramesContext*)input->hw_frames_ctx->data;
> +
> +    av_log(ctx, AV_LOG_DEBUG, "Filter input: %s, %ux%u (%"PRId64").\n",
> +           av_get_pix_fmt_name(input->format),
> +           input->width, input->height, input->pts);
> +
> +    if (!input->hw_frames_ctx)
> +        return AVERROR(EINVAL);
> +
> +    output = ff_get_video_buffer(outlink, outlink->w, outlink->h);
> +    if (!output) {
> +        err = AVERROR(ENOMEM);
> +        goto fail;
> +    }
> +
> +    err = av_frame_copy_props(output, input);
> +    if (err < 0)
> +        goto fail;
> +
> +    if (!peak)
> +        peak = determine_signal_peak(input);
> +
> +    if (ctx->trc != -1)
> +        output->color_trc = ctx->trc;
> +    if (ctx->primaries != -1)
> +        output->color_primaries = ctx->primaries;
> +    if (ctx->colorspace != -1)
> +        output->colorspace = ctx->colorspace;
> +
> +    ctx->trc_in = input->color_trc;
> +    ctx->trc_out = output->color_trc;
> +    ctx->colorspace_in = input->colorspace;
> +    ctx->colorspace_out = output->colorspace;
> +    ctx->primaries_in = input->color_primaries;
> +    ctx->primaries_out = output->color_primaries;
> +
> +    assert(output->sw_format == AV_PIX_FMT_NV12);
> +
> +    if (!ctx->initialised) {
> +        err = tonemap_opencl_init(avctx);
> +        if (err < 0)
> +            goto fail;
> +    }
> +
> +    switch(input_frames_ctx->sw_format) {
> +    case AV_PIX_FMT_P010:
> +        err = launch_kernel(avctx, ctx->kernel, output, input, peak);
> +        if (err < 0) goto fail;
> +        break;
> +    default:
> +        av_log(ctx, AV_LOG_ERROR, "unsupported format in tonemap_opencl.\n");
> +        err = AVERROR(ENOSYS);
> +        goto fail;
> +    }
> +
> +    cle = clFinish(ctx->command_queue);
> +    if (cle != CL_SUCCESS) {
> +        av_log(avctx, AV_LOG_ERROR, "Failed to finish command queue: %d.\n",
> +               cle);
> +        err = AVERROR(EIO);
> +        goto fail;
> +    }
> +
> +
> +    av_frame_free(&input);
> +
> +    av_log(ctx, AV_LOG_DEBUG, "Filter output: %s, %ux%u (%"PRId64").\n",
> +           av_get_pix_fmt_name(output->format),
> +           output->width, output->height, output->pts);
> +
> +    return ff_filter_frame(outlink, output);
> +
> +fail:
> +    clFinish(ctx->command_queue);
> +    av_frame_free(&input);
> +    av_frame_free(&output);
> +    return err;
> +}
Reading this logic, it seems like each frame is individually processed by calling clEnqueueNDRangeKernel followed by clFinish, after which the function returns. I'm not very familiar with OpenCL, but if it behaves anything like OpenGL and Vulkan, this kind of design essentially forces blocking until the frame has finished processing and is fully resident in host RAM again before the next frame can be started?

Generally, when batch-processing large numbers of images, it's better to process them asynchronously to avoid pipeline stalls: the GPU should be free to start work on the next image while the previous one is still being flushed from caches and DMA'd back into host RAM, without stalling on the transfer in either direction. Ideally, PCIe bandwidth permitting, you want either the compute units or the DMA engine to be saturated. A blocking design permits neither.

I'm not entirely sure how that's done in OpenCL, if it's possible at all, but normally it would revolve around some kind of fence or signal object that you regularly poll for in-flight frames, emitting each frame from the filter only once it's available (see the sketch at the end of this mail). In general, this requires a filter API that allows ingesting and emitting frames at different times - I'm sure this is possible with lavfi somehow?

Correct me if I'm wrong,
Niklas
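P.S. To make the above more concrete, here's a rough, untested sketch of the event-based polling I have in mind. Everything in it (the PendingFrame struct, the helper names, the FIFO idea) is made up for illustration and is not from the patch:

/* Hypothetical sketch: enqueue kernels without blocking and poll a
 * per-frame cl_event instead of calling clFinish() on every frame. */
#include <CL/cl.h>
#include <libavutil/frame.h>

typedef struct PendingFrame {
    AVFrame  *frame; /* output frame the GPU is still writing to */
    cl_event  done;  /* signalled once the kernel has completed */
} PendingFrame;

/* Submit the kernel and keep its completion event; clFlush() hands the
 * work to the device but, unlike clFinish(), does not wait for it. */
static cl_int enqueue_frame(cl_command_queue queue, cl_kernel kernel,
                            const size_t global[2], PendingFrame *pf)
{
    cl_int cle = clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global,
                                        NULL, 0, NULL, &pf->done);
    if (cle != CL_SUCCESS)
        return cle;
    return clFlush(queue);
}

/* Poll an in-flight frame: 1 = ready to emit downstream, 0 = still
 * running, -1 = the command failed (negative execution status). */
static int frame_ready(PendingFrame *pf)
{
    cl_int status;
    cl_int cle = clGetEventInfo(pf->done,
                                CL_EVENT_COMMAND_EXECUTION_STATUS,
                                sizeof(status), &status, NULL);
    if (cle != CL_SUCCESS || status < 0)
        return -1;
    if (status != CL_COMPLETE)
        return 0;
    clReleaseEvent(pf->done);
    return 1;
}

The filter would push a PendingFrame into a small FIFO for every input frame and, whenever it gets to run, pop and emit the frames at the head of the FIFO whose events have completed. That way several frames can be in flight at once and the command queue stays fed.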