>> Additionally, could you give your opinion on a feature we may also
>> want to add in the future, which we mentioned in the previous email?
>> Basically, we may want to add one more CUDA function, specifically
>> cuMemcpy2DAsync, and the possibility to set a CUstream in
>> AVCUDADeviceContext, so that it is used with cuMemcpy2DAsync instead of
>> cuMemcpy2D in "nvdec_retrieve_data" in libavcodec/nvdec.c. In our
>> use case this would save up to 0.72 ms (GPU time) per frame when
>> decoding eight Full HD frames, and up to 0.5 ms (GPU time) per frame
>> when decoding two 4K frames. This may sound like very little, but for
>> us it is significant. Our software needs to do many things with CUDA
>> on the GPU in at most 33 ms per frame, and we have little GPU time left.
> 
> This is interesting and I'm considering making that the default, as it
> would fit well with the current infrastructure, delaying the sync call
> to the moment the frame leaves avcodec, which with the internal
> re-ordering and delay should give plenty of time for the copy to finish.

I'm not sure if, or how well, this works with the mapped cuvid frames, though.
The frame would already be unmapped and potentially reused before
the async copy completes. So it would need an immediate call to Sync
right after the three async copy calls, making the entire effort pointless.
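
To make the concern concrete, here is a toy model in plain C. It is only a
sketch and does not use the CUDA API: ToyStream, toy_memcpy_async,
toy_stream_sync and run_case are hypothetical names invented for this
illustration, and a single buffer stands in for the mapped cuvid surface.
The "async copy" is merely queued, and the queued copies only run when the
stream is synchronized.

```c
#include <assert.h>
#include <string.h>

#define FRAME_BYTES 4
#define MAX_PENDING 8

/* Hypothetical stand-in for a stream: copies are queued, not executed. */
typedef struct {
    const unsigned char *src;
    unsigned char *dst;
} PendingCopy;

typedef struct {
    PendingCopy ops[MAX_PENDING];
    int count;
} ToyStream;

/* Models an async copy: records the request and returns immediately. */
static void toy_memcpy_async(ToyStream *s, unsigned char *dst,
                             const unsigned char *src)
{
    assert(s->count < MAX_PENDING);
    s->ops[s->count].src = src;
    s->ops[s->count].dst = dst;
    s->count++;
}

/* Models a stream sync: every queued copy actually runs here. */
static void toy_stream_sync(ToyStream *s)
{
    for (int i = 0; i < s->count; i++)
        memcpy(s->ops[i].dst, s->ops[i].src, FRAME_BYTES);
    s->count = 0;
}

/* Copies frame 'A' out of the mapped surface, then lets the surface be
 * reused for frame 'B'. Returns the first byte that ends up in the output. */
static unsigned char run_case(int sync_before_reuse)
{
    ToyStream stream;
    unsigned char mapped[FRAME_BYTES];
    unsigned char out[FRAME_BYTES];

    memset(&stream, 0, sizeof(stream));
    memset(out, 0, sizeof(out));

    memset(mapped, 'A', FRAME_BYTES);       /* decoder output for frame A */
    toy_memcpy_async(&stream, out, mapped); /* queued, not yet executed */

    if (sync_before_reuse)
        toy_stream_sync(&stream);           /* copy runs while 'A' is intact */

    memset(mapped, 'B', FRAME_BYTES);       /* surface unmapped and reused */
    toy_stream_sync(&stream);               /* deferred sync, no-op if empty */

    return out[0];
}
```

In this model run_case(1) returns 'A', while run_case(0) returns 'B': with
the sync deferred, the output holds data from the frame that replaced the
one we meant to copy. Whether the real driver behaves this way for mapped
cuvid frames is exactly what I'm unsure about.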

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
