>> Additionally, could you give your opinion on the feature we also may want to add in the future, that we mentioned in the previous email? Basically, we may want to add one more CUDA function, specifically cuMemcpy2DAsync, and the possibility to set a CUStream in AVCUDADeviceContext, so it is used with cuMemcpy2DAsync instead of cuMemcpy2D in "nvdec_retrieve_data" in file libavcodec/nvdec.c. In our use case this would save up to 0.72 ms (GPU time) per frame, in case of decoding 8 fullhd frames, and up to 0.5 ms (GPU time) per frame, in case of decoding two 4k frames. This may sound too little, but for us is significant. Our software needs to do many things in a maximum of 33ms with CUDA on the GPU per frame, and we have little GPU time left.
>
> This is interesting and I'm considering making that the default, as it
> would fit well with the current infrastructure, delaying the sync call
> to the moment the frame leaves avcodec, which with the internal
> re-ordering and delay should give plenty of time for the copy to finish.

I'm not sure if/how well this works with the mapped cuvid frames though. The frame would already be unmapped, and potentially reused, before the async copy completes. So it would need an immediate call to Sync right after the 3 async copy calls, making the entire effort pointless.
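
To illustrate the ordering problem, here is a minimal sketch in plain C. It does not call the CUDA driver API at all: FakeStream, fake_memcpy_async, fake_stream_sync, fake_decoder_reuse and retrieve_frame are hypothetical stand-ins, and the model assumes the worst case, where a queued copy only executes once the stream is synchronized. On real hardware the copy may finish earlier, but nothing guarantees it.

```c
#include <assert.h>
#include <string.h>

#define FRAME_BYTES 4
#define MAX_PENDING 8

/* Hypothetical stand-in for a CUDA stream: copies are queued, not executed. */
typedef struct {
    struct {
        unsigned char *dst;
        const unsigned char *src;
    } pending[MAX_PENDING];
    int count;
} FakeStream;

/* Models an async copy: records the request and returns immediately. */
static void fake_memcpy_async(FakeStream *s, unsigned char *dst,
                              const unsigned char *src)
{
    assert(s->count < MAX_PENDING);
    s->pending[s->count].dst = dst;
    s->pending[s->count].src = src;
    s->count++;
}

/* Models a stream sync: executes every queued copy, then returns. */
static void fake_stream_sync(FakeStream *s)
{
    for (int i = 0; i < s->count; i++)
        memcpy(s->pending[i].dst, s->pending[i].src, FRAME_BYTES);
    s->count = 0;
}

/* Models the decoder writing the next picture into an unmapped surface. */
static void fake_decoder_reuse(unsigned char *surface, unsigned char value)
{
    memset(surface, value, FRAME_BYTES);
}

/* Returns the first byte of the output frame.
 * sync_before_unmap != 0: sync right after queuing the copy.
 * sync_before_unmap == 0: sync deferred until after the surface is reused. */
static unsigned char retrieve_frame(int sync_before_unmap)
{
    unsigned char surface[FRAME_BYTES];
    unsigned char out[FRAME_BYTES] = {0};
    FakeStream stream = { .count = 0 };

    memset(surface, 0xAA, FRAME_BYTES);        /* decoded frame N */
    fake_memcpy_async(&stream, out, surface);  /* queue the copy */
    if (sync_before_unmap)
        fake_stream_sync(&stream);             /* copy runs while mapped */
    fake_decoder_reuse(surface, 0xBB);         /* unmapped, frame N+1 lands */
    fake_stream_sync(&stream);                 /* deferred sync, if any left */
    return out[0];
}
```

In this model retrieve_frame(1) returns the bytes of frame N, while retrieve_frame(0) returns whatever the decoder wrote afterwards. Avoiding that would mean either syncing before the unmap, which gives up the benefit, or somehow keeping the frame mapped until the copy is known to be done.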