Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
applied ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
On 20.03.2020 12:30, Yaroslav Pogrebnyak wrote:
> On 20.03.20 00:47, Timo Rothenpieler wrote:
>> I'm looking into adding hardware-frame support to make_writable, so modifications might not be needed.
>
> Yep, it seems more consistent if av_frame_make_writable could support hardware frames. Please let me know if you are going to do it, or if I need to send a modified patch.
>
> Thanks!

It is not as simple as I anticipated to do it in a generic way. The main issue at hand is how nvdec returns hardware frames, which I need to fix first to get rid of a lot of hackery that stems from it.

But I do intend to go along with it.
Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
On 20.03.20 00:47, Timo Rothenpieler wrote:
> I'm looking into adding hardware-frame support to make_writable, so modifications might not be needed.

Yep, it seems more consistent if av_frame_make_writable could support hardware frames. Please let me know if you are going to do it, or if I need to send a modified patch.

Thanks!
Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
On 19.03.2020 15:59, Yaroslav Pogrebnyak wrote:
> Got it, thanks! I'll redo it and submit an updated patch soon.

I'm looking into adding hardware-frame support to make_writable, so modifications might not be needed.
Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
On 19.03.20 22:41, Timo Rothenpieler wrote:
> h264_cuvid copies frames back to normal VRAM, and does not pass around mapped nvdec surfaces, like nvdec does. Writing into these is documented as disallowed.
>
> You can call av_frame_is_writable() on the frame. If it returns true, it's safe to write into it. If it returns false, you have to allocate a new output frame.

Got it, thanks! I'll redo it and submit an updated patch soon.
Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
On 19.03.2020 15:11, Yaroslav Pogrebnyak wrote:
> On 19.03.20 21:40, Timo Rothenpieler wrote:
>> As far as I'm aware, make_writable does not work on hardware frames. And the nvdec hwaccel returns frames that are mapped device memory, and thus hard read-only. You will need to manually allocate output frames from the hw_frames_ctx.
>
> Yes, I see. So it seems we can safely remove this call.
>
> Also, I was thinking that output frame allocation is not needed, because we can safely operate on the input frame in place, saving an extra memory allocation and copy. It seems to work well. Is that OK, or should we always allocate an output frame?
>
> If removing the call to av_frame_make_writable is enough, I could send an updated patch.
>
> P.S. It is also strange that it worked well with h264_cuvid.

h264_cuvid copies frames back to normal VRAM, and does not pass around mapped nvdec surfaces, like nvdec does. Writing into these is documented as disallowed.

You can call av_frame_is_writable() on the frame. If it returns true, it's safe to write into it. If it returns false, you have to allocate a new output frame.
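Editorial sketch of the rule in the message above: write in place only when the frame is writable, otherwise allocate a new output frame. This is a simplified stand-alone model, not FFmpeg code. `ModelBuffer`, `model_is_writable`, `OutputPlan` and `plan_output` are hypothetical names invented for illustration, and the "sole owner and not read-only" test is an assumption standing in for whatever av_frame_is_writable() checks on a real frame. In the filter itself the check would be av_frame_is_writable(), and the new frame would come from the hw_frames_ctx as described in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the buffer behind a frame; not an FFmpeg type. */
typedef struct ModelBuffer {
    int  refcount;   /* how many owners reference the memory */
    bool read_only;  /* e.g. a mapped decoder surface that must not be written */
} ModelBuffer;

/* Assumed rule: writable only when we are the sole owner and the memory
 * is not marked read-only. */
static bool model_is_writable(const ModelBuffer *buf)
{
    assert(buf != NULL);
    return !buf->read_only && buf->refcount == 1;
}

typedef enum OutputPlan {
    WRITE_IN_PLACE,      /* blend the overlay directly into the main input */
    ALLOCATE_NEW_OUTPUT  /* take a fresh output frame and write there */
} OutputPlan;

/* The decision from the thread, expressed on the model type. */
static OutputPlan plan_output(const ModelBuffer *main_input)
{
    return model_is_writable(main_input) ? WRITE_IN_PLACE : ALLOCATE_NEW_OUTPUT;
}
```

Under this model, a frame copied into its own memory (one owner, not read-only) is written in place, while a mapped read-only surface or a frame shared with another owner gets a new output frame. That avoids the extra allocation and copy in the common case while staying safe when the input is not writable.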
Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
On 19.03.20 21:40, Timo Rothenpieler wrote:
> As far as I'm aware, make_writable does not work on hardware frames. And the nvdec hwaccel returns frames that are mapped device memory, and thus hard read-only. You will need to manually allocate output frames from the hw_frames_ctx.

Yes, I see. So it seems we can safely remove this call.

Also, I was thinking that output frame allocation is not needed, because we can safely operate on the input frame in place, saving an extra memory allocation and copy. It seems to work well. Is that OK, or should we always allocate an output frame?

If removing the call to av_frame_make_writable is enough, I could send an updated patch.

P.S. It is also strange that it worked well with h264_cuvid.
Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
On 19.03.2020 14:35, Yaroslav Pogrebnyak wrote:
> Oh, I didn't notice that h264_cuvid is legacy.
>
> It seems the problem is in this line:
>
> ret = av_frame_make_writable(input_main);
>
> If removed, it starts to work with -hwaccel cuda. I'll take a closer look at why and what happens, but any advice would be helpful.
>
> Thanks!

As far as I'm aware, make_writable does not work on hardware frames. And the nvdec hwaccel returns frames that are mapped device memory, and thus hard read-only. You will need to manually allocate output frames from the hw_frames_ctx.
Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
Oh, I didn't notice that h264_cuvid is legacy.

It seems the problem is in this line:

ret = av_frame_make_writable(input_main);

If removed, it starts to work with -hwaccel cuda. I'll take a closer look at why and what happens, but any advice would be helpful.

Thanks!

On 19.03.20 21:15, Timo Rothenpieler wrote:
> I'm currently trying to get this to work with nvdec, but seemingly can't:
>
> ./ffmpeg_g.exe -v verbose -hwaccel_output_format cuda -hwaccel cuda -i test_h264.mp4 -hwaccel_output_format cuda -hwaccel cuda -i test2_h264.mp4 -filter_complex "[0:v]scale_cuda=640:-2[p],[1:v][p]overlay_cuda=x=100:y=100:shortest=true" -an -c:v h264_nvenc -y out.mp4
>
> It works with legacy h264_cuvid, but definitely also needs to work with the proper nvdec hwaccel.
>
> I'm currently investigating as to why, but the error it produces is very hard to track down:
>
> Error while filtering 2: Invalid argument
> Failed to inject frame into filter network: Invalid argument
> Error while processing the decoded data for stream #1:0
Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
I'm currently trying to get this to work with nvdec, but seemingly can't:

./ffmpeg_g.exe -v verbose -hwaccel_output_format cuda -hwaccel cuda -i test_h264.mp4 -hwaccel_output_format cuda -hwaccel cuda -i test2_h264.mp4 -filter_complex "[0:v]scale_cuda=640:-2[p],[1:v][p]overlay_cuda=x=100:y=100:shortest=true" -an -c:v h264_nvenc -y out.mp4

It works with legacy h264_cuvid, but definitely also needs to work with the proper nvdec hwaccel.

I'm currently investigating as to why, but the error it produces is very hard to track down:

Error while filtering 2: Invalid argument
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #1:0
[FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda
Signed-off-by: Yaroslav Pogrebnyak
---
Changes in v2:
- Fixed switch() indentation style

 configure                      |   2 +
 libavfilter/Makefile           |   1 +
 libavfilter/allfilters.c       |   1 +
 libavfilter/vf_overlay_cuda.c  | 446 +
 libavfilter/vf_overlay_cuda.cu |  54 +
 5 files changed, 504 insertions(+)
 create mode 100644 libavfilter/vf_overlay_cuda.c
 create mode 100644 libavfilter/vf_overlay_cuda.cu

diff --git a/configure b/configure
index 18f2841765..b08dc7bd62 100755
--- a/configure
+++ b/configure
@@ -3026,6 +3026,8 @@ scale_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 thumbnail_cuda_filter_deps="ffnvcodec"
 thumbnail_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 transpose_npp_filter_deps="ffnvcodec libnpp"
+overlay_cuda_filter_deps="ffnvcodec"
+overlay_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 
 amf_deps_any="libdl LoadLibrary"
 nvenc_deps="ffnvcodec"
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 750412da6b..1ecaeae372 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -328,6 +328,7 @@ OBJS-$(CONFIG_OVERLAY_OPENCL_FILTER)         += vf_overlay_opencl.o opencl.o \
                                                 opencl/overlay.o framesync.o
 OBJS-$(CONFIG_OVERLAY_QSV_FILTER)            += vf_overlay_qsv.o framesync.o
 OBJS-$(CONFIG_OVERLAY_VULKAN_FILTER)         += vf_overlay_vulkan.o vulkan.o
+OBJS-$(CONFIG_OVERLAY_CUDA_FILTER)           += vf_overlay_cuda.o framesync.o vf_overlay_cuda.ptx.o
 OBJS-$(CONFIG_OWDENOISE_FILTER)              += vf_owdenoise.o
 OBJS-$(CONFIG_PAD_FILTER)                    += vf_pad.o
 OBJS-$(CONFIG_PAD_OPENCL_FILTER)             += vf_pad_opencl.o opencl.o opencl/pad.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 501e5d041b..fb32bef788 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -312,6 +312,7 @@ extern AVFilter ff_vf_overlay;
 extern AVFilter ff_vf_overlay_opencl;
 extern AVFilter ff_vf_overlay_qsv;
 extern AVFilter ff_vf_overlay_vulkan;
+extern AVFilter ff_vf_overlay_cuda;
 extern AVFilter ff_vf_owdenoise;
 extern AVFilter ff_vf_pad;
 extern AVFilter ff_vf_pad_opencl;
diff --git a/libavfilter/vf_overlay_cuda.c b/libavfilter/vf_overlay_cuda.c
new file mode 100644
index 00..63cb425b2d
--- /dev/null
+++ b/libavfilter/vf_overlay_cuda.c
@@ -0,0 +1,446 @@
+/*
+ * Copyright (c) 2020 Yaroslav Pogrebnyak
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Overlay one video on top of another using cuda hardware acceleration
+ */
+
+#include "libavutil/log.h"
+#include "libavutil/mem.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_cuda_internal.h"
+#include "libavutil/cuda_check.h"
+
+#include "avfilter.h"
+#include "framesync.h"
+#include "internal.h"
+
+#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, ctx->hwctx->internal->cuda_dl, x)
+#define DIV_UP(a, b) ( ((a) + (b) - 1) / (b) )
+
+#define BLOCK_X 32
+#define BLOCK_Y 16
+
+static const enum AVPixelFormat supported_main_formats[] = {
+    AV_PIX_FMT_NV12,
+    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_NONE,
+};
+
+static const enum AVPixelFormat supported_overlay_formats[] = {
+    AV_PIX_FMT_NV12,
+    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_YUVA420P,
+    AV_PIX_FMT_NONE,
+};
+
+/**
+ * OverlayCUDAContext
+ */
+typedef struct OverlayCUDAContext {
+    const AVClass *class;
+
+    enum AVPixelFormat in_format_overlay;
+    enum AVPixelFormat in_format_main;
+
+    AVBufferRef *device_ref;
+    AVCUDADeviceContext *hwctx;
+
+    CUcontext cu_ctx;
+    CUmodule cu_module;
+    CUfunction cu_func;
+    CUstream cu_stream;
+
+    FFFrameSync fs;
+
+    int x_position;
+    int y_position;
+
+} OverlayCUDAContext;
+
+/**
+ * Helper to find out if provided format is supported by filter
+ */
+static int format_is_supported(const enum AVPixelFormat formats[], enum AVPixelFormat fmt)
+{
+    for (int i = 0; formats[i] != AV_PIX_FMT_NONE; i++)
+        if (formats[i] == fmt)
+            return 1;
+    return 0;
+}
+
+/**
+ * Helper checks if we can process main and overlay pixel formats
+ */
+static int formats_match(const enum AVPixelFormat format_main, const enum AVPixelFormat format_o