Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-28 Timo Rothenpieler

applied
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-20 Timo Rothenpieler

On 20.03.2020 12:30, Yaroslav Pogrebnyak wrote:

> On 20.03.20 00:47, Timo Rothenpieler wrote:
>
>> I'm looking into adding hardware-frame support to make_writable, so
>> modifications might not be needed.
>
> Yep, it seems it would be more consistent if av_frame_make_writable
> supported hardware frames.
>
> Please let me know if you are going to do it, or if I need to send
> a modified patch. Thanks!

It is not as simple to do in a generic way as I anticipated.
The main issue at hand is how nvdec returns hardware frames, which I
need to fix first to get rid of a lot of hackery that stems from it.

But I do intend to go through with it.

Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-20 Yaroslav Pogrebnyak

On 20.03.20 00:47, Timo Rothenpieler wrote:

> I'm looking into adding hardware-frame support to make_writable, so
> modifications might not be needed.

Yep, it seems it would be more consistent if av_frame_make_writable
supported hardware frames.

Please let me know if you are going to do it, or if I need to send
a modified patch. Thanks!




Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-19 Timo Rothenpieler

On 19.03.2020 15:59, Yaroslav Pogrebnyak wrote:


> Got it, thanks! I'll redo it and submit an updated patch soon.



I'm looking into adding hardware-frame support to make_writable, so 
modifications might not be needed.
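To make the idea concrete, here is a self-contained C model of one possible shape such support could take: when a frame is not writable, take replacement storage from the frame's own frames context (modeled as a small surface pool) instead of system memory, copy the contents, and repoint the frame. SurfacePool, HwFrame, pool_get() and hw_make_writable() are hypothetical names, plain arrays stand in for device memory, and this is a sketch under those assumptions, not the implementation discussed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE     4   /* hypothetical number of surfaces in the pool */
#define SURFACE_BYTES 16  /* hypothetical size of one surface */

/* Hypothetical stand-in for a hw_frames_ctx: equally sized surfaces that
 * are handed out on request. Plain arrays model device memory. */
typedef struct SurfacePool {
    unsigned char surfaces[POOL_SIZE][SURFACE_BYTES];
    bool          in_use[POOL_SIZE];
} SurfacePool;

/* Hypothetical frame that remembers which pool it belongs to. */
typedef struct HwFrame {
    SurfacePool   *pool;
    unsigned char *data;
    bool           writable;
} HwFrame;

/* Hand out a free surface, or NULL if the pool is exhausted. */
static unsigned char *pool_get(SurfacePool *p)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!p->in_use[i]) {
            p->in_use[i] = true;
            return p->surfaces[i];
        }
    }
    return NULL;
}

/* Model of a hardware-aware make_writable: allocate from the frame's own
 * pool rather than system memory, copy, then repoint the frame. */
static int hw_make_writable(HwFrame *f)
{
    assert(f != NULL && f->pool != NULL);
    if (f->writable)
        return 0;                          /* nothing to do */

    unsigned char *fresh = pool_get(f->pool);
    if (!fresh)
        return -ENOMEM;

    memcpy(fresh, f->data, SURFACE_BYTES); /* models a device-to-device copy */
    f->data     = fresh;                   /* read-only source stays with its owner */
    f->writable = true;
    return 0;
}
```

A real version would also have to drop the reference to the old surface and carry over frame properties; the model skips both.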


Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-19 Yaroslav Pogrebnyak

On 19.03.20 22:41, Timo Rothenpieler wrote:

> h264_cuvid copies frames back to normal VRAM, and does not pass around
> mapped nvdec surfaces, like nvdec does.
>
> Writing into these is documented as disallowed.
>
> You can call av_frame_is_writable() on the frame. If it returns true,
> it's safe to write into it. If it returns false, you have to allocate
> a new output frame.

Got it, thanks! I'll redo it and submit an updated patch soon.


Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-19 Timo Rothenpieler

On 19.03.2020 15:11, Yaroslav Pogrebnyak wrote:

> On 19.03.20 21:40, Timo Rothenpieler wrote:
>
>> As far as I'm aware, make_writable does not work on hardware frames.
>> And the nvdec hwaccel returns frames that are mapped device memory,
>> and thus hard read-only.
>>
>> You will need to manually allocate output frames from the hw_frames_ctx.
>
> Yes, I see. So it seems we can safely remove this call.
>
> Also, I was thinking that output frame allocation is not needed because
> we can safely operate on the input frame in place, saving an extra memory
> allocation and copy. It seems to work well. Is that OK, or should we
> always allocate an output frame?
>
> If removing the call to av_frame_make_writable is enough, I can send
> an updated patch then.
>
> P.S. It's also strange that it worked well with h264_cuvid.

h264_cuvid copies frames back to normal VRAM, and does not pass around
mapped nvdec surfaces, like nvdec does.

Writing into these is documented as disallowed.

You can call av_frame_is_writable() on the frame. If it returns true, 
it's safe to write into it. If it returns false, you have to allocate a 
new output frame.
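The rule above can be sketched with a self-contained C model of a single 8-bit plane (for example the luma plane of an NV12 frame). The Plane struct, its writable flag (standing in for av_frame_is_writable()) and overlay_plane() are hypothetical; the caller-provided out plane stands in for a frame allocated from the hw_frames_ctx, and the plain loops stand in for what the CUDA kernel would do on the device. Overlay pixels are copied opaquely, with no alpha blending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical single 8-bit plane; not an FFmpeg type. */
typedef struct Plane {
    uint8_t *data;
    int      width, height, linesize;
    bool     writable;  /* stands in for av_frame_is_writable() */
} Plane;

/* Place `ovl` onto the main plane at (x, y), clipped to the frame.
 * Writes in place when `main_p` is writable; otherwise copies `main_p`
 * into `out` first and writes there. Returns the plane holding the result. */
static const Plane *overlay_plane(Plane *main_p, const Plane *ovl,
                                  int x, int y, Plane *out)
{
    Plane *dst = main_p;

    if (!main_p->writable) {
        assert(out != NULL && out->width == main_p->width &&
               out->height == main_p->height);
        for (int row = 0; row < main_p->height; row++)
            memcpy(out->data + (size_t)row * out->linesize,
                   main_p->data + (size_t)row * main_p->linesize,
                   (size_t)main_p->width);
        dst = out;
    }

    for (int oy = 0; oy < ovl->height; oy++) {
        int dy = y + oy;
        if (dy < 0 || dy >= dst->height)
            continue;
        for (int ox = 0; ox < ovl->width; ox++) {
            int dx = x + ox;
            if (dx < 0 || dx >= dst->width)
                continue;
            dst->data[(size_t)dy * dst->linesize + dx] =
                ovl->data[(size_t)oy * ovl->linesize + ox];
        }
    }
    return dst;
}
```

Writing in place saves the allocation and copy, but only when the input is writable; copying into a separate output is what leaves a read-only decoder surface untouched.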


Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-19 Yaroslav Pogrebnyak

On 19.03.20 21:40, Timo Rothenpieler wrote:


> As far as I'm aware, make_writable does not work on hardware frames.
> And the nvdec hwaccel returns frames that are mapped device memory,
> and thus hard read-only.
>
> You will need to manually allocate output frames from the hw_frames_ctx.

Yes, I see. So it seems we can safely remove this call.

Also, I was thinking that output frame allocation is not needed because
we can safely operate on the input frame in place, saving an extra memory
allocation and copy. It seems to work well. Is that OK, or should we
always allocate an output frame?

If removing the call to av_frame_make_writable is enough, I can send
an updated patch then.

P.S. It's also strange that it worked well with h264_cuvid.



Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-19 Timo Rothenpieler

On 19.03.2020 14:35, Yaroslav Pogrebnyak wrote:

> Oh, I didn't notice that h264_cuvid is legacy.
>
> It seems the problem is in this line:
>
> ret = av_frame_make_writable(input_main);
>
> If it is removed, it starts to work with -hwaccel cuda.
>
> I'll take a closer look at why and what happens, but any advice would be
> helpful. Thanks!

As far as I'm aware, make_writable does not work on hardware frames.
And the nvdec hwaccel returns frames that are mapped device memory, and 
thus hard read-only.


You will need to manually allocate output frames from the hw_frames_ctx.

Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-19 Yaroslav Pogrebnyak

Oh, I didn't notice that h264_cuvid is legacy.

It seems the problem is in this line:

ret = av_frame_make_writable(input_main);

If it is removed, it starts to work with -hwaccel cuda.

I'll take a closer look at why and what happens, but any advice would be
helpful. Thanks!



On 19.03.20 21:15, Timo Rothenpieler wrote:

> I'm currently trying to get this to work with nvdec, but seemingly can't:
>
> ./ffmpeg_g.exe -v verbose \
>   -hwaccel_output_format cuda -hwaccel cuda -i test_h264.mp4 \
>   -hwaccel_output_format cuda -hwaccel cuda -i test2_h264.mp4 \
>   -filter_complex "[0:v]scale_cuda=640:-2[p],[1:v][p]overlay_cuda=x=100:y=100:shortest=true" \
>   -an -c:v h264_nvenc -y out.mp4
>
> It works with legacy h264_cuvid, but definitely also needs to work
> with the proper nvdec hwaccel.
>
> I'm currently investigating as to why, but the error it produces is
> very hard to track down:
>
> Error while filtering 2: Invalid argument
> Failed to inject frame into filter network: Invalid argument
> Error while processing the decoded data for stream #1:0

Re: [FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-19 Timo Rothenpieler

I'm currently trying to get this to work with nvdec, but seemingly can't:

./ffmpeg_g.exe -v verbose \
  -hwaccel_output_format cuda -hwaccel cuda -i test_h264.mp4 \
  -hwaccel_output_format cuda -hwaccel cuda -i test2_h264.mp4 \
  -filter_complex "[0:v]scale_cuda=640:-2[p],[1:v][p]overlay_cuda=x=100:y=100:shortest=true" \
  -an -c:v h264_nvenc -y out.mp4

It works with legacy h264_cuvid, but definitely also needs to work with 
the proper nvdec hwaccel.


I'm currently investigating as to why, but the error it produces is very 
hard to track down:


Error while filtering 2: Invalid argument
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #1:0

[FFmpeg-devel] [PATCH v2 2/2] avfilter: add vf_overlay_cuda

2020-03-18 Yaroslav Pogrebnyak
Signed-off-by: Yaroslav Pogrebnyak 
---
Changes in v2:
- Fixed switch() indentation style

 configure  |   2 +
 libavfilter/Makefile   |   1 +
 libavfilter/allfilters.c   |   1 +
 libavfilter/vf_overlay_cuda.c  | 446 +
 libavfilter/vf_overlay_cuda.cu |  54 
 5 files changed, 504 insertions(+)
 create mode 100644 libavfilter/vf_overlay_cuda.c
 create mode 100644 libavfilter/vf_overlay_cuda.cu

diff --git a/configure b/configure
index 18f2841765..b08dc7bd62 100755
--- a/configure
+++ b/configure
@@ -3026,6 +3026,8 @@ scale_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 thumbnail_cuda_filter_deps="ffnvcodec"
 thumbnail_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 transpose_npp_filter_deps="ffnvcodec libnpp"
+overlay_cuda_filter_deps="ffnvcodec"
+overlay_cuda_filter_deps_any="cuda_nvcc cuda_llvm"
 
 amf_deps_any="libdl LoadLibrary"
 nvenc_deps="ffnvcodec"
diff --git a/libavfilter/Makefile b/libavfilter/Makefile
index 750412da6b..1ecaeae372 100644
--- a/libavfilter/Makefile
+++ b/libavfilter/Makefile
@@ -328,6 +328,7 @@ OBJS-$(CONFIG_OVERLAY_OPENCL_FILTER)         += vf_overlay_opencl.o opencl.o \
                                                 opencl/overlay.o framesync.o
 OBJS-$(CONFIG_OVERLAY_QSV_FILTER)            += vf_overlay_qsv.o framesync.o
 OBJS-$(CONFIG_OVERLAY_VULKAN_FILTER)         += vf_overlay_vulkan.o vulkan.o
+OBJS-$(CONFIG_OVERLAY_CUDA_FILTER)           += vf_overlay_cuda.o framesync.o vf_overlay_cuda.ptx.o
 OBJS-$(CONFIG_OWDENOISE_FILTER)              += vf_owdenoise.o
 OBJS-$(CONFIG_PAD_FILTER)                    += vf_pad.o
 OBJS-$(CONFIG_PAD_OPENCL_FILTER)             += vf_pad_opencl.o opencl.o opencl/pad.o
diff --git a/libavfilter/allfilters.c b/libavfilter/allfilters.c
index 501e5d041b..fb32bef788 100644
--- a/libavfilter/allfilters.c
+++ b/libavfilter/allfilters.c
@@ -312,6 +312,7 @@ extern AVFilter ff_vf_overlay;
 extern AVFilter ff_vf_overlay_opencl;
 extern AVFilter ff_vf_overlay_qsv;
 extern AVFilter ff_vf_overlay_vulkan;
+extern AVFilter ff_vf_overlay_cuda;
 extern AVFilter ff_vf_owdenoise;
 extern AVFilter ff_vf_pad;
 extern AVFilter ff_vf_pad_opencl;
diff --git a/libavfilter/vf_overlay_cuda.c b/libavfilter/vf_overlay_cuda.c
new file mode 100644
index 00..63cb425b2d
--- /dev/null
+++ b/libavfilter/vf_overlay_cuda.c
@@ -0,0 +1,446 @@
+/*
+ * Copyright (c) 2020 Yaroslav Pogrebnyak 
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+/**
+ * @file
+ * Overlay one video on top of another using cuda hardware acceleration
+ */
+
+#include "libavutil/log.h"
+#include "libavutil/mem.h"
+#include "libavutil/opt.h"
+#include "libavutil/pixdesc.h"
+#include "libavutil/hwcontext.h"
+#include "libavutil/hwcontext_cuda_internal.h"
+#include "libavutil/cuda_check.h"
+
+#include "avfilter.h"
+#include "framesync.h"
+#include "internal.h"
+
+#define CHECK_CU(x) FF_CUDA_CHECK_DL(ctx, ctx->hwctx->internal->cuda_dl, x)
+#define DIV_UP(a, b) ( ((a) + (b) - 1) / (b) )
+
+#define BLOCK_X 32
+#define BLOCK_Y 16
+
+static const enum AVPixelFormat supported_main_formats[] = {
+    AV_PIX_FMT_NV12,
+    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_NONE,
+};
+
+static const enum AVPixelFormat supported_overlay_formats[] = {
+    AV_PIX_FMT_NV12,
+    AV_PIX_FMT_YUV420P,
+    AV_PIX_FMT_YUVA420P,
+    AV_PIX_FMT_NONE,
+};
+
+/**
+ * OverlayCUDAContext
+ */
+typedef struct OverlayCUDAContext {
+    const AVClass  *class;
+
+    enum AVPixelFormat in_format_overlay;
+    enum AVPixelFormat in_format_main;
+
+    AVBufferRef *device_ref;
+    AVCUDADeviceContext *hwctx;
+
+    CUcontext cu_ctx;
+    CUmodule cu_module;
+    CUfunction cu_func;
+    CUstream cu_stream;
+
+    FFFrameSync fs;
+
+    int x_position;
+    int y_position;
+
+} OverlayCUDAContext;
+
+/**
+ * Helper to find out if provided format is supported by filter
+ */
+static int format_is_supported(const enum AVPixelFormat formats[], enum AVPixelFormat fmt)
+{
+    for (int i = 0; formats[i] != AV_PIX_FMT_NONE; i++)
+        if (formats[i] == fmt)
+            return 1;
+    return 0;
+}
+
+/**
+ * Helper checks if we can process main and overlay pixel formats
+ */
+static int formats_match(const enum AVPixelFormat format_main, const enum AVPixelFormat format_o