Hey folks, I'm trying to transcode an HEVC (yuv420p10le) file to H264 using the NVENC encoder on a GTX 1650, and I'm having issues with what I assume are the pixel format conversions on the hardware. My encode speed (in fps) is pretty low (see below), far lower than what I get when transcoding HEVC -> HEVC. The ffmpeg version is N-94578-gd6bd902599-gcff309097a+3 (on Windows 10, though I don't think this is relevant). For the purposes of this experiment, let's say I'm not concerned about any loss from the format conversions.
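To compare decode/convert/encode combinations like the ones in this post on an equal footing, a small bash harness can run each one with ffmpeg's -benchmark option (which prints user, system and real time at the end) and send the encoded video to the null muxer, so disk writes and audio don't skew the fps figures. This is only a sketch: INPUT, DRY_RUN and run_case are hypothetical names, and the hevc_nvenc line is an assumed HEVC -> HEVC baseline, not the command actually used for the numbers quoted here. DRY_RUN defaults to 1, so by default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical benchmark harness for comparing hwaccel/filter/encoder combinations.
# DRY_RUN defaults to 1 (print commands only); set DRY_RUN=0 to actually run ffmpeg.
set -euo pipefail

INPUT="${INPUT:-input.mp4}"   # assumed input file name

# run_case LABEL "HWACCEL_ARGS" "VF" ENCODER
#   HWACCEL_ARGS: options placed before -i, e.g. "-hwaccel cuda" (may be empty)
#   VF:           -vf filter chain (may be empty)
run_case() {
  local label="$1" hw="$2" vf="$3" enc="$4"
  local -a cmd=(ffmpeg -hide_banner -benchmark) hw_args=()
  if [[ -n "$hw" ]]; then
    # Split on whitespace so "-hwaccel cuda" becomes two separate arguments.
    read -r -a hw_args <<< "$hw"
    cmd+=("${hw_args[@]}")
  fi
  cmd+=(-i "$INPUT")
  if [[ -n "$vf" ]]; then
    cmd+=(-vf "$vf")
  fi
  # Drop audio and discard the encoded video so only decode/filter/encode is timed.
  cmd+=(-an -c:v "$enc" -f null -)
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%s: %s\n' "$label" "${cmd[*]}"
  else
    echo "== $label"
    "${cmd[@]}"
  fi
}

# A few rows from this post's tests, plus an assumed HEVC -> HEVC baseline.
run_case "hevc_nvenc baseline" "-hwaccel cuda -hwaccel_output_format cuda" "" hevc_nvenc
run_case "test 3" "-hwaccel cuda -hwaccel_output_format yuv420p" "" h264_nvenc
run_case "test 6" "-hwaccel cuda" "format=yuv420p" h264_nvenc
```

With DRY_RUN=0, the fps on ffmpeg's progress line and the -benchmark summary can be read off for each case in turn.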
I'd like to know what I'm doing wrong and which commands I can use for the following: decode on GPU -> format conversion (if necessary) on GPU -> encode on GPU. I might not be understanding a few concepts.

The options I thought were available, and which I tried out in combination, are:

- decoder (I mostly left this blank for auto) and encoder (always h264_nvenc)
- hwaccel
- hwaccel_output_format
- filters (-vf):
  - format
  - scale_npp (for format conversion on the GPU)

I have no idea what options like pix_fmt, or other filters like colorspace, do for hardware (how is pix_fmt different from hwaccel_output_format?). At this point I'm kind of stuck. I don't know how to convert formats on the GPU (I assume the format conversion is happening on the CPU).

Input details:

    ffprobe input.mp4
    Stream #0:0(eng): Video: hevc (Main 10) (hvc1 / 0x31637668), yuv420p10le(tv, bt2020nc/bt2020/smpte2084), 1920x1080, 24886 kb/s, SAR 1:1 DAR 16:9, 29.99 fps, ...

Summary of the various combinations ("-" means left blank, "X" means no fps because the encode failed):

    test | hwaccel | hwaccel_output_format | filter (-vf)             | encode fps | note
    -----+---------+-----------------------+--------------------------+------------+--------------------------------
    1    | cuda    | -                     | -                        | X          | Failed
    2    | cuda    | cuda                  | -                        | X          | Failed
    3    | cuda    | yuv420p               | -                        | 361        | Video messed up
    4    | cuda    | cuda                  | format=yuv420p           | X          | Failed
    5    | cuvid   | cuda                  | format=yuv420p           | 91         | Not using GPU decode
    6    | cuda    | -                     | format=yuv420p           | 161        | Not using GPU format conversion
    7    | cuvid   | -                     | format=yuv420p           | 91         | Not using GPU decode
    8    | cuda    | -                     | scale_npp=format=yuv420p | X          | Failed
    9    | cuda    | cuda                  | scale_npp=format=yuv420p | X          | Failed

I would expect a speed of around test 3 (without the screwed-up video). Is there any way to convert the pixel formats on the hardware without screwing up the video? On a similar note, I'd love for someone to explain the failing encodes. Here are the details for the corresponding encodes:

1.

    ffmpeg -loglevel verbose -hwaccel cuda -i input.mp4 -c:v h264_nvenc output.mp4

Fails with the following:

    [graph_1_in_0_1 @ 000001cc9670e4c0] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
    [hevc @ 000001cc8740fc00] NVDEC capabilities:
    [hevc @ 000001cc8740fc00] format supported: yes, max_mb_count: 262144
    [hevc @ 000001cc8740fc00] min_width: 144, max_width: 8192
    [hevc @ 000001cc8740fc00] min_height: 144, max_height: 8192
    [graph 0 input from stream 0:0 @ 000001cc87420840] w:1920 h:1080 pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
    [h264_nvenc @ 000001cc8747fbc0] Loaded Nvenc version 9.0
    [h264_nvenc @ 000001cc8747fbc0] Nvenc initialized successfully
    [h264_nvenc @ 000001cc8747fbc0] 1 CUDA capable devices found
    [h264_nvenc @ 000001cc8747fbc0] [ GPU #0 - < GeForce GTX 1650 > has Compute SM 7.5 ]
    [h264_nvenc @ 000001cc8747fbc0] 10 bit encode not supported
    [h264_nvenc @ 000001cc8747fbc0] No NVENC capable devices found
    [h264_nvenc @ 000001cc8747fbc0] Nvenc unloaded
    Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

2.

    ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -c:v h264_nvenc output.mp4

Fails with the following:

    [graph_1_in_0_1 @ 00000240b7932340] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
    [hevc @ 00000240b79e37c0] NVDEC capabilities:
    [hevc @ 00000240b79e37c0] format supported: yes, max_mb_count: 262144
    [hevc @ 00000240b79e37c0] min_width: 144, max_width: 8192
    [hevc @ 00000240b79e37c0] min_height: 144, max_height: 8192
    [graph 0 input from stream 0:0 @ 00000240b7937e00] w:1920 h:1080 pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
    [h264_nvenc @ 00000240b7483700] Loaded Nvenc version 9.0
    [h264_nvenc @ 00000240b7483700] Nvenc initialized successfully
    [h264_nvenc @ 00000240b7483700] 10 bit encode not supported
    [h264_nvenc @ 00000240b7483700] Provided device doesn't support required NVENC features
    [h264_nvenc @ 00000240b7483700] Nvenc unloaded
    Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height

Alright, so it seems that the hardware H264 encoder doesn't support 10-bit encodes (which is what's coming out of the decoder). So let's try changing the format:

3.

    ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format yuv420p -i input.mp4 -c:v h264_nvenc output.mp4

Pretty decent encode at ~360 fps. Alas, the video is screwed up.
Colors are weird:

    [graph_1_in_0_1 @ 00000256c9ac7b40] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
    [hevc @ 00000256cbb737c0] NVDEC capabilities:
    [hevc @ 00000256cbb737c0] format supported: yes, max_mb_count: 262144
    [hevc @ 00000256cbb737c0] min_width: 144, max_width: 8192
    [hevc @ 00000256cbb737c0] min_height: 144, max_height: 8192
    [graph 0 input from stream 0:0 @ 00000256cbac7e00] w:1920 h:1080 pixfmt:yuv420p tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
    [h264_nvenc @ 00000256cb693700] Loaded Nvenc version 9.0
    [h264_nvenc @ 00000256cb693700] Nvenc initialized successfully
    [h264_nvenc @ 00000256cb693700] 1 CUDA capable devices found
    [h264_nvenc @ 00000256cb693700] [ GPU #0 - < GeForce GTX 1650 > has Compute SM 7.5 ]
    [h264_nvenc @ 00000256cb693700] supports NVENC

Let's use a format filter to change the format:

4.

    ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4

Fails with the following:

    [graph_1_in_0_1 @ 0000019390de5c80] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
    [hevc @ 00000193908675c0] NVDEC capabilities:
    [hevc @ 00000193908675c0] format supported: yes, max_mb_count: 262144
    [hevc @ 00000193908675c0] min_width: 144, max_width: 8192
    [hevc @ 00000193908675c0] min_height: 144, max_height: 8192
    [graph 0 input from stream 0:0 @ 00000193a031ee80] w:1920 h:1080 pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
    [auto_scaler_0 @ 00000193b7aee780] w:iw h:ih flags:'bicubic' interl:0
    [Parsed_format_0 @ 00000193908eee80] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
    Impossible to convert between the formats supported by the filter 'graph 0 input from stream 0:0' and the filter 'auto_scaler_0'
    Error reinitializing filters!
    Failed to inject frame into filter network: Function not implemented
    Error while processing the decoded data for stream #0:0

5.

    ffmpeg -loglevel verbose -hwaccel cuvid -hwaccel_output_format cuda -i input.mp4 -vf format=yuv420p -c:v h264_nvenc output.mp4

Succeeds, but only encodes at around 91 fps, I assume because the GPU decoder isn't being used. What is the difference between the cuvid and cuda hwaccels (why did the previous test fail while this one succeeds)? Here is the relevant output:

    [graph_1_in_0_1 @ 000002152cc3cc00] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
    [hevc @ 000002152ac33700] Initializing cuvid hwaccel
    [AVHWFramesContext @ 000002152cc3f0c0] Pixel format 'yuv420p10le' is not supported
    [hevc @ 000002152ac33700] Error initializing a CUDA frame pool
    cuvid hwaccel requested for input stream #0:0, but cannot be initialized.
    [hevc @ 000002152ac33700] Error parsing NAL unit #2.
    [hevc @ 000002152ac79180] Could not find ref with POC 0
    Error while decoding stream #0:0: Operation not permitted
    [graph 0 input from stream 0:0 @ 000002152d638b80] w:1920 h:1080 pixfmt:yuv420p10le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
    [auto_scaler_0 @ 000002152ca176c0] w:iw h:ih flags:'bicubic' interl:0
    [Parsed_format_0 @ 000002152d3fee40] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
    [auto_scaler_0 @ 000002152ca176c0] w:1920 h:1080 fmt:yuv420p10le sar:1/1 -> w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
    [h264_nvenc @ 000002152ac31800] Loaded Nvenc version 9.0
    [h264_nvenc @ 000002152ac31800] Nvenc initialized successfully
    [h264_nvenc @ 000002152ac31800] 1 CUDA capable devices found
    [h264_nvenc @ 000002152ac31800] [ GPU #0 - < GeForce GTX 1650 > has Compute SM 7.5 ]
    [h264_nvenc @ 000002152ac31800] supports NVENC

Take out hwaccel_output_format:

6.

    ffmpeg -loglevel verbose -hwaccel cuda -i in.mp4 -vf format=yuv420p -c:v h264_nvenc out.mp4

Succeeds and encodes at 161 fps (using both the hardware decoder and the hardware encoder on the GPU, but I believe the format change is happening on the CPU between the two stages).

    [graph_1_in_0_1 @ 0000025491bf2b00] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
    [hevc @ 0000025491b84900] NVDEC capabilities:
    [hevc @ 0000025491b84900] format supported: yes, max_mb_count: 262144
    [hevc @ 0000025491b84900] min_width: 144, max_width: 8192
    [hevc @ 0000025491b84900] min_height: 144, max_height: 8192
    [graph 0 input from stream 0:0 @ 0000025491c0eec0] w:1920 h:1080 pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
    [auto_scaler_0 @ 00000254b747cfc0] w:iw h:ih flags:'bicubic' interl:0
    [Parsed_format_0 @ 000002549203d840] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
    [auto_scaler_0 @ 00000254b747cfc0] w:1920 h:1080 fmt:p010le sar:1/1 -> w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
    [h264_nvenc @ 00000254920a0f40] Loaded Nvenc version 9.0
    [h264_nvenc @ 00000254920a0f40] Nvenc initialized successfully
    [h264_nvenc @ 00000254920a0f40] 1 CUDA capable devices found
    [h264_nvenc @ 00000254920a0f40] [ GPU #0 - < GeForce GTX 1650 > has Compute SM 7.5 ]
    [h264_nvenc @ 00000254920a0f40] supports NVENC

7.

    ffmpeg -loglevel verbose -hwaccel cuvid -i in.mp4 -vf format=yuv420p -c:v h264_nvenc out.mp4

Only encoding on GPU, not decoding (91 fps).

    [graph_1_in_0_1 @ 000002163875b5c0] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
    [hevc @ 00000216380c3c00] Initializing cuvid hwaccel
    [AVHWFramesContext @ 00000216387fc300] Pixel format 'yuv420p10le' is not supported
    [hevc @ 00000216380c3c00] Error initializing a CUDA frame pool
    cuvid hwaccel requested for input stream #0:0, but cannot be initialized.
    [hevc @ 00000216380c3c00] Error parsing NAL unit #2.
    [hevc @ 000002163813d300] Could not find ref with POC 0
    Error while decoding stream #0:0: Operation not permitted
    [graph 0 input from stream 0:0 @ 00000216387594c0] w:1920 h:1080 pixfmt:yuv420p10le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
    [auto_scaler_0 @ 000002164f8a0c40] w:iw h:ih flags:'bicubic' interl:0
    [Parsed_format_0 @ 00000216387593c0] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
    [auto_scaler_0 @ 000002164f8a0c40] w:1920 h:1080 fmt:yuv420p10le sar:1/1 -> w:1920 h:1080 fmt:yuv420p sar:1/1 flags:0x4
    [h264_nvenc @ 0000021638590f40] Loaded Nvenc version 9.0
    [h264_nvenc @ 0000021638590f40] Nvenc initialized successfully
    [h264_nvenc @ 0000021638590f40] 1 CUDA capable devices found
    [h264_nvenc @ 0000021638590f40] [ GPU #0 - < GeForce GTX 1650 > has Compute SM 7.5 ]
    [h264_nvenc @ 0000021638590f40] supports NVENC

Let's see if I can do the format conversion on the GPU (instead of GPU -> CPU -> GPU) by using the scale_npp filter.

8.

    ffmpeg -loglevel verbose -hwaccel cuda -i input.mp4 -vf scale_npp=format=yuv420p -c:v h264_nvenc output.mp4

Fails:

    [graph_1_in_0_1 @ 0000022f3001e080] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
    [hevc @ 0000022f207d7f40] NVDEC capabilities:
    [hevc @ 0000022f207d7f40] format supported: yes, max_mb_count: 262144
    [hevc @ 0000022f207d7f40] min_width: 144, max_width: 8192
    [hevc @ 0000022f207d7f40] min_height: 144, max_height: 8192
    [graph 0 input from stream 0:0 @ 0000022f3034ee80] w:1920 h:1080 pixfmt:p010le tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
    [auto_scaler_0 @ 0000022f47b2d300] w:iw h:ih flags:'bicubic' interl:0
    [Parsed_scale_npp_0 @ 0000022f20c49b40] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_scale_npp_0'
    Impossible to convert between the formats supported by the filter 'graph 0 input from stream 0:0' and the filter 'auto_scaler_0'
    Error reinitializing filters!
    Failed to inject frame into filter network: Function not implemented
    Error while processing the decoded data for stream #0:0

9.

    ffmpeg -loglevel verbose -hwaccel cuda -hwaccel_output_format cuda -i in.mp4 -vf scale_npp=format=yuv420p -c:v h264_nvenc out.mp4

Fails:

    [graph_1_in_0_1 @ 00000200040adac0] tb:1/48000 samplefmt:fltp samplerate:48000 chlayout:0x3
    [hevc @ 00000200747b65c0] NVDEC capabilities:
    [hevc @ 00000200747b65c0] format supported: yes, max_mb_count: 262144
    [hevc @ 00000200747b65c0] min_width: 144, max_width: 8192
    [hevc @ 00000200747b65c0] min_height: 144, max_height: 8192
    [graph 0 input from stream 0:0 @ 00000200040aa8c0] w:1920 h:1080 pixfmt:cuda tb:1/90000 fr:30/1 sar:1/1 sws_param:flags=2
    [Parsed_scale_npp_0 @ 0000020074c75b80] Unsupported input format: p010le
    [Parsed_scale_npp_0 @ 0000020074c75b80] Failed to configure output pad on Parsed_scale_npp_0
    Error reinitializing filters!
    Failed to inject frame into filter network: Function not implemented
    Error while processing the decoded data for stream #0:0

I'd appreciate any help or pointers in the right direction (even an alternate mailing list).

_______________________________________________
ffmpeg-user mailing list
ffmpeg-user@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
ffmpeg-user-requ...@ffmpeg.org with subject "unsubscribe".
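One way the decode on GPU -> convert on GPU -> encode on GPU pipeline asked about above might look, as a bash sketch. It assumes a newer FFmpeg build than the N-94578 one used here, in which the scale_cuda filter has a format option (ffmpeg -h filter=scale_cuda lists it if so); as test 9 shows, scale_npp in this build rejects p010le input. If the option isn't there, the script falls back to hwdownload,format=p010le,format=yuv420p,hwupload_cuda, which keeps decode and encode on the GPU but does the conversion on the CPU, much like test 6. FFMPEG, DRY_RUN, has_scale_cuda_format, pick_vf and transcode are made-up names for illustration. Neither chain tone-maps the bt2020/smpte2084 (HDR) source; both only change the pixel format and bit depth.

```shell
#!/usr/bin/env bash
# Sketch: pick a GPU-only 10-bit -> 8-bit conversion when the build supports it,
# otherwise fall back to download -> CPU convert -> upload. All names are hypothetical.
set -euo pipefail

FFMPEG="${FFMPEG:-ffmpeg}"

# Succeeds if this build's scale_cuda filter lists a "format" option.
# Help text is captured first so pipefail/SIGPIPE can't turn a match into a failure.
has_scale_cuda_format() {
  local help_text
  help_text="$("$FFMPEG" -hide_banner -h filter=scale_cuda 2>/dev/null || true)"
  grep -Eq '^[[:space:]]+format[[:space:]]' <<< "$help_text"
}

# Prints the -vf chain to apply to CUDA frames (p010le in, yuv420p out).
pick_vf() {
  if has_scale_cuda_format; then
    # Conversion stays on the GPU (assumes the newer scale_cuda).
    echo "scale_cuda=format=yuv420p"
  else
    # Frames go to system memory, swscale converts them, then they are re-uploaded.
    echo "hwdownload,format=p010le,format=yuv420p,hwupload_cuda"
  fi
}

# transcode IN OUT: prints the command (DRY_RUN=1, the default) or runs it (DRY_RUN=0).
transcode() {
  local in="$1" out="$2" vf
  vf="$(pick_vf)"
  local -a cmd=("$FFMPEG" -hwaccel cuda -hwaccel_output_format cuda
                -i "$in" -vf "$vf" -c:v h264_nvenc "$out")
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: source the script and run DRY_RUN=0 transcode input.mp4 output.mp4; with the default DRY_RUN=1 it only prints the command it would run. Checking the filter's help output at runtime, rather than hard-coding one chain, lets the same script work on builds with and without the newer scale_cuda.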