Hi Ted
>> I am transcoding larger videos on a set of computers in parallel. I do this
>> by segmenting an input file at key-frames (ffmpeg -i ... -f segment), then
>> transcode parts using GNU parallel, then recombine parts into one output
>> file using ffmpeg -f concat -i ...). This works well, but I had issues with
>> audio being not in sync with videos or having audio "artefacts". I solved
>> that by transcoding audio separately, but I would prefer the more direct
>> solution to transcode both audio and video in one step.
>
> Probably transcoding video and audio (that’s been segmented while stream
> copying) in one step is more or less causing this…
After some tests yesterday applying your suggestions, and after tests of my
own conducted prior to my initial post, here are my thoughts:
I think the segmentation (ffmpeg -i ... -f segment ...) itself does not change
anything: it just splits up the input file (keeping timestamps, copying data).
The problem arises afterwards, when processing the parts/segments. Somehow the
timestamps get out of sync, and I have a feeling this is because of the
segmentation (to be precise: the muxer and encoder do not have the whole input
file, only a part of it).
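As a sanity check (just a sketch; the file names are the placeholders from my
commands below), comparing the first packet timestamps of the original and of
a segment should show whether the segmenter really copies them. Note that with
-reset_timestamps 1, as in my step 2, each segment instead restarts near zero:

```shell
# Dump the first few video packet timestamps of the original...
ffprobe -hide_banner -select_streams v:0 \
  -show_entries packet=pts_time,dts_time -of csv /tmp/input.avi | head -5
# ...and of the first segment, to compare.
ffprobe -hide_banner -select_streams v:0 \
  -show_entries packet=pts_time,dts_time -of csv /tmp/input_part_000000.mp4 | head -5
```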
Each part is demuxed, decoded, encoded and muxed again, and somewhere in this
process the timestamps get modified (either by the (de)muxer or by the decoder
or encoder). This might be because the container or stream codec needs a
different tbn (stream timebase), or switches from a constant frame rate to a
variable frame rate, or because the output container has different
requirements on the timebase than the input container specifies.
If we use the whole file as input, e.g. ffmpeg -i input.avi -c:v libx265
-c:a aac output.mov, ffmpeg/libav takes good care of this and the
muxing/encoding works like a charm. But when we first segment the input into
parts and encode/mux them separately, ffmpeg/libav does not have the full
picture and tries to fill in gaps. One sign that this is happening are
warnings like "[mov @ 0x561bc46a3d80] Non-monotonous DTS in output stream 0:1;
previous: 121611520, current: 121611024; changing to 121611521. This may
result in incorrect timestamps in the output file.". Stream 0:1 is audio, and
looking at the timestamps, the audio stream seems to be "behind". ffmpeg then
does the only thing it can (lacking the whole picture because of the
segmentation): it corrects the timestamp to the best known value.
Unfortunately this must result in an audio gap, creating the audio artefacts.
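To see exactly where the DTS jumps backwards, one could dump the audio packet
timestamps of the affected transcoded part (a sketch; the path is a
placeholder for whichever part triggers the warning):

```shell
# Print DTS (in timebase units and in seconds) for the first audio packets;
# a value lower than its predecessor is the non-monotonous DTS ffmpeg warns about.
ffprobe -hide_banner -select_streams a:0 \
  -show_entries packet=dts,dts_time -of csv \
  /tmp/output_part_000001.mp4 | head -20
```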
The question is: why is this happening? If libx265 encodes at a constant frame
rate of 25, and the original video has a constant frame rate of 25, why can
the audio lag behind (why don't we have enough audio samples)? I can currently
only explain this by the libx265 encoder, or maybe the mov muxer, somehow
changing the frame rate to 24.542 (as mediainfo/ffprobe tell me).
Or, another question that might solve the issue: how could I tell ffmpeg/libav
to keep the timestamps as long as possible ("timestamp passthrough"), so that
the final ffmpeg -f concat -i XYZ call still has the original timestamps and
might see the whole picture of the original video again?
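A possible direction I have not yet tested (a sketch only; -fps_mode
passthrough needs a recent ffmpeg, older builds spell it -vsync passthrough):
run the per-segment transcode with -copyts so the input timestamps are carried
through to the output, and let the final concat see timestamps that still line
up with the original:

```shell
# Untested sketch: transcode one part while passing input timestamps through.
ffmpeg -y -hide_banner -copyts -i /tmp/input_part_000000.mp4 \
  -c:v libx265 -c:a aac -fps_mode passthrough \
  /tmp/output_part_000000.mp4
```

For this to help, the segmenting step would presumably have to drop
-reset_timestamps 1, so the segments keep their original offsets.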
> If you can live with just encoding in one step you might get better results?
> Of course then you’ll need to decode the whole file from start to finish, but
> that’s not as cpu intensive, and not reliable, as you’ve seen.
Thank you for the suggestion! Yes, it would actually make sense that a
"pre-encoding" (into yuv4, rawvideo or so) in the segmentation phase might
improve the situation, and I ran through your suggestion (using yuv4 and pcm
in pre-segmentation). The result is better, but I can still hear some
artefacts (less pronounced, but still there). The reason I would prefer to
avoid this pre-segmentation into a "raw format" is IO boundedness: a lot of
videos, such as timelapses or raw captures from a camera, are in 2k+.
Pre-transcoding them into yuv4 or rawvideo will produce enormous amounts of
data, so we easily become IO bound, which would annihilate the performance
uplift of a multi-computer setup for fast transcoding. Still, I agree with you
that it's only a matter of decoding (and converting to a raw stream), which is
far less CPU intensive than the encoding. Therefore this would be a possible
scenario for "small resolution" videos.
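To give a feeling for the numbers (the resolution here is illustrative, not
one of my actual files): raw yuv420p needs 1.5 bytes per pixel, so even a
modest 2.5k stream produces well over 100 MiB/s:

```shell
# Raw yuv420p data rate: width * height * 1.5 bytes per frame, times fps.
W=2560; H=1440; FPS=25
BYTES_PER_SEC=$(( W * H * 3 / 2 * FPS ))
echo "$BYTES_PER_SEC bytes/s (~$(( BYTES_PER_SEC / 1024 / 1024 )) MiB/s)"
```

That is roughly 131 MiB/s per worker, which a shared disk or network quickly
cannot sustain for several parallel workers.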
>> # step 2: create segments
>>
>> ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5
>> -reset_timestamps 1 -segment_list /tmp/input_part.list -segment_list_type
>> ffconcat -r 25 -c:v copy -c:a copy -strict experimental -c:s copy -map v?
>> -map a? -map s? /tmp/input_part_%06d.mp4
>
> try changing it to
>
> ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5
> -segment_list /tmp/input_part.list -segment_list_type ffconcat -map 0? -c
> copy -c:v yuv4 -c:a pcm_f32le /tmp/input_part_%06d.mov
>
> Segment sizes should be longer though, at 0.5 seconds the overhead would not
> be insignificant. I’m guessing it was just for the demo?
Segment sizes: yes, exactly, I was using a short segment_time of 0.5 just for
the demo, so that the audio artefacts become more pronounced. A common value
that I normally choose is between 10 and 30 seconds (depending on GOP /
key-frame interval).
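To pick a segment_time that matches the GOP, the key-frame spacing can be read
off with ffprobe (a sketch; the file name is a placeholder):

```shell
# Print the pts_time of the first few key frames; the difference between
# consecutive values is the key-frame interval in seconds.
ffprobe -hide_banner -select_streams v:0 \
  -show_entries packet=pts_time,flags -of csv /tmp/input.avi \
  | awk -F, '$3 ~ /K/ {print $2}' | head -5
```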
You used -c:a pcm_f32le. In the test setup I was presenting I forgot to add an
audio codec, sorry for that; I normally have it in as well.
>> # step 4: create a ffconcat file for the output file
>>
>> for f in /tmp/output_part_*.mp4; do echo "file '$f'"
>> >>/tmp/output_part.list; done
>
> The first line in the ffconcat being ffconcat version 1.0 seems to help, you
> should probably just use the generated ffconcat segment list as the template,
>
> sed 's/input/output/g' /tmp/input_part.list > /tmp/output_part.list
Right, that's the better solution.
>> Do you have an explanation or do you know how this audio artefacts can be
>> solved? Can it be that it's just an issue with codec timebases or because
>> libx265 is using a variable frame rate (ffprobe of output.mov has an
>> effective fps of 23.94 while input.avi has a constant frame rate of 25 fps)?
>> I would very much appreciate some help.
>
> The timebase thing could bake sense, something something rounding issues when
> segmenting, timestamps being unaligned, type of thing? But I don’t think x265
> does variable frame rates (not sure), regardless in an mp4 it’s most
> definitely constant. Set the framerate during the encoding step if that’s
> important, the “normal” ones you can use abbreviations for (ntsc, pal, film,
> ntsc-film, etc) to pass the right rate instead of rounding the decimals.
Right, the -r must go in the encoding step. It kind of doesn't make sense in
combination with -c:v copy, of course...
Regarding variable frame rate for x265:
mediainfo ./output.mov # and ./output.mp4
...
Frame rate mode : Variable
Frame rate : 24.542 FPS
Minimum frame rate : 8.333 FPS
Maximum frame rate : 25.000 FPS
Original frame rate : 25.000 FPS
...
Both .mp4 and .mov show a frame rate of 24.542 (and min/max rates that
differ), which is why I was referring to a variable frame rate.
I appreciate your reply.
Philipp
_______________________________________________
ffmpeg-user mailing list
[email protected]
https://ffmpeg.org/mailman/listinfo/ffmpeg-user
To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".