Hi Ted
>> I am transcoding larger videos on a set of computers in parallel. I do this
>> by segmenting an input file at key-frames (ffmpeg -i ... -f segment), then
>> transcode parts using GNU parallel, then recombine parts into one output
>> file using ffmpeg -f concat -i ...). This works well, but I had issues with
>> audio being not in sync with videos or having audio "artefacts". I solved
>> that by transcoding audio separately, but I would prefer the more direct
>> solution to transcode both audio and video in one step.
>
> Probably transcoding video and audio (that’s been segmented while stream
> copying) in one step is more or less causing this…
After some tests yesterday applying your suggestions, and after tests of my
own conducted prior to my initial post, here are my thoughts:
I think the segmentation (ffmpeg -i ... -f segment ...) itself does not change
anything: it just splits up the input file (keeping timestamps, copying data).
The problem arises afterwards, when processing the parts/segments. Somehow the
timestamps get out of sync, and I have a feeling this is because of the
segmentation (to be precise: the muxer and encoder do not have the whole input
file, only a part of it).
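As a sanity check (just a sketch; the file names are the placeholders from my
commands below), comparing the first packet timestamps of the original and of
a segment should show whether the segmenter really copies them. Note that with
-reset_timestamps 1, as in my step 2, each segment instead restarts near zero:

```shell
# Dump the first few video packet timestamps of the original...
ffprobe -hide_banner -select_streams v:0 \
  -show_entries packet=pts_time,dts_time -of csv /tmp/input.avi | head -5
# ...and of the first segment, to compare.
ffprobe -hide_banner -select_streams v:0 \
  -show_entries packet=pts_time,dts_time -of csv /tmp/input_part_000000.mp4 | head -5
```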
Each part is demuxed, decoded, encoded and muxed again, and somewhere in this
process the timestamps get modified (either by the (de)muxer or by the decoder
or encoder). This might be because the container or stream codec needs a
different tbn (stream timebase), or switches from a constant frame rate to a
variable frame rate, or because the output container has different
requirements on the timebase than the input container specifies.
If we use the whole file as input, e.g. ffmpeg -i input.avi -c:v libx265
-c:a aac output.mov, ffmpeg/libav takes good care of this and the
muxing/encoding works like a charm. But when we first segment the input into
parts and encode/mux them separately, ffmpeg/libav does not have the full
picture and tries to fill in gaps. One sign that this is happening are
warnings like "[mov @ 0x561bc46a3d80] Non-monotonous DTS in output stream 0:1;
previous: 121611520, current: 121611024; changing to 121611521. This may
result in incorrect timestamps in the output file.". Stream 0:1 is audio, and
looking at the timestamps, the audio stream seems to be "behind". ffmpeg then
does the only thing it can (lacking the whole picture because of the
segmentation): it corrects the timestamp to the best known value.
Unfortunately this must result in an audio gap, creating the audio artefacts.
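To see exactly where the DTS jumps backwards, one could dump the audio packet
timestamps of the affected transcoded part (a sketch; the path is a
placeholder for whichever part triggers the warning):

```shell
# Print DTS (in timebase units and in seconds) for the first audio packets;
# a value lower than its predecessor is the non-monotonous DTS ffmpeg warns about.
ffprobe -hide_banner -select_streams a:0 \
  -show_entries packet=dts,dts_time -of csv \
  /tmp/output_part_000001.mp4 | head -20
```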
The question is: why is this happening? If libx265 encodes at a constant frame
rate of 25, and the original video has a constant frame rate of 25, why can
the audio lag behind (why don't we have enough audio samples)? I can currently
only explain this by the libx265 encoder, or maybe the mov muxer, somehow
changing the frame rate to 24.542 (as mediainfo/ffprobe tell me).
Or, another question that might solve the issue: how could I tell ffmpeg/libav
to keep the timestamps as long as possible ("timestamp passthrough"), so that
the final ffmpeg -f concat -i XYZ call still has the original timestamps and
might see the whole picture of the original video again?
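A possible direction I have not yet tested (a sketch only; -fps_mode
passthrough needs a recent ffmpeg, older builds spell it -vsync passthrough):
run the per-segment transcode with -copyts so the input timestamps are carried
through to the output, and let the final concat see timestamps that still line
up with the original:

```shell
# Untested sketch: transcode one part while passing input timestamps through.
ffmpeg -y -hide_banner -copyts -i /tmp/input_part_000000.mp4 \
  -c:v libx265 -c:a aac -fps_mode passthrough \
  /tmp/output_part_000000.mp4
```

For this to help, the segmenting step would presumably have to drop
-reset_timestamps 1, so the segments keep their original offsets.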
> If you can live with just encoding in one step you might get better results?
> Of course then you’ll need to decode the whole file from start to finish, but
> that’s not as cpu intensive, and not reliable, as you’ve seen.
Thank you for the suggestion! Yes, it would actually make sense that a
"pre-encoding" (into yuv4, rawvideo or so) in the segmentation phase might
improve the situation, and I ran through your suggestion (using yuv4 and pcm
in pre-segmentation). The result is better, but I can still hear some
artefacts (less pronounced, but still there). The reason I would prefer to
avoid this pre-segmentation into a "raw format" is IO boundedness: a lot of
videos, such as timelapses or raw captures from a camera, are in 2k+.
Pre-transcoding them into yuv4 or rawvideo will produce enormous amounts of
data, so we easily become IO bound, which would annihilate the performance
uplift of a multi-computer setup for fast transcoding. Still, I agree with you
that it's only a matter of decoding (and converting to a raw stream), which is
far less CPU intensive than the encoding. Therefore this would be a possible
scenario for "small resolution" videos.
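To give a feeling for the numbers (the resolution here is illustrative, not
one of my actual files): raw yuv420p needs 1.5 bytes per pixel, so even a
modest 2.5k stream produces well over 100 MiB/s:

```shell
# Raw yuv420p data rate: width * height * 1.5 bytes per frame, times fps.
W=2560; H=1440; FPS=25
BYTES_PER_SEC=$(( W * H * 3 / 2 * FPS ))
echo "$BYTES_PER_SEC bytes/s (~$(( BYTES_PER_SEC / 1024 / 1024 )) MiB/s)"
```

That is roughly 131 MiB/s per worker, which a shared disk or network quickly
cannot sustain for several parallel workers.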
>> # step 2: create segments
>>
>> ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5
>> -reset_timestamps 1 -segment_list /tmp/input_part.list -segment_list_type
>> ffconcat -r 25 -c:v copy -c:a copy -strict experimental -c:s copy -map v?
>> -map a? -map s? /tmp/input_part_%06d.mp4
>
> try changing it to
>
> ffmpeg -y -hide_banner -i /tmp/input.avi -f segment -segment_time 0.5
> -segment_list /tmp/input_part.list -segment_list_type ffconcat -map 0? -c
> copy -c:v yuv4 -c:a pcm_f32le /tmp/input_part_%06d.mov
>
> Segment sizes should be longer though, at 0.5 seconds the overhead would not
> be insignificant. I’m guessing it was just for the demo?
Segment sizes: yes, exactly, I was using a short segment_time of 0.5 just for
the demo, so that the audio artefacts become more pronounced. A common value
that I normally choose is between 10 and 30 seconds (depending on GOP /
key-frame interval).
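To pick a segment_time that matches the GOP, the key-frame spacing can be read
off with ffprobe (a sketch; the file name is a placeholder):

```shell
# Print the pts_time of the first few key frames; the difference between
# consecutive values is the key-frame interval in seconds.
ffprobe -hide_banner -select_streams v:0 \
  -show_entries packet=pts_time,flags -of csv /tmp/input.avi \
  | awk -F, '$3 ~ /K/ {print $2}' | head -5
```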
You used -c:a pcm_f32le. In the test setup I was presenting I forgot to add an
audio codec, sorry for that; I normally have it in as well.
>> # step 4: create a ffconcat file for the output file
>>
>> for f in /tmp/output_part_*.mp4; do echo "file '$f'"
>> >>/tmp/output_part.list; done
>
> The first line in the ffconcat being ffconcat version 1.0 seems to help, you
> should probably just use the generated ffconcat segment list as the template,
>
> sed 's/input/output/g' /tmp/input_part.list > /tmp/output_part.list
Right, that's the better solution.
>> Do you have an explanation or do you know how this audio artefacts can be
>> solved? Can it be that it's just an issue with codec timebases or because
>> libx265 is using a variable frame rate (ffprobe of output.mov has an
>> effective fps of 23.94 while input.avi has a constant frame rate of 25 fps)?
>> I would very much appreciate some help.
>
> The timebase thing could bake sense, something something rounding issues when
> segmenting, timestamps being unaligned, type of thing? But I don’t think x265
> does variable frame rates (not sure), regardless in an mp4 it’s most
> definitely constant. Set the framerate during the encoding step if that’s
> important, the “normal” ones you can use abbreviations for (ntsc, pal, film,
> ntsc-film, etc) to pass the right rate instead of rounding the decimals.
Right, the -r must go in the encoding step. It kind of doesn't make sense in
combination with -c:v copy, of course...
Regarding variable frame rate for x265:
mediainfo ./output.mov # and ./output.mp4
...
Frame rate mode : Variable
Frame rate : 24.542 FPS
Minimum frame rate : 8.333 FPS
Maximum frame rate : 25.000 FPS
Original frame rate : 25.000 FPS
...
Both .mp4 and .mov show a frame rate of 24.542 (and min/max rates that
differ), which is why I was referring to a variable frame rate.
I appreciate your reply.
Philipp
_______________________________________________
ffmpeg-user mailing list
[email protected]
https://ffmpeg.org/mailman/listinfo/ffmpeg-user
To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".