Hi,

Strictly speaking this is not an ffmpeg question (and it is a recurring one), but ffmpeg is often used for this...

Since I have only a stereo setup (albeit a decent one) attached to my TV, I started a while ago to use ffmpeg to generate downmixed 2.0 tracks for my video files that have 5.1 (or 7.1) tracks.

My original motivation was that the perceived loudness of the dialogs was too low compared to the music/ambient sound in *some* movies (not all of them!). My hypothesis at the time was that the built-in downmixing of my equipment was overweighting the left and right channels (both front and side) compared to the center channel, where most dialogs are supposed to be placed.

So I started with the "-ac 2" option in ffmpeg... which changed basically nothing (as far as I could tell, at least). Investigating further, I then found the -af "pan=stereo| FL< ... | FR< ..." syntax, which lets you choose the weighting coefficient of each 5.1 channel used to build the stereo channels.
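As an illustration, here is a minimal Python sketch that assembles such a pan filter string and a matching ffmpeg argument list. It assumes the source track uses the 5.1(side) layout, where the surround channels are named SL/SR (as far as I know, tracks tagged with the plain 5.1 layout name them BL/BR instead). The helper names and the file names are hypothetical.

```python
def pan_filter(front, center, surround):
    """Build a pan filter string for a 5.1(side) -> stereo downmix.

    Assumption: the input has channels named FL, FR, FC, SL, SR.
    The LFE channel is simply left out of the mix.
    """
    left = f"FL<{front}*FL+{center}*FC+{surround}*SL"
    right = f"FR<{front}*FR+{center}*FC+{surround}*SR"
    return f"pan=stereo|{left}|{right}"


def ffmpeg_args(src, dst, audio_filter):
    """Return an argument list: copy the video, filter the audio."""
    return ["ffmpeg", "-i", src, "-c:v", "copy", "-af", audio_filter, dst]


# Hypothetical file names; the list could be passed to subprocess.run().
args = ffmpeg_args("input.mkv", "output.mkv", pan_filter(1.0, 0.707, 0.707))
print(" ".join(args))
```

Passing the arguments as a list avoids having to quote the "|" and "<" characters for the shell.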

There were recommended coefficients:
FL < 1.0*FL + 0.707*FC + 0.707*SL (and similarly for FR)
To my ears, these gave the same result as -ac 2.

There were also tons of alternative formulas described on various web sites... I ended up with
FL < 0.707*FL + 1.0*FC + 0.707*SL
It did what it was supposed to do: louder dialogs compared to music and ambient sounds.
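To put numbers on the two formulas, here is a small Python sketch. It relies on my reading of the pan filter documentation: with "<" instead of "=", the gains of a channel specification are renormalized so that they sum to 1. The function names are made up for the example.

```python
import math


def normalized(gains):
    """Rescale gains so they sum to 1 (what '<' does, per the pan docs)."""
    total = sum(gains.values())
    return {ch: g / total for ch, g in gains.items()}


def db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(ratio)


formulas = {
    "recommended": {"FL": 1.0, "FC": 0.707, "SL": 0.707},
    "center-heavy": {"FL": 0.707, "FC": 1.0, "SL": 0.707},
}

for name, gains in formulas.items():
    n = normalized(gains)
    rounded = {ch: round(g, 3) for ch, g in n.items()}
    # Level of the center channel relative to the front channel.
    print(f"{name}: {rounded}  FC vs FL: {db(n['FC'] / n['FL']):+.1f} dB")
```

The center sits about 3 dB below the front channel with the recommended coefficients and about 3 dB above it with the center-heavy ones, so the second formula raises the center by roughly 6 dB relative to the front.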

However, I eventually noticed that it was also narrowing the stereo image. Indeed, FC contains not only voices but also a large part of the music and ambient sounds. Overweighting FC would not narrow the stereo image if it contained only the voices, but this is not the case.
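One rough way to quantify the narrowing is the correlation between the two output channels: FC is the only input that goes identically to both sides, so the more it weighs, the more similar left and right become. The Python sketch below uses a deliberately idealized model that is my own assumption, not a measurement: all five input channels are taken to be mutually uncorrelated and of equal power.

```python
def interchannel_correlation(front, center, surround):
    """Correlation between the downmixed left and right channels.

    Idealized assumption: FL, FR, FC, SL and SR are uncorrelated signals
    of equal power. Only FC is shared by both outputs, so it alone
    contributes to the covariance. 0 means fully independent outputs
    (wide image), 1 means identical outputs (mono).
    """
    shared_power = center ** 2
    total_power = front ** 2 + center ** 2 + surround ** 2
    return shared_power / total_power


print(round(interchannel_correlation(1.0, 0.707, 0.707), 2))  # recommended
print(round(interchannel_correlation(0.707, 1.0, 0.707), 2))  # center-heavy
```

Under this model the correlation is roughly 0.25 for the recommended coefficients and 0.5 for the center-heavy ones, which is at least consistent with the narrower image. Real soundtracks will not match the model exactly, since their channels are partly correlated.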

I kept wondering why the dialog loudness is sometimes perceived as too low after downmixing, and I have a possible explanation: the brain is very good at isolating a voice buried in ambient noise because it can locate where the voice comes from. That's why people with hearing aids still have difficulty following a conversation when multiple people speak at the same time: the hearing aids can restore the volume, but the directionality is (mostly) lost... So, with a real 5.1 or 7.1 setup the brain is not bothered by the side/rear channels when it focuses on the central dialogs, because they come from completely different directions. But after downmixing, what was coming from the side/rear channels now comes from the front channels, making the separation task more difficult for the brain. The solution is hence to downweight the side/rear channels... Therefore I am now using:

FL < 1.0*FL + 0.707*FC + 0.4*SL

And it seems better to me: the dialogs are clearer, without narrowing the stereo image. But maybe this is just what I desperately want to hear...
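For comparison, this short Python sketch computes, for the three formulas above, the level of the center channel relative to the front and to the surround channel. The labels are mine, and the figures describe only the mixing coefficients, not how the result is perceived.

```python
import math

# (front, center, surround) coefficients of the three formulas above.
FORMULAS = {
    "recommended": (1.0, 0.707, 0.707),
    "center-heavy": (0.707, 1.0, 0.707),
    "reduced surround": (1.0, 0.707, 0.4),
}


def db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(ratio)


def center_levels(front, center, surround):
    """Return (FC relative to FL, FC relative to SL), both in dB."""
    return db(center / front), db(center / surround)


for name, coeffs in FORMULAS.items():
    vs_front, vs_surround = center_levels(*coeffs)
    print(f"{name}: FC vs FL {vs_front:+.1f} dB, FC vs SL {vs_surround:+.1f} dB")
```

With the last formula the center keeps the same level relative to the front as in the recommended one (about -3 dB), while it gains roughly 5 dB over the surround channel. The ratios are unaffected by the "<" normalization, since that scales all gains of a channel by the same factor.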

Any thoughts on all of this?
