Re: [FFmpeg-trac] #11083(swresample:new): Converting multichannel audio in FLTP sample format to stereo in S16 attenuates volume unexpectedly

FFmpeg Thu, 04 Jul 2024 00:14:13 -0700

#11083: Converting multichannel audio in FLTP sample format to stereo in S16
attenuates volume unexpectedly
-------------------------------------+-------------------------------------
             Reporter:  Jiamin.X     |                    Owner:  (none)
                 Type:  defect       |                   Status:  new
             Priority:  important    |                Component:
                                     |  swresample
              Version:  unspecified  |               Resolution:
             Keywords:  resampling   |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
Description changed by Jiamin.X:


Old description:

> When converting multichannel audio in FLTP sample format to stereo in S16
> sample format, volume is decreased unexpectedly.
>
> The original 6-channel audio input file in FLTP sample format:
> {{{
> % ffprobe multich-audio.mp4
> ffprobe version 6.0 Copyright (c) 2007-2023 the FFmpeg developers
>   built with Apple clang version 15.0.0 (clang-1500.0.40.1)
>   configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0_2 --enable-shared
> --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-
> ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl
> --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d
> --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e
> --enable-librist --enable-librubberband --enable-libsnappy --enable-
> libsrt --enable-libsvtav1 --enable-libtesseract --enable-libtheora
> --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx
> --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2
> --enable-libxvid --enable-lzma --enable-libfontconfig --enable-
> libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb
> --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex
> --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack
> --disable-indev=jack --enable-videotoolbox --enable-audiotoolbox
>   libavutil      58.  2.100 / 58.  2.100
>   libavcodec     60.  3.100 / 60.  3.100
>   libavformat    60.  3.100 / 60.  3.100
>   libavdevice    60.  1.100 / 60.  1.100
>   libavfilter     9.  3.100 /  9.  3.100
>   libswscale      7.  1.100 /  7.  1.100
>   libswresample   4. 10.100 /  4. 10.100
>   libpostproc    57.  1.100 / 57.  1.100
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
>   Metadata:
>     major_brand     : isom
>     minor_version   : 512
>     compatible_brands: isomdby1iso2mp41
>     encoder         : www.aliyun.com - Media Transcoding
>   Duration: 00:01:06.50, start: 0.000000, bitrate: 256 kb/s
>   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
> 5.1(side), fltp, 256 kb/s (default)
>     Metadata:
>       handler_name    : SoundHandler
>       vendor_id       : [0][0][0][0]
>     Side data:
>       audio service type: main
> }}}
>
> Converted from 6-channel in **fltp** to 2-channel in **flt**, output:
> **stereo-flt.mkv**
> {{{
> % ffmpeg -i multich-audio.mp4 -ac 2 -c:a pcm_f32le stereo-flt.mkv
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
>   Duration: 00:01:06.50, start: 0.000000, bitrate: 256 kb/s
>   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
> 5.1(side), fltp, 256 kb/s (default)
> Stream mapping:
>   Stream #0:0 -> #0:0 (eac3 (native) -> pcm_f32le (native))
> Output #0, matroska, to 'stereo-flt.mkv':
>   Stream #0:0(und): Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz,
> stereo, flt, 3072 kb/s (default)
> }}}
>
> Converted from 6-channel in **fltp** to 2-channel in **s16**, output:
> **stereo-s16.mkv**
> {{{
> % ffmpeg -i multich-audio.mp4 -ac 2 -c:a pcm_s16le stereo-s16.mkv
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
>   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
> 5.1(side), fltp, 256 kb/s (default)
> Stream mapping:
>   Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
> Output #0, matroska, to 'stereo-s16.mkv':
>   Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz,
> stereo, s16, 1536 kb/s (default)
> }}}
>
> Converted from 6-channel in **fltp** to 2-channel in **s16**, with
> **-rematrix_maxval 1000**, output:
> **stereo-s16-rematrix_maxval-1000.mkv**
> {{{
> % ffmpeg -i multich-audio.mp4 -rematrix_maxval 1000 -ac 2 -c:a pcm_s16le
> stereo-s16-rematrix_maxval-1000.mkv
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
>   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
> 5.1(side), fltp, 256 kb/s (default)
> Stream mapping:
>   Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
> Output #0, matroska, to 'stereo-s16-rematrix_maxval-1000.mkv':
>   Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz,
> stereo, s16, 1536 kb/s (default)
> }}}
>
> Using **volumedetect** to check the max and mean volumes of the original
> file and the 3 generated files above:
>
> 1. Volume statistics of **multich-audio.mp4** (the original file):
> {{{
> % ffmpeg -i multich-audio.mp4 -af "volumedetect" -vn -sn -f null
> /dev/null
> Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
>   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
> 5.1(side), fltp, 256 kb/s (default)
> Stream mapping:
>   Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
> Output #0, null, to '/dev/null':
>   Stream #0:0(und): Audio: pcm_s16le, 48000 Hz, 5.1(side), s16, 4608 kb/s
> (default)
> [Parsed_volumedetect_0 @ 0x7fd9a47052c0] n_samples: 19150848
> [Parsed_volumedetect_0 @ 0x7fd9a47052c0] mean_volume: -25.4 dB
> [Parsed_volumedetect_0 @ 0x7fd9a47052c0] max_volume: -1.9 dB
> [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_1db: 68
> [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_2db: 2022
> [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_3db: 3665
> [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_4db: 6371
> [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_5db: 10144
> }}}
>
> 2. Volume statistics of **stereo-flt.mkv**:
> {{{
> % ffmpeg -i stereo-flt.mkv -af "volumedetect" -vn -sn -f null /dev/null
> Input #0, matroska,webm, from 'stereo-flt.mkv':
>   Stream #0:0: Audio: pcm_f32le, 48000 Hz, 2 channels, flt, 3072 kb/s
> (default)
> Stream mapping:
>   Stream #0:0 -> #0:0 (pcm_f32le (native) -> pcm_s16le (native))
> Output #0, null, to '/dev/null':
>   Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> (default)
> [Parsed_volumedetect_0 @ 0x7fce1e404200] n_samples: 6383616
> [Parsed_volumedetect_0 @ 0x7fce1e404200] mean_volume: -21.6 dB
> [Parsed_volumedetect_0 @ 0x7fce1e404200] max_volume: 0.0 dB
> [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_0db: 1466
> [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_1db: 1310
> [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_2db: 3452
> [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_3db: 4591
> }}}
>
> 3. Volume statistics of **stereo-s16-rematrix_maxval-1000.mkv**:
> {{{
> % ffmpeg -i stereo-s16-rematrix_maxval-1000.mkv -af "volumedetect" -vn
> -sn -f null /dev/null
> Input #0, matroska,webm, from 'stereo-s16-rematrix_maxval-1000.mkv':
>   Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
> (default)
> Stream mapping:
>   Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
> Output #0, null, to '/dev/null':
>   Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> (default)
> [Parsed_volumedetect_0 @ 0x7febc6a06180] n_samples: 6383616
> [Parsed_volumedetect_0 @ 0x7febc6a06180] mean_volume: -21.6 dB
> [Parsed_volumedetect_0 @ 0x7febc6a06180] max_volume: 0.0 dB
> [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_0db: 1466
> [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_1db: 1310
> [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_2db: 3452
> [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_3db: 4591
> }}}
>
> 4. Volume statistics of **stereo-s16.mkv**:
> {{{
> % ffmpeg -i stereo-s16.mkv -af "volumedetect" -vn -sn -f null /dev/null
> Input #0, matroska,webm, from 'stereo-s16.mkv':
>   Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
> (default)
> Stream mapping:
>   Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
> Output #0, null, to '/dev/null':
>   Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
> (default)
> [Parsed_volumedetect_0 @ 0x7fbf80a080c0] n_samples: 6383616
> [Parsed_volumedetect_0 @ 0x7fbf80a080c0] mean_volume: -29.3 dB
> [Parsed_volumedetect_0 @ 0x7fbf80a080c0] max_volume: -5.9 dB
> [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_5db: 21
> [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_6db: 294
> [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_7db: 598
> [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_8db: 831
> [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_9db: 1816
> [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_10db: 4223
> }}}
>
> From the above, we can see that converting to **flt**, or to **s16** with
> **-rematrix_maxval 1000**, gives exactly the same max and mean volume,
> while converting to **s16** directly without setting **-rematrix_maxval**
> results in a much lower volume. If I convert from 2-channel **flt** to
> 2-channel **s16**, the volume is not affected at all.
>
> I checked the code related to the **rematrix_maxval** setting in
> **libswresample/rematrix.c** (shown below). If **rematrix_maxval** is not
> set manually, its value depends on the sample formats: when either the
> output sample format or the internal sample format (**int_sample_fmt**)
> is an integer format such as **s16**, **rematrix_maxval** is set to 1.
> This scales down the matrix coefficients used in the later rematrix step,
> which attenuates the downmix and lowers the volume as a result.
>
> What confuses me is why the output sample format has to be checked and
> **rematrix_maxval** adjusted accordingly before downmixing. It looks to
> me that only the input sample format should affect the rematrix/downmix
> process, because rematrix/downmix operates on the input data the same way
> regardless of the output sample format. If I am right, we may need to
> remove the **av_get_packed_sample_fmt(s->out_sample_fmt) <
> AV_SAMPLE_FMT_FLT** check in the following code (if this is correct, I
> may send a pull request later):
>
> {{{
> av_cold static int auto_matrix(SwrContext *s)
> {
>     double maxval;
>     int ret;
>
>     if (s->rematrix_maxval > 0) {
>         maxval = s->rematrix_maxval;
>     } else if (   av_get_packed_sample_fmt(s->out_sample_fmt) < AV_SAMPLE_FMT_FLT
>                || av_get_packed_sample_fmt(s->int_sample_fmt) < AV_SAMPLE_FMT_FLT) {
>         maxval = 1.0;
>     } else
>         maxval = INT_MAX;
>     ...
> }
>
> av_cold int swr_build_matrix(uint64_t in_ch_layout_param, uint64_t out_ch_layout_param,
>                              double center_mix_level, double surround_mix_level,
>                              double lfe_mix_level, double maxval,
>                              double rematrix_volume, double *matrix_param,
>                              int stride, enum AVMatrixEncoding matrix_encoding, void *log_context)
> {
>     ...
>
>     if(maxcoef > maxval || rematrix_volume  < 0){
>         maxcoef /= maxval;
>         for(i=0; i<SWR_CH_MAX; i++)
>             for(j=0; j<SWR_CH_MAX; j++){
>                 matrix_param[stride*i + j] /= maxcoef;
>             }
>     }
>     ....
> }
> }}}

New description:

 When converting multichannel audio in FLTP sample format to stereo in S16
 sample format, volume is decreased unexpectedly.

 The original 6-channel audio input file in FLTP sample format:
 {{{
 % ffprobe multich-audio.mp4
 ffprobe version 6.0 Copyright (c) 2007-2023 the FFmpeg developers
   built with Apple clang version 15.0.0 (clang-1500.0.40.1)
   configuration: --prefix=/usr/local/Cellar/ffmpeg/6.0_2 --enable-shared
 --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-
 ldflags='-Wl,-ld_classic' --enable-ffplay --enable-gnutls --enable-gpl
 --enable-libaom --enable-libaribb24 --enable-libbluray --enable-libdav1d
 --enable-libjxl --enable-libmp3lame --enable-libopus --enable-librav1e
 --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt
 --enable-libsvtav1 --enable-libtesseract --enable-libtheora --enable-
 libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-
 libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-
 libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype
 --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-
 libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr
 --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack
 --enable-videotoolbox --enable-audiotoolbox
   libavutil      58.  2.100 / 58.  2.100
   libavcodec     60.  3.100 / 60.  3.100
   libavformat    60.  3.100 / 60.  3.100
   libavdevice    60.  1.100 / 60.  1.100
   libavfilter     9.  3.100 /  9.  3.100
   libswscale      7.  1.100 /  7.  1.100
   libswresample   4. 10.100 /  4. 10.100
   libpostproc    57.  1.100 / 57.  1.100
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
   Metadata:
     major_brand     : isom
     minor_version   : 512
     compatible_brands: isomdby1iso2mp41
     encoder         : www.aliyun.com - Media Transcoding
   Duration: 00:01:06.50, start: 0.000000, bitrate: 256 kb/s
   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
 5.1(side), fltp, 256 kb/s (default)
     Metadata:
       handler_name    : SoundHandler
       vendor_id       : [0][0][0][0]
     Side data:
       audio service type: main
 }}}

 Converted from 6-channel in **fltp** to 2-channel in **flt**, output:
 **stereo-flt.mkv**
 {{{
 % ffmpeg -i multich-audio.mp4 -ac 2 -c:a pcm_f32le stereo-flt.mkv
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
   Duration: 00:01:06.50, start: 0.000000, bitrate: 256 kb/s
   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
 5.1(side), fltp, 256 kb/s (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (eac3 (native) -> pcm_f32le (native))
 Output #0, matroska, to 'stereo-flt.mkv':
   Stream #0:0(und): Audio: pcm_f32le ([3][0][0][0] / 0x0003), 48000 Hz,
 stereo, flt, 3072 kb/s (default)
 }}}

 Converted from 6-channel in **fltp** to 2-channel in **s16**, output:
 **stereo-s16.mkv**
 {{{
 % ffmpeg -i multich-audio.mp4 -ac 2 -c:a pcm_s16le stereo-s16.mkv
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
 5.1(side), fltp, 256 kb/s (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
 Output #0, matroska, to 'stereo-s16.mkv':
   Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz,
 stereo, s16, 1536 kb/s (default)
 }}}

 Converted from 6-channel in **fltp** to 2-channel in **s16**, with
 **-rematrix_maxval 1000**, output: **stereo-s16-rematrix_maxval-1000.mkv**
 {{{
 % ffmpeg -i multich-audio.mp4 -rematrix_maxval 1000 -ac 2 -c:a pcm_s16le
 stereo-s16-rematrix_maxval-1000.mkv
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
 5.1(side), fltp, 256 kb/s (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
 Output #0, matroska, to 'stereo-s16-rematrix_maxval-1000.mkv':
   Stream #0:0(und): Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz,
 stereo, s16, 1536 kb/s (default)
 }}}

 Using **volumedetect** to check the max and mean volumes of the original
 file and the 3 generated files above:

 1. Volume statistics of **multich-audio.mp4** (the original file):
 {{{
 % ffmpeg -i multich-audio.mp4 -af "volumedetect" -vn -sn -f null /dev/null
 Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'multich-audio.mp4':
   Stream #0:0[0x1](und): Audio: eac3 (ec-3 / 0x332D6365), 48000 Hz,
 5.1(side), fltp, 256 kb/s (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (eac3 (native) -> pcm_s16le (native))
 Output #0, null, to '/dev/null':
   Stream #0:0(und): Audio: pcm_s16le, 48000 Hz, 5.1(side), s16, 4608 kb/s
 (default)
 [Parsed_volumedetect_0 @ 0x7fd9a47052c0] n_samples: 19150848
 [Parsed_volumedetect_0 @ 0x7fd9a47052c0] mean_volume: -25.4 dB
 [Parsed_volumedetect_0 @ 0x7fd9a47052c0] max_volume: -1.9 dB
 [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_1db: 68
 [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_2db: 2022
 [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_3db: 3665
 [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_4db: 6371
 [Parsed_volumedetect_0 @ 0x7fd9a47052c0] histogram_5db: 10144
 }}}

 2. Volume statistics of **stereo-flt.mkv**:
 {{{
 % ffmpeg -i stereo-flt.mkv -af "volumedetect" -vn -sn -f null /dev/null
 Input #0, matroska,webm, from 'stereo-flt.mkv':
   Stream #0:0: Audio: pcm_f32le, 48000 Hz, 2 channels, flt, 3072 kb/s
 (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (pcm_f32le (native) -> pcm_s16le (native))
 Output #0, null, to '/dev/null':
   Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
 (default)
 [Parsed_volumedetect_0 @ 0x7fce1e404200] n_samples: 6383616
 [Parsed_volumedetect_0 @ 0x7fce1e404200] mean_volume: -21.6 dB
 [Parsed_volumedetect_0 @ 0x7fce1e404200] max_volume: 0.0 dB
 [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_0db: 1466
 [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_1db: 1310
 [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_2db: 3452
 [Parsed_volumedetect_0 @ 0x7fce1e404200] histogram_3db: 4591
 }}}

 3. Volume statistics of **stereo-s16-rematrix_maxval-1000.mkv**:
 {{{
 % ffmpeg -i stereo-s16-rematrix_maxval-1000.mkv -af "volumedetect" -vn -sn
 -f null /dev/null
 Input #0, matroska,webm, from 'stereo-s16-rematrix_maxval-1000.mkv':
   Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
 (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
 Output #0, null, to '/dev/null':
   Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
 (default)
 [Parsed_volumedetect_0 @ 0x7febc6a06180] n_samples: 6383616
 [Parsed_volumedetect_0 @ 0x7febc6a06180] mean_volume: -21.6 dB
 [Parsed_volumedetect_0 @ 0x7febc6a06180] max_volume: 0.0 dB
 [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_0db: 1466
 [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_1db: 1310
 [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_2db: 3452
 [Parsed_volumedetect_0 @ 0x7febc6a06180] histogram_3db: 4591
 }}}

 4. Volume statistics of **stereo-s16.mkv**:
 {{{
 % ffmpeg -i stereo-s16.mkv -af "volumedetect" -vn -sn -f null /dev/null
 Input #0, matroska,webm, from 'stereo-s16.mkv':
   Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
 (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
 Output #0, null, to '/dev/null':
   Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
 (default)
 [Parsed_volumedetect_0 @ 0x7fbf80a080c0] n_samples: 6383616
 [Parsed_volumedetect_0 @ 0x7fbf80a080c0] mean_volume: -29.3 dB
 [Parsed_volumedetect_0 @ 0x7fbf80a080c0] max_volume: -5.9 dB
 [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_5db: 21
 [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_6db: 294
 [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_7db: 598
 [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_8db: 831
 [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_9db: 1816
 [Parsed_volumedetect_0 @ 0x7fbf80a080c0] histogram_10db: 4223
 }}}

 I also tried converting 2-channel **flt** to 2-channel **s16** and checked
 the volume as shown below, output: **stereo-s16-from-2-ch-flt.mkv**
 {{{
 % ffmpeg -y -i stereo-flt.mkv -c:a pcm_s16le stereo-s16-from-2-ch-flt.mkv
 Input #0, matroska,webm, from 'stereo-flt.mkv':
   Stream #0:0: Audio: pcm_f32le, 48000 Hz, 2 channels, flt, 3072 kb/s
 (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (pcm_f32le (native) -> pcm_s16le (native))
 Output #0, matroska, to 'stereo-s16-from-2-ch-flt.mkv':
   Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 48000 Hz, stereo,
 s16, 1536 kb/s (default)
 }}}

 Volume statistics of **stereo-flt.mkv** and
 **stereo-s16-from-2-ch-flt.mkv** (both have exactly the same max and mean
 volume):
 {{{
 % ffmpeg -i stereo-flt.mkv -af "volumedetect" -vn -sn -f null /dev/null
 Input #0, matroska,webm, from 'stereo-flt.mkv':
   Stream #0:0: Audio: pcm_f32le, 48000 Hz, 2 channels, flt, 3072 kb/s
 (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (pcm_f32le (native) -> pcm_s16le (native))
 Output #0, null, to '/dev/null':
   Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
 (default)
 [Parsed_volumedetect_0 @ 0x7fc091f09c80] n_samples: 6383616
 [Parsed_volumedetect_0 @ 0x7fc091f09c80] mean_volume: -21.6 dB
 [Parsed_volumedetect_0 @ 0x7fc091f09c80] max_volume: 0.0 dB
 [Parsed_volumedetect_0 @ 0x7fc091f09c80] histogram_0db: 1466
 [Parsed_volumedetect_0 @ 0x7fc091f09c80] histogram_1db: 1310
 [Parsed_volumedetect_0 @ 0x7fc091f09c80] histogram_2db: 3452
 [Parsed_volumedetect_0 @ 0x7fc091f09c80] histogram_3db: 4591


 % ffmpeg -i stereo-s16-from-2-ch-flt.mkv -af "volumedetect" -vn -sn -f
 null /dev/null
 Input #0, matroska,webm, from 'stereo-s16-from-2-ch-flt.mkv':
   Stream #0:0: Audio: pcm_s16le, 48000 Hz, 2 channels, s16, 1536 kb/s
 (default)
 Stream mapping:
   Stream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))
 Output #0, null, to '/dev/null':
   Stream #0:0: Audio: pcm_s16le, 48000 Hz, stereo, s16, 1536 kb/s
 (default)
 [Parsed_volumedetect_0 @ 0x7fcbc5604080] n_samples: 6383616
 [Parsed_volumedetect_0 @ 0x7fcbc5604080] mean_volume: -21.6 dB
 [Parsed_volumedetect_0 @ 0x7fcbc5604080] max_volume: 0.0 dB
 [Parsed_volumedetect_0 @ 0x7fcbc5604080] histogram_0db: 1466
 [Parsed_volumedetect_0 @ 0x7fcbc5604080] histogram_1db: 1310
 [Parsed_volumedetect_0 @ 0x7fcbc5604080] histogram_2db: 3452
 [Parsed_volumedetect_0 @ 0x7fcbc5604080] histogram_3db: 4591
 }}}

 From the above, we can see that converting to **flt**, or to **s16** with
 **-rematrix_maxval 1000**, gives exactly the same max and mean volume,
 while converting to **s16** directly without setting **-rematrix_maxval**
 results in a much lower volume (mean -29.3 dB instead of -21.6 dB). We can
 also see that converting from 2-channel **flt** to 2-channel **s16** does
 not affect the volume at all.

 I checked the code related to the **rematrix_maxval** setting in
 **libswresample/rematrix.c** (shown below). If **rematrix_maxval** is not
 set manually, its value depends on the sample formats: when either the
 output sample format or the internal sample format (**int_sample_fmt**) is
 an integer format such as **s16**, **rematrix_maxval** is set to 1. This
 scales down the matrix coefficients used in the later rematrix step, which
 attenuates the downmix and lowers the volume as a result.

 What confuses me is why the output sample format has to be checked and
 **rematrix_maxval** adjusted accordingly before downmixing. It looks to me
 that only the input sample format should affect the rematrix/downmix
 process, because rematrix/downmix operates on the input data the same way
 regardless of the output sample format. If I am right, we may need to
 remove the **av_get_packed_sample_fmt(s->out_sample_fmt) <
 AV_SAMPLE_FMT_FLT** check in the following code (if this is correct, I may
 send a pull request later):

 {{{
 av_cold static int auto_matrix(SwrContext *s)
 {
     double maxval;
     int ret;

     if (s->rematrix_maxval > 0) {
         maxval = s->rematrix_maxval;
     } else if (   av_get_packed_sample_fmt(s->out_sample_fmt) < AV_SAMPLE_FMT_FLT
                || av_get_packed_sample_fmt(s->int_sample_fmt) < AV_SAMPLE_FMT_FLT) {
         maxval = 1.0;
     } else
         maxval = INT_MAX;
     ...
 }

 av_cold int swr_build_matrix(uint64_t in_ch_layout_param, uint64_t out_ch_layout_param,
                              double center_mix_level, double surround_mix_level,
                              double lfe_mix_level, double maxval,
                              double rematrix_volume, double *matrix_param,
                              int stride, enum AVMatrixEncoding matrix_encoding, void *log_context)
 {
     ...

     if(maxcoef > maxval || rematrix_volume  < 0){
         maxcoef /= maxval;
         for(i=0; i<SWR_CH_MAX; i++)
             for(j=0; j<SWR_CH_MAX; j++){
                 matrix_param[stride*i + j] /= maxcoef;
             }
     }
     ....
 }
 }}}
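
As a rough check of the scaling step quoted above, the following C sketch normalizes a hypothetical 5.1(side)-to-stereo matrix in the same way. It assumes the default mix levels (center and surround at 1/sqrt(2), LFE at 0) and that maxcoef is the largest per-output-channel sum of absolute coefficients; that computation is elided in the quoted code, so both are assumptions rather than confirmed behaviour. The helper names (max_row_sum, normalize_matrix, downmix_scale) are made up for illustration and are not libswresample functions.

```c
#include <assert.h>
#include <stddef.h>

/* Largest sum of absolute coefficients over all output channels (rows).
 * Assumption: this is how maxcoef is derived in the elided code. */
static double max_row_sum(const double *m, size_t rows, size_t cols)
{
    double maxcoef = 0.0;
    for (size_t i = 0; i < rows; i++) {
        double sum = 0.0;
        for (size_t j = 0; j < cols; j++) {
            double c = m[i * cols + j];
            sum += c < 0.0 ? -c : c;
        }
        if (sum > maxcoef)
            maxcoef = sum;
    }
    return maxcoef;
}

/* Mirrors the quoted scaling step (the rematrix_volume < 0 branch is left
 * out): if maxcoef exceeds maxval, divide every coefficient by
 * maxcoef / maxval. Returns the linear factor that was applied. */
static double normalize_matrix(double *m, size_t rows, size_t cols,
                               double maxval)
{
    assert(maxval > 0.0);
    double maxcoef = max_row_sum(m, rows, cols);
    if (maxcoef <= maxval)
        return 1.0;              /* coefficients left untouched */
    maxcoef /= maxval;
    for (size_t i = 0; i < rows * cols; i++)
        m[i] /= maxcoef;
    return 1.0 / maxcoef;
}

/* Hypothetical 5.1(side) -> stereo matrix with the assumed default levels. */
static double downmix_scale(double maxval)
{
    const double s = 0.70710678118654752440; /* 1/sqrt(2) */
    double m[2 * 6] = {
        /* FL   FR   FC  LFE  SL   SR */
        1.0, 0.0, s,  0.0, s,   0.0,   /* left output  */
        0.0, 1.0, s,  0.0, 0.0, s,     /* right output */
    };
    return normalize_matrix(m, 2, 6, maxval);
}
```

Under these assumptions downmix_scale(1.0) is about 0.414, and 20*log10(0.414) is about -7.7 dB, which is in line with the mean_volume drop from -21.6 dB to -29.3 dB reported above. downmix_scale(1000.0) returns 1.0, meaning the coefficients are not scaled at all.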

-- 
Ticket URL: <https://trac.ffmpeg.org/ticket/11083#comment:1>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker

_______________________________________________
FFmpeg-trac mailing list
FFmpeg-trac@avcodec.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-trac

To unsubscribe, visit link above, or email
ffmpeg-trac-requ...@ffmpeg.org with subject "unsubscribe".
