I am getting a shortened audio stream when I include the audio filters aresample and amix. The audio and video streams end up with different lengths and lose sync, which later makes it impossible to concat the clips. The concat fails with errors like:
Invalid audio PTS

First, here is the output from the latest ffmpeg in the Debian package, which works correctly:

$ ffmpeg-3.2.14-1~deb9u1 -i 20190922_1532_3Kf-pan-right_3969_c2t14.MOV -i Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a -filter_complex "[0]crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24,drawtext='fontsize=32:fontcolor=0xa73450:bordercolor=white:shadowcolor=black:fontfile=/usr/share/fonts/TrueType/SF-Foxboro-Script-Bold.ttf:x=(w-text_w-20):y=(h-text_h-36):shadowx=2:shadowy=2:borderw=1:text=seahorseCorral.org'" -filter_complex "aresample=48000,amix" -s 1024x768 -c:v h264 -b:v 4700k -r 30 20190922_1532_ch5.1e-3.mov
ffmpeg version 3.2.14-1~deb9u1 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
  configuration: --prefix=/usr --extra-version='1~deb9u1' --toolchain=hardened --libdir=/usr/lib/i386-linux-gnu --incdir=/usr/include/i386-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libebur128 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 34.101 / 55. 34.101
  libavcodec     57. 64.101 / 57. 64.101
  libavformat    57. 56.101 / 57. 56.101
  libavdevice    57.  1.100 / 57.  1.100
  libavfilter     6. 65.100 /  6. 65.100
  libavresample   3.  1.  0 /  3.  1.  0
  libswscale      4.  2.100 /  4.  2.100
  libswresample   2.  3.100 /  2.  3.100
  libpostproc    54.  1.100 / 54.  1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_3Kf-pan-right_3969_c2t14.MOV':
  Metadata:
    major_brand     : qt
    minor_version   : 512
    compatible_brands: qt
    encoder         : Lavf57.56.101
  Duration: 00:00:14.01, start: 0.002000, bitrate: 25128 kb/s
    Stream #0:0(eng): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuvj420p(pc, bt709), 1280x720, 23587 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 60k tbc (default)
    Metadata:
      handler_name    : DataHandler
    Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz, stereo, s16, 1536 kb/s (default)
    Metadata:
      handler_name    : DataHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a':
  Metadata:
    major_brand     : M4A
    minor_version   : 512
    compatible_brands: isomiso2
    encoder         : Lavf57.56.101
  Duration: 00:00:08.02, start: 0.000000, bitrate: 220 kb/s
    Stream #1:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 218 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
No pixel format specified, yuvj420p for H.264 encoding chosen.
Use -pix_fmt yuv420p for compatibility with outdated media players.
[libx264 @ 0x170dc20] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX LZCNT BMI1 SlowPshufb
[libx264 @ 0x170dc20] profile High, level 3.1
[libx264 @ 0x170dc20] 264 - core 148 r2748 97eaef2 - H.264/MPEG-4 AVC codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=abr mbtree=1 bitrate=4700 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mov, to '20190922_1532_ch5.1e-3.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 512
    compatible_brands: qt
    encoder         : Lavf57.56.101
    Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuvj420p(pc), 1024x768, q=-1--1, 4700 kb/s, 30 fps, 15360 tbn, 30 tbc (default)
    Metadata:
      encoder         : Lavc57.64.101 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/4700000 buffer size: 0 vbv_delay: -1
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      encoder         : Lavc57.64.101 aac
Stream mapping:
  Stream #0:0 (h264) -> crop (graph 0)
  Stream #0:1 (pcm_s16le) -> aresample (graph 1)
  Stream #1:0 (aac) -> amix:input1 (graph 1)
  drawtext (graph 0) -> Stream #0:0 (libx264)
  amix (graph 1) -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
frame=  420 fps= 12 q=-1.0 Lsize=    8303kB time=00:00:14.01 bitrate=4853.1kbits/s speed=0.417x video:8063kB audio:224kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.198401%
[libx264 @ 0x170dc20] frame I:2     Avg QP:14.83  size:195688
[libx264 @ 0x170dc20] frame P:106   Avg QP:19.65  size: 59553
[libx264 @ 0x170dc20] frame B:312   Avg QP:25.61  size:  4972
[libx264 @ 0x170dc20] consecutive B-frames:  1.0%  0.0%  0.0% 99.0%
[libx264 @ 0x170dc20] mb I  I16..4: 27.6% 29.0% 43.4%
[libx264 @ 0x170dc20] mb P  I16..4:  1.1%  1.3%  0.6%  P16..4: 30.5% 31.5% 22.6%  0.0%  0.0%    skip:12.4%
[libx264 @ 0x170dc20] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8: 36.1%  7.7%  1.3%  direct: 4.2%  skip:50.7%  L0:37.5% L1:38.6% BI:23.9%
[libx264 @ 0x170dc20] final ratefactor: 18.99
[libx264 @ 0x170dc20] 8x8 transform intra:37.9% inter:54.5%
[libx264 @ 0x170dc20] coded y,uvDC,uvAC intra: 56.2% 64.1% 51.9% inter: 27.2% 19.7% 1.0%
[libx264 @ 0x170dc20] i16 v,h,dc,p: 73%  9% 14%  4%
[libx264 @ 0x170dc20] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 12% 11% 40% 4%  6%  6%  6%  5% 10%
[libx264 @ 0x170dc20] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 15% 6%  9%  9%  9%  8% 11%
[libx264 @ 0x170dc20] i8c dc,h,v,p: 54% 25% 17%  5%
[libx264 @ 0x170dc20] Weighted P-Frames: Y:9.4% UV:0.0%
[libx264 @ 0x170dc20] ref P L0: 41.9% 11.1% 40.8%  6.0%  0.2%
[libx264 @ 0x170dc20] ref B L0: 93.5%  5.9%  0.6%
[libx264 @ 0x170dc20] ref B L1: 99.4%  0.6%
[libx264 @ 0x170dc20] kb/s:4717.36
[aac @ 0x170fac0] Qavg: 582.581

Next, ffprobe shows the length of the output video:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-3.mov':
    encoder         : Lavf57.56.101
  Duration: 00:00:14.03, start: 0.000000, bitrate: 4849 kb/s
    Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc), 1024x768, 4717 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (defaul
      encoder         : Lavc57.64.101 libx264
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 131 kb/s (default)

And to get the ACTUAL audio length, I split the audio stream out to its own .m4a file using ffmpeg, then ran ffprobe on it:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-3.m4a':
    encoder         : Lavf58.33.100
  Duration: 00:00:14.03, start: 0.000000, bitrate: 133 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 131 kb/s (default)
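
For anyone wanting to repeat the split-and-probe step, it can be sketched roughly as below. This is a minimal sketch and not the literal commands used above: the function names (extract_audio, probe_duration, probe_audio_stream_duration) and the file names in the usage note are placeholders. The last function is an alternative that asks ffprobe for the audio stream duration directly, without splitting the file first.

```shell
#!/bin/sh
# Sketch only: helper names and file names are placeholders (assumptions).

# Copy the first audio stream into its own file without re-encoding.
extract_audio() {
    # $1 = muxed input (e.g. a .mov), $2 = audio-only output (e.g. a .m4a)
    ffmpeg -nostdin -v error -i "$1" -map 0:a:0 -c:a copy "$2"
}

# Print the container duration of a file, in seconds.
probe_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Alternative: print the duration of the first audio stream directly.
probe_audio_stream_duration() {
    ffprobe -v error -select_streams a:0 -show_entries stream=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

Usage would be along the lines of `extract_audio out.mov out.m4a` followed by `probe_duration out.m4a`, with the real file names substituted.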

Then I repeat the above, changing only the ffmpeg binary to the current build from git:

$ ffmpeg -i 20190922_1532_3Kf-pan-right_3969_c2t14.MOV -i Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a -filter_complex "[0]crop=x=128:y=0:w=1024:h=720,pad=1024:768:0:24,drawtext='fontsize=32:fontcolor=0xa73450:bordercolor=white:shadowcolor=black:fontfile=/usr/share/fonts/TrueType/SF-Foxboro-Script-Bold.ttf:x=(w-text_w-20):y=(h-text_h-36):shadowx=2:shadowy=2:borderw=1:text=seahorseCorral.org'" -filter_complex "aresample=48000,amix" -s 1024x768 -c:v h264 -b:v 4700k -r 30 20190922_1532_ch5.1e-g.mov
ffmpeg version N-95129-g04858650b1 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 6.3.0 (Debian 6.3.0-18+deb9u1) 20170516
  configuration: --prefix=/usr/local --enable-gpl --enable-libmp3lame --enable-libvorbis --enable-libx264 --enable-libopenjpeg --enable-libfreetype --disable-doc --disable-htmlpages --disable-podpages --enable-shared --enable-libvpx --extra-cflags=-I/usr/include --extra-ldflags=-L/usr/lib/i386-linux-gnu --enable-libass --enable-libtesseract
  libavutil      56. 35.100 / 56. 35.100
  libavcodec     58. 59.101 / 58. 59.101
  libavformat    58. 33.100 / 58. 33.100
  libavdevice    58.  9.100 / 58.  9.100
  libavfilter     7. 59.100 /  7. 59.100
  libswscale      5.  6.100 /  5.  6.100
  libswresample   3.  6.100 /  3.  6.100
  libpostproc    55.  6.100 / 55.  6.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_3Kf-pan-right_3969_c2t14.MOV':
  Metadata:
    major_brand     : qt
    minor_version   : 512
    compatible_brands: qt
    encoder         : Lavf57.56.101
  Duration: 00:00:14.01, start: 0.002000, bitrate: 25128 kb/s
    Stream #0:0(eng): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuvj420p(pc, bt709), 1280x720, 23587 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 60k tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: pcm_s16le (sowt / 0x74776F73), 48000 Hz, stereo, s16, 1536 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'Voice_20190922-1315_voiceOverForEMR-outroClip_c108t8.m4a':
  Metadata:
    major_brand     : M4A
    minor_version   : 512
    compatible_brands: isomiso2
    encoder         : Lavf57.56.101
  Duration: 00:00:08.02, start: 0.000000, bitrate: 220 kb/s
    Stream #1:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, mono, fltp, 218 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:0 (h264) -> crop (graph 0)
  Stream #0:1 (pcm_s16le) -> aresample (graph 1)
  Stream #1:0 (aac) -> amix:input1 (graph 1)
  drawtext (graph 0) -> Stream #0:0 (libx264)
  amix (graph 1) -> Stream #0:1 (aac)
Press [q] to stop, [?] for help
[libx264 @ 0x142f2c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX LZCNT BMI1 SlowPshufb
[libx264 @ 0x142f2c0] profile High, level 3.1
[libx264 @ 0x142f2c0] 264 - core 148 r2748 97eaef2 - H.264/MPEG-4 AVC codec - Copyleft 2003-2016 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=6 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=abr mbtree=1 bitrate=4700 ratetol=1.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mov, to '20190922_1532_ch5.1e-g.mov':
  Metadata:
    major_brand     : qt
    minor_version   : 512
    compatible_brands: qt
    encoder         : Lavf58.33.100
    Stream #0:0: Video: h264 (libx264) (avc1 / 0x31637661), yuvj420p(pc, progressive), 1024x768, q=-1--1, 4700 kb/s, 30 fps, 15360 tbn, 30 tbc (default)
    Metadata:
      encoder         : Lavc58.59.101 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/4700000 buffer size: 0 vbv_delay: N/A
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
    Metadata:
      encoder         : Lavc58.59.101 aac
frame=  420 fps= 14 q=-1.0 Lsize=    8270kB time=00:00:13.90 bitrate=4873.7kbits/s speed=0.45x video:8061kB audio:193kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.185768%
[libx264 @ 0x142f2c0] frame I:2     Avg QP:14.84  size:195655
[libx264 @ 0x142f2c0] frame P:106   Avg QP:19.64  size: 59577
[libx264 @ 0x142f2c0] frame B:312   Avg QP:25.62  size:  4960
[libx264 @ 0x142f2c0] consecutive B-frames:  1.0%  0.0%  0.0% 99.0%
[libx264 @ 0x142f2c0] mb I  I16..4: 27.8% 28.6% 43.6%
[libx264 @ 0x142f2c0] mb P  I16..4:  1.2%  1.3%  0.6%  P16..4: 30.5% 31.4% 22.6%  0.0%  0.0%    skip:12.5%
[libx264 @ 0x142f2c0] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8: 36.0%  7.7%  1.3%  direct: 4.2%  skip:50.8%  L0:37.6% L1:38.7% BI:23.8%
[libx264 @ 0x142f2c0] final ratefactor: 18.99
[libx264 @ 0x142f2c0] 8x8 transform intra:36.9% inter:54.6%
[libx264 @ 0x142f2c0] coded y,uvDC,uvAC intra: 56.3% 63.9% 51.8% inter: 27.2% 19.7% 1.0%
[libx264 @ 0x142f2c0] i16 v,h,dc,p: 73%  9% 14%  4%
[libx264 @ 0x142f2c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 10% 13% 39% 4%  6%  6%  6%  5% 10%
[libx264 @ 0x142f2c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 15% 18% 14% 6%  9%  9%  9%  8% 11%
[libx264 @ 0x142f2c0] i8c dc,h,v,p: 54% 24% 17%  5%
[libx264 @ 0x142f2c0] Weighted P-Frames: Y:9.4% UV:0.0%
[libx264 @ 0x142f2c0] ref P L0: 41.5% 11.5% 40.8%  6.0%  0.2%
[libx264 @ 0x142f2c0] ref B L0: 93.4%  6.0%  0.6%
[libx264 @ 0x142f2c0] ref B L1: 99.4%  0.6%
[libx264 @ 0x142f2c0] kb/s:4716.52
[aac @ 0x142d800] Qavg: 297.740

Again, ffprobe on the output video, then on the audio stream split out as before:
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-g.mov':
    encoder         : Lavf58.33.100
  Duration: 00:00:14.00, start: 0.000000, bitrate: 4838 kb/s
    Stream #0:0: Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc), 1024x768, 4716 kb/s, 30 fps, 30 tbr, 15360 tbn, 60 tbc (default)
      encoder         : Lavc58.59.101 libx264
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '20190922_1532_ch5.1e-g.m4a':
    encoder         : Lavf58.33.100
  Duration: 00:00:12.33, start: 0.000000, bitrate: 130 kb/s
    Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)

The audio is always 1.70 seconds shorter. Different video input lengths and different voice-over lengths all give the same 1.70-second loss.
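
The arithmetic behind that figure can be checked with a small script. This is only a sketch: the helper names hms_to_seconds and duration_gap are made up for illustration, and the durations are the ones ffprobe printed above. Note that 12.33 against the 14.03 of the Debian build's output gives 1.70, while 12.33 against the git build's own 14.00 container duration gives 1.67.

```shell
#!/bin/sh
# Sketch only: helper names are placeholders (assumptions).

# Convert an "HH:MM:SS.cc" duration, as printed by ffprobe, to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ printf "%.2f\n", $1 * 3600 + $2 * 60 + $3 }'
}

# Print how much shorter the second duration is than the first, in seconds.
duration_gap() {
    # $1 = reference duration, $2 = audio duration, both as HH:MM:SS.cc
    v=$(hms_to_seconds "$1")
    a=$(hms_to_seconds "$2")
    awk -v v="$v" -v a="$a" 'BEGIN { printf "%.2f\n", v - a }'
}

# Debian build output (14.03) vs. audio split from the git build output (12.33)
duration_gap 00:00:14.03 00:00:12.33    # prints 1.70
# git build output (14.00) vs. its own split audio (12.33)
duration_gap 00:00:14.00 00:00:12.33    # prints 1.67
```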

If I leave out the voice input and the audio filters, the output streams match in length, since they come from the same input video.

I've also tried resampling the voice-over audio to 48000 Hz stereo first, then removing the aresample filter and leaving only amix. The audio is still short. Since the next step would be to mix the audio in Audacity and remux it back together, I'll stop testing now and see what you think.
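
That two-step test has roughly the following shape. This is a sketch and not the literal command lines: the function names, the DRY_RUN switch, the explicit [0:a][1:a] input labels and the stream-copied video are all assumptions made to keep the example short.

```shell
#!/bin/sh
# Sketch only: names, labels and file arguments are placeholders (assumptions).

# Echo the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: convert the voice-over to 48000 Hz stereo ahead of time.
resample_voice() {
    # $1 = original voice-over, $2 = converted voice-over
    run ffmpeg -nostdin -i "$1" -ar 48000 -ac 2 "$2"
}

# Step 2: mix the clip audio with the converted voice-over using amix only.
# The video is stream-copied here to keep the test quick (an assumption).
mix_only() {
    # $1 = clip, $2 = converted voice-over, $3 = output
    run ffmpeg -nostdin -i "$1" -i "$2" \
        -filter_complex "[0:a][1:a]amix=inputs=2[a]" \
        -map 0:v -map "[a]" -c:v copy "$3"
}
```

With DRY_RUN=1 set, `resample_voice voice.m4a voice48k.m4a` only prints the ffmpeg command, which makes it easy to compare the exact options between the two builds before running anything.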

Stewart
