#10148: M2TS encoding of H.264 omits the init segment
-------------------------------------+-------------------------------------
             Reporter:  John Coiner  |                     Type:  defect
               Status:  new          |                 Priority:  normal
            Component:               |                  Version:  git-
  undetermined                       |  master
             Keywords:               |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
 == Summary of the bug

 Users of OBS who stream with HLS are transmitting non-compliant HLS
 streams, and the root cause is in FFmpeg's '''mpegtsenc.c'''.

 == Background

 === HLS

 Each chunk of an HLS stream should be independently decodable: it should
 begin with a key frame and include any metadata (e.g. SPS, PPS) needed to
 initialize the decoders. Section 3 of the HLS RFC (RFC 8216) states this as
 a SHOULD: "Any Media Segment that contains video SHOULD include enough
 information to initialize a video decoder and decode a continuous set of
 frames ..."

 === YouTube

 I work at YouTube Live, where these non-compliant HLS streams cause
 certain headaches. Today YouTube is able to support them, with the caveat
 that noncompliant HLS uploads reduce the reliability of the resulting
 livestream broadcasts. In the future, it would be nice if OBS would
 produce compliant streams and YouTube could eventually reject noncompliant
 HLS.

 === The Bug

 The OBS HLS implementation uses the muxer in '''mpegtsenc.c''' to format
 media into M2TS segments.


 For H.264, this file has logic that is intended to take the "extradata"
 (which includes the SPS and PPS) from the stream's codec parameters
 ('''st->codecpar''') and re-emit it at the beginning of key-frame
 segments. This is the logic:

 {{{
     if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
         const uint8_t *p = buf, *buf_end = p + size;
         uint32_t state = -1;
         int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ? st->codecpar->extradata_size : 0;
         int ret = ff_check_h264_startcode(s, st, pkt);
         if (ret < 0)
             return ret;

         if (extradd && AV_RB24(st->codecpar->extradata) > 1)
             extradd = 0;

         do {
             p = avpriv_find_start_code(p, buf_end, &state);
             av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
             if ((state & 0x1f) == 7)
                 extradd = 0;
         } while (p < buf_end && (state & 0x1f) != 9 &&
                  (state & 0x1f) != 5 && (state & 0x1f) != 1);

         if ((state & 0x1f) != 5)
             extradd = 0;
         if ((state & 0x1f) != 9) { // AUD NAL
             data = av_malloc(pkt->size + 6 + extradd);
             if (!data)
                 return AVERROR(ENOMEM);
             memcpy(data + 6, st->codecpar->extradata, extradd);
             memcpy(data + 6 + extradd, pkt->data, pkt->size);
             AV_WB32(data, 0x00000001);
             data[4] = 0x09;
             data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
             buf     = data;
             size    = pkt->size + 6 + extradd;
         }
     } else ...
 }}}

 This code scans the segment for NALs until it reaches an Access Unit
 Delimiter (9), an IDR picture (5), or a non-IDR picture (1). If it passes
 an SPS (7) on the way, it clears '''extradd''', since the data already
 carries its own parameter sets. Only if the scan stops at an IDR picture
 (a key frame) does it re-emit the "extradata" into its output buffer ahead
 of the key frame.

 The problem is that encoders sometimes preface the key frame with an
 Access Unit Delimiter. In that case the scan stops at the delimiter
 before it ever reaches the IDR picture, so this code does not repeat the
 "extradata" and the resulting HLS stream may be noncompliant and
 unjoinable.
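
 To make the failure concrete, here is a minimal standalone sketch of the
 scan described above. It is a simplified model, not FFmpeg code:
 '''next_nal_type''' and '''original_prepends_extradata''' are hypothetical
 names, the start-code search is a plain byte loop rather than
 '''avpriv_find_start_code''', and it assumes a key-frame packet whose
 extradata is already in Annex B form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* H.264 nal_unit_type values referenced in the ticket. */
enum { NAL_SLICE = 1, NAL_IDR = 5, NAL_SPS = 7, NAL_AUD = 9 };

/*
 * Hypothetical helper (not an FFmpeg function): find the next 00 00 01
 * start code at or after *pos, move *pos past the NAL header byte, and
 * return the nal_unit_type. Returns -1 when no further NAL is found.
 */
static int next_nal_type(const uint8_t *buf, size_t size, size_t *pos)
{
    assert(buf != NULL || size == 0);
    for (size_t i = *pos; i + 3 < size; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            *pos = i + 4;
            return buf[i + 3] & 0x1f;
        }
    }
    *pos = size;
    return -1;
}

/*
 * Simplified model of the current decision. Assumes the packet is flagged
 * as a key frame and that extradata is available. Returns 1 if the
 * extradata would be prepended, 0 otherwise.
 */
static int original_prepends_extradata(const uint8_t *buf, size_t size)
{
    size_t pos = 0;
    int extradd = 1;
    int type;

    do {
        type = next_nal_type(buf, size, &pos);
        if (type == NAL_SPS)
            extradd = 0; /* packet already carries parameter sets */
    } while (type >= 0 && type != NAL_AUD &&
             type != NAL_IDR && type != NAL_SLICE);

    if (type != NAL_IDR)
        extradd = 0; /* the scan stopped on something other than an IDR */
    return extradd;
}
```

 For a packet laid out as AUD followed by an IDR picture, this model
 returns 0, because the scan stops at the delimiter. For a packet that
 starts directly with the IDR picture it returns 1.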

 == Proposed Fix?

 Could this be more robustly written as:

 {{{
     if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
         const uint8_t *p = buf, *buf_end = p + size;
         uint32_t state = -1;
         int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ? st->codecpar->extradata_size : 0;
         int ret = ff_check_h264_startcode(s, st, pkt);
         if (ret < 0)
             return ret;

         if (extradd && AV_RB24(st->codecpar->extradata) > 1)
             extradd = 0;

         while (p < buf_end
                && extradd > 0
                && (state & 0x1f) != 5  // IDR picture
                && (state & 0x1f) != 1  // non-IDR picture
                ) {
             p = avpriv_find_start_code(p, buf_end, &state);
             av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
             if ((state & 0x1f) == 7)  // SPS NAL
                 extradd = 0;
         }

         if ((state & 0x1f) != 5) {
             // Not an IDR picture
             extradd = 0;
         }

         if (extradd > 0) {
             data = av_malloc(pkt->size + 6 + extradd);
             if (!data)
                 return AVERROR(ENOMEM);
             memcpy(data + 6, st->codecpar->extradata, extradd);
             memcpy(data + 6 + extradd, pkt->data, pkt->size);
             AV_WB32(data, 0x00000001);
             data[4] = 0x09;
             data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
             buf     = data;
             size    = pkt->size + 6 + extradd;
         }
     } else ...
 }}}

 This alternative code ignores Access Unit Delimiters, so it's insensitive
 to whether they appear at the beginning or end of a segment. It scans for
 the first picture, and if this is an IDR picture it prefixes the segment
 with its '''extradata'''.
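
 The decision made by the proposed loop can be modelled in isolation as
 follows. This is a hedged sketch rather than the patch itself:
 '''next_nal_type''' and '''proposed_prepends_extradata''' are hypothetical
 names, the start-code search is a plain byte loop instead of
 '''avpriv_find_start_code''', and the packet is assumed to be a key frame
 with Annex B extradata available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* H.264 nal_unit_type values referenced in the ticket. */
enum { NAL_SLICE = 1, NAL_IDR = 5, NAL_SPS = 7, NAL_AUD = 9 };

/*
 * Hypothetical helper (not an FFmpeg function): find the next 00 00 01
 * start code at or after *pos, move *pos past the NAL header byte, and
 * return the nal_unit_type. Returns -1 when no further NAL is found.
 */
static int next_nal_type(const uint8_t *buf, size_t size, size_t *pos)
{
    assert(buf != NULL || size == 0);
    for (size_t i = *pos; i + 3 < size; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            *pos = i + 4;
            return buf[i + 3] & 0x1f;
        }
    }
    *pos = size;
    return -1;
}

/*
 * Simplified model of the proposed decision. Access Unit Delimiters are
 * skipped; the scan ends at the first picture (IDR or non-IDR), at an
 * SPS, or at the end of the buffer. Returns 1 if the extradata would be
 * prepended, 0 otherwise.
 */
static int proposed_prepends_extradata(const uint8_t *buf, size_t size)
{
    size_t pos = 0;
    int extradd = 1;
    int type = -1;

    while (pos < size && extradd > 0 &&
           type != NAL_IDR && type != NAL_SLICE) {
        type = next_nal_type(buf, size, &pos);
        if (type == NAL_SPS)
            extradd = 0; /* packet already carries parameter sets */
    }

    if (type != NAL_IDR)
        extradd = 0; /* first picture is not an IDR picture */
    return extradd;
}
```

 With this model, a packet laid out as AUD followed by an IDR picture
 yields 1, while AUD followed by a non-IDR picture, or a packet that already
 contains an SPS, yields 0. One difference that may be worth reviewing: as
 written, the proposed block builds the AUD-prefixed buffer only when
 '''extradd''' is nonzero, whereas the current code also inserts an AUD into
 any packet that lacks one.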

 I've confirmed using captured media from one of the problematic uploads
 that this does reinsert '''extradata''' in each HLS segment, producing a
 stream that is continuously joinable.
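
 One way to check a captured segment for this property is sketched below.
 It is not necessarily how the verification above was done. The function
 name '''idr_has_parameter_sets''' is hypothetical, and the sketch assumes
 its input is the H.264 elementary stream in Annex B form, already
 extracted from the transport stream packets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical checker: returns 1 if both an SPS (7) and a PPS (8) appear
 * before the first IDR slice (5) in an Annex B byte stream, and 0 if the
 * first IDR slice arrives without them or if no IDR slice is present.
 */
static int idr_has_parameter_sets(const uint8_t *es, size_t size)
{
    int seen_sps = 0, seen_pps = 0;

    assert(es != NULL || size == 0);
    for (size_t i = 0; i + 3 < size; i++) {
        /* Look for a 00 00 01 start code followed by a NAL header byte. */
        if (es[i] != 0 || es[i + 1] != 0 || es[i + 2] != 1)
            continue;

        switch (es[i + 3] & 0x1f) {
        case 7:
            seen_sps = 1;
            break;
        case 8:
            seen_pps = 1;
            break;
        case 5:
            return seen_sps && seen_pps;
        default:
            break;
        }
        i += 3; /* skip the start code and header just examined */
    }
    return 0;
}
```

 Running such a check on the first access unit of every segment would flag
 segments that begin with a key frame but cannot initialize a decoder on
 their own.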

 It also passes FATE.

 == How to reproduce

 I don't know how these content creators are configuring their OBS setups.
 (We know that OBS has the problem thanks to its User-Agent of "libobs...")
 My guess is:
    * They are using some VAAPI-supported hardware. Unlike the NVENC
      driver, the VAAPI driver in OBS has no way to hint to the underlying
      hardware that it should repeat SPS and PPS. Empirically, the software
      H.264 encoder appears to repeat SPS and PPS.
    * The underlying hardware or driver defaults to never repeating SPS and
      PPS.
    * The encoded bitstream interacts with this bug to produce an
      unjoinable HLS stream.

 It's also possible to reproduce the bug with a canned input file and a
 single ffmpeg command line (of the form "ffmpeg -i input_file -codec copy
 -f hls ...") that yields a noncompliant, unjoinable HLS stream. Perhaps I
 can follow up with such a file if that'd be helpful?
-- 
Ticket URL: <https://trac.ffmpeg.org/ticket/10148>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker