#10148: TS encoding of H.264 omits the SPS and PPS metadata
-------------------------------------+-------------------------------------
             Reporter:  John Coiner  |                    Owner:  (none)
                 Type:  defect       |                   Status:  open
             Priority:  normal       |                Component:
                                     |  undetermined
              Version:  git-master   |               Resolution:
             Keywords:               |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
Description changed by John Coiner:

Old description:

> == Summary of the bug
>
> Users of OBS who stream with HLS are transmitting non-compliant HLS
> streams, and the root cause is in FFmpeg's '''mpegtsenc.c'''.
>
> == Background
>
> === HLS
>
> Each chunk of an HLS stream should be independently-decodable -- it
> should begin with a key frame and include any metadata (eg. SPS, PPS)
> needed to initialize the decoders. Section 3 of the HLS RFC requires
> this: "Any Media Segment that contains video SHOULD include enough
> information to initialize a video decoder and decode a continuous set of
> frames ..."
>
> === YouTube
>
> I work at YouTube Live. YouTube allows creators to upload live streams
> with HLS: https://support.google.com/youtube/answer/10349430?hl=en
>
> Sometimes creators transmit non-compliant HLS streams. Today YouTube is
> able to support them, with the caveat that noncompliant HLS uploads
> reduce the reliability of the resulting livestream broadcasts. In the
> future, it would be nice if OBS would produce compliant streams and
> YouTube could eventually reject noncompliant HLS.
>
> === The Bug
>
> The OBS HLS implementation uses the muxer in '''mpegtsenc.c''' to format
> media into TS segments.
>
> For H.264, this file has logic that intends to take the "extradata"
> (which includes the SPS and PPS) from the codec parser and re-emit it at
> the beginning of key-frame segments. This is the logic:
>
> {{{
>     if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
>         const uint8_t *p = buf, *buf_end = p + size;
>         uint32_t state = -1;
>         int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ?
> st->codecpar->extradata_size : 0;
>         int ret = ff_check_h264_startcode(s, st, pkt);
>         if (ret < 0)
>             return ret;
>
>         if (extradd && AV_RB24(st->codecpar->extradata) > 1)
>             extradd = 0;
>
>         do {
>             p = avpriv_find_start_code(p, buf_end, &state);
>             av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
>             if ((state & 0x1f) == 7)
>                 extradd = 0;
>         } while (p < buf_end && (state & 0x1f) != 9 &&
>                  (state & 0x1f) != 5 && (state & 0x1f) != 1);
>
>         if ((state & 0x1f) != 5)
>             extradd = 0;
>         if ((state & 0x1f) != 9) { // AUD NAL
>             data = av_malloc(pkt->size + 6 + extradd);
>             if (!data)
>                 return AVERROR(ENOMEM);
>             memcpy(data + 6, st->codecpar->extradata, extradd);
>             memcpy(data + 6 + extradd, pkt->data, pkt->size);
>             AV_WB32(data, 0x00000001);
>             data[4] = 0x09;
>             data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
>             buf     = data;
>             size    = pkt->size + 6 + extradd;
>         }
>     } else ...
> }}}
>
> This code scans the segment for NALs until it finds either an Access Unit
> Delimiter (9), an IDR picture (5), another SPS (7), or a non-IDR picture
> (1). If the first one found is an IDR picture (a key frame) it re-emits
> the "extradata" into its output buffer ahead of the key frame.
>
> The problem is that encoders sometimes preface the key frame with another
> Access Unit Delimiter. In that case this code does not repeat the
> "extradata" and the resulting HLS stream may be noncompliant and
> unjoinable.
>
> == Proposed Fix?
>
> Could this be more robustly written as:
>
> {{{
>     if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
>         const uint8_t *p = buf, *buf_end = p + size;
>         uint32_t state = -1;
>         int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ?
> st->codecpar->extradata_size : 0;
>         int ret = ff_check_h264_startcode(s, st, pkt);
>         if (ret < 0)
>             return ret;
>
>         if (extradd && AV_RB24(st->codecpar->extradata) > 1)
>             extradd = 0;
>
>         while (p < buf_end
>                && extradd > 0
>                && (state & 0x1f) != 5  // IDR picture
>                && (state & 0x1f) != 1  // non-IDR picture
>                ) {
>             p = avpriv_find_start_code(p, buf_end, &state);
>             av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
>             if ((state & 0x1f) == 7)  // SPS NAL
>                 extradd = 0;
>         }
>
>         if ((state & 0x1f) != 5) {
>             // Not an IDR picture
>             extradd = 0;
>         }
>
>         if (extradd > 0) {
>             data = av_malloc(pkt->size + 6 + extradd);
>             if (!data)
>               return AVERROR(ENOMEM);
>             memcpy(data + 6, st->codecpar->extradata, extradd);
>             memcpy(data + 6 + extradd, pkt->data, pkt->size);
>             AV_WB32(data, 0x00000001);
>             data[4] = 0x09;
>             data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
>             buf     = data;
>             size    = pkt->size + 6 + extradd;
>         }
>     } else ...
> }}}
>
> This alternate code ignores Access Unit Delimiters, so it's insensitive
> to whether they appear at the beginning of a segment. It scans for the
> first picture, and if this is an IDR picture it prefixes the segment with
> its '''extradata'''.
>
> I've confirmed using captured media from one of the problematic uploads
> that this does reinsert '''extradata''' in each HLS segment, producing a
> stream that is continuously joinable.
>
> It also passes fate.
>
> == How to reproduce
>
> EDIT: See comment 7
>
> I don't know how these content creators are configuring their OBS setups.
> (We know that OBS has the problem thanks to its User-Agent of
> "libobs...") My guess is:
>    * They are using some VAAPI-supported hardware. Unlike the NVENC
> driver, the VAAPI driver in OBS has no way to hint to the underlying
> hardware that it should repeat SPS and PPS.
>    * The underlying hardware or driver defaults to never repeating SPS
> and PPS.
>    * The encoded bitstream interacts with this bug to produce an
> unjoinable HLS stream.
>
> More than half of OBS+H.264+HLS users are producing compliant HLS streams
> -- likely because they are using an encoder that repeats SPS and PPS and
> thus doesn't rely on '''mpegtsenc.c''' to repeat it.

New description:

 == Summary of the bug

 Users of OBS who stream with HLS are transmitting non-compliant HLS
 streams, and the root cause is in FFmpeg's '''mpegtsenc.c'''.

 == Background

 === HLS

 Each chunk of an HLS stream should be independently decodable: it should
 begin with a key frame and include any metadata (e.g. SPS, PPS) needed to
 initialize the decoders. Section 3 of the HLS RFC (RFC 8216) recommends
 this: "Any Media Segment that contains video SHOULD include enough
 information to initialize a video decoder and decode a continuous set of
 frames ..."
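
 As a rough illustration of what "enough information to initialize a video
 decoder" means for H.264, the sketch below checks whether an SPS and a PPS
 both appear before the first IDR picture. It is a minimal sketch under
 stated assumptions: the input is the Annex B elementary stream already
 extracted from one TS segment (demuxing is not shown), and the name
 `starts_with_parameter_sets` is hypothetical, not part of FFmpeg or of
 any HLS tooling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if both an SPS (7) and a PPS (8) appear before the first IDR
 * slice (5). Returns 0 if a non-IDR slice (1) comes first, if the IDR is
 * reached without both parameter sets, or if no slice is found at all.
 * Assumption: `es` holds an Annex B byte stream (00 00 01 start codes). */
static int starts_with_parameter_sets(const uint8_t *es, size_t size)
{
    int seen_sps = 0, seen_pps = 0;

    assert(es != NULL || size == 0);

    for (size_t i = 0; i + 3 < size; i++) {
        if (es[i] != 0 || es[i + 1] != 0 || es[i + 2] != 1)
            continue;                     /* not a start code */
        switch (es[i + 3] & 0x1f) {       /* nal_unit_type */
        case 7: seen_sps = 1; break;
        case 8: seen_pps = 1; break;
        case 5: return seen_sps && seen_pps;
        case 1: return 0;                 /* does not start on a key frame */
        }
        i += 3;                           /* skip past the NAL header byte */
    }
    return 0;
}
```

 A stream that begins AUD, SPS, PPS, IDR passes this check; one that begins
 AUD, IDR does not.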

 === YouTube

 I work at YouTube Live. YouTube allows creators to upload live streams
 with HLS: https://support.google.com/youtube/answer/10349430?hl=en

 Sometimes creators transmit non-compliant HLS streams. Today YouTube is
 able to support them, with the caveat that non-compliant HLS uploads
 reduce the reliability of the resulting livestream broadcasts. In the
 future, it would be nice if OBS produced compliant streams, so that
 YouTube could eventually reject non-compliant HLS.

 === The Bug

 The OBS HLS implementation uses the muxer in '''mpegtsenc.c''' to format
 media into TS segments.

 For H.264, this file has logic that intends to take the "extradata" (which
 includes the SPS and PPS) from the codec parser and re-emit it at the
 beginning of key-frame segments. This is the logic:

 {{{
     if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
         const uint8_t *p = buf, *buf_end = p + size;
         uint32_t state = -1;
         int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ?
                       st->codecpar->extradata_size : 0;
         int ret = ff_check_h264_startcode(s, st, pkt);
         if (ret < 0)
             return ret;

         if (extradd && AV_RB24(st->codecpar->extradata) > 1)
             extradd = 0;

         do {
             p = avpriv_find_start_code(p, buf_end, &state);
             av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
             if ((state & 0x1f) == 7)
                 extradd = 0;
         } while (p < buf_end && (state & 0x1f) != 9 &&
                  (state & 0x1f) != 5 && (state & 0x1f) != 1);

         if ((state & 0x1f) != 5)
             extradd = 0;
         if ((state & 0x1f) != 9) { // AUD NAL
             data = av_malloc(pkt->size + 6 + extradd);
             if (!data)
                 return AVERROR(ENOMEM);
             memcpy(data + 6, st->codecpar->extradata, extradd);
             memcpy(data + 6 + extradd, pkt->data, pkt->size);
             AV_WB32(data, 0x00000001);
             data[4] = 0x09;
             data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
             buf     = data;
             size    = pkt->size + 6 + extradd;
         }
     } else ...
 }}}

 This code scans the packet for NAL units until it finds an Access Unit
 Delimiter (9), an IDR picture (5), or a non-IDR picture (1). If it passes
 an SPS (7) on the way, it clears extradd, because the packet already
 carries its own parameter sets. Only if the scan stops at an IDR picture
 (a key frame) does it re-emit the "extradata" into its output buffer
 ahead of the key frame.

 The problem is that encoders sometimes preface the key frame with an
 Access Unit Delimiter. The scan then stops at the delimiter rather than
 at the IDR picture, so this code does not repeat the "extradata" and the
 resulting HLS stream may be non-compliant and unjoinable.
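
 The failure mode can be reproduced outside FFmpeg with a small standalone
 sketch. This is not the mpegtsenc.c code: `next_nal_type` and
 `current_logic_adds_extradata` are hypothetical names, the start-code scan
 is a simplified stand-in for avpriv_find_start_code, and the check that
 the extradata itself is in Annex B form is omitted. Assuming an Annex B
 packet and a flag saying whether it is a key frame, it reports whether the
 logic above would prepend the extradata:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* H.264 nal_unit_type values that the muxer logic looks at. */
enum { NAL_SLICE = 1, NAL_IDR = 5, NAL_SPS = 7, NAL_AUD = 9 };

/* Returns the type of the next NAL unit at or after *pos, or -1 if there
 * is none. On success *pos is advanced past the NAL header byte. */
static int next_nal_type(const uint8_t *buf, size_t size, size_t *pos)
{
    for (size_t i = *pos; i + 3 < size; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            *pos = i + 4;
            return buf[i + 3] & 0x1f;
        }
    }
    *pos = size;
    return -1;
}

/* Mirrors the decision structure of the existing code: returns 1 if the
 * extradata would be prepended to this packet, 0 otherwise. */
static int current_logic_adds_extradata(const uint8_t *buf, size_t size,
                                        int is_key)
{
    size_t pos = 0;
    int type, extradd = is_key;

    assert(buf != NULL || size == 0);

    do {
        type = next_nal_type(buf, size, &pos);
        if (type == NAL_SPS)
            extradd = 0;              /* packet already has parameter sets */
    } while (type >= 0 && type != NAL_AUD &&
             type != NAL_IDR && type != NAL_SLICE);

    if (type != NAL_IDR)
        extradd = 0;                  /* scan stopped on something else */
    return type != NAL_AUD && extradd;
}
```

 Under these assumptions, a key-frame packet that begins directly with an
 IDR slice yields 1, while the same packet prefixed with an Access Unit
 Delimiter yields 0, which is the case described above.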

 == Proposed Fix?

 Could this be more robustly written as:

 {{{
     if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
         const uint8_t *p = buf, *buf_end = p + size;
         uint32_t state = -1;
         int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ?
                       st->codecpar->extradata_size : 0;
         int ret = ff_check_h264_startcode(s, st, pkt);
         if (ret < 0)
             return ret;

         if (extradd && AV_RB24(st->codecpar->extradata) > 1)
             extradd = 0;

         while (p < buf_end
                && extradd > 0
                && (state & 0x1f) != 5  // IDR picture
                && (state & 0x1f) != 1  // non-IDR picture
                ) {
             p = avpriv_find_start_code(p, buf_end, &state);
             av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
             if ((state & 0x1f) == 7)  // SPS NAL
                 extradd = 0;
         }

         if ((state & 0x1f) != 5) {
             // Not an IDR picture
             extradd = 0;
         }

         if (extradd > 0) {
             data = av_malloc(pkt->size + 6 + extradd);
             if (!data)
                 return AVERROR(ENOMEM);
             memcpy(data + 6, st->codecpar->extradata, extradd);
             memcpy(data + 6 + extradd, pkt->data, pkt->size);
             AV_WB32(data, 0x00000001);
             data[4] = 0x09;
             data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
             buf     = data;
             size    = pkt->size + 6 + extradd;
         }
     } else ...
 }}}

 This alternate code ignores Access Unit Delimiters, so it's insensitive to
 whether they appear at the beginning of a segment. It scans for the first
 picture, and if this is an IDR picture it prefixes the segment with its
 '''extradata'''.
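
 The scan in the proposed code can be exercised in isolation with a
 standalone sketch. As before, this is an illustration rather than the
 patch itself: `next_nal_type` and `proposed_logic_adds_extradata` are
 hypothetical names, the start-code scan is a simplified stand-in for
 avpriv_find_start_code, and the Annex B check on the extradata is left
 out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* H.264 nal_unit_type values that the muxer logic looks at. */
enum { NAL_SLICE = 1, NAL_IDR = 5, NAL_SPS = 7 };

/* Returns the type of the next NAL unit at or after *pos, or -1 if there
 * is none. On success *pos is advanced past the NAL header byte. */
static int next_nal_type(const uint8_t *buf, size_t size, size_t *pos)
{
    for (size_t i = *pos; i + 3 < size; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            *pos = i + 4;
            return buf[i + 3] & 0x1f;
        }
    }
    *pos = size;
    return -1;
}

/* Mirrors the decision structure of the proposed code: Access Unit
 * Delimiters are skipped, and the scan ends at the first picture or as
 * soon as an SPS shows that no extradata is needed. */
static int proposed_logic_adds_extradata(const uint8_t *buf, size_t size,
                                         int is_key)
{
    size_t pos = 0;
    int type = -1, extradd = is_key;

    assert(buf != NULL || size == 0);

    while (pos < size && extradd &&
           type != NAL_IDR && type != NAL_SLICE) {
        type = next_nal_type(buf, size, &pos);
        if (type == NAL_SPS)
            extradd = 0;              /* packet already has parameter sets */
    }

    if (type != NAL_IDR)
        extradd = 0;                  /* first picture is not an IDR */
    return extradd;
}
```

 With this version, a key-frame packet laid out as AUD followed by an IDR
 slice yields 1, while AUD followed by a non-IDR slice, or a packet that
 already carries an SPS, yields 0.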

 I've confirmed using captured media from one of the problematic uploads
 that this does reinsert '''extradata''' in each HLS segment, producing a
 stream that is continuously joinable.

 It also passes FATE, FFmpeg's regression test suite.

 == How to reproduce

 EDIT: See comment 15 for a simple method that works with any input file.

 Comment 7 is an earlier path to reproduce, which relied on a particular
 input file.

 I don't know how these content creators are configuring their OBS setups.
 (We know that it's OBS, thanks to its User-Agent of "libobs...") My guess
 is:
    * They are using some VAAPI-supported hardware. Unlike the NVENC
      driver, the VAAPI driver in OBS has no way to hint to the underlying
      hardware that it should repeat SPS and PPS.
    * The underlying hardware or driver defaults to never repeating SPS
      and PPS.
    * The underlying hardware or driver defaults to emitting Access Unit
      Delimiters, similar to running x264 with its "aud" option enabled.
    * The encoded bitstream interacts with this bug to produce an
      unjoinable HLS stream.

 More than half of OBS+H.264+HLS users are producing compliant HLS streams
 -- perhaps because they are using an encoder that repeats SPS and PPS and
 thus doesn't rely on '''mpegtsenc.c''' to repeat them.

--
-- 
Ticket URL: <https://trac.ffmpeg.org/ticket/10148#comment:16>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker