#10148: TS encoding of H.264 omits the SPS and PPS metadata
-------------------------------------+-------------------------------------
             Reporter:  John Coiner  |                    Owner:  (none)
                 Type:  defect       |                   Status:  new
             Priority:  normal       |                Component:
                                     |  undetermined
              Version:  git-master   |               Resolution:
             Keywords:               |               Blocked By:
             Blocking:               |  Reproduced by developer:  0
Analyzed by developer:  0            |
-------------------------------------+-------------------------------------
Changes (by John Coiner):

 * summary:  M2TS encoding of H.264 omits the SPS and PPS metadata => TS
     encoding of H.264 omits the SPS and PPS metadata


New description:

 == Summary of the bug

 Users of OBS who stream with HLS are transmitting noncompliant HLS
 streams, and the root cause is in FFmpeg's '''mpegtsenc.c'''.

 == Background

 === HLS

 Each chunk (segment) of an HLS stream should be independently decodable:
 it should begin with a key frame and include any metadata (e.g. SPS, PPS)
 needed to initialize the decoders. Section 3 of the HLS RFC (RFC 8216)
 says as much: "Any Media Segment that contains video SHOULD include enough
 information to initialize a video decoder and decode a continuous set of
 frames ..."
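
 As a rough illustration of what "enough information to initialize a video
 decoder" means for H.264, the standalone C sketch below checks whether an
 Annex B byte stream carries an SPS and a PPS before its first IDR picture.
 It assumes 00 00 01 start codes and uses no FFmpeg API; `next_nal` and
 `has_param_sets_before_idr` are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the NAL header byte that follows the next 00 00 01
 * start code at or after pos, or size if no further start code exists. */
static size_t next_nal(const uint8_t *buf, size_t size, size_t pos)
{
    for (size_t i = pos; i + 3 < size; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i + 3;
    return size;
}

/* Return 1 if both an SPS (type 7) and a PPS (type 8) appear before the
 * first IDR slice (type 5); return 0 otherwise, including when the buffer
 * holds no IDR slice at all. */
int has_param_sets_before_idr(const uint8_t *buf, size_t size)
{
    int seen_sps = 0, seen_pps = 0;
    size_t pos = 0;

    while ((pos = next_nal(buf, size, pos)) < size) {
        int type = buf[pos] & 0x1f;   /* low 5 bits hold nal_unit_type */
        if (type == 7)
            seen_sps = 1;
        else if (type == 8)
            seen_pps = 1;
        else if (type == 5)
            return seen_sps && seen_pps;
    }
    return 0;
}
```

 Applied to the H.264 elementary stream of a single segment, a result of 0
 would suggest that the segment cannot be decoded on its own. This is only
 a sanity check under the assumptions above, not a full compliance test.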

 === YouTube

 I work at YouTube Live, where these noncompliant HLS streams cause
 certain headaches. Today YouTube is able to support them, with the caveat
 that noncompliant HLS uploads reduce the reliability of the resulting
 livestream broadcasts. In the future, it would be nice if OBS produced
 compliant streams so that YouTube could eventually reject noncompliant
 HLS.

 === The Bug

 The OBS HLS implementation uses the muxer in '''mpegtsenc.c''' to format
 media into TS segments.

 For H.264, this file has logic that is intended to take the "extradata"
 (which includes the SPS and PPS) from the stream's codec parameters
 ('''codecpar''') and re-emit it ahead of each key frame. This is the
 logic:

 {{{
     if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
         const uint8_t *p = buf, *buf_end = p + size;
         uint32_t state = -1;
         int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ? st->codecpar->extradata_size : 0;
         int ret = ff_check_h264_startcode(s, st, pkt);
         if (ret < 0)
             return ret;

         if (extradd && AV_RB24(st->codecpar->extradata) > 1)
             extradd = 0;

         do {
             p = avpriv_find_start_code(p, buf_end, &state);
             av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
             if ((state & 0x1f) == 7)
                 extradd = 0;
         } while (p < buf_end && (state & 0x1f) != 9 &&
                  (state & 0x1f) != 5 && (state & 0x1f) != 1);

         if ((state & 0x1f) != 5)
             extradd = 0;
         if ((state & 0x1f) != 9) { // AUD NAL
             data = av_malloc(pkt->size + 6 + extradd);
             if (!data)
                 return AVERROR(ENOMEM);
             memcpy(data + 6, st->codecpar->extradata, extradd);
             memcpy(data + 6 + extradd, pkt->data, pkt->size);
             AV_WB32(data, 0x00000001);
             data[4] = 0x09;
             data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
             buf     = data;
             size    = pkt->size + 6 + extradd;
         }
     } else ...
 }}}

 This code scans the packet for NAL units until it reaches an Access Unit
 Delimiter (9), an IDR picture (5), or a non-IDR picture (1). If it passes
 an SPS (7) on the way, it clears '''extradd''', because the packet already
 carries its own parameter sets. If the scan stops at an IDR picture (a key
 frame), the code re-emits the "extradata" into its output buffer ahead of
 the key frame.

 The problem is that encoders sometimes place an Access Unit Delimiter
 ahead of the key frame. The scan then stops at the delimiter, never
 reaches the IDR picture, and clears '''extradd''', so the "extradata" is
 not repeated and the resulting HLS stream may be noncompliant and
 unjoinable.
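
 To make the failure mode concrete, here is a simplified standalone model
 of the scan above. It is a sketch, not the FFmpeg code: `next_nal` is a
 hypothetical stand-in for `avpriv_find_start_code`, it only recognizes
 00 00 01 start codes, and `extradata_bytes_current` (also a made-up name)
 returns how many extradata bytes would be prepended to a key-frame packet.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the NAL header byte that follows the next 00 00 01
 * start code at or after pos, or size if no further start code exists. */
static size_t next_nal(const uint8_t *buf, size_t size, size_t pos)
{
    for (size_t i = pos; i + 3 < size; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i + 3;
    return size;
}

/* Simplified model of the existing decision for a key-frame packet:
 * stop at the first AUD (9), IDR slice (5) or non-IDR slice (1), and keep
 * the extradata only if the scan stopped at an IDR slice. */
size_t extradata_bytes_current(const uint8_t *pkt, size_t size,
                               size_t extradata_size)
{
    size_t extradd = extradata_size;
    size_t pos = 0;
    int type = 0x1f;                  /* mirrors (state & 0x1f) for state = -1 */

    do {
        pos = next_nal(pkt, size, pos);
        if (pos >= size)
            break;                    /* no more NAL units in this packet */
        type = pkt[pos] & 0x1f;
        if (type == 7)                /* packet already carries an SPS */
            extradd = 0;
    } while (type != 9 && type != 5 && type != 1);

    if (type != 5)                    /* scan did not stop at an IDR slice */
        extradd = 0;
    return extradd;
}
```

 With a packet that starts directly with an IDR slice, the model keeps the
 extradata. With a packet laid out as AUD followed by IDR, it returns 0,
 which is the case described above.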

 == Proposed Fix?

 Could this be more robustly written as:

 {{{
     if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
         const uint8_t *p = buf, *buf_end = p + size;
         uint32_t state = -1;
         int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ? st->codecpar->extradata_size : 0;
         int ret = ff_check_h264_startcode(s, st, pkt);
         if (ret < 0)
             return ret;

         if (extradd && AV_RB24(st->codecpar->extradata) > 1)
             extradd = 0;

         while (p < buf_end
                && extradd > 0
                && (state & 0x1f) != 5  // IDR picture
                && (state & 0x1f) != 1  // non-IDR picture
                ) {
             p = avpriv_find_start_code(p, buf_end, &state);
             av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
             if ((state & 0x1f) == 7)  // SPS NAL
                 extradd = 0;
         }

         if ((state & 0x1f) != 5) {
             // Not an IDR picture
             extradd = 0;
         }

         if (extradd > 0) {
             data = av_malloc(pkt->size + 6 + extradd);
             if (!data)
                 return AVERROR(ENOMEM);
             memcpy(data + 6, st->codecpar->extradata, extradd);
             memcpy(data + 6 + extradd, pkt->data, pkt->size);
             AV_WB32(data, 0x00000001);
             data[4] = 0x09;
             data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
             buf     = data;
             size    = pkt->size + 6 + extradd;
         }
     } else ...
 }}}

 This alternate code ignores Access Unit Delimiters, so it's insensitive to
 whether they appear at the beginning or end of a segment. It scans for the
 first picture, and if this is an IDR picture it prefixes the segment with
 its '''extradata'''.
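
 The scan in the alternate code can be modeled the same way. The sketch
 below is standalone C with no FFmpeg dependency; `next_nal` is a
 hypothetical stand-in for `avpriv_find_start_code`, and
 `extradata_bytes_proposed` is a made-up name. It models only the
 '''extradd''' decision, not the allocation or the Access Unit Delimiter
 that the real code writes into the output buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the NAL header byte that follows the next 00 00 01
 * start code at or after pos, or size if no further start code exists. */
static size_t next_nal(const uint8_t *buf, size_t size, size_t pos)
{
    for (size_t i = pos; i + 3 < size; i++)
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i + 3;
    return size;
}

/* Simplified model of the proposed decision for a key-frame packet:
 * skip over AUDs and other non-picture NAL units, stop at the first
 * IDR slice (5) or non-IDR slice (1), and give up early on an SPS (7). */
size_t extradata_bytes_proposed(const uint8_t *pkt, size_t size,
                                size_t extradata_size)
{
    size_t extradd = extradata_size;
    size_t pos = 0;
    int type = 0x1f;                  /* mirrors (state & 0x1f) for state = -1 */

    while (pos < size && extradd > 0 && type != 5 && type != 1) {
        pos = next_nal(pkt, size, pos);
        if (pos >= size)
            break;                    /* no more NAL units in this packet */
        type = pkt[pos] & 0x1f;
        if (type == 7)                /* packet already carries an SPS */
            extradd = 0;
    }

    if (type != 5)                    /* first picture is not an IDR slice */
        extradd = 0;
    return extradd;
}
```

 Under this model, a packet laid out as AUD followed by IDR keeps its
 extradata, while packets whose first picture is a non-IDR slice, or which
 already contain an SPS, do not.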

 I've confirmed using captured media from one of the problematic uploads
 that this does reinsert '''extradata''' in each HLS segment, producing a
 stream that is continuously joinable.

 It also passes FATE.

 == How to reproduce

 I don't know how these content creators are configuring their OBS setups.
 (We know that OBS is involved because the uploads carry a User-Agent of
 "libobs...") My guess is:
    * They are using some VAAPI-supported hardware. Unlike the NVENC
      driver, the VAAPI driver in OBS has no way to hint to the underlying
      hardware that it should repeat SPS and PPS. Empirically, the software
      H.264 encoder appears to repeat SPS and PPS.
    * The underlying hardware or driver defaults to never repeating SPS and
      PPS.
    * The encoded bitstream interacts with this bug to produce an
      unjoinable HLS stream.

 More than half of OBS+H.264+HLS users are producing compliant HLS streams
 -- likely because they are using an encoder that repeats SPS and PPS and
 thus doesn't rely on '''mpegtsenc.c''' to repeat them.

 It's also possible to reproduce the bug with a canned input file and a
 single ffmpeg command line (of the form "ffmpeg -i input_file -codec copy
 -f hls ...") that yields a noncompliant, unjoinable HLS stream. Perhaps I
 can follow up with such a file if that'd be helpful?

-- 
Ticket URL: <https://trac.ffmpeg.org/ticket/10148#comment:5>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker