#10148: TS encoding of H.264 omits the SPS and PPS metadata
-------------------------------------+-------------------------------------
Reporter: John Coiner | Owner: (none)
Type: defect | Status: new
Priority: normal | Component: undetermined
Version: git-master | Resolution:
Keywords: | Blocked By:
Blocking: | Reproduced by developer: 0
Analyzed by developer: 0 |
-------------------------------------+-------------------------------------
Changes (by John Coiner):
* summary: M2TS encoding of H.264 omits the SPS and PPS metadata => TS
encoding of H.264 omits the SPS and PPS metadata
New description:
== Summary of the bug

Users of OBS who stream with HLS are transmitting noncompliant HLS
streams, and the root cause is in FFmpeg's '''mpegtsenc.c'''.

== Background

=== HLS

Each chunk of an HLS stream should be independently decodable: it should
begin with a key frame and include any metadata (e.g. SPS, PPS) needed to
initialize the decoders. Section 3 of the HLS RFC (RFC 8216) requires this:
"Any Media Segment that contains video SHOULD include enough information to
initialize a video decoder and decode a continuous set of frames ..."
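
One way to check this property on a segment's video data is to look for SPS and PPS NAL units ahead of the first picture. The sketch below assumes the H.264 data has already been extracted from the TS segment as an Annex B byte stream. The names `next_nal`, `segment_starts_decodable` and `check_examples`, and the sample byte arrays, are hypothetical and not part of FFmpeg or any HLS tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the NAL header byte after the next 00 00 01 start code at or
 * after pos, or size if no further start code exists. */
static size_t next_nal(const uint8_t *buf, size_t size, size_t pos)
{
    for (; pos + 3 < size; pos++)
        if (buf[pos] == 0 && buf[pos + 1] == 0 && buf[pos + 2] == 1)
            return pos + 3;
    return size;
}

/* Returns 1 if SPS (7) and PPS (8) both precede the first picture and that
 * picture is an IDR slice (5); returns 0 otherwise. */
static int segment_starts_decodable(const uint8_t *buf, size_t size)
{
    int seen_sps = 0, seen_pps = 0;
    size_t pos = 0;

    while ((pos = next_nal(buf, size, pos)) < size) {
        int type = buf[pos] & 0x1f;
        if (type == 7)
            seen_sps = 1;
        else if (type == 8)
            seen_pps = 1;
        else if (type == 5)
            return seen_sps && seen_pps;
        else if (type == 1)
            return 0; /* first picture is not a key frame */
    }
    return 0; /* no picture found */
}

/* Hypothetical, truncated sample streams; payload bytes are placeholders. */
static const uint8_t with_params[] = {
    0, 0, 0, 1, 0x09, 0xf0, /* AUD */
    0, 0, 0, 1, 0x67, 0x42, /* SPS */
    0, 0, 0, 1, 0x68, 0xce, /* PPS */
    0, 0, 0, 1, 0x65, 0x88  /* IDR slice */
};
static const uint8_t without_params[] = {
    0, 0, 0, 1, 0x09, 0xf0, /* AUD */
    0, 0, 0, 1, 0x65, 0x88  /* IDR slice */
};

/* Usage example: only the first stream can initialize a decoder by itself. */
static void check_examples(void)
{
    assert(segment_starts_decodable(with_params, sizeof(with_params)) == 1);
    assert(segment_starts_decodable(without_params, sizeof(without_params)) == 0);
}
```

A segment shaped like `without_params` is the kind this ticket is about: it begins with a key frame but carries no parameter sets.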

=== YouTube

I work at YouTube Live, where these noncompliant HLS streams cause
certain headaches. Today YouTube is able to support them, with the caveat
that noncompliant HLS uploads reduce the reliability of the resulting
livestream broadcasts. In the future, it would be nice if OBS produced
compliant streams and YouTube could eventually reject noncompliant HLS.

=== The Bug

The OBS HLS implementation uses the muxer in '''mpegtsenc.c''' to format
media into TS segments.

For H.264, this file has logic that intends to take the "extradata" (which
includes the SPS and PPS) from the stream's codec parameters
('''st->codecpar''') and re-emit it at the beginning of key-frame segments.
This is the logic:

{{{
if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
    const uint8_t *p = buf, *buf_end = p + size;
    uint32_t state = -1;
    int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ?
                  st->codecpar->extradata_size : 0;
    int ret = ff_check_h264_startcode(s, st, pkt);
    if (ret < 0)
        return ret;

    if (extradd && AV_RB24(st->codecpar->extradata) > 1)
        extradd = 0;

    do {
        p = avpriv_find_start_code(p, buf_end, &state);
        av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
        if ((state & 0x1f) == 7)
            extradd = 0;
    } while (p < buf_end && (state & 0x1f) != 9 &&
             (state & 0x1f) != 5 && (state & 0x1f) != 1);

    if ((state & 0x1f) != 5)
        extradd = 0;
    if ((state & 0x1f) != 9) { // AUD NAL
        data = av_malloc(pkt->size + 6 + extradd);
        if (!data)
            return AVERROR(ENOMEM);
        memcpy(data + 6, st->codecpar->extradata, extradd);
        memcpy(data + 6 + extradd, pkt->data, pkt->size);
        AV_WB32(data, 0x00000001);
        data[4] = 0x09;
        data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
        buf = data;
        size = pkt->size + 6 + extradd;
    }
} else ...
}}}

This code scans the segment's NALs until it reaches an Access Unit
Delimiter (9), an IDR picture (5), or a non-IDR picture (1). If it passes
an SPS (7) along the way, it clears '''extradd''', because the data already
carries its own parameter sets. Only when the scan stops at an IDR picture
(a key frame) does it re-emit the "extradata" into its output buffer ahead
of the key frame.

The problem is that encoders sometimes preface the key frame with their own
Access Unit Delimiter. In that case the scan stops at the delimiter, the
'''(state & 0x1f) != 5''' check zeroes '''extradd''', and the "extradata" is
not repeated. The resulting HLS stream may be noncompliant and unjoinable.
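
The failure case can be illustrated outside FFmpeg with a simplified model of the decision above. This is a sketch under stated assumptions, not the muxer's code: `next_nal` and `extradd_original` are hypothetical helpers standing in for `avpriv_find_start_code` and the quoted loop, the `AV_RB24` extradata-format check is left out, and the sample packets are hand-built placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the NAL header byte after the next 00 00 01 start code at or
 * after pos, or size if no further start code exists. */
static size_t next_nal(const uint8_t *buf, size_t size, size_t pos)
{
    for (; pos + 3 < size; pos++)
        if (buf[pos] == 0 && buf[pos + 1] == 0 && buf[pos + 2] == 1)
            return pos + 3;
    return size;
}

/* Models the quoted do/while loop: returns how many extradata bytes would
 * be prepended to the packet. */
static int extradd_original(const uint8_t *buf, size_t size,
                            int is_key, int extradata_size)
{
    int extradd = is_key ? extradata_size : 0;
    int type = -1; /* no NAL seen yet */
    size_t pos = 0;

    do {
        pos = next_nal(buf, size, pos);
        if (pos >= size)
            break;
        type = buf[pos] & 0x1f;
        if (type == 7) /* SPS already present in the packet */
            extradd = 0;
    } while (type != 9 && type != 5 && type != 1);

    if (type != 5) /* the scan did not stop at an IDR picture */
        extradd = 0;
    return extradd;
}

/* Hypothetical key-frame packets; payload bytes are placeholders. */
static const uint8_t idr_first[] = {
    0, 0, 0, 1, 0x65, 0x88  /* IDR slice */
};
static const uint8_t aud_then_idr[] = {
    0, 0, 0, 1, 0x09, 0xf0, /* AUD */
    0, 0, 0, 1, 0x65, 0x88  /* IDR slice */
};

/* Usage example with an assumed extradata size of 24 bytes. */
static void check_examples(void)
{
    /* Key frame without a leading AUD: extradata is repeated. */
    assert(extradd_original(idr_first, sizeof(idr_first), 1, 24) == 24);
    /* Key frame behind an AUD: the scan stops at the AUD, nothing is added. */
    assert(extradd_original(aud_then_idr, sizeof(aud_then_idr), 1, 24) == 0);
}
```

The second packet is the layout described above: it is flagged as a key frame, yet the model reports that no extradata would be inserted.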

== Proposed Fix?

Could this be more robustly written as:

{{{
if (st->codecpar->codec_id == AV_CODEC_ID_H264) {
    const uint8_t *p = buf, *buf_end = p + size;
    uint32_t state = -1;
    int extradd = (pkt->flags & AV_PKT_FLAG_KEY) ?
                  st->codecpar->extradata_size : 0;
    int ret = ff_check_h264_startcode(s, st, pkt);
    if (ret < 0)
        return ret;

    if (extradd && AV_RB24(st->codecpar->extradata) > 1)
        extradd = 0;

    while (p < buf_end
           && extradd > 0
           && (state & 0x1f) != 5 // IDR picture
           && (state & 0x1f) != 1 // non-IDR picture
           ) {
        p = avpriv_find_start_code(p, buf_end, &state);
        av_log(s, AV_LOG_TRACE, "nal %"PRId32"\n", state & 0x1f);
        if ((state & 0x1f) == 7) // SPS NAL
            extradd = 0;
    }

    if ((state & 0x1f) != 5) {
        // Not an IDR picture
        extradd = 0;
    }

    if (extradd > 0) {
        data = av_malloc(pkt->size + 6 + extradd);
        if (!data)
            return AVERROR(ENOMEM);
        memcpy(data + 6, st->codecpar->extradata, extradd);
        memcpy(data + 6 + extradd, pkt->data, pkt->size);
        AV_WB32(data, 0x00000001);
        data[4] = 0x09;
        data[5] = 0xf0; // any slice type (0xe) + rbsp stop one bit
        buf = data;
        size = pkt->size + 6 + extradd;
    }
} else ...
}}}

This alternative code ignores Access Unit Delimiters, so it's insensitive to
whether they appear at the beginning or end of a segment. It scans for the
first picture, and if that is an IDR picture it prefixes the segment with a
new Access Unit Delimiter followed by the '''extradata'''.

Using captured media from one of the problematic uploads, I've confirmed
that this does reinsert '''extradata''' in each HLS segment, producing a
stream that is continuously joinable.

It also passes FATE.
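
The effect of the proposed loop can be sketched with a simplified model run on hand-built packets. The helper names `next_nal` and `extradd_proposed`, the assumed 24-byte extradata size, and the sample byte arrays are all hypothetical; the model mirrors only the scan decision, not the buffer handling or the `AV_RB24` check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the NAL header byte after the next 00 00 01 start code at or
 * after pos, or size if no further start code exists. */
static size_t next_nal(const uint8_t *buf, size_t size, size_t pos)
{
    for (; pos + 3 < size; pos++)
        if (buf[pos] == 0 && buf[pos + 1] == 0 && buf[pos + 2] == 1)
            return pos + 3;
    return size;
}

/* Models the proposed while loop: AUDs are skipped, and the scan ends at
 * the first picture or as soon as an SPS is seen. */
static int extradd_proposed(const uint8_t *buf, size_t size,
                            int is_key, int extradata_size)
{
    int extradd = is_key ? extradata_size : 0;
    int type = -1; /* no NAL seen yet */
    size_t pos = 0;

    while (pos < size && extradd > 0 && type != 5 && type != 1) {
        pos = next_nal(buf, size, pos);
        if (pos >= size)
            break;
        type = buf[pos] & 0x1f;
        if (type == 7) /* SPS already present in the packet */
            extradd = 0;
    }

    if (type != 5) /* first picture is not an IDR picture */
        extradd = 0;
    return extradd;
}

/* Hypothetical packets; payload bytes are placeholders. */
static const uint8_t aud_then_idr[] = {
    0, 0, 0, 1, 0x09, 0xf0, /* AUD */
    0, 0, 0, 1, 0x65, 0x88  /* IDR slice */
};
static const uint8_t aud_sps_pps_idr[] = {
    0, 0, 0, 1, 0x09, 0xf0, /* AUD */
    0, 0, 0, 1, 0x67, 0x42, /* SPS */
    0, 0, 0, 1, 0x68, 0xce, /* PPS */
    0, 0, 0, 1, 0x65, 0x88  /* IDR slice */
};
static const uint8_t aud_then_non_idr[] = {
    0, 0, 0, 1, 0x09, 0xf0, /* AUD */
    0, 0, 0, 1, 0x41, 0x9a  /* non-IDR slice */
};

static void check_examples(void)
{
    /* A leading AUD no longer suppresses the extradata. */
    assert(extradd_proposed(aud_then_idr, sizeof(aud_then_idr), 1, 24) == 24);
    /* Packets that already carry an SPS are left alone. */
    assert(extradd_proposed(aud_sps_pps_idr, sizeof(aud_sps_pps_idr), 1, 24) == 0);
    /* Nothing is added ahead of a non-IDR picture. */
    assert(extradd_proposed(aud_then_non_idr, sizeof(aud_then_non_idr), 1, 24) == 0);
}
```

In this model, the AUD-then-IDR packet is the one case where extradata is added, and it is the layout from the problematic uploads.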

== How to reproduce

I don't know how these content creators are configuring their OBS setups.
(We know that OBS has the problem thanks to its User-Agent of "libobs...")
My guess is:

 * They are using some VAAPI-supported hardware. Unlike the NVENC
   driver, the VAAPI driver in OBS has no way to hint to the underlying
   hardware that it should repeat SPS and PPS. Empirically, the software
   H.264 encoder appears to repeat SPS and PPS.
 * The underlying hardware or driver defaults to never repeating SPS and
   PPS.
 * The encoded bitstream interacts with this bug to produce an
   unjoinable HLS stream.

More than half of OBS+H.264+HLS users are producing compliant HLS streams,
likely because they are using an encoder that repeats SPS and PPS and
thus doesn't rely on '''mpegtsenc.c''' to repeat them.

It's also possible to reproduce the bug with a canned input file and a
single ffmpeg command line (of the form "ffmpeg -i input_file -codec copy
-f hls ...") that yields a noncompliant, unjoinable HLS stream. I can
follow up with such a file if that would be helpful.
--
Ticket URL: <https://trac.ffmpeg.org/ticket/10148#comment:5>
FFmpeg <https://ffmpeg.org>
FFmpeg issue tracker