TL;DR: I believe I have what is essentially a low-delay h264 stream coming live from a camera (even though it is not marked as such), but libavcodec insists on decoding it with a 4-frame lag.

Note: I posted about this 4 months ago; this is an updated version of the question, plus the partial answers I have found since.

I have a camera supplying an h264 stream whose SPS/PPS claims that it needs 4 reference frames, which causes decoding to lag the input by 4 frames. At 5 fps this is 0.8 seconds, which is very noticeable. It is also undesirable for this app: in many cases the security person watching the stream can see the same events happening out the window, so a delay of nearly a second is confusing and makes the system seem "broken".
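To put a number on that lag, here is a minimal sketch in C. The function name decode_lag_ms is my own and not part of libavcodec, and it assumes a constant frame rate and a decoder that holds back a fixed number of frames.

```c
#include <stdio.h>

/* Display lag in milliseconds when a decoder holds back `delay_frames`
 * frames of a constant-rate stream (hypothetical helper, not a libavcodec
 * function). Returns -1.0 if the frame rate is not positive. */
static double decode_lag_ms(int delay_frames, double fps)
{
    if (fps <= 0.0)
        return -1.0;
    return 1000.0 * delay_frames / fps;
}
```

With delay_frames=4 and fps=5 this gives 800 ms; the same 4-frame lag at 25 fps would be only 160 ms, which is why it hurts so much more on a low-frame-rate camera.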

However, according to the camera specs, and from looking at the streams themselves, they only ever contain I pictures and P pictures (never B pictures), and the pictures always arrive in display order. To my (limited) understanding of h264, that suggests it is possible to fully decode and display every picture as it arrives, since no picture is ever out of order or depends on a picture that has not yet arrived.
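The "only I and P pictures" claim can be checked directly on the bitstream. What follows is a rough sketch, not libavcodec code: the names BitReader, read_ue and nal_slice_type are my own. It assumes each NAL unit is passed in without its start code, and it ignores emulation prevention bytes, which I would not expect within the first two slice header fields of a normal stream. It reads first_mb_in_slice and slice_type (both Exp-Golomb coded) from NAL types 1 and 5.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;
} BitReader;

/* Returns the next bit (0 or 1), or -1 at end of buffer. */
static int read_bit(BitReader *br)
{
    if (br->pos >= br->size_bits)
        return -1;
    int bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
    br->pos++;
    return bit;
}

/* Reads an unsigned Exp-Golomb value ue(v).
 * Returns -1 on truncated input or an implausibly long prefix. */
static int read_ue(BitReader *br)
{
    int zeros = 0, bit;
    while ((bit = read_bit(br)) == 0) {
        if (++zeros > 16)
            return -1;
    }
    if (bit < 0)
        return -1;
    int value = 1;
    for (int i = 0; i < zeros; i++) {
        bit = read_bit(br);
        if (bit < 0)
            return -1;
        value = (value << 1) | bit;
    }
    return value - 1;
}

/* Classifies one NAL unit (start code already stripped).
 * Returns 'I', 'P' or 'B' for a coded slice, 'S' for SP/SI slices,
 * 0 for any NAL that is not a type 1 or type 5 slice, -1 on parse error. */
static int nal_slice_type(const uint8_t *nal, size_t size)
{
    if (size < 2)
        return -1;
    int nal_unit_type = nal[0] & 0x1F;
    if (nal_unit_type != 1 && nal_unit_type != 5)
        return 0;
    BitReader br = { nal + 1, (size - 1) * 8, 0 };
    if (read_ue(&br) < 0)          /* first_mb_in_slice, value unused */
        return -1;
    int slice_type = read_ue(&br); /* 0..9; values 5..9 repeat 0..4 */
    if (slice_type < 0 || slice_type > 9)
        return -1;
    switch (slice_type % 5) {
    case 0:  return 'P';
    case 1:  return 'B';
    case 2:  return 'I';
    default: return 'S';
    }
}
```

Running this over every NAL unit of a capture and never seeing 'B' would back up the camera spec. It does not by itself prove that the pictures are in output order, so I treat it as supporting evidence only.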

Another possible cause of such a delay is the multi-threaded decoder (which adds a lag of 1 frame for each thread). I have disabled it by setting thread_type to 0, so the delay I am still seeing is apparently caused by the multiple reference frames.

Alex Cohn suggested in <http://www.mail-archive.com/[email protected]/msg00590.html> to modify ff_h264_get_profile(). I've done this as a test, and it seems to work for the streams that I have. I've also tried patching the SPS/PPS manually, which solves the problem too in the few examples I've tried. But I would rather not rely on either patch: the first is ugly and requires me to rebuild ffmpeg myself every time, and the second is just plain ugly and error-prone.

Questions:

1. Would it work to patch ff_h264_decode_init() and decode_postinit() to also check the AV_DISCARD mode, and, if we are discarding B and/or nonref frames, set avctx->has_b_frames=0 and low_delay=1? Something like that would make it possible to convert _every_ stream to a low-delay stream by dropping the "non-low-delay" frames. For me, it would solve the problem (there are no B frames, so I wouldn't lose anything by discarding them). It would also be useful for e.g. seeking in a media player: when fast-forwarding by showing only I-frames, you would not need to read and discard 3 more frames in order to show an I frame you have just read.


2. Is it safe to just force low_delay=1 in my case?
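To make question 1 concrete, here is a stand-alone sketch of the check I have in mind. It is not libavcodec code: DiscardMode, DecoderState and apply_discard_low_delay are hypothetical names that only loosely mirror the skip-frame setting and the has_b_frames / low_delay fields, and the choice of "discarding B frames or more" as the threshold is my assumption.

```c
/* Hypothetical stand-ins for the decoder settings discussed above;
 * these are not the real libavcodec types. */
typedef enum {
    DISCARD_DEFAULT = 0,  /* keep everything useful */
    DISCARD_NONREF  = 8,  /* drop non-reference frames */
    DISCARD_BIDIR   = 16  /* drop all bidirectional (B) frames */
} DiscardMode;

typedef struct {
    DiscardMode skip_frame;
    int has_b_frames;  /* number of frames held back for reordering */
    int low_delay;     /* 1 = output each frame as soon as it is decoded */
} DecoderState;

/* Proposed rule: if the caller already asked to drop all B frames,
 * assume nothing is left to reorder and switch to low-delay output.
 * Otherwise leave the state untouched. */
static void apply_discard_low_delay(DecoderState *s)
{
    if (s->skip_frame >= DISCARD_BIDIR) {
        s->has_b_frames = 0;
        s->low_delay = 1;
    }
}
```

I left DISCARD_NONREF out of the condition on purpose, since reference B frames would still be decoded in that mode and could still need reordering. Whether that should also enable low delay is part of what I am asking.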

Thanks in advance for your time and thoughts,
Camera Man
_______________________________________________
libav-api mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-api
