Hi,

In November, we wrote on the mailing list about implementing support for TR-03 in FFmpeg [1]. There were some doubts in the FFmpeg community about whether FFmpeg could demux 3 Gbps of RTP input without significantly modifying the RTP demuxer and/or resorting to kernel bypass.
CBC/Radio-Canada contracted us to test what was possible and to try to implement TR-03 in FFmpeg. Using 2 servers connected through a switch by 10 Gbps fibre optic links, we performed several tests with various tools. They showed that it should be possible to receive and demux 3 Gbps of RTP raw video, provided the RX queue in the NIC and the socket receive buffer are large enough. We then patched FFmpeg to support depayloading 8-bit and 10-bit raw video [2] and to process the input stream on a separate thread [3]. This allowed us to successfully receive a 3 Gbps raw video stream in FFmpeg and write the raw video to disk. We were also able to transcode it to H.264. Thus it seems to us that FFmpeg should be able to support TR-03 without significant modifications or kernel bypass.

Below is a more detailed description of our testing and development process:

1. In the Linux kernel: using the iperf tool, we verified that the Linux kernel can handle 3 Gbps of UDP streams with payload sizes of 800 to 1450 bytes.

2. Using a simple RTP demuxer, we verified that a user-space program can handle a 3 Gbps stream without dropping packets. When we added an increasing amount of processing per packet, we observed that packets were eventually dropped. We concluded that per-packet processing must be kept to a minimum to receive a 3 Gbps video stream.

3. We experimented with GStreamer, which already implements an RTP raw video muxer/demuxer. We were able to send a 3 Gbps video stream without dropping any packets. On reception, however, we experienced around 20% packet loss with a 3 Gbps video stream, because the thread in charge of reading the socket was using 100% CPU. The GStreamer team is aware of this and has ideas to significantly reduce CPU usage by batching the per-packet processing with the recvmmsg syscall.

4. In FFmpeg:
   * We implemented an RTP demuxer compatible with RFC 4175, supporting the 4:2:2 8-bit and 4:2:2 10-bit pixel formats [2].
   * Checking the ffmpeg tool code, we saw that a separate input thread is used only when there is more than one input. With a minimal pipeline that reads an RTP stream from a socket and writes the raw video to a file, we observed that packets were dropped because too much time was spent on packet processing. We modified the ffmpeg tool to force the use of a dedicated input thread.

5. Several queues sit between packet reception and packet processing. Tuning each queue allowed us to have zero dropped packets:
   * NIC queue: using ethtool, we increased the queue size from 453 to its maximum (4078) to avoid packets being dropped in the NIC queue.
   * Kernel queue: we observed no dropped packets after increasing the queue size to 16 MB.
   * Jitter buffer queue (FFmpeg): by default, the jitter buffer is sized for 500 packets. With 1080p raw video (RFC 4175), we calculated that a single video frame results in around 3000 packets. We could increase the jitter buffer size to be more resilient to packet reordering, but we observed that a large jitter buffer adds significant per-packet processing, which in turn leads to packets being dropped in the kernel. In addition, RFC 4175 itself provides resilience to packet reordering within a video frame, since each packet carries the line number and pixel offset of its data.

Results:
* With:
  - our test setup, composed of 2 servers running CentOS 7 linked by a 10 Gbps switch,
  - our modified FFmpeg, which handles RFC 4175 and has improved reading performance,
  - the NIC and kernel queues tuned and the FFmpeg jitter buffer disabled,
  we were able to:
  - send a 3 Gbps video stream with GStreamer,
  - receive a 3 Gbps 4:2:2 8-bit video stream with FFmpeg without dropping any packets or producing any video artifacts.
* However, with the 4:2:2 10-bit (packed) pixel format, we encountered a performance degradation. Packed 4:2:2 10-bit is not supported in FFmpeg, so we decided to convert it to a planar 4:2:2 10-bit format.
  We believe that this conversion adds too much processing per packet and thus leads to dropped packets. We are nonetheless able to stream (and live transcode) 1080p 60 fps 4:2:2 10-bit video without dropping packets; in this case, the received bandwidth is around 2.2 Gbps.

[1]: http://ffmpeg.org/pipermail/ffmpeg-devel/2016-November/202554.html
[2]: http://ffmpeg.org/pipermail/ffmpeg-devel/2017-February/207253.html

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel