[Discuss-gnuradio] CPU Utilization and USRP2
Hi All,

I've noticed that the C++ interfaces provided in GNU Radio and UHD for USRP2 data streaming are CPU-intensive (UHD more so than GNU Radio). Are there easy ways to mitigate this, or are there plans to reduce the overhead in the future? With UHD, a decimate-by-16 process uses 75% of a CPU in the uhd::device::recv function alone (not much less even when I use RECV_MODE_FULL_BUFF and size the buffer to be 100x the size of a single packet). With GNU Radio's interface, the CPU utilization is more like 36%, which is still a lot. I may try to recode some of the lower-level interfaces in UHD if there is no easy way to improve CPU utilization.

Thanks for your help,
David
___
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
http://lists.gnu.org/mailman/listinfo/discuss-gnuradio
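For a rough sense of the load behind the decimate-by-16 figure, the sample, byte, and packet rates can be estimated with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 100 MS/s converter rate, 4 bytes per complex sample on the wire, and 1472 payload bytes per UDP packet are assumptions that do not come from this thread, and `stream_load` and `estimate_load` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Estimated streaming load for one receive channel.
struct stream_load {
    double samples_per_sec;
    double bytes_per_sec;
    double packets_per_sec;
};

// All inputs are assumptions supplied by the caller, for example:
//   adc_rate_hz      = 100e6  (assumed converter rate)
//   decimation       = 16     (the case discussed in this thread)
//   bytes_per_sample = 4      (assumed 16-bit I plus 16-bit Q)
//   payload_bytes    = 1472   (assumed UDP payload, protocol headers ignored)
stream_load estimate_load(double adc_rate_hz, unsigned decimation,
                          std::size_t bytes_per_sample,
                          std::size_t payload_bytes) {
    assert(decimation > 0 && payload_bytes > 0);
    stream_load load{};
    load.samples_per_sec = adc_rate_hz / decimation;
    load.bytes_per_sec = load.samples_per_sec * static_cast<double>(bytes_per_sample);
    load.packets_per_sec = load.bytes_per_sec / static_cast<double>(payload_bytes);
    return load;
}
```

Under those assumptions, estimate_load(100e6, 16, 4, 1472) gives 6.25 MS/s and 25 MB/s, or somewhat under 17,000 packets per second, so any fixed per-packet cost in the receive path is paid many thousands of times each second.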
Re: [Discuss-gnuradio] CPU Utilization and USRP2
Well, there is extra overhead. A pirate thread in the receive path spins on the socket and inspects the contents of each packet. The packet may be an asynchronous message packet, either for flow control or destined for the user. Or it may be a data packet, in which case it is placed into a queue to be popped off by the device::recv() call. No extra memcopies; it's just managing pointers.

Could this pirate thread be removed? Yes, if the async messages came in over a different UDP port and the multi-device buffer alignment logic were rewritten to be event-driven (run when recv() is called). I will probably implement this when I get the time. :-)

So, my best guess is that you are mostly seeing the overhead of the thread inspecting the packets. Of course, there is also additional overhead from using UDP, parsing VRT packets, and parsing inline message packets.

Thanks for testing it out BTW!
-Josh

(In reply to David Campbell's message of 11/04/2010 10:46 AM.)
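The hand-off described above (inspect each packet, handle async messages, queue data packets by pointer for recv() to pop) can be sketched roughly as follows. This is a simplified model, not UHD's code: `packet`, `packet_queue`, and `route_packets` are hypothetical names, a boolean flag stands in for real VRT header parsing, and the multi-device alignment logic is left out entirely.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical packet type; a flag replaces real header inspection.
struct packet {
    bool is_async_msg;                  // true: async/flow-control message
    std::vector<std::uint8_t> payload;  // sample bytes for data packets
};
using packet_ptr = std::shared_ptr<packet>;

// Thread-safe queue that stores pointers only, so payloads are never copied.
class packet_queue {
public:
    void push(packet_ptr p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(p));
        }
        cond_.notify_one();
    }

    // Blocks until a data packet is available, then hands over its pointer.
    packet_ptr pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cond_.wait(lock, [this] { return !queue_.empty(); });
        packet_ptr p = std::move(queue_.front());
        queue_.pop_front();
        return p;
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::deque<packet_ptr> queue_;
};

// Stand-in for the inspecting thread's work on a batch of received packets.
// Data packets are queued by pointer; async messages are only counted here,
// where a real implementation would act on them. Returns the async count.
std::size_t route_packets(const std::vector<packet_ptr>& incoming,
                          packet_queue& data_queue) {
    std::size_t async_count = 0;
    for (const packet_ptr& p : incoming) {
        assert(p != nullptr);
        if (p->is_async_msg) {
            ++async_count;
        } else {
            data_queue.push(p);  // copies the shared_ptr, not the payload
        }
    }
    return async_count;
}
```

In a design like this, route_packets would run in its own thread, looping over whatever the socket delivers, while the caller of recv() blocks in pop(). Even with no payload copies, the inspection and the lock/notify pair are paid once per packet, which is consistent with the per-packet overhead guessed at above.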
Re: [Discuss-gnuradio] CPU Utilization and USRP2
This reminds me of a question. What do you guys use for profiling native code on Linux? I have a lot more experience on Mac OS where we have Shark, Instruments and the like.
-Marc
Re: [Discuss-gnuradio] CPU Utilization and USRP2
On 11/04/2010 03:23 PM, Josh Blum wrote:
> Well, there is extra overhead. A pirate thread in the receive path spins on the socket and inspects the contents. The packet may be an asynchronous message packet for flow control or destined for the user. Or it may be a data packet, in which case it is placed into a queue to be popped off by the device::recv() call. No extra memcopies, it's just managing pointers.

When you say that this thread spins, do you mean that it's in an infinite loop, waiting on blocking or non-blocking I/O? That is, does it pause while it waits for data, or is it in a tight CPU loop?

--
Principal Investigator
Shirleys Bay Radio Astronomy Consortium
http://www.sbrac.org
Re: [Discuss-gnuradio] CPU Utilization and USRP2
On Thu, Nov 04, 2010 at 03:07:42PM -0500, Marc Epard wrote:
> This reminds me of a question. What do you guys use for profiling native code on Linux? I have a lot more experience on Mac OS where we have Shark, Instruments and the like.

Marc, I like to use oprofile. It's packaged for Fedora and Ubuntu (and probably the rest). It gets the job done using the hardware performance counters, so the measurement perturbs normal execution very little, and there's no need to recompile with special options. It would be a great tool to use on this UHD problem to get a better idea of exactly where the cycles are getting burned.

http://oprofile.sourceforge.net/docs

On Fedora 13:

    $ rpm -qa | grep -i oprofile
    oprofile-0.9.6-6.fc13.x86_64
    oprofile-gui-0.9.6-6.fc13.x86_64

Eric
Re: [Discuss-gnuradio] CPU Utilization and USRP2
On 11/04/2010 01:25 PM, Marcus D. Leech wrote:
> When you say that this thread spins, do you mean that it's in an infinite loop, waiting on blocking, or non-blocking I/O? That is, does it pause while it waits for data, or is it in a tight CPU loop?

It's a blocking call to a socket ::recv with a timeout.
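For readers unfamiliar with the pattern, here is one way a blocking receive with a timeout can look on a POSIX UDP socket. This is a generic sketch using the SO_RCVTIMEO socket option, not a description of how UHD implements its receive loop; the helper names `open_udp_with_timeout` and `recv_with_timeout` are hypothetical, and the socket is bound to loopback purely for illustration.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Opens a UDP socket bound to an ephemeral loopback port and gives it a
// receive timeout. Returns the file descriptor, or -1 on failure.
int open_udp_with_timeout(long timeout_ms) {
    assert(timeout_ms > 0);  // a zero timeval would mean "block forever"
    int fd = ::socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the kernel pick a free port

    timeval tv{};
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0) {
        ::close(fd);
        return -1;
    }
    return fd;
}

// Blocks in recv() until a datagram arrives or the timeout expires.
// Returns the byte count on success; on failure returns -1 and sets
// timed_out to true only if the failure was the timeout expiring.
ssize_t recv_with_timeout(int fd, void* buf, std::size_t len, bool& timed_out) {
    timed_out = false;
    ssize_t n = ::recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
        timed_out = true;
    }
    return n;
}
```

With a call like this, the thread sleeps in the kernel while no data is available and wakes either on a packet or when the timeout expires, so an idle wait should cost little CPU. At high packet rates the thread wakes for nearly every packet, and the cost is then dominated by per-packet work rather than by waiting.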
Re: [Discuss-gnuradio] CPU Utilization and USRP2
On Thu, Nov 4, 2010 at 4:07 PM, Marc Epard mep...@me.com wrote:
> This reminds me of a question. What do you guys use for profiling native code on Linux? I have a lot more experience on Mac OS where we have Shark, Instruments and the like. -Marc

Generally, I've used OProfile. I have recently been exploring cachegrind and callgrind (with Valgrind) for use with KCachegrind. I really like how it displays the results, but I'm still fairly new to it (note: you can also use KCachegrind with OProfile output).

Tom