Re: [Discuss-gnuradio] CPU Utilization and USRP2
On Thu, Nov 4, 2010 at 4:07 PM, Marc Epard wrote:
> This reminds me of a question. What do you guys use for profiling native code
> on Linux? I have a lot more experience on Mac OS where we have Shark,
> Instruments and the like.
>
> -Marc

Generally, I've used Oprofile. I have recently been exploring cachegrind and callgrind (with valgrind) for use with Kcachegrind. I'm really liking how it displays the results, but I'm still fairly new with it (note: you can also use Kcachegrind with oprofile output).

Tom

___
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
http://lists.gnu.org/mailman/listinfo/discuss-gnuradio
Re: [Discuss-gnuradio] CPU Utilization and USRP2
On 11/04/2010 01:25 PM, Marcus D. Leech wrote:
> On 11/04/2010 03:23 PM, Josh Blum wrote:
>> Well, there is extra overhead. A "pirate" thread in the receive path
>> spins on the socket and inspects the contents. The packet may be an
>> asynchronous message packet for flow control or destined for the user.
>> Or it may be a data packet, in which case it is placed into a queue to
>> be popped off by the device::recv() call. No extra memcopies, it's just
>> managing pointers.
>
> When you say that this thread "spins", do you mean that it's in an
> infinite loop, waiting on blocking or non-blocking I/O? That is, does
> it pause while it waits for data, or is it in a tight CPU loop?

It's a blocking call to a socket ::recv with a timeout.
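For readers following the "spins" question, here is a minimal Python sketch of what a blocking receive with a timeout looks like, as opposed to a tight polling loop. It is not UHD code: the loopback UDP socket, the 100 ms timeout, and the helper name `recv_or_none` are all assumptions chosen for illustration.

```python
import socket
import time

# Receiver bound to an ephemeral loopback port (a stand-in for the device socket).
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
rx.settimeout(0.1)  # assumed value: recv blocks for at most 100 ms


def recv_or_none(sock, bufsize=2048):
    """Block until a datagram arrives or the timeout expires (hypothetical helper)."""
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        return None


# No data pending: the thread sleeps inside recv, then the call gives up.
wall_start = time.monotonic()
cpu_start = time.process_time()
first = recv_or_none(rx)
waited = time.monotonic() - wall_start
cpu_used = time.process_time() - cpu_start

# Once a datagram is pending, the same call returns it.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"packet", rx.getsockname())
second = recv_or_none(rx)

tx.close()
rx.close()
```

Comparing `waited` (wall-clock time) with `cpu_used` (CPU time) shows that the thread is paused while it waits rather than burning cycles, which is the distinction the question above is drawing.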
Re: [Discuss-gnuradio] CPU Utilization and USRP2
On Thu, Nov 04, 2010 at 03:07:42PM -0500, Marc Epard wrote:
> This reminds me of a question. What do you guys use for profiling
> native code on Linux? I have a lot more experience on Mac OS where
> we have Shark, Instruments and the like.

Marc, I like to use oprofile. It's packaged for Fedora and Ubuntu (and probably the rest). It gets the job done using the h/w performance counters, and as such, the measurement doesn't perturb the "regular" execution time, and there's no need to recompile with special options.

It would be a great tool to use on this UHD problem to get a better idea of exactly where the cycles are getting burned.

http://oprofile.sourceforge.net/docs

On Fedora 13:

$ rpm -qa | grep -i oprofile
oprofile-0.9.6-6.fc13.x86_64
oprofile-gui-0.9.6-6.fc13.x86_64

Eric
Re: [Discuss-gnuradio] CPU Utilization and USRP2
On 11/04/2010 03:23 PM, Josh Blum wrote:
> Well, there is extra overhead. A "pirate" thread in the receive
> path spins on the socket and inspects the contents. The packet may be
> an asynchronous message packet for flow control or destined for the
> user. Or it may be a data packet, in which case it is placed into a
> queue to be popped off by the device::recv() call. No extra memcopies,
> it's just managing pointers.

When you say that this thread "spins", do you mean that it's in an infinite loop, waiting on blocking or non-blocking I/O? That is, does it pause while it waits for data, or is it in a tight CPU loop?

--
Principal Investigator
Shirleys Bay Radio Astronomy Consortium
http://www.sbrac.org
Re: [Discuss-gnuradio] CPU Utilization and USRP2
This reminds me of a question. What do you guys use for profiling native code on Linux? I have a lot more experience on Mac OS where we have Shark, Instruments and the like.

-Marc
Re: [Discuss-gnuradio] CPU Utilization and USRP2
Well, there is extra overhead. A "pirate" thread in the receive path spins on the socket and inspects the contents. The packet may be an asynchronous message packet for flow control or destined for the user. Or it may be a data packet, in which case it is placed into a queue to be popped off by the device::recv() call. No extra memcopies, it's just managing pointers.

Could this pirate thread be removed? Yes, if the async messages came in over a different UDP port and the multi-device buffer alignment logic were rewritten to be event driven (run when recv() is called). And I will probably implement this when I get the time. :-)

So, my best guess is that you are mostly seeing the overhead of the thread inspecting the packets. Of course, there is also additional overhead added by using UDP, parsing VRT packets, and parsing inline message packets.

Thanks for testing it out BTW!
-Josh

On 11/04/2010 10:46 AM, David Campbell wrote:
> Hi All,
> I've noticed that the C++ interfaces provided in gnu-radio and UHD for
> usrp2 data streaming are CPU-intensive (UHD more so than gnu-radio). I am
> wondering if there are easy ways to mitigate this, or whether there are
> plans to reduce it in the future. For UHD, a decimate-by-16 process chews
> up 75% of a CPU just in the uhd::device::recv function (not much less even
> when I use RECV_MODE_FULL_BUFF and size the buffer to be 100x the size of
> a single packet). For gnuradio, the CPU utilization is more like 36% -
> still a lot.
>
> I may try to recode some of the lower-level interfaces in UHD if there is
> not an easy way to help improve CPU utilization.
>
> Thanks for your help,
> David
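The receive-path arrangement Josh describes (one thread inspecting every packet and either handling it as an async message or queueing it for the user's recv call) can be sketched roughly as follows. This is an illustration in Python, not the UHD implementation: the packet tags `DATA` and `ASYNC_MSG`, the function names, and the in-memory packet list standing in for the socket are all hypothetical.

```python
import queue
import threading

ASYNC_MSG = 0  # hypothetical tag: flow-control or user-bound async message
DATA = 1       # hypothetical tag: sample data


def pirate_loop(source, data_queue, async_msgs):
    """Inspect each incoming packet and route it by type.

    `source` is any iterable of (kind, payload) tuples standing in for
    packets read off the socket.
    """
    for kind, payload in source:
        if kind == DATA:
            # Only a reference is queued; the payload itself is not copied.
            data_queue.put(payload)
        else:
            async_msgs.append(payload)
    data_queue.put(None)  # sentinel: the source is exhausted


def recv(data_queue, timeout=1.0):
    """Pop the next data payload, as a user-facing recv() call might."""
    return data_queue.get(timeout=timeout)


packets = [(DATA, b"samples-0"), (ASYNC_MSG, b"flow-ctrl"), (DATA, b"samples-1")]
data_queue = queue.Queue()
async_msgs = []

worker = threading.Thread(target=pirate_loop, args=(packets, data_queue, async_msgs))
worker.start()
worker.join()

received = []
while True:
    item = recv(data_queue)
    if item is None:
        break
    received.append(item)
```

Even with no payload copies, every packet passes through an extra thread and a queue hand-off before the caller sees it, which is the per-packet overhead Josh suspects is showing up in the measurements.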