Re: [Discuss-gnuradio] CPU Utilization and USRP2

2010-11-04 Thread Tom Rondeau
On Thu, Nov 4, 2010 at 4:07 PM, Marc Epard wrote:
> This reminds me of a question. What do you guys use for profiling native code 
> on Linux? I have a lot more experience on Mac OS where we have Shark, 
> Instruments and the like.
>
> -Marc

Generally, I've used OProfile. I have recently been exploring
Cachegrind and Callgrind (both Valgrind tools) for use with KCachegrind.
I really like how it displays the results, but I'm still fairly new
to it (note: you can also use KCachegrind with OProfile output).

Tom




___
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
http://lists.gnu.org/mailman/listinfo/discuss-gnuradio


Re: [Discuss-gnuradio] CPU Utilization and USRP2

2010-11-04 Thread Josh Blum



On 11/04/2010 01:25 PM, Marcus D. Leech wrote:
> On 11/04/2010 03:23 PM, Josh Blum wrote:
>> Well, there is extra overhead. A "pirate" thread in the receive
>> path spins on the socket and inspects the contents. The packet may be
>> an asynchronous message packet for flow control or destined for the
>> user. Or it may be a data packet, in which case it is placed into a
>> queue to be popped off by the device::recv() call. No extra memcopies,
>> it's just managing pointers.
>
> When you say that this thread "spins", do you mean that it's in an
> infinite loop, waiting on blocking or non-blocking I/O?  That is, does
> it pause while it waits for data, or is it in a tight CPU loop?

It's a blocking call to a socket ::recv with a timeout.
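
To illustrate the distinction being asked about, here is a minimal Python sketch (not UHD's code; the function names, the 2048-byte buffer size and the 0.2 s timeout are assumptions chosen for the example). It waits on a UDP socket that nothing sends to and compares wall-clock time with CPU time. A blocking recv with a timeout sleeps in the kernel while it waits, so the CPU time stays far below the wall time, unlike a tight polling loop.

```python
import socket
import time


def wait_for_packet(sock, timeout_s):
    """Block in recv() for up to timeout_s seconds; return bytes, or None on timeout."""
    sock.settimeout(timeout_s)
    try:
        return sock.recv(2048)
    except socket.timeout:
        return None


def measure_idle_wait(timeout_s=0.2):
    """Return (result, wall_seconds, cpu_seconds) for one wait that times out."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 0))  # ephemeral local port; nothing sends to it
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    result = wait_for_packet(sock, timeout_s)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.monotonic() - wall_start
    sock.close()
    return result, wall_used, cpu_used


if __name__ == "__main__":
    result, wall, cpu = measure_idle_wait()
    print(f"result={result!r} wall={wall:.3f}s cpu={cpu:.3f}s")
```

If the thread only wakes when a packet arrives or the timeout expires, any remaining CPU cost comes from the per-packet work rather than from the waiting itself.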



Re: [Discuss-gnuradio] CPU Utilization and USRP2

2010-11-04 Thread Eric Blossom
On Thu, Nov 04, 2010 at 03:07:42PM -0500, Marc Epard wrote:

> This reminds me of a question. What do you guys use for profiling
> native code on Linux? I have a lot more experience on Mac OS where
> we have Shark, Instruments and the like.

Marc,

I like to use oprofile.  It's packaged for Fedora and Ubuntu (and
probably the rest).  It gets the job done using the hardware performance
counters, so the measurement barely perturbs the "regular" execution
time, and there's no need to recompile with special options.

It would be a great tool to use on this UHD problem to get a better
idea of exactly where the cycles are getting burned.

http://oprofile.sourceforge.net/docs

On Fedora 13:

  $ rpm -qa | grep -i oprofile
  oprofile-0.9.6-6.fc13.x86_64
  oprofile-gui-0.9.6-6.fc13.x86_64

Eric




Re: [Discuss-gnuradio] CPU Utilization and USRP2

2010-11-04 Thread Marcus D. Leech
On 11/04/2010 03:23 PM, Josh Blum wrote:
> Well, there is extra overhead. A "pirate" thread in the receive
> path spins on the socket and inspects the contents. The packet may be
> an asynchronous message packet for flow control or destined for the
> user. Or it may be a data packet, in which case it is placed into a
> queue to be popped off by the device::recv() call. No extra memcopies,
> it's just managing pointers.

When you say that this thread "spins", do you mean that it's in an
infinite loop, waiting on blocking or non-blocking I/O?  That is, does
it pause while it waits for data, or is it in a tight CPU loop?

-- 
Principal Investigator
Shirleys Bay Radio Astronomy Consortium
http://www.sbrac.org





Re: [Discuss-gnuradio] CPU Utilization and USRP2

2010-11-04 Thread Marc Epard
This reminds me of a question. What do you guys use for profiling native code 
on Linux? I have a lot more experience on Mac OS where we have Shark, 
Instruments and the like.

-Marc



Re: [Discuss-gnuradio] CPU Utilization and USRP2

2010-11-04 Thread Josh Blum
Well, there is extra overhead. A "pirate" thread in the receive path
spins on the socket and inspects the contents. The packet may be an
asynchronous message packet, either for flow control or destined for the
user. Or it may be a data packet, in which case it is placed into a queue
to be popped off by the device::recv() call. No extra memcopies; it's
just managing pointers.
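
The receive path described above can be sketched roughly as follows. This is an illustration in Python, not UHD's implementation: the one-byte tags, the queue names and the 0.1 s timeout are hypothetical, and real packets would carry VRT headers rather than a single tag byte. A background thread receives each packet, inspects it, and places it on either a data queue or an async-message queue. Only a reference to the packet is queued.

```python
import queue
import socket
import threading

ASYNC_TAG = b"A"  # hypothetical 1-byte header marking an async message
DATA_TAG = b"D"   # hypothetical 1-byte header marking a data packet


def dispatch_loop(sock, data_q, async_q, stop):
    """Receive packets, inspect the first byte, and route each one to a queue."""
    sock.settimeout(0.1)  # wake periodically so the stop flag gets checked
    while not stop.is_set():
        try:
            pkt = sock.recv(2048)
        except socket.timeout:
            continue
        # Only a reference to the bytes object is queued; no second copy is made.
        (async_q if pkt[:1] == ASYNC_TAG else data_q).put(pkt)


def demo():
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    data_q, async_q, stop = queue.Queue(), queue.Queue(), threading.Event()
    worker = threading.Thread(target=dispatch_loop,
                              args=(rx, data_q, async_q, stop))
    worker.start()
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.sendto(DATA_TAG + b"samples", rx.getsockname())
    tx.sendto(ASYNC_TAG + b"flow-control", rx.getsockname())
    data_pkt = data_q.get(timeout=2.0)    # what a recv()-style call would pop
    async_pkt = async_q.get(timeout=2.0)
    stop.set()
    worker.join()
    tx.close()
    rx.close()
    return data_pkt, async_pkt


if __name__ == "__main__":
    print(demo())
```

The cost of this design is that every packet passes through an extra thread and a queue hand-off before the caller sees it.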


Could this pirate thread be removed? Yes, if the async messages came in
over a different UDP port and the multi-device buffer alignment logic
were rewritten to be event driven (run when recv() is called). And I
will probably implement this when I get the time.  :-)
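
One way to picture the thread-free alternative is the Python sketch below. It rests on assumptions and is not a description of UHD: the helper names are hypothetical, and it covers only the separate-port idea, not the multi-device alignment logic. Data and async messages arrive on different UDP sockets, and the caller's own receive call waits on both, so no packet needs to be inspected to learn its type and no background thread is needed.

```python
import selectors
import socket


def open_udp():
    """Open a non-blocking UDP socket on an ephemeral loopback port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    s.setblocking(False)
    return s


def recv_event_driven(sel, timeout_s):
    """Run in the caller's thread: wait on all sockets, return (kind, packet) or None."""
    for key, _events in sel.select(timeout=timeout_s):
        return key.data, key.fileobj.recv(2048)
    return None


def demo():
    data_sock, async_sock = open_udp(), open_udp()
    sel = selectors.DefaultSelector()
    sel.register(data_sock, selectors.EVENT_READ, data="data")
    sel.register(async_sock, selectors.EVENT_READ, data="async")
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    tx.sendto(b"samples", data_sock.getsockname())
    first = recv_event_driven(sel, 2.0)
    tx.sendto(b"flow-control", async_sock.getsockname())
    second = recv_event_driven(sel, 2.0)
    idle = recv_event_driven(sel, 0.05)  # nothing pending: times out

    sel.close()
    for s in (tx, data_sock, async_sock):
        s.close()
    return first, second, idle


if __name__ == "__main__":
    print(demo())
```

Here the port a packet arrives on identifies its type, which is what lets the per-packet inspection step go away.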


So, my best guess is that you are mostly seeing the overhead of the
thread inspecting the packets. Of course, there is also additional
overhead from using UDP, parsing VRT packets, and parsing inline message
packets.



Thanks for testing it out BTW!
-Josh



[Discuss-gnuradio] CPU Utilization and USRP2

2010-11-04 Thread David Campbell
Hi All,
   I've noticed that the C++ interfaces provided in GNU Radio and UHD for
USRP2 data streaming are CPU-intensive (UHD more so than GNU Radio).  I am
wondering if there are easy ways to mitigate this, or whether there are
plans to reduce it in the future.  For UHD, a decimate-by-16 process chews
up 75% of a CPU just in the uhd::device::recv function (not much less even
when I use RECV_MODE_FULL_BUFF and size the buffer to be 100x the size of a
single packet).  For GNU Radio's interface, the CPU utilization is more
like 36% - still a lot.
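
For a rough sense of scale, the Python sketch below works out the stream rates implied by decimate-by-16. Only the decimation factor comes from the message above. The 100 MS/s ADC rate, the 4 bytes per complex sample (16-bit I plus 16-bit Q) and the samples_per_packet value are assumptions for illustration, and the packet size in particular is hypothetical.

```python
def stream_rates(adc_rate_hz=100e6, decimation=16, bytes_per_sample=4,
                 samples_per_packet=360):
    """Return (samples/s, bytes/s, packets/s) for a decimated stream.

    Every default except decimation=16 is an assumption for illustration.
    """
    sample_rate = adc_rate_hz / decimation
    byte_rate = sample_rate * bytes_per_sample
    packet_rate = sample_rate / samples_per_packet
    return sample_rate, byte_rate, packet_rate


if __name__ == "__main__":
    sps, bps, pps = stream_rates()
    print(f"{sps / 1e6:.2f} MS/s, {bps / 1e6:.1f} MB/s, {pps:.0f} packets/s")
```

Under these assumptions the host handles thousands of packets per second, so any fixed per-packet cost in the receive path is multiplied accordingly.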

  I may try to recode some of the lower-level interfaces in UHD if there is
no easy way to reduce CPU utilization.

  Thanks for your help,
David

