(Resent with pix removed.)
I am looking for pointers and papers on the scheduler's overhead, its performance, and high(?) data rates. I enclosed a partial pic of my graph. The essence is:

    HackRF -> DC Block -> My Preamble Detect

There are other blocks in the graph, but they do very little. BTW, the sample rate is 10 Msps.

What is happening is overflow events from the HackRF code inside osmocom. Even if I modify my preamble detector to return noutput_items at the beginning of general_work(), I still get overflow events.

I increased the buffer size between blocks from 32k to 128k (GR_FIXED_BUFFER_SIZE in flat_flowgraph.cc). No impact. I increased the priority of the DC blocker (dc_blocker_ff_impl.cc) and of my preamble detector in their constructors (below). No impact.

    set_thread_priority( thread_priority() + 1 );

According to gr-ctrlport-monitor, the average work time in the DC blocker is 1,600,000 ticks, which I believe is 1.6 ms against the timer rate of 1,000,000,000 ticks/second (gr::high_res_timer_tps()); the preamble detector averages 400 us and the HackRF source 40 us. (I should mention I'm running on an 8-core, 5 GHz processor with 32 GB of memory. You can't do much better than that.)

The preamble detector has a lot of variance because a signal has to meet a list of criteria and is rejected after failing any one of them. Variance says 1,100,000,000, but the interesting thing is that the variance substantially decays over time, so I'm not sure that number is meaningful. Regardless, even if I put a return statement at the head of general_work() in the preamble detector, I still get buffer overflows. (Over the ten minutes I wrote this, the preamble variance decayed from 1.1e9 to 7.9e8. I've seen it substantially, albeit slowly, decay, and I suspect the variance calculation (block_detail.cc) has an initialization problem.)

I spent a day inside the HackRF osmocom source (much coffee was involved) and substantially modified its innards. However, this problem persisted before I "operated." One of the source's problems is /inside/ general_work(), where it waits for a minimum of three buffers from the device by executing a condition wait on a boost::condition_variable (below).

    {
        boost::mutex::scoped_lock lock( _buf_mutex );

        while (_buf_used < 3 && running) // collect at least 3 buffers
            _buf_cond.wait( lock );
    }

I suspect that code fragment (and others) is badness because it works outside the graph/scheduler framework. Also, three buffers means 3 x 128k 8-bit I/Q samples, which seems like a ridiculous amount to wait for. Samples are converted into gr_complex and pumped down the stream at the stream's capacity, so nproduced is always near the stream size. Yet the average work time is pretty low.

ControlPort shows a curious pair of variables. The average nproduced is 15,773, but the average "output % full" is 0.52. How can that be? I read a comment somewhere that a buffer is split in half; if that's true, the output buffer is really 15,773/16,536 = 0.95 (95%) full. Contrast that against GR_FIXED_BUFFER_SIZE (flat_flowgraph.cc): 131,072/sizeof(gr_complex) = 16,384 items. Consequently, I'm really confused about what those two numbers are telling me.

(BTW, I also added a couple of perf RPC variables, notably overflow events and a running average (work in progress), to hackrf_source_c.cc (osmocom), because I suspected the output of "O" (below) causes the scheduler to hiccup.)

    int hackrf_source_c::hackrf_rx_callback(u_char *buf, uint32_t len)
    {
        ..
        std::cerr << "O" << std::flush;
        ..
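To make a few of the experiments above concrete, here are sketches reconstructed from memory rather than pasted from my tree. First, the "return at the top of general_work()" test on the preamble detector amounted to roughly this (preamble_detect_impl is my block; the output contents are garbage, which is fine for a throughput test):

    // Throughput-isolation stub: consume everything offered and claim a
    // full output buffer without doing any DSP, so this block's cost in
    // the graph is as close to zero as a general block can get.
    int preamble_detect_impl::general_work(int noutput_items,
                                           gr_vector_int &ninput_items,
                                           gr_vector_const_void_star &input_items,
                                           gr_vector_void_star &output_items)
    {
        consume_each(ninput_items[0]); // discard all pending input
        return noutput_items;          // "produce" a full buffer of whatever
                                       // is already in the output memory
    }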
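Second, on the priority experiments: I'm not sure a bare set_thread_priority() bump even takes effect without real-time privileges from the OS. GNU Radio's top-level knob for that is gr::enable_realtime_scheduling() (gnuradio/realtime.h); a sketch of using it before the top block starts, assuming the process has rtprio limits permissive enough to allow it:

    #include <gnuradio/realtime.h>
    #include <iostream>

    int main(int argc, char **argv)
    {
        // Request real-time scheduling before the flowgraph starts;
        // without this (and permissive rtprio ulimits), per-block
        // priority tweaks may be silently ignored by the OS.
        if (gr::enable_realtime_scheduling() != gr::RT_OK)
            std::cerr << "warning: failed to enable realtime scheduling\n";

        // ... build and run the top block as usual ...
        return 0;
    }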
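Third, if the three-buffer wait in the osmocom source has to remain at all, I would at least drop the threshold to a single buffer and use the predicate form of the wait. A sketch using the same member names as the original fragment:

    {
        boost::mutex::scoped_lock lock( _buf_mutex );

        // Wake as soon as one buffer is ready (or we are shutting down)
        // instead of sitting on 3 x 128k samples; the predicate form
        // also guards against spurious wakeups.
        _buf_cond.wait( lock, [this] { return _buf_used >= 1 || !running; } );
    }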
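Fourth, the overflow-events counter I mentioned amounts to replacing the print with an atomic increment; d_overflow_count is my name for a member I added, so treat this as a sketch:

    #include <atomic>

    // Added member in hackrf_source_c:
    //     std::atomic<uint64_t> d_overflow_count{0};

    int hackrf_source_c::hackrf_rx_callback(u_char *buf, uint32_t len)
    {
        // ...
        // Bump a counter instead of writing to stderr from inside the
        // USB callback; a relaxed fetch_add is about as cheap as it gets.
        d_overflow_count.fetch_add(1, std::memory_order_relaxed);
        // ...
        return 0; // returning 0 tells libhackrf to keep streaming
    }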
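And exporting that counter over ControlPort follows the stock setup_rpc() pattern used by the in-tree blocks. This assumes the osmocom block is compiled with GR_CTRLPORT and that overflow_count() is a trivial getter I added around the atomic; names and ranges are mine:

    void hackrf_source_c::setup_rpc()
    {
    #ifdef GR_CTRLPORT
        // Publish the cumulative overflow count so gr-ctrlport-monitor
        // can plot it next to the stock per-block perf counters.
        add_rpc_variable( rpcbasic_sptr(
            new rpcbasic_register_get<hackrf_source_c, int>(
                alias(), "overflow events",
                &hackrf_source_c::overflow_count,
                pmt::mp(0), pmt::mp(1000000), pmt::mp(0),
                "events", "RX overflows seen by the USB callback",
                RPC_PRIVLVL_MIN, DISPTIME ) ) );
    #endif
    }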
The interesting thing about that std::cerr << "O" fragment is that it runs inside the device's callback, which I felt had unknown consequences; incrementing an event counter, as sketched above, is a far better approach (i.e., it works within the framework as much as possible). I considered adding code that flushes all of the dirty buffers on an overflow, but when I experimented with it, it had no impact. (Oh, and I rewrote the buffer management from hard-coded arrays to standard containers, with an effort to minimize allocation/deallocation. It's a little cleaner.)

(BTW, to anyone looking at HackRF: have you ever wondered why you can modify the number of buffers but not the buffer length (buflen)? It's because the buffer length is hard-coded in libhackrf, and hackrf_source_c.cc simply mirrors it.)

UPDATE:
--------------------------------------------------------------
I missed a very important debug step. I went back through my graph deleting blocks. As I deleted blocks, the rate of overruns slowed but DID NOT reach zero, even when the graph looked like this:

    HackRF -> DC Block -> Null Sink
--------------------------------------------------------------

The result of all this nonsense is that I am wondering about the scheduler's management overhead, which IS NOT tracked in any way I could find. (Please correct me if you know differently.) It could be that the scheduler's impact is zero and my code simply sucks -- like, that's never happened before! :)

There are graphical output blocks in my graph, specifically two QT Time Sinks and one QT GUI Sink. From an average-work-time perspective I suspect these are non-issues, but I also suspect the drawing is done outside the running of the blocks.

I am working on a Python equivalent of my graph, but I'm not a Python hacker, so that will take some time. I am curious to compare its performance against the GRC version.

At this point I am somewhat clueless as to why I am getting overflow events.

_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio