(Resent with pix removed.)
I am looking for pointers and papers on the scheduler's overhead, its performance, and high(?) data rates. I enclosed a partial pic of my graph. The essence is:

    HackRF -> DC Block -> My Preamble Detect

There are other blocks in the graph, but they do very little. BTW, the sample rate is 10 Msps.

What is happening is overflow events from the HackRF code inside osmocom. Even if I modify my preamble detector to return noutput_items at the beginning of general_work(), I still get overflow events.

I increased the buffer size between blocks from 32k to 128k (GR_FIXED_BUFFER_SIZE in flat_flowgraph.cc). No impact. I increased the priority of the DC blocker (dc_blocker_ff_impl.cc) and of my preamble detector in their constructors (below). No impact.

    set_thread_priority( thread_priority() + 1 );

According to gr-ctrlport-monitor, the average work time in the DC blocker is 1,600,000 ticks, which I believe is 1.6 ms against the timer rate of 1,000,000,000 ticks/second (gr::high_res_timer_tps()); the preamble detector averages 400 us and the HackRF source 40 us. (I should mention I'm running on an 8-core, 5 GHz processor with 32 GB of memory. You can't do much better than that.)

The preamble detector has a lot of variance because a signal has to meet a list of criteria and is rejected after failing any one of them. Variance says 1,100,000,000, but the interesting thing is that the variance substantially decays over time, so I'm not sure that number is meaningful. Regardless, even if I put a return statement at the head of general_work() in the preamble detector, I still get buffer overflows. (Over the ten minutes I wrote this, the preamble variance decayed from 1.1e9 to 7.9e8. I've seen it substantially, albeit slowly, decay, and I suspect the variance calculation (block_detail.cc) has an initialization problem.)

I spent a day inside the HackRF osmocom source (much coffee was involved) and substantially modified its innards. However, this problem persisted before I "operated." One of the source's problems is /inside/ general_work(), where it waits for a minimum of three buffers from the device by executing a condition wait on a boost::condition_variable (below).

    {
        boost::mutex::scoped_lock lock( _buf_mutex );

        while (_buf_used < 3 && running) // collect at least 3 buffers
            _buf_cond.wait( lock );
    }

I suspect that code fragment (and others) is badness because it works outside the graph/scheduler framework. Also, three buffers means 3 x 128k 8-bit I/Q samples, which seems like a ridiculous amount to wait for. Samples are converted into gr_complex and pumped down the stream at the stream's capacity, so nproduced is always near the stream size. Yet the average work time is pretty low.

ControlPort shows a curious pair of variables. The average nproduced is 15,773, but the average "output % full" is 0.52. How can that be? I read a comment somewhere that a buffer is split in half; if that's true, the output buffer is really 15,773/16,536 = 0.95 (95%) full. Contrast that against GR_FIXED_BUFFER_SIZE (flat_flowgraph.cc): 131,072/sizeof(gr_complex) = 16,384 items. Consequently, I'm really confused about what those two numbers are telling me.

(BTW, I also added a couple of perf RPC variables, notably overflow events and a running average (work in progress), to hackrf_source_c.cc (osmocom), because I suspected the output of "O" (below) causes the scheduler to hiccup.)

    int hackrf_source_c::hackrf_rx_callback(u_char *buf, uint32_t len)
    {
        ..
        std::cerr << "O" << std::flush;
        ..
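To make a few of the experiments above concrete, here are sketches reconstructed from memory rather than pasted from my tree. First, the "return at the top of general_work()" test on the preamble detector amounted to roughly this (preamble_detect_impl is my block; the output contents are garbage, which is fine for a throughput test):

    // Throughput-isolation stub: consume everything offered and claim a
    // full output buffer without doing any DSP, so this block's cost in
    // the graph is as close to zero as a general block can get.
    int preamble_detect_impl::general_work(int noutput_items,
                                           gr_vector_int &ninput_items,
                                           gr_vector_const_void_star &input_items,
                                           gr_vector_void_star &output_items)
    {
        consume_each(ninput_items[0]); // discard all pending input
        return noutput_items;          // "produce" a full buffer of whatever
                                       // is already in the output memory
    }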
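Second, on the priority experiments: I'm not sure a bare set_thread_priority() bump even takes effect without real-time privileges from the OS. GNU Radio's top-level knob for that is gr::enable_realtime_scheduling() (gnuradio/realtime.h); a sketch of using it before the top block starts, assuming the process has rtprio limits permissive enough to allow it:

    #include <gnuradio/realtime.h>
    #include <iostream>

    int main(int argc, char **argv)
    {
        // Request real-time scheduling before the flowgraph starts;
        // without this (and permissive rtprio ulimits), per-block
        // priority tweaks may be silently ignored by the OS.
        if (gr::enable_realtime_scheduling() != gr::RT_OK)
            std::cerr << "warning: failed to enable realtime scheduling\n";

        // ... build and run the top block as usual ...
        return 0;
    }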
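Third, if the three-buffer wait in the osmocom source has to remain at all, I would at least drop the threshold to a single buffer and use the predicate form of the wait. A sketch using the same member names as the original fragment:

    {
        boost::mutex::scoped_lock lock( _buf_mutex );

        // Wake as soon as one buffer is ready (or we are shutting down)
        // instead of sitting on 3 x 128k samples; the predicate form
        // also guards against spurious wakeups.
        _buf_cond.wait( lock, [this] { return _buf_used >= 1 || !running; } );
    }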
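Fourth, the overflow-events counter I mentioned amounts to replacing the print with an atomic increment; d_overflow_count is my name for a member I added, so treat this as a sketch:

    #include <atomic>

    // Added member in hackrf_source_c:
    //     std::atomic<uint64_t> d_overflow_count{0};

    int hackrf_source_c::hackrf_rx_callback(u_char *buf, uint32_t len)
    {
        // ...
        // Bump a counter instead of writing to stderr from inside the
        // USB callback; a relaxed fetch_add is about as cheap as it gets.
        d_overflow_count.fetch_add(1, std::memory_order_relaxed);
        // ...
        return 0; // returning 0 tells libhackrf to keep streaming
    }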
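And exporting that counter over ControlPort follows the stock setup_rpc() pattern used by the in-tree blocks. This assumes the osmocom block is compiled with GR_CTRLPORT and that overflow_count() is a trivial getter I added around the atomic; names and ranges are mine:

    void hackrf_source_c::setup_rpc()
    {
    #ifdef GR_CTRLPORT
        // Publish the cumulative overflow count so gr-ctrlport-monitor
        // can plot it next to the stock per-block perf counters.
        add_rpc_variable( rpcbasic_sptr(
            new rpcbasic_register_get<hackrf_source_c, int>(
                alias(), "overflow events",
                &hackrf_source_c::overflow_count,
                pmt::mp(0), pmt::mp(1000000), pmt::mp(0),
                "events", "RX overflows seen by the USB callback",
                RPC_PRIVLVL_MIN, DISPTIME ) ) );
    #endif
    }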
The interesting thing about that std::cerr << "O" fragment is that it runs inside the device's callback, which I felt had unknown consequences; incrementing an event counter, as sketched above, is a far better approach (i.e., it works within the framework as much as possible). I considered adding code that flushes all of the dirty buffers on an overflow, but when I experimented with it, it had no impact. (Oh, and I rewrote the buffer management from hard-coded arrays to standard containers, with an effort to minimize allocation/deallocation. It's a little cleaner.)

(BTW, to anyone looking at HackRF: have you ever wondered why you can modify the number of buffers but not the buffer length (buflen)? It's because the buffer length is hard-coded in libhackrf, and hackrf_source_c.cc simply mirrors it.)

UPDATE:
--------------------------------------------------------------
I missed a very important debug step. I went back through my graph deleting blocks. As I deleted blocks, the rate of overruns slowed but DID NOT reach zero, even when the graph looked like this:

    HackRF -> DC Block -> Null Sink
--------------------------------------------------------------

The result of all this nonsense is that I am wondering about the scheduler's management overhead, which IS NOT tracked in any way I could find. (Please correct me if you know differently.) It could be that the scheduler's impact is zero and my code simply sucks -- like, that's never happened before! :)

There are graphical output blocks in my graph, specifically two QT Time Sinks and one QT GUI Sink. From an average-work-time perspective I suspect these are non-issues, but I also suspect the drawing is done outside the running of the blocks.

I am working on a Python equivalent of my graph, but I'm not a Python hacker, so that will take some time. I am curious to compare its performance against the GRC version.

At this point I am somewhat clueless as to why I am getting overflow events.

_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio