On Thu, Mar 23, 2017 at 5:48 AM, ags <alfred.g.schm...@gmail.com> wrote:
> OK, I will use the busy-wait loop w/ usleep and test. The reason I used
> select was I thought it would allow me to do other things (I need to have
> another process, thread, or loop in this same application serving out audio
> data to another client, synchronized with this data). My understanding was
> that the process blocking on select() to return would free the CPU for
> other things, but allow a quick wake-up to refresh the buffer as needed.
>

I thought that select() and the like should work too, initially. But you have to remember that we're talking about an OS here that has an "expected" latency of 100ms or more, depending on the situation.

You can easily experiment and find out for yourself. One of the easiest tests would be to run a loop for 10,000 iterations, once using select() and once using a busy-wait loop, and then run the command-line "time" command on each executable to see the difference. This is of course not a super accurate test, but it should be good enough to show a huge difference in executable completion time. *If* you're more of the scientific type, get the system time in your test app before and after the test code, then output the difference between those two times.

Anyway, using an RT kernel or a Xenomai kernel may improve this latency *some*, but it is said that this comes at the expense of *some* other performance aspects of the OS. I've not actually tested that myself, only read about it.

>
> BTW, I have only mentioned the problems - but it does *almost* work. In
> my tests, I ran 12,500 4KiB buffers from ARM to PRU and measured (on the
> PRU side, using the precise CYCLE counter) to see if the PRU ever had to
> wait for the next buffer fill. Turns out that the PRU had to wait about 180
> times, or about 1.5% of the buffer fill events. The worse case wait (stall)
> time was ~5milliSeconds.
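
For what it's worth, the loop comparison mentioned above (select() versus a usleep-based wait, timed from inside the test app) could be sketched roughly like this. This is only a sketch under assumptions: select() is used here purely as a timer with no file descriptors, which is not the same as waiting on a real PRU event, and the function names time_select_loop() and time_usleep_loop() are made up for illustration.

```c
#define _DEFAULT_SOURCE          /* for usleep() and clock_gettime() under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: wait `iterations` times using select() as a timer
 * (no descriptors watched) and return the total elapsed time in seconds. */
double time_select_loop(int iterations, long wait_us)
{
    assert(iterations > 0 && wait_us >= 0 && wait_us < 1000000);
    double start = now_sec();
    for (int i = 0; i < iterations; i++) {
        /* Linux may modify the timeval, so rebuild it on every pass. */
        struct timeval tv = { .tv_sec = 0, .tv_usec = wait_us };
        select(0, NULL, NULL, NULL, &tv);
    }
    return now_sec() - start;
}

/* Hypothetical helper: same loop, but sleeping with usleep() as in the
 * polling approach. Returns the total elapsed time in seconds. */
double time_usleep_loop(int iterations, long wait_us)
{
    assert(iterations > 0 && wait_us >= 0 && wait_us < 1000000);
    double start = now_sec();
    for (int i = 0; i < iterations; i++)
        usleep((useconds_t)wait_us);
    return now_sec() - start;
}
```

Calling both from a small main() with 10,000 iterations and a 1000 microsecond wait, then printing the two results, gives the before/after difference described above. The numbers will vary with kernel, load, and timer slack, so treat them as a rough indication only.
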
>

One has to be very careful about what they use in code when writing an executable that requires some degree of determinism from userspace. I can't recall the articles I've read in the past that led me to understand all this, but they're out there. Pretty much anything that is a system call will incur a latency penalty, because one ends up switching processor context from userspace to kernelspace and back to userspace. This in and of itself may not be too bad, but any variables that are needed will end up being copied back and forth as well. In these cases, however, you can incur huge latency spikes that you may not have anticipated.

Personally, I've run into this problem a couple of times during two different projects. So my style of coding is to just get something working first, right? Then I refactor the code to perform to my expectations. Basically, I start with really "simple" stuff like printf(), select(), etc., then refactor those out when / if needed. Many times it's not needed, but when it is, one should understand the consequences of using such function calls in an executable. That way, one should have at least a rough idea where to start with "trimming the fat". But everyone falls into this "trap" at least once or twice when entering the embedded arena.

My understanding of calls like select() is that when they're used, you're yielding the processor back to the system, with the "promise" that eventually the system will notify you when something related to that call has changed. But with a busy-wait loop, you're defining the time period for which the processor is yielded back to the system. In the case of my example, approximately 1ms. Just be aware that with any non-real-time OS, intervals much shorter than 1ms will yield varying results, e.g. the system will (may) not be able to keep up with your code. If your code is super efficient, you can potentially get hundreds of thousands of iterations.
This is of course not guaranteed, but I've done it personally with the ADC, so I do know it is possible. At this performance level, you're almost certainly using mmap(). You're almost certainly using a lot of processor time as well: 80% or more.

Also, my code was pseudo code that I picked apart myself after I posted.

On the PRU side of things, you're probably going to want to do things a bit differently. For starters, you're probably going to want to time your data transfers from the PRU. That is, every 20ms, you're going to kick off a new data set. However, this has to be done smartly, as you do not want to override the userspace side file lock. So perhaps a double buffer will be needed? That will depend on the outcome of your given situation.

Another technique that could be used is data packing, since plain text data can be a lot larger in memory than a packed data structure. But it would also require a lot of thought on how to do this smartly, as well as a strong understanding of struct / union "data objects" and data alignment, for the best results.

There could potentially be a lot more to consider down the road. Just pick away at it one thing at a time. Eventually you'll be done with it.
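
To illustrate the data packing point, here is a minimal sketch comparing the size of one record as plain text and as a fixed-width struct. Everything in it is an assumption for illustration: the record layout (one 32-bit timestamp plus four 16-bit channel values), the struct name, and the helper names sample_to_text() and sample_to_binary() are hypothetical and not taken from any real PRU or ADC code.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record. The widest member comes first so every field is
 * naturally aligned and the compiler needs no padding. */
struct sample {
    uint32_t timestamp;    /* 4 bytes */
    uint16_t channel[4];   /* 4 x 2 = 8 bytes */
};

/* Fail the build if the compiler inserted padding after all. */
_Static_assert(sizeof(struct sample) == 12, "unexpected padding in struct sample");

/* Write the record as one comma-separated text line.
 * Returns the number of characters written, or 0 if buf is too small. */
size_t sample_to_text(const struct sample *s, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "%" PRIu32 ",%" PRIu16 ",%" PRIu16 ",%" PRIu16 ",%" PRIu16 "\n",
                     s->timestamp, s->channel[0], s->channel[1],
                     s->channel[2], s->channel[3]);
    return (n < 0 || (size_t)n >= len) ? 0 : (size_t)n;
}

/* Copy the raw struct bytes (host byte order) into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t sample_to_binary(const struct sample *s, unsigned char *buf, size_t len)
{
    if (len < sizeof *s)
        return 0;
    memcpy(buf, s, sizeof *s);
    return sizeof *s;
}
```

The binary form is always 12 bytes per record, while the text form grows with the number of digits in each value. Note that the raw bytes are in host byte order, so both sides reading the buffer need to agree on layout and endianness.
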