On 07/13/2004 06:24 PM Philippe Gerum wrote:
> On Tue, 2004-07-13 at 17:12, Wolfgang Grandegger wrote:
>> On 07/13/2004 04:31 PM Philippe Gerum wrote:
>> > On Tue, 2004-07-13 at 15:33, Wolfgang Grandegger wrote:
>> >> On 07/13/2004 02:29 PM Philippe Gerum wrote:
>> >> > On Wed, 2004-07-07 at 13:37, Wolfgang Grandegger wrote:
>> >> >> Hi,
>> >> >>
>> >> >> I have now committed a first port of RTAI over ADEOS/ppc to the vesuvio
>> >> >> branch. One first observation is that the latencies and task switch
>> >> >> times are almost doubled on slow PowerPC processors (MPC860 at 50 MHz).
>> >> >
>> >> >> I will give some more information on this port including performance
>> >> >> figures later on.
>> >> >>
>> >> >
>> >> > It would be interesting to know if the IRQ latencies are also observable
>> >> > using the irq_jitter test (or some adaptation of) from the Adeos distro
>> >> > (linux/examples/* IIRC).
>> >>
>> >> These figures look reasonable but I cannot compare them directly with the
>> >> pre-ADEOS case. From my point of view, the bigger task switch times and
>> >> latencies are simply due to additional code to be processed, which shows
>> >> up on slow processors (with little caches, etc.). Are there any figures
>> >
>> > I could understand a larger overhead due to processing the pipeline with
>> > two stages incurring a preemption (still, it's only about 250ns on a
>> > Celeron 1 GHz), but I don't get yet why the task switch time is impacted.
>> >
>> >> for x86? I will provide some preliminary ones for PPC today or tomorrow.
>>
>> OK, here are some preliminary figures. I used the following PowerPC
>> systems for testing:
>>
>> TQM855L: MPC855 at 80 MHz, 4 kB I-Cache 4 kB D-Cache
>> Icecube: MPC5200 at 400 MHz, 16 kB I-Cache 16 kB D-Cache
>> iBook2 : PowerPC 750CX at 500 MHz 256 KB L2-Cache
>>
>> The following lists the results from the vesuvio's testsuite tests
>> "switches" and "latency".
>> The latency test was run under some similar
>> load for about 10 minutes (ping -f to the target, console output, etc.).
>> The results marked with "RTHAL" are for vesuvio made with a
>> RTHAL-patched kernel and correspondingly for "ADEOS":
>>
>>                          TQM855L  Icecube   iBook2
>>
>> SUSP/RES SWITCHES    :     8600     3100      600  RTHAL
>>                           17300     4200      870  ADEOS
>>
> ^^^
> This one bothers me. Stalling the pipeline is a matter of toggling a
> bit which is even done without trashing the I-cache since the code is
> inlined and not reached through a function pointer like w/ RTHAL. It
> looks like an event is taken in the Adeos case which is not in the RTHAL
> case.
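
To make the stall-bit argument concrete, here is a minimal Python model of soft interrupt masking as it is discussed in this thread: stalling only sets a flag, interrupts arriving while the stage is stalled are logged, and unstalling replays the log. This is only a sketch under my own assumptions; the class `SoftIrqStage` and its method names are hypothetical and are not the Adeos or RTAI API.

```python
class SoftIrqStage:
    """Toy model of a pipeline stage with a software stall bit.

    Hypothetical illustration only; not the Adeos implementation.
    """

    def __init__(self):
        self.stalled = False   # the "stall bit"
        self.pending = []      # interrupt log: IRQs seen while stalled
        self.handled = []      # IRQs delivered to the handler, in order

    def stall(self):
        # Soft "cli": only set the flag, hardware interrupts stay enabled.
        self.stalled = True

    def unstall(self):
        # Soft "sti": clear the flag, then replay whatever was logged.
        self.stalled = False
        while self.pending and not self.stalled:
            self._deliver(self.pending.pop(0))

    def irq(self, number):
        # Called for every incoming interrupt.
        if self.stalled:
            self.pending.append(number)
        else:
            self._deliver(number)

    def _deliver(self, number):
        self.handled.append(number)


stage = SoftIrqStage()
stage.irq(1)      # delivered at once
stage.stall()
stage.irq(2)      # logged
stage.irq(3)      # logged
stage.unstall()   # log is replayed in arrival order
print(stage.handled, stage.pending)
```

In this model the replay loop in `unstall()` is work that a hard cli/sti pair does not have, which may be one place where the extra cost on slow CPUs comes from.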

OK, this bothers me as well. I will do a more detailed tracing and
assembler code review when time permits. IMHO, code inlining is not a
guarantee for faster code. It also results in bigger code, provoking more
cache refills and TLB misses, and for these you may pay much more than a
few instructions on slow systems. BTW: I also noticed a change in the
task switching times when moving from kilauea to vesuvio.

>> SEM SIG/WAIT SWITCHES:    11000     3300      660  RTHAL
>>                           18000     4400      950  ADEOS
>>
>> Min-Latencies        :    27000     7400     1400  RTHAL
>>                           47400    11000     2600  ADEOS
>>
>> Max-Latencies        :    86000    34000    29000  RTHAL
>>                          156000    57000    36000  ADEOS
>>
> ^^^
> Ok, this one frightens me... However, having figures available in the
> first place shows that RTAI/PPC over Adeos can work, so 1) congrat's, 2)
> there's hope :o)

In the listing above, system performance increases from left to right and
a 1 GHz Celeron would be placed very far on the right side, with little
difference between RTHAL- and ADEOS-based RTAI.

>> While the latency tests under RTAI-ADEOS have an additional domain
>> switch involved, the task switch tests don't. Nevertheless, the soft
>> cli, sti, etc. functions of RTAI-ADEOS are heavier than the hard ones
>> used under RTHAL. I think the min-latency simply depends on the amount
>> of processed code as the hardware related influence is low (like cache
>> refills, TLB misses, etc.).
>
> Sounds consistent; maybe we are paying the price of playing the
> interrupt log during the measurement more often than we used to with hard
> cli/sti pairs too? What I've generally observed on mid-range x86 CPUs is
> that min and avg latencies are usually higher with Adeos than RTHAL, but
> worst cases are close, with a bonus for Adeos wrt deviation, which seems
> lower than RTHAL's.
>
>> Maybe there is still something not OK in my
>> port. I know already a few places for optimization but I don't expect
>> big improvements.
>> It would be very interesting to run similar tests on a
>> (very) slow x86 (embedded) system as well.
>>
>
> Yep. And compare latencies with purely Adeos-based measurements, i.e.
> w/o RTAI in the picture. Experience while porting the x86 code has shown
> that subtle interactions can exist between both codebases.
>
>> Apart from performance issues, stability is already quite good for RTAI
>> over ADEOS PowerPC.
>
> Sounds good. When I'm grown up, (i.e. when I'll have stopped fiddling
> with x86s) I'll certainly try a monkey-see monkey-do port of fusion over
> PPC using your work in order to have another point of comparison for
> latencies.

Well, I skipped x86 and started immediately with PowerPC ;-). Another nice
thing about the RTAI over ADEOS port is that it now closely follows the
way interrupts are handled on x86. Even the PPC decrementer trap used as
timer fits into this framework. I allocate a virq early in ADEOS to get it
mapped to NR_IRQS (or IPIPE_VIRQ_BASE, to be more precise) and then simply
extend the irq-related arrays by one field in the RTAI-HAL.

Now I'm working on LXRT. After that I will look in more detail into the
performance issues.

Wolfgang.
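
For comparing the figures quoted in this mail at a glance, the following small Python sketch computes the ADEOS/RTHAL ratio for each test and board. The numbers are copied from the tables above; the names `FIGURES`, `SYSTEMS` and `adeos_over_rthal` are only illustrative and not part of any testsuite.

```python
# Figures copied from the tables quoted in this mail.
# Tuple order: TQM855L, Icecube, iBook2. Units are as reported by the tests.
SYSTEMS = ("TQM855L", "Icecube", "iBook2")
FIGURES = {
    "SUSP/RES SWITCHES": {
        "RTHAL": (8600, 3100, 600), "ADEOS": (17300, 4200, 870)},
    "SEM SIG/WAIT SWITCHES": {
        "RTHAL": (11000, 3300, 660), "ADEOS": (18000, 4400, 950)},
    "Min-Latencies": {
        "RTHAL": (27000, 7400, 1400), "ADEOS": (47400, 11000, 2600)},
    "Max-Latencies": {
        "RTHAL": (86000, 34000, 29000), "ADEOS": (156000, 57000, 36000)},
}


def adeos_over_rthal(figures):
    """Return {test: ratios per system}, each ratio rounded to 2 decimals."""
    return {
        test: tuple(round(a / r, 2)
                    for a, r in zip(rows["ADEOS"], rows["RTHAL"]))
        for test, rows in figures.items()
    }


ratios = adeos_over_rthal(FIGURES)
for test, row in ratios.items():
    cells = "  ".join(f"{name}: {x:.2f}" for name, x in zip(SYSTEMS, row))
    print(f"{test:<22} {cells}")
```

On these numbers only the SUSP/RES switch time on the TQM855L fully doubles; all other ratios fall between roughly 1.2 and 1.9.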
