See below ...

Michel Doyon wrote:
> 
> Hi,
> 
> Sometime ago, I posted some questions/comments about semaphores. I have
> some more results that I would like to share and get some
> feedback/suggestions.
> 
> Using the RTAI uniprocessor scheduler, the time between the beginning of
> rt_sem_signal(&sem) and the return from rt_sem_wait(&sem) turns out to be around 1.5
> usec ... quite fast! (Note that one RT task sends rt_sem_signal() and another
> RT task receives it.)
> 
> Using the RTAI SMP scheduler (APIC), the time between the beginning of
> rt_sem_signal(&sem) and the return from rt_sem_wait(&sem) turns out to be around
> 
> 4.9 usec if both RT tasks are set to run on the SAME processor.
> But with one task on one processor and the other task on the second processor, we
> measure 14.9 usec.
> 
> That's roughly 10 usec more than when both tasks are on the same CPU with the
> SMP scheduler, and close to 10 times the latency measured with the uniprocessor
> scheduler.
> 
> I can see that this may be the price to pay for using a dual-processor system,
> since data transfer will be almost instantaneous through RAM, but I wonder
> if anybody has ever thought of looking at this more closely. In a
> previous post, I was told that there was some kind of communication or
> signal between the CPUs that would explain this delay. Could somebody point
> me to the piece of code doing that?
> 
> Note: tests performed on a DUAL P-III 500 MHz system
> 
> Thank you
> 
> Michel
> ----------------------------------------------
> Michel Doyon, M.Eng.
> Senior STVF Control Engineer
> Canadian Space Agency
> 6767 route de l'aeroport
> St-Hubert (Quebec)
> J3Y 8Y9 - CANADA
> Tel.:  (450) 926 4679 - Fax :  (450) 926 4695
> [EMAIL PROTECTED]
> 

-- 

Here's the catch:
This is from upscheduler/rtai_sched.c:
=============================================================================
int rt_sem_signal(SEM *sem)
{
/* ... */
        hard_save_flags_and_cli(flags);
        if ((sem->count)++ < 0) {
                if ((task = (sem->queue.next)->task)) {
                        sem->queue.next = task->queue.next;
                        (task->queue.next)->prev = &(sem->queue);
                        task->blocked_on.sem = NOTHING;
                        if ((task->state &= ~(SEMAPHORE | DELAYED)) == READY) {
                                rt_schedule();
                        }
                }
        }
        hard_restore_flags(flags);
/* ... */
=============================================================================
This is from smpscheduler/rtai_sched.c:
=============================================================================
int rt_sem_signal(SEM *sem)
{
/* ... */
        flags = rt_global_save_flags_and_cli();
        if ((sem->count)++ < 0) {
                if ((task = (sem->queue.next)->task)) {
                        sem->queue.next = task->queue.next;
                        (task->queue.next)->prev = &(sem->queue);
                        task->blocked_on.sem = NOTHING;
                        if ((task->state &= ~(SEMAPHORE | DELAYED)) == READY) {
                                rt_schedule();
                        }
                }
        }
        rt_global_restore_flags(flags);
/* ... */
=============================================================================

Notice that the code is almost identical except that the UP version uses
hard_save_flags_and_cli() while the SMP version uses rt_global_save_flags_and_cli().
What's the difference?

Well, here's what hard_save_flags_and_cli() amounts to (include/rtai.h):
#define hard_save_flags_and_cli(x) \
__asm__ __volatile__("pushfl; popl %0; cli": "=g" (x): :"memory")

This is pretty straightforward: it saves the flags and clears the interrupt flag.
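
For symmetry, the hard_restore_flags() called at the end of the UP rt_sem_signal() is presumably
just the reverse, along the lines of the classic i386 __restore_flags() (a sketch, not a verbatim
copy of include/rtai.h):

/* Sketch only: push the saved value back into EFLAGS, which re-enables
 * interrupts iff IF was set when hard_save_flags_and_cli() ran. */
#define hard_restore_flags(x) \
__asm__ __volatile__("pushl %0; popfl": /* no output */ :"g" (x): "memory", "cc")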

Now, here's rt_global_save_flags_and_cli() (include/rtai.h):
static inline int rt_global_save_flags_and_cli(void)
{
        unsigned long flags;

        hard_save_flags(flags);
        hard_cli();
        if (!test_and_set_bit(hard_cpu_id(), locked_cpus)) {
                while (test_and_set_bit(31, locked_cpus));
                return ((flags & (1 << IFLAG)) + 1);
        } else {
                return (flags & (1 << IFLAG));
        }
}

hard_save_flags() and hard_cli() come from the Linux source in include/asm/system.h
and basically amount to the same thing as the macro used in the UP case. hard_cpu_id() comes
from include/asm/smp.h and forces the caller to read the local APIC to get the CPU ID.
locked_cpus comes from rtai.c. test_and_set_bit() comes from include/asm/bitops.h and looks like:
extern __inline__ int test_and_set_bit(int nr, volatile void * addr)
{
        int oldbit;

        __asm__ __volatile__( LOCK_PREFIX
                "btsl %2,%1\n\tsbbl %0,%0"
                :"=r" (oldbit),"=m" (ADDR)
                :"Ir" (nr));
        return oldbit;
}

Where LOCK_PREFIX resolves to the "lock" instruction prefix on an SMP machine.
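
If I recall the kernel headers of that era correctly, the definition is roughly the following
(the guard macro may be __SMP__ or CONFIG_SMP depending on the kernel version):

/* Sketch of include/asm/bitops.h: on SMP, prepend "lock" so the CPU
 * asserts LOCK# for the following btsl; on UP the prefix is empty. */
#ifdef __SMP__
#define LOCK_PREFIX "lock ; "
#else
#define LOCK_PREFIX ""
#endif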

From what I understand, it is this lock prefix that does all the powerful stuff.
According to the intel instruction set manual for the Pentium family:

"The LOCK prefix causes the LOCK# signal of the Pentium processor to be asserted
during execution of the instruction that follows it. In a multiprocessor environment,
this signal can be used to ensure that the Pentium processor has exclusive use of
any shared memory while LOCK# is asserted."

In the "Multiple-processor management" part of intel's docs, this mechanism is
extensively covered. Basically LOCK ensures that the test is made atomically while 
locking
all processors. Once the test done, the old bit's value is stored in the CF flag. sbbl
uses that flag to do some operations and the overall effect is that test_and_set_bit
will return 1 if the old bit's value was 1 (it had already been locked from another
place in RTAI) and 0 if the old bit's value was 0 (the cpus had not been locked).

The point is that the while loop busy-waits until the tested bit (bit 31 of
locked_cpus, the global lock) becomes 0, which means that whatever other part of
RTAI was holding the locked CPUs has called rt_global_restore_flags().
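
In other words, bit 31 of locked_cpus acts as a spin lock. Stripped of the flags bookkeeping,
the acquire/release pair boils down to something like this (illustrative sketch only;
my_global_lock()/my_global_unlock() and my_lock_word are made-up names, the real code works
on locked_cpus inline):

/* Illustrative sketch -- these names don't exist in RTAI; the real code
 * manipulates bit 31 of locked_cpus directly in the inline functions. */
static volatile unsigned long my_lock_word;

static inline void my_global_lock(void)
{
        /* Atomically set bit 0 and spin while its old value was already 1,
         * i.e. until we are the one who flipped it from 0 to 1. */
        while (test_and_set_bit(0, &my_lock_word));
}

static inline void my_global_unlock(void)
{
        clear_bit(0, &my_lock_word);    /* also from include/asm/bitops.h */
}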

To sum up, rt_global_save_flags_and_cli() checks whether the current processor
already holds the global lock. If it doesn't, it takes it, spinning until any other
CPU releases it. Once done, it returns to the caller a value encoding the old IF
state and whether the lock was newly taken.
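
I don't have rt_global_restore_flags() pasted here, but from the return values above it has to
do roughly the following (a reconstruction from the save function shown, not verbatim
include/rtai.h; hard_sti() assumed as the counterpart of hard_cli()):

/* Sketch reconstructed from rt_global_save_flags_and_cli() above. */
static inline void rt_global_restore_flags(unsigned long flags)
{
        if (flags & 1) {                        /* the "+ 1": we took the global lock */
                clear_bit(31, locked_cpus);     /* release the global lock            */
                clear_bit(hard_cpu_id(), locked_cpus);
        }
        if (flags & (1 << IFLAG)) {             /* interrupts were enabled before     */
                hard_sti();
        }
}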

This is much more expensive than simply clearing the IF flag as in the UP case,
since it involves a locked bus cycle on locked_cpus and, when the other CPU holds
the global lock, busy waiting until it is released.

Hope this helps :)

===================================================
                 Karim Yaghmour
               [EMAIL PROTECTED]
          Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===================================================
