Matt:

I am posting an edited version of this exchange to the group since I think the information may be generally useful.

It's in /usr/src/linux/include/asm/semaphore.h.  Now that you mention it, and now that I look at the components of the kernel semaphore_t structure, 12 long words looks to be a little tight.  You might try increasing that to 20 just to see what happens.

I will do likewise in LiS-2.17.

I don't want the size of the lis_semaphore_t or lis_spin_lock_t structures to depend upon the kernel version.  Doing so would make STREAMS drivers dependent upon the kernel version and would force compilation from source on the target machine.  So I just want to leave room in the LiS structures for a kernel structure to fit in.
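To illustrate the idea of reserving room, here is a minimal sketch in plain C. It is not the LiS source: every name in it (demo_kernel_sem, demo_sem_t, DEMO_SEM_WORDS, and the helper functions) is hypothetical, and the stand-in "kernel" structure is invented for the example. It shows a wrapper whose size is fixed by a word count, plus a compile-time check that the hidden structure still fits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernel semaphore. The real layout
 * differs between kernel versions; this one is invented. */
struct demo_kernel_sem {
    int  count;
    int  sleepers;
    long wait_queue[4];
};

/* Number of long words reserved in the wrapper (assumed value). */
#define DEMO_SEM_WORDS 20

/* Hypothetical wrapper: its size depends only on DEMO_SEM_WORDS,
 * never on the layout of struct demo_kernel_sem. */
typedef struct demo_sem {
    long sem_mem[DEMO_SEM_WORDS];
} demo_sem_t;

/* Break the build if the reserve is too small, rather than
 * overrunning the wrapper at run time. */
_Static_assert(sizeof(struct demo_kernel_sem) <= sizeof(demo_sem_t),
               "reserved space too small for the inner semaphore");

/* Store an initialized inner structure in the reserved words. */
static void demo_sem_init(demo_sem_t *s, int count)
{
    struct demo_kernel_sem k;

    assert(s != NULL);
    memset(&k, 0, sizeof(k));
    k.count = count;
    memset(s, 0, sizeof(*s));
    memcpy(s->sem_mem, &k, sizeof(k));
}

/* Read the count back out of the reserved words. */
static int demo_sem_count(const demo_sem_t *s)
{
    struct demo_kernel_sem k;

    assert(s != NULL);
    memcpy(&k, s->sem_mem, sizeof(k));
    return k.count;
}
```

Code compiled against demo_sem_t sees only an array of long words, so its binary layout stays the same when the inner structure changes, as long as the inner structure still fits.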

By the way, if you use the lis_sem_alloc() routine it will allocate enough space for the kernel semaphore no matter what array size is in the lis_semaphore_t structure.  This is a way to guarantee that the structure is compatible without your driver having any knowledge of kernel semaphore structures.
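The allocation approach can be sketched the same way. This is an assumption about the general technique, not the actual lis_sem_alloc() code or its signature: all names below (demo_sem_alloc, demo_sem_alloc_size, demo_sem_free, and the structures) are hypothetical, and user-space calloc() stands in for a kernel allocator. The allocator, which does know the size of the inner structure, hands back a block at least that large even when the declared array is smaller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical inner structure, deliberately larger than the
 * 12 long words declared in the wrapper below. */
struct demo_kernel_sem {
    int  count;
    long pad[30];
};

/* Hypothetical wrapper with a reserve that is too small. */
typedef struct demo_sem {
    long sem_mem[12];
} demo_sem_t;

/* Bytes to allocate: whichever is larger, the declared wrapper or
 * the space the inner structure needs. */
static size_t demo_sem_alloc_size(void)
{
    size_t need = offsetof(demo_sem_t, sem_mem)
                + sizeof(struct demo_kernel_sem);

    return need > sizeof(demo_sem_t) ? need : sizeof(demo_sem_t);
}

/* Allocate a block sized for the inner structure and initialize it.
 * Returns NULL if the allocation fails. */
static demo_sem_t *demo_sem_alloc(int count)
{
    struct demo_kernel_sem k;
    demo_sem_t *s = calloc(1, demo_sem_alloc_size());

    if (s == NULL)
        return NULL;
    memset(&k, 0, sizeof(k));
    k.count = count;
    /* Copy through a byte pointer into the whole allocated block,
     * which may extend past the declared array. */
    memcpy((char *)s + offsetof(demo_sem_t, sem_mem), &k, sizeof(k));
    return s;
}

/* Read the count back from the allocated block. */
static int demo_sem_count(const demo_sem_t *s)
{
    struct demo_kernel_sem k;

    assert(s != NULL);
    memcpy(&k, (const char *)s + offsetof(demo_sem_t, sem_mem), sizeof(k));
    return k.count;
}

static void demo_sem_free(demo_sem_t *s)
{
    free(s);
}
```

In this sketch the caller only ever holds a pointer, so it never needs to know how big the block behind it really is.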

-- Dave

At 07:23 PM 2/17/2004, Matthew Gierlach wrote:

Hi Dave:

        Would a change in the semaphore type in RH EL 3.0 produce this
        behavior? I've looked at the definition of lis_semaphore_t in
        LiS. It does not reference the Linux semaphore type, but it does
        reserve 12 long words for non-PPC compilations and 50 long words
        for PPC compilations. These long words are referenced as the semaphore
        in the LiS code.

        Why the difference between PPC and non-PPC?

        What files would I look in in the EL kernel source to determine
        if this is a compatibility issue?

        Thanks, Matt

On Tue, 17 Feb 2004, Dave Grothe wrote:

> No.  I was seeing different symptoms.  What I ran into was a call chain
> in which runqueues called a service procedure, which called a kernel
> utility, which called schedule().  But runqueues was holding a spin lock
> on the queue, so schedule() bumped the runqueues thread off the CPU while
> it still held the lock.  That caused a hang because N other threads wanted
> the same lock: all CPUs were spinning on the lock, and the only thread
> that could release it was scheduled off the CPUs.  I have since changed
> the queue lock to a semaphore, plus a few other things that minimize this
> kind of contention in the first place.
>
> Your case, on the surface, looks like spin locks are not working on your
> system.  The message from LiS is an assertion failure that should never
> print out in the absence of contention for a queue head which is otherwise
> protected by a spin lock.  I have never seen the message that you are seeing.
>
> Is there something about your machine (caching? hardware locking? memory
> access sequencing) that would make the Linux implementation of spin locks
> fail?  My gut feel is that you are looking for something very near the
> hardware here.  Do you have another 2-CPU Xeon machine to try it on?  I am
> using an IBM x335.  Take a careful walk through your machine's setup menus
> to see if there is some BIOS option that might affect multiple requestors
> to memory.
>
> Remember, Sun builds SPARCs and is used to thinking about memory in a
> different way than us Intel guys.
>
> -- Dave
>