I am posting an edited version of this exchange to the group since I think the information may be generally useful.
It's in /usr/src/linux/include/asm/semaphore.h. Now that you mention it, and now that I look at the components of the kernel semaphore_t structure, 12 long words looks to be a little tight. You might try increasing that to 20 just to see what happens.
I will do likewise in LiS-2.17.
I don't want the size of the lis_semaphore_t or lis_spin_lock_t structures to depend upon the kernel version. Doing so would make STREAMS drivers dependent upon the kernel version and would force compilation from source on the target machine. So I just want to leave room in the LiS structures for a kernel structure to fit in.
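The idea of reserving a fixed-size opaque array can be sketched in userspace. Everything below is illustrative: the stand-in `struct semaphore` layout, the `LIS_SEM_LONGS` figure, and the names are assumptions, not the real LiS or kernel definitions (the real ones live in the kernel headers and in LiS itself). The compile-time check is the useful trick: it fails the build if the kernel structure ever outgrows the reservation.

```c
/* Stand-in for the kernel's struct semaphore.  The real layout is in
 * /usr/src/linux/include/asm/semaphore.h and varies by kernel version;
 * the fields here are placeholders for the sketch. */
struct semaphore {
    int  count;
    int  sleepers;
    long wait[4];    /* placeholder for the wait-queue head */
};

/* Illustrative lis_semaphore_t: an opaque array of long words that a
 * kernel semaphore is overlaid onto.  LiS reserves 12 longs (50 on PPC);
 * raising the figure (e.g. to 20) just adds headroom. */
#define LIS_SEM_LONGS 20
typedef struct {
    long sem_mem[LIS_SEM_LONGS];
} lis_semaphore_t;

/* Compile-time check that the reservation is big enough.  This is the
 * classic C89 static-assertion idiom: the array size goes negative,
 * and compilation fails, if the condition is false. */
typedef char lis_sem_size_check[
    sizeof(lis_semaphore_t) >= sizeof(struct semaphore) ? 1 : -1];
```

If the check compiles, drivers built against the fixed-size `lis_semaphore_t` stay binary-compatible even though they never see the kernel's semaphore layout.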
By the way, if you use the lis_sem_alloc() routine it will allocate enough space for the kernel semaphore no matter what array size is in the lis_semaphore_t structure. This is a way to guarantee that the structure is compatible without your driver having any knowledge of kernel semaphore structures.
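The guarantee described above can be sketched as follows. This is not the actual LiS implementation, only the idea: size the allocation from the kernel's `struct semaphore`, never from the `sem_mem[]` array, so the driver stays correct no matter what the kernel structure grows to. The `struct semaphore` layout and the `sema_init_stub()` helper are stand-ins invented for the sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-ins; the real types come from the kernel headers
 * and from LiS.  Layouts and names here are assumptions. */
struct semaphore { long opaque[8]; };
typedef struct { long sem_mem[12]; } lis_semaphore_t;

/* Placeholder for the kernel's semaphore initialization. */
static void sema_init_stub(struct semaphore *s, int count)
{
    memset(s, 0, sizeof(*s));
    s->opaque[0] = count;
}

/* Sketch of the lis_sem_alloc() idea: allocate whichever is larger, the
 * kernel's struct semaphore or the LiS wrapper, so the caller need not
 * know anything about the kernel's semaphore layout. */
lis_semaphore_t *lis_sem_alloc(int count)
{
    size_t n = sizeof(struct semaphore);
    lis_semaphore_t *s;

    if (n < sizeof(lis_semaphore_t))
        n = sizeof(lis_semaphore_t);
    s = malloc(n);
    if (s != NULL)
        sema_init_stub((struct semaphore *)s, count);
    return s;
}
```

A driver would call the allocator and treat the returned pointer as fully opaque, which is exactly why the array size inside `lis_semaphore_t` stops mattering.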
-- Dave
At 07:23 PM 2/17/2004, Matthew Gierlach wrote:
Hi Dave:
Would a change in the semaphore type in RH EL 3.0 produce this
behavior? I've looked at the definition of lis_semaphore_t in
LiS. It does not reference the Linux semaphore type, but it does
reserve 12 long words for non-PPC compilations and 50 long words
for PPC compilations. These long words are referenced as the semaphore
in the LiS code.
Why the difference between PPC and non-PPC?
Which files in the EL kernel source should I look at to determine
whether this is a compatibility issue?
Thanks, Matt
On Tue, 17 Feb 2004, Dave Grothe wrote:
> No. I was seeing different symptoms. What I ran into was a chain of
> events: runqueues calls a service procedure, which calls a kernel
> utility, which in turn calls schedule(). But runqueues was holding a
> spin lock on the queue, so schedule() bumped the runqueues thread off
> the CPU. That caused
> a hang because of N other threads that wanted the same lock. So all CPUs
> were spinning on the lock and the only thread that would release the lock
> was scheduled off the CPUs. I have since changed the queue lock to a
> semaphore, plus a few other things that minimize this kind of contention in
> the first place.
>
> Your case, on the surface, looks like spin locks are not working on your
> system. The message from LiS is an assertion failure that should never
> print out in the absence of contention for a queue head which is otherwise
> protected by a spin lock. I have never seen the message that you are seeing.
>
> Is there something about your machine (caching? hardware locking? memory
> access sequencing?) that would make the Linux implementation of spin locks
> fail? My gut feel is that you are looking for something very near the
> hardware here. Do you have another 2 CPU XEON machine to try it on? I am
> using an IBM x335. Take a careful walk through your machine's setup menus
> to see if there is some BIOS option that might affect multiple requestors
> to memory.
>
> Remember, Sun builds SPARCs and is used to thinking about memory in a
> different way than we Intel guys do.
>
> -- Dave
>
