I should also mention a few other things (consider this arm-twisting
to get you to use this technique...).

I'm sure you've seen "opaque" declarations, which are a sort of
forward declaration that accomplishes information hiding, e.g.,

struct lis_semaphore;

would allow one to use the 'lis_semaphore' struct without revealing
its internal structure.

A zero-element array at the end of a structure is conceptually
akin, i.e., it is an "opaque" array declaration.

Elsewhere, I think I've seen it called an "implicit pointer".
The declaration of a zero-element array at the end of a struct does
not add to the size of the struct, nor is storage for the array
allocated by the compiler.  It indicates that the storage for the
array will immediately follow the storage for the struct, and that
no space for either the array, or any reference to it, is reserved
by the compiler.  This affords a mechanism to reference storage
without using a separately stored address (e.g., a pointer inside
the struct): the address of the array is computed by the compiler
as the offset immediately following the struct, as one might expect.

-John

John A. Boyd Jr. wrote:
I noticed the sem_mem field a few days ago when I was looking at the
module loading code, and I wondered then about it.

Gcc allows one to use a zero-element array at the end of a struct
definition, to indicate either that the struct is of variable size,
or that the number of elements in an actual array is determined not
at compile time but at allocation time (the two interpretations are
somewhat equivalent).

E.g.,

    typedef struct lis_semaphore {
        ...
        long sem_mem[0];
    } lis_semaphore_t;

I suspect you knew this.  This construct is gcc-portable, i.e., all
versions of gcc allow it (the implementors have committed to allowing
it).

I have seen places where this is almost necessary, and still others
where it really simplifies coding.  It would simplify lis_sem_alloc,
though just a little, for example:

lis_semaphore_t *lis_sem_alloc(int count)
{
    lis_semaphore_t *lsem ;
    int              sem_size ;

-   sem_size = sizeof(*lsem) - sizeof(lsem->sem_mem) + sizeof(struct semaphore);
+   sem_size = sizeof(*lsem) + sizeof(struct semaphore);
    lsem = (lis_semaphore_t *) lis_alloc_kernel(sem_size);

    if (lsem == NULL)
        return(NULL) ;

    memset(lsem, 0, sem_size) ;
    lis_sem_fill(lsem, count) ;
    lsem->allocated = 1 ;
    return(lsem) ;
}

It may seem like simply a dangerous construct at first, but in my
view this kind of situation is made more dangerous by using a
constant array size that can't be relied upon, since the actual
allocation doesn't match that constant size.  An array size of 0 at
least makes it plain that the array's size is not fixed or
determinable, and (if the programmer is prudent) that both
allocation of the structure and calculation of the additional size
are required.  It is also a good idea to provide some indication
in such structures of how much is allocated, e.g., by adding a
field to the structure to store the allocated size.

At the least, the gcc folks have always thought this construct
appropriate enough to support (it's discussed in gcc's docs).

If you know that LiS will be compiled with gcc regardless of
platform, there would be no harm in replacing your conditional
size definitions for sem_mem[] with the above definition.

My $0.02...
-John

Dave Grothe wrote:

Matt:

I am posting an edited version of this exchange to the group since I think the information may be generally useful.

It's in /usr/src/linux/include/asm/semaphore.h. Now that you mention it, and now that I look at the components of the kernel semaphore_t structure, 12 long words looks to be a little tight. You might try increasing that to 20 just to see what happens.

I will do likewise in LiS-2.17.

I don't want the size of the lis_semaphore_t or lis_spin_lock_t structures to depend upon the kernel version. Doing so would make STREAMS drivers dependent upon the kernel version and would force compilation from source on the target machine. So I just want to leave room in the LiS structures for a kernel structure to fit in.

By the way, if you use the lis_sem_alloc() routine it will allocate enough space for the kernel semaphore no matter what array size is in the lis_semaphore_t structure. This is a way to guarantee that the structure is compatible without your driver having any knowledge of kernel semaphore structures.

-- Dave

At 07:23 PM 2/17/2004, Matthew Gierlach wrote:

Hi Dave:

Would a change in the semaphore type in RH EL 3.0 produce this
behavior? I've looked at the definition of lis_semaphore_t in
LiS. It does not reference the Linux semaphore type, but it does
reserve 12 long words for non-PPC compilations and 50 long words
for PPC compilations. These long words are referenced as the semaphore
in the LiS code.


Why the difference between PPC and non-PPC?

What files would I look in, in the EL kernel source, to determine
if this is a compatibility issue?

Thanks, Matt

On Tue, 17 Feb 2004, Dave Grothe wrote:

> No. I was seeing different symptoms. What I ran into was a chain of
> events that looked like runqueues calls service procedure calls kernel
> utility calls schedule(). But runqueues was holding a spin lock on the
> queue. So schedule() bumped the runqueues thread off the CPU. That caused
> a hang because of N other threads that wanted the same lock. So all CPUs
> were spinning on the lock and the only thread that would release the lock
> was scheduled off the CPUs. I have since changed the queue lock to a
> semaphore, plus a few other things that minimize this kind of contention in
> the first place.
>
> Your case, on the surface, looks like spin locks are not working on your
> system. The message from LiS is an assertion failure that should never
> print out in the absence of contention for a queue head which is otherwise
> protected by a spin lock. I have never seen the message that you are seeing.
>
> Is there something about your machine (caching? hardware locking? memory
> access sequencing?) that would make the Linux implementation of spin locks
> fail? My gut feel is that you are looking for something very near the
> hardware here. Do you have another 2-CPU XEON machine to try it on? I am
> using an IBM x335. Take a careful walk through your machine's setup menus
> to see if there is some BIOS option that might affect multiple requestors
> to memory.
>
> Remember, Sun builds SPARCs and is used to thinking about memory in a
> different way than us Intel guys do.
>
> -- Dave
>



_______________________________________________ Linux-streams mailing list [EMAIL PROTECTED] http://gsyc.escet.urjc.es/mailman/listinfo/linux-streams