This issue is beyond my understanding. It happens on a Solaris 8 box, but 
maybe someone here can shed some light on it.
There is an Oracle tool called "maxmem". When it runs normally, it should 
produce the following output:
$ maxmem
Memory starts at: 4296023312 (100101d10)
Memory ends at:   325599592444 (4bcf3f7ffc)
Memory available: 321303569132 (4acf2f62ec)
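
For reference, the three figures are consistent with each other: "available" 
is just "ends" minus "starts". A small Python check, using only the numbers 
printed above (the variable names are mine, not anything maxmem exposes):

```python
# Values copied from the maxmem output above.
start = 4296023312        # "Memory starts at"
end = 325599592444        # "Memory ends at"
reported = 321303569132   # "Memory available"

available = end - start

print(f"computed: {available} ({available:x})")
print(f"reported: {reported} ({reported:x})")
print(f"match:    {available == reported}")
print(f"in GiB:   {available / 2**30:.1f}")
```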

However, on one of my boxes it dies with a SIGSEGV without producing any 
output. At the same time, the kernel logs this warning:
Sorry, no swap space to grow stack for pid 5931 (maxmem)
Yet swap -s shows about 300GB of swap space available, and vmstat shows about 
80GB of free memory:

#swap -s
total: 40900824k bytes allocated + 449680k reserved = 41350504k used, 310922208k available

#vmstat 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s4 s6   in   sy   cs us sy id
 0 1 0 311815352 81960216 1036 7334 59 6 4 0 0 17 17 0 0 3109 6017 30030 7 8 85
 0 1 0 310948000 80751032 277 6628 40 3 3 0 0 22 22 0 0 5847 65754 44885 13 12 75
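
To put the "300GB" and "80GB" figures on a firm footing, here is a small 
conversion sketch. It assumes the "k" in swap -s and the vmstat swap/free 
columns are both kilobytes of 1024 bytes; the helper name is made up for 
illustration.

```python
KIB_PER_GIB = 1024 * 1024

# Figures copied from the swap -s and vmstat output above.
swap_available_k = 310922208   # "available" from swap -s
vmstat_swap_k = 311815352      # "swap" column, first vmstat sample
vmstat_free_k = 81960216       # "free" column, first vmstat sample


def kib_to_gib(kib):
    """Convert kilobytes (assumed to be 1024 bytes each) to GiB."""
    return kib / KIB_PER_GIB


for label, value in [
    ("swap -s available", swap_available_k),
    ("vmstat swap", vmstat_swap_k),
    ("vmstat free", vmstat_free_k),
]:
    print(f"{label:18s} {kib_to_gib(value):7.1f} GiB")
```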

So naturally I put it under truss to study the behavior. And here is where 
the fun part starts.

When I run "truss -o /tmp/truss-11.out maxmem", it works just fine and prints 
the messages I expect:
truss -o /tmp/truss-11.out maxmem
Memory starts at: 4296023312 (100101d10)
Memory ends at:   322708094970 (4b22e6bffa)
Memory available: 318412071658 (4a22d6a2ea)
There is no SEGV and no kernel warning in /var/adm/messages.

However, if I run "truss -o /tmp/truss-12.out <Full path to maxmem>", it gets 
a SEGV and generates the kernel warning, just as it does when run directly 
from the command line.

The truss output for the failed command looks like this:
time()                                          = 1146674313
brk(0x4B1CE1E000)                               = 0
time()                                          = 1146674313
time()                                          = 1146674313
brk(0x4B1CE1E010)                               Err#11 EAGAIN
    Incurred fault #6, FLTBOUNDS  %pc = 0xFFFFFFFF7F60AD14
      siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFFF7FFFDFF0
    Received signal #11, SIGSEGV [default]
      siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFFF7FFFDFF0
        *** process killed ***



The successful one looks like this:

time()                                          = 1146674335
brk(0x4B1CF98000)                               = 0
time()                                          = 1146674335
time()                                          = 1146674335
brk(0x4B1CF98010)                               Err#11 EAGAIN
ioctl(1, TCGETA, 0xFFFFFFFF7FFFE6FC)            = 0
write(1, " M e m o r y   s t a r t".., 41)      = 41
write(1, " M e m o r y   e n d s  ".., 44)      = 44
write(1, " M e m o r y   a v a i l".., 44)      = 44
lseek(0, 0, SEEK_CUR)                           = 30288
_exit(0)
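
One way to compare the two traces is to look at where the faulting address 
sits relative to an address the successful run actually used. The sketch 
below is only suggestive: the two addresses come from different runs, the 
ioctl argument is presumably (not certainly) a buffer on the stack, and the 
8 KB page size is my assumption, not something shown in the output above.

```python
PAGE_SIZE = 8192  # assumption: 8 KB base page size; not stated in the traces


def page_base(addr, page_size=PAGE_SIZE):
    """Round an address down to the start of its page."""
    return addr & ~(page_size - 1)


# Addresses copied from the two truss outputs above.
fault_addr = 0xFFFFFFFF7FFFDFF0  # SEGV_MAPERR addr, failing run
ioctl_buf = 0xFFFFFFFF7FFFE6FC   # ioctl(1, TCGETA, ...) argument, good run
brk_fail = 0x4B1CE1E000          # last successful brk, failing run
brk_ok = 0x4B1CF98000            # last successful brk, good run

print(f"fault page:      {page_base(fault_addr):#x}")
print(f"ioctl buf page:  {page_base(ioctl_buf):#x}")
print(f"buf - fault:     {ioctl_buf - fault_addr} bytes")
print(f"brk difference:  {brk_ok - brk_fail} bytes")
```

Under that page-size assumption, the faulting address falls in the page 
immediately below the one holding the ioctl buffer, which would at least be 
consistent with the kernel message about failing to grow the stack.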


Most likely this is a buggy application. Still, it is weird that the same 
command behaves differently under truss when the only difference is that one 
invocation used the full path name and the other did not.

Any ideas?
 
 
--
This message posted from opensolaris.org