This issue is beyond my understanding. It happens on a Solaris 8 box, but
maybe someone can shed some light on it:
There is an Oracle tool called "maxmem". When it runs normally, it should
produce the following output:
$ maxmem
Memory starts at: 4296023312 (100101d10)
Memory ends at: 325599592444 (4bcf3f7ffc)
Memory available: 321303569132 (4acf2f62ec)
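As a sanity check on that output, the third line appears to be simply the difference of the first two, and the hex values in parentheses match the decimal ones. A minimal Python sketch of that arithmetic (the variable names are only for illustration; nothing here is taken from maxmem itself):

```python
# Values copied from the maxmem output above.
start = 4296023312      # "Memory starts at"
end = 325599592444      # "Memory ends at"

# "Memory available" should be the size of the range between them.
available = end - start

print("available:", available, "(" + format(available, "x") + ")")
print("start hex:", format(start, "x"))
print("end hex:  ", format(end, "x"))
```

So the tool reports a range of roughly 300GB, which is in the same ballpark as the free swap shown by swap -s.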
However, on one of my boxes it gets a SEGV without producing any output. At the
same time there is a kernel warning:
Sorry, no swap space to grow stack for pid 5931 (maxmem)
while swap -s shows about 300GB of free swap space and vmstat shows about 80GB
of free memory:
#swap -s
total: 40900824k bytes allocated + 449680k reserved = 41350504k used, 310922208k available
#vmstat 5
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s0 s1 s4 s6 in sy cs us sy id
0 1 0 311815352 81960216 1036 7334 59 6 4 0 0 17 17 0 0 3109 6017 30030 7 8 85
0 1 0 310948000 80751032 277 6628 40 3 3 0 0 22 22 0 0 5847 65754 44885 13 12 75
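For reference, the "300GB" and "80GB" figures come from converting the kilobyte counts above. A small Python sketch of that conversion, assuming the swap -s figures and the vmstat "free" column are all in kilobytes (1k = 1024 bytes), with the first vmstat sample used for free memory:

```python
KB_PER_GIB = 1024 * 1024  # kilobytes in one GiB

# Figures copied from the swap -s output.
allocated_k = 40900824
reserved_k = 449680
used_k = 41350504
available_k = 310922208

# "free" column from the first vmstat sample.
vmstat_free_k = 81960216

# swap -s reports used = allocated + reserved.
used_matches = (allocated_k + reserved_k == used_k)

swap_avail_gib = available_k / KB_PER_GIB
free_mem_gib = vmstat_free_k / KB_PER_GIB

print("used = allocated + reserved:", used_matches)
print(f"swap available: {swap_avail_gib:.1f} GiB")
print(f"free memory:    {free_mem_gib:.1f} GiB")
```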
So naturally I ran it under truss to study the behavior. And here is where the
fun part starts.
When I run "truss -o /tmp/truss-11.out maxmem", it works just fine and prints
the messages I want:
truss -o /tmp/truss-11.out maxmem
Memory starts at: 4296023312 (100101d10)
Memory ends at: 322708094970 (4b22e6bffa)
Memory available: 318412071658 (4a22d6a2ea)
There is no SEGV and no kernel warning in /var/adm/messages.
However, if I run "truss -o /tmp/truss-12.out <Full path to maxmem>", it gets a
SEGV and generates a kernel warning, just like when it is run directly from the
command line.
The truss output for the failed command looks like this:
time() = 1146674313
brk(0x4B1CE1E000) = 0
time() = 1146674313
time() = 1146674313
brk(0x4B1CE1E010) Err#11 EAGAIN
Incurred fault #6, FLTBOUNDS %pc = 0xFFFFFFFF7F60AD14
siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFFF7FFFDFF0
Received signal #11, SIGSEGV [default]
siginfo: SIGSEGV SEGV_MAPERR addr=0xFFFFFFFF7FFFDFF0
*** process killed ***
The successful one looks like this:
time() = 1146674335
brk(0x4B1CF98000) = 0
time() = 1146674335
time() = 1146674335
brk(0x4B1CF98010) Err#11 EAGAIN
ioctl(1, TCGETA, 0xFFFFFFFF7FFFE6FC) = 0
write(1, " M e m o r y s t a r t".., 41) = 41
write(1, " M e m o r y e n d s ".., 44) = 44
write(1, " M e m o r y a v a i l".., 44) = 44
lseek(0, 0, SEEK_CUR) = 30288
_exit(0)
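To compare the two traces more concretely, here is a rough Python sketch of the address arithmetic. It assumes an 8KB base page size (typical for SPARC, but not confirmed for this box), and it compares the fault address from the failed run with the stack buffer passed to ioctl() in the successful run, so it is only indicative:

```python
PAGE_SIZE = 0x2000  # assumption: 8KB base pages

def page_base(addr):
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)

# Last successful brk() in each trace.
failed_run_brk = 0x4B1CE1E000
ok_run_brk = 0x4B1CF98000

# Fault address (failed run) and ioctl() buffer address (successful run).
fault_addr = 0xFFFFFFFF7FFFDFF0
ioctl_buf = 0xFFFFFFFF7FFFE6FC

# How much further the heap grew in the successful run.
brk_gap = ok_run_brk - failed_run_brk

# How many pages lie between the fault and the known stack address.
pages_apart = (page_base(ioctl_buf) - page_base(fault_addr)) // PAGE_SIZE

print("heap gap between runs:", hex(brk_gap), brk_gap, "bytes")
print("fault page:    ", hex(page_base(fault_addr)))
print("ioctl buf page:", hex(page_base(ioctl_buf)))
print("pages apart:   ", pages_apart)
```

Under that assumption the faulting address sits one page below an address known to be on the stack, which is at least consistent with the kernel's "grow stack" message. Both runs had pushed the heap to within about 1.5MB of each other before brk() returned EAGAIN.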
Most likely this is a buggy application. However, it's weird that the same
command behaves differently under truss, when the only difference is that one
run used the full path name and the other didn't.
Any ideas?
--
This message was posted from opensolaris.org