Hi Jim,
Thank you, please see my updates inline.
Thanks.
Best Regards,
Simon
On Fri, Nov 20, 2009 at 11:51 PM, Jim Mauro <James.Mauro at sun.com> wrote:
> If you're running out of memory, which it appears you are,
> you need to profile the memory consumers, and determine if
> you have either a memory leak somewhere, or an under-configured
> system. Note 16GB is really tiny by todays standards, especially for
> an M5000-class server. It's like putting an engine from a Ford sedan
> into an 18-wheel truck - the capacity to do work is severely limited
> by a lack of towing power. Laptops ship with 8GB these days...
>
> Back to memory consumers. We have;
> - The kernel
> - User processes
> - The file system cache (which is technically part of the kernel,
> but significant enough such that it should be measured
> separately).
>
> Is the database on a file system, and if so, which one (UFS? ZFS?
> VxFS?)? How much shared memory is really being used
> (ipcs -a)?
>
Just UFS is used. Here's the output of "ipcs -a":
> If the system starts off well, and degrades over time, then you need
> to capture memory data over time and see what area is growing.
> Based on that data, we can determine if something is leaking memory,
> or you have an underconfigured machine.
>
> I would start with;
> echo "::memstat" | mdb -k
> ipcs -a
>
# ipcs -a
IPC status from <running system> as of Thu Nov 12 12:05:28 HKT 2009
T ID KEY MODE OWNER GROUP CREATOR CGROUP CBYTES QNUM QBYTES LSPID LRPID STIME RTIME CTIME
Message Queues:
T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
Shared Memory:
m 3 0xe9032d40 --rw------- sybase staff sybase staff 3 738803712 1314 2125 20:47:22 no-entry 20:47:14
m 2 0x51 --rw-rw-r-- root root root root 1 2000196 2122 8553 15:15:18 15:15:23 13:14:38
m 1 0x50 --rw-rw-r-- root root root root 1 600196 2121 2121 13:14:38 no-entry 13:14:38
m 0 0xe9032d32 --rw------- sybase staff sybase staff 3 7851147264 1314 2125 13:14:40 13:14:40 13:13:42
T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME CTIME
Semaphores:
s 1 0x51 --ra-ra-ra- root root root root 6 12:05:28 13:14:38
s 0 0x50 --ra-ra-ra- root root root root 6 12:05:28 13:14:38
# ipcs -mb (after adjusting the shared memory setting in "/etc/system" from
0xffffffffffff to 0x200000000)
IPC status from <running system> as of Thu Nov 19 16:38:17 HKT 2009
T ID KEY MODE OWNER GROUP SEGSZ
Shared Memory:
m 2 0x51 --rw-rw-r-- root root 2000196
m 1 0x50 --rw-rw-r-- root root 600196
m 0 0xe9032d32 --rw------- sybase staff 8548687872
> ps -eo pid,vsz,rss,class,pri,fname,args
> prstat -c 1 30
>
>From the "prstat" output,we found 3 sybase process,and each process derived
12 threads,the java process(launched by customer application) derived total
370 threads, I think it's too many threads(especially of "java" program)
that generate excessive stack/heaps,and finally used up the RAM ?
So I think decrease the share memory used by sybase(defined at sybase
configuration layer,not in "/etc/system" file) would be helpful ?
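To put numbers on that, something like the following rough commands could rank
the processes by resident size and thread count (nlwp/rss are standard Solaris
ps columns; matching the Sybase engines as "dataserver" is an assumption here):

# rank processes by resident set size (KB), with thread count per process
ps -eo pid,nlwp,vsz,rss,args | sort -rn -k 4 | head -20
# total RSS (KB) of java vs. the assumed Sybase "dataserver" processes
ps -eo rss,comm | nawk '/java/ {j+=$1} /dataserver/ {d+=$1} END {print "java:", j, "KB dataserver:", d, "KB"}'

Keep in mind that rss counts the attached shared memory pages in every Sybase
engine, so the dataserver total will overcount the shared segment.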
> kstat -n system_pages
>
I captured the system_pages statistics for about half an hour; one sample looks like this:
Mon Nov 16 17:24:25 2009
module: unix instance: 0
name: system_pages class: pages
availrmem 857798
crtime 89.53186
desfree 15914
desscan 8972
econtig 188874752
fastscan 1002870
freemem 30730
kernelbase 16777216
lotsfree 31828
minfree 7957
nalloc 66478696
nalloc_calls 19381
nfree 55736969
nfree_calls 14546
nscan 5520
pagesfree 30730
pageslocked 1169036
pagestotal 2037012
physmem 2058547
pp_kernel 189372
slowscan 100
snaptime 359704.2493636
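For what it's worth, a quick way to compare freemem against the paging
thresholds in MB is a sketch like this (assuming the 8KB base page size that
pagesize reports on sun4u):

# print freemem and the scanner thresholds in MB (page size from pagesize)
PGSZ=`pagesize`
kstat -p unix:0:system_pages:freemem unix:0:system_pages:lotsfree unix:0:system_pages:desfree unix:0:system_pages:minfree | nawk -v p=$PGSZ '{printf "%-35s %8.1f MB\n", $1, $2*p/1048576}'

With the sample above, freemem (30730 pages, roughly 240 MB) is already below
lotsfree (31828 pages, roughly 249 MB), which is the point at which the page
scanner wakes up, so the high sr values do reflect a real shortage of free pages.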
>
> You need to collect that data at some regular interval
> with timestamps. The interval depends on how long it takes
> the machine to degrade. If the system goes from fresh boot to
> degraded state in 1 hour, I'd collect the data every second.
> If the machine goes from fresh boot to degraded state in 1 week,
> I'd grab the data every 2 hours or so.
>
> /jim
>
>
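For the record, the interval collection could be a loop as rough as the one
below (a ksh sketch; the 300-second interval and the /var/tmp log path are
only placeholders):

#!/bin/ksh
# crude memory-snapshot collector: timestamp plus kernel/user/shm breakdown
LOG=/var/tmp/memlog.$$
while true
do
    date >> $LOG
    echo "::memstat" | mdb -k >> $LOG
    ipcs -a >> $LOG
    kstat -n system_pages >> $LOG
    prstat -c -n 20 1 1 >> $LOG
    sleep 300
done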
> Simon wrote:
>
>> Hi Experts,
>>
>> Here's a performance-related question; please help review what I can do
>> to get the issue fixed.
>>
>> IHAC with an M5000 running Solaris 10 10/08 (KJP: 138888-01), configured
>> with 16GB RAM and running Sybase ASE 12.5 plus a JBoss application.
>> Recently they felt the OS got very slow after running for some time, and
>> the collected vmstat data points to a memory shortage:
>>
>> # vmstat 5
>> kthr memory page disk faults cpu
>> r b w swap free re mf pi po fr de sr m0 m1 m4 m5 in sy cs us sy id
>> 0 0 153 6953672 254552 228 228 1843 1218 1687 0 685 3 2 0 0 2334 32431 3143 1 1 97
>> 0 0 153 6953672 259888 115 115 928 917 917 0 264 0 35 0 2 2208 62355 3332 7 3 90
>> 0 0 153 6953672 255688 145 145 1168 1625 1625 0 1482 0 6 1 0 2088 40113 3070 2 1 96
>> 0 0 153 6953640 256144 111 111 894 1371 1624 0 1124 0 6 0 0 2080 55278 3106 3 3 94
>> 0 0 153 6953640 256048 241 241 1935 2585 3035 0 1009 0 18 0 0 2392 40643 3164 2 2 96
>> 0 0 153 6953648 257112 236 235 1916 1710 1710 0 1223 0 7 0 0 2672 62582 3628 3 4 93
>>
>> As above,the "w" column is very high all time,and "sr" column also kept
>> very high,which indicates the page scanner is activated and busying for
>> page out,but the CPU is very idle,checked "/etc/system",found one
>> improper entry:
>> set shmsys:shminfo_shmmax = 0xffffffffffff
>>
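(For reference, the suggested cap in /etc/system form, using the 8GB value
mentioned in the next paragraph; /etc/system changes only take effect after a
reboot:

* cap System V shared memory at 8GB, the value suggested below
set shmsys:shminfo_shmmax = 0x200000000

)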
>> So I think this improper shared memory setting let the application reserve
>> too much physical RAM, and I suggested adjusting the shared memory limit to
>> 8GB (0x200000000). But according to the customer's feedback it seems to
>> have gotten worse, based on the new vmstat output:
>>
>> kthr memory page disk faults cpu
>> r b w swap free re mf pi po fr de sr m0 m1 m4 m5 in sy cs us sy id
>> 0 6 762 3941344 515848 18 29 4544 0 0 0 0 4 562 0 1 2448 25687 3623 1 2 97
>> 0 6 762 4235016 749616 66 21 4251 2 2 0 0 0 528 0 0 2508 50540 3733 2 5 93
>> 0 6 762 4428080 889864 106 299 4694 0 0 0 0 1 573 0 7 2741 182274 3907 10 4 86
>> 0 5 762 4136400 664888 19 174 4126 0 0 0 0 6 511 0 0 2968 241186 4417 18 9 73
>> 0 7 762 3454280 193776 103 651 2526 3949 4860 0 121549 11 543 0 5 2808 149820 4164 10 12 78
>> 0 9 762 3160424 186016 61 440 1803 7362 15047 0 189720 12 567 0 5 3101 119895 4125 6 13 81
>> 0 6 762 3647456 403056 44 279 4260 331 331 0 243 10 540 0 3 2552 38374 3847 5 3 92
>>
>> the "w" & "sr" value increased instead,why ?
>>
>> I also attached the "prstat" output; it's a snapshot taken after the
>> shared memory adjustment. Please have a look. What can I do next to get
>> the issue solved? What are the possible factors causing the memory
>> shortage again and again? Even with 16GB RAM + 16GB swap, is physical RAM
>> really that short?
>> Or is there any useful dtrace script to trace the problem? Thanks very
>> much!
>>
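One rough starting point for the dtrace question, as a sketch only: the vminfo
provider can show which processes drive anonymous page-ins/outs while the
scanner is running. Run as root for a minute:

dtrace -n '
vminfo:::anonpgin  { @pgin[execname]  = count(); }
vminfo:::anonpgout { @pgout[execname] = count(); }
tick-60s {
    printa("anon page-ins   %-16s %@d\n", @pgin);
    printa("anon page-outs  %-16s %@d\n", @pgout);
    exit(0);
}'

This shows who is paging, not why, but combined with the interval data it
helps separate a leaking process from an over-committed shared memory
configuration.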
>> Best Regards,
>> Simon
>>
>>
>>
>>
>