Hi Jim,
Thank you, please see my updates inline.
Thanks.
Best Regards,
Simon
On Fri, Nov 20, 2009 at 11:51 PM, Jim Mauro <James.Mauro at sun.com> wrote:
> If you're running out of memory, which it appears you are,
> you need to profile the memory consumers, and determine if
> you have either a memory leak somewhere, or an under-configured
> system. Note 16GB is really tiny by todays standards, especially for
> an M5000-class server. It's like putting an engine from a Ford sedan
> into an 18-wheel truck - the capacity to do work is severely limited
> by a lack of towing power. Laptops ship with 8GB these days...
>
> Back to memory consumers. We have;
> - The kernel
> - User processes
> - The file system cache (which is technically part of the kernel,
> but significant enough such that it should be measured
> separately).
>
> Is the database on a file system, and if so, which one (UFS? ZFS?
> VxFS?)? How much shared memory is really being used
> (ipcs -a)?
>
Just UFS is used. Here's the output of "ipcs -a":
> If the system starts off well, and degrades over time, then you need
> to capture memory data over time and see what area is growing.
> Based on that data, we can determine if something is leaking memory,
> or you have an underconfigured machine.
>
> I would start with;
> echo "::memstat" | mdb -k
> ipcs -a
>
# ipcs -a
IPC status from <running system> as of Thu Nov 12 12:05:28 HKT 2009
T ID KEY MODE OWNER GROUP CREATOR CGROUP CBYTES QNUM QBYTES LSPID LRPID STIME RTIME CTIME
Message Queues:
T ID KEY MODE OWNER GROUP CREATOR CGROUP NATTCH SEGSZ CPID LPID ATIME DTIME CTIME
Shared Memory:
m 3 0xe9032d40 --rw------- sybase staff sybase staff 3 738803712 1314 2125 20:47:22 no-entry 20:47:14
m 2 0x51 --rw-rw-r-- root root root root 1 2000196 2122 8553 15:15:18 15:15:23 13:14:38
m 1 0x50 --rw-rw-r-- root root root root 1 600196 2121 2121 13:14:38 no-entry 13:14:38
m 0 0xe9032d32 --rw------- sybase staff sybase staff 3 7851147264 1314 2125 13:14:40 13:14:40 13:13:42
T ID KEY MODE OWNER GROUP CREATOR CGROUP NSEMS OTIME CTIME
Semaphores:
s 1 0x51 --ra-ra-ra- root root root root 6 12:05:28 13:14:38
s 0 0x50 --ra-ra-ra- root root root root 6 12:05:28 13:14:38
# ipcs -mb (after adjusting the shared memory setting in "/etc/system" from
0xffffffffffff to 0x200000000)
IPC status from <running system> as of Thu Nov 19 16:38:17 HKT 2009
T ID KEY MODE OWNER GROUP SEGSZ
Shared Memory:
m 2 0x51 --rw-rw-r-- root root 2000196
m 1 0x50 --rw-rw-r-- root root 600196
m 0 0xe9032d32 --rw------- sybase staff 8548687872
> ps -eo pid,vsz,rss,class,pri,fname,args
> prstat -c 1 30
>
>From the "prstat" output,we found 3 sybase process,and each process derived
12 threads,the java process(launched by customer application) derived total
370 threads, I think it's too many threads(especially of "java" program)
that generate excessive stack/heaps,and finally used up the RAM ?
So I think decrease the share memory used by sybase(defined at sybase
configuration layer,not in "/etc/system" file) would be helpful ?
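To put numbers on that, something like the following rough commands could rank
the processes by resident size and thread count (nlwp/rss are standard Solaris
ps columns; matching the Sybase engines as "dataserver" is an assumption here):

# rank processes by resident set size (KB), with thread count per process
ps -eo pid,nlwp,vsz,rss,args | sort -rn -k 4 | head -20
# total RSS (KB) of java vs. the assumed Sybase "dataserver" processes
ps -eo rss,comm | nawk '/java/ {j+=$1} /dataserver/ {d+=$1} END {print "java:", j, "KB dataserver:", d, "KB"}'

Keep in mind that rss counts the attached shared memory pages in every Sybase
engine, so the dataserver total will overcount the shared segment.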
> kstat -n system_pages
>
I captured the system_pages statistics for about half an hour; one sample looks like this:
Mon Nov 16 17:24:25 2009
module: unix instance: 0
name: system_pages class: pages
availrmem 857798
crtime 89.53186
desfree 15914
desscan 8972
econtig 188874752
fastscan 1002870
freemem 30730
kernelbase 16777216
lotsfree 31828
minfree 7957
nalloc 66478696
nalloc_calls 19381
nfree 55736969
nfree_calls 14546
nscan 5520
pagesfree 30730
pageslocked 1169036
pagestotal 2037012
physmem 2058547
pp_kernel 189372
slowscan 100
snaptime 359704.2493636
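For what it's worth, a quick way to compare freemem against the paging
thresholds in MB is a sketch like this (assuming the 8KB base page size that
pagesize reports on sun4u):

# print freemem and the scanner thresholds in MB (page size from pagesize)
PGSZ=`pagesize`
kstat -p unix:0:system_pages:freemem unix:0:system_pages:lotsfree unix:0:system_pages:desfree unix:0:system_pages:minfree | nawk -v p=$PGSZ '{printf "%-35s %8.1f MB\n", $1, $2*p/1048576}'

With the sample above, freemem (30730 pages, roughly 240 MB) is already below
lotsfree (31828 pages, roughly 249 MB), which is the point at which the page
scanner wakes up, so the high sr values do reflect a real shortage of free pages.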
>
> You need to collect that data at some regular interval
> with timestamps. The interval depends on how long it takes
> the machine to degrade. If the system goes from fresh boot to
> degraded state in 1 hour, I'd collect the data every second.
> If the machine goes from fresh boot to degraded state in 1 week,
> I'd grab the data every 2 hours or so.
>
> /jim
>
>
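For the record, the interval collection could be a loop as rough as the one
below (a ksh sketch; the 300-second interval and the /var/tmp log path are
only placeholders):

#!/bin/ksh
# crude memory-snapshot collector: timestamp plus kernel/user/shm breakdown
LOG=/var/tmp/memlog.$$
while true
do
    date >> $LOG
    echo "::memstat" | mdb -k >> $LOG
    ipcs -a >> $LOG
    kstat -n system_pages >> $LOG
    prstat -c -n 20 1 1 >> $LOG
    sleep 300
done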
> Simon wrote:
>
>> Hi Experts,
>>
>> Here's a performance-related question; please help review what I can do
>> to get the issue fixed.
>>
>> IHAC with an M5000 running Solaris 10 10/08 (KJP: 138888-01), configured
>> with 16GB RAM and running Sybase ASE 12.5 plus a JBoss application.
>> Recently they felt the OS got very slow after running for some time, and
>> the collected vmstat data points to a memory shortage:
>>
>> # vmstat 5
>> kthr memory page disk faults cpu
>> r b w swap free re mf pi po fr de sr m0 m1 m4 m5 in sy cs us sy id
>> 0 0 153 6953672 254552 228 228 1843 1218 1687 0 685 3 2 0 0 2334 32431 3143 1 1 97
>> 0 0 153 6953672 259888 115 115 928 917 917 0 264 0 35 0 2 2208 62355 3332 7 3 90
>> 0 0 153 6953672 255688 145 145 1168 1625 1625 0 1482 0 6 1 0 2088 40113 3070 2 1 96
>> 0 0 153 6953640 256144 111 111 894 1371 1624 0 1124 0 6 0 0 2080 55278 3106 3 3 94
>> 0 0 153 6953640 256048 241 241 1935 2585 3035 0 1009 0 18 0 0 2392 40643 3164 2 2 96
>> 0 0 153 6953648 257112 236 235 1916 1710 1710 0 1223 0 7 0 0 2672 62582 3628 3 4 93
>>
>> As above,the "w" column is very high all time,and "sr" column also kept
>> very high,which indicates the page scanner is activated and busying for
>> page out,but the CPU is very idle,checked "/etc/system",found one
>> improper entry:
>> set shmsys:shminfo_shmmax = 0xffffffffffff
>>
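(For reference, the suggested cap in /etc/system form, using the 8GB value
mentioned in the next paragraph; /etc/system changes only take effect after a
reboot:

* cap System V shared memory at 8GB, the value suggested below
set shmsys:shminfo_shmmax = 0x200000000

)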
>> So I think this improper shared memory setting let the application reserve
>> too much physical RAM, and I suggested adjusting the shared memory limit to
>> 8GB (0x200000000). But according to the customer's feedback it seems to
>> have gotten worse, based on the new vmstat output:
>>
>> kthr memory page disk faults cpu
>> r b w swap free re mf pi po fr de sr m0 m1 m4 m5 in sy cs us sy id
>> 0 6 762 3941344 515848 18 29 4544 0 0 0 0 4 562 0 1 2448 25687 3623 1 2 97
>> 0 6 762 4235016 749616 66 21 4251 2 2 0 0 0 528 0 0 2508 50540 3733 2 5 93
>> 0 6 762 4428080 889864 106 299 4694 0 0 0 0 1 573 0 7 2741 182274 3907 10 4 86
>> 0 5 762 4136400 664888 19 174 4126 0 0 0 0 6 511 0 0 2968 241186 4417 18 9 73
>> 0 7 762 3454280 193776 103 651 2526 3949 4860 0 121549 11 543 0 5 2808 149820 4164 10 12 78
>> 0 9 762 3160424 186016 61 440 1803 7362 15047 0 189720 12 567 0 5 3101 119895 4125 6 13 81
>> 0 6 762 3647456 403056 44 279 4260 331 331 0 243 10 540 0 3 2552 38374 3847 5 3 92
>>
>> the "w" & "sr" value increased instead,why ?
>>
>> I also attached the "prstat" output; it's a snapshot taken after the
>> shared memory adjustment. Please have a look. What can I do next to get
>> the issue solved? What are the possible factors causing the memory
>> shortage again and again? Even with 16GB RAM + 16GB swap, is physical RAM
>> really that short?
>> Or is there any useful dtrace script to trace the problem? Thanks very
>> much!
>>
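One rough starting point for the dtrace question, as a sketch only: the vminfo
provider can show which processes drive anonymous page-ins/outs while the
scanner is running. Run as root for a minute:

dtrace -n '
vminfo:::anonpgin  { @pgin[execname]  = count(); }
vminfo:::anonpgout { @pgout[execname] = count(); }
tick-60s {
    printa("anon page-ins   %-16s %@d\n", @pgin);
    printa("anon page-outs  %-16s %@d\n", @pgout);
    exit(0);
}'

This shows who is paging, not why, but combined with the interval data it
helps separate a leaking process from an over-committed shared memory
configuration.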
>> Best Regards,
>> Simon
>>
>>
>>
>>
>