Hi,

We’re experiencing strange behaviour in our paging subsystem and are having great 
difficulty solving the problem – we hope some of you are able to help us. 
        
Our VM system pages extremely slowly: a long-inactive Linux guest is paged in 
from DASD by z/VM at approximately 1 MB/sec in an otherwise almost idle 
system. The overall system page rate is 400-800 pages/second – half for reading 
pages and half for writing them. We have 12 paging disks (3390-9) distributed 
over 2 LCUs and attached via 6 FICON channels – according to our performance 
readings, neither the disks nor the channels are in any way a bottleneck. 
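For scale, those page rates translate into only a few MB/s of actual data movement. A minimal back-of-the-envelope sketch, assuming the 4 KiB z/VM page size:

```python
# Convert the observed page rates into data rates
# (assumption: 4 KiB z/VM page size).
PAGE_SIZE_KIB = 4

def pages_to_mb_per_s(pages_per_sec: float) -> float:
    """Convert a page rate into MB/s (1 MB = 1024 KiB here)."""
    return pages_per_sec * PAGE_SIZE_KIB / 1024

# Overall system page rate quoted above: 400-800 pages/s.
low, high = pages_to_mb_per_s(400), pages_to_mb_per_s(800)
print(f"system paging: {low:.1f}-{high:.1f} MB/s")  # roughly 1.6-3.1 MB/s
```

So even the system-wide rate is only a couple of MB/s spread over 12 volumes, which makes 1 MB/sec for a single guest look less like a device limit and more like a per-guest serialization effect.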

At the same time, main storage is poorly utilized: Performance Toolkit 
reports a storage utilization of 65-75%, and it reports strange values for 
“Total real storage” and “Total available” storage: namely 0 kB. 

What can be wrong?

Kind regards,
Klaus Johansen

Additional information:
We have a z/VM system with approximately 25 zPenguins in an LPAR with 2 IFLs, 
12 GB main storage and 3 GB expanded storage. We have overcommitted storage 2:1. 
We are fully aware that the storage size for each Linux guest should be 
minimized as much as possible. Linux uses all available memory for file cache 
etc., and we have certainly seen z/VM page that file cache out onto the paging 
volumes, making memory allocation slow. We have not considered this a 
problem, since the guests are rarely “active” at the same time (the memory 
accessed by the active Linux guests should easily fit in main storage). But we 
didn’t know that VM pages at 1 MB/sec…
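As a rough fit check (the per-guest figure below is only the average implied by the stated 2:1 overcommit against main storage, not a measured guest size):

```python
# Rough storage-fit sketch (assumptions: the stated 25 guests and
# 2:1 overcommit against 12 GB main storage; per-guest size is only
# the implied average, not a measured value).
guests = 25
main_gb, xstore_gb = 12, 3
total_virtual_gb = main_gb * 2            # 2:1 overcommit -> ~24 GB defined
avg_guest_gb = total_virtual_gb / guests  # ~0.96 GB average per guest
print(f"~{total_virtual_gb} GB defined guest storage, "
      f"avg ~{avg_guest_gb:.2f} GB/guest, "
      f"vs {main_gb} GB main + {xstore_gb} GB XSTORE real")
```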

The “User Wait States” screen in Performance Toolkit confirms that the guests 
are in page wait (97-100%) during these “1 MB/sec” page-ins.

The system is actually capable of paging faster: when all the Linux guests page 
at the same time (we haven’t had time to reschedule the daily cron job for log 
rotation), we see a momentary page rate of 7000 pages/sec – much better, but 
nothing extraordinary. 
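Converting that burst rate into bandwidth, again assuming 4 KiB pages, shows how far the single-guest case falls short of what the configuration can sustain:

```python
# Burst vs. slow case (assumption: 4 KiB z/VM page size).
PAGE_SIZE_KIB = 4
burst_mb_s = 7000 * PAGE_SIZE_KIB / 1024   # cron-storm aggregate rate
slow_pages_s = 1 * 1024 // PAGE_SIZE_KIB   # 1 MB/s expressed in pages/s
print(f"burst: {burst_mb_s:.1f} MB/s; "
      f"slow guest: ~{slow_pages_s} pages/s")  # ~27 MB/s vs ~256 pages/s
```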

We have considered that our SRM LDUBUF, SRM STORBUF, etc. need tuning, but 
according to the Performance book these values mostly affect the scheduler and 
dispatcher in relation to the Q1-Q3 queues – and there seems to be no problem 
entering the dispatch list (furthermore, enabling “quick dispatch” for a 
slow-paging guest has no effect). 

Storage Utilization
Interval 15:39:59-15:40:59, on 2007/12/21  (CURRENT interval, select average 
for mean data)
 
 Main storage utilization:                 XSTORE utilization: 
 Total real storage              0kB       Total available                  0kB 
 Total available                 0kB       Att. to virt. machines           0kB 
 Offline storage frames    .......kB       Size of CP partition         3'072MB 
 SYSGEN storage size       .......kB       CP XSTORE utilization           99% 
 CP resident nucleus       .......kB       Low threshold for migr.      1'680kB 
 Shared storage            117'377MB       XSTORE allocation rate           0/s 
 FREE storage pages        .......kB       Average age of XSTORE blks    2855s 
 FREE stor. subpools        52'976kB       Average age at migration       ...s 
 Subpool stor. utilization       0%        
 Total DPA size             12'132MB       MDCACHE utilization: 
 Locked pages               37'548kB       Min. size in XSTORE              0kB 
 Trace table               .......kB       Max. size in XSTORE          3'072MB 
 Pageable                   12'095MB       Ideal size in XSTORE        77'692kB 
 Storage utilization            86%        Act. size in XSTORE         88'788kB 
 Tasks waiting for a frame       3         Bias for XSTORE               1.00 
 Tasks waiting for a page        4/s       Min. size in main stor.          0kB 
                                           Max. size in main stor.     12'288MB 
 V=R area:                                 Ideal size in main stor.     1'265MB 
 Size defined                  ...kB       Act. size in main stor.    234'228kB 
 FREE storage                  ...kB       Bias for main stor.           1.00 
 V=R recovery area in use      ...%        MDCACHE limit / user        32'544kB 
 V=R user                 ........         Users with MDCACHE inserts       2 
                                           MDISK cache read rate            2/s 
 Paging / spooling activity:               MDISK cache write rate       ...../s 
 Page moves <2GB for trans.      0/s       MDISK cache read hit rate        2/s 
 Fast path page-in rate         89/s       MDISK cache read hit ratio      76% 
 Long path page-in rate          1/s       
 Long path page-out rate         0/s       VDISKs: 
 Page read rate                179/s       System limit (blocks)       Unlim.   
 Page write rate                 0/s       User limit (blocks)         Unlim.
 Page read blocking factor       6         Main store page frames       11312   
 Page write blocking factor    ...         Expanded stor. pages           302   
 Migrate-out blocking factor   ...         Pages on DASD               426964   
 Paging SSCH rate               20/s                
 SPOOL read rate                 0/s                                            
 SPOOL write rate                0/s           


CP-owned disks: 1 of the 12 paging disks during the “1MB/sec paging in”
Interval 15:37:04-15:37:05, on 2007/12/21
 
 Detailed Analysis for Device 9F01 ( CP OWNED )                       
 Device type :  3390-9      Function pend.:     .2ms      Device busy   :   12% 
 VOLSER      :  VSPPG7      Disconnected  :     .0ms      I/O contention:    0% 
 Nr. of LINKs:       0      Connected     :    3.9ms      Reserved      :    0% 
 Last SEEK   :    1525      Service time  :    4.1ms      SENSE SSCH    :    0 
 SSCH rate/s :    30.0      Response time :    4.1ms      Recovery SSCH :    0 
 Avoided/s   :      .0      CU queue time :     .0ms      Throttle del/s:  ... 
 Status: ONLINE                                                                 
   
 System Page/Spool I/O Details
 Page reads/s  :  10.0      Total pages/s   :   87.5      PG serv. time:  1.7ms
 Page writes/s :  77.5      I/Os avoided/s  :     .0      PG resp. time:  1.7ms
 Spool reads/s :    .0      System I/Os /s  :   87.5      PG queue len.: 1.07
 Spool writes/s:    .0      User interfer./s:     .0      Avail. bsize :    1
 
 Path(s) to device 9F01:    61    65    78    79    6C    6F             
 Channel path status   :    ON    ON    ON    ON    ON    ON             
      
 Device            Overall CU-Cache Performance           Split
 DIR ADDR VOLSER   IO/S %READ %RDHIT %WRHIT ICL/S BYP/S   IO/S %READ %RDHIT    
 16  9F01 VSPPG7    5.3    15     25    100    .0    .0    5.3    15     25 (N)
                                                            .0     0      0 (S)
                                                            .0     0      0 (F)
 
         MDISK Extent       Userid   Addr IO/s VSEEK Status    LINK MDIO/s 
 +-------------------------------------------------------------------------+
 !          1 - 10016       System PAGE     RD/s  WR/s  MLOAD  Used   IO/S !
 !                          LOAD   ====>    10.0  77.5    1.7   20%   87.5 !
 +-------------------------------------------------------------------------+
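One hedged reading of the device report above: if a waiting guest drives one synchronous single-page read per I/O (the fast-path case, ignoring the read blocking factor, which batches writes more than demand reads), then the ~4.1 ms service time alone caps that guest near 1 MB/s – which matches what we observe. This is our interpretation, not a confirmed diagnosis:

```python
# Sketch: per-guest throughput if each page fault costs one
# synchronous single-page I/O (assumptions: 4 KiB pages, the
# ~4.1 ms service time from the device report, no read batching).
service_time_s = 0.0041
pages_per_s = 1 / service_time_s        # ~244 synchronous reads/s
mb_per_s = pages_per_s * 4 / 1024       # ~0.95 MB/s per waiting guest
print(f"~{pages_per_s:.0f} pages/s -> ~{mb_per_s:.2f} MB/s per guest")
```

If that reading is right, the aggregate 7000 pages/sec burst is simply many guests each paying this per-I/O latency in parallel, while a single guest in page wait cannot.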
