Could you please move these questions to fm -> discuss? I am not able to do 
this myself.

Hello,

I have two questions about FMA (memory-related) on the M-Series (here: SES 
M9000).

(1)     Does page retirement have any (significant) impact on system 
performance?

We recently suffered a DIMM degradation (more than 500 retired pages and an 
entry in "fmadm faulty"), which seemed to affect system performance (Oracle 
database): the database went through several temporary "hangs". Unfortunately, 
we did not panic the system to obtain a crash dump for later analysis, so we 
have no proof of that assumption.

On a blog on another website, I read that someone else appeared to have had 
similar symptoms (although very likely at a different kernel patch level), 
which may become even more severe under heavy load.

So my question is: is there any proof (or known evidence) of a direct relation 
between page retirement and system performance? Do U5 and U7 behave 
significantly differently? Has anybody experienced a similar phenomenon?

To illustrate our symptoms, here is some information from the domain.

1195 entries like this one over a period of 11.5 hours (excerpt from fmdump -a):

May 03 10:17:50.1529 b634cf6e-cc33-e46d-ab6d-8d800ef05865 SUN4U-8000-1A
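To see how the 1195 events are distributed over time (and line them up with the sar intervals), the fmdump lines can be bucketed per hour. This is a minimal sketch that assumes each line has exactly the five whitespace-separated fields of the excerpt (month, day, time, UUID, SUNW-MSG-ID) and skips anything else, such as the column header. The helper names are hypothetical.

```python
from collections import Counter


def events_per_hour(lines):
    """Count fmdump summary lines per (month, day, hour) bucket.

    Assumes the layout of the excerpt: month, day, time, UUID, SUNW-MSG-ID.
    Lines with a different number of fields (e.g. the header) are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue
        month, day, clock, _uuid, _msg_id = fields
        counts[(month, day, clock[:2])] += 1  # clock[:2] is the hour
    return counts


sample = [
    "TIME                 UUID                                 SUNW-MSG-ID",
    "May 03 10:17:50.1529 b634cf6e-cc33-e46d-ab6d-8d800ef05865 SUN4U-8000-1A",
]
print(events_per_hour(sample))

# Average rate from the figures in the post: 1195 events over 11.5 hours.
print(f"{1195 / 11.5:.1f} events/hour")
```

Feeding it the full `fmdump -a` output instead of the two sample lines would show whether the rate was steady or bursty before the entries stopped.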

High run queue values (excerpt from sar -q):

         runq-sz %runocc swpq-sz %swpocc
09:00:15   104.6      93     0.0       0
09:10:13   140.7      93     0.0       0
09:20:37   180.0      97     0.0       0
09:30:38   126.3      96     0.0       0
09:40:22   133.6      96     0.0       0
09:50:33   185.3      95     0.0       0
10:00:35   121.1      94     0.0       0
10:10:20    49.7      92     0.0       0
10:20:10   177.2      98     0.0       0
10:30:16    32.0      94     0.0       0
10:40:16     3.1      86     0.0       0

The FMA entries stopped at 10:24, and the run queue size normalized soon 
afterwards.
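The before/after observation can be quantified directly from the sar excerpt above. This sketch uses only the runq-sz values quoted in the post and the 10:24 cutoff; note that each sar sample averages the interval ending at its timestamp, so the 10:30:16 sample still partly covers the period before the cutoff.

```python
from datetime import time
from statistics import mean

# (timestamp, runq-sz) pairs copied from the sar excerpt in this post.
SAMPLES = [
    ("09:00:15", 104.6), ("09:10:13", 140.7), ("09:20:37", 180.0),
    ("09:30:38", 126.3), ("09:40:22", 133.6), ("09:50:33", 185.3),
    ("10:00:35", 121.1), ("10:10:20", 49.7), ("10:20:10", 177.2),
    ("10:30:16", 32.0), ("10:40:16", 3.1),
]

CUTOFF = time(10, 24)  # time of the last FMA entry, per the post


def split_at(samples, cutoff):
    """Split runq-sz values into those sampled before and after `cutoff`."""
    before, after = [], []
    for stamp, runq in samples:
        h, m, s = map(int, stamp.split(":"))
        (before if time(h, m, s) < cutoff else after).append(runq)
    return before, after


before, after = split_at(SAMPLES, CUTOFF)
print(f"before 10:24: n={len(before)}, mean runq-sz={mean(before):.1f}")
print(f"after  10:24: n={len(after)}, mean runq-sz={mean(after):.1f}")
```

Two samples after the cutoff are too few to prove anything, but the contrast is large enough to be worth capturing with finer-grained sar intervals next time.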

(2)     Are there any publicly available tools/packages/commands (XSCF, bundled 
with Solaris, or downloadable) to simulate a page retirement?

We need an actual page retirement (done by the memory controller), not an FMA 
simulation.

So far we have considered three methods:

-       Solaris FMA demo kit: doesn't work, because there is no implementation 
for the M-Series
-       fminject: we were told that this only simulates an error within FMA 
and doesn't trigger an actual page retirement on the DIMM
-       mtst: seems to be part of a package that is not publicly available, 
and there is no public documentation

Our goal is to trigger several page retirements while constantly writing to 
memory under significant system load. We want to see whether there is a 
significant performance impact and whether the behavior improves from 
Solaris 10 U5 to U7 (or U8).

Any suggestions on how to achieve this?
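For the "constant writes to memory" part of such a test, a simple load generator could look like the sketch below. It is only an assumption about what the workload might be: it writes one byte into every page of a buffer per round and does nothing to trigger page retirement itself. The function name and parameters are hypothetical; several copies would need to run in parallel to produce significant system load.

```python
import mmap


def write_load(size_bytes, rounds, page_size=mmap.PAGESIZE):
    """Write one byte into every page of a buffer, `rounds` times.

    Only generates memory write traffic; it cannot cause a page retirement.
    Returns the number of page writes performed.
    """
    buf = bytearray(size_bytes)  # allocation itself zero-fills every page
    writes = 0
    for r in range(rounds):
        value = r % 256  # change the value each round so every write is real
        for offset in range(0, size_bytes, page_size):
            buf[offset] = value
            writes += 1
    return writes


if __name__ == "__main__":
    # Hypothetical sizing: 256 MiB touched 10 times.
    print(write_load(256 * 2**20, rounds=10))
```

Touching one byte per page keeps every page dirty without saturating memory bandwidth; a bandwidth-bound test would write whole pages instead.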

---------------------------------------------------------------------------

Bonus section.

Although I have a basic understanding of FMA, some questions remain 
unanswered. My life doesn't depend on the answers, but if you have any input 
based on knowledge of the FMA response agents or long experience with the 
M-Series and Solaris 10 in particular, it is very welcome.

-       Why do Solaris and the XSCF evaluate memory errors differently (is 
this intended, and what is the benefit)? Can there be a DIMM that is degraded 
in XSCF but not in Solaris, and vice versa?
-       As far as I understand the algorithm, page retirement for a DIMM stops 
at a certain threshold (which is not just the 0.1% of total memory). Is this 
true, and what is (or could be) the reason for it (perhaps performance)?
-       If there is a direct relation between excessive page retirement and 
system performance, is there a safe (!) way to stop page retirement on the 
fly? I know I would risk a UE (uncorrectable error), but that risk exists 
even with page retirement in place (although reduced).
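To put the 0.1% figure into perspective against the 500 retired pages we saw, the arithmetic can be sketched as follows. It assumes the 8 KB base page size of Solaris on SPARC and takes the 0.1% from the question above; the memory size is a hypothetical example, and the actual limit applied by FMA or the kernel may be computed differently.

```python
def retire_limit_pages(total_mem_bytes, percent=0.1, page_size=8192):
    """Number of base pages that `percent` of total memory corresponds to.

    Assumes an 8 KB base page (SPARC) and the 0.1% figure quoted in the
    post; this is not the actual FMA/kernel implementation.
    """
    total_pages = total_mem_bytes // page_size
    return int(total_pages * percent / 100)


mem = 64 * 2**30  # hypothetical 64 GiB domain
limit = retire_limit_pages(mem)
retired = 500     # retired pages reported in the post
print(f"0.1% limit: {limit} pages; {retired} retired pages = "
      f"{retired * 8192 / 2**20:.1f} MiB")
```

On a domain of that size, 500 retired pages (under 4 MiB) would be far below a 0.1% cap, which suggests that any additional per-DIMM threshold, not the 0.1% limit, would be the one that ends retirement.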

Any input on my questions is much appreciated!
-- 
This message posted from opensolaris.org
_______________________________________________
opensolaris-discuss mailing list
opensolaris-discuss@opensolaris.org
