Re: WLM issue with a proposed solution

2016-05-16 Thread Ted MacNEIL
Dispatching priorities mean nothing if the work is getting done. You're using 
the WLM; you should learn and use its terminology.

-teD
  Original Message  
From: Tracy Adams
Sent: Thursday, April 28, 2016 15:57
To: IBM-MAIN@LISTSERV.UA.EDU
Reply To: IBM Mainframe Discussion List
Subject: Re: WLM issue with a proposed solution

The importance (priority) of DB2 is set to 2, as is that of the CICS service 
class. DB2 serves both the CICS regions and the batch jobs.

I only speak of dispatching priorities because isn't that ultimately what is 
driven by the collective results of WLM?

To Mark's question, I am not sure what is stalling those transactions, I will 
try to collect some delay information. 

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Martin Packer
Sent: Thursday, April 28, 2016 3:49 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: WLM issue with a proposed solution

Hello Tracy.

What importance have you set DB2 address spaces' service class(es) to? 
Likewise the things it serves, such as CICS regions and CICS transactions?

If DB2 is getting locked out it could be caused by it being Imp 2 or something, 
rather than Imp 1 with a goal 70+.

I also note you're mainly talking dispatching priorities rather than WLM 
language.

Cheers, Martin

Martin Packer,
zChampion, Principal Systems Investigator, Worldwide Cloud & Systems 
Performance, IBM

+44-7802-245-584

email: martin_pac...@uk.ibm.com

Twitter / Facebook IDs: MartinPacker

Blog: 
https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker

Podcast Series (With Marna Walle): 
https://developer.ibm.com/tv/category/mpt/



From: Tracy Adams <tad...@fbbrands.com>
To: IBM-MAIN@LISTSERV.UA.EDU
Date: 28/04/2016 19:22
Subject: WLM issue with a proposed solution
Sent by: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU>



So here is my issue:

We have a soft capped LPAR that runs our DB2 and CICS regions and, during the 
day, some "marketing batch". On Wednesdays, the marketing batch (online submit 
via CICS) increases and by afternoon we hit our 4 hour soft cap. Once or twice 
while we are capped, the busiest CICS regions slow down to the point where some 
old automation kicks in to kill transactions over 45 seconds old. Some of these 
transactions dump through DumpMaster, we then go to max sockets and more 
transactions dump, and in 10 - 30 seconds all is fine again.

What I see: The CICS regions have a DP around EC and are meeting their service 
goal of 99% under .5 seconds. But there are tens of thousands of transactions 
that have led to this. The batch jobs (3-5 of them), while running 10 - 15 % 
cpu, have a DP of C0 and are in a discretionary level of the service class. I 
believe the problem lies with the DB2 service class. That has a definition of 
velocity at 66 and it tends to run below that when there is more contention in 
the system. The DP of the DB2 region is F6.

My theory: when this brown out occurs the resources are maxed out and the CICS 
regions, being the ones that have met their goal, will have to suffer many 
transactions missing the service goal to make the DP go up. They get hung up 
just long enough to cause the delays that trigger the "panic" automation to 
clear the stalled transactions. Chaos breaks out!

My proposal:
A. Limit the batch jobs to a max of three by controlling open initiators for 
their job class.
B. Change the DB2 velocity to 60.
C. Starve the CICS service goal by reducing it to 99% in .4, forcing its DP to 
be a little more desperate.

Thoughts?

TIA,

Tracy

--
For IBM-MAIN subscribe / signoff / archive access instructions, send email to 
lists...@listserv.ua.edu with the message: INFO IBM-MAIN


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU



Re: WLM issue with a proposed solution

2016-05-02 Thread Martin Packer
On the zIIP point assume most of DB2 DBM1 in V10 and nearly all in V11 is 
zIIP-eligible. (And, yes, SMF 30 gives you the actual numbers.)

And thereby hangs another tale... :-)

Cheers, Martin




From:   Neil Duffee <nduf...@uottawa.ca>
To: IBM-MAIN@LISTSERV.UA.EDU
Date:   02/05/2016 20:20
Subject:        Re: WLM issue with a proposed solution
Sent by:IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU>



Caveat: as a daily digester, responses are implicitly delayed...

Tracy:  among other good advice you got, I'll emphasize that the 
Importance for your Databases (DB2, etc) must be higher than your 
Applications (Cics, etc) to avoid [some of] these time-out/deadlock 
scenarios.  I strongly suggest reading the WLM RedBook. [1]  It has 
specific chapters on Cics, DB2, etc.

Secondly, I'd avoid strangling WLM but, rather, tend to suggest loosening 
the rules.  If WLM has this leeway, it is more able to balance the 
workload and, after all, that's the whole point of a WorkLoad Manager.  I 
use the concept where the rules are, "what can you tolerate when things go 
south?" vs. "how do I want things to perform normally?" [2]  When there's 
sufficient resources, all your classes will over-perform.  By bumping up 
the Cics minimum you're forcing WLM to deprioritize others of the same 
Importance (or less; such as DB2 from your message).  Rather, by loosening 
the restrictions, DB2 is allowed to breathe some.  In fact, you'll see 
below [5] that our Online-Hi is 75% in 1 second but our typical Cics 
response is 0.3 seconds and 0.8 on bad days.

Third, you might consider removing your long-running Cics transactions to 
a different Transaction group because they can skew the accumulated WLM 
results.  Below [5], you'll see I have a group LONGRUN that encompasses 
monitoring tasks which, essentially, never end;  meaning *bad* response 
times.  Instead, because we cycle our production Cics each workday, 
they're shunted to the ONLINELG service class with 75% in 1 second so they 
don't pollute the ONLINEHI stats. 
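
To see why never-ending tasks distort a percentile response-time goal, here is a rough Python sketch. The 75%-in-1-second goal and the OSRV/OSEC examples come from the samples in [5]; the response times themselves are invented, and the attainment calculation is a plain percentage, a simplification rather than WLM's internal algorithm.

```python
# Illustrative only: response times are made up, and this is a simplified
# attainment calculation, not WLM's internal algorithm.

GOAL_SECONDS = 1.0      # from the sample goal "75% complete within 1 second"
GOAL_PERCENT = 75.0

def percent_within_goal(response_times, goal_seconds=GOAL_SECONDS):
    """Return the percentage of transactions that ended within the goal."""
    if not response_times:
        return 100.0
    hits = sum(1 for t in response_times if t <= goal_seconds)
    return 100.0 * hits / len(response_times)

# Hypothetical interval: 6 ordinary transactions plus 4 monitoring tasks
# (such as OSRV or OSEC) whose elapsed time is measured in days.
ordinary = [0.3, 0.3, 0.4, 0.8, 0.3, 0.5]
long_running = [86400.0 * 2] * 4

mixed = percent_within_goal(ordinary + long_running)   # all in one class
separated = percent_within_goal(ordinary)              # LONGRUN split out

print(f"one class:       {mixed:.0f}% within goal (goal {GOAL_PERCENT:.0f}%)")
print(f"LONGRUN removed: {separated:.0f}% within goal")
```

With the long-running tasks mixed in, the class looks as if it is missing its goal even though every ordinary transaction is quick; split out, the same work shows as comfortably inside it.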

Lastly, tho' I believe it is the default, make sure you have I/O Priority 
management [3] set to YES.  It will encourage WLM to promote lower classed 
work such as Batch to a higher DP (temporarily) to clear the blockage.  It 
will repeat the process if necessary and results can be seen in the RMF 
reporting [4] under LCK.  (LCK or ENQ?)  The Dynamic alias tuning 
management will let WLM manage your hyper-volSer UCB allocations as well. 
(can't remember the real name at the moment.)

A zIIP was suggested but, unless you're doing Java in Cics, it won't 
*directly* help your Cics/DB2 problems.  However, depending on your z/OS & 
DB2, more things are becoming zIIP-able, e.g. TCP/IP, system XML services, 
DRDA, etc.  Plus, it's not included in your 4hr cap or licensing.

ps.  the DB2 velocity goal can be a small, red herring.  It applies to 
activities that are not assigned to specific enclaves such as Dasd I/O & 
lock management.  Your Batch work will be in a Batch class enclave (SRB) 
within DB2 and be dispatched as such.  This is one of the places where you 
will see promotion by WLM occur due to enqueues/locks.

[1]  System Programmer’s Guide to: Workload Manager SG24-6472-03
[2]  The latter is from the old Dispatching Priority mentality that needs 
to be dropped.  Instead, DP is employed by WLM to achieve the minimum 
goals you have defined.

[3]  WLM samples:
Service Coefficient/Service Definition Options:
I/O priority management  . . . . . . . . YES
Dynamic alias tuning management  . . . . YES

[4]  RMF reporting
--PROMOTED--
BLK    0.062
ENQ   52.084
CRM   21.455
LCK  654.084
SUP    0.000

[5]  WLM samples:
Transaction Name Group LONGRUN - Long running CICS transactions
  Qualifier  Starting
  name       position  Description
  ---------  --------  --------------------------
  B11R                 BETA93
  C*                   CICS supplied transactions
  OSEC                 Omegamon
  OSRV                 Omegamon
 -from the Cics monitor: CSSY, CSTP, CSNC, CSZI, CEX2, 
CSHQ, CSNE, OSRV, & OSEC all have elapsed/response times in days

Subsystem Type CICS - CICS transactions 
Classification: 
  Default service class is ONLINELO 
  Default report class is CICS 

    Qualifier  Qualifier  Starting   Service
  # type       name       position   Class
  - ---------- ---------- ---------  --------
  1 SIG        CICSPRD1              ONLINEHI
  2 . TNG      . LONGRUN             ONLINELG


Service Class ONLINELG - Long running tr

Re: WLM issue with a proposed solution

2016-05-02 Thread Neil Duffee
75% complete within 00:00:01.000


>  signature = 8 lines follows  <
Neil Duffee, Joe Sysprog, uOttawa, Ottawa, Ont, Canada
telephone:1 613 562 5800 x4585  fax:1 613 562 5161
mailto:NDuffee of uOttawa.ca http:/ /aix1.uOttawa.ca/ ~nduffee
“How *do* you plan for something like that?”  Guardian Bob, Reboot
“For every action, there is an equal and opposite criticism.”
“Systems Programming: Guilty, until proven innocent”  John Norgauer 2004
"Schrodinger's backup: The condition of any backup is unknown until a restore 
is attempted."  John McKown 2015




Re: WLM issue with a proposed solution

2016-04-29 Thread Martin Packer
Thanks!

One note on SYSSTC: Whether or not the velocity GOAL of "STCHI" matches the 
MEASURED velocity of SYSSTC, the latter is still protected relative to the 
former.

Sensitised to this because a recent customer situation saw DBM1 in SYSSTC, 
competing with IRLM.

I normally - in my graphing at least - have SYSSTC as "Imp 0" and SYSTEM 
as "Imp -1", though in my REXX I probably make everyone take 1 step down. 
:-)

But I would expect the delivered velocity for "STCHI" to be (slightly) 
lower than SYSSTC.

Cheers, Martin



Re: WLM issue with a proposed solution

2016-04-29 Thread Tracy Adams
Thank you all for chiming in!  Yeah, the bottom line... figure out why those 
sub-second transactions get stalled!  Hard to tune your way out of a locking 
condition :-)

I will check out the SYSSTC actual velocity... that is a good benchmark for 
roughly what my max achievable velocity would be.

Happy Friday Martin, sounds like you have written the book on this!

Gotta go read about resource groups. 





Re: WLM issue with a proposed solution

2016-04-29 Thread Scott Chapman
>If your batch jobs are running Discretionary at a DP lower than CICS, it is 
>very unlikely that they are causing significant CICS delays.

True from a CPU perspective. But the batch jobs could be locking resources in 
DB2 that are delaying the CICS transactions. And if the batch jobs holding 
those locks are progressing very slowly due to running in discretionary when 
there's little CPU available, the locks may persist for an extended period of 
time, elongating CICS transaction response time. 
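
A back-of-the-envelope way to see the effect: if a batch job needs a fixed amount of CPU between acquiring a lock and committing, the elapsed time the lock is held stretches as the job's share of a processor shrinks. The Python sketch below uses invented numbers and ignores I/O and other waits, so treat it as an illustration rather than a model of DB2.

```python
def lock_hold_seconds(cpu_seconds_needed, cpu_share):
    """Elapsed time a lock is held if the holder needs `cpu_seconds_needed`
    of CPU before it commits and receives `cpu_share` (0 < share <= 1) of one
    processor. I/O and other waits are ignored (a deliberate simplification)."""
    if not 0 < cpu_share <= 1:
        raise ValueError("cpu_share must be in (0, 1]")
    return cpu_seconds_needed / cpu_share

# Hypothetical unit of work: 0.05 s of CPU between lock and commit.
for share in (1.0, 0.10, 0.01):
    held = lock_hold_seconds(0.05, share)
    print(f"CPU share {share:>5.0%}: lock held about {held:.2f} s")
```

The same unit of work that releases its lock almost immediately on an idle system can hold it for seconds when discretionary work is squeezed, which is long enough to matter to a sub-second transaction waiting on it.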

Or I saw a similar situation once where some batch queries exhausted the RID 
pool, which caused sub-second CICS transactions to start taking over 60 
seconds. That's fortunately harder to do on the later versions of DB2. 

In short, while adjusting the goals very well may be in order, I'd be inclined 
to first look into the apparently unusually long running CICS transactions to 
identify why those particular transactions are taking a long time.

Scott



Re: WLM issue with a proposed solution

2016-04-29 Thread Martin Packer
Agree: "Achievable" is what's important here. Please measure it - with 
load.

This is where I struggle without IBM-MAIN  being a visual medium. :-) I 
plot velocity WITH LOAD and see how it droops. My code has done this for 
years and I present this graphing method regularly - to individual 
customers as well as at conferences.

Drooping tells you most of what you need to know. :-)

Of course it's an economic decision whether you let DB2's velocity (really 
the service class' velocity) falter with load. Or whether a velocity in 
the 30-40% range is acceptable. So I'm not making an absolute "70%" 
statement; I merely observe MOST customers achieve 60+, many 70+, some 
80+, a few 90+ .

One other observation: In the service class that DB2 is in (notionally "STCHI") 
I typically see the main "Using" sample being "Using I/O". It's worthwhile 
establishing this. Of course, if DBM1 is not the "dominant" address space 
this picture could look quite different. To repeat, slightly altered: It's 
worthwhile figuring out why "STCHI" has the velocity it has, when it has.
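
For readers without the graphs, the idea can be approximated in a few lines of Python. Execution velocity is using samples divided by using-plus-delay samples, times 100, and the performance index for a velocity goal is the goal divided by the actual velocity. The interval data below are invented; real numbers would come from RMF or SMF records, which this sketch does not read.

```python
def velocity(using_samples, delay_samples):
    """Execution velocity: using / (using + delay) * 100."""
    total = using_samples + delay_samples
    if total == 0:
        return None  # no samples, velocity undefined for this interval
    return 100.0 * using_samples / total

def performance_index(goal_velocity, actual_velocity):
    """PI for a velocity goal: goal / actual (above 1 means goal missed)."""
    return goal_velocity / actual_velocity

# Hypothetical intervals: (CPU busy %, using samples, delay samples)
intervals = [(40, 90, 10), (60, 80, 20), (80, 66, 34), (95, 45, 55), (100, 30, 70)]

GOAL = 66  # the velocity goal mentioned in the original post
for busy, using, delay in sorted(intervals):
    v = velocity(using, delay)
    pi = performance_index(GOAL, v)
    bar = "#" * round(v / 5)           # crude text "graph" of the droop
    print(f"{busy:>3}% busy  velocity {v:5.1f}  PI {pi:4.2f}  {bar}")
```

Sorting by load and printing one row per interval is a crude stand-in for the plot: the point at which the bars start to shorten is where the service class begins to droop.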

Hoping this helps, rather than confusing. In any case it's a great way to 
slide into a Friday. :-) And I've a feeling Marna and I could do a whole 
podcast episode on just this one topic. :-)

Cheers, Martin



Re: WLM issue with a proposed solution

2016-04-29 Thread Vernooij, CP (ITOPT1) - KLM
My experience is that CICS will suffer if the LPAR is being soft capped, no 
matter what you try to do about the situation.

So I think the best and only solution is to keep the LPAR from becoming capped 
by keeping the batch consumption under control. Not with a limited number of 
initiators, because that will not control CPU consumption, but with a Resource 
Group, which will keep the batch CPU consumption within limits, below the level 
that would otherwise have caused capping.
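
The arithmetic behind this suggestion can be sketched in Python. Soft capping is driven by the 4-hour rolling average of MSU consumption reaching the defined capacity, so the question is how much batch consumption keeps that average under the limit. The hourly interval, the MSU figures, and the function names below are assumptions for illustration; sizing a real resource group would need measured data and WLM's own units.

```python
from collections import deque

def rolling_average(samples, window):
    """Yield the average of the last `window` samples after each new sample
    (shorter at the start, while the window is still filling)."""
    recent = deque(maxlen=window)
    for s in samples:
        recent.append(s)
        yield sum(recent) / len(recent)

def first_capped_interval(msu_samples, defined_capacity, window):
    """Index of the first interval whose rolling average reaches the defined
    capacity, or None if it never does."""
    for i, avg in enumerate(rolling_average(msu_samples, window)):
        if avg >= defined_capacity:
            return i
    return None

# Hypothetical data: one sample per hour, so a 4-hour window is 4 samples.
online = [50, 55, 60, 60, 60, 55]               # CICS + DB2 MSU per hour
batch_uncontrolled = [10, 20, 35, 40, 40, 30]
batch_limited = [min(b, 15) for b in batch_uncontrolled]  # stand-in for a cap

CAPACITY = 85
total_free = [o + b for o, b in zip(online, batch_uncontrolled)]
total_limited = [o + b for o, b in zip(online, batch_limited)]
print("uncontrolled:", first_capped_interval(total_free, CAPACITY, 4))
print("limited:     ", first_capped_interval(total_limited, CAPACITY, 4))
```

The number of initiators does not appear anywhere in this calculation, only consumed capacity does, which is the reason a consumption limit is the more direct control.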

Kees.








Re: WLM issue with a proposed solution

2016-04-28 Thread Tom Marchant
On Thu, 28 Apr 2016 19:57:32 +, Tracy Adams wrote:

>The importance (priority) of DB2 is set 2

Importance is NOT priority.

-- 
Tom Marchant



Re: WLM issue with a proposed solution

2016-04-28 Thread Tom Marchant
On Thu, 28 Apr 2016 18:22:11 +, Tracy Adams wrote:

>We have a soft capped LPAR that runs our DB2 and CICS regions and during 
>the day some "marketing batch".  On Wednesdays, the marketing batch (online 
>submit via CICS) increases and by afternoon we hit our 4 hour soft cap.  Once 
>or twice while we are capped, the busiest CICS slow down to the point where 
>some old automation kicks in to kill transactions over 45 seconds old, some of 
>these transactions dump through DumpMaster, we then go to max sockets and 
>more transactions dump and in 10 - 30 seconds all is fine again.
>
>What I see: The CICS regions have a DP around EC and are meeting their 
>service goal of 99% under .5 seconds.  But there are tens of thousands 
>transactions that have led to this.  The batch jobs (3-5 of them), while 
>running 10 - 15 % cpu have a DP of C0 and are in a discretionary level of the 
>service class.  I believe the problem lies with the DB2 service class.  That 
>has a definition of velocity at 66 and it tends to run below that when there 
>is more contention in the system.  The DP of the DB2 region is F6.

Are your CICS regions still meeting their goals when these anomalies occur?

If your batch jobs are running Discretionary at a DP lower than CICS, it is very 
unlikely that they are causing significant CICS delays.

You say that DB2 sometimes fails to meet its goals when the system is loaded. 
That suggests to me that 66% isn't achievable, and it may be causing WLM to 
work extra hard to try to meet that goal. If the DB2 address spaces really are 
running at higher DP than CICS when the problems occur, then they are 
probably ok.

Are your batch jobs using DB2 or other high priority address spaces?

If your DB2 address space goals are too aggressive, dropping the velocity from 
66 to 60 won't make much difference. Have you read John Arwe's paper on 
velocity goals?

I'm not a fan of percentile goals as high as 99%. It doesn't take many outliers 
to cause you to fail to meet your goal. Assuming that the vast majority of your 
transactions are quick, it won't matter whether your percentile is 99% or e.g. 
80%. I like to set my percentile response times for the fastest transactions in 
each CICS address space and let the rest go along for the ride. That's not 
likely your problem though.
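
To put a number on "not many outliers", the Python sketch below works out how many slow transactions a percentile goal tolerates for a given count of fast ones. The transaction count is invented, and this is simple arithmetic on the goal definition, not a description of how WLM samples or buckets response times.

```python
def max_slow_transactions(fast_count, goal_percent):
    """Largest number of over-goal transactions that still leaves at least
    `goal_percent` percent of all transactions within the goal.

    Derived from fast / (fast + slow) >= goal_percent / 100, using integer
    arithmetic so the result is exact for whole-number percentages."""
    if not 0 < goal_percent <= 100:
        raise ValueError("goal_percent must be in (0, 100]")
    return fast_count * (100 - goal_percent) // goal_percent

# Hypothetical interval with 10,000 transactions inside the response-time goal.
for pct in (99, 90, 80):
    slow = max_slow_transactions(10_000, pct)
    print(f"{pct}% goal tolerates up to {slow} slow transactions")
```

A 99% goal leaves room for roughly one stalled transaction per hundred good ones, so a short burst of stalls is enough to flip the class from meeting to missing its goal.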

How does your "old automation" determine that there is a problem?

When one of these long running transactions is canceled, do you know what was 
going on in it? Are they just unusual transactions that take a long time, or 
are they in a loop or something?

I wonder if the real problem is that this automation is canceling transactions 
that it shouldn't.

-- 
Tom Marchant



Re: WLM issue with a proposed solution

2016-04-28 Thread Edward Finnell
Some of the new features in RMF give better insight into what's happening. 
SHARE papers and Redbooks give insight into what to look for in the 
'buckets'. The Boeblingen folks admit velocity goals are really tough for RMF 
due to the rapidity of the changing landscape. Configuration is very important. 
I'd hang some zIIPs on that puppy for a start.
 
 
In a message dated 4/28/2016 6:49:39 P.M. Central Daylight Time,  
and...@blackhillsoftware.com writes:

available (maybe only 1 or 2), in which case a velocity of 70 is  
probably not achievable. 30 or 40 might be what you realistically  get.

Perhaps looking at the velocity of SYSSTC might give an idea of  the 
limit of achievable  velocity?




Re: WLM issue with a proposed solution

2016-04-28 Thread Andrew Rowley

On 29/04/2016 6:06, Martin Packer wrote:

>DB2 should have a higher importance than what it serves, so in this case
>it should be Importance 1. I'd set its goal velocity to what's achievable
>- probably 70, likely 80, maybe 90. I would not mess with eg 75, 85.
>
>By "DB2" I mean DBM1, DIST and MSTR. IRLM should be in SYSSTC.


Achievable velocity depends on the number of CPUs available. The 
original problem sounded like a system with a limited number of CPUs 
available (maybe only 1 or 2), in which case a velocity of 70 is 
probably not achievable. 30 or 40 might be what you realistically get.


Perhaps looking at the velocity of SYSSTC might give an idea of the 
limit of achievable velocity?


--
Andrew Rowley
Black Hill Software
+61 413 302 386



Re: WLM issue with a proposed solution

2016-04-28 Thread Martin Packer
DB2 should have a higher importance than what it serves, so in this case 
it should be Importance 1. I'd set its goal velocity to what's achievable 
- probably 70, likely 80, maybe 90. I would not mess with eg 75, 85.

By "DB2" I mean DBM1, DIST and MSTR. IRLM should be in SYSSTC.

You'd be surprised how many customers get this wrong. :-(
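
One way to audit this rule is a small script over an inventory of address spaces and their service classes. The Python sketch below is hypothetical throughout: the job names, the service class names (including DB2OLD), and the hand-written dictionaries are placeholders, since nothing here reads a real WLM service definition. It relies only on the fact that importance 1 is the highest and 5 the lowest, and it treats SYSSTC as "Imp 0" purely as a convenience.

```python
# Hypothetical inventory; names and values are placeholders, not read from WLM.
# Importance: 1 is highest, 5 lowest. SYSSTC is treated as 0 here so that it
# sorts above every importance level.
SERVICE_CLASS_IMPORTANCE = {"SYSSTC": 0, "STCHI": 1, "ONLINEHI": 2, "DB2OLD": 2}

ADDRESS_SPACES = {
    "DB2PIRLM": "SYSSTC",
    "DB2PDBM1": "DB2OLD",
    "DB2PMSTR": "DB2OLD",
    "CICSPRD1": "ONLINEHI",
}

# Which server address spaces each client depends on (also hypothetical).
SERVED_BY = {"CICSPRD1": ["DB2PDBM1", "DB2PMSTR", "DB2PIRLM"]}

def importance_violations(spaces, importance, served_by):
    """Return (client, server) pairs where the server is not more important
    (numerically lower) than the work it serves."""
    problems = []
    for client, servers in served_by.items():
        client_imp = importance[spaces[client]]
        for server in servers:
            if importance[spaces[server]] >= client_imp:
                problems.append((client, server))
    return problems

for client, server in importance_violations(
        ADDRESS_SPACES, SERVICE_CLASS_IMPORTANCE, SERVED_BY):
    print(f"{server} should be more important than {client}")
```

In this made-up inventory DBM1 and MSTR share Importance 2 with the CICS region they serve, so both are flagged; moving them to an Importance 1 class clears the report.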

Cheers, Martin




From:   Tracy Adams <tad...@fbbrands.com>
To: IBM-MAIN@LISTSERV.UA.EDU
Date:   28/04/2016 20:57
Subject:    Re: WLM issue with a proposed solution
Sent by:IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU>



The importance (priority) of DB2 is set to 2, as is that of the CICS service 
class.  DB2 serves both the CICS regions and the batch jobs.

I only speak of dispatching priorities because isn't that ultimately what is 
driven by the collective results of WLM?

To Mark's question, I am not sure what is stalling those transactions, I 
will try to collect some delay information. 

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On 
Behalf Of Martin Packer
Sent: Thursday, April 28, 2016 3:49 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: WLM issue with a proposed solution

Hello Tracy.

What importance have you set DB2 address spaces' service class(es) to? 
Likewise the things it serves, such as CICS regions and CICS transactions?

If DB2 is getting locked out it could be caused by it being Imp 2 or 
something, rather than Imp 1 with a goal 70+.

I also note you're mainly talking dispatching priorities rather than WLM 
language.

Cheers, Martin




From:    Tracy Adams <tad...@fbbrands.com>
To:      IBM-MAIN@LISTSERV.UA.EDU
Date:    28/04/2016 19:22
Subject: WLM issue with a proposed solution
Sent by: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU>



So here is my issue:

We have a soft capped LPAR that runs our DB2 and CICS regions and during 
the day some "marketing batch".  On Wednesdays, the marketing batch 
(online submit via CICS) increases and by afternoon we hit our 4 hour soft 
cap.  Once or twice while we are capped, the busiest CICS slow down to the 
point where some old automation kicks in to kill transactions over 45 
seconds old, some of these transactions dump through DumpMaster, we then 
go to max sockets and more transactions dump and in 10 - 30 seconds all is 
fine again.
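
As background on the "4 hour soft cap": soft capping is driven by a rolling
four-hour average of MSU consumption compared with the LPAR's defined
capacity. The Python sketch below is a simplification under stated
assumptions: a 5-minute sampling interval, an average taken over whatever
samples exist so far, and invented MSU figures and defined capacity. It is
not a model of WLM's actual bookkeeping.

```python
from collections import deque

INTERVAL_MINUTES = 5                    # assumed sampling interval
WINDOW = (4 * 60) // INTERVAL_MINUTES   # 48 samples cover four hours


def rolling_average(msu_samples):
    """Yield the rolling four-hour average MSU after each sample."""
    window = deque(maxlen=WINDOW)
    for msu in msu_samples:
        window.append(msu)
        yield sum(window) / len(window)


def first_capped_interval(msu_samples, defined_capacity):
    """Return the index of the first interval whose rolling average
    reaches the defined capacity, or None if it never does."""
    for index, average in enumerate(rolling_average(msu_samples)):
        if average >= defined_capacity:
            return index
    return None


# Hypothetical day: four quiet hours at 80 MSU, then heavier load at 130 MSU.
samples = [80] * 48 + [130] * 48
capped_at = first_capped_interval(samples, defined_capacity=100)
if capped_at is not None:
    minutes = (capped_at + 1) * INTERVAL_MINUTES
    print(f"rolling average reaches the cap {minutes} minutes into the day")
```

Because the trigger is an average, the cap arrives some time after the load
rises, which fits with the batch increasing during the day and the cap
being hit "by afternoon".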

What I see: The CICS regions have a DP around EC and are meeting their
service goal of 99% under .5 seconds.  But there are tens of thousands of
transactions that have led to this.  The batch jobs (3-5 of them), while
running at 10-15% CPU, have a DP of C0 and are in a discretionary level of
the service class.  I believe the problem lies with the DB2 service class.
That has a velocity goal of 66, and it tends to run below that when there
is more contention in the system.  The DP of the DB2 region is F6.

My theory:  When this brownout occurs, the resources are maxed out, and the
CICS regions, being the ones that have met their goal, will have to suffer
many transactions missing the service goal before their DP goes up.  They
get hung up just long enough to cause the delays that trigger the "panic"
automation to clear the stalled transactions.  Chaos breaks out!

My proposal:
A.  Limit the batch jobs to a max of three by controlling open initiators
    for their job class.
B.  Change the DB2 velocity to 60.
C.  Starve the CICS service goal by reducing it to 99% in .4 seconds,
    forcing its DP to be a little more desperate.
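
To make proposal C concrete, here is a small Python sketch of what
tightening a percentile response-time goal does to goal attainment. The
transaction mix is invented, "under" is treated as less than or equal, and
the check is a plain percentage rather than the way WLM itself derives a
performance index for percentile goals.

```python
def percent_within(response_times, goal_seconds):
    """Percentage of transactions completing within goal_seconds."""
    if not response_times:
        return 100.0  # nothing ran, so nothing missed the goal
    within = sum(1 for t in response_times if t <= goal_seconds)
    return 100.0 * within / len(response_times)


def goal_met(response_times, goal_seconds, percent):
    """True when at least `percent` of transactions meet the goal."""
    return percent_within(response_times, goal_seconds) >= percent


# Hypothetical mix: most transactions are fast, a few are slow or stalled.
times = [0.30] * 985 + [0.45] * 8 + [2.0] * 7

print(goal_met(times, goal_seconds=0.5, percent=99))  # current goal
print(goal_met(times, goal_seconds=0.4, percent=99))  # proposed goal
```

With this made-up mix the same workload meets "99% under .5" but misses
"99% under .4", which is the effect proposal C relies on. Whether it is a
good idea is a separate question.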

Thoughts?

TIA,

Tracy

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


Re: WLM issue with a proposed solution

2016-04-28 Thread Tracy Adams
The importance (priority) of DB2 is set to 2, as is that of the CICS service
class.  DB2 serves both the CICS regions and the batch jobs.

I only speak of dispatching priorities because aren't those ultimately driven
by the collective results of WLM?

To Mark's question, I am not sure what is stalling those transactions; I will
try to collect some delay information.

-Original Message-
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of Martin Packer
Sent: Thursday, April 28, 2016 3:49 PM
To: IBM-MAIN@LISTSERV.UA.EDU
Subject: Re: WLM issue with a proposed solution

Hello Tracy.

What importance have you set DB2 address spaces' service class(es) to? 
Likewise the things it serves, such as CICS regions and CICS transactions?

If DB2 is getting locked out it could be caused by it being Imp 2 or something, 
rather than Imp 1 with a goal 70+.

I also note you're mainly talking dispatching priorities rather than WLM 
language.

Cheers, Martin




From:    Tracy Adams <tad...@fbbrands.com>
To:      IBM-MAIN@LISTSERV.UA.EDU
Date:    28/04/2016 19:22
Subject: WLM issue with a proposed solution
Sent by: IBM Mainframe Discussion List <IBM-MAIN@LISTSERV.UA.EDU>



So here is my issue:

We have a soft capped LPAR that runs our DB2 and CICS regions and during the 
day some "marketing batch".  On Wednesdays, the marketing batch (online submit 
via CICS) increases and by afternoon we hit our 4 hour soft cap.  Once or twice 
while we are capped, the busiest CICS slow down to the point where some old 
automation kicks in to kill transactions over 45 seconds old, some of these 
transactions dump through DumpMaster, we then go to max sockets and more 
transactions dump and in 10 - 30 seconds all is fine again.

What I see: The CICS regions have a DP around EC and are meeting their service
goal of 99% under .5 seconds.  But there are tens of thousands of transactions
that have led to this.  The batch jobs (3-5 of them), while running at 10-15%
CPU, have a DP of C0 and are in a discretionary level of the service class.  I
believe the problem lies with the DB2 service class.  That has a velocity goal
of 66, and it tends to run below that when there is more contention in the
system.  The DP of the DB2 region is F6.

My theory:  When this brownout occurs, the resources are maxed out, and the
CICS regions, being the ones that have met their goal, will have to suffer many
transactions missing the service goal before their DP goes up.  They get hung
up just long enough to cause the delays that trigger the "panic" automation to
clear the stalled transactions.  Chaos breaks out!

My proposal:
A.  Limit the batch jobs to a max of three by controlling open initiators for
    their job class.
B.  Change the DB2 velocity to 60.
C.  Starve the CICS service goal by reducing it to 99% in .4 seconds, forcing
    its DP to be a little more desperate.

Thoughts?

TIA,

Tracy



Re: WLM issue with a proposed solution

2016-04-28 Thread Martin Packer
Hello Tracy.

What importance have you set DB2 address spaces' service class(es) to? 
Likewise the things it serves, such as CICS regions and CICS transactions?

If DB2 is getting locked out it could be caused by it being Imp 2 or 
something, rather than Imp 1 with a goal 70+.

I also note you're mainly talking dispatching priorities rather than WLM 
language.

Cheers, Martin

Martin Packer,
zChampion, Principal Systems Investigator,
Worldwide Cloud & Systems Performance, IBM

+44-7802-245-584

email: martin_pac...@uk.ibm.com

Twitter / Facebook IDs: MartinPacker

Blog: 
https://www.ibm.com/developerworks/mydeveloperworks/blogs/MartinPacker

Podcast Series (With Marna Walle): 
https://developer.ibm.com/tv/category/mpt/



From:    Tracy Adams
To:      IBM-MAIN@LISTSERV.UA.EDU
Date:    28/04/2016 19:22
Subject: WLM issue with a proposed solution
Sent by: IBM Mainframe Discussion List



So here is my issue:

We have a soft capped LPAR that runs our DB2 and CICS regions and during 
the day some "marketing batch".  On Wednesdays, the marketing batch 
(online submit via CICS) increases and by afternoon we hit our 4 hour soft 
cap.  Once or twice while we are capped, the busiest CICS slow down to the 
point where some old automation kicks in to kill transactions over 45 
seconds old, some of these transactions dump through DumpMaster, we then 
go to max sockets and more transactions dump and in 10 - 30 seconds all is 
fine again.

What I see: The CICS regions have a DP around EC and are meeting their
service goal of 99% under .5 seconds.  But there are tens of thousands of
transactions that have led to this.  The batch jobs (3-5 of them), while
running at 10-15% CPU, have a DP of C0 and are in a discretionary level of
the service class.  I believe the problem lies with the DB2 service class.
That has a velocity goal of 66, and it tends to run below that when there
is more contention in the system.  The DP of the DB2 region is F6.

My theory:  When this brownout occurs, the resources are maxed out, and the
CICS regions, being the ones that have met their goal, will have to suffer
many transactions missing the service goal before their DP goes up.  They
get hung up just long enough to cause the delays that trigger the "panic"
automation to clear the stalled transactions.  Chaos breaks out!

My proposal:
A.  Limit the batch jobs to a max of three by controlling open initiators
    for their job class.
B.  Change the DB2 velocity to 60.
C.  Starve the CICS service goal by reducing it to 99% in .4 seconds,
    forcing its DP to be a little more desperate.

Thoughts?

TIA,

Tracy



Re: WLM issue with a proposed solution

2016-04-28 Thread Mike Shorkend
Do you know why some of the transactions are taking longer than 45 seconds?
A CICS performance monitor should be able to break down the response time.
In other words, are you sure that the delay is caused by CPU constraints?

On 28 April 2016 at 22:40, Staller, Allan wrote:

> Set the DB2 goal to be "more reasonable" FSVO reasonable and see what
> happens.
>
> 
> We have a soft capped LPAR that runs our DB2 and CICS regions and during
> the day some "marketing batch".  On Wednesdays, the marketing batch (online
> submit via CICS) increases and by afternoon we hit our 4 hour soft cap.
> Once or twice while we are capped, the busiest CICS slow down to the point
> where some old automation kicks in to kill transactions over 45 seconds
> old, some of these transactions dump through DumpMaster, we then go to max
> sockets and more transactions dump and in 10 - 30 seconds all is fine again.
>
> What I see: The CICS regions have a DP around EC and are meeting their
> service goal of 99% under .5 seconds.  But there are tens of thousands of
> transactions that have led to this.  The batch jobs (3-5 of them), while
> running at 10-15% CPU, have a DP of C0 and are in a discretionary level of
> the service class.  I believe the problem lies with the DB2 service class.
> That has a velocity goal of 66, and it tends to run below that when there
> is more contention in the system.  The DP of the DB2 region is F6.
>
> My theory:  When this brownout occurs, the resources are maxed out, and the
> CICS regions, being the ones that have met their goal, will have to suffer
> many transactions missing the service goal before their DP goes up.  They
> get hung up just long enough to cause the delays that trigger the "panic"
> automation to clear the stalled transactions.  Chaos breaks out!
>
> My proposal:
> A.  Limit the batch jobs to a max of three by controlling open initiators
>     for their job class.
> B.  Change the DB2 velocity to 60.
> C.  Starve the CICS service goal by reducing it to 99% in .4 seconds,
>     forcing its DP to be a little more desperate.
> 
>
> This email – including attachments – may contain confidential information.
> If you are not the intended recipient, do not copy, distribute or act on
> it. Instead, notify the sender immediately and delete the message.



-- 
Mike Shorkend
m...@shorkend.com
www.shorkend.com
Tel: +972524208743
Fax: +97239772196



Re: WLM issue with a proposed solution

2016-04-28 Thread Staller, Allan
Set the DB2 goal to be "more reasonable" FSVO reasonable and see what happens.


We have a soft capped LPAR that runs our DB2 and CICS regions and during the 
day some "marketing batch".  On Wednesdays, the marketing batch (online submit 
via CICS) increases and by afternoon we hit our 4 hour soft cap.  Once or twice 
while we are capped, the busiest CICS slow down to the point where some old 
automation kicks in to kill transactions over 45 seconds old, some of these 
transactions dump through DumpMaster, we then go to max sockets and more 
transactions dump and in 10 - 30 seconds all is fine again.

What I see: The CICS regions have a DP around EC and are meeting their service
goal of 99% under .5 seconds.  But there are tens of thousands of transactions
that have led to this.  The batch jobs (3-5 of them), while running at 10-15%
CPU, have a DP of C0 and are in a discretionary level of the service class.  I
believe the problem lies with the DB2 service class.  That has a velocity goal
of 66, and it tends to run below that when there is more contention in the
system.  The DP of the DB2 region is F6.

My theory:  When this brownout occurs, the resources are maxed out, and the
CICS regions, being the ones that have met their goal, will have to suffer many
transactions missing the service goal before their DP goes up.  They get hung
up just long enough to cause the delays that trigger the "panic" automation to
clear the stalled transactions.  Chaos breaks out!

My proposal:
A.  Limit the batch jobs to a max of three by controlling open initiators for
    their job class.
B.  Change the DB2 velocity to 60.
C.  Starve the CICS service goal by reducing it to 99% in .4 seconds, forcing
    its DP to be a little more desperate.


This email – including attachments – may contain confidential information. If
you are not the intended recipient, do not copy, distribute or act on it.
Instead, notify the sender immediately and delete the message.
