Ok, let's go through this one at a time.  See inserted comments.

Alan Ackerman wrote:


Creating new thread.

1. The folks that receive the data at my shop are z/OS folks. Historically, the capture ratio of MVS was really poor. The notion was that you should use SMF data and never RMF data. I don't know if z/OS has cleaned up its act or not.
But I have heard the same thing from VM folks. (I've said it myself.)
As Barton says, the capture ratio in VM has always been quite high, due to the way the data is captured in the VMDBK. However, Barton computes this (I think) by comparing different record types in the monitor data, not by comparing monitor to accounting data.
There is system overhead, but it is captured in the SYSTEM VMDBK block. Accounting data and monitor data draw on the same underlying fields, so they should give the same results. Of course, some time gets charged to the wrong user, for example between the time an interrupt comes in and the new user is identified, but it shows up the same way in the monitor and the accounting data. (User CPU time is more reproducible than total CPU time for this reason.)

Is "some time gets charged to the wrong user" a validated and relevant issue? I've not seen any "overhead" issues in accounting or monitor data in MANY years.
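For anyone who wants to check this on their own system, the comparison being described is just a ratio of summed per-user CPU times from the two sources. Below is a minimal sketch; the userids and CPU numbers are made up, and pulling the per-user fields out of the raw CP accounting records or MONWRITE data is site-specific and not shown here.

    # Hedged illustration only: 'accounting' and 'monitor' are assumed to be
    # per-user CPU seconds already extracted from the respective records for
    # the SAME measurement interval.
    accounting = {"LINUX01": 812.4, "LINUX02": 455.0, "TCPIP": 12.3}
    monitor    = {"LINUX01": 811.9, "LINUX02": 454.8, "TCPIP": 12.3}

    acct_total = sum(accounting.values())
    mon_total  = sum(monitor.values())

    # Capture ratio as discussed above: how much of the CPU time reported by
    # one source is also reported by the other.
    capture_ratio = 100.0 * mon_total / acct_total
    print("accounting total: %.1f s" % acct_total)
    print("monitor total:    %.1f s" % mon_total)
    print("capture ratio:    %.2f%%" % capture_ratio)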

2. Monitor sample data is taken at one-minute intervals. It used to be that data for users that logged on or off between samples was dropped for the partial minutes. Is this still true? Was it ever true? Or is it urban folklore?

Transaction records are cut at logon/logoff; that is how we get a 100.00% capture ratio. Nothing is lost.

3. On our systems, we sometimes see messages from CP saying that monitor data has been thrown away because the user connected to *MONITOR did not respond in time. This happens when the system is overloaded, either in CP or in storage. So we lose some minutes of monitor data, but not, I think, accounting data. Often you can fix this by increasing the segment sizes or giving MONWRITE/ESAWRITE a bigger SHARE. Not always, though. In some cases the monitor segments get paged out. (We reported it to Velocity, who said it was a CP problem.) I think IBM could do things to make collection of monitor data more reliable in the extreme cases. Unfortunately, I'm not responsible for this and it is "only performance data". It can be dealt with, but it does take diligence and work to keep your monitor data accurate. You don't have to do that work for accounting data.

This still does happen occasionally when systems are thrashing so much that everything stops. At that point, accounting is probably a lower priority. Capacity planning and performance tuning do need to be employed on this platform. IBM could stop the DCSS from being paged out when the system starts to thrash.

4. On our systems, we switch files (I think hourly) to keep them from getting too big. We lose a minute or two of data each time.

ESALPS does not lose data each hour. Capture ratio is 100%.
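One quick way to check whether minutes really are being dropped across file switches or during overload is to scan the interval timestamps in your reduced monitor data for gaps. A minimal sketch follows, assuming the timestamps have already been extracted into a list; the extraction itself is product-specific and not shown, and the sample timestamps are made up.

    from datetime import datetime, timedelta

    # Hedged illustration: 'intervals' is assumed to be the list of
    # interval-start timestamps pulled from your reduced monitor data.
    intervals = [
        datetime(2011, 5, 10, 14, 0),
        datetime(2011, 5, 10, 14, 1),
        datetime(2011, 5, 10, 14, 2),
        datetime(2011, 5, 10, 14, 5),   # 14:03 and 14:04 missing
    ]

    expected = timedelta(minutes=1)     # one-minute sample interval
    for earlier, later in zip(intervals, intervals[1:]):
        if later - earlier > expected:
            print("gap: %s -> %s (%s missing)"
                  % (earlier, later, later - earlier - expected))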

5. The default for ESAWRITE is to collect user history records only for userids using more than 0.5% CPU. So when we go back to process CPU utilization for users, we get smaller totals from monitor than from accounting data. I assume this could be fixed by setting the threshold to zero. I don't know which of these issues, if any, affect the ESALPS data collection that Barton mentioned. We have tested ESALPS, but are not yet licensed.

The default for ESAWRITE is a 100% capture ratio. ALL USER DATA is captured and retained for capacity planning and accounting. The thresholds only apply to current performance data. This has been the case for 20 years. I'll repeat: the capture ratio for user data is ALWAYS 100.00%. You can't take the interval data collected for performance and use it for accounting. The summary data for each hour is 100% and is what one would use for accounting and capacity planning.
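To make the interval-versus-summary distinction concrete, here is a small sketch of why threshold-filtered interval records understate user totals while unfiltered hourly summary records do not. The userids, CPU numbers, and record layout are invented for illustration and are not taken from ESALPS.

    # Hedged illustration only: per-user CPU seconds for one hour.
    # 'interval_records' mimics performance-interval data kept only for
    # users above a utilization threshold; 'summary_records' mimics the
    # hourly summary data, which keeps every user.
    all_users = {"LINUX01": 812.4, "LINUX02": 455.0,
                 "TESTA": 0.7, "TESTB": 0.3}          # small users too

    threshold_pct = 0.5
    hour_seconds  = 3600.0
    interval_records = {u: cpu for u, cpu in all_users.items()
                        if 100.0 * cpu / hour_seconds >= threshold_pct}
    summary_records  = dict(all_users)

    print("interval total: %.1f s" % sum(interval_records.values()))
    print("summary total:  %.1f s" % sum(summary_records.values()))
    # Only the summary total matches accounting; the interval total is
    # short by exactly the CPU of the users below the threshold.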



Alan Ackerman Alan (dot) Ackerman (at) Bank of America (dot) com
