On 01/23/2012 01:21 PM, Bill Fairchild wrote:
In<45e5f2f45d7878458ee5ca679697335502e25...@usdaexch01.kbm1.loc>, on
01/23/2012
at 09:08 AM, "Staller, Allan"<allan.stal...@kbmg.com> said:
From the viewpoint of the Operating System, you now
have 3 times as much data behind the actuator on Mod-9's as Mod-3's.
If the Operating system *thinks* the device is busy, the IO is queued
off the UCB and never even tried until it comes to head of queue.
You MIGHT have up to three times as much data behind the actuator. That
depends on how fully loaded the three mod-3s being merged onto the same
single mod-9 are; i.e., it depends on which three mod-3s you choose to
merge together.
If all data sets on all volumes were equally and randomly accessed, then the
new mod-9 would see three times the access demand of any of the three mod-3s
that were merged. However, most data centers have highly skewed access
patterns: 80% of the actuators might carry only 20% of the total I/O
workload. That means your volumes are almost certainly NOT equally and
randomly accessed. Some volumes are almost never accessed, and others are
accessed all the time.
When z/OS starts an I/O on DASD device xxxx, z/OS turns on a flag bit in the
UCB for that device that indicates that this particular z/OS image has started
an I/O on that device. But if the device is shared, then another z/OS image
may have already started an I/O on the same device, turned that same device's
UCB flag bit on in its copy of the UCB for the device (which might be device
yyyy on the other image), and not informed any of the other sharing z/OS images
that it is now doing I/O on that shared device. So when image A tests its
private copy of the flag bit and finds it off, that does not necessarily mean
that the device is not busy. Image A doesn't care, however; it starts the I/O
and turns the bit on. If the shared control unit attached to this device is
not an IBM 2105 SHARK (vintage ca. 2000), plug-compatible equivalent, or some
successor technology, then image A's I/O will not really be started until image
B's already-started I/O ends. This will show up on image A as a spike in
device pending time, not in IOSQ time. The 2105 and newer
technology have the ability to let multiple I/O requests from multiple sharing
systems run simultaneously against the same device as long as there is no
conflict between any of the simultaneous I/Os involving both reads and writes
for the same range of tracks.
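The private-UCB-flag behavior described above can be sketched in a few lines. This is a hypothetical toy model (all names, types, and timings are invented for illustration, not actual IOS code) of why a shared-DASD collision on a pre-2105 control unit is charged to device pending time rather than IOSQ time:

```python
# Toy model: each z/OS image tests only its OWN copy of the UCB busy bit,
# so it starts an I/O even while another image's I/O is in flight on the
# shared device. The resulting wait is charged to device PENDING time.

from dataclasses import dataclass

@dataclass
class ImageUCB:
    busy: bool = False          # private per-image UCB busy flag

def start_io(image: ImageUCB, device_busy_until: float, now: float):
    """Return (iosq_delay, pending_delay) for an I/O issued at time `now`."""
    if image.busy:
        # This image already has an I/O outstanding: queue on the UCB (IOSQ).
        return (device_busy_until - now, 0.0)
    image.busy = True
    if device_busy_until > now:
        # Another image's I/O occupies the shared device: the channel program
        # cannot start, and the wait shows up as device PENDING time.
        return (0.0, device_busy_until - now)
    return (0.0, 0.0)

# Image B started a 5 ms I/O at t=0 on the shared device; image A issues
# one at t=1. All 4 ms of delay lands in pending time, none in IOSQ.
a = ImageUCB()
print(start_io(a, device_busy_until=5.0, now=1.0))   # (0.0, 4.0)
```

A second I/O issued by the same image before the first completes would take the other branch and accrue IOSQ time instead, which is the distinction the paragraph above draws.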
The only way to know what will probably happen is to do I/O measurement on your current mod-3
workload. If you don't see much IOSQ time now, then you will see "not much" multiplied
by three after merging. How much counts as "not much", or as negligible, is up to you to decide. You
might also get an idea as to how to merge volumes together based on their individual IOSQ times;
e.g., merge the one with the highest IOSQ time now with the two mod-3s that now have the lowest
average IOSQ times. After merging them, measure again for IOSQ time. Only if you have
"excessive" IOSQ time, where how much is excessive is up to you to decide, would you need
to consider using PAV devices.
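The merge heuristic suggested above can be expressed as a short sketch. This is one possible reading of it (volume names and IOSQ figures are invented): repeatedly group the mod-3 with the highest measured IOSQ time with the two that currently have the lowest, so no single mod-9 inherits three hot volumes.

```python
# Hypothetical pairing heuristic: hottest remaining volume is grouped with
# the two coldest remaining volumes, three mod-3s per target mod-9.

def plan_merges(iosq_by_volume):
    """Return a list of 3-volume groups that balance measured IOSQ load."""
    ranked = sorted(iosq_by_volume, key=iosq_by_volume.get, reverse=True)
    groups = []
    while len(ranked) >= 3:
        hot = ranked.pop(0)                   # highest remaining IOSQ time
        cold = [ranked.pop(), ranked.pop()]   # two lowest remaining
        groups.append([hot, *cold])
    return groups

sample = {"VOL001": 8.2, "VOL002": 0.1, "VOL003": 0.3,
          "VOL004": 5.9, "VOL005": 0.2, "VOL006": 1.1}
print(plan_merges(sample))
# [['VOL001', 'VOL002', 'VOL005'], ['VOL004', 'VOL003', 'VOL006']]
```

As the paragraph says, this only plans the first cut; you would re-measure IOSQ after merging and only reach for PAV if the result is excessive by your own standard.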
Currently z/OS's I/O Supervisor has no knowledge of the real RAID architecture
backing the virtual SLED, so many of the classic performance- and space-related
bottlenecks can theoretically still occur.
Bill Fairchild
Note the original question from Dennis McCarthy (Jan 20) was not an
arbitrary 3390-3 to 3390-9 migration but specifically moving a VSAM file
occupying 27 3390-3's to 10 3390-9's, so except for the last volume we ARE
definitely talking about three times the data behind a logical volume,
but the usage and activity rate of the dataset were not specified.
IOSQ time and the related response-time elongation are highly non-linear as
device utilization approaches 100%. You could see negligible IOSQ time
on each of three 3390-3's running at 34% utilization become astronomical
if you merge data from those onto a single 3390-9 without PAV, trying to
run a load that can't even be satisfied at 100% device busy. Given PAVs
and assuming there is enough cache and different physical drives and
internal bandwidth on the EMC backing the logical volume, you can in
effect exceed 100% logical volume busy (have average number of active
I/Os to the volume exceed 1.00) and still get acceptable response.
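The non-linearity is easy to see with a back-of-envelope M/M/1 queueing model (an assumed simplification; real DASD with cache, RAID, and PAV behaves differently in detail). Response time goes roughly as service time divided by (1 - utilization), so three 34%-busy volumes merged onto one device push demand past 100% and the queue grows without bound:

```python
# M/M/1-style back-of-envelope model (an assumption, not a DASD simulator):
# response ~ service_time / (1 - utilization); at >= 100% busy the queue
# is unbounded, which is the 3 x 34% merge case described above.

def response_time(service_ms, utilization):
    if utilization >= 1.0:
        return float("inf")      # demand exceeds capacity: unbounded queue
    return service_ms / (1.0 - utilization)

for u in (0.34, 0.80, 0.95, 0.99, 3 * 0.34):
    print(f"{u:4.0%} busy -> {response_time(5.0, u):8.2f} ms")
```

With PAV (and enough cache, spindles, and internal bandwidth behind the logical volume), several I/Os proceed concurrently, which is why the effective "100% busy" ceiling of this single-server model no longer applies.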
--
Joel C. Ewing, Bentonville, AR jcew...@acm.org
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@bama.ua.edu with the message: INFO IBM-MAIN