Re: How to ensure I/O write order

Marcy Cortes Mon, 28 Sep 2009 12:03:24 -0700

 
If all of your I/O really is captured as a consistent copy across that
DMX-1000, then I think you could do it while they are up.
It would be no different from a VM abend or other failure. You might get some
file system errors, which should be correctable with fsck (you are presumably
running a journaled FS). I'd still want file system backups for data that
can't otherwise be recreated, just in case. And there's nothing like
backups for "oops, I erased that file I shouldn't have".


What does EMC say? And what happens when you outgrow the one DMX-1000 and have
to buy a second box? (Or maybe you don't have those wacky growth issues.)

Marcy 



-----Original Message-----
From: Linux on 390 Port [mailto:linux-...@vm.marist.edu] On Behalf Of Hallock, 
Arthur T
Sent: Monday, September 28, 2009 11:47 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: [LINUX-390] How to ensure I/O write order

From Marcy:

What's a consistency option?  Does it go across all LCUs and file systems?

Linux does timestamp its I/O. VM does not (currently; see IBM). I'm not
familiar with EMC's consistency option. We do use GDPS/XRC for our recovery,
giving a recovery point of less than a minute. Does this EMC solution use
those timestamps?

Once you have many interdependent applications, recovering them all to the
same point in time becomes very important.


My response:

We have all VM and Linux volumes on a single physical controller, an EMC DMX-1000.
On z/OS I run a batch job to execute an EMCSNAP utility. EMCSNAP allows me to 
identify the volumes to snap and to make a consistent (point-in-time) copy. The 
controller handles the I/O activity such that a point-in-time copy is made to 
the target volumes.

In our mirrored configuration, the local DMX-1000 is mirrored to another 
DMX-1000 using EMC's SRDF/A feature. The controllers handle all aspects of the 
mirroring (the remote DMX-1000 is not connected to a host until DR time). You 
can logically view this mirroring as the local controller performing a 
consistent snap of all volumes every thirty seconds to the remote controller.
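
That logical view can be modeled in a few lines of Python. This is only a
toy model of the description above, not how SRDF/A is implemented inside
the controller; the function name and the sample writes are hypothetical,
and the thirty-second cycle is taken from the paragraph above. The point
it illustrates is that the remote side only ever holds whole cycles, so
its image is always a prefix of the order in which the writes were issued.

```python
def replicate_in_cycles(writes, cycle_len, cycles_delivered):
    """Return the remote image after `cycles_delivered` whole cycles.

    writes: list of (seconds_since_start, volume, data), in issue order.
    Hypothetical model: a write reaches the remote side only if the
    cycle it belongs to was delivered completely.
    """
    remote = {}
    cutoff = cycles_delivered * cycle_len
    for ts, volume, data in writes:
        if ts < cutoff:
            # Writes keep their original order within each volume.
            remote.setdefault(volume, []).append(data)
    return remote


# Illustrative writes spread over two thirty-second cycles.
writes = [
    (5, "LOG", "begin T1"),
    (12, "DB", "row T1"),
    (31, "LOG", "commit T1"),
    (40, "DB", "row T2"),
]

# Disaster strikes after only the first cycle reached the remote site.
image = replicate_in_cycles(writes, cycle_len=30, cycles_delivered=1)
```

In this model the remote image never contains a later write without all
of the earlier ones, which is the same state a host would find on disk
after a sudden failure at the end of the last delivered cycle.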

We are primarily a z/OS shop and all of the production and system type volumes 
for z/OS, z/VM, and z/Linux reside on the local DMX-1000 controller. There is a 
second local EMC controller used to support development environments. It is not 
mirrored.

z/VM is used only to host the z/Linux guests; there are no CMS users other
than those needed to support the systems. The z/Linux servers support a
couple of lightly used applications under WebSphere with DB2 as the database.
An Oracle database is also on a z/Linux server and lightly used for a z/OS
application.


From Mark Post and David Boyes:

Both stress that the Linux cache will cause problems with recovery and that
the databases should be backed up under the Linux OS.


My response:

We do perform database backups under the Linux OS. They can be used for local 
site recovery and are available at the DR site (because they are on the DASD 
that is mirrored and snapped/dumped to tape).

I suppose for a tape recovery the database backups could be used, since most
of the time the snaps are performed after the database is backed up. The idea
behind the mirroring is to reduce the amount of data loss, but recovering from
the database backup and forward recovering with the logs would also work. In
either case, it would just be easier to IPL and go at the DR site.

Since VM and most of the servers are static (little I/O), I don't expect
problems getting them started at the DR site. The main concern is recovering
the databases. If DB2 states it can automatically restart/recover after a
server failure and reboot, then what is the difference between a failure
(where the cache didn't get written) and a consistent snap? I would think DB2
and Oracle would need to somehow compensate for how Linux caches the write
I/Os. Otherwise their claim to be able to restart/recover from a crash is
somewhat misleading.
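
To make the crash-versus-snap comparison concrete, here is a minimal
sketch in Python of the general write-ahead logging idea that databases
rely on to survive a crash with unwritten cache. It is an illustration
under assumptions, not DB2's or Oracle's actual implementation: the
function names, the file layout, and the key=value record format are all
hypothetical. The log record is forced to disk with os.fsync() before the
commit is considered done, so the data file is allowed to lag behind in
the Linux page cache.

```python
import os
import tempfile


def append_durably(path, record):
    """Append one record and force it to stable storage before returning."""
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to flush its cache to disk


def commit(log_path, data_path, key, value):
    """Hypothetical commit: log first (durably), then update the data."""
    append_durably(log_path, f"{key}={value}".encode())
    # This write is NOT synced; it may still be in cache at crash time.
    with open(data_path, "ab") as f:
        f.write(f"{key}={value}\n".encode())


def recover(log_path):
    """Rebuild committed state from the log alone after a crash."""
    state = {}
    if not os.path.exists(log_path):
        return state
    with open(log_path, "rb") as f:
        for line in f:
            if not line.endswith(b"\n"):
                continue  # torn final record: that commit never finished
            key, _, value = line.decode().rstrip("\n").partition("=")
            state[key] = value
    return state


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    log = os.path.join(workdir, "wal.log")
    data = os.path.join(workdir, "table.dat")
    commit(log, data, "acct1", "100")
    os.remove(data)  # simulate cached data writes lost in the crash
    print(recover(log))
```

Under this scheme a crash and a consistent snap look the same to the
recovery code: both leave a disk image in which every forced log record
is present, provided the copy preserves the order of dependent writes.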

I appreciate the feedback.
Art

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390

