Re: Bad Linux backups
Ar Sad, 2006-07-29 am 11:08 +0800, ysgrifennodd John Summerfield:
> Aside from users' aversion to cookies, their correct use isn't any
> easier than good backups;-) I reckon a lot of application authors trust
> the data held in cookies, saying "we provided that so we know it's okay."

It is possible to use cookies for passing data (or hidden form fields, in much the same way) and still not have to worry about which web server gets the request. You simply digitally sign the cookie with a secret only the web server knows and ensure it includes enough info to stop long-term reuse or reuse by another user.

Alan

--
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
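Alan's scheme above can be sketched in a few lines. This is a minimal illustration, not anything posted to the list: the payload carries a user id and an expiry time, and an HMAC computed with the servers' shared secret makes the cookie tamper-evident, so any web server in the cluster can verify it without shared session storage.

```python
import hashlib
import hmac
import time

SECRET = b"server-side secret"  # hypothetical; shared by all the web servers

def make_cookie(user_id: str, ttl: int = 3600) -> str:
    """Build a tamper-evident cookie: payload plus an HMAC over it."""
    expires = int(time.time()) + ttl
    payload = f"{user_id}|{expires}"
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"

def check_cookie(cookie: str, user_id: str) -> bool:
    """Any server holding SECRET can verify, so any node may take the request."""
    try:
        payload, sig = cookie.rsplit("|", 1)
        cookie_user, expires = payload.split("|")
    except ValueError:
        return False
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False                       # forged or altered cookie
    if cookie_user != user_id:
        return False                       # reuse by another user
    return int(expires) > time.time()      # stops long-term reuse

print(check_cookie(make_cookie("lea"), "lea"))  # True
```

Tampering with either the payload or the signature makes verification fail, which is what lets the servers trust "the data held in cookies" without trusting the client.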
Re: Bad Linux backups
Mark Perry wrote:
> ----- Start Original Message -----
> Sent: Fri, 28 Jul 2006 12:35:26 +0200
> From: Rob van der Heij <[EMAIL PROTECTED]>
> To: LINUX-390@VM.MARIST.EDU
> Subject: Re: Bad Linux backups
>
> > I would hope that anything that prevents the flashcopy from starting
> > could be reported for example with a command reject upon device end.
>
> I have not analyzed a CCW trace of the whole FLASHCOPY operation, but
> the z/VM FLASHCOPY command returns immediately if the request can be
> queued to the Shark. At some later point in time an error may be
> received by z/VM and an asynchronous console message is issued. This is
> the complication that Mike related to, in that handling such
> asynchronous messages in a REXX script is complicated and not for a
> novice. Once again this is practical advice, not theoretical. I have
> been through this, and it was painful.

It seems to me simple enough :-) One script initiates the copy. A second, run when (or waiting until) the copy _should_ have completed, confirms that it has worked and takes appropriate action depending on the results.

I'll leave it to the more skilled to determine _how_ to tell whether the copy's done, is still in progress or has failed.

--
Cheers
John

-- spambait
[EMAIL PROTECTED] [EMAIL PROTECTED]
Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/

do not reply off-list
Re: Bad Linux backups
Post, Mark K wrote:
> From what I've seen, a lot of that information is usually kept in the
> user's browser via cookies or "session cookies." For things that
> aren't, mirroring the data on separate physical devices, on separate
> controllers, etc., etc., provides the redundancy needed. The whole
> point of clustering is not to have _any_ single points of failure.
> That's why clustering an application is _at least_ two times more
> expensive than not clustering it.

Aside from users' aversion to cookies, their correct use isn't any easier than good backups ;-) I reckon a lot of application authors trust the data held in cookies, saying "we provided that so we know it's okay."

--
Cheers
John
Re: Bad Linux backups
You are correct about the caching of filesystems. The clustered systems I referred to are DB2 Connect gateways and contain no shared filesystems. I will route all DB2C users through one gateway while backing up the other, then reverse positions. When both are done, bring them both online.

For systems with active filesystems, they must be shut down and a SNAP taken. They can be brought up while the SNAP volumes are copied to tape.

Lea Stahr
Sr. System Administrator
Linux/Unix Team
630-753-5445
[EMAIL PROTECTED]

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of J Leslie Turriff
Sent: Friday, July 28, 2006 11:04 AM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Bad Linux backups

Well, I can see that clustering is a solution to the availability of a service while a host is shut down, but I don't see how it makes things any better in regards to backing up the filesystems used by the cluster. As long as any of the hosts in the cluster is using a filesystem in R/W mode there are going to be pieces of data cached in main memory that the filesystem doesn't know about, and that means that snapshots, etc. done from outside will not be valid. Seems like the base issue is that Linux doesn't do write-through caching, so the filesystem will almost never be valid to an outside observer (see also Schroedinger's cat).

J. Leslie Turriff
VM Systems Programmer
Central Missouri State University
Room 400 Ward Edwards Building
Warrensburg MO 64093
660-543-4285 660-580-0523
[EMAIL PROTECTED]

>>> [EMAIL PROTECTED] 07/28/06 10:31 am >>>
From what I've seen, a lot of that information is usually kept in the user's browser via cookies or "session cookies." For things that aren't, mirroring the data on separate physical devices, on separate controllers, etc., etc., provides the redundancy needed. The whole point of clustering is not to have _any_ single points of failure.
That's why clustering an application is _at least_ two times more expensive than not clustering it.

Mark Post

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John Summerfied
Sent: Thursday, July 27, 2006 8:37 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Bad Linux backups

David Boyes wrote:
> I think Lea means:
>
> For cluster takeover to work seamlessly, your application has to keep
> session data in some common location between the servers.

There's the point that has me: how do you back up that location? Is it something that, if it fails, you quickly find a new one and tell the PC buyer you had a "technical problem" and would they mind starting again?

--
Cheers
John
Re: Bad Linux backups
> From what I've seen, a lot of that information is usually kept in the
> user's browser via cookies or "session cookies." For things that
> aren't, mirroring the data on separate physical devices, on separate
> controllers, etc., etc., provides the redundancy needed.

The other common technique is storing the session data in an RDBMS. You use the DBMS's live backup tools to dump to a stable copy on disk, and then back up the stable copy.

> The whole point of clustering is not to have _any_ single points of
> failure. That's why clustering an application is _at least_ two times
> more expensive than not clustering it.

Yup.
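The RDBMS approach described above can be illustrated with SQLite's online backup API. This is only a stand-in for the live-backup tools of a server DBMS, and the table layout is invented for the example:

```python
import os
import sqlite3
import tempfile

# Live session store; a server RDBMS in production, SQLite as a stand-in here.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY, state TEXT)")
live.execute("INSERT INTO sessions VALUES ('abc123', 'cart=2 items')")
live.commit()

# Step 1: the DBMS's live-backup facility writes a coherent stable copy to
# disk without stopping the application (sqlite3.Connection.backup here).
stable_path = os.path.join(tempfile.mkdtemp(), "sessions-stable.db")
stable = sqlite3.connect(stable_path)
with stable:
    live.backup(stable)
stable.close()

# Step 2: the ordinary file-level backup then reads the stable copy, never
# the live database files, so it always sees a consistent snapshot.
check = sqlite3.connect(stable_path)
row = check.execute("SELECT state FROM sessions WHERE id = 'abc123'").fetchone()
print(row[0])  # cart=2 items
```

The point is the two-step split: the DBMS, which knows its own transaction state, produces the coherent copy; the dumb file-level backup only ever touches that copy.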
Re: Bad Linux backups
Well, I can see that clustering is a solution to the availability of a service while a host is shut down, but I don't see how it makes things any better in regards to backing up the filesystems used by the cluster. As long as any of the hosts in the cluster is using a filesystem in R/W mode there are going to be pieces of data cached in main memory that the filesystem doesn't know about, and that means that snapshots, etc. done from outside will not be valid. Seems like the base issue is that Linux doesn't do write-through caching, so the filesystem will almost never be valid to an outside observer (see also Schroedinger's cat).

J. Leslie Turriff
VM Systems Programmer
Central Missouri State University
Room 400 Ward Edwards Building
Warrensburg MO 64093
660-543-4285 660-580-0523
[EMAIL PROTECTED]

>>> [EMAIL PROTECTED] 07/28/06 10:31 am >>>
From what I've seen, a lot of that information is usually kept in the user's browser via cookies or "session cookies." For things that aren't, mirroring the data on separate physical devices, on separate controllers, etc., etc., provides the redundancy needed. The whole point of clustering is not to have _any_ single points of failure. That's why clustering an application is _at least_ two times more expensive than not clustering it.

Mark Post

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John Summerfied
Sent: Thursday, July 27, 2006 8:37 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Bad Linux backups

David Boyes wrote:
> I think Lea means:
>
> For cluster takeover to work seamlessly, your application has to keep
> session data in some common location between the servers.

There's the point that has me: how do you back up that location? Is it something that, if it fails, you quickly find a new one and tell the PC buyer you had a "technical problem" and would they mind starting again?
--
Cheers
John
Re: Bad Linux backups
> > On Wed, Jul 26, 2006 at 01:27:06PM -0500, J Leslie Turriff wrote:
> > Okay, now, wait; are you saying that the storage device _does_ have a
> > mechanism for communicating with the Linux filesystem to determine what
> > filesystem pages are still cached in main storage and have not yet been
> > committed to external storage?
>
> It doesn't. It's also not as easy as having a list of pages that need
> committing. What we would need is a way for the storage device (or
> rather the software controlling it) to call the existing Linux lockfs
> functionality.

A way for the storage device to call the lockfs via an API was what I was thinking of.
Re: Bad Linux backups
From what I've seen, a lot of that information is usually kept in the user's browser via cookies or "session cookies." For things that aren't, mirroring the data on separate physical devices, on separate controllers, etc., etc., provides the redundancy needed. The whole point of clustering is not to have _any_ single points of failure. That's why clustering an application is _at least_ two times more expensive than not clustering it.

Mark Post

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John Summerfied
Sent: Thursday, July 27, 2006 8:37 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Bad Linux backups

David Boyes wrote:
> I think Lea means:
>
> For cluster takeover to work seamlessly, your application has to keep
> session data in some common location between the servers.

There's the point that has me: how do you back up that location? Is it something that, if it fails, you quickly find a new one and tell the PC buyer you had a "technical problem" and would they mind starting again?

--
Cheers
John
Re: Bad Linux backups
David Boyes wrote:
> I think Lea means:
>
> For cluster takeover to work seamlessly, your application has to keep
> session data in some common location between the servers.

There's the point that has me: how do you back up that location? Is it something that, if it fails, you quickly find a new one and tell the PC buyer you had a "technical problem" and would they mind starting again?

--
Cheers
John
Re: Bad Linux backups
----- Start Original Message -----
Sent: Fri, 28 Jul 2006 12:35:26 +0200
From: Rob van der Heij <[EMAIL PROTECTED]>
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Bad Linux backups

> I would hope that anything that prevents the flashcopy
> from starting could be reported for example with a command reject upon
> device end.

I have not analyzed a CCW trace of the whole FLASHCOPY operation, but the z/VM FLASHCOPY command returns immediately if the request can be queued to the Shark. At some later point in time an error may be received by z/VM and an asynchronous console message is issued. This is the complication that Mike related to, in that handling such asynchronous messages in a REXX script is complicated and not for a novice. Once again this is practical advice, not theoretical. I have been through this, and it was painful.

Alan has stated that something better is coming, perhaps a rewrite of the actual z/VM FLASHCOPY command? Can you enlighten us, Alan?

> If the device could just give up somewhere in the middle of copying a
> volume as if it were normal business, then I think I don't want to
> share my views on that design with folks on a public mailing list.

It is very likely that I have already said many words relating to such views :-)

Mark
Re: Bad Linux backups
On 7/28/06, Mark Perry <[EMAIL PROTECTED]> wrote:
> > Unless I am terribly misinformed, it *is* an atomic operation for the
> > operating system.
>
> Sorry Rob, but you are terribly misinformed.

What I meant to say with "atomic operation" is that things remain in order. The flashcopy itself is initiated by some (undocumented?) CCWs in a channel program, so it is very clear to the host which I/O got device end before the SSCH for the flashcopy was issued. If the operation is atomic for the host, then anything you wrote to disk before the flashcopy is in the copy, and anything you write after that is not. I can imagine it takes some smoke and mirrors (like keeping copies of tracks that were modified on the source during the background copy process, and redirecting reads to the copied volume while the copying takes place).

I would hope that anything that prevents the flashcopy from starting could be reported, for example with a command reject upon device end (why would you give device end before sorting out that it will work?). Slightly less convenient would be to report an issue with the next I/O to the device, but chaining a NOP in should make it synchronous. From that point on, it seems to me the device has no option but to complete the given task. If things that could not be foreseen prevent this, then imho there's no option but to reject reading from the source or writing to the target. Both rather unpleasant, and a good reason to make the check robust.

If the device could just give up somewhere in the middle of copying a volume as if it were normal business, then I think I don't want to share my views on that design with folks on a public mailing list.

Rob
Re: Bad Linux backups
Rob van der Heij wrote:
> On 7/26/06, Mark Perry <[EMAIL PROTECTED]> wrote:
> > One point not mentioned yet is that FLASHCOPY is an asynchronous
> > process. You can start a FLASHCOPY operation and it *can* return an
> > error status asynchronously. 90+% of the time this is not apparent;
> > the request is made and the Shark goes happily on its way. However,
> > if the request that is queued within the Shark has to be terminated
> > (resource shortages, target volume errors etc.) then beware!
>
> Unless I am terribly misinformed, it *is* an atomic operation for the
> operating system.

My reply is a little late, but emails from the list are not coming in synchronously ;-)

Sorry Rob, but you are terribly misinformed. The Shark responds with a successful completion on accepting the request to perform the FLASHCOPY. Remember that a Shark is not just a bunch of DASD/disks; it is actually an AIX system running on POWER - i.e. it is an intelligent system, and you are communicating with software, not hardware. A FLASHCOPY operation is one of many that get put onto queues and are processed asynchronously. This is not a failure, but any host system utilizing FLASHCOPY must be able to handle asynchronous errors.

I believe Alan has already confirmed what I was going to say, but hard experience using FLASHCOPY under z/VM has led to my statements. I am not theorizing - it comes from painful experience. Painful examples, such as regular backups that utilize the same SOURCE and TARGET volumes: when an asynchronous error (resource shortage within the Shark) is "missed", the data on the TARGET volume is "old". Any attempt to use the TARGET volume succeeds but utilizes the "old" data, hence backups etc. are useless. This problem hit us during FLASHCOPY cloning of a master system; we couldn't understand why the new clones behaved as they did (updates/changes missing etc.), until later analysis showed that the clones were running from the "old" data - the FLASHCOPY had failed asynchronously.

It became obvious that we could not rely on the return code from the FLASHCOPY operation; we needed to wait longer for any asynchronous error to come back. The tricky decision was how long to wait for such an error. If you wait too long, you effectively defeat any advantage of using FLASHCOPY over a simple DASD copy operation. We compromised on a parameter value that defaulted to about 1 minute (remember that a copy operation could take 20-30 minutes).

Mark
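Mark's one-minute compromise amounts to a bounded wait for bad news after the command returns. The sketch below is hypothetical scaffolding: the error-message text is invented, and the way console output reaches the script (the hard REXX part under z/VM) is simulated with a simple queue.

```python
import queue
import time

def flashcopy_trustworthy(console_msgs, wait_seconds=60.0):
    """The FLASHCOPY command already returned success; keep watching the
    console for an asynchronous failure before trusting the target volume."""
    deadline = time.monotonic() + wait_seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return True                # window elapsed with no bad news
        try:
            msg = console_msgs.get(timeout=remaining)
        except queue.Empty:
            return True                # window elapsed with no bad news
        if "FLASHCOPY" in msg and "FAIL" in msg.upper():
            return False               # async failure: target holds old data

# An asynchronous failure arriving inside the window is caught...
bad = queue.Queue()
bad.put("FLASHCOPY OF 0200 TO 0300 FAILED")   # invented message text
print(flashcopy_trustworthy(bad, wait_seconds=0.2))              # False

# ...while a quiet console for the whole window counts as success.
print(flashcopy_trustworthy(queue.Queue(), wait_seconds=0.2))    # True
```

The trade-off Mark describes is visible in `wait_seconds`: too short and you miss the asynchronous error (and back up "old" data); too long and FLASHCOPY loses its advantage over a plain DASD copy.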
Re: Bad Linux backups
I think Lea means:

For cluster takeover to work seamlessly, your application has to keep session data in some common location between the servers. If that's the case, then when the shutdown of the second server commences, it takes itself out of the load balancer queue, completes whatever transactions are in flight at that moment on that node, signals the other node that it's now in charge, and then takes a swan dive into oblivion. The other node takes the session state data from the shared location, and picks up where the original node left off. Works well for applications that are aware of how to play nicely.

If you go all the way to OpenSSI, then a node shutdown triggers process migration to nodes other than the one going down, and the system continues operation without the application even noticing.

It takes some planning to get clustered applications to work properly, but once that's done, it's pretty slick. RTFineM for 'cluster' for more interesting details. Once you have the cluster properly configured for takeover, then you use VMUTIL or S5INIT to issue the SIGNAL SHUTDOWN to each node in turn, back it up, then XAUTOLOG it so that it re-enters the cluster and all is well.

David Boyes
Sine Nomine Associates

> -----Original Message-----
> From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John
> Summerfied
> Sent: Thursday, July 27, 2006 7:46 PM
> To: LINUX-390@VM.MARIST.EDU
> Subject: Re: Bad Linux backups
>
> Stahr, Lea wrote:
> > A piece of cake! Use VMUTIL on VM to do the shutdowns and startups and
> > have the backups scheduled appropriately. Or get the CONTROL-M agent and
> > have that do it all from ZOS.
>
> I don't understand how that addresses my concern.
>
> Stahr, Lea wrote:
> > With clustering, you shut down one image and do an OFFLINE backup while
> > the application runs on the second image. Then bring up the primary
> > image and shutdown the secondary system for backup.
>
> which sounds every bit as tricky to me as getting good backups from a
> live Linux system.
>
> I'm negotiating purchase of a PC with your online retail shop and you
> take down the box I'm talking to while I'm negotiating PC options such
> as RAM, CPU, disk...
>
> --
> Cheers
> John
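David's takeover sequence can be mocked up to show the ordering that matters. Every interface here is an invented stand-in: a shared set for load-balancer membership, a dict for the common session-state location, and a callable for signalling the surviving node.

```python
class Node:
    def __init__(self, name, balancer, sessions, peer_signal, log):
        self.name = name
        self.balancer = balancer        # shared load-balancer membership set
        self.sessions = sessions        # shared session-state store
        self.peer_signal = peer_signal  # callable notifying the surviving node
        self.log = log
        self.inflight = []              # transactions currently on this node

    def graceful_shutdown(self):
        self.balancer.discard(self.name)      # 1. stop taking new work
        for txn in self.inflight:             # 2. complete in-flight work;
            self.sessions[txn] = "committed"  #    state lands in the shared store
        self.inflight.clear()
        self.peer_signal(self.name)           # 3. tell the peer it's in charge
        self.log.append(f"{self.name} down")  # 4. swan dive into oblivion

balancer = {"node-a", "node-b"}
sessions, log = {}, []
node_a = Node("node-a", balancer, sessions,
              lambda who: log.append(f"peer notified: {who} leaving"), log)
node_a.inflight = ["txn-1", "txn-2"]
node_a.graceful_shutdown()
print(sorted(balancer))   # ['node-b']
print(sessions["txn-1"])  # committed
print(log)
```

The surviving node can then read the committed session state from the shared store and pick up where the original node left off; nothing is lost because step 1 happens before step 4.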
Re: Bad Linux backups
Stahr, Lea wrote:
> A piece of cake! Use VMUTIL on VM to do the shutdowns and startups and
> have the backups scheduled appropriately. Or get the CONTROL-M agent and
> have that do it all from ZOS.

I don't understand how that addresses my concern.

Stahr, Lea wrote:
> With clustering, you shut down one image and do an OFFLINE backup while
> the application runs on the second image. Then bring up the primary
> image and shutdown the secondary system for backup.

which sounds every bit as tricky to me as getting good backups from a live Linux system.

I'm negotiating purchase of a PC with your online retail shop and you take down the box I'm talking to while I'm negotiating PC options such as RAM, CPU, disk...

--
Cheers
John
Re: Bad Linux backups
On Wed, Jul 26, 2006 at 03:04:34PM -0400, Alan Altmark wrote:
> On Wednesday, 07/26/2006 at 01:27 EST, J Leslie Turriff
> <[EMAIL PROTECTED]> wrote:
> > Okay, now, wait; are you saying that the storage device _does_ have a
> > mechanism for communicating with the Linux filesystem to determine what
> > filesystem pages are still cached in main storage and have not yet been
> > committed to external storage?
>
> No. I'm saying that an application that closes or flushes all of its open
> files and then tells the filesystem "commit the filesystem to disk" (e.g.
> sync) is then at a known point with respect to the dasd. It is free at
> that point to kick off a flashcopy via some command or utility and start
> running again.

If you are doing an fsync, that data is guaranteed to be on stable storage, yes. But that's not enough, because

a) it is not specified where on stable storage; it could for example still be in the log of a data journaling device, and

b) you risk severe corruption if the filesystem metadata is not in a coherent state, up to the point that you can't find your data anymore despite it being on stable storage.
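The narrow fsync guarantee conceded above looks like this in application code. A sketch only: it puts one file's data on stable storage, which is the "known point" Alan describes, but it says nothing about the coherence of the rest of the filesystem, which is Christoph's point.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "checkpoint.dat")

# Write application state and force it to stable storage.
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
os.write(fd, b"session checkpoint\n")
os.fsync(fd)   # data for *this file* is now guaranteed on stable storage
os.close(fd)

# The guarantee is narrow: the new directory entry needs its own fsync,
# and filesystem-wide metadata may still be incoherent on disk - which
# is exactly why an external snapshot of the volume can be unusable.
dfd = os.open(os.path.dirname(path), os.O_RDONLY)
os.fsync(dfd)
os.close(dfd)

print(open(path, "rb").read())   # b'session checkpoint\n'
```

Even with both fsyncs, other files, the journal, and allocation metadata are outside the application's control, so per-file durability does not add up to a coherent on-disk filesystem.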
Re: Bad Linux backups
On Wed, Jul 26, 2006 at 01:27:06PM -0500, J Leslie Turriff wrote:
> Okay, now, wait; are you saying that the storage device _does_ have a
> mechanism for communicating with the Linux filesystem to determine what
> filesystem pages are still cached in main storage and have not yet been
> committed to external storage?

It doesn't. It's also not as easy as having a list of pages that need committing. What we would need is a way for the storage device (or rather the software controlling it) to call the existing Linux lockfs functionality.
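The lockfs functionality mentioned here is exposed on modern Linux through the FIFREEZE/FITHAW ioctls, wrapped by the fsfreeze(8) utility, so software controlling the storage device could bracket the copy with a freeze and a thaw. The sketch below only builds the command sequence rather than running it (freezing a filesystem needs root), and the snapshot command is a placeholder:

```python
def quiesced_snapshot_plan(mountpoint, snapshot_cmd):
    """Return the freeze/snapshot/thaw command sequence as argv lists."""
    return [
        ["fsfreeze", "--freeze", mountpoint],    # lockfs: block writes, flush caches
        list(snapshot_cmd),                      # copy taken while frozen
        ["fsfreeze", "--unfreeze", mountpoint],  # resume normal operation
    ]

# The snapshot step is a placeholder; under z/VM it might be a CP FLASHCOPY
# issued through vmcp, elsewhere an LVM or SAN snapshot command.
plan = quiesced_snapshot_plan("/srv/data", ["vmcp", "FLASHCOPY 200 0 END 300 0 END"])
for argv in plan:
    print(argv)
```

Run in that order (e.g. via `subprocess.run` on each argv list), the filesystem is coherent on disk for the duration of the middle step, which is precisely what an externally triggered FlashCopy lacks on its own.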
Re: Bad Linux backups
On Wed, Jul 26, 2006 at 12:50:09PM -0400, Alan Altmark wrote:
> You're right, however, and as we've been discussing, that these features
> can be misused or misinterpreted to provide an *application*-consistent
> view of the data. They don't do that. That applies to any operating
> system, not just Linux. And it's not the lock/unlock features of a
> filesystem that are important. Instead, the application must be able to
> exert control on the filesystem in such a way that it *knows* that all
> [relevant] data has been committed to disk and can say "OK. Now is a good
> time to take that backup."

With a transaction-oriented application the filesystem data is always coherent; if your application isn't transaction-based, all hope for a coherent backup is lost.

> Properly used, these features can drastically reduce the amount of down
> time needed to perform application-consistent backups.
Re: Bad Linux backups
On Wed, Jul 26, 2006 at 06:21:03PM +0200, Rob van der Heij wrote:
> On 7/26/06, Mark Perry <[EMAIL PROTECTED]> wrote:
>
> > One point not mentioned yet is that FLASHCOPY is an asynchronous process.
> > You can start a FLASHCOPY operation and it *can* return an error status
> > asynchronously. 90+% of the time this is not apparent; the request is made
> > and the Shark goes happily on its way. However, if the request that is
> > queued within the Shark has to be terminated (resource shortages, target
> > volume errors etc.) then beware!
>
> Unless I am terribly misinformed, it *is* an atomic operation for the
> operating system.

Doing it atomically is not enough. You need to put the filesystem into a coherent state first.
Re: Bad Linux backups
On Thursday, 07/27/2006 at 09:57 ZE2, Carsten Otte <[EMAIL PROTECTED]> wrote:
> I am sorry, but I have to disagree with Alan's statement. They _are_
> currently dangerous to use with Linux volumes that are being accessed,
> _because_, unlike dm-snapshot, the filesystem is not frozen in Linux
> (lockfs) and thus the data on disk is inconsistent due to caching.
> DM-snapshot does the desired trick; flashcopy does not.
>
> I feel sorry for having earlier created the expectation that flashcopy
> can be used to snapshot Linux volumes.

I had previously said that you shouldn't use Flashcopy on active volumes. I meant that Flashcopy is just a mechanism for making a copy of the media. If you don't worry about the mechanics underneath and just think of it as a super-fast copy function, then all will be fine. High-speed copy technology does not absolve the system manager from properly preparing the system for backup.

Alan Altmark
z/VM Development
IBM Endicott
Re: Bad Linux backups
A piece of cake! Use VMUTIL on VM to do the shutdowns and startups and have the backups scheduled appropriately. Or get the CONTROL-M agent and have that do it all from ZOS.

Lea Stahr
Sr. System Administrator
Linux/Unix Team
630-753-5445
[EMAIL PROTECTED]

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John Summerfied
Sent: Wednesday, July 26, 2006 7:18 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Bad Linux backups

Stahr, Lea wrote:
> With clustering, you shut down one image and do an OFFLINE backup while
> the application runs on the second image. Then bring up the primary
> image and shutdown the secondary system for backup.

which sounds every bit as tricky to me as getting good backups from a live Linux system.

I'm negotiating purchase of a PC with your online retail shop and you take down the box I'm talking to while I'm negotiating PC options such as RAM, CPU, disk...

--
Cheers
John
Re: Bad Linux backups
Funny thing, testing! I tested it and it worked four times in a row. Then when I actually needed it, it failed. Thank you, fuzzy backups!

Lea Stahr
Sr. System Administrator
Linux/Unix Team
630-753-5445
[EMAIL PROTECTED]

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Alan Altmark
Sent: Wednesday, July 26, 2006 3:13 PM
To: LINUX-390@VM.MARIST.EDU
Subject: Re: Bad Linux backups

On Wednesday, 07/26/2006 at 02:55 EST, J Leslie Turriff <[EMAIL PROTECTED]> wrote:
> Okay. I may be wrong, but it seems to me that the majority of Linux
> applications (probably excepting database packages and such) rely on the
> filesystem to eventually get their data to disk without them doing
> anything besides open, write and close operations.

And the circle is closed. Hence this entire thread/rant about shutting down servers while you are flashcopying or otherwise performing external physical backups. If you know what you and the application are doing, take a live backup. If you don't, don't. If the application provides you with a set of backup functions, use them.

Oh, and the point that actually started the whole thing: test your backups. You should already be doing that in your DR tests, but if you change your processes, re-test. "There's a hole in my bucket, dear Liza, dear Liza" ;-)

Alan Altmark
z/VM Development
IBM Endicott
Re: Bad Linux backups
J Leslie Turriff wrote:
> Sounds to me, then, like the snapshot/mirror/peer-to-peer copy features
> of storage devices, e.g. Shark, SATABeast, etc., are currently dangerous
> to use with Linux filesystems. They would need to be able to coordinate
> their activities with the filesystem lock/unlock components of the
> kernel to be made safe?

My limited knowledge suggests you need do no more than reboot, maybe less (basically you want the filesystem read-only for a moment), initiating the flashcopy at the right instant.

--
Cheers
John
Re: Bad Linux backups
Carsten Otte wrote: Fargusson.Alan wrote: I agree. I think you should make your backups with the Linux system down. You should test this to make sure that there is not some other operational error causing problems. I think we got close to the bottom of the stack now: If one can take down the system for backup it is a good idea to do so because of the reasons discussed in this thread. Backing up a running system involves trust in the application and the file system. I've found this an interesting discussion; I've been wondering for some time how the pros who have their big businesses (and/or careers) on the line do it. I've always suspected it's not so simple as it might be, and this has confirmed my opinion: One's choices are 1. Do it perfectly, with the system down. 2. Do it less rigorously with the system up, but analyse the implications very carefully. You don't want your latest recruit doing this:-) It's become clear to me that even experienced folk can get this wrong, and argue that they're right! -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Alan Altmark wrote: On Wednesday, 07/26/2006 at 01:27 EST, J Leslie Turriff <[EMAIL PROTECTED]> wrote: Okay, now, wait; are you saying that the storage device _does_ have a mechanism for communicating with the Linux filesystem to determine what filesystem pages are still cached in main storage and have not yet been committed to external storage? No. I'm saying that an application that closes or flushes all of its open files and then tells the filesystem "commit the filesystem to disk" (e.g. sync) is then at a known point with respect to the dasd. It is free at that point to kick off a flashcopy via some command or utility and start running again. _Only_ if all users of the filesystem agree! It seems less straightforward to me if you have more than one application writing to the filesystem, or if your application's files are spread across filesystems. -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Stahr, Lea wrote: With clustering, you shut down one image and do an OFFLINE backup while the application runs on the second image. Then bring up the primary image and shutdown the secondary system for backup. which sounds every bit as tricky to me as getting good backups from a live Linux system. I'm negotiating purchase of a PC with your online retail shop and you take down the box I'm talking to while I'm negotiating PC options such as RAM, CPU, disk -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
J Leslie Turriff wrote: > Okay, now, wait; are you saying that the storage device _does_ have a > mechanism for communicating with the Linux filesystem to determine what > filesystem pages are still cached in main storage and have not yet been > committed to external storage? No, it does not. Invention required. cheers, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
> On Wednesday, 07/26/2006 at 10:33 EST, J Leslie Turriff > <[EMAIL PROTECTED]> wrote: >> Sounds to me, then, like the use of the >> snapshot/mirror/peer-to-peer copy features of storage devices e.g. >> Shark, SATABeast, etc. are currently dangerous to use with Linux >> filesystems. They would need to be able to coordinate their activities >> with the filesystem lock/unlock components of the kernel to be made >> safe? Alan Altmark wrote: > No, they are not "currently dangerous to use with Linux". The > snapshot/flashcopy features provide a point-in-time consistent view of an > entire device or range of blocks/cylinders. In a "normal" track-by-track > read, data on the device can change while you're reading. I am sorry, but I have to disagree with Alan's statement. They _are_ currently dangerous to use with Linux volumes that are being accessed, _because_ unlike dm-snapshot the filesystem is not frozen in Linux (lockfs) and thus the data on disk is inconsistent due to caching. DM-snapshot does the desired trick, flashcopy does not. I am sorry for earlier causing confusion by creating the expectation that flashcopy could be used to snapshot Linux volumes. cheers, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On Wednesday, 07/26/2006 at 02:55 EST, J Leslie Turriff <[EMAIL PROTECTED]> wrote: > Okay. I may be wrong, but it seems to me that the majority of Linux > applications (probably excepting database packages and such) rely on the > filesystem to eventually get their data to disk without them doing > anything besides open, write and close operations. And the circle is closed. Hence this entire thread/rant about shutting down servers while you are flashcopying or otherwise performing external physical backups. If you know what you and the application are doing, take a live backup. If you don't, don't. If the application provides you with a set of backup functions, use them. Oh, and the point that actually started the whole thing: Test your backups. You should already be doing that in your DR tests, but if you change your processes, re-test. "There's a hole in my bucket, dear Liza, dear Liza" ;-) Alan Altmark z/VM Development IBM Endicott -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On Wednesday, 07/26/2006 at 02:20 AST, David Kreuter <[EMAIL PROTECTED]> wrote: > including ESTABLISH, QUERY and WITHDRAW ala ickdsf on z/OS? > will ickdsf on z/vm be changed to support these functions? I give an inch and you want a mile! :-) We will, among other things, be adding a QUERY capability for convenience. As far as ICKDSF goes, you can establish flashcopy relationships among your minidisks as long as they are on the same controller. You may need to order service for DSF to bring its functionality up to that documented in the -30 level of the manual. Alan Altmark z/VM Development IBM Endicott -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Okay. I may be wrong, but it seems to me that the majority of Linux applications (probably excepting database packages and such) rely on the filesystem to eventually get their data to disk without them doing anything besides open, write and close operations. J. Leslie Turriff VM Systems Programmer Central Missouri State University Room 400 Ward Edwards Building Warrensburg MO 64093 660-543-4285 660-580-0523 [EMAIL PROTECTED] >>>[EMAIL PROTECTED] 07/26/06 2:04 pm >>> On Wednesday, 07/26/2006 at 01:27 EST, J Leslie Turriff <[EMAIL PROTECTED]> wrote: >Okay, now, wait; are you saying that the storage device _does_ have a >mechanism for communicating with the Linux filesystem to determine what >filesystem pages are still cached in main storage and have not yet been >committed to external storage? No. I'm saying that an application that closes or flushes all of its open files and then tells the filesystem "commit the filesystem to disk" (e.g. sync) is then at a known point with respect to the dasd. It is free at that point to kick off a flashcopy via some command or utility and start running again. Alan Altmark z/VM Development IBM Endicott -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On Wednesday, 07/26/2006 at 01:27 EST, J Leslie Turriff <[EMAIL PROTECTED]> wrote: > Okay, now, wait; are you saying that the storage device _does_ have a > mechanism for communicating with the Linux filesystem to determine what > filesystem pages are still cached in main storage and have not yet been > committed to external storage? No. I'm saying that an application that closes or flushes all of its open files and then tells the filesystem "commit the filesystem to disk" (e.g. sync) is then at a known point with respect to the dasd. It is free at that point to kick off a flashcopy via some command or utility and start running again. Alan Altmark z/VM Development IBM Endicott -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
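[The sequence Alan describes — application flushes and closes its files, the filesystem is synced, then the copy is kicked off — can be sketched as follows. This is an illustration only: "app_quiesce"/"app_resume" and the FLASHCOPY line are hypothetical placeholders for site-specific commands (on a z/VM guest the real command would be issued to CP), and, as later messages in the thread note, sync alone without a filesystem freeze may still not be sufficient.]

```shell
# Hedged sketch of "flush, sync, flashcopy, resume".
backup_point() {
  echo "app_quiesce"          # placeholder: application closes/flushes its files
  sync                        # commit the filesystem to disk -- the "known point"
  echo "FLASHCOPY 0201 0301"  # placeholder: initiate the point-in-time copy
  echo "app_resume"           # placeholder: application starts running again
}
backup_point
```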
Re: Bad Linux backups
Okay, now, wait; are you saying that the storage device _does_ have a mechanism for communicating with the Linux filesystem to determine what filesystem pages are still cached in main storage and have not yet been committed to external storage? J. Leslie Turriff VM Systems Programmer Central Missouri State University Room 400 Ward Edwards Building Warrensburg MO 64093 660-543-4285 660-580-0523 [EMAIL PROTECTED] >>>[EMAIL PROTECTED] 07/26/06 11:50 am >>> On Wednesday, 07/26/2006 at 10:33 EST, J Leslie Turriff <[EMAIL PROTECTED]> wrote: >Sounds to me, then, like the use of the >snapshot/mirror/peer-to-peer copy features of storage devices e.g. >Shark, SATABeast, etc. are currently dangerous to use with Linux >filesystems. They would need to be able to coordinate their activities >with the filesystem lock/unlock components of the kernel to be made >safe? No, they are not "currently dangerous to use with Linux". The snapshot/flashcopy features provide a point-in-time consistent view of an entire device or range of blocks/cylinders. In a "normal" track-by-track read, data on the device can change while you're reading. You're right, however, and as we've been discussing, that these features can be misused or misinterpreted to provide an *application*-consistent view of the data. They don't do that. That applies to any operating system, not just Linux. And it's not the lock/unlock features of a filesystem that are important. Instead, the application must be able to exert control on the filesystem in such a way that it *knows* that all [relevant] data has been committed to disk and can say "OK. Now is a good time to take that backup." Properly used, these features can drastically reduce the amount of down time needed to perform application-consistent backups.
Alan Altmark z/VM Development IBM Endicott -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
including ESTABLISH, QUERY and WITHDRAW ala ickdsf on z/OS? will ickdsf on z/vm be changed to support these functions? David Yes, the CP FLASHCOPY command is one of those asynchronous commands. But take heart! We are busily improving it, making it more suitable for use in scripts. (And adding function to it while we're at it.) Alan Altmark z/VM Development IBM Endicott -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On Wednesday, 07/26/2006 at 01:59 AST, Michael MacIsaac/Poughkeepsie/[EMAIL PROTECTED] wrote: > The z/VM FLASHCOPY command can give a return code of 0 and then *fail* > later asynchronously. It is difficult to trap in REXX (for a mere mortal > like myself). And it will fail reliably (and asynchronously) if you queue > up too much work to the Shark/DSx000. This behavior is not well suited to > scripting :(( As such we had to pull back in a few cases on FLASHCOPY in > "The Virtualization Cookbook". Yes, the CP FLASHCOPY command is one of those asynchronous commands. But take heart! We are busily improving it, making it more suitable for use in scripts. (And adding function to it while we're at it.) Alan Altmark z/VM Development IBM Endicott -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
> Unless I am terribly misinformed, it *is* an atomic operation for the > operating system. Even though from a storage management point of view > it may take some time. The z/VM FLASHCOPY command can give a return code of 0 and then *fail* later asynchronously. It is difficult to trap in REXX (for a mere mortal like myself). And it will fail reliably (and asynchronously) if you queue up too much work to the Shark/DSx000. This behavior is not well suited to scripting :(( As such we had to pull back in a few cases on FLASHCOPY in "The Virtualization Cookbook". "Mike MacIsaac" <[EMAIL PROTECTED]> (845) 433-7061 -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On Wednesday, 07/26/2006 at 10:33 EST, J Leslie Turriff <[EMAIL PROTECTED]> wrote: > Sounds to me, then, like the use of the > snapshot/mirror/peer-to-peer copy features of storage devices e.g. > Shark, SATABeast, etc. are currently dangerous to use with Linux > filesystems. They would need to be able to coordinate their activities > with the filesystem lock/unlock components of the kernel to be made > safe? No, they are not "currently dangerous to use with Linux". The snapshot/flashcopy features provide a point-in-time consistent view of an entire device or range of blocks/cylinders. In a "normal" track-by-track read, data on the device can change while you're reading. You're right, however, and as we've been discussing, that these features can be misused or misinterpreted to provide an *application*-consistent view of the data. They don't do that. That applies to any operating system, not just Linux. And it's not the lock/unlock features of a filesystem that are important. Instead, the application must be able to exert control on the filesystem in such a way that it *knows* that all [relevant] data has been committed to disk and can say "OK. Now is a good time to take that backup." Properly used, these features can drastically reduce the amount of down time needed to perform application-consistent backups. Alan Altmark z/VM Development IBM Endicott -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On 7/26/06, Mark Perry <[EMAIL PROTECTED]> wrote: One point not mentioned yet, is that FLASHCOPY is an asynchronous process. You can start a FLASHCOPY operation and it *can* return an error status asynchronously. 90+% of the time this is not apparent, the request is made and the Shark goes happily on its way. However if the request that is queued within the Shark has to be terminated (Resource shortages, target volume errors etc.) then beware! Unless I am terribly misinformed, it *is* an atomic operation for the operating system. Even though from a storage management point of view it may take some time. And to maintain the illusion the device needs resources (e.g. cache, extra disk space, etc). The same applies to freezing the file system in Linux as suggested. If freezing means that dirty pages are held back until the freeze is over, then it will increase the demand for memory in the server. If the server is large enough this process would increase the working set size, but not worse than otherwise because page cache would be used anyway. Using snapshot on the DASD subsystem means you can shorten the time that Linux needs to hold its breath, and thus limit the amount of data to be held up. The alternative (file level backup inside Linux) will fill the page cache with meta data for the entire file system (rather than the content that changed during backup). Which is worse depends on your situation. Rob -- Rob van der Heij Velocity Software, Inc http://velocitysoftware.com/ -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
J Leslie Turriff wrote: > Sounds to me, then, like the use of the > snapshot/mirror/peer-to-peer copy features of storage devices e.g. > Shark, SATABeast, etc. are currently dangerous to use with Linux > filesystems. They would need to be able to coordinate their activities > with the filesystem lock/unlock components of the kernel to be made > safe? Exactly, yes. cheers, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
- Start Original Message - Sent: Wed, 26 Jul 2006 10:33:32 -0500 From: J Leslie Turriff <[EMAIL PROTECTED]> To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups > Sounds to me, then, like the use of the > snapshot/mirror/peer-to-peer copy features of storage devices e.g. > Shark, SATABeast, etc. are currently dangerous to use with Linux > filesystems. They would need to be able to coordinate their activities > with the filesystem lock/unlock components of the kernel to be made > safe? One point not mentioned yet, is that FLASHCOPY is an asynchronous process. You can start a FLASHCOPY operation and it *can* return an error status asynchronously. 90+% of the time this is not apparent, the request is made and the Shark goes happily on its way. However if the request that is queued within the Shark has to be terminated (Resource shortages, target volume errors etc.) then beware! Mark -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Sounds to me, then, like the use of the snapshot/mirror/peer-to-peer copy features of storage devices e.g. Shark, SATABeast, etc. are currently dangerous to use with Linux filesystems. They would need to be able to coordinate their activities with the filesystem lock/unlock components of the kernel to be made safe? J. Leslie Turriff VM Systems Programmer Central Missouri State University Room 400 Ward Edwards Building Warrensburg MO 64093 660-543-4285 660-580-0523 [EMAIL PROTECTED] >>>[EMAIL PROTECTED] 07/26/06 9:04 am >>> On Wed, Jul 26, 2006 at 02:28:53PM +0200, Carsten Otte wrote: >Very interesting indeed. This pointed me to reading the >lockfs/unlockfs semantics in Linux, and I think I need to withdraw my >statement regarding flashcopy snapshots: because of the fact that >there is no lockfs/unlockfs interaction when doing flashcopy, and >because of dirty pages in the page cache during snapshot, flashcopy >will not generate a consistent snapshot. Therefore, using flashcopy on >an active volume from outside Linux is _not_ suitable for backup purposes. > >The only feasible way to get a consistent snapshot is to use >dm-snapshot from within Linux. This snapshot copy can later on be used >with a backup feature outside Linux. If you use xfs you can also put the filesystem in frozen state from userspace with the xfs_freeze utility. I know of inhouse backup tools at various companies that make use of this feature. -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On Wed, Jul 26, 2006 at 02:28:53PM +0200, Carsten Otte wrote: > Very interesting indeed. This pointed me to reading the > lockfs/unlockfs semantics in Linux, and I think I need to withdraw my > statement regarding flashcopy snapshots: because of the fact that > there is no lockfs/unlockfs interaction when doing flashcopy, and > because of dirty pages in the page cache during snapshot, flashcopy > will not generate a consistent snapshot. Therefore, using flashcopy on > an active volume from outside Linux is _not_ suitable for backup purposes. > > The only feasible way to get a consistent snapshot is to use > dm-snapshot from within Linux. This snapshot copy can later on be used > with a backup feature outside Linux. If you use xfs you can also put the filesystem in frozen state from userspace with the xfs_freeze utility. I know of inhouse backup tools at various companies that make use of this feature. -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
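[Christoph's xfs_freeze suggestion might look like this in a backup script. A hedged sketch, assuming an XFS filesystem at a hypothetical mount point and root privileges; FREEZE=0 (the default here) prints the steps rather than executing them, because freezing a live filesystem should never happen by accident.]

```shell
# Hedged sketch: freeze an XFS filesystem, take the external copy, thaw.
MNT=/srv/xfsdata                 # hypothetical mount point
FREEZE=${FREEZE:-0}
fs_step() { if [ "$FREEZE" = 1 ]; then "$@"; else echo "would run: $*"; fi; }

fs_step xfs_freeze -f "$MNT"     # -f: flush the log, block new writes; fs now consistent on disk
fs_step echo "external copy of the frozen volume goes here"  # placeholder step
fs_step xfs_freeze -u "$MNT"     # -u: thaw, writes resume
```

While frozen, any process that writes to the filesystem blocks, so the copy window should be kept as short as possible.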
Re: Bad Linux backups
- Start Original Message - Sent: Wed, 26 Jul 2006 11:49:45 +0200 From: Christoph Hellwig <[EMAIL PROTECTED]> To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups > On Tue, Jul 25, 2006 at 01:22:53PM +0200, Mark Perry wrote: > > I believe that several DB systems offer direct/raw I/O to avoid Linux cache > > problems, and that journaling filesystems, although by default only journal > > meta-data, offer mount options to journal data too. This of course comes at > > a performance price, though Hans Reiser did claim that the new Reiser4 FS > > will journal data without the previous performance penalties. > > Journalled filesystems only journal buffered I/O. Direct I/O means you > do direct dma operations from the storage controller to the user address > space. It's physically impossible to journal. I agree, I did not mean to connect direct I/O and journalling. I was merely commenting on features that are available to assist in ensuring data integrity on disk. Mark -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Christoph Hellwig wrote: > But that's not how snapshots work. When you do a snapshot the filesystem > is frozen. That means: new file writers are blocked from dirtying the > filesystem through the pagecache. The filesystem blocks callers that want > to create new transactions. Then the whole file cache is written out > and the asynchronous write ahead log (journal) is written out on disk. > The filesystem is in a fully consistent state. Trust me, I've > implemented this myself for XFS. Very interesting indeed. This pointed me to reading the lockfs/unlockfs semantics in Linux, and I think I need to withdraw my statement regarding flashcopy snapshots: because of the fact that there is no lockfs/unlockfs interaction when doing flashcopy, and because of dirty pages in the page cache during snapshot, flashcopy will not generate a consistent snapshot. Therefore, using flashcopy on an active volume from outside Linux is _not_ suitable for backup purposes. The only feasible way to get a consistent snapshot is to use dm-snapshot from within Linux. This snapshot copy can later on be used with a backup feature outside Linux. regards, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
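[The dm-snapshot route Carsten recommends is commonly driven through LVM tooling. A hedged sketch, with a hypothetical volume group "vg0", logical volume "data", and illustrative sizes and paths; DRYRUN=1 (the default here) prints the commands instead of running them, since they need root and real devices.]

```shell
# Hedged sketch: dm-snapshot via LVM -- consistent point-in-time backup
# of a live volume from within Linux.
DRYRUN=${DRYRUN:-1}
lvm_step() { if [ "$DRYRUN" = 0 ]; then "$@"; else echo "would run: $*"; fi; }

lvm_step lvcreate -s -L 2G -n data_snap /dev/vg0/data  # COW snapshot; fs briefly frozen
lvm_step mount -o ro /dev/vg0/data_snap /mnt/snap      # stable read-only view
lvm_step tar czf /backup/data.tar.gz -C /mnt/snap .    # back up from the snapshot
lvm_step umount /mnt/snap
lvm_step lvremove -f /dev/vg0/data_snap                # drop snapshot, free COW space
```

The 2G snapshot size only has to hold blocks changed while the backup runs; if it fills, the snapshot is invalidated, so size it for the expected write rate.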
Re: Bad Linux backups
On Tue, Jul 25, 2006 at 09:06:54AM +0800, John Summerfield wrote: > To avoid the nitpickers, let's say that David means all filesystems must > be flushed and ro. > > As I understand it, journalling (by default) logs metadata (directory > info) but not data. > > If you create a file, that's journalled. If you extend a file, that's > journalled. The data you write to the file are not. > > Let's say that you create a file, write 4K to it, close it. Let's say > you do a backup of the volume externally while the 4K data remains > unwritten. Note: read in "man 2 close" "A successful close does not > guarantee that the data has been successfully saved to disk." > > So now you have journalled (or committed) metadata that says the file's > got 4K of data in it. > > But, it hasn't. In the ordinary course of events, the data gets written > to disk and all is well. > > The same sort of thing happens when a file's updated in place, as I > expect databases commonly are. But that's not how snapshots work. When you do a snapshot the filesystem is frozen. That means: new file writers are blocked from dirtying the filesystem through the pagecache. The filesystem blocks callers that want to create new transactions. Then the whole file cache is written out and the asynchronous write ahead log (journal) is written out on disk. The filesystem is in a fully consistent state. Trust me, I've implemented this myself for XFS. -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
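[John's point about close(2) can be shown in a few harmless lines: after write+close the data may still sit only in the page cache, and an explicit sync (or fsync(2) from inside the application) is what forces it to disk. This is a runnable illustration using a temp file; the "external backup sees stale data" part of course cannot be demonstrated without pulling the plug.]

```shell
# Small runnable illustration: write, close, then explicitly flush.
f=$(mktemp)
printf 'important data' > "$f"   # written and closed -- but possibly not on disk yet
sync                             # force dirty pages to disk (an app would fsync per file)
content=$(cat "$f")              # only now is the on-disk state guaranteed to match
rm -f "$f"
echo "$content"
```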
Re: Bad Linux backups
On Tue, Jul 25, 2006 at 01:22:53PM +0200, Mark Perry wrote: > I believe that several DB systems offer direct/raw I/O to avoid Linux cache > problems, and that journaling filesystems, although by default only journal > meta-data, offer mount options to journal data too. This of course comes at > a performance price, though Hans Reiser did claim that the new Reiser4 FS > will journal data without the previous performance penalties. Journalled filesystems only journal buffered I/O. Direct I/O means you do direct dma operations from the storage controller to the user address space. It's physically impossible to journal. -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
I haven't finished all the replies to this thread, so I apologize if I'm duplicating someone else's comment/question. The thing that comes to my mind is, what _exactly_ do you mean by "not boot." Nothing happens at all? The system starts to come up, but can't find the root file system? The root file system gets mounted, but things start dying for various reasons? The problem description needs to be filled in quite a bit more. Mark Post -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Stahr, Lea Sent: Tuesday, July 25, 2006 8:18 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups FDR says working as designed. They back up the entire volume and restore the entire volume. I have restored 3 systems and they DO NOT BOOT. Lea Stahr Sr. System Administrator Linux/Unix Team -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On Tuesday, 07/25/2006 at 02:19 ZE2, Carsten Otte <[EMAIL PROTECTED]> wrote: > Stahr, Lea wrote: > > FDR says working as designed. They back up the entire volume and restore > > the entire volume. I have restored 3 systems and they DO NOT BOOT. > How does FDR copy the volume? Do they sequentially copy track-by-track > or use flashcopy? You would have to presume track-by-track copy since flashcopy is an optional feature and isn't available on all dasd brands. Alan Altmark z/VM Development IBM Endicott -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Sorry for asking this at such a late time but did you say the Linux guest was shutdown when the FDR backup jobs were run? >>> [EMAIL PROTECTED] 7/25/2006 11:27 AM >>> Gentlemen, I must agree with the validity of external backups, but only when the Linux is down. Any backup taken internally OR externally while the system is running may not work due to extensive caching by the system itself and by the applications. If I cannot restore my application to a current state, then it's broken. And these were all either EXT3 or REISERFS. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of David Boyes Sent: Tuesday, July 25, 2006 10:11 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups > Therefore, dm-snapshot and > flashcopy are two sides of the same medal once the entire filesystem > is on a single dasd. That's a pretty large assumption, especially since the recommended wisdom for most "advanced applications" -- like DB/2 and WAS -- is *not* to put things like data and logs on the same filesystem for performance reasons. > > Given how quickly this can change in > > most real production systems, I don't have time or spare cycles to try > > to second-guess this, or make excuses when I miss backing up something > > important because someone didn't tell me that a change in data location > > was made. > The point is, that data is considered stable at any time. That's a > basic assumption which is true for ext3 and most applications. If you > run a file system or an application that does have inconsistent data > from time to time, you are in trouble in case of a power outage or > system crash. I hope this is not the case in any production environment. With respect, I think this is an unrealistic expectation. I don't control the application programmers at IBM or S/AP or Oracle, etc. 
If you want to preach on proper application design to those folks, I'll happily supply amens from the pews, but out here in the real world, it ain't so, and it ain't gonna be so for a good long while (or at least until the current crop of programmers re-discover all the development validation model work that we did back in the 70s at PARC). We're faced with dealing with the world as it is, not as we'd like it to be, and that reality contradicts your assertion. The filesystem contents may be technically consistent, but if the applications disagree for *any* reason, then that doesn't help us at all given what we have to work with in the field. It's a goal to build toward, but for now, it's just that: a goal. With *today's* applications, you need a guaranteed valid state both from the application *and* filesystem standpoint, and to get that, you need to coordinate backups from both inside and outside the guest if you want to use facilities outside the guest to dump the data. How you do that coordination is what I think you're trying to argue and there, your points are extremely valid and useful; my point still stands that without coordination between Linux and whatever else you're using, you're not going to get the desired result, which is a no-exceptions way to handle backup and restore of critical data in the most efficient manner available. -- db -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Fw: [LINUX-390] Bad Linux backups
Rob van der Heij wrote: On 7/25/06, John Campbell <[EMAIL PROTECTED]> wrote: In all of this, isn't the UNIONFS still a live deal? If as many client systems as possible use a set of backing F/Ss that are Read Only, wouldn't Yes, it's mostly working. I have done quite a lot with it on s390. You probably don't want to use it for all your data (for performance reasons) but just for parts of the file system that are mostly unmodified (like /etc). I would not use it for all data on the system though. With unionfs you can put a sparse R/W file system on top and have the modified files reside on some private R/W disk. Because that R/W disk still is a real file system, you could sort of run a file level backup of that disk outside the unionfs. For a stable backup you would have the issues we discussed in this thread though. But you could even put a temporary R/W layer on top and divert all writes to that layer, backup the (now frozen) first R/W layer, and then merge any updates during the backup back into the first R/W layer. This is neat because it's file level, but there may be a performance issue when files need to be copied up. Rob -- Rob van der Heij Velocity Software, Inc http://velocitysoftware.com/ -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390 AFAIK unionfs copies the entire file when a change is made, unlike other snapshot methods which only record the changes at a more granular level. Thus as Rob mentions it is *very* useful for ascii text file changes (/etc or source code etc.), but not for your DB files! Mark -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Fw: [LINUX-390] Bad Linux backups
On 7/25/06, John Campbell <[EMAIL PROTECTED]> wrote: In all of this, isn't the UNIONFS still a live deal? If as many client systems as possible use a set of backing F/Ss that are Read Only, wouldn't Yes, it's mostly working. I have done quite a lot with it on s390. You probably don't want to use it for all your data (for performance reasons) but just for parts of the file system that are mostly unmodified (like /etc). I would not use it for all data on the system though. With unionfs you can put a sparse R/W file system on top and have the modified files reside on some private R/W disk. Because that R/W disk still is a real file system, you could sort of run a file level backup of that disk outside the unionfs. For a stable backup you would have the issues we discussed in this thread though. But you could even put a temporary R/W layer on top and divert all writes to that layer, backup the (now frozen) first R/W layer, and then merge any updates during the backup back into the first R/W layer. This is neat because it's file level, but there may be a performance issue when files need to be copied up. Rob -- Rob van der Heij Velocity Software, Inc http://velocitysoftware.com/ -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
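[Editor's note] The layering Rob describes maps onto a mount configuration along these lines. This is a hypothetical fragment: the device name, mount points, and branch layout are invented, and the `dirs=` branch option is unionfs's commonly documented syntax, so treat it as a sketch rather than a recipe.

```shell
# Hypothetical unionfs layout: a small private R/W minidisk stacked over
# a shared read-only system image. Shown as an fstab-style outline only;
# names are illustrative, not from the thread.
#
#   /dev/dasdg1   /rw      ext3      defaults                 1 2   # private R/W branch
#   none          /union   unionfs   dirs=/rw=rw:/shared=ro   0 0   # union of the two
#
# A file-level backup then only needs to walk the small R/W branch:
#   tar czf /backup/rw-layer.tar.gz -C /rw .
```

As Mark notes below, any modified file is copied up into the R/W branch whole, so this pays off for small config files, not for databases.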
Fw: [LINUX-390] Bad Linux backups
In all of this, isn't the UNIONFS still a live deal? If as many client systems as possible use a set of backing F/Ss that are Read Only, wouldn't the local copy ONLY consist of changed files? And wouldn't the local copy (I'm not sure, UNIONFS _does_ handle having a R/W copy on a hard disk, right? Or am I talking through a sphincter again?) be in a form that could be copied off VERY quickly because it'd be pretty darn small? It seems, at least w/ z/VM, that this kind of trick would be almost *made* for a virtualized environment. I realize, though, that this discussion has been about LPAR'd environments so I'm not sure what would be different. Note that my experiences are currently limited to Intel and pSeries (PowerPC) based systems... John R. Campbell, Speaker to Machines (GNUrd) (813) 356-5322 (t/l 697) Adsumo ergo raptus sum Why MacOS X? Well, it's proof that making Unix user-friendly was much easier than debugging Windows. Red Hat Certified Engineer (#803004680310286, RHEL3) - Forwarded by John Campbell/Tampa/IBM on 07/25/06 01:20 PM - From: Adam Thornton <[EMAIL PROTECTED]> Sent by: Linux on 390 Port To: LINUX-390@VM.MARIST.EDU Subject: Re: [LINUX-390] Bad Linux backups 07/24/06 04:38 PM Please respond to Linux on 390 Port On Jul 24, 2006, at 1:35 PM, David Boyes wrote: >> Such an approach does require discipline to properly register what >> you >> have modified and to assure the copy of that customized file is held >> somewhere. > > Tripwire is a handy tool for this. Run it every night and have it > generate a list of changes in diff format. You can then turn that diff > into input to patch and deploy it as a .deb or .rpm. Or, if you're feeling REALLY parsimonious, you can use Bacula in its "verify" mode to do the same thing. 
Adam -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
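[Editor's note] The nightly Tripwire run (and Bacula's "verify" mode) boil down to "record a baseline, then diff against it". A minimal sketch with standard tools follows; a throwaway temp tree stands in for a real /etc so it can run anywhere, and the file names are invented for illustration.

```shell
# Baseline-and-verify sketch in the spirit of tripwire / bacula verify mode.
set -e
TREE=$(mktemp -d)
mkdir -p "$TREE/etc"
echo "HOSTNAME=linux01" > "$TREE/etc/HOSTNAME"
echo "nameserver 10.0.0.1" > "$TREE/etc/resolv.conf"

# 1. Record the baseline once, right after install/clone.
( cd "$TREE" && find etc -type f | sort | xargs sha256sum ) > "$TREE/baseline.sums"

# 2. Time passes; someone customizes a file.
echo "HOSTNAME=linux02" > "$TREE/etc/HOSTNAME"

# 3. Nightly verify: files whose checksum changed show up as '>' lines
#    in the diff, ready to be collected into a patch / .rpm / .deb.
( cd "$TREE" && find etc -type f | sort | xargs sha256sum ) > "$TREE/current.sums"
CHANGED=$(diff "$TREE/baseline.sums" "$TREE/current.sums" | grep '^>' | awk '{print $3}')
echo "changed: $CHANGED"
```

Real Tripwire adds a signed database, ownership/permission checks, and tamper resistance; the diff-of-checksums core is the same.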
Re: Bad Linux backups
> Earlier in this thread there was mention of using clustering > services to avoid outages while doing backups. Wouldn't that involve > the same sort of data-in-flight issues? Not really, because the major thrust of clustering tools is to coordinate the services and workload between the cluster members. In that case, there is no exposure for the guest being shut down, because the work and inflight transactions have been moved elsewhere in a coordinated manner. Once the guest leaves the cluster, then the shutdown/logoff is trivial, and you can do what you like with dumping the disks for that guest from outside the guest. David Boyes Sine Nomine Associates -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
J Leslie Turriff wrote: > Earlier in this thread there was mention of using clustering > services to avoid outages while doing backups. Wouldn't that involve > the same sort of data-in-flight issues? If the data is shared among the nodes, like with nfs or a cluster filesystem, yes. with kind regards, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
With clustering, you shut down one image and do an OFFLINE backup while the application runs on the second image. Then bring up the primary image and shut down the secondary system for backup. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of J Leslie Turriff Sent: Tuesday, July 25, 2006 10:54 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups Earlier in this thread there was mention of using clustering services to avoid outages while doing backups. Wouldn't that involve the same sort of data-in-flight issues? J. Leslie Turriff VM Systems Programmer Central Missouri State University Room 400 Ward Edwards Building Warrensburg MO 64093 660-543-4285 660-580-0523 [EMAIL PROTECTED] -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Earlier in this thread there was mention of using clustering services to avoid outages while doing backups. Wouldn't that involve the same sort of data-in-flight issues? J. Leslie Turriff VM Systems Programmer Central Missouri State University Room 400 Ward Edwards Building Warrensburg MO 64093 660-543-4285 660-580-0523 [EMAIL PROTECTED] -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On Jul 25, 2006, at 8:49 AM, Carsten Otte wrote: James Melin wrote: Not even remotely close to what I was thinking Greater minds than mine any ideas? Oh not me. The oops seems to be issued in the filesystem code, probably reiserfs. Lea, could you run the oops message through ksymoops please? That would utterly fail to surprise me. ReiserFS and I don't get along. Ext3, on the other hand, has rarely bitten me without extreme provocation. Adam -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
James Melin wrote: > Not even remotely close to what I was thinking Greater minds than > mine any ideas? Oh not me. The oops seems to be issued in the filesystem code, probably reiserfs. Lea, could you run the oops message through ksymoops please? regards, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
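[Editor's note] For anyone following along, decoding an oops like the one in this thread with ksymoops on a 2.4-era kernel typically follows the outline below. The file names are the usual defaults, not taken from this thread, so it is shown as a commented sketch rather than an exact command.

```shell
# Commented outline: resolve the raw oops addresses into symbol names.
# System.map must match the kernel that produced the oops.
#
#   ksymoops -m /boot/System.map < oops.txt > oops.decoded
#
# The decoded output names the function(s) in the call trace, which is
# what identifies whether the fault is in (e.g.) reiserfs code.
```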
Re: Bad Linux backups
Fargusson.Alan wrote: > I agree. I think you should make your backups with the Linux system down. > You should test this to make sure that there is not some other operational > error causing problems. I think we got close to the bottom of the stack now: If one can take down the system for backup it is a good idea to do so because of the reasons discussed in this thread. Backing up a running system involves trust in the application and the file system. with kind regards, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Not even remotely close to what I was thinking. Greater minds than mine, any ideas? "Stahr, Lea" <[EMAIL PROTECTED]> Sent by: Linux on 390 Port To: LINUX-390@VM.MARIST.EDU 07/25/2006 09:31 AM Subject: Re: Bad Linux backups Please respond to Linux on 390 Port All systems are clones of an original, so all device assignments in Linux are the same across systems. Here is my ZIPL and FSTAB. I cannot generate the error messages as I had to re-clone the system and get it back online. I was receiving errors on different filesystems and I ran FSCKs on them, then received this abend in the kernel: I restored the Linux volumes from the backup tapes and it will not come up all the way. I switched tape sets with the same result. The kernel is abending.

Unable to handle kernel pointer dereference at virtual kernel address 0c80
Oops: 0010 CPU: 0 Not tainted
Process find (pid: 497, task: 09804000, ksp: 09805940)
Krnl PSW : 07081000 8d020b96
Krnl GPRS: 09709a80 fffd2348 0c7fff79 6003 0c5c7570 d140 09a3e779 09a34300 b814a409 00643c03 0d14 6004 09a414b0 8d020864 8d020a54 09805c98
Krnl ACRS: 0001
Krnl Code: d2 ff 40 00 20 00 41 40 41 00 41 20 21 00 a7 16 ff f9 a7 15
Call Trace: <1>Unable to handle kernel pointer dereference at virtual kernel address 18 00
Oops: 0010 CPU: 0 Not tainted
Process find (pid: 497, task: 09804000, ksp: 09805940)
Krnl PSW : 07082000 800765aa
Krnl GPRS: 00a83f80 0002 1800 00c0 **

[EMAIL PROTECTED]:~> cd /etc
[EMAIL PROTECTED]:/etc> cat zipl.conf
# Generated by YaST2
[defaultboot]
default=ipl
[ipl]
target=/boot/zipl
image=/boot/kernel/image
ramdisk=/boot/initrd
parameters="dasd=0201-020f,0300-030f root=/dev/dasda1"
[dumpdasd]
target=/boot/zipl
dumpto=/dev/dasd??
[dumptape]
target=/boot/zipl
dumpto=/dev/rtibm0

[EMAIL PROTECTED]:/etc> cat fstab
/dev/dasda1  /         reiserfs  defaults         1 1
/dev/dasda2  /tmp      ext2      defaults         1 2
/dev/dasdb1  /usr      reiserfs  defaults         1 2
/dev/dasdd1  /var      reiserfs  defaults         1 2
/dev/dasde1  /home     reiserfs  defaults         1 2
/dev/dasdf1  /user2    reiserfs  defaults         1 2
/dev/dasdc1  swap      swap      pri=42           0 0
/dev/dasdd2  /opt/IBM  reiserfs  defaults         0 2
devpts       /dev/pts  devpts    mode=0620,gid=5  0 0
proc         /proc     proc      defaults         0 0
[EMAIL PROTECTED]:/etc>

Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of James Melin Sent: Tuesday, July 25, 2006 8:43 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups I think I might have an idea as to where your problem MAY be - I'm guessing at this point so I'd like you to fill in some blanks. Are your Linux guests device-number consistent across Linuxen? As in, /dev/dasdb1 is always device 601 and /{root file system} is 600, etc? Or is it on Linux A it's 600,601,602 etc and Linux B it's 400,401,402 etc? What does your zipl.conf look like? Also can you post any messages you get when you try to boot them? "Stahr, Lea" <[EMAIL PROTECTED]> Sent by: Linux on 390 Port To: LINUX-390@VM.MARIST.EDU 07/25/2006 07:17 AM Subject: Re: Bad Linux backups Please respond to Linux on 390 Port FDR says working as designed. They back up the entire volume and restore the entire volume. I have restored 3 systems and they DO NOT BOOT. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port
Re: Bad Linux backups
David Boyes wrote: >> Therefore, dm-snapshot and >> flashcopy are two sides of the same medal once the entire filesystem >> is on a single dasd. > > That's a pretty large assumption, especially since the recommended > wisdom for most "advanced applications" -- like DB/2 and WAS -- is *not* > to put things like data and logs on the same filesystem for performance > reasons. Yup, I know that "everything on a single dasd" is a strong limitation. But since flashcopy doesn't allow snapshotting multiple volumes at a time, it is the only way I know of to get a snapshot of all the data involved from outside the system. >> The point is, that data is considered stable at any time. That's a >> basic assumption which is true for ext3 and most applications. If you >> run a file system or an application that does have inconsistent data >> from time to time, you are in trouble in case of a power outage or >> system crash. I hope this is not the case in any production > environment. > > With respect, I think this is an unrealistic expectation. I don't > control the application programmers at IBM or S/AP or Oracle, etc. If > you want to preach on proper application design to those folks, I'll > happily supply amens from the pews, but out here in the real world, it > ain't so, and it ain't gonna be so for a good long while (or at least > until the current crop of programmers re-discover all the development > validation model work that we did back in the 70s at PARC). It depends on the type of application. For a fileserver or static webserver, for example, this requirement is fulfilled. For more complex servers, it can get nasty. > With *today's* applications, you need a guaranteed valid state both from > the application *and* filesystem standpoint, and to get that, you need > to coordinate backups from both inside and outside the guest if you want > to use facilities outside the guest to dump the data. 
How you do that > coordination is what I think you're trying to argue and there, your > points are extremely valid and useful; my point still stands that > without coordination between Linux and whatever else you're using, > you're not going to get the desired result, which is a no-exceptions way > to handle backup and restore of critical data in the most efficient > manner available. Some people seem to trust today's applications more, for example the developers of dm-snapshot and the users of per-file backup solutions like TSM, which usually also run while the application is active. with kind regards, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
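[Editor's note] For reference, the dm-snapshot approach Carsten mentions usually looks like the LVM sequence below. It needs root and real volumes, so it is shown as a commented outline with hypothetical volume and mount names rather than a runnable script.

```shell
# Commented outline of a dm-snapshot (LVM) backup: the snapshot freezes a
# point-in-time view while the application keeps running on the origin.
# Volume names are hypothetical; requires root on a real LVM setup.
#
#   lvcreate --size 512M --snapshot --name backsnap /dev/vg0/datalv
#   mount -o ro /dev/vg0/backsnap /mnt/snap     # stable view; origin stays live
#   tar czf /backup/data.tar.gz -C /mnt/snap .  # file-level backup of the view
#   umount /mnt/snap
#   lvremove -f /dev/vg0/backsnap
#
# Whether the captured state is *usable* still depends on the application
# having consistent on-disk data at the snapshot instant -- exactly the
# trust issue discussed in this thread.
```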
Re: Bad Linux backups
I agree. I think you should make your backups with the Linux system down. You should test this to make sure that there is not some other operational error causing problems. -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] Behalf Of Stahr, Lea Sent: Tuesday, July 25, 2006 8:28 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups Gentlemen, I must agree with the validity of external backups, but only when the Linux is down. Any backup taken internally OR externally while the system is running may not work due to extensive caching by the system itself and by the applications. If I cannot restore my application to a current state, then it's broken. And these were all either EXT3 or REISERFS. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of David Boyes Sent: Tuesday, July 25, 2006 10:11 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups > Therefore, dm-snapshot and > flashcopy are two sides of the same medal once the entire filesystem > is on a single dasd. That's a pretty large assumption, especially since the recommended wisdom for most "advanced applications" -- like DB/2 and WAS -- is *not* to put things like data and logs on the same filesystem for performance reasons. > > Given how quickly this can change in > > most real production systems, I don't have time or spare cycles to try > > to second-guess this, or make excuses when I miss backing up something > > important because someone didn't tell me that a change in data location > > was made. > The point is, that data is considered stable at any time. That's a > basic assumption which is true for ext3 and most applications. If you > run a file system or an application that does have inconsistent data > from time to time, you are in trouble in case of a power outage or > system crash. I hope this is not the case in any production environment. 
With respect, I think this is an unrealistic expectation. I don't control the application programmers at IBM or S/AP or Oracle, etc. If you want to preach on proper application design to those folks, I'll happily supply amens from the pews, but out here in the real world, it ain't so, and it ain't gonna be so for a good long while (or at least until the current crop of programmers re-discover all the development validation model work that we did back in the 70s at PARC). We're faced with dealing with the world as it is, not as we'd like it to be, and that reality contradicts your assertion. The filesystem contents may be technically consistent, but if the applications disagree for *any* reason, then that doesn't help us at all given what we have to work with in the field. It's a goal to build toward, but for now, it's just that: a goal. With *today's* applications, you need a guaranteed valid state both from the application *and* filesystem standpoint, and to get that, you need to coordinate backups from both inside and outside the guest if you want to use facilities outside the guest to dump the data. How you do that coordination is what I think you're trying to argue and there, your points are extremely valid and useful; my point still stands that without coordination between Linux and whatever else you're using, you're not going to get the desired result, which is a no-exceptions way to handle backup and restore of critical data in the most efficient manner available. 
-- db -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Gentlemen, I must agree with the validity of external backups, but only when the Linux is down. Any backup taken internally OR externally while the system is running may not work due to extensive caching by the system itself and by the applications. If I cannot restore my application to a current state, then it's broken. And these were all either EXT3 or REISERFS. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of David Boyes Sent: Tuesday, July 25, 2006 10:11 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups > Therefore, dm-snapshot and > flashcopy are two sides of the same medal once the entire filesystem > is on a single dasd. That's a pretty large assumption, especially since the recommended wisdom for most "advanced applications" -- like DB/2 and WAS -- is *not* to put things like data and logs on the same filesystem for performance reasons. > > Given how quickly this can change in > > most real production systems, I don't have time or spare cycles to try > > to second-guess this, or make excuses when I miss backing up something > > important because someone didn't tell me that a change in data location > > was made. > The point is, that data is considered stable at any time. That's a > basic assumption which is true for ext3 and most applications. If you > run a file system or an application that does have inconsistent data > from time to time, you are in trouble in case of a power outage or > system crash. I hope this is not the case in any production environment. With respect, I think this is an unrealistic expectation. I don't control the application programmers at IBM or S/AP or Oracle, etc. 
If you want to preach on proper application design to those folks, I'll happily supply amens from the pews, but out here in the real world, it ain't so, and it ain't gonna be so for a good long while (or at least until the current crop of programmers re-discover all the development validation model work that we did back in the 70s at PARC). We're faced with dealing with the world as it is, not as we'd like it to be, and that reality contradicts your assertion. The filesystem contents may be technically consistent, but if the applications disagree for *any* reason, then that doesn't help us at all given what we have to work with in the field. It's a goal to build toward, but for now, it's just that: a goal. With *today's* applications, you need a guaranteed valid state both from the application *and* filesystem standpoint, and to get that, you need to coordinate backups from both inside and outside the guest if you want to use facilities outside the guest to dump the data. How you do that coordination is what I think you're trying to argue and there, your points are extremely valid and useful; my point still stands that without coordination between Linux and whatever else you're using, you're not going to get the desired result, which is a no-exceptions way to handle backup and restore of critical data in the most efficient manner available. -- db -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
> Therefore, dm-snapshot and > flashcopy are two sides of the same medal once the entire filesystem > is on a single dasd. That's a pretty large assumption, especially since the recommended wisdom for most "advanced applications" -- like DB/2 and WAS -- is *not* to put things like data and logs on the same filesystem for performance reasons. > > Given how quickly this can change in > > most real production systems, I don't have time or spare cycles to try > > to second-guess this, or make excuses when I miss backing up something > > important because someone didn't tell me that a change in data location > > was made. > The point is, that data is considered stable at any time. That's a > basic assumption which is true for ext3 and most applications. If you > run a file system or an application that does have inconsistent data > from time to time, you are in trouble in case of a power outage or > system crash. I hope this is not the case in any production environment. With respect, I think this is an unrealistic expectation. I don't control the application programmers at IBM or S/AP or Oracle, etc. If you want to preach on proper application design to those folks, I'll happily supply amens from the pews, but out here in the real world, it ain't so, and it ain't gonna be so for a good long while (or at least until the current crop of programmers re-discover all the development validation model work that we did back in the 70s at PARC). We're faced with dealing with the world as it is, not as we'd like it to be, and that reality contradicts your assertion. The filesystem contents may be technically consistent, but if the applications disagree for *any* reason, then that doesn't help us at all given what we have to work with in the field. It's a goal to build toward, but for now, it's just that: a goal. 
With *today's* applications, you need a guaranteed valid state both from the application *and* filesystem standpoint, and to get that, you need to coordinate backups from both inside and outside the guest if you want to use facilities outside the guest to dump the data. How you do that coordination is what I think you're trying to argue and there, your points are extremely valid and useful; my point still stands that without coordination between Linux and whatever else you're using, you're not going to get the desired result, which is a no-exceptions way to handle backup and restore of critical data in the most efficient manner available. -- db -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Here is my FDR restore job step for a volume:

//DISK5    DD UNIT=SYSALLDA,VOL=SER=VML061,DISP=OLD
//TAPE5    DD DSN=DRP.OPR.SOV.M3DMP.VML061(-3),
//            SUBSYS=SOV,DISP=SHR
//SYSPRIN5 DD SYSOUT=*

Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Jeremy Warren Sent: Tuesday, July 25, 2006 8:50 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups Did you try the FDR from cyl / to cyl options on the backups? This sounds eerily familiar to our label issue. "Stahr, Lea" <[EMAIL PROTECTED]> Sent by: Linux on 390 Port 07/25/2006 08:17 AM Please respond to Linux on 390 Port To: LINUX-390@VM.MARIST.EDU Subject: Re: [LINUX-390] Bad Linux backups FDR says working as designed. They back up the entire volume and restore the entire volume. I have restored 3 systems and they DO NOT BOOT. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John Summerfield Sent: Monday, July 24, 2006 6:03 PM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups Stahr, Lea wrote: > I run SuSE SLES 8 under ZVM 5.1 in an IFL. The DASD are in a SAN that is > also accessed by ZOS. Backups are taken by ZOS using FDR full volume > copies on Saturday morning (low usage). When I restore a backup, it will > not boot. The backup and the restore have the same byte counts. Linux > support at MainLine Systems tells me that he has seen this before at > other customers. What is everyone using for Linux under ZVM backups? > HELP! My backups are no good! What do the FDR suppliers say? 
-- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
All systems are clones of an original, so all device assignments in Linux are the same across systems. Here is my ZIPL and FSTAB. I cannot generate the error messages as I had to re-clone the system and get it back online. I was receiving errors on different filesystems and I ran FSCKs on them, then received this abend in the kernel: I restored the Linux volumes from the backup tapes and it will not come up all the way. I switched tape sets with the same result. The kernel is abending.

Unable to handle kernel pointer dereference at virtual kernel address 0c80
Oops: 0010 CPU: 0 Not tainted
Process find (pid: 497, task: 09804000, ksp: 09805940)
Krnl PSW : 07081000 8d020b96
Krnl GPRS: 09709a80 fffd2348 0c7fff79 6003 0c5c7570 d140 09a3e779 09a34300 b814a409 00643c03 0d14 6004 09a414b0 8d020864 8d020a54 09805c98
Krnl ACRS: 0001
Krnl Code: d2 ff 40 00 20 00 41 40 41 00 41 20 21 00 a7 16 ff f9 a7 15
Call Trace: <1>Unable to handle kernel pointer dereference at virtual kernel address 18 00
Oops: 0010 CPU: 0 Not tainted
Process find (pid: 497, task: 09804000, ksp: 09805940)
Krnl PSW : 07082000 800765aa
Krnl GPRS: 00a83f80 0002 1800 00c0 **

[EMAIL PROTECTED]:~> cd /etc
[EMAIL PROTECTED]:/etc> cat zipl.conf
# Generated by YaST2
[defaultboot]
default=ipl
[ipl]
target=/boot/zipl
image=/boot/kernel/image
ramdisk=/boot/initrd
parameters="dasd=0201-020f,0300-030f root=/dev/dasda1"
[dumpdasd]
target=/boot/zipl
dumpto=/dev/dasd??
[dumptape]
target=/boot/zipl
dumpto=/dev/rtibm0

[EMAIL PROTECTED]:/etc> cat fstab
/dev/dasda1  /         reiserfs  defaults         1 1
/dev/dasda2  /tmp      ext2      defaults         1 2
/dev/dasdb1  /usr      reiserfs  defaults         1 2
/dev/dasdd1  /var      reiserfs  defaults         1 2
/dev/dasde1  /home     reiserfs  defaults         1 2
/dev/dasdf1  /user2    reiserfs  defaults         1 2
/dev/dasdc1  swap      swap      pri=42           0 0
/dev/dasdd2  /opt/IBM  reiserfs  defaults         0 2
devpts       /dev/pts  devpts    mode=0620,gid=5  0 0
proc         /proc     proc      defaults         0 0
[EMAIL PROTECTED]:/etc>

Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of James Melin Sent: Tuesday, July 25, 2006 8:43 AM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups I think I might have an idea as to where your problem MAY be - I'm guessing at this point so I'd like you to fill in some blanks. Are your Linux guests device-number consistent across Linuxen? As in, /dev/dasdb1 is always device 601 and /{root file system} is 600, etc? Or is it on Linux A it's 600,601,602 etc and Linux B it's 400,401,402 etc? What does your zipl.conf look like? Also can you post any messages you get when you try to boot them? "Stahr, Lea" <[EMAIL PROTECTED]> Sent by: Linux on 390 Port To: LINUX-390@VM.MARIST.EDU 07/25/2006 07:17 AM Subject: Re: Bad Linux backups Please respond to Linux on 390 Port FDR says working as designed. They back up the entire volume and restore the entire volume. I have restored 3 systems and they DO NOT BOOT. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John Summerfield Sent: Monday, July 24, 2006 6:03 PM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups Stahr, Lea wrote: > I run SuSE SLES 8 under ZVM 5.1 in an IFL. The DASD are in a SAN that is > also accessed by ZOS. Backups are taken by ZOS using FDR full volume > copies on Saturday morning (low usage). When I restore a backup, it will > not boot. The backup and the restore have the same byte counts. Linux > support at MainLine Systems tells me that he has seen this before at > other customers. What is everyone using for Linux under ZVM backups? > HELP! My backups are no good! What do the FDR suppliers say? -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list --
Re: Bad Linux backups
What happens when you try to boot a restored system? Jon FDR says working as designed. They back up the entire volume and restore the entire volume. I have restored 3 systems and they DO NOT BOOT. -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Did you try the FDR from-cyl / to-cyl options on the backups? This sounds eerily familiar to our label issue. "Stahr, Lea" <[EMAIL PROTECTED]> Sent by: Linux on 390 Port 07/25/2006 08:17 AM Please respond to Linux on 390 Port To LINUX-390@VM.MARIST.EDU cc Subject Re: [LINUX-390] Bad Linux backups FDR says working as designed. They back up the entire volume and restore the entire volume. I have restored 3 systems and they DO NOT BOOT. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John Summerfield Sent: Monday, July 24, 2006 6:03 PM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups Stahr, Lea wrote: > I run SuSE SLES 8 under ZVM 5.1 in an IFL. The DASD are in a SAN that is > also accessed by ZOS. Backups are taken by ZOS using FDR full volume > copies on Saturday morning (low usage). When I restore a backup, it will > not boot. The backup and the restore have the same byte counts. Linux > support at MainLine Systems tells me that he has seen this before at > other customers. What is everyone using for Linux under ZVM backups? > HELP! My backups are no good! What do the FDR suppliers say? -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
I think I might have an idea as to where your problem MAY be - I'm guessing at this point, so I'd like you to fill in some blanks. Are your Linux guests' device numbers consistent across Linuxen? As in, /dev/dasdb1 is always device 601 and /{root file system} is 600, etc.? Or is it on Linux A it's 600,601,602 etc. and on Linux B it's 400,401,402 etc.? What does your zipl.conf look like? Also, can you post any messages you get when you try to boot them? "Stahr, Lea" <[EMAIL PROTECTED]> Sent by: Linux on 390 Port To LINUX-390@VM.MARIST.EDU cc 07/25/2006 07:17 AM Subject Re: Bad Linux backups Please respond to Linux on 390 Port FDR says working as designed. They back up the entire volume and restore the entire volume. I have restored 3 systems and they DO NOT BOOT. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John Summerfield Sent: Monday, July 24, 2006 6:03 PM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups Stahr, Lea wrote: > I run SuSE SLES 8 under ZVM 5.1 in an IFL. The DASD are in a SAN that is > also accessed by ZOS. Backups are taken by ZOS using FDR full volume > copies on Saturday morning (low usage). When I restore a backup, it will > not boot. The backup and the restore have the same byte counts. Linux > support at MainLine Systems tells me that he has seen this before at > other customers. What is everyone using for Linux under ZVM backups? > HELP! My backups are no good! What do the FDR suppliers say? 
-- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
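For what it's worth, one quick way to answer James's device-consistency question is to reduce /proc/dasd/devices to its device-number-to-node mapping and diff the result between guests. A minimal sketch; the sample text below is hypothetical, and the exact column layout of /proc/dasd/devices varies with the dasd driver version, so the awk field numbers may need adjusting:

```shell
# Hypothetical /proc/dasd/devices output; on a real guest you would read
# the file itself:  awk '{ print $1, $7 }' /proc/dasd/devices
sample='0600(ECKD) at ( 94: 0) is dasda : active at blocksize: 4096, 600840 blocks
0601(ECKD) at ( 94: 4) is dasdb : active at blocksize: 4096, 600840 blocks'

# Keep only "device-number node" pairs; diff this output across guests.
printf '%s\n' "$sample" | awk '{ print $1, $7 }'
```

If the two guests print different pairings (600 is dasda on one and dasdb on the other), a restored volume can end up mounted in the wrong place, which is exactly the failure mode James is probing for.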
Re: Bad Linux backups
Stahr, Lea wrote: > FDR says working as designed. They back up the entire volume and restore > the entire volume. I have restored 3 systems and they DO NOT BOOT. How does FDR copy the volume? Do they sequentially copy track-by-track or use flashcopy? cheers, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
FDR says working as designed. They back up the entire volume and restore the entire volume. I have restored 3 systems and they DO NOT BOOT. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of John Summerfield Sent: Monday, July 24, 2006 6:03 PM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups Stahr, Lea wrote: > I run SuSE SLES 8 under ZVM 5.1 in an IFL. The DASD are in a SAN that is > also accessed by ZOS. Backups are taken by ZOS using FDR full volume > copies on Saturday morning (low usage). When I restore a backup, it will > not boot. The backup and the restore have the same byte counts. Linux > support at MainLine Systems tells me that he has seen this before at > other customers. What is everyone using for Linux under ZVM backups? > HELP! My backups are no good! What do the FDR suppliers say? -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
- Start Original Message - Sent: Tue, 25 Jul 2006 11:48:09 +0200 From: Ingo Adlung <[EMAIL PROTECTED]> To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups > Linux on 390 Port wrote on 25.07.2006 11:23:02: > > > Alan Altmark wrote: > > > But, Carsten, the application may start up just fine, however it may be > > > using old data. I have an application running on my workstation right now > > > that saves its configuration data only when you shut it down (working as > > > designed according to the vendor). Since the application is only > > > terminated when the system is shut down, a live backup of the disk would > > > have no effect. I mean, it would restore and run just fine, but be > > > running with old data. > > Well, backups are always about using old data in case of recovery as > > far as I can see. Using an application that saves important data only > > on shutdown in a mission critical environment is very dangerous > > regardless of the backup solution. > > > > Well, I guess the question is whether you relaunch from a deterministic > starting point, or whether your starting point is arbitrary. You are > arguing > along the line that one shouldn't be afraid about discretionary starting > points, as anecdotal knowledge suggests that it will usually work anyhow. > Alan is arguing that customers would typically not want to bet on > arbitrariness and we shouldn't paper over the risks of doing so but clearly > articulate > them. Either you have application support/awareness for live backups, or the > result by definition *is* arbitrary - unless you can guarantee a well > defined > transactional state (as viewed by an application) which we currently lack > file > system or more generally operating system support for. > Your articulation of the English language is quite exquisite :-) > > > I can jump up and down and stamp my feet, claiming that the application > is > > > broken, but that doesn't make it so. > > I fully support Alan's view. 
Whilst the application may not be "broken", it is most definitely unsuitable for use within an Enterprise. The robustness of an application, and its ability to recover from unexpected system errors (power outages etc.) without *any* data loss, is of paramount importance. I believe that several DB systems offer direct/raw I/O to avoid Linux cache problems, and that journaling filesystems, although by default they journal only meta-data, offer mount options to journal data too. This of course comes at a performance price, though Hans Reiser did claim that the new Reiser4 FS will journal data without the previous performance penalties. Mark -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
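For ext3, the mount option Mark alludes to is data=journal: the default data=ordered journals only metadata but forces data blocks to disk before the corresponding metadata commits, while data=journal pushes the data through the journal as well, at the performance cost he mentions. A hypothetical fstab line (device and mount point are examples, not from Lea's configuration):

```
/dev/dasdb1  /usr  ext3  defaults,data=journal  1 2
```

The same effect can be had at mount time with "mount -o data=journal /dev/dasdb1 /usr". Note these are ext3 option names; reiserfs grew similar data=ordered/data=journal options only in later kernels.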
Re: Bad Linux backups
Ingo Adlung wrote: > Whether the file system is consistent in itself after a restart is > irrelevant from an application perspective if the application has e.g. > state that is independent from the file system content. You can only > capture that by application collaboration or by forcing that state to > be hardened on persistent storage, hence shutting down the application > prior to backup/archive. True, but this is a general restriction of live backups (e.g. file level backup). Not specific to full volume snapshot backup. cheers, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
David Boyes wrote: > As others have described, just dumping the data actually physically on > the disk doesn't always provide a consistent backup that is useful for > restoring the system. You *must* coordinate what happens inside the > guest with something outside the guest -- regardless of what that > outside system is -- to get something usable. I am well aware that there are other opinions. I am just trying to explain why those are wrong. > LVM snapshot is neat and cool, but it doesn't help in this case if the > outside system doesn't know it's happening or what data can be > considered "stable" to back up. LVM also doesn't know about filesystem internals; it just grabs a copy of the entire volume at a given point in time. Like flashcopy. Also, in the Linux layering, LVM sits "behind" the page cache (on the side where the physical disk is), just like the flashcopy mechanism. Therefore, dm-snapshot and flashcopy are two sides of the same coin once the entire filesystem is on a single DASD. > Given how quickly this can change in > most real production systems, I don't have time or spare cycles to try > to second-guess this, or make excuses when I miss backing up something > important because someone didn't tell me that a change in data location > was made. The point is that data is considered stable at any time. That's a basic assumption which is true for ext3 and most applications. If you run a file system or an application that does have inconsistent data from time to time, you are in trouble in case of a power outage or system crash. I hope this is not the case in any production environment. >> z/OS does not need to know what Linux is doing when the setup ensures >> consistent on-disk data at all times. > Clearly, from the live example that started this discussion, this is not > the case. I agree, it doesn't work in the current live example with the current setup. But the list is suggesting to "never do flashcopy backups from outside a running linux guest". 
This suggestion is wrong. > You are talking about something that happens INSIDE the Linux guest, > coordinating things on the Linux side to produce a copy of the data that > z/OS can dump in a consistent manner. Given that, then z/OS darn well > *better* know that the Linux system has done this, or the data you are > backing up is demonstrably crap. It can be demonstrably crap, or a reliably usable backup. If done properly, one can be sure to get the latter. regards, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
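For readers who have not used dm-snapshot, the in-Linux equivalent of an outboard flashcopy looks roughly like this. A sketch only, assuming a hypothetical volume group vg0 with logical volume usrlv; the names, sizes, and backup path are made up, and the commands need root on a guest with LVM configured:

```shell
# Create a copy-on-write snapshot: an atomic point-in-time view of the volume
lvcreate --snapshot --size 512M --name usrsnap /dev/vg0/usrlv

# Back up the frozen view while the original stays mounted and in use
dd if=/dev/vg0/usrsnap bs=64k | gzip > /backup/usr.img.gz

# Drop the snapshot once the backup has been taken
lvremove -f /dev/vg0/usrsnap
```

As with flashcopy, this captures whatever was on disk at the instant of the snapshot; Carsten's journal-replay argument is what makes such a crash-consistent image usable after a restore.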
Re: Bad Linux backups
Linux on 390 Port wrote on 25.07.2006 11:23:02: > Alan Altmark wrote: > > But, Carsten, the application may start up just fine, however it may be > > using old data. I have an application running on my workstation right now > > that saves its configuration data only when you shut it down (working as > > designed according to the vendor). Since the application is only > > terminated when the system is shut down, a live backup of the disk would > > have no effect. I mean, it would restore and run just fine, but be > > running with old data. > Well, backups are always about using old data in case of recovery as > far as I can see. Using an application that saves important data only > on shutdown in a mission critical environment is very dangerous > regardless of the backup solution. > Well, I guess the question is whether you relaunch from a deterministic starting point, or whether your starting point is arbitrary. You are arguing along the line that one shouldn't be afraid about discretionary starting points, as anecdotal knowledge suggests that it will usually work anyhow. Alan is arguing that customers would typically not want to bet on arbitrariness and we shouldn't paper over the risks of doing so but clearly articulate them. Either you have application support/awareness for live backups, or the result by definition *is* arbitrary - unless you can guarantee a well defined transactional state (as viewed by an application) which we currently lack file system or more generally operating system support for. > > I can jump up and down and stamp my feet, claiming that the application is > > broken, but that doesn't make it so. I fully support Alan's view. > What application (Server Application on Linux) acts like you claim? 
> > regards, > Carsten > > -- > For LINUX-390 subscribe / signoff / archive access instructions, > send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit > http://www.marist.edu/htbin/wlvindex?LINUX-390 Ingo -- Ingo Adlung, STSM, System z Linux and Virtualization Architecture mail: [EMAIL PROTECTED] - phone: +49-7031-16-4263 -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Linux on 390 Port wrote on 25.07.2006 11:10:04: > > Carsten Otte wrote: > >> Wrong. Due to caching, as correctly described by David Boyes, the > >> system may change on-disk content even when the application is not > >> running. Example: the syslogd generates a "mark" every 20 minutes. > > John Summerfield wrote: > > syslogd's mark message has nothing to do with caching. > > > > According to its man page, "sync forces changed blocks to disk, updates > > the super block." > > > > If you don't believe (or trust) that, then "mount -o remount" is your > > friend. > You missed my point: From the file system perspective, a snapshot of > an ext3 is _always_ consistent. No need to do remount, sync, shutdown > of application or shutdown of the entire system. > Whether the file system is consistent in itself after a restart is irrelevant from an application perspective if the application has e.g. state that is independent from the file system content. You can only capture that by application collaboration or by forcing that state to be hardened on persistent storage, hence shutting down the application prior to backup/archive. > regards, > Carsten > > -- > For LINUX-390 subscribe / signoff / archive access instructions, > send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit > http://www.marist.edu/htbin/wlvindex?LINUX-390 Ingo -- Ingo Adlung, STSM, System z Linux and Virtualization Architecture mail: [EMAIL PROTECTED] - phone: +49-7031-16-4263 -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Alan Altmark wrote: > On Monday, 07/24/2006 at 06:35 ZE2, Carsten Otte <[EMAIL PROTECTED]> wrote: >>> But rather than focus on that "edge" condition, we are all, I think, > in >>> violent agreement that you cannot take a volume-by-volume physical > backup >>> from outside a running Linux system and expect to have a usable > backup. >> That is a wrong assumption, I clearly disagree with it. If planned >> properly, and I agree that there are lots of things one can do wrong >> when planning the setup, physical backup of mounted and actively used >> volumes _is_ reliable. > > But you are making assumptions about the applications, something I am not > willing to do quite yet. If a database update requires a change to the > data file, the index file, and the log file, how do you (from the outside) > know that all changes have been made and that it is safe to copy them? And > that another transaction has not started? As for the first part of the question: doing "sync" after the update ensures that everything relevant has been flushed out to the disk proper. If another transaction has been started, fine. I expect the database to be capable of rolling back the transaction after restore. That brings things to the same situation as if I was doing the backup before the transaction. > From my days as a database application developer, the transaction > journal was meant to be replayed against a copy of the database as it > existed when the database was started, not replayed against a more > current snapshot. I.e. today's log is replayed against last night's > backup. And the transaction log is specifically NOT placed on the same > device as the data itself. In Linux terms, I guess that means don't place > it in the same filesystem since that's the smallest consistent unit of > data, right? If you lose the data device, you haven't lost a whole day's > worth of transactions. (Maybe database technology no longer requires such > precautions?) 
When using snapshots for backup purposes, you would obviously need a snapshot of both journal and data at the same time. Therefore you either need to use dm-snapshot if you have data and log on different devices, or you need to put both on the same device if you want to use flashcopy. > So I'll admit that I'm obviously not "getting it". If you would summarize > the steps needed to allow a reliable, usable, uncoordinated live backup of > Linux volumes, I for one would sincerely appreciate it. How do you > integrate them into your server? How do you automate the process? Right > now I'm a fan of SIGNAL SHUTDOWN, FLASHCOPY, XAUTOLOG, but that's just > me... Please don't get upset, I am doing my best to explain the situation. You need:
- the capability of getting a consistent snapshot of all data relevant to a) the file system _and_ b) the application. If the file system or the data set relevant to the application spans multiple volumes, you need the capability to snapshot all volumes at the very same time. The easy way to fulfill this requirement is to use just a single file system - which can span multiple physical disks in the case of dm-snapshot.
- an application that has consistent on-disk data at all times (which is a basic requirement for any server application)
- a file system that has consistent on-disk data at all times (such as ext3)

Now you can:
- take a snapshot backup at any time while the server is doing regular disk I/O
- pull the plug (crash)
- copy the data back to the original disk
- start the snapshot copy of the server and let the file system replay its journal, then start the application again

cheers, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
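Carsten's recipe reduces to a very short sketch. The device numbers below are hypothetical, and the CP command is issued from an authorized z/VM user, not from inside the guest:

```shell
# Inside the guest: push dirty pages out so the on-disk state is current
sync && echo "disk state hardened"

# From z/VM, take the point-in-time copy of the source minidisk, e.g.:
#   CP FLASHCOPY 0201 0 END TO 0301 0 END
# (the guest keeps running throughout)

# After restoring the snapshot, the first mount replays the ext3 journal
# and the application performs its normal crash recovery.
```

The sync is optional for correctness under Carsten's argument (the snapshot is crash-consistent either way); it merely narrows how much recent data the restored image is missing.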
Re: Bad Linux backups
Alan Altmark wrote: > But, Carsten, the application may start up just fine, however it may be > using old data. I have an application running on my workstation right now > that saves its configuration data only when you shut it down (working as > designed according to the vendor). Since the application is only > terminated when the system is shut down, a live backup of the disk would > have no effect. I mean, it would restore and run just fine, but be > running with old data. Well, backups are always about using old data in case of recovery as far as I can see. Using an application that saves important data only on shutdown in a mission critical environment is very dangerous regardless of the backup solution. > I can jump up and down and stamp my feet, claiming that the application is > broken, but that doesn't make it so. What application (Server Application on Linux) acts like you claim? regards, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
> Carsten Otte wrote: >> Wrong. Due to caching, as correctly described by David Boyes, the >> system may change on-disk content even when the application is not >> running. Example: the syslogd generates a "mark" every 20 minutes. John Summerfield wrote: > syslogd's mark message has nothing to do with caching. > > According to its man page, "sync forces changed blocks to disk, updates > the super block." > > If you don't believe (or trust) that, then "mount -o remount" is your > friend. You missed my point: From the file system perspective, a snapshot of an ext3 is _always_ consistent. No need to do remount, sync, shutdown of application or shutdown of the entire system. regards, Carsten -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Dominic Coulombe wrote: I'm sorry, but I don't get your point. "I don't mind losing data on the system filesystems as we are only interested in the database stuff." On 24-Jul-2006, at 19:19, John Summerfield wrote: so you don't care that it doesn't actually work! It might be that in your circumstances what you do is fine, because the stuff that does not work is not important to you. I'd need to consider the database stuff more carefully before agreeing that it really does work; there's too much I don't know. -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
I'm sorry, but I don't get your point. On 24-Jul-2006, at 19:19, John Summerfield wrote: so you don't care that it doesn't actually work! -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Dominic Coulombe wrote: On 7/24/06, David Boyes <[EMAIL PROTECTED]> wrote: One more time: Unless your Linux systems *are completely down* at the time of backup, full volume dumps from outside the Linux system are more than likely to be useless. Can you explain why is that ? To avoid the nitpickers, let's say that David means all filesystems must be flushed and ro. As I understand it, journalling (by default) logs metadata (directory info) but not data. If you create a file, that's journalled. If you extend a file, that's journalled. The data you write to the file are not. Let's say that you create a file, write 4K to it, close it. Let's say you do a backup of the volume externally while the 4K data remains unwritten. Note: read in "man 2 close" "A successful close does not guarantee that the data has been successfully saved to disk." So now you have journalled (or committed) metadata that says the file's got 4K of data in it. But, it hasn't. In the ordinary course of events, the data gets written to disk and all is well. The same sort of thing happens when a file's updated in place, as I expect databases commonly are. If a database product says its backup program works with active databases, I expect it does, but I'd never trust an external program, let alone an external system, to back up my database, unless the database is down. I have never experienced such a failure after doing live backups of journaled filesystems. Have you looked for a failure? I think it more likely you've had a failure that you didn't notice than that you didn't have a failure. I've never noticed a problem with losing data due to a power failure (except when it took the hardware with it!), but I'm not so foolish as to assume that I've had no file corruption. It is like brute forcing a shutdown by logging off the VM machine : not ideal, but not supposed to break your Linux machine. It is the reason to use journaled filesystems. Thanks. 
-- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390 -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list
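The close(2) caveat John quotes is the crux: a successful write+close only means the data reached the page cache. A small sketch of how a writer can force its data to stable storage before an external dump could see it; GNU dd and stat are assumed, and the file is a throwaway stand-in for John's 4K example:

```shell
tmp=$(mktemp)

# conv=fsync makes dd call fsync() on the output file before exiting, so
# the 4K really is on disk when the command returns. A plain write+close
# promises no such thing.
dd if=/dev/zero of="$tmp" bs=4096 count=1 conv=fsync 2>/dev/null

stat -c %s "$tmp"    # 4096
rm -f "$tmp"
```

An application that wants its on-disk state to be backup-safe at all times has to do the equivalent fsync() at every point it cares about; that is exactly the "application support for live backups" the rest of the thread argues over.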
Re: Bad Linux backups
Depending on the model of the library, there might be a Linux version of the STK control software. All you need it to do is let you tell the library to put volume A in drive B, and make drive B available to the VM system. Bacula doesn't need more brains than that. If you can't get the z/OS side to let go of the drive, then for now you'd have to use the NFS trick we documented elsewhere. Once the mainline Bacula code supports volume-level migration as a non-experimental feature, it'd probably be worth porting the Bacula storage daemon to USS and writing the tape interface routines to let it use z/OS-based tape. That'd be a killer use for a hipersocket...hmm. Anyone got a C++ compiler on their z/OS box and want to collaborate a bit? David Boyes Sine Nomine Associates > -Original Message- > From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Jon > Brock > Sent: Monday, July 24, 2006 4:46 PM > To: LINUX-390@VM.MARIST.EDU > Subject: Re: Bad Linux backups > > Our robotics are controlled by Storagetek's software on our z/OS system; > we do not have a VM version. As far as APIs go, I have no idea. > > Jon -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Carsten Otte wrote: But Dominic, it has nothing to do with a journaled file system. The fact that you stopped the application and sync'd the file system (equivalent to unmounting it) is what makes it work, not the file system implementation. Wrong. Due to caching, as correctly described by David Boyes, the system may change on-disk content even when the application is not running. Example: the syslogd generates a "mark" every 20 minutes. syslogd's mark message has nothing to do with caching. According to its man page, "sync forces changed blocks to disk, updates the super block." If you don't believe (or trust) that, then "mount -o remount" is your friend. -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
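To make John's "sync / mount -o remount" suggestion concrete: quiescing a filesystem before an outboard volume dump might look like the sketch below. The remount lines need root and a filesystem with no open files for writing, so they are shown commented; /usr is just an example mount point:

```shell
# Flush dirty buffers and superblocks to disk
sync && echo "buffers flushed"

# Freeze further writes for the duration of the external dump (root only):
#   mount -o remount,ro /usr
# ... take the FDR/DDR/flashcopy backup of the volume here ...
#   mount -o remount,rw /usr
```

With the filesystem read-only, even Carsten's objection about background writers (syslogd marks and the like) disappears for the duration of the dump.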
Re: Bad Linux backups
> It shouldn't be hard to create a tape (or something) that IPLs and > restores. Should it? Sure -- you *could* create such an animal. On the other hand, there's a perfectly good one shipped with VM -- the IPLable DDR utility. DDR also handles z/OS, VSE, Linux, TPF, and pretty much any other stream of bits you can put on a DASD, and once you get a one-pack VM system up on the bare metal, you can do as many parallel restore streams as you have tape drives, restoring anything and everything that goes on the disks, regardless of source or creator. The point of effective DR restore is to get back on the air as quickly as possible, preferably to have one really effective tool that works for all the data you have, and you don't have to confuse your operators with different instructions for different types of data -- it all works the same way. There's already enough chaos in DR; no need to introduce any more. People use ADRDSSU or FDR for the same reason -- the tools are capable of doing the image dump and restore regardless of content, and it's a question of what you're most familiar with or already have procedures to deal with. People buy 3rd party gadgets like FDR or CA's VM:Backup-Hydro because they're more efficient or easier to use than the IBM utilities, but the purpose is the same: get the bits back on the disks as fast as possible. Once you have the basic snapshot laid down from the image backup, you can do file-level restores quickly to bring a guest up to the most recent date. The IBM utilities (DDR and ADRDSSU) do the job for the image backup part; the file level part is the thing that isn't widely deployed yet. Question for the list: if I were to put together a Bacula appliance image similar to the SSLSERV Enabler, would people contribute a small amount ($500-1K) to get it, or consider buying support for it? It'd take a week or so to get it right, and I don't want to waste the time if nobody would want it. 
David Boyes Sine Nomine Associates -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
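For anyone who hasn't driven DDR, a dump job is controlled by a handful of statements along these lines. Sketched from memory, with hypothetical device numbers and volser; check the z/VM CP DDR documentation for the exact operand syntax before relying on it:

```
SYSPRINT CONS
INPUT 0201 3390 LNX001
OUTPUT 0181 TAPE
DUMP ALL
```

The restore at the DR site is the mirror image (INPUT from tape, OUTPUT to DASD, RESTORE ALL), which is why one tool and one set of operator instructions cover every volume regardless of what created it.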
Re: Bad Linux backups
Rob van der Heij wrote: On 7/24/06, Adam Thornton <[EMAIL PROTECTED]> wrote: It strikes me that file-level backups are generally a lot easier to work with, and use less archival media. File level backup is great for "oops backup" when you erased a few files and want them back. I am not sure whether you ever tried to restore the entire server from file level backups when you lost the disk. Typically you will need to re-install a new system and then restore your backups on top of that. _I_ expect to boot a recovery system (on Intel it would be a bootable CD, but on Zeds I imagine I'd have a small system ready), repartition as my backup suggests, mount and copy - untar or whatever. I expect some application-specific work, but I don't see a good way round that (without other penalties). Whether a file-level backup is quicker than volume-level, like so much else, depends. dd (for example) minimises head movement, tar (for example) backs up only files actually mentioned in the directories. dump combines the two, but still has problems with files (maybe filesystems) that are being written to. -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
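The trade-off John describes, sketched with a throwaway directory standing in for a real filesystem; the dd line is shown against a hypothetical DASD partition and is commented out:

```shell
src=$(mktemp -d)
echo "payload" > "$src/f"

# File-level: walks the tree, archives only files the directories reference
tar czf /tmp/demo-backup.tar.gz -C "$src" .

# Image-level: sequential block copy, minimal head movement, but copies
# free space and dead data too:
#   dd if=/dev/dasdb1 bs=64k of=/backup/dasdb1.img

tar tzf /tmp/demo-backup.tar.gz
rm -rf "$src" /tmp/demo-backup.tar.gz
```

The tar listing contains only the one live file, while the dd image would be the full partition size no matter how empty the filesystem is; that is the media-versus-restore-speed trade discussed above.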
Re: Bad Linux backups
Carsten Otte wrote: As Alan said, it's a question of one system knowing what's happening on the other system. There's no way that z/OS is going to be able to know that the Linux system is "safe" (without some kind of automation on BOTH systems to be able to signal same) so dumps taken from outside the Linux system (even with hardware features like flashcopy) are going to be inconsistent. z/OS does not need to know what Linux is doing when the setup ensures consistent on-disk data at all times. It is, I think, time for a bakeoff (http://www.isi.edu/in-notes/rfc1025.txt): Team A, led by Carsten, constructs and implements a backup strategy that runs on z/OS and safely creates and restores backups of Linux systems created by Team B; Team B, led by David Boyes, will construct a Linux system and workload that Carsten cannot back up and restore. I nominate David Boies (http://en.wikipedia.org/wiki/David_Boies) to head the rules committee. Iterations are allowed: I'd not expect Carsten to win in the first round. Nominations for someone to handle the betting? -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
David Boyes wrote: Why is everyone so hung up on volume backups? It strikes me that file-level backups are generally a lot easier to work with, and use less archival media. Restore time in DR situations. Volume-level backups are a LOT faster to restore, and you don't have to configure anything special -- you restore all the data to disk using the same tools, regardless of source. It shouldn't be hard to create a tape (or something) that IPLs and restores. Should it? -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Post, Mark K wrote: For one thing, full-volume backups preserve partition information, making recovery much simpler. If I had to recover a hundred Linux systems, dig through the system documentation to figure out which partitions were what size, and belonged to these particular file systems or were LVM PVs (or md volumes), a lot of time could go by before we even started restoring data from tape. From my perspective, if we could fix just that part of the equation with some kind of automation/tool, then file level backups would be the only thing needed (aside from database-specific requirements/tools). Mark Post My backup script does this: sfdisk -d /dev/hda > /etc/disktab I _could_ copy it to a separate repository of this info, and I could handle similar info (eg filesystem labels, fstab) in a like manner: I have the info I need, and it doesn't make sense for _me_ to do anything more elaborate. When I initiated my plan, it included making a bootable DVD from which to restore. The last bit's not done, but the info I need is there and there are bootable systems _I_ can use (eg Knoppix) to do a manual restore. -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Adam Thornton Sent: Monday, July 24, 2006 2:48 PM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups -snip- Why is everyone so hung up on volume backups? It strikes me that file-level backups are generally a lot easier to work with, and use less archival media. One has to think harder to get it right, and it's less obvious that volume-level backups are risky. -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
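The partition-table capture John describes generalizes to a save/replay pair. This sketch assumes root access and a real disk device node; the device path and output file are illustrative:

```shell
# At backup time: capture the layout in sfdisk's re-loadable dump format.
sfdisk -d /dev/hda > /etc/disktab

# At recovery time: replay the saved layout onto the replacement disk
# before restoring any files into the filesystems.
sfdisk /dev/hda < /etc/disktab
```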
Re: Bad Linux backups
James Melin wrote: This is more of a 'how do YOU do it ancillary question'... Obviously to get a decent system backup from within Linux you should be in single user mode, or even quiesced completely (if you're doing CDL volume backups, for instance). What are people doing to get a given image into single-user mode (or shut down) and then restarted in an automated way? Just curious because as always, there's 10 ways to do something that achieve the same goal, and comparing the various methods/philosophies might be of use to some. I regularly back up a Linux server running on Intel. I've decided some files (eg logs) aren't that important: if they were, I'd log to another machine. I used to rsync from one box to another (over ADSL) but that proved way too slow, whatever the rsync folk said. Now, I create an ext2 filesystem image:

dd if=/dev/zero of=${Image} count=0 bs=1024 \
    seek=$((7*1024*1024))
mke2fs -Fq ${Image}

I mount it, populate it with tar:

find /var -xdev \( -type p -o -type s \) >${excludes}
tar clC / --exclude=backup.img --exclude=/tmp --exclude='/mnt/*' --exclude=/var/lock --exclude='swapfil*' \
    --exclude=/var/autofs --exclude=lost+found --exclude=/var/tmp --exclude=/var/local --exclude=squid-cache \
    --exclude=/var/spool/cyrus/mail-backup \
    --exclude-from=${excludes} \
    / /boot /home /var \
  | buffer -m $((2*1024*1024)) -p 75 \
  | tar xpC /mnt/backup || { df -h ; exit ; }

I want the files compressed and on an ISO filesystem, so:

rm ${Image}
mkzftree --one-filesystem /mnt/backup/ /var/tmp/backup
umount /mnt/backup/
mkisofs -R -z -quiet -nobak \
    -o /var/tmp/backup-${HOSTNAME}.iso /var/tmp/backup

I then have an image which I can burn to DVD (if it still fits!), or "mount -o loop" and the Linux kernel decompresses the files. I use tar because it has adequate file filtering capability; mkzftree has none:-(( (and that's why I unlink the backup where I do). I'd exclude databases &c here and make separate arrangements for them.
Once I have the image, I rsync it to images in two other locations, one local and one off-site. Using rsync to replicate the directory structure took hours, days if it got a bit behind (and took an enormous amount of virtual storage, fortunately without inducing swapping), whereas the image takes about an hour to sync. -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Dominic Coulombe wrote: On 7/24/06, Rob van der Heij <[EMAIL PROTECTED]> wrote: Stop dreaming. Not even in theory - at least not my theory. Hi, I'm sorry, but we managed to do live backups of our systems without any problem. We restored a lot of backups and all were recoverable without any problem. Even when data was stored on LVM volumes. We stop our databases prior to doing the backup, sync the filesystems, do a flashcopy, then restart everything. As the databases are down, I don't see why we would lose data on those filesystems. I don't mind losing data on the system filesystems as we are only interested in the database stuff. So you don't care that it doesn't actually work! -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Stahr, Lea wrote: I run SuSE SLES 8 under ZVM 5.1 in an IFL. The DASD are in a SAN that is also accessed by ZOS. Backups are taken by ZOS using FDR full volume copies on Saturday morning (low usage). When I restore a backup, it will not boot. The backup and the restore have the same byte counts. Linux support at MainLine Systems tells me that he has seen this before at other customers. What is everyone using for Linux under ZVM backups? HELP! My backups are no good! What do the FDR suppliers say? -- Cheers John -- spambait [EMAIL PROTECTED] [EMAIL PROTECTED] Tourist pics http://portgeographe.environmentaldisasters.cds.merseine.nu/ do not reply off-list -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On 7/24/06, Jeremy Warren <[EMAIL PROTECTED]> wrote: After reading the previous post though, does anyone know if that method would correctly configure the boot sector. The bootstrap uses a list of block numbers for the kernel, initrd etc. When you restore a physical backup all these files go into their old location so the bootstrap will still work. A file-level restore that puts the kernel and initrd somewhere else on disk will leave the bootstrap incomplete, but that's a moot point because it will not restore the bootstrap itself either. Running "zipl" after the restore should take care of those. Rob -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
FWIW FDR's Upstream DR Recovery comes with a script you can use to automate a good chunk of the "due diligence" aspect of this. It's not a panacea mind you, but throw it in cron as shipped (unless you're running LVM, in which case you need to uncomment some stuff) and at least you know you have all of the info you need to rebuild the box. If I had a few hours that I didn't know what to do with, I always thought it could be tweaked to produce a script that did the recovery itself, rather than just a bunch of reports... After reading the previous post though, does anyone know if that method would correctly configure the boot sector? Right now we build an empty system from the reader images, install the FDR DR Recovery Tool, and restore. In the previous post it sounds like we could skip that first step as long as all of the filesystems were correctly laid out beneath the rescue box? TIA jrw Rob van der Heij <[EMAIL PROTECTED]> Sent by: Linux on 390 Port 07/24/2006 04:23 PM Please respond to Linux on 390 Port To LINUX-390@VM.MARIST.EDU cc Subject Re: [LINUX-390] Bad Linux backups On 7/24/06, Stahr, Lea <[EMAIL PROTECTED]> wrote: > These are standard image systems that I can clone from a master and have > in production in 2 hours. But what if it's not standard? Then I have > customizations that are lost. Such an approach does require discipline to properly register what you have modified and to assure the copy of that customized file is held somewhere. You know what files the clone should have; if you also have a list of what you consider variable data or stuff that otherwise does not need to be backed up, then the difference is what should have been registered as customization. You can use the check either to correct your registration or to educate your colleagues. Bonus points for when you can enhance the cloning process to also re-apply these customization things. If you keep the copy of the customized files in a handy way (e.g. 
an NFS server) you could get a mechanism for applying changes with it. You might have a look at cfEngine. -- Rob -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390 -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
Our robotics are controlled by Storagetek's software on our z/OS system; we do not have a VM version. As far as APIs go, I have no idea. Jon What do you currently use to control your tape robotics? We did a demo of how to use a client-server program to make Bacula think it could drive your tape robotics. The freely available demo is just for a manual operator-driven back end, but if your library has a CMS-manipulatable interface then it's pretty easy to write a new back end that drives it. If the robot has any sort of API, then what you need is a server on the side that drives the robot, and a client on the Linux side that makes requesting tapes look to Bacula like its mtx-changer output. It's really pretty easy. -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
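For concreteness, the client side David describes is just a script speaking Bacula's changer-script calling convention. A heavily simplified sketch follows; the gateway host name and its "robot" command are purely hypothetical stand-ins for whatever actually drives the STK library:

```shell
#!/bin/sh
# Bacula invokes its changer script roughly as:
#   <script> <changer-device> <command> <slot> <archive-device> <drive-index>
# This stub forwards each request to an assumed gateway box that talks
# to the real robot. "robot-gw.example.com" and "robot" are invented names.
CHANGER=$1; CMD=$2; SLOT=$3; DEVICE=$4; DRIVE=$5
GATEWAY=robot-gw.example.com

case "$CMD" in
  load)    ssh "$GATEWAY" robot load "$SLOT" "$DRIVE" ;;
  unload)  ssh "$GATEWAY" robot unload "$SLOT" "$DRIVE" ;;
  loaded)  ssh "$GATEWAY" robot loaded "$DRIVE" ;;  # print slot no., 0 if empty
  list)    ssh "$GATEWAY" robot list ;;             # one "slot:volume" per line
  slots)   ssh "$GATEWAY" robot slots ;;            # total slot count
  *)       echo "unknown command: $CMD" >&2; exit 1 ;;
esac
```

The key design point is the one from the post: Bacula never needs to know the robot is on another system, as long as the script's output looks like what mtx-changer would have printed.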
Re: Bad Linux backups
On Jul 24, 2006, at 1:35 PM, David Boyes wrote: Such an approach does require discipline to properly register what you have modified and to assure the copy of that customized file is held somewhere. Tripwire is a handy tool for this. Run it every night and have it generate a list of changes in diff format. You can then turn that diff into input to patch and deploy it as a .deb or .rpm. Or, if you're feeling REALLY parsimonious, you can use Bacula in its "verify" mode to do the same thing. Adam -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
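Adam's nightly diff-and-patch idea, reduced to its moving parts on scratch files (the file names and contents are invented for illustration):

```shell
set -e
mkdir -p /tmp/twdemo
printf 'Port 22\n' > /tmp/twdemo/sshd_config.orig   # the pristine clone copy
printf 'Port 2222\n' > /tmp/twdemo/sshd_config      # the local customization

# What a nightly change report could emit for this file, in diff format:
diff -u /tmp/twdemo/sshd_config.orig /tmp/twdemo/sshd_config \
    > /tmp/twdemo/custom.patch || true   # diff exits 1 when files differ

# Re-apply the captured customization to a freshly cloned copy:
cp /tmp/twdemo/sshd_config.orig /tmp/twdemo/sshd_config.new
patch -s /tmp/twdemo/sshd_config.new < /tmp/twdemo/custom.patch
```

Wrapping the resulting patch set into a .deb or .rpm, as suggested, then gives you a versioned, installable record of each guest's deviations from the golden image.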
Re: Bad Linux backups
> Such an approach does require discipline to properly register what you > have modified and to assure the copy of that customized file is held > somewhere. Tripwire is a handy tool for this. Run it every night and have it generate a list of changes in diff format. You can then turn that diff into input to patch and deploy it as a .deb or .rpm. -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
On 7/24/06, Stahr, Lea <[EMAIL PROTECTED]> wrote: These are standard image systems that I can clone from a master and have in production in 2 hours. But what if it's not standard? Then I have customizations that are lost. Such an approach does require discipline to properly register what you have modified and to assure the copy of that customized file is held somewhere. You know what files the clone should have; if you also have a list of what you consider variable data or stuff that otherwise does not need to be backed up, then the difference is what should have been registered as customization. You can use the check either to correct your registration or to educate your colleagues. Bonus points for when you can enhance the cloning process to also re-apply these customization things. If you keep the copy of the customized files in a handy way (e.g. an NFS server) you could get a mechanism for applying changes with it. You might have a look at cfEngine. -- Rob -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
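One low-tech way to run the check Rob describes is a checksum manifest recorded at clone time; anything that no longer matches is an unregistered customization. A sketch on scratch paths (directory and file names are illustrative):

```shell
set -e
mkdir -p /tmp/clonedemo/etc
echo "defaults" > /tmp/clonedemo/etc/app.conf

# At clone time: record a checksum for every file the golden image ships.
( cd /tmp/clonedemo && find etc -type f -exec md5sum {} + ) \
    > /tmp/clonedemo/manifest.md5

# Later, someone customizes the guest...
echo "tuned" >> /tmp/clonedemo/etc/app.conf

# Files failing the check are candidate unregistered customizations:
( cd /tmp/clonedemo && md5sum -c --quiet manifest.md5 ) \
    > /tmp/clonedemo/changed.txt 2>&1 || true
```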
Re: Bad Linux backups
On Jul 24, 2006, at 12:47 PM, Rob van der Heij wrote: On 7/24/06, Adam Thornton <[EMAIL PROTECTED]> wrote: It strikes me that file-level backups are generally a lot easier to work with, and use less archival media. File level backup is great for "oops backup" when you erased a few files and want them back. I am not sure whether you ever tried to restore the entire server from file level backups when you lost the disk. Typically you will need to re-install a new system and then restore your backups on top of that. Think about how that works for many servers at the same time (because it probably must be a major problem if you actually lost DASD). It's not that bad. You should have a rescue system--which it *IS* a good idea to do volume backups of, and easy too because it's almost never running. This system has authority to link EVERYONE's disks. You bring it up. You attach disks in batches of however many you're comfortable with (I've only ever done it with one client at a time, but you certainly could do more). You format and then mount those disks in the right layout relative to the mount point. You do the restore of your files from the rescue system into the mounted filesystems. Then you do a chroot, run zipl (zipl -b would also work, I guess), unmount the file systems, detach the disks, and do the next batch. No, you don't want to try a restore onto the same devices that are actually RUNNING the system you're restoring on to. But the nice thing about VM is that it makes not doing that much, much easier than it is on discrete systems. Bacula also supports a Bootstrap Record feature, but this has not been extended to work with s390. The idea there is that you get a minimal system (on CD-ROM, as it stands) which has just enough smarts to find your disks, ask for your Bacula server, and then request the appropriate restore for that client (you have one bsr per client). This would be neat to port. 
Adam -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
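Adam's batch procedure, written out as one batch from the rescue guest. Every device number, device name, and mount point below is an illustrative assumption, the restore step itself is elided, and this is a sketch rather than a tested runbook:

```shell
# Bring one linked minidisk online (s390-tools), then format and partition:
chccwdev -e 0.0.0201
dasdfmt -b 4096 -y /dev/dasdb
fdasd -a /dev/dasdb            # auto-create a single partition
mke2fs -j /dev/dasdb1
mount /dev/dasdb1 /mnt/restore

# ...file-level restore into /mnt/restore from the backup server here...

# Rebuild the boot record from inside the restored tree, then clean up:
mount --bind /dev /mnt/restore/dev
chroot /mnt/restore zipl
umount /mnt/restore/dev /mnt/restore
# detach the disk and move on to the next batch
```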
FW: Bad Linux backups
Comments: What we have done is: 1. MANUAL JOB - SHUT DOWN the Linux 4 z/VM servers - after 7pm. 2. DDR the specific volumes thru an automated exec process. 3. MANUAL JOB - START the Linux 4 z/VM servers - after 7pm. We shut down the Linux servers thru the Linux Web interface, then perform the DDR of the volumes. This is a 99.9% non-violent approach! Down time: approx 1 hour per Linux server. Start: XAUTOLOG the Linux 4 z/VM server back up; works out just fine. Yes, we give up 100% up-time, but this is our trade-off. Our experience backing up the file system from the network server side has had many issues, but our staff keeps trying! P.S. No FlashCopy license for VM at this time. -Original Message- On Behalf Of Alan Altmark But rather than focus on that "edge" condition, we are all, I think, in violent agreement that you cannot take a volume-by-volume physical backup from outside a running Linux system and expect to have a usable backup. Shared dasd on System z has all the same issues that shared LUNs have on distributed systems. The backup *strategies* are identical, even if the mechanisms used to create the backups are not. -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
Re: Bad Linux backups
These are standard image systems that I can clone from a master and have in production in 2 hours. But what if it's not standard? Then I have customizations that are lost. Lea Stahr Sr. System Administrator Linux/Unix Team 630-753-5445 [EMAIL PROTECTED] -Original Message- From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of Rob van der Heij Sent: Monday, July 24, 2006 2:47 PM To: LINUX-390@VM.MARIST.EDU Subject: Re: Bad Linux backups On 7/24/06, Adam Thornton <[EMAIL PROTECTED]> wrote: > It strikes me that file-level backups are generally a lot easier to > work with, and use less archival media. File level backup is great for "oops backup" when you erased a few files and want them back. I am not sure whether you ever tried to restore the entire server from file level backups when you lost the disk. Typically you will need to re-install a new system and then restore your backups on top of that. Think about how that works for many servers at the same time (because it is probably a major problem if you actually lost DASD). I have been involved in several attempts to recover a system from file level backup, but none worked as planned. Last one I remember we found TSM trying to restore the upgraded glibc over the vanilla install of SuSE. Once you start looking at it, you will find that many servers don't really have data that you need to back up. You might be better off with some tooling to quickly create a fresh server and some structure to manage any customizing you do on top of that. Which eventually leaves the servers that actually hold business data in some application, and you can look at the best way to deal with those applications. 
Rob -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390 -- For LINUX-390 subscribe / signoff / archive access instructions, send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390