Re: Long Term Data Retention - off topic
Hi David,

A few years ago I was working for a state health department that had similar sorts of retention issues and was about to retire their main patient admin system as they moved to a new one. In this case, even keeping existing data was not sufficient because different rules applied to different data. Some of it was supposed to be kept literally forever so that historians could get at it, some was required for 80 years so that epidemiological studies could be made, and some had retention lengths that depended on the life of the patient. In opposition to that, privacy legislation required that some data be deleted when there was no longer an operational need for it.

After convincing them that TSM was not an appropriate vehicle, using a reductio ad absurdum argument, I researched a little further. The best method for long term data retention is probably flat XML files. These are well understood and self-describing, require no specialist software to read, yet can be searched by machine when this is necessary. There are a number of specialized XML dialects developed for different purposes, so a complete re-invention of the wheel is not necessary.

I did not pursue this to completion. It turned out that there was a section in the organization whose primary job was data retention: mostly paper based, but recognizably moving into data (just think of all those Word documents and spreadsheets that are also subject to legal retention requirements), and the problem was passed to them.

It did occur to me that there is a business opportunity for consulting on such problems. Just understanding the web of retention standards, which tend to refer to other standards nested three or four levels deep, is a huge job; applying those standards to the data at hand, in order to write some code to produce the final XML, is another. It would however take the sort of analytical accountant/actuary mindset to do this successfully, and that is not my style.
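To make the flat XML idea a little more concrete, here is a minimal sketch in Python. It is only an illustration: the element names (archive, record, field), the retention_class attribute and the sample values are all hypothetical and not taken from any real retention standard or XML dialect.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, system_name):
    """Build a self-describing XML document from a list of dict records."""
    root = ET.Element("archive", attrib={"source_system": system_name})
    for rec in records:
        # retention_class is a hypothetical label such as "permanent".
        item = ET.SubElement(
            root, "record", attrib={"retention_class": rec["retention_class"]}
        )
        for field, value in rec["fields"].items():
            child = ET.SubElement(item, "field", attrib={"name": field})
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


def find_records(xml_text, retention_class):
    """Return the fields of every record in the given retention class."""
    root = ET.fromstring(xml_text)
    matches = []
    for item in root.findall("record"):
        if item.get("retention_class") == retention_class:
            matches.append({f.get("name"): f.text for f in item.findall("field")})
    return matches


if __name__ == "__main__":
    sample = [
        {"retention_class": "permanent", "fields": {"id": 1, "ward": "A"}},
        {"retention_class": "80-years", "fields": {"id": 2, "ward": "B"}},
    ]
    text = records_to_xml(sample, "patient-admin")
    print(text)
    print(find_records(text, "permanent"))
```

Carrying the retention class on each record means data under different rules can sit in one file and still be picked out by machine later, with nothing more than a generic XML parser.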
I hope that has given you some insight.

Regards
Steve

Steven Harris
TSM Admin, Sydney Australia

-Original Message-
From: David Longo
Sent: 17/05/2008 01:35 AM
To: ADSM-L@VM.MARIST.EDU
Subject: [ADSM-L] Long Term Data Retention
Long Term Data Retention
Wanted to get some thoughts on what people are doing for Long Term Data Retention, specifically on obsolete applications.

Say we have an NT 4.0 system that is no longer used. The business owner says we need to keep it for 25 years. I know that is not practical/possible for a number of reasons. Even if we VMware it, will they support NT 4.0 for 25 years? (Will ANYBODY support Windows 2008 in 25 years?) I know that even if they take a DB dump and I Archive it for 25 years, if we retrieve the file 20 years from now, who can decipher it?

There are several systems here for which people are giving hints that they want to do this. I have hinted that they need to take whatever data and dump it to a text or PDF file, and then I archive that. I realize that this may not be that simple for some applications, as it probably involves more than a simple data dump. Plus, some applications are spread across multiple servers.

So, before we have a big meeting and I push the text or PDF file idea, what are people doing for retention of data on obsolete servers/applications?

Thanks,
David Longo

#
This message is for the named person's use only. It may contain private, proprietary, or legally privileged information. No privilege is waived or lost by any mistransmission. If you receive this message in error, please immediately delete it and all copies of it from your system, destroy any hard copies of it, and notify the sender. You must not, directly or indirectly, use, disclose, distribute, print, or copy any part of this message if you are not the intended recipient. Health First reserves the right to monitor all e-mail communications through its networks. Any views or opinions expressed in this message are solely those of the individual sender, except (1) where the message states such views or opinions are on behalf of a particular entity; and (2) the sender is authorized by the entity to give such views or opinions.
#
Re: Long term data retention for retired clients
Many thanks for the excellent responses to my question.

Computers are great but organics are better.

John

**
The information in this E-Mail is confidential and may be legally privileged. It may not represent the views of Scottish and Southern Energy Group. It is intended solely for the addressees. Access to this E-Mail by anyone else is unauthorised. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited and may be unlawful. Any unauthorised recipient should advise the sender immediately of the error in transmission. Unless specifically stated otherwise, this email (or any attachments to it) is not an offer capable of acceptance or acceptance of an offer and it does not form part of a binding contractual agreement. Scottish Hydro-Electric, Southern Electric, SWALEC, S+S and SSE Power Distribution are trading names of the Scottish and Southern Energy Group.
**
Re: Long term data retention for retired clients
Biggest problem I have with using EXPORT or BACKUPSET is that people most likely are going to ask for a partial restore. And they probably aren't going to remember exactly what the directory structure and filenames were. Two years from now, NO ONE will have any idea what was really on that EXPORT tape. And you can't hunt for it effectively unless all the data is still in the DB.

So what I've done as a compromise is use SQL SELECT to pull a list of all the file names/backup dates for a retired client from the TSM DB into a flat file, then do the EXPORT and delete the filespaces. The flat file remains around and can be searched using ordinary tools to figure out what is on the EXPORT tape.

Wanda Prather
"I/O, I/O, It's all about I/O" -(me)

-Original Message-
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Allen S. Rout
Sent: Thursday, July 14, 2005 1:49 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: Long term data retention for retired clients
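For the flat-file index described in Wanda's reply above, searching it later needs nothing TSM-specific. The following is a rough sketch only: it assumes the listing was saved as tab-separated lines with four columns (filespace, directory, file name, backup date). That layout is an assumption for illustration; the real one depends on the SELECT that produced the file.

```python
import csv
import fnmatch


def search_listing(listing, pattern):
    """Return (path, backup_date) pairs whose path matches a shell-style pattern.

    `listing` is any iterable of tab-separated lines with four assumed
    columns: filespace, directory, file name, backup date.
    """
    hits = []
    reader = csv.reader(listing, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        filespace, directory, name, backup_date = row
        path = filespace + directory + name
        # Compare case-insensitively: people rarely remember exact names.
        if fnmatch.fnmatchcase(path.lower(), pattern.lower()):
            hits.append((path, backup_date))
    return hits


if __name__ == "__main__":
    sample = [
        "/home\t/alice/reports/\tBudget2004.xls\t2005-06-30 02:14:07\n",
        "/home\t/bob/\tnotes.txt\t2005-07-01 02:10:44\n",
    ]
    for path, when in search_listing(sample, "*budget*"):
        print(when, path)
```

In practice the function would be handed an open file object for the saved listing rather than an in-memory list.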
Re: Long term data retention for retired clients
==> On Thu, 14 Jul 2005 08:45:11 -0400, Richard Rhodes <[EMAIL PROTECTED]> said:

> If there was one thing I really wish in all this was a comments field. The
> only place we found to put comments about a node is in the contacts field.
> I wish there was another field where we could enter comments.

AMEN, brother. Preach it!

> I am interested in how others handle this also.

Heh, I've been thinking about a white paper on just the topic: "What they left out, and what I did about it". I may just write it.

The short version is: I built an XML 'application' (dialect) to hold a bunch of data about my servers, schedules, domains, storage pools and nodes. I generate all of my automation scripts by distilling that one big (dang, it's 50K now) file, and get out of it all the normal maintenance scripts and schedules for my 10 servers, chargeback accounting, and trending data. Oh, and my automatically-generated DR-restore-my-TSM-server shell scripts. :)

Most of my needs could have been filled with a ~1K "comments" field on:

nodes
domains
filespaces
stgpools

- Allen S. Rout
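The XML dialect itself is not shown in the thread, so the following is only a guess at the general shape of the idea: keep free-text comments about nodes in a side file and distill reports from it. The inventory/node/comment element names and the sample entries are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; the real dialect is not shown in the thread.
INVENTORY = """\
<inventory>
  <node name="SOMESERVER" domain="STANDARD">
    <comment>Mail server, owner: ops team</comment>
  </node>
  <node name="OLDBOX" domain="RETIRED">
    <comment>Retired 2005-07-12; keep until legal says otherwise</comment>
  </node>
</inventory>
"""


def node_report(xml_text, domain=None):
    """Return one 'NAME (DOMAIN): comment' line per node, optionally filtered."""
    root = ET.fromstring(xml_text)
    lines = []
    for node in root.findall("node"):
        if domain is not None and node.get("domain") != domain:
            continue
        comment = (node.findtext("comment") or "").strip()
        lines.append(f"{node.get('name')} ({node.get('domain')}): {comment}")
    return lines


if __name__ == "__main__":
    for line in node_report(INVENTORY):
        print(line)
```

The same file could in principle feed script generation as well as reports, which seems to be the point of keeping everything in one place.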
Re: Long term data retention for retired clients
==> On Thu, 14 Jul 2005 10:27:49 +0100, John Naylor <[EMAIL PROTECTED]> said:

> I have considered various approaches
> 1) Export
> 2) Backup set
> 3) Create a new domain for retired clients which have the long term
>    retention requirement

> I see export and backup sets as reducing database overhead, but being less
> easy to track and rather unfriendly if you just need to get a subset of
> the data back

Export yes, backupset no; [ see below ]

> The new domain would retain great visibility for the data and allow easy
> access to subsets of the data, but you would still have the database
> overhead.

Yes, but in the long term this overhead devolves to just space. For example, if everything in the node is ACTIVE, I don't believe that it represents much of a hit on, e.g., expiration. So there's an incrementally (hah) larger full DB backup, and more space on disk for the DB, but not a lot of day to day DB overhead.

> Does the choice just depend, on how often in reality you will need to get
> the data back ?

I'd say that, and how frequently you want to do the retention thing, for how long. Off the top of my head, if keeping the bits and pieces around would add up to, say, a third of my [DB space / data / whatever] I'd consider doing something nearline with it.

[below]

Keep in mind: you can restore from a backupset stored on the server; this need only be a little less convenient than restorations from online data.

Gedankenexperiment: Say you want to deal with TheNode:

    rename node TheNode OLD-2005-07-12-13-TheNode

[ in case you ever want to use TheNode name again ]

    GEN BACKUPSET OLD-2005-07-12-13-TheNode Terminal devclass=foobar RET=NOLimit

which uses tapes FOO1 and FOO2. Then you

    DEL FILESPACE OLD-2005-07-12-13-TheNode *

and

    CHECKOUT LIBVOLUME FOOLIB FOO1
    CHECKOUT LIBVOLUME FOOLIB FOO2

At this point, you've got a -permanent- record of the state of TheNode, at the cost of a few records in the node and backupset tables. Of course, you have an increased exposure to media failure: no copypools for backupsets.

Anyway, to restore from it, all you have to do is check the tapes back in and issue a

    dsmc restore backupset Terminal -loc=server [sourcespec] [destspec]

This is going to be a much less efficient restore than the online one, but only in wall clock time and tape use, not in human skull sweat.

Plus, if someone gets crotchety about the archive, you can hand them the checked out tapes and tell them to get their own LTO3. (heh)

- Allen S. Rout
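If this had to be repeated for many nodes, the command sequence in the message above could be generated rather than typed. The sketch below only builds the command strings as they appear in that message; it does not run anything. The function name and its parameters are made up for illustration, and the tape volume names would in reality only be known after the backupset has been generated.

```python
def retirement_commands(node, stamp, backupset, devclass, library, volumes):
    """Return, in order, the admin commands from the message for one node.

    Nothing is executed here; the caller reviews the list and runs it by
    hand. `volumes` must be filled in after the backupset is generated.
    """
    old_name = f"OLD-{stamp}-{node}"
    commands = [
        f"rename node {node} {old_name}",
        f"GEN BACKUPSET {old_name} {backupset} devclass={devclass} RET=NOLimit",
        f"DEL FILESPACE {old_name} *",
    ]
    # One checkout per tape the backupset was written to.
    commands += [f"CHECKOUT LIBVOLUME {library} {vol}" for vol in volumes]
    return commands


if __name__ == "__main__":
    for cmd in retirement_commands(
        "TheNode", "2005-07-12-13", "Terminal", "foobar", "FOOLIB", ["FOO1", "FOO2"]
    ):
        print(cmd)
```

Since the filespace deletion is irreversible, it seems prudent to confirm the backupset is readable before running that step.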
Re: Long term data retention for retired clients
Remember - moving the nodes to a new domain does nothing to rebind the data to new management classes - that only happens during an actual backup. So the data will stick around for as long as intended, except for the last active version of the files. Those you have to delete manually.

One practice I've done is to rename the node, putting the date when it can be finally deleted to the beginning of the name, hence "NODENAME" becomes "051231_NODENAME". Moving this node to a new domain, with a descriptive name such as RETIRED, you can now easily see what nodes are able to be deleted, and when.

The simple version:

    query node domain=retired

or the more complex:

    select node_name from nodes where domain='RETIRED' order by node_name

Nick Cassimatis
email: [EMAIL PROTECTED]
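With the date in the node name, the list of nodes that are due for deletion can be worked out from the query output alone. A small sketch, assuming the prefix is YYMMDD followed by an underscore (which is how "051231_NODENAME" reads, but that is an interpretation); the function name is made up.

```python
from datetime import date, datetime


def deletable_nodes(node_names, today):
    """Return names whose assumed YYMMDD_ prefix is on or before `today`."""
    ready = []
    for name in node_names:
        prefix, sep, _rest = name.partition("_")
        if not sep or len(prefix) != 6 or not prefix.isdigit():
            continue  # not a retired-node name in the assumed format
        try:
            delete_on = datetime.strptime(prefix, "%y%m%d").date()
        except ValueError:
            continue  # six digits, but not a valid date
        if delete_on <= today:
            ready.append(name)
    return ready


if __name__ == "__main__":
    names = ["051231_NODENAME", "070630_OTHER", "PRODSERVER"]
    print(deletable_nodes(names, date.today()))
```

Because the prefix sorts chronologically, the "order by node_name" in the select already lists the nodes in deletion order; the sketch just adds the comparison against today's date.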
Re: Long term data retention for retired clients
> If there was one thing I really wish in all this was a comments field. The
> only place we found to put comments about a node is in the contacts field.
> I wish there was another field where we could enter comments.

The 'define machine' and 'insert machine' commands can be used to store large amounts of text data about nodes. I am not sure about the licensing requirements for these commands; they may be part of the Disaster Recovery Manager feature.
Re: Long term data retention for retired clients
On Jul 14, 2005, at 8:45 AM, Richard Rhodes wrote:

> If there was one thing I really wish in all this was a comments field. The
> only place we found to put comments about a node is in the contacts field.
> I wish there was another field where we could enter comments. ...

Richard - With a "decommissioned" node, you could add 200 chars of annotation in the node "URL" field, which accepts arbitrary text.

Richard Sims
Re: Long term data retention for retired clients
We have hundreds of retired servers, both from a server consolidation project and general server rollover. We decided we required access to the backups and archives on the retired servers, so exports and backup sets wouldn't work. Also, by keeping the data in the normal pools we keep redundancy (primary and copy pool) and DR issues covered.

We thought about moving the nodes into their own domain, but decided not to because most domains use default management classes and we weren't sure how to keep the policies straight in one domain holding retired servers from many domains. We thought about a separate retired domain for each production domain, and rejected it.

Finally we decided to simply rename the nodes. All retired nodes are given a prefix of "zzrt_". For example, node "someserver" becomes retired node "zzrt_someserver". Normal policies are allowed to expire inactive versions. We put a comment in the contact field for when the active versions can be deleted and the node removed.

This is anything but ideal . . . but I think we found that this was true for any method. Many if not most of the SQL queries we run against the TSM DB have logic to exclude any nodes that are like 'zzrt_%'. I think a cleaner method would be a separate TSM server for just retired nodes . . . but that has obvious drawbacks also.

If there was one thing I really wish in all this was a comments field. The only place we found to put comments about a node is in the contacts field. I wish there was another field where we could enter comments.

I am interested in how others handle this also.

Rick

-Original Message-
From: John Naylor
Sent: 07/14/2005 05:27 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Long term data retention for retired clients

-
The information contained in this message is intended only for the personal and confidential use of the recipient(s) named above. If the reader of this message is not the intended recipient or an agent responsible for delivering it to the intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify us immediately, and delete the original message.
Re: Long term data retention for retired clients
On Jul 14, 2005, at 5:27 AM, John Naylor wrote:

> I have considered various approaches
> 1) Export
> 2) Backup set
> 3) Create a new domain for retired clients which have the long term
>    retention requirement

Hi, John -

Another possibility for your list is TSM for Data Retention. This would be more for a site where long-term, assured retention is a big, ongoing thing, as it's a non-trivial implementation.

Richard Sims
Re: Long term data retention for retired clients
Further to my query, I have not mentioned archive, as this is not a function that we regularly use. Would this be a good candidate for retiring clients, and can you archive a whole drive, i.e. D:?

thanks,
John

-Original Message-
From: John Naylor/HAV/SSE
Sent: 14/07/2005 10:27
To: "ADSM: Dist Stor Manager"
Subject: Long term data retention for retired clients
Re: Long term data retention for retired clients
I'd be interested in the same info as regards NetApp client data, i.e. NDMP dump backups.

Iain

-Original Message-
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of John Naylor
Sent: 14 July 2005 10:28
To: ADSM-L@VM.MARIST.EDU
Subject: [ADSM-L] Long term data retention for retired clients

--
This e-mail, including any attached files, may contain confidential and privileged information for the sole use of the intended recipient. Any review, use, distribution, or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive information for the intended recipient), please contact the sender by reply e-mail and delete all copies of this message.
Long term data retention for retired clients
Hi out there,

Just wondering what the consensus is on the best way to retain TSM client data that has to be kept for many years (legal requirement) after the client box is retired.

I have considered various approaches:

1) Export
2) Backup set
3) Create a new domain for retired clients which have the long term retention requirement

I see export and backup sets as reducing database overhead, but being less easy to track and rather unfriendly if you just need to get a subset of the data back.

The new domain would retain great visibility for the data and allow easy access to subsets of the data, but you would still have the database overhead.

Does the choice just depend on how often in reality you will need to get the data back?

Thoughts appreciated
John