Hi all,

Whilst investigating something else, we discovered a number of nodes that have old filespaces still stored within TSM, e.g.:
  Node Name: (node name)
  Filespace Name: /data
  Hexadecimal Filespace Name:
  FSID: 4
  Platform: SUN SOLARIS
  Filespace Type: UFS
  Is Filespace Unicode?: No
  Capacity (MB): 129,733.3
  Pct Util: 92.1
  Last Backup Start Date/Time: 06/09/05 20:03:56
  Days Since Last Backup Started: 764
  Last Backup Completion Date/Time: 06/09/05 20:05:16
  Days Since Last Backup Completed: 764
  Last Full NAS Image Backup Completion Date/Time:
  Days Since Last Full NAS Image Backup Completed:

  Node Name: (node name)
  Filespace Name: /Z/oracle
  Hexadecimal Filespace Name:
  FSID: 12
  Platform: SUN SOLARIS
  Filespace Type: UFS
  Is Filespace Unicode?: No
  Capacity (MB): 119,642.2
  Pct Util: 31.5
  Last Backup Start Date/Time: 08/26/05 01:03:08
  Days Since Last Backup Started: 686
  Last Backup Completion Date/Time: 08/26/05 01:14:01
  Days Since Last Backup Completed: 686
  Last Full NAS Image Backup Completion Date/Time:
  Days Since Last Full NAS Image Backup Completed:

  Node Name: (node name)
  Filespace Name: /mnt
  Hexadecimal Filespace Name:
  FSID: 15
  Platform: SUN SOLARIS
  Filespace Type: UFS
  Is Filespace Unicode?: No
  Capacity (MB): 120,992.9
  Pct Util: 55.8
  Last Backup Start Date/Time: 01/26/06 20:05:15
  Days Since Last Backup Started: 533
  Last Backup Completion Date/Time: 01/26/06 20:06:34
  Days Since Last Backup Completed: 533
  Last Full NAS Image Backup Completion Date/Time:
  Days Since Last Full NAS Image Backup Completed:

These are all filesystems which existed at some point in the past but were removed as part of an application upgrade (or system rebuild, or ...), and hence no longer exist. TSM seems to take the attitude of "if I can't see the filesystem, I won't mark any of its files inactive", so the data never expires. I can understand the reasoning behind this approach, but it does mean there's a large amount of data floating around that is no longer needed (a quick and dirty estimate says around 83 TB across primary and copy pools, although some of that needs to stay).

A DELETE FILESPACE will clear them up quickly, obviously, but there's a twist: how can we identify filesystems like this, short of going around to each client node and doing a df or equivalent? Searching the FILESPACES table gives us some 600 filespaces all up, and I *know* that several of those have to stay - e.g., image backups don't update the BACKUP_END timestamp, and some filespaces are backed up exclusively with image backups.

At the moment, the best I can come up with is to:

* use a SELECT statement on the FILESPACES table to get a "first cut" (with N replaced by whatever threshold we settle on):

      select node_name, filespace_name, filespace_id
        from filespaces
       where backup_end < current_timestamp - N days

* use QUERY OCCUPANCY on each of the filespaces mentioned in the first cut, and if the total occupied space is below some threshold, ignore it as not being worth the effort;

* use a SELECT statement on the BACKUPS table to confirm that no backups have come through in the past N days:

      select 1 from db
       where exists
             (select object_id from backups
               where node_name='whatever'
                 and filespace_id=whatever
                 and state='ACTIVE_VERSION'
                 and current_timestamp < backup_date + 90 days)

  I use EXISTS to try to minimise the effort TSM needs to put into the query, and I have the ACTIVE_VERSION check in there for the same reason (if there are only inactive versions, they'll drop off the radar in due course anyway). Hopefully TSM's SQL execution is optimised to stop when it finds one match rather than trying to find all matches ...

Does anybody have any better ideas?
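For what it's worth, here's roughly how I'd glue those three steps together from the shell. This is an untested sketch: the admin ID, password and thresholds are placeholders, it assumes a dsmadmc new enough to have the -dataonly and -commadelimited options, it does the occupancy check with a SELECT SUM(PHYSICAL_MB) on the OCCUPANCY table rather than parsing QUERY OCCUPANCY output, and it relies on dsmadmc exiting non-zero (RC 11) when a SELECT finds no match:

    #!/bin/sh
    # Untested sketch: the admin ID/password, thresholds and the final
    # action are all placeholders.
    DSMADMC="dsmadmc -id=admin -pa=XXXXXXXX -dataonly=yes -commadelimited"
    AGE=180       # first cut: days since the last backup completed
    MIN_MB=1024   # ignore filespaces occupying less than this

    # Step 1: first cut from the FILESPACES table.
    $DSMADMC "select node_name, filespace_name, filespace_id \
        from filespaces \
        where backup_end < current_timestamp - $AGE days" |
    while IFS=, read NODE FS FSID
    do
        # Step 2: total occupancy across all pools; skip the small fry.
        MB=`$DSMADMC "select sum(physical_mb) from occupancy \
            where node_name='$NODE' and filespace_name='$FS'" | cut -d. -f1`
        [ "${MB:-0}" -lt "$MIN_MB" ] && continue

        # Step 3: dsmadmc exits with RC 11 when a SELECT finds no match,
        # so a zero exit status here means active versions have been
        # backed up within the last 90 days - the filespace is still live.
        if $DSMADMC "select 1 from db where exists \
            (select object_id from backups \
              where node_name='$NODE' and filespace_id=$FSID \
                and state='ACTIVE_VERSION' \
                and current_timestamp < backup_date + 90 days)" \
            >/dev/null 2>&1
        then
            continue
        fi

        # Candidate for eyeballing before an eventual
        #   DELETE FILESPACE $NODE $FS TYPE=ANY
        echo "candidate: $NODE $FS (fsid $FSID, ${MB:-0} MB)"
    done

The output is deliberately just a list of candidates to eyeball rather than something to feed straight into DELETE FILESPACE - given the image-backup-only filespaces mentioned above, I wouldn't trust it unattended.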
Unfortunately, because of the nature of Monash's organisation, simply having central policies saying "you must do X when shuffling filesystems around" won't cut it (and let's be honest here - how many sysadmins are likely to remember such policies, given how infrequent such moves are?).

Yes, I have a call open with IBM support about this. :-) If there's sufficient interest, I can summarise their eventual response to the mailing list (so far, it's mostly been clarification of the call, plus a few pointers that match what we've already done).

Thanks,
Stuart.