John,

After all the talks I've given promoting the use of VTLs with TSM and other products, it's good to finally hear from someone who has been able to actually DO it in multiple environments. I concur with almost all of your comments. I do have questions about some of them, and then I have some comments about the overall VTL industry. It sounds like you've got much more real-world experience with the TSM/VTL combination than I have, so please take my comments as curiosity and/or request for confirmation, not confrontation.

The first question that I have is: in what size environments have you been able to implement these recommendations? Many of them strike me as perfect for small-to-medium shops, but not easy to implement in large shops. (Our customers are some of the largest TSM shops in the world.)

>The second is a Redbook from IBM about their VTL solution. There is a
>chapter in it specific to how a VTL can help in a TSM environment.

You do realize that both products are Falconstor underneath, right? So at this point, the only material difference between the two is the hardware that IBM/EMC puts around it. Sun uses it as well.

>1) Design the solution and size the CDL so that most or all Primary
>storage pools can fit on the CDL.

I couldn't agree more. The challenge that I've found is that most of our large customers have been unable to justify the cost of VTLs that are the same size as their tape libraries. The advent of de-dupe is changing all that, as a 200 TB tape library can be replaced by 10 TB of disk at a 20:1 ratio.

Let me speak to this for a second. De-dupe ratios definitely are an area where your mileage will vary. TSM filesystem progressive incrementals will not get the same level of de-dupe as shops that do frequent full backups, as repeated fulls are where a lot of the duplicated data comes from. However, duplicated data also comes from the same files being placed in multiple places (emails, filesystems, multiple users using the same doc but putting it in multiple places, etc.). It also comes from repeated incrementals of the same file that changes just a little bit each day, such as a spreadsheet that someone updates every day. So TSM environments will still see plenty of de-dupe on their filesystem backups, just not 20:1. They will also see the same de-dupe ratios as everybody else when they back up Oracle, DB2, Exchange, SQL Server, etc., as TSM does the typical full/incremental backups there.

The bummer thing about de-dupe is that it's not available in most of the major OEM VTLs.
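
To put rough numbers on the ratio discussion above, here is a back-of-the-envelope sketch in Python. Only the 200 TB library and the 20:1 figure come from this thread; the 5:1 filesystem ratio and the 120/80 TB split are made-up assumptions for illustration, and the function names are hypothetical.

```python
def disk_needed_tb(logical_tb: float, dedupe_ratio: float) -> float:
    """Physical disk (TB) needed to hold logical_tb of backups at a given de-dupe ratio."""
    if dedupe_ratio <= 0:
        raise ValueError("dedupe_ratio must be positive")
    return logical_tb / dedupe_ratio


def blended_ratio(workloads):
    """Overall de-dupe ratio for a mix of workloads.

    workloads: iterable of (logical_tb, dedupe_ratio) pairs.
    The blend is total logical data divided by total physical disk,
    which is NOT the simple average of the individual ratios.
    """
    workloads = list(workloads)
    total_logical = sum(tb for tb, _ in workloads)
    total_physical = sum(disk_needed_tb(tb, ratio) for tb, ratio in workloads)
    return total_logical / total_physical


if __name__ == "__main__":
    # The case from the thread: a 200 TB library at 20:1.
    print(disk_needed_tb(200, 20))

    # Assumed mix: 120 TB of progressive-incremental filesystem data at 5:1,
    # plus 80 TB of database full/incremental data at 20:1.
    mix = [(120, 5), (80, 20)]
    print(round(blended_ratio(mix), 2))
```

The point of the blended calculation is that the low-ratio filesystem data dominates the disk you actually have to buy, so a shop that is mostly progressive incrementals should size from its own mix rather than from the headline ratio.
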
I believe Sun is selling Falconstor's de-dupe, and HDS is definitely selling Diligent's Protectier. IBM hasn't let me know what they're doing yet, and EMC is still saying they're going to write their own. HP's VTL (provided by SEPATON) doesn't yet offer SEPATON's de-dupe feature. NetApp's VTL doesn't yet offer de-dupe. That means that those same large shops that I'm saying should use de-dupe won't use it because it's not available from their OEM. (As I said, you're fine if you use HDS or Sun, but not if you want to buy it from EMC, HP, or IBM -- yet.) Bummer.

I'm pretty bullish on de-dupe and I think it's ready for prime time as long as everything is also on tape. (The copies you're creating for offsite DR will do.) It solves a lot of problems. It reduces acquisition cost (by a factor of as much as 20:1) and reduces power and cooling cost by the same factor. And as long as everything is also on tape for DR, you've got a risk mitigation copy in case you picked the wrong de-dupe product and it completely goes toes-up on you.

So, I'd say that your idea is totally implementable in large shops if they use de-dupe. Otherwise, we're talking way too much disk when you consider that most people have 10 GB on tape for every 1 GB they have on disk.

>Direct the client backups directly to the virtual tapes, instead of
>going to disk storage pool.

Again, I think this will work fine in many environments. Our large TSM customers have hundreds (or thousands) of clients backing up simultaneously to their disk pools. You can't define 500 or 1000 virtual tape drives, and you wouldn't want to if you could. So these customers would have an issue implementing your suggestion.

>This will save you hours of time in
>the schedule not having to migrate from disk to tape.

Again, if you can do it, I agree. Many people can't, unfortunately.

>There is no particular advantage to collocating storage pools in a
>virtual tape environment.

I'm not sure I'm sold on this.
I'm not saying I disagree; I'm just not sold. My fellow consultants and I have discussed this ad nauseam. My experience is that mounting a virtual tape still takes a finite amount of time, and when you multiply that finite amount by the number of tape mounts you may experience in a completely uncollocated world, it may add up to a significant amount of time. (Again, this may be a much bigger deal in larger environments.) I would have to do a test on a couple of hundred tape mounts in TSM to see how much time that would really take before I can come to a decision here. Have you done that? I can tell you I did it in NetBackup and NetWorker, and I was amazed at how long it took to mount a virtual tape.

>By turning off collocation, you can get better overal utilization
>of the disk space in the CDL.

I'm really curious about this one. It's not for the same reason as tape, right? ...where you waste 380 GB of a 400 GB tape if you've got a 20 GB client? Most VTLs that I'm aware of only use up as much space as you use. IOW, if you have a 400 GB virtual tape but only send 20 GB to it, you only use 20 GB of the VTL.

Now the area where I can potentially see savings is that you don't typically end up doing much reclamation on a collocated tape. You don't reclaim it because if it's not full, there's no point. But if you combine my first thought on this with this thought, you would end up using more disk with collocated, non-reclaimed tapes. Perhaps, then, you should consider doing reclamation against those collocated tapes. If I'm right and hundreds of tape mounts take hundreds of minutes, then maybe restore performance should take precedence over tape utilization -- just like in the real tape world.

>If your primary storage pool is on a CDL, set the
>reclamation threshold at 50% (or whatever you prefer) and leave it
>there.

I see two potential areas for concern in large environments.
The first is that if you allow reclamation to run while backups are going on, reclamation is reading your VTL disks while the backups are writing to them. That creates disk contention that may hurt the performance of your backups. The second potential area for concern is contention for the TSM database. Backups create quite a bit of activity in the database, and adding reclamation activity while backups are running will create additional updates and queries, possibly pushing you to a point where the database can't keep up with all that activity and your backups slow down.

These two reasons are why we generally advise TSM environments to disable expiration and reclamation while backups are going on. We advise them to follow the typical serial TSM schedule: back up to the disk pool, create the DR copy by copying the disk pool to tape, migrate the disk pool to tape, back up the TSM database, run expiration, and then run reclamation. (I like to back up the TSM database before and after expiration/reclamation if I can get away with it.)
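
The serial schedule above can be sketched as a simple timeline, which makes it easy to see whether the whole chain fits in a day and to confirm that reclamation never overlaps the backup window. This is a minimal sketch, not a TSM feature: the step names follow the order described above, but every duration and the start time are made-up placeholders, and `build_schedule` and `overlaps` are hypothetical helpers.

```python
from datetime import datetime, timedelta

# Steps in the serial order described above.
# Durations (hours) are illustrative assumptions, not measurements.
SERIAL_STEPS = [
    ("client backups to disk pool", 8.0),
    ("copy disk pool to tape (DR copy)", 3.0),
    ("migrate disk pool to tape", 3.0),
    ("TSM database backup", 1.0),
    ("expiration", 2.0),
    ("reclamation", 4.0),
]


def build_schedule(start, steps):
    """Return (name, begin, end) tuples; each step begins when the previous one ends."""
    schedule = []
    cursor = start
    for name, hours in steps:
        end = cursor + timedelta(hours=hours)
        schedule.append((name, cursor, end))
        cursor = end
    return schedule


def overlaps(a, b):
    """True if two (name, begin, end) entries share any time."""
    return a[1] < b[2] and b[1] < a[2]


if __name__ == "__main__":
    # Arbitrary start: backups kick off at 20:00.
    plan = build_schedule(datetime(2007, 1, 1, 20, 0), SERIAL_STEPS)
    for name, begin, end in plan:
        print(f"{begin:%a %H:%M} - {end:%a %H:%M}  {name}")

    total = plan[-1][2] - plan[0][1]
    print("total elapsed:", total)
    print("fits in 24 hours:", total <= timedelta(hours=24))
```

Plugging in your own measured durations shows quickly how much slack the serial approach leaves; if the total runs past 24 hours, that is the point where you are forced to overlap something with backups and the contention concerns above start to apply.
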