Hi TSM/SP-ers,

We have been struggling with the performance of our TSM servers for months now.
We have been running several servers with hardware (Data Domain) dedup for years
without any problems, but on our new servers with directory container pools
performance is really, really bad.
The servers and storage are designed according to the Blueprints and they
work fine as long as you do not add large database (Oracle and SAP) clients
to them. As soon as you do, the overall server performance becomes very bad:
client and admin session initiation takes 20 to 40 seconds, SQL queries run for
minutes where they should take a few seconds, and a query stgpool sometimes
takes more than a minute to respond!
I have two cases open for this. In one case we focused a lot on the OS and disk
performance, but during that process I noticed that the poor performance is most
likely caused by the way TSM processes large (few hundred MB) files. I
performed a large number of tests and came to the conclusion that it takes TSM
a huge amount of time to delete large deduplicated files, both in container
pools and in deduplicated file pools. As a test I use a TDP for Oracle client
which uses a backup piece size of 900 MB. The client contains about 5000 files.
Deleting the files from a container pool takes more than an hour. When I run
a delete object for the files individually, I see that most files take more than
a second(!) to delete. If I put that same data in a non-deduplicated file pool,
a delete filespace takes about 15 seconds...
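A back-of-the-envelope check of these timings, as a minimal sketch only: it
assumes exactly 5000 files and treats one second per file as a lower bound, so
the real gap is probably larger. The function names are purely illustrative and
are not part of any TSM tooling.

```python
# Rough arithmetic on the delete timings quoted above (illustrative only).
FILES = 5000                    # approximate number of backup pieces on the test client
DEDUP_SECONDS_PER_FILE = 1.0    # lower bound: most files take more than a second
NONDEDUP_TOTAL_SECONDS = 15     # delete filespace on a non-deduplicated file pool


def dedup_total_minutes(files=FILES, per_file=DEDUP_SECONDS_PER_FILE):
    """Minimum total delete time in minutes for the deduplicated pool."""
    return files * per_file / 60


def nondedup_ms_per_file(files=FILES, total=NONDEDUP_TOTAL_SECONDS):
    """Average delete time per file in milliseconds for the non-deduplicated pool."""
    return total * 1000 / files


def slowdown_factor(files=FILES, per_file=DEDUP_SECONDS_PER_FILE,
                    total=NONDEDUP_TOTAL_SECONDS):
    """How many times slower the deduplicated delete is, at minimum."""
    return files * per_file / total


if __name__ == "__main__":
    print(f"dedup delete, minimum: {dedup_total_minutes():.0f} minutes")
    print(f"non-dedup delete:      {nondedup_ms_per_file():.0f} ms per file")
    print(f"slowdown, at least:    {slowdown_factor():.0f}x")
```

At one second per file, 5000 files already add up to well over an hour, which
matches the container pool result, while the non-deduplicated pool averages only
a few milliseconds per file.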
The main issue is that the TDP clients are doing the exact same thing: as soon
as a backup file is no longer needed, it is removed from the RMAN catalog and
deleted from TSM. Since we have several huge database clients (multiple TBs
each), these Oracle delete jobs tend to run for hours. These delete jobs also
seem to slow each other down, because when several of them are running at the
same time, they become even slower. At this point I have one server where these
jobs are running 24 hours per day! This server is at the moment the worst
performing TSM server I have ever seen. On the other container pool servers I
was able to move the Oracle and SAP servers away to the old servers (the ones
with the Data Domain), but on this one I can't because of Data Domain capacity
reasons.
For this file deletion performance I also have a case open, but there is
absolutely no progress. I showed IBM how bad the performance is and I even
offered them a copy of our database so they can see for themselves, but there
is only silence from development...
One thing I do not understand: I find it very hard to believe that we are the
only ones suffering from this issue. There must be dozens of TSM users out there
that back up large databases to TSM container pools?

Kind regards,
Eric van Loon
Air France/KLM Storage & Backup
