Re: Frustrated by slowness in TSM 6.2

2010-11-18 Thread Josh Davis
Maybe it's something simple, like verifying that TCPWINDOWSIZE on the
receiving side is 2x TCPBUFFSIZE on the sender.
Set COMPRESSION NO to make sure you're not misreading retransmits.
Watch topas during a local backup of the server to itself.
Check nmon's disk stats during the backup to see if you've got a hot LUN.
Check the same from any other disk performance monitoring you have.
Check errpt.
Check your DB2 logs for any sort of errors.
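For reference, those first two knobs live in the client dsm.sys (TCPWINDOWSIZE also exists in dsmserv.opt on the server side). The values below are illustrative only, not recommendations:

```
* dsm.sys stanza excerpt -- example values, tune for your own network
COMPRESSION    NO      * rule out compression while measuring
TCPBUFFSIZE    512     * sender-side TCP buffer, in KB
TCPWINDOWSIZE  1024    * receiver window, ~2x the sender's TCPBUFFSIZE
```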

With the XIV, streaming throughput should be fine.  It's only the IOPS
that will be weak.  Your physical limit would be around 16k IOPS, though
you have on-disk cache and write combining, as well as the 120GB of
controller cache (8GB x 15 modules).  You could run into some back-side
10GE saturation if your LUN pathing isn't well balanced.
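As a rough sanity check on that ~16k figure (the per-drive IOPS and drive count below are generic ballpark assumptions, not XIV specifications):

```python
# Back-of-envelope IOPS ceiling for a full rack of 7.2k RPM SATA drives.
IOPS_PER_SATA_DRIVE = 90   # typical random small-block figure for 7.2k SATA
DRIVE_COUNT = 180          # fully populated XIV Gen2 rack

print(IOPS_PER_SATA_DRIVE * DRIVE_COUNT)  # -> 16200, i.e. roughly 16k
```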

VIO servers also have some limitations.  If you're using VIO MPIO, are
you set for round robin at every stage?  By default, you'll be
active/passive between the two vscsi adapters, with whatever you're
doing for load balancing on the VIO servers behind that.
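Checking that on AIX might look like the following (hdisk0 is a placeholder for your own device names):

```
# Path-selection algorithm on a client hdisk (fail_over = active/passive)
lsattr -El hdisk0 -a algorithm

# Change it to round_robin; -P defers the change until the next reboot
chdev -l hdisk0 -a algorithm=round_robin -P

# List the paths behind the disk and their status
lspath -l hdisk0
```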

Also, the VIO servers will need CPU to drive IOPS.  Check topas on the VIO
servers during your tests.

NPIV is preferred for lower latency through the VIO server; plus, you
can run 4-path multipathing with load balancing on the client rather
than having the VIO server(s) muddle through.

---
Sincerely,
Josh-Daniel S. Davis

On Fri, Oct 8, 2010 at 11:27, Andrew Carlson  wrote:

> Hi all
>
> I am running TSM 6.2.1.1 on AIX V615 in an LPAR on a P770.  The LPAR
> has 6 shared CPUs, 12 virtual CPUs, and 64GB of memory.  There are 2
> VIO servers with 4 Fibre Channel connections to XIV storage for the DB
> and LOG, and 2 10Gbit Ethernet in each VIO in an Etherchannel
> configuration.  The storage pool is on Data Domain DD880s, 2 per AIX,
> 1 per instance.
>
> I am seeing consistently poor performance from this setup.  I have
> tested the network from VIO to cloud, and LPAR to VIO, which seems fine.
> I tested LPAR to Data Domain, and things seem fine.  But when backups
> are running (and I only have a few nodes there yet; this is a new
> setup), TSM doesn't seem to want to go over 20 to 30MB/s throughput.
> I tried backing up the TSM server over loopback, and that was a little
> better at 50MB/s, but not screaming.  I tried using a chunk of SAN as a
> disk pool ahead of the Data Domain, with no change.  I am at my wit's end.
>
> If anyone has any ideas, please let me know.  Thanks.
>
> --
> Andy Carlson
> ---
> Gamecube:$150,PSO:$50,Broadband Adapter: $35, Hunters License: $8.95/month,
> The feeling of seeing the red box with the item you want in it:Priceless.


Re: Frustrated by slowness in TSM 6.2

2010-10-09 Thread John D. Schneider
Andrew,
   The crowd may be right, and the XIV may be your bottleneck for the
DB, but I wouldn't focus on that.  In your test environment, with only a
small number of backups running at once, there probably isn't all that
much database traffic generated, is there?  And not many database reads,
if much of your database should fit in memory.  Database writes should
be going to cache in the XIV, if it is as lightly loaded as you say, so
I don't see that as much of a bottleneck when only a few clients are
getting backed up.
   What kind of client backups are you testing?  Are they large file
database backups? Those can generate very good I/O throughput, because
the client is sending the data as fast as possible.  Or incremental
filesystem backups on Windows servers?  Those can generate very poor I/O
throughput, if they have to examine thousands of files for each file
that needs to be sent to the server.  Can you say with assurance that
the clients themselves are able to send more than 20-30MB/sec?
   Do you know what performance those same clients get when they back up
to your production environment?  Try backing them up to the production
environment at some time of night when the TSM server is not maxed out.
Use that as a known starting point.  If you just want to test
throughput, and don't care about anything else:

1) Turn off client compression, if it is on.
2) Do "selective" backups of the whole filesystem, so the clients send
everything without having to make any time-consuming decisions about
what gets sent. 
3) Pick a time for the test when the client is very lightly loaded.
4) Try to pick a client with a small number of very large (multi-GB)
files, not zillions of small files.
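A throughput-only smoke test along those lines could be run from the client command line like this (the filespec is illustrative; point it at a directory of large files on your own client):

```
dsmc selective "/bigfiles/*" -subdir=yes
```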

Andrew, I know you already know these things, but I include them for the
benefit of the rest of the list.  The point I am making is to let the
TSM client shove data across as fast as it can; if it performs
really well, then the device that is absorbing all that incoming data
(the Data Domain, or other disk storage pool) is performing well.  If
another client is sending zillions of files but performing very slowly,
maybe that client is creating a lot more traffic to the database, and
that is where your bottleneck is.  In other words, different clients
can be used to show what part of the TSM server is the slowest performer.


Best Regards,

John D. Schneider
The Computer Coaching Community, LLC
Office: (314) 635-5424 / Toll Free: (866) 796-9226
Cell: (314) 750-8721



---- Original Message ----
Subject: Re: [ADSM-L] Frustrated by slowness in TSM 6.2
From: Paul Zarnowski 
Date: Fri, October 08, 2010 11:37 pm
To: ADSM-L@VM.MARIST.EDU

Rick,

I think their response would be something along these lines...
The XIV can perform better than other traditional arrays because the
[cache miss] I/Os are spread across so many more spindles. I get that.
But it seems to me that that can break down when the overall I/O load
gets sufficiently high, across all of the spindles. In an I/O
intensive environment such as TSM, I think this could be more likely
to happen - particularly if you are using XIV for storage pools as
well as for database volumes.

I'm still skeptical about how far it can go. I can buy that it has
good performance --- for a SATA-based product. But not compared to a
pure 15K spindle-based product. Oh, and the SATA drives are larger
than the SAS or FC drives, which doesn't help.

..Paul

At 01:57 PM 10/8/2010, Richard Rhodes wrote:
>> I would be suspicious of having the db on XIV. Do you have any FC
>> or SAS Disk you could try putting the DB on? I know XIV has lots
>> of CPU & cache, but underneath it all is still SATA. I've heard
>> Marketing types rave about how fast XIV is, even with SATA,
>> because I/O can be spread across many spindles, but I'm not
>> entirely convinced it's as good as 15k FC or SAS.
>
>This is _exactly_ what IBM has not, and seems unwilling to, explain.
>
>Soon after IBM finalized the purchase of XIV, they had a series
>of seminars around the country (usa) about the box. This wasn't some
>little out of the way seminar . . . Moshe (inventor of the box)
>was there and gave much of the presentation. I attended one - let's
>just say it was strange!!! They hammered on "high performance", over
>and over. They threw up one graph where they claimed 25k IOPS at
>3ms response time for a "cache miss" workload. Let's see: cache miss
>means having to go to the spindle to do the I/O. SATA drives come
>nowhere close to this response time. The workload was either
>not cache miss, or, they effectively short-stroked the drive such
>that the heads never moved. When I questioned this claim I
>got nowhere - just run-around.
>
>Rick
>
>
>

Re: Frustrated by slowness in TSM 6.2

2010-10-08 Thread Paul Zarnowski
Rick,

I think their response would be something along these lines...
The XIV can perform better than other traditional arrays because the
[cache miss] I/Os are spread across so many more spindles.  I get that.
But it seems to me that that can break down when the overall I/O load
gets sufficiently high, across all of the spindles.  In an I/O
intensive environment such as TSM, I think this could be more likely
to happen - particularly if you are using XIV for storage pools as
well as for database volumes.

I'm still skeptical about how far it can go.  I can buy that it has
good performance --- for a SATA-based product.  But not compared to a
pure 15K spindle-based product.  Oh, and the SATA drives are larger
than the SAS or FC drives, which doesn't help.

..Paul

At 01:57 PM 10/8/2010, Richard Rhodes wrote:
>> I would be suspicious of having the db on XIV. Do you have any FC
>> or SAS Disk you could try putting the DB on?  I know XIV has lots
>> of CPU & cache, but underneath it all is still SATA. I've heard
>> Marketing types rave about how fast XIV is, even with SATA,
>> because I/O can be spread across many spindles, but I'm not
>> entirely convinced it's as good as 15k FC or SAS.
>
>This is _exactly_ what IBM has not, and seems unwilling to, explain.
>
>Soon after IBM finalized the purchase of XIV, they had a series
>of seminars around the country (usa) about the box. This wasn't some
>little out of the way seminar . . . Moshe (inventor of the box)
>was there and gave much of the presentation.  I attended one - let's
>just say it was strange!!!  They hammered on "high performance", over
>and over.  They threw up one graph where they claimed 25k IOPS at
>3ms response time for a "cache miss" workload.  Let's see: cache miss
>means having to go to the spindle to do the I/O.  SATA drives come
>nowhere close to this response time.  The workload was either
>not cache miss, or, they effectively short-stroked the drive such
>that the heads never moved.  When I questioned this claim I
>got nowhere - just run-around.
>
>Rick
>
>
>


--
Paul Zarnowski                          Ph: 607-255-4757
Manager, Storage Services               Fx: 607-255-8521
719 Rhodes Hall, Ithaca, NY 14853-3801  Em: p...@cornell.edu


Re: Frustrated by slowness in TSM 6.2

2010-10-08 Thread Richard Rhodes
> I would be suspicious of having the db on XIV. Do you have any FC
> or SAS Disk you could try putting the DB on?  I know XIV has lots
> of CPU & cache, but underneath it all is still SATA. I've heard
> Marketing types rave about how fast XIV is, even with SATA,
> because I/O can be spread across many spindles, but I'm not
> entirely convinced it's as good as 15k FC or SAS.

This is _exactly_ what IBM has not, and seems unwilling to, explain.

Soon after IBM finalized the purchase of XIV, they had a series
of seminars around the country (usa) about the box. This wasn't some
little out of the way seminar . . . Moshe (inventor of the box)
was there and gave much of the presentation.  I attended one - let's
just say it was strange!!!  They hammered on "high performance", over
and over.  They threw up one graph where they claimed 25k IOPS at
3ms response time for a "cache miss" workload.  Let's see: cache miss
means having to go to the spindle to do the I/O.  SATA drives come
nowhere close to this response time.  The workload was either
not cache miss, or, they effectively short-stroked the drive such
that the heads never moved.  When I questioned this claim I
got nowhere - just run-around.
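Rick's skepticism can be put in rough numbers (typical 7.2k-RPM drive figures, assumed for illustration, not measurements from the XIV):

```python
# Average random-read service time for a single 7.2k RPM SATA drive.
rpm = 7200
half_rotation_ms = 0.5 * 60_000 / rpm   # avg rotational latency ~= 4.17 ms
avg_seek_ms = 8.5                       # typical average seek for 7.2k SATA

# Each cache miss must wait for seek + rotation before the transfer starts.
print(round(half_rotation_ms + avg_seek_ms, 2))  # ~12.67 ms, far above 3 ms
```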

Rick



-
The information contained in this message is intended only for the
personal and confidential use of the recipient(s) named above. If
the reader of this message is not the intended recipient or an
agent responsible for delivering it to the intended recipient, you
are hereby notified that you have received this document in error
and that any review, dissemination, distribution, or copying of
this message is strictly prohibited. If you have received this
communication in error, please notify us immediately, and delete
the original message.


Re: Frustrated by slowness in TSM 6.2

2010-10-08 Thread Andrew Carlson
It is a lightly loaded XIV, and the disk system does not seem to be
under pressure unless I force it with dd or something in testing, but I
will check it out.

Any other ideas out there?

On Fri, Oct 8, 2010 at 11:37 AM, Paul Zarnowski  wrote:
> I would be suspicious of having the db on XIV. Do you have any FC or SAS Disk 
> you could try putting the DB on?  I know XIV has lots of CPU & cache, but 
> underneath it all is still SATA. I've heard Marketing types rave about how 
> fast XIV is, even with SATA, because I/O can be spread across many spindles, 
> but I'm not entirely convinced it's as good as 15k FC or SAS.
>
> ..Paul
>
>
> On Oct 8, 2010, at 12:27 PM, "Andrew Carlson"  wrote:
>
>> Hi all
>>
>> I am running TSM 6.2.1.1 on AIX V615 in an LPAR on a P770.  The LPAR
>> has 6 shared CPUs, 12 virtual CPUs, and 64GB of memory.  There are 2
>> VIO servers with 4 Fibre Channel connections to XIV storage for the DB
>> and LOG, and 2 10Gbit Ethernet in each VIO in an Etherchannel
>> configuration.  The storage pool is on Data Domain DD880s, 2 per AIX,
>> 1 per instance.
>>
>> I am seeing consistently poor performance from this setup.  I have
>> tested the network from VIO to cloud, and LPAR to VIO, which seems fine.
>> I tested LPAR to Data Domain, and things seem fine.  But when backups
>> are running (and I only have a few nodes there yet; this is a new
>> setup), TSM doesn't seem to want to go over 20 to 30MB/s throughput.
>> I tried backing up the TSM server over loopback, and that was a little
>> better at 50MB/s, but not screaming.  I tried using a chunk of SAN as a
>> disk pool ahead of the Data Domain, with no change.  I am at my wit's end.
>>
>> If anyone has any ideas, please let me know.  Thanks.
>>
>> --
>> Andy Carlson
>> ---
>> Gamecube:$150,PSO:$50,Broadband Adapter: $35, Hunters License: $8.95/month,
>> The feeling of seeing the red box with the item you want in it:Priceless.
>



-- 
Andy Carlson
---
Gamecube:$150,PSO:$50,Broadband Adapter: $35, Hunters License: $8.95/month,
The feeling of seeing the red box with the item you want in it:Priceless.


Re: Frustrated by slowness in TSM 6.2

2010-10-08 Thread Paul Zarnowski
I would be suspicious of having the db on XIV. Do you have any FC or SAS Disk 
you could try putting the DB on?  I know XIV has lots of CPU & cache, but 
underneath it all is still SATA. I've heard Marketing types rave about how fast 
XIV is, even with SATA, because I/O can be spread across many spindles, but I'm 
not entirely convinced it's as good as 15k FC or SAS. 

..Paul


On Oct 8, 2010, at 12:27 PM, "Andrew Carlson"  wrote:

> Hi all
> 
> I am running TSM 6.2.1.1 on AIX V615 in an LPAR on a P770.  The LPAR
> has 6 shared CPUs, 12 virtual CPUs, and 64GB of memory.  There are 2
> VIO servers with 4 Fibre Channel connections to XIV storage for the DB
> and LOG, and 2 10Gbit Ethernet in each VIO in an Etherchannel
> configuration.  The storage pool is on Data Domain DD880s, 2 per AIX,
> 1 per instance.
> 
> I am seeing consistently poor performance from this setup.  I have
> tested the network from VIO to cloud, and LPAR to VIO, which seems fine.
> I tested LPAR to Data Domain, and things seem fine.  But when backups
> are running (and I only have a few nodes there yet; this is a new
> setup), TSM doesn't seem to want to go over 20 to 30MB/s throughput.
> I tried backing up the TSM server over loopback, and that was a little
> better at 50MB/s, but not screaming.  I tried using a chunk of SAN as a
> disk pool ahead of the Data Domain, with no change.  I am at my wit's end.
> 
> If anyone has any ideas, please let me know.  Thanks.
> 
> --
> Andy Carlson
> ---
> Gamecube:$150,PSO:$50,Broadband Adapter: $35, Hunters License: $8.95/month,
> The feeling of seeing the red box with the item you want in it:Priceless.


Frustrated by slowness in TSM 6.2

2010-10-08 Thread Andrew Carlson
Hi all

I am running TSM 6.2.1.1 on AIX V615 in an LPAR on a P770.  The LPAR
has 6 shared CPUs, 12 virtual CPUs, and 64GB of memory.  There are 2
VIO servers with 4 Fibre Channel connections to XIV storage for the DB
and LOG, and 2 10Gbit Ethernet in each VIO in an Etherchannel
configuration.  The storage pool is on Data Domain DD880s, 2 per AIX,
1 per instance.

I am seeing consistently poor performance from this setup.  I have
tested the network from VIO to cloud, and LPAR to VIO, which seems fine.
I tested LPAR to Data Domain, and things seem fine.  But when backups
are running (and I only have a few nodes there yet; this is a new
setup), TSM doesn't seem to want to go over 20 to 30MB/s throughput.
I tried backing up the TSM server over loopback, and that was a little
better at 50MB/s, but not screaming.  I tried using a chunk of SAN as a
disk pool ahead of the Data Domain, with no change.  I am at my wit's end.

If anyone has any ideas, please let me know.  Thanks.

--
Andy Carlson
---
Gamecube:$150,PSO:$50,Broadband Adapter: $35, Hunters License: $8.95/month,
The feeling of seeing the red box with the item you want in it:Priceless.