Re: Large Linux clients

2005-04-01 Thread Henk ten Have
An old trick I used for many years:
to investigate a problem filesystem, run a find across that filesystem.
If the find dies, TSM will definitely die too.
I'll bet your find will die, and that's why your backup dies/hangs as well.
A find stats every file and directory, which is essentially what the backup's
scan does.
So your issue is OS-related, not TSM.
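A minimal sketch of that test (using the /coyote/dsk3 filesystem from this
thread): the -ls forces a stat of every object, much like the backup's scan
phase does.
  find /coyote/dsk3 -xdev -ls > /dev/null
If that find hangs, segfaults, or reports I/O errors, the problem sits in the
filesystem or kernel rather than in the TSM client.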

Cheers
Henk ()

On Tuesday 29 March 2005 12:11, you wrote:
 On Mar 29, 2005, at 12:37 PM, Zoltan Forray/AC/VCU wrote:
  ...However, when I try to backup the tree at the third-level (e.g.
  /coyote/dsk3/), the client pretty much seizes immediately and dsmerror.log
  says B/A Txn Producer Thread, fatal error, Signal 11.  The server shows
  the session as SendW and nothing else going on

 Zoltan -

 Signal 11 is a segfault - a software failure.
 The client programming has a defect, which may be incited by a problem
 in that area of the file system (so have that investigated). A segfault
 can be induced by memory constraint, which in this context would most
 likely be Unix Resource Limits, so also enter the command 'limit' in
 Linux csh or tcsh and potentially boost the stack size ('unlimit
 stacksize'). This is to say that the client was probably invoked under
 artificially limited environmentals.

 Richard Sims


Re: Large Linux clients

2005-04-01 Thread Zoltan Forray/AC/VCU
Thanks for the suggestion.   However, that is not the case.  We already tried
this.

We did a find . | wc -l to get the object count (1.1M) with no problems.
But the backup still will not work. It constantly fails, in
unpredictable/inconsistent places, with the same Producer Thread error.

I spent 2+ days drilling through the various sub-directories (of this
directory that causes the failures), one-by-one, and was able to backup 38
of the 40 subdirs, totalling over 980K objects, without a problem.  When
I included these two other directories, in the same pile, the backup would
fail.

When I then went back and individually selected the sub-sub directories of
these sub-directories (one at a time), I was able to backup *ALL* of the
sub-sub directories, no problem.  Then I went back and selected the
upper-level directory and backed it up, no problem.

Let me draw a picture of the structure of these directories.

The problem directories are in this directory:
/coyote/dsk3/patients/prostateReOpt/Mount_0/ .

If I try to backup /Mount_0/ as a whole, it crashes every time.   If I
point to sub-dirs below /Mount_0/ (40 of these, all with the same named
four sub-sub dirs), two of these cause a crash. I noted that these two both
have 72K objects while the other 38 have less than 60K objects.

Yet when I manually picked the four sub-sub dirs of the Patient_172 dir, the
backup worked (sort of - see below). Same for Patient_173.

To really drive me crazy, on the first attempt at backing up one of the
sub-sub dirs under Patient_172, the backup crashed. Yet I could backup the
other 3 with no issue. So, we started looking at the problem subdir and
noticed a weird file name that ended in a tilde (~).  When I excluded it,
the backup ran. Then when I went back and picked just the file with the
tilde, it backed up fine (my head is getting balder-and-balder !!).  I
then went back and re-selected the whole Patient_172 directory and it
backed up (or at least scanned it, since everything was already backed up)
just fine !!!  AGGH !!

This is maddening and shows no rhyme or reason.




Henk ten Have [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
04/01/2005 08:29 AM
Please respond to
ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU


To
ADSM-L@VM.MARIST.EDU
cc

Subject
Re: [ADSM-L] Large Linux clients






An old trick I used for many years:
to investigate a problem filesystem, run a find across that filesystem.
If the find dies, TSM will definitely die too.
I'll bet your find will die, and that's why your backup dies/hangs as well.
A find stats every file and directory, which is essentially what the backup's
scan does.
So your issue is OS-related, not TSM.

Cheers
Henk ()

On Tuesday 29 March 2005 12:11, you wrote:
 On Mar 29, 2005, at 12:37 PM, Zoltan Forray/AC/VCU wrote:
  ...However, when I try to backup the tree at the third-level (e.g.
  /coyote/dsk3/), the client pretty much seizes immediately and dsmerror.log
  says B/A Txn Producer Thread, fatal error, Signal 11.  The server shows
  the session as SendW and nothing else going on

 Zoltan -

 Signal 11 is a segfault - a software failure.
 The client programming has a defect, which may be incited by a problem
 in that area of the file system (so have that investigated). A segfault
 can be induced by memory constraint, which in this context would most
 likely be Unix Resource Limits, so also enter the command 'limit' in
 Linux csh or tcsh and potentially boost the stack size ('unlimit
 stacksize'). This is to say that the client was probably invoked under
 artificially limited environmentals.

 Richard Sims


Re: Large Linux clients

2005-04-01 Thread Ben Bullock
Ya, 
Sorry, I have no answers for you, but you do have my sympathy.

I've had to do that kind of detective work before. Sometimes it
is an oddly named file, a very, very long-named file, or sometimes it's
a file that somehow got a very bizarre date, like Apr 15  1904. In a
few cases it has also been hung NFS mounts somewhere in the path.

I've had to drill down into each of the subdirs one after another just
like you did to figure it out, because there was no filename or other
hint in the schedule or error logs, just a generic failed message.

Luckily I only have to do it about once or twice a year, but it
is time consuming.

 Ben


-Original Message-
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of
Zoltan Forray/AC/VCU
Sent: Friday, April 01, 2005 9:03 AM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: Large Linux clients

Thanks for the suggestion.   However, that is not the case.  We already
tried this.

We did a find . | wc -l to get the object count (1.1M) with no problems.
But the backup still will not work. It constantly fails, in
unpredictable/inconsistent places, with the same Producer Thread
error.

I spent 2+ days drilling through the various sub-directories (of this
directory that causes the failures), one-by-one, and was able to backup
38 of the 40 subdirs, totalling over 980K objects, without a problem.
When I included these two other directories, in the same pile, the
backup would fail.

When I then went back and individually selected the sub-sub directories
of these sub-directories (one at a time), I was able to backup *ALL* of
the sub-sub directories, no problem.  Then I went back and selected the
upper-level directory and backed it up, no problem.

Let me draw a picture of the structure of these directories.

The problem directories are in this directory:
/coyote/dsk3/patients/prostateReOpt/Mount_0/ .

If I try to backup /Mount_0/ as a whole, it crashes every time.   If I
point to sub-dirs below /Mount_0/ (40 of these, all with the same named
four sub-sub dirs), two of these cause a crash. I noted that these two both
have 72K objects while the other 38 have less than 60K objects.

Yet when I manually picked the four sub-sub dirs of the Patient_172 dir, the
backup worked (sort of - see below). Same for Patient_173.

To really drive me crazy, on the first attempt at backing up one of the
sub-sub dirs under Patient_172, the backup crashed. Yet I could backup
the other 3 with no issue. So, we started looking at the problem subdir
and noticed a weird file name that ended in a tilde (~).  When I
excluded it, the backup ran. Then when I went back and picked just the
file with the tilde, it backed up fine (my head is getting
balder-and-balder !!).  I then went back and re-selected the whole
Patient_172 directory and it backed up (or at least scanned it, since
everything was already backed up) just fine !!!
AGGH !!

This is maddening and shows no rhyme or reason.




Henk ten Have [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
04/01/2005 08:29 AM
Please respond to
ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU


To
ADSM-L@VM.MARIST.EDU
cc

Subject
Re: [ADSM-L] Large Linux clients






An old trick I used for many years:
to investigate a problem filesystem, run a find across that filesystem.
If the find dies, TSM will definitely die too.
I'll bet your find will die, and that's why your backup dies/hangs as well.
A find stats every file and directory, which is essentially what the backup's
scan does.
So your issue is OS-related, not TSM.

Cheers
Henk ()

On Tuesday 29 March 2005 12:11, you wrote:
 On Mar 29, 2005, at 12:37 PM, Zoltan Forray/AC/VCU wrote:
  ...However, when I try to backup the tree at the third-level (e.g.
  /coyote/dsk3/), the client pretty much seizes immediately and
  dsmerror.log says B/A Txn Producer Thread, fatal error, Signal 11.
  The server shows the session as SendW and nothing else going on

 Zoltan -

 Signal 11 is a segfault - a software failure.
 The client programming has a defect, which may be incited by a problem

 in that area of the file system (so have that investigated). A 
 segfault can be induced by memory constraint, which in this context 
 would most likely be Unix Resource Limits, so also enter the command 
 'limit' in Linux csh or tcsh and potentially boost the stack size 
 ('unlimit stacksize'). This is to say that the client was probably 
 invoked under artificially limited environmentals.

 Richard Sims


Re: Large Linux clients

2005-03-29 Thread Zoltan Forray/AC/VCU
Thanks for the suggestion.

We have tried it.   Same results.   Things just go to sleep !




Mark D. Rodriguez [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
03/28/2005 05:30 PM
Please respond to
ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU


To
ADSM-L@VM.MARIST.EDU
cc

Subject
Re: [ADSM-L] Large Linux clients






Zoltan,

I am not sure if this will fix the problem or not.  I have seen in the
past when trying to backup directories (including sub-directories) with
a large number of files that the system runs out of memory and either
fails or hangs forever.  The one thing that I have done and that has worked
in some cases is to use the MEMORYEFficientbackup option.  It is a
client-side option and can be placed in the option file or specified on
the command line.  I would try it and see if it helps.  BTW, there is a
downside to this, and that is that backups will be slower; however, slow is
still faster than not at all!

Let us know if that helps.

--
Regards,
Mark D. Rodriguez
President MDR Consulting, Inc.

===
MDR Consulting
The very best in Technical Training and Consulting.
IBM Advanced Business Partner
SAIR Linux and GNU Authorized Center for Education
IBM Certified Advanced Technical Expert, CATE
AIX Support and Performance Tuning, RS6000 SP, TSM/ADSM and Linux
Red Hat Certified Engineer, RHCE
===



Zoltan Forray/AC/VCU wrote:

I am having issues backing up a large Linux server (client=5.2.3.0).

The TSM server is also on a RH Linux box (5.2.2.5).

This system has over 4.6M objects.

A standard incremental WILL NOT complete successfully. It usually
hangs/times-out/etc.

The troubles seem to be related to one particular directory with
40-subdirs, comprising 1.4M objects (from the box owner).

If I point to this directory as a whole (via the web ba-client), and try
to back it up in one shot, it displays the inspecting objects message
and then never comes back.

If I drill down further and select the subdirs in groups of 10, it seems
to back them up, with no problem.

So, one question I have is, anyone out there backing up large Linux
systems, similar to this ?

Any suggestions on what the problem could be.

Currently, I do not have access to the error-log files since this is a
protected/firewalled system and I don't have the id/pw.





Re: Large Linux clients

2005-03-29 Thread jsiegle
Zoltan,
I had a similar problem on a Windows box with 5.4 million files. Tivoli
said that I couldn't do the backup/restore with a 32-bit client because
each file in the catalog takes 1k and the 32-bit program could only
address 4 GB of memory. Here is a link they gave me:
http://www-1.ibm.com/support/entdocview.wss?rs=0context=SSGSG7q1=1197172uid=swg21197172loc=en_UScs=utf-8lang=NotUpdateReferer=
It doesn't quite address your problem if you are only considering 1.4
million files though. You may want to look into virtualmountpoints.
And ironically enough, MEMORYEF didn't help at all for the backup part.
I'm going to open a problem with Tivoli on this in May when I get the
scenario set up.
Oh and you probably know to check ulimits.
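A rough sanity check on that 1k-per-file figure (assuming it also applies to
the Linux client's scan phase): 5.4 million files x ~1 KB is roughly 5.4 GB,
well past a 32-bit address space; Zoltan's 1.4M-object directory works out to
about 1.4 GB, and the whole 4.6M-object system to about 4.6 GB, so the full
incremental is in the danger zone even if that one directory is not.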
--
Jonathan
Zoltan Forray/AC/VCU wrote:
Thanks for the suggestion.
We have tried it.   Same results.   Things just go to sleep !

Mark D. Rodriguez [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
03/28/2005 05:30 PM
Please respond to
ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
To
ADSM-L@VM.MARIST.EDU
cc
Subject
Re: [ADSM-L] Large Linux clients


Zoltan,
I am not sure if this will fix the problem or not.  I have seen in the
past when trying to backup directories (including sub-directories) with
a large number of files that the system runs out of memory and either
fails or hangs forever.  The one thing that I have done and that has worked
in some cases is to use the MEMORYEFficientbackup option.  It is a
client-side option and can be placed in the option file or specified on
the command line.  I would try it and see if it helps.  BTW, there is a
downside to this, and that is that backups will be slower; however, slow is
still faster than not at all!
Let us know if that helps.
--
Regards,
Mark D. Rodriguez
President MDR Consulting, Inc.
===
MDR Consulting
The very best in Technical Training and Consulting.
IBM Advanced Business Partner
SAIR Linux and GNU Authorized Center for Education
IBM Certified Advanced Technical Expert, CATE
AIX Support and Performance Tuning, RS6000 SP, TSM/ADSM and Linux
Red Hat Certified Engineer, RHCE
===

Zoltan Forray/AC/VCU wrote:

I am having issues backing up a large Linux server (client=5.2.3.0).
The TSM server is also on a RH Linux box (5.2.2.5).
This system has over 4.6M objects.
A standard incremental WILL NOT complete successfully. It usually
hangs/times-out/etc.
The troubles seem to be related to one particular directory with
40-subdirs, comprising 1.4M objects (from the box owner).
If I point to this directory as a whole (via the web ba-client), and try
to back it up in one shot, it displays the inspecting objects message
and then never comes back.
If I drill down further and select the subdirs in groups of 10, it seems
to back them up, with no problem.
So, one question I have is, anyone out there backing up large Linux
systems, similar to this ?
Any suggestions on what the problem could be.
Currently, I do not have access to the error-log files since this is a
protected/firewalled system and I don't have the id/pw.



Re: Large Linux clients

2005-03-29 Thread Remco Post
Zoltan Forray/AC/VCU wrote:
I am having issues backing up a large Linux server (client=5.2.3.0).
The TSM server is also on a RH Linux box (5.2.2.5).
This system has over 4.6M objects.
A standard incremental WILL NOT complete successfully. It usually
hangs/times-out/etc.
The troubles seem to be related to one particular directory with
40-subdirs, comprising 1.4M objects (from the box owner).
If I point to this directory as a whole (via the web ba-client), and try
to back it up in one shot, it displays the inspecting objects message
and then never comes back.
If I drill down further and select the subdirs in groups of 10, it seems
to back them up, with no problem.
So, one question I have is, anyone out there backing up large Linux
systems, similar to this ?
Any suggestions on what the problem could be.
Currently, I do not have access to the error-log files since this is a
protected/firewalled system and I don't have the id/pw.

We have noticed that TSM becomes very inefficient for filesystems with
over 1M files in them. We found that this seems to be a TSM server
database issue. The same server instance performs very well for smaller
filesystems.
Another issue could be the filesystem itself. Some filesystems (ext2)
are very bad at handling very large directories.
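One way to spot such directories (a sketch, assuming a GNU userland and the
/coyote/dsk3 path from this thread) is to count the direct entries in every
directory and sort:
  # count direct entries in each directory, largest counts first
  find /coyote/dsk3 -xdev -type d | while read d; do
    printf '%7d %s\n' "$(ls -1A "$d" | wc -l)" "$d"
  done | sort -rn | head -20
Directories that float to the top with tens of thousands of entries are the
ones both ext2 and the client's directory scan will struggle with.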
--
Met vriendelijke groeten,
Remco Post
SARA - Reken- en Netwerkdiensten  http://www.sara.nl
High Performance Computing  Tel. +31 20 592 3000Fax. +31 20 668 3167
I really didn't foresee the Internet. But then, neither did the
computer industry. Not that that tells us very much of course - the
computer industry didn't even foresee that the century was going to
end. -- Douglas Adams


Re: Large Linux clients

2005-03-29 Thread Andrew Raibeck
First, you should work with whoever owns that system in order to ensure
that you can get the access you need to perform your investigations.

When the backup appears to hang, what does the QUERY SESSION admin
command show for this node?

Is the TSM process consuming any CPU?

Configure the client for a SERVICE trace by adding the following to
dsm.opt:

   TRACEFILE tsmtrace.txt
   TRACEFLAGS SERVICE
   TRACEMAX 20

(for TRACEFILE, specify whatever directory and file name you want, enough
to store a 20 MB file.)

Then use the command line client (dsmc) to run an incremental backup
against the problem file system and wait for the problem to reoccur. Check
the QUERY SESSION output and CPU utilization for dsmc, as I mentioned
above.

You can view the trace file with a text editor, and look for the line that
reads END OF DATA to see the last thing the client was doing. Look
and see if you have any recursive directory structures. Open a problem
with IBM support and provide them with the results of the info I mentioned
above (let me know, too).
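
A sketch of that sequence from the shell (the trace options go in dsm.opt as
shown above; the path is the problem file system from this thread):

   dsmc incremental /coyote/dsk3/

Then, after the hang or crash, see where the client stopped:

   grep -n "END OF DATA" tsmtrace.txt
   tail -50 tsmtrace.txt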

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/[EMAIL PROTECTED]
Internet e-mail: [EMAIL PROTECTED]

The only dumb question is the one that goes unasked.
The command line is your friend.
Good enough is the enemy of excellence.

ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU wrote on 2005-03-28
15:22:05:

 I am having issues backing up a large Linux server (client=5.2.3.0).

 The TSM server is also on a RH Linux box (5.2.2.5).

 This system has over 4.6M objects.

 A standard incremental WILL NOT complete successfully. It usually
 hangs/times-out/etc.

 The troubles seem to be related to one particular directory with
 40-subdirs, comprising 1.4M objects (from the box owner).

 If I point to this directory as a whole (via the web ba-client), and try
 to back it up in one shot, it displays the inspecting objects message
 and then never comes back.

 If I drill down further and select the subdirs in groups of 10, it seems
 to back them up, with no problem.

 So, one question I have is, anyone out there backing up large Linux
 systems, similar to this ?

 Any suggestions on what the problem could be.

 Currently, I do not have access to the error-log files since this is a
 protected/firewalled system and I don't have the id/pw.


Re: Large Linux clients

2005-03-29 Thread Zoltan Forray/AC/VCU
I am going down this path already. I have started doing some instrument
traces. However, the results seem to show nothing when backing up only
specific sub-sub-subdirectories.  What time accumulation there is, is in
Solve Tree and/or Process Dirs. No big surprise there.

However, when I try to backup the tree at the third-level (e.g.
/coyote/dsk3/), the client pretty much seizes immediately and dsmerror.log
says B/A Txn Producer Thread, fatal error, Signal 11.  The server shows
the session as SendW and nothing else going on.

I have tried upping ResourceUtilization to 4 (server side set to 6) and
MemoryEfficientBackup YES, all to no avail.

I am going to try the other TRACE options you specify, to see if I get any
more data.



Andrew Raibeck [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
03/29/2005 12:07 PM
Please respond to
ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU


To
ADSM-L@VM.MARIST.EDU
cc

Subject
Re: [ADSM-L] Large Linux clients






First, you should work with whoever owns that system in order to ensure
that you can get the access you need to perform your investigations.

When the backup appears to hang, what does the QUERY SESSION admin
command show for this node?

Is the TSM process consuming any CPU?

Configure the client for a SERVICE trace by adding the following to
dsm.opt:

   TRACEFILE tsmtrace.txt
   TRACEFLAGS SERVICE
   TRACEMAX 20

(for TRACEFILE, specify whatever directory and file name you want, enough
to store a 20 MB file.)

Then use the command line client (dsmc) to run an incremental backup
against the problem file system and wait for the problem to reoccur. Check
the QUERY SESSION output and CPU utilization for dsmc, as I mentioned
above.

You can view the trace file with a text editor, and look for the line that
reads END OF DATA to see the last thing the client was doing. Look
and see if you have any recursive directory structures. Open a problem
with IBM support and provide them with the results of the info I mentioned
above (let me know, too).

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/[EMAIL PROTECTED]
Internet e-mail: [EMAIL PROTECTED]

The only dumb question is the one that goes unasked.
The command line is your friend.
Good enough is the enemy of excellence.

ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU wrote on 2005-03-28
15:22:05:

 I am having issues backing up a large Linux server (client=5.2.3.0).

 The TSM server is also on a RH Linux box (5.2.2.5).

 This system has over 4.6M objects.

 A standard incremental WILL NOT complete successfully. It usually
 hangs/times-out/etc.

 The troubles seem to be related to one particular directory with
 40-subdirs, comprising 1.4M objects (from the box owner).

 If I point to this directory as a whole (via the web ba-client), and try
 to back it up in one shot, it displays the inspecting objects message
 and then never comes back.

 If I drill down further and select the subdirs in groups of 10, it seems
 to back them up, with no problem.

 So, one question I have is, anyone out there backing up large Linux
 systems, similar to this ?

 Any suggestions on what the problem could be.

 Currently, I do not have access to the error-log files since this is a
 protected/firewalled system and I don't have the id/pw.


Re: Large Linux clients

2005-03-29 Thread Zoltan Forray/AC/VCU
Some more details.

I added the TRACE options you recommended.

While the backup still immediately fails, I got some more information.

The Producer Thread failure now includes the detail:
linux86/psunxthr.cpp (184).   The only hit I get on this message is
APAR IC30292, but the applicable component listed says R42L PSY, which
I read as 4.2?   This client is 5.2.3.0!




Andrew Raibeck [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
03/29/2005 12:07 PM
Please respond to
ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU


To
ADSM-L@VM.MARIST.EDU
cc

Subject
Re: [ADSM-L] Large Linux clients






First, you should work with whoever owns that system in order to ensure
that you can get the access you need to perform your investigations.

When the backup appears to hang, what does the QUERY SESSION admin
command show for this node?

Is the TSM process consuming any CPU?

Configure the client for a SERVICE trace by adding the following to
dsm.opt:

   TRACEFILE tsmtrace.txt
   TRACEFLAGS SERVICE
   TRACEMAX 20

(for TRACEFILE, specify whatever directory and file name you want, enough
to store a 20 MB file.)

Then use the command line client (dsmc) to run an incremental backup
against the problem file system and wait for the problem to reoccur. Check
the QUERY SESSION output and CPU utilization for dsmc, as I mentioned
above.

You can view the trace file with a text editor, and look for the line that
reads END OF DATA to see the last thing the client was doing. Look
and see if you have any recursive directory structures. Open a problem
with IBM support and provide them with the results of the info I mentioned
above (let me know, too).

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/[EMAIL PROTECTED]
Internet e-mail: [EMAIL PROTECTED]

The only dumb question is the one that goes unasked.
The command line is your friend.
Good enough is the enemy of excellence.

ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU wrote on 2005-03-28
15:22:05:

 I am having issues backing up a large Linux server (client=5.2.3.0).

 The TSM server is also on a RH Linux box (5.2.2.5).

 This system has over 4.6M objects.

 A standard incremental WILL NOT complete successfully. It usually
 hangs/times-out/etc.

 The troubles seem to be related to one particular directory with
 40-subdirs, comprising 1.4M objects (from the box owner).

 If I point to this directory as a whole (via the web ba-client), and try
 to back it up in one shot, it displays the inspecting objects message
 and then never comes back.

 If I drill down further and select the subdirs in groups of 10, it seems
 to back them up, with no problem.

 So, one question I have is, anyone out there backing up large Linux
 systems, similar to this ?

 Any suggestions on what the problem could be.

 Currently, I do not have access to the error-log files since this is a
 protected/firewalled system and I don't have the id/pw.


Re: Large Linux clients

2005-03-29 Thread Richard Sims
On Mar 29, 2005, at 12:37 PM, Zoltan Forray/AC/VCU wrote:
...However, when I try to backup the tree at the third-level (e.g.
/coyote/dsk3/), the client pretty much seizes immediately and dsmerror.log
says B/A Txn Producer Thread, fatal error, Signal 11.  The server shows
the session as SendW and nothing else going on
Zoltan -
Signal 11 is a segfault - a software failure.
The client programming has a defect, which may be incited by a problem
in that area of the file system (so have that investigated). A segfault
can be induced by memory constraint, which in this context would most
likely be Unix Resource Limits, so also enter the command 'limit' in
Linux csh or tcsh and potentially boost the stack size ('unlimit
stacksize'). This is to say that the client was probably invoked under
artificially limited environmentals.
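A sketch of that from a Bourne-style shell before starting the client (limit
and unlimit are the csh/tcsh spellings; bash and sh use ulimit):
  ulimit -s              # show the current stack limit in KB
  ulimit -s unlimited    # raise it for this shell and its children
  dsmc incremental /coyote/dsk3/
The raised limit only applies to processes started from that shell, so a
client launched by the scheduler, init, or cron keeps whatever limits that
environment imposed.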
   Richard Sims


Re: Large Linux clients

2005-03-29 Thread Zoltan Forray/AC/VCU
Here ya go. Pretty much no limits. I am open to suggestions on values to
change that might help !

FWIW, this is RH8 as a Beowulf cluster, so NO, I can not upgrade the OS.

Also, while on the subject, I read the requirements on the 5.3.x client,
which say it has only been tested on RH AS 3. Has anyone tried the V5.3
client on RH8?

[EMAIL PROTECTED] root]# ulimit -a
core file size(blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size   (kbytes, -m) unlimited
open files(-n) 1024
pipe size  (512 bytes, -p) 8
stack size(kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes(-u) 4092
virtual memory(kbytes, -v) unlimited



Richard Sims [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
03/29/2005 01:11 PM
Please respond to
ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU


To
ADSM-L@VM.MARIST.EDU
cc

Subject
Re: [ADSM-L] Large Linux clients






On Mar 29, 2005, at 12:37 PM, Zoltan Forray/AC/VCU wrote:

 ...However, when I try to backup the tree at the third-level (e.g.
 /coyote/dsk3/), the client pretty much seizes immediately and dsmerror.log
 says B/A Txn Producer Thread, fatal error, Signal 11.  The server shows
 the session as SendW and nothing else going on

Zoltan -

Signal 11 is a segfault - a software failure.
The client programming has a defect, which may be incited by a problem
in that area of the file system (so have that investigated). A segfault
can be induced by memory constraint, which in this context would most
likely be Unix Resource Limits, so also enter the command 'limit' in
Linux csh or tcsh and potentially boost the stack size ('unlimit
stacksize'). This is to say that the client was probably invoked under
artificially limited environmentals.

Richard Sims


Re: Large Linux clients

2005-03-29 Thread Richard Sims
On Mar 29, 2005, at 1:39 PM, Zoltan Forray/AC/VCU wrote:
Here ya go. Pretty much no limits. I am open to suggestions on values
to
change that might help !
I did recommend addressing the Stacksize to try to head off the
defect...
FWIW, this is RH8 as a Beowulf cluster, so NO, I can not upgrade the
OS.
Also, while on the subject, I read the requirements on the 5.3.x
client,
that says it has only been tested on RH AS 3. Anyone try the V5.3
client
on RH8 ?
[EMAIL PROTECTED] root]# ulimit -a
core file size(blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) unlimited
max memory size   (kbytes, -m) unlimited
open files(-n) 1024
pipe size  (512 bytes, -p) 8
stack size(kbytes, -s) 8192
 
  That's FAR from unlimited.
cpu time (seconds, -t) unlimited
max user processes(-u) 4092
virtual memory(kbytes, -v) unlimited

Richard Sims [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
03/29/2005 01:11 PM
Please respond to
ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
To
ADSM-L@VM.MARIST.EDU
cc
Subject
Re: [ADSM-L] Large Linux clients


On Mar 29, 2005, at 12:37 PM, Zoltan Forray/AC/VCU wrote:
...However, when I try to backup the tree at the third-level (e.g.
/coyote/dsk3/), the client pretty much seizes immediately and dsmerror.log
says B/A Txn Producer Thread, fatal error, Signal 11.  The server shows
the session as SendW and nothing else going on
Zoltan -
Signal 11 is a segfault - a software failure.
The client programming has a defect, which may be incited by a problem
in that area of the file system (so have that investigated). A segfault
can be induced by memory constraint, which in this context would most
likely be Unix Resource Limits, so also enter the command 'limit' in
Linux csh or tcsh and potentially boost the stack size ('unlimit
stacksize'). This is to say that the client was probably invoked under
artificially limited environmentals.
Richard Sims


Re: Large Linux clients

2005-03-29 Thread Zoltan Forray/AC/VCU
Did a ulimit -s unlimited.

Dies the same way when trying to backup the /coyote/dsk3/ fs - Producer
Thread




Richard Sims [EMAIL PROTECTED]
Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
03/29/2005 01:53 PM
Please respond to
ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU


To
ADSM-L@VM.MARIST.EDU
cc

Subject
Re: [ADSM-L] Large Linux clients






On Mar 29, 2005, at 1:39 PM, Zoltan Forray/AC/VCU wrote:

 Here ya go. Pretty much no limits. I am open to suggestions on values
 to
 change that might help !

I did recommend addressing the Stacksize to try to head off the
defect...


 FWIW, this is RH8 as a Beowulf cluster, so NO, I can not upgrade the
 OS.

 Also, while on the subject, I read the requirements on the 5.3.x
 client,
 that says it has only been tested on RH AS 3. Anyone try the V5.3
 client
 on RH8 ?

 [EMAIL PROTECTED] root]# ulimit -a
 core file size(blocks, -c) 0
 data seg size (kbytes, -d) unlimited
 file size (blocks, -f) unlimited
 max locked memory (kbytes, -l) unlimited
 max memory size   (kbytes, -m) unlimited
 open files(-n) 1024
 pipe size  (512 bytes, -p) 8
 stack size(kbytes, -s) 8192
  
   That's FAR from unlimited.

 cpu time (seconds, -t) unlimited
 max user processes(-u) 4092
 virtual memory(kbytes, -v) unlimited



 Richard Sims [EMAIL PROTECTED]
 Sent by: ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU
 03/29/2005 01:11 PM
 Please respond to
 ADSM: Dist Stor Manager ADSM-L@VM.MARIST.EDU


 To
 ADSM-L@VM.MARIST.EDU
 cc

 Subject
 Re: [ADSM-L] Large Linux clients






 On Mar 29, 2005, at 12:37 PM, Zoltan Forray/AC/VCU wrote:

 ...However, when I try to backup the tree at the third-level (e.g.
 /coyote/dsk3/), the client pretty much seizes immediately and dsmerror.log
 says B/A Txn Producer Thread, fatal error, Signal 11.  The server shows
 the session as SendW and nothing else going on

 Zoltan -

 Signal 11 is a segfault - a software failure.
 The client programming has a defect, which may be incited by a problem
 in that area of the file system (so have that investigated). A segfault
 can be induced by memory constraint, which in this context would most
 likely be Unix Resource Limits, so also enter the command 'limit' in
 Linux csh or tcsh and potentially boost the stack size ('unlimit
 stacksize'). This is to say that the client was probably invoked under
 artificially limited environmentals.

 Richard Sims


Re: Large Linux clients

2005-03-29 Thread Stapleton, Mark
On Mar 29, 2005, at 1:39 PM, Zoltan Forray/AC/VCU wrote:
 FWIW, this is RH8 as a Beowulf cluster, so NO, I can not upgrade the
 OS.

Well, to be frank about it, you're using an unsupported version of
Linux.

That's a bit of a cop-out, I fear, but there may well be reasons that
RH8 (and the Beowulf cluster you're running) breaks something in the TSM
client.

The 5.3 version of the Linux client is less dependent upon the version
of the kernel. You might give it a try.

--
Mark Stapleton ([EMAIL PROTECTED])
Office 262.521.5627


Re: Large Linux clients

2005-03-28 Thread Mark D. Rodriguez
Zoltan,
I am not sure if this will fix the problem or not.  I have seen in the
past when trying to backup directories (including sub-directories) with
a large number of files that the system runs out of memory and either
fails or hangs forever.  The one thing that I have done and that has worked
in some cases is to use the MEMORYEFficientbackup option.  It is a
client-side option and can be placed in the option file or specified on
the command line.  I would try it and see if it helps.  BTW, there is a
downside to this, and that is that backups will be slower; however, slow is
still faster than not at all!
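For example (a sketch; exact placement depends on how the client options are
set up on that node), either in the client option file:
  MEMORYEFFICIENTBACKUP YES
or just for a single run from the command line:
  dsmc incremental /coyote/dsk3/ -memoryefficientbackup=yes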
Let us know if that helps.
--
Regards,
Mark D. Rodriguez
President MDR Consulting, Inc.
===
MDR Consulting
The very best in Technical Training and Consulting.
IBM Advanced Business Partner
SAIR Linux and GNU Authorized Center for Education
IBM Certified Advanced Technical Expert, CATE
AIX Support and Performance Tuning, RS6000 SP, TSM/ADSM and Linux
Red Hat Certified Engineer, RHCE
===

Zoltan Forray/AC/VCU wrote:
I am having issues backing up a large Linux server (client=5.2.3.0).
The TSM server is also on a RH Linux box (5.2.2.5).
This system has over 4.6M objects.
A standard incremental WILL NOT complete successfully. It usually
hangs/times-out/etc.
The troubles seem to be related to one particular directory with
40-subdirs, comprising 1.4M objects (from the box owner).
If I point to this directory as a whole (via the web ba-client), and try
to back it up in one shot, it displays the inspecting objects message
and then never comes back.
If I drill down further and select the subdirs in groups of 10, it seems
to back them up, with no problem.
So, one question I have is, anyone out there backing up large Linux
systems, similar to this ?
Any suggestions on what the problem could be.
Currently, I do not have access to the error-log files since this is a
protected/firewalled system and I don't have the id/pw.



Re: Large Linux clients

2005-03-28 Thread Meadows, Andrew
I would also definitely suggest journaling after the first incremental
backup completes; that would help offset the slower backups with
memory-efficient backup turned on.
 

-Original Message-
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of
Mark D. Rodriguez
Sent: Monday, March 28, 2005 4:30 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: Large Linux clients

Zoltan,

I am not sure if this will fix the problem or not.  I have seen in the
past when trying to backup directories (including sub-directories) with
a large number of files that the system runs out of memory and either
fails or hangs forever.  The one thing that I have done and that has worked
in some cases is to use the MEMORYEFficientbackup option.  It is a
client-side option and can be placed in the option file or specified on
the command line.  I would try it and see if it helps.  BTW, there is a
downside to this, and that is that backups will be slower; however, slow is
still faster than not at all!

Let us know if that helps.

--
Regards,
Mark D. Rodriguez
President MDR Consulting, Inc.


===
MDR Consulting
The very best in Technical Training and Consulting.
IBM Advanced Business Partner
SAIR Linux and GNU Authorized Center for Education IBM Certified
Advanced Technical Expert, CATE AIX Support and Performance Tuning,
RS6000 SP, TSM/ADSM and Linux Red Hat Certified Engineer, RHCE

===



Zoltan Forray/AC/VCU wrote:

I am having issues backing up a large Linux server (client=5.2.3.0).

The TSM server is also on a RH Linux box (5.2.2.5).

This system has over 4.6M objects.

A standard incremental WILL NOT complete successfully. It usually 
hangs/times-out/etc.

The troubles seem to be related to one particular directory with 
40-subdirs, comprising 1.4M objects (from the box owner).

If I point to this directory as a whole (via the web ba-client), and 
try to back it up in one shot, it displays the inspecting objects 
message and then never comes back.

If I drill down further and select the subdirs in groups of 10, it 
seems to back them up, with no problem.

So, one question I have is, anyone out there backing up large Linux 
systems, similar to this ?

Any suggestions on what the problem could be.

Currently, I do not have access to the error-log files since this is a 
protected/firewalled system and I don't have the id/pw.







Re: Large Linux clients

2005-03-28 Thread Richard Sims
Some things to consider with large file systems, and Unix ones in
particular:
1. Use CLI type backups rather than GUI type, for speed.
2. Divide and conquer: Very large file systems are conspicuous
candidates for Virtualmountpoint TSM subdivision, which will greatly
improve things overall. For that matter, oversized file systems are
conspicuous candidates for splitting into multiple physical file
systems.
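For example (a sketch using the directory names from this thread;
VIRTUALMOUNTPOINT is a client system option, set in dsm.sys on Unix clients):
  VIRTUALMOUNTPOINT /coyote/dsk3/patients
Each virtual mount point then appears as its own file space, so the
1.4M-object tree can be backed up and tracked separately from the rest of
/coyote/dsk3.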
In suspect areas of Unix file systems I would interactively run a
'find' command with the -ls option to test the speed with which
sub-trees of the file system can be traversed, to point out ringers. A
single directory with a huge number of files can induce a lot of
overhead, and is ripe for re-architecting.
The big problem in all file systems is that they are simply created and
then ignored, and the users of those file systems fill them with
whatever they want, organized at whim in most cases. The architecting
of file systems for performance and organizational sanity is a greatly
overlooked subject area.
   Richard Sims
On Mar 28, 2005, at 5:22 PM, Zoltan Forray/AC/VCU wrote:
I am having issues backing up a large Linux server (client=5.2.3.0).
The TSM server is also on a RH Linux box (5.2.2.5).
This system has over 4.6M objects.
A standard incremental WILL NOT complete successfully. It usually
hangs/times-out/etc.
The troubles seem to be related to one particular directory with
40-subdirs, comprising 1.4M objects (from the box owner).
If I point to this directory as a whole (via the web ba-client), and
try
to back it up in one shot, it displays the inspecting objects message
and then never comes back.
If I drill down further and select the subdirs in groups of 10, it
seems
to back them up, with no problem.
So, one question I have is, anyone out there backing up large Linux
systems, similar to this ?
Any suggestions on what the problem could be.
Currently, I do not have access to the error-log files since this is a
protected/firewalled system and I don't have the id/pw.


Re: Large Linux clients

2005-03-28 Thread Mark D. Rodriguez
Andrew,
This is a Linux client.  I do not believe that journal backups are
supported under Linux.  As far as I know it is a Windows-only thing, or
at least that's what all the documentation says anyway.
--
Regards,
Mark D. Rodriguez
President MDR Consulting, Inc.
===
MDR Consulting
The very best in Technical Training and Consulting.
IBM Advanced Business Partner
SAIR Linux and GNU Authorized Center for Education
IBM Certified Advanced Technical Expert, CATE
AIX Support and Performance Tuning, RS6000 SP, TSM/ADSM and Linux
Red Hat Certified Engineer, RHCE
===

Meadows, Andrew wrote:
I would also definitely suggest journaling after the first incremental
backup completes; that would help offset the slower backups with
memory-efficient backup turned on.
-Original Message-
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of
Mark D. Rodriguez
Sent: Monday, March 28, 2005 4:30 PM
To: ADSM-L@VM.MARIST.EDU
Subject: Re: Large Linux clients
Zoltan,
I am not sure if this will fix the problem or not.  I have seen in the
past when trying to backup directories (including sub-directories) with
a large number of files that the system runs out of memory and either
fails or hangs forever.  The one thing that I have done and that has worked
in some cases is to use the MEMORYEFficientbackup option.  It is a
client-side option and can be placed in the option file or specified on
the command line.  I would try it and see if it helps.  BTW, there is a
downside to this, and that is that backups will be slower; however, slow is
still faster than not at all!
Let us know if that helps.
--
Regards,
Mark D. Rodriguez
President MDR Consulting, Inc.

===
MDR Consulting
The very best in Technical Training and Consulting.
IBM Advanced Business Partner
SAIR Linux and GNU Authorized Center for Education IBM Certified
Advanced Technical Expert, CATE AIX Support and Performance Tuning,
RS6000 SP, TSM/ADSM and Linux Red Hat Certified Engineer, RHCE

===

Zoltan Forray/AC/VCU wrote:

I am having issues backing up a large Linux server (client=5.2.3.0).
The TSM server is also on a RH Linux box (5.2.2.5).
This system has over 4.6M objects.
A standard incremental WILL NOT complete successfully. It usually
hangs/times-out/etc.
The troubles seem to be related to one particular directory with
40-subdirs, comprising 1.4M objects (from the box owner).
If I point to this directory as a whole (via the web ba-client), and
try to back it up in one shot, it displays the inspecting objects
message and then never comes back.
If I drill down further and select the subdirs in groups of 10, it
seems to back them up, with no problem.
So, one question I have is, anyone out there backing up large Linux
systems, similar to this ?
Any suggestions on what the problem could be.
Currently, I do not have access to the error-log files since this is a
protected/firewalled system and I don't have the id/pw.


