Re: Large Linux clients
An old trick I used for many years: to investigate a problem filesystem, run a find in that filesystem. If the find dies, TSM will definitely die too. I'll bet your find dies, and that's why your backup dies or hangs as well. A find does a stat on every file and directory, which is essentially what the backup does, so your issue is OS-related, not TSM.

Cheers, Henk

On Tuesday 29 March 2005 12:11, Richard Sims wrote:
> Signal 11 is a segfault - a software failure. The client programming
> has a defect, which may be incited by a problem in that area of the
> file system. [...]
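For anyone who wants to run Henk's test exactly, a minimal sketch (the path is the one from this thread; the listing is discarded so only errors and the exit status surface):

    # Stat every file and directory the way an incremental traversal would.
    find /coyote/dsk3 -ls > /dev/null
    echo "find exit status: $?"

A nonzero status or a kernel-level error (I/O error, hung mount) points at the OS or the filesystem rather than at the TSM client.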
Re: Large Linux clients
Thanks for the suggestion. However, that is not it; we already tried this. We did find . | wc -l to get the object count (1.1M) with no problems, but the backup still will not work. It constantly fails, in unpredictable/inconsistent places, with the same Producer Thread error.

I spent 2+ days drilling through the various sub-directories of the directory that causes the failures, one by one, and was able to back up 38 of the 40 subdirs, totaling over 980K objects, without a problem. When I included the two other directories in the same pile, the backup would fail. When I then went back and individually selected the sub-sub-directories of those two (one at a time), I was able to back up *ALL* of the sub-sub-directories with no problem. Then I went back, selected the upper-level directory, and backed it up: no problem.

Let me draw a picture of the structure of these directories. The problem directories are under /coyote/dsk3/patients/prostateReOpt/Mount_0/. If I try to back up Mount_0/ as a whole, it crashes every time. If I point to the subdirs below Mount_0/ (40 of them, all with the same four named sub-sub-dirs), two of them cause a crash. I noted that these two both have 72K objects while the other 38 have fewer than 60K. Yet when I manually picked the four sub-sub-dirs of the Patient_172 dir, the backup worked (sort of - see below). Same for Patient_173.

To really drive me crazy, the first attempt at backing up one of the sub-sub-dirs under Patient_172 crashed, yet I could back up the other three with no issue. So we started looking at the problem subdir and noticed a weird file name that ended in a tilde (~). When I excluded it, the backup ran. Then when I went back and picked just the file with the tilde, it backed up fine (my head is getting balder and balder!). I then went back, re-selected the whole Patient_172 directory, and it backed up (or at least scanned it, since everything was already backed up) just fine! AARGH! This is maddening and shows no rhyme or reason.

Henk ten Have wrote on 04/01/2005 08:29 AM:
> An old trick I used for many years: to investigate a problem
> filesystem, run a find in that filesystem. If the find dies, TSM will
> definitely die too. [...]
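The per-directory drilling Zoltan describes can be scripted rather than done by hand in the web client; a rough sketch using the dsmc command-line client and the paths named in this message:

    #!/bin/sh
    # Back up each Patient_* subdirectory separately and record which
    # ones fail, instead of selecting them one at a time.
    for d in /coyote/dsk3/patients/prostateReOpt/Mount_0/*/ ; do
        dsmc incremental "$d" -subdir=yes >> /tmp/dsmc-split.log 2>&1 \
            || echo "FAILED: $d" >> /tmp/dsmc-failed.log
    done

/tmp/dsmc-split.log and /tmp/dsmc-failed.log are arbitrary names for this sketch; the point is just to get an unattended record of which subtrees reproduce the crash.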
Re: Large Linux clients
Ya, sorry, I have no answers for you, but you do have my sympathy. I've had to do that kind of detective work before. Sometimes it is an oddly named file, or a file with a very long name, or a file that somehow got a very bizarre date, like Apr 15 1904. In a few cases it has also been hung NFS mounts somewhere in the path. I've had to drill down each of the subdirs one after another, just like you did, to figure it out, because there was no filename or other hint in the schedule or error logs, just a generic "failed" message. Luckily I only have to do it about once or twice a year, but it is time-consuming.

Ben

-----Original Message-----
From: Zoltan Forray/AC/VCU
Sent: Friday, April 01, 2005 9:03 AM
Subject: Re: Large Linux clients
> Thanks for the suggestion. However, that is not it; we already tried
> this. We did find . | wc -l to get the object count (1.1M) with no
> problems, but the backup still will not work. [...]
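A sketch for spotting the kinds of ringers Ben describes - odd names and bizarre dates - assuming GNU find, which is standard on Red Hat:

    # File names containing non-printable characters, or ending in a tilde:
    LC_ALL=C find /coyote/dsk3 -name '*[![:print:]]*' -o -name '*~'
    # The ten oldest modification times in the tree (catches dates
    # like Apr 15 1904):
    find /coyote/dsk3 -printf '%TY-%Tm-%Td %p\n' | sort | head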
Re: Large Linux clients
Thanks for the suggestion. We have tried it. Same results - things just go to sleep!

Mark D. Rodriguez wrote on 03/28/2005 05:30 PM:
> The one thing that I have done, and that has worked in some cases, is
> to use the MEMORYEFficientbackup option. [...]
Re: Large Linux clients
Zoltan, I had a similar problem on a Windows box with 5.4 million files. Tivoli said that I couldn't do the backup/restore with a 32-bit client, because each file in the catalog takes 1 KB and a 32-bit program can only address 4 GB of memory. Here is a link they gave me:

http://www-1.ibm.com/support/entdocview.wss?rs=0&context=SSGSG7&q1=1197172&uid=swg21197172&loc=en_US&cs=utf-8&lang=&NotUpdateReferer=

It doesn't quite address your problem if you are only considering 1.4 million files, though. You may want to look into virtualmountpoints. And ironically enough, MEMORYEFficientbackup didn't help at all for the backup part. I'm going to open a problem with Tivoli on this in May when I get the scenario set up. Oh, and you probably know to check ulimits.

-- Jonathan

Zoltan Forray/AC/VCU wrote:
> Thanks for the suggestion. We have tried it. Same results - things
> just go to sleep! [...]
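The arithmetic behind Jonathan's point: 5.4 million files at roughly 1 KB of catalog memory each is about 5.4 GB, well over the 4 GB (in practice closer to 3 GB of usable user space) that a 32-bit process can address. The VIRTUALMOUNTPOINT option he alludes to splits one big filesystem into several smaller TSM filespaces, so no single traversal has to hold the whole catalog. A sketch of the dsm.sys form, using the directory from this thread:

    * dsm.sys (within the server stanza)
    * Each entry becomes its own TSM filespace, traversed independently.
    VIRTUALMOUNTPOINT /coyote/dsk3/patients/prostateReOpt/Mount_0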
Re: Large Linux clients
Zoltan Forray/AC/VCU wrote:
> I am having issues backing up a large Linux server (client=5.2.3.0).
> The TSM server is also on a RH Linux box (5.2.2.5). This system has
> over 4.6M objects. A standard incremental WILL NOT complete
> successfully. [...]

We have noticed that TSM becomes very inefficient for filesystems with over 1M files in them. We found that this seems to be a TSM server database issue; the same server instance performs very well for smaller filesystems. Another issue could be the filesystem itself: some filesystems (ext2) are very bad at handling very large directories.

-- With kind regards, Remco Post
SARA - Reken- en Netwerkdiensten  http://www.sara.nl
High Performance Computing  Tel. +31 20 592 3000  Fax. +31 20 668 3167
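A quick way to check Remco's ext2 point on the problem host - a sketch; the device name is a placeholder for whatever df reports:

    # Which filesystem type holds the problem tree?
    df -T /coyote/dsk3
    # For ext2/ext3, see whether the dir_index (hashed directory)
    # feature is on; without it, lookups in huge directories degrade
    # to linear scans. /dev/sdb1 is a placeholder device name.
    tune2fs -l /dev/sdb1 | grep -i 'features'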
Re: Large Linux clients
First, you should work with whoever owns that system to ensure that you can get the access you need to perform your investigation.

When the backup appears to hang, what does the QUERY SESSION admin command show for this node? Is the TSM process consuming any CPU?

Configure the client for a SERVICE trace by adding the following to dsm.opt:

    TRACEFILE tsmtrace.txt
    TRACEFLAGS SERVICE
    TRACEMAX 20

(For TRACEFILE, specify whatever directory and file name you want, with enough space to store a 20 MB file.) Then use the command-line client (dsmc) to run an incremental backup against the problem file system and wait for the problem to recur. Check the QUERY SESSION output and CPU utilization for dsmc, as I mentioned above. You can view the trace file with a text editor; look for the line that reads END OF DATA to see the last thing the client was doing.

Look and see if you have any recursive directory structures.

Open a problem with IBM support and provide them with the results of the info I mentioned above (let me know, too).

Regards, Andy

Andy Raibeck
IBM Software Group, Tivoli Storage Manager Client Development
"The only dumb question is the one that goes unasked."
"The command line is your friend."
"Good enough is the enemy of excellence."

"ADSM: Dist Stor Manager" ADSM-L@VM.MARIST.EDU wrote on 2005-03-28 15:22:05:
> I am having issues backing up a large Linux server (client=5.2.3.0).
> This system has over 4.6M objects. A standard incremental WILL NOT
> complete successfully. [...]
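One way to check for the recursive directory structures Andy mentions is to let find chase symbolic links; a sketch using the older -follow spelling, since RH8-era findutils may predate the -L flag:

    # Errors such as "Too many levels of symbolic links", or a
    # filesystem-loop warning, indicate a symlink cycle that a
    # traversal could chase forever.
    find /coyote/dsk3 -follow > /dev/null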
Re: Large Linux clients
I am going down this path already. I have started doing some instrumentation traces. However, the results seem to show nothing when backing up only specific sub-sub-subdirectories; what time accumulation there is, is in Solve Tree and/or Process Dirs. No big surprise there.

However, when I try to back up the tree at the third level (e.g. /coyote/dsk3/), the client pretty much seizes immediately and dsmerror.log says "B/A Txn Producer Thread, fatal error, Signal 11". The server shows the session as SendW, with nothing else going on.

I have tried upping ResourceUtilization to 4 (server side set to 6) and MemoryEfficientBackup YES, all to no avail. I am going to try the other TRACE options you specify, to see if I get any more data.

Andrew Raibeck wrote on 03/29/2005 12:07 PM:
> Configure the client for a SERVICE trace by adding the following to
> dsm.opt: TRACEFILE tsmtrace.txt / TRACEFLAGS SERVICE / TRACEMAX 20 [...]
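Collected in one place, the client options this message mentions would look roughly like this in dsm.opt (values are the ones Zoltan cites, plus Andy's trace settings from earlier in the thread):

    * dsm.opt excerpt - options discussed in this thread
    RESOURCEUTILIZATION   4
    MEMORYEFFICIENTBACKUP YES
    TRACEFILE  tsmtrace.txt
    TRACEFLAGS SERVICE
    TRACEMAX   20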
Re: Large Linux clients
Some more details. I added the TRACE options you recommended. While the backup still immediately fails, I got some more information. The Producer Thread failure now includes the detail: linux86/psunxthr.cpp (184). The only hit I get on these messages is APAR IC30292, but the applicable component listed is "R42L PSY", which I read as the 4.2 client? This client is 5.2.3.0!

Andrew Raibeck wrote on 03/29/2005 12:07 PM:
> Configure the client for a SERVICE trace by adding the following to
> dsm.opt: TRACEFILE tsmtrace.txt / TRACEFLAGS SERVICE / TRACEMAX 20 [...]
Re: Large Linux clients
On Mar 29, 2005, at 12:37 PM, Zoltan Forray/AC/VCU wrote:
> ...However, when I try to back up the tree at the third level (e.g.
> /coyote/dsk3/), the client pretty much seizes immediately and
> dsmerror.log says "B/A Txn Producer Thread, fatal error, Signal 11".
> The server shows the session as SendW, with nothing else going on.

Zoltan - Signal 11 is a segfault - a software failure. The client programming has a defect, which may be incited by a problem in that area of the file system (so have that investigated). A segfault can be induced by memory constraint, which in this context would most likely be Unix resource limits, so also enter the command 'limit' in Linux csh or tcsh and potentially boost the stack size ('unlimit stacksize'). This is to say that the client was probably invoked under an artificially limited environment.

Richard Sims
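Richard's csh commands, with their Bourne-shell equivalents, since resource limits apply per shell and dsmc must be started from the same session - a sketch:

    # csh / tcsh (as suggested):
    limit                  # show current resource limits
    unlimit stacksize      # remove the stack-size cap
    # sh / bash equivalent:
    ulimit -s              # show stack size (KB)
    ulimit -s unlimited    # remove the cap, then start dsmc from here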
Re: Large Linux clients
Here ya go. Pretty much no limits. I am open to suggestions on values to change that might help! FWIW, this is RH8 as a Beowulf cluster, so NO, I cannot upgrade the OS. Also, while on the subject: I read in the requirements that the 5.3.x client has only been tested on RH AS 3. Anyone tried the V5.3 client on RH8?

    [EMAIL PROTECTED] root]# ulimit -a
    core file size        (blocks, -c) 0
    data seg size         (kbytes, -d) unlimited
    file size             (blocks, -f) unlimited
    max locked memory     (kbytes, -l) unlimited
    max memory size       (kbytes, -m) unlimited
    open files                    (-n) 1024
    pipe size          (512 bytes, -p) 8
    stack size            (kbytes, -s) 8192
    cpu time             (seconds, -t) unlimited
    max user processes            (-u) 4092
    virtual memory        (kbytes, -v) unlimited

Richard Sims wrote on 03/29/2005 01:11 PM:
> A segfault can be induced by memory constraint, which in this context
> would most likely be Unix resource limits, so also enter the command
> 'limit' in Linux csh or tcsh and potentially boost the stack size
> ('unlimit stacksize'). [...]
Re: Large Linux clients
On Mar 29, 2005, at 1:39 PM, Zoltan Forray/AC/VCU wrote:
> Here ya go. Pretty much no limits. I am open to suggestions on values
> to change that might help!

I did recommend addressing the stacksize, to try to head off the defect...

> stack size            (kbytes, -s) 8192

That's FAR from unlimited.

Richard Sims
Re: Large Linux clients
Did a ulimit -s unlimited. It dies the same way when trying to back up the /coyote/dsk3/ fs - Producer Thread.

Richard Sims wrote on 03/29/2005 01:53 PM:
> I did recommend addressing the stacksize, to try to head off the
> defect...
> > stack size            (kbytes, -s) 8192
> That's FAR from unlimited. [...]
Re: Large Linux clients
On Mar 29, 2005, at 1:39 PM, Zoltan Forray/AC/VCU wrote:
> FWIW, this is RH8 as a Beowulf cluster, so NO, I cannot upgrade
> the OS.

Well, to be frank about it, you're using an unsupported version of Linux. That's a bit of a cop-out, I fear, but there may well be reasons that RH8 (and the Beowulf cluster you're running) breaks something in the TSM client. The 5.3 version of the Linux client is less dependent upon the version of the kernel; you might give it a try.

-- Mark Stapleton ([EMAIL PROTECTED])  Office 262.521.5627
Re: Large Linux clients
Zoltan, I am not sure if this will fix the problem or not. I have seen in the past, when trying to back up directories (including sub-directories) with a large number of files, that the system runs out of memory and either fails or hangs forever. The one thing that I have done, and that has worked in some cases, is to use the MEMORYEFficientbackup option. It is a client-side option and can be placed in the option file or called from the command line. I would try it and see if it helps. BTW, there is a downside: backups will be slow. However, slow is still faster than not at all! Let us know if that helps.

-- Regards, Mark D. Rodriguez
President, MDR Consulting, Inc.
IBM Advanced Business Partner; SAIR Linux and GNU Authorized Center for Education; IBM Certified Advanced Technical Expert (CATE), AIX Support and Performance Tuning, RS6000 SP, TSM/ADSM and Linux; Red Hat Certified Engineer (RHCE)

Zoltan Forray/AC/VCU wrote:
> I am having issues backing up a large Linux server (client=5.2.3.0).
> The TSM server is also on a RH Linux box (5.2.2.5). This system has
> over 4.6M objects. A standard incremental WILL NOT complete
> successfully. [...]
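Both ways of setting the option Mark describes - in the option file or for a single run - sketched with the path from this thread:

    # Option-file form (dsm.opt / dsm.sys):
    #   MEMORYEFficientbackup YES
    # Command-line form, for one run against the problem filesystem:
    dsmc incremental /coyote/dsk3/ -memoryefficientbackup=yes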
Re: Large Linux clients
I would also definitely suggest journaling after the first incremental backup completes; that would help offset the slower backups with memory-efficient backup turned on.

-----Original Message-----
From: Mark D. Rodriguez
Sent: Monday, March 28, 2005 4:30 PM
Subject: Re: Large Linux clients
> The one thing that I have done, and that has worked in some cases, is
> to use the MEMORYEFficientbackup option. It is a client-side option
> and can be placed in the option file or called from the command
> line. [...]
Re: Large Linux clients
Some things to consider with large file systems, and Unix ones in particular:

1. Use CLI-type backups rather than GUI-type, for speed.

2. Divide and conquer: very large file systems are conspicuous candidates for Virtualmountpoint TSM subdivision, which will greatly improve things overall. For that matter, oversized file systems are conspicuous candidates for splitting into multiple physical file systems.

In suspect areas of Unix file systems I would interactively run a 'find' command with the -ls option to test the speed with which sub-trees of the file system can be traversed, to point out ringers (see the sketch after this message). A single directory with a huge number of files can induce a lot of overhead, and is ripe for re-architecting. The big problem with all file systems is that they are simply created and then ignored, and the users fill them with whatever they want, organized at whim in most cases. Architecting file systems for performance and organizational sanity is a greatly overlooked subject.

Richard Sims

On Mar 28, 2005, at 5:22 PM, Zoltan Forray/AC/VCU wrote:
> I am having issues backing up a large Linux server (client=5.2.3.0).
> The TSM server is also on a RH Linux box (5.2.2.5). This system has
> over 4.6M objects. A standard incremental WILL NOT complete
> successfully. It usually hangs/times-out/etc. The troubles seem to be
> related to one particular directory with 40 subdirs, comprising 1.4M
> objects (from the box owner). If I point to this directory as a whole
> (via the web ba-client) and try to back it up in one shot, it displays
> the "inspecting objects" message and then never comes back. If I drill
> down further and select the subdirs in groups of 10, it seems to back
> them up with no problem. So, one question I have: is anyone out there
> backing up large Linux systems similar to this? Any suggestions on
> what the problem could be? Currently, I do not have access to the
> error-log files, since this is a protected/firewalled system and I
> don't have the id/pw.
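A sketch of the interactive 'find -ls' timing test Richard describes, run per subtree so the slow one stands out:

    # Time the traversal of each patient subtree; one that takes far
    # longer than its peers (or never returns) is the ringer.
    for d in /coyote/dsk3/patients/prostateReOpt/Mount_0/*/ ; do
        echo "== $d"
        time find "$d" -ls > /dev/null
    done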
Re: Large Linux clients
Andrew, this is a Linux client. I do not believe that journal-based backups are supported under Linux. As far as I know it is a Windows-only thing, or at least that's what all the documentation says.

-- Regards, Mark D. Rodriguez
President, MDR Consulting, Inc.

Meadows, Andrew wrote:
> I would also definitely suggest journaling after the first incremental
> backup completes; that would help offset the slower backups with
> memory-efficient backup turned on. [...]