Re: amanda 3.3.3 "too many files"
Jean-Louis, Jon,

I've updated my amanda.conf to use auth="local" for the dumptypes
I have in use in my disklist.

> ulimit
unlimited

Per Solaris instructions...

> echo 'rlim_fd_max/d' | mdb -k
rlim_fd_max:
rlim_fd_max:0

> amcheck finsen
Amanda Tape Server Host Check
-----------------------------
Holding disk /lstripe: 546222 MB disk space available, using 546122 MB
slot 9: volume 'Finsen31'
Will write to volume 'Finsen31' in slot 9.
NOTE: skipping tape-writable test
NOTE: info dir /usr/local/etc/amanda/finsen/DailySet1/curinfo/finsen/_export2_samba_maldi does not exist
NOTE: it will be created on the next run.
NOTE: index dir /usr/local/etc/amanda/finsen/DailySet1/index/finsen/_export2_samba_maldi does not exist
NOTE: it will be created on the next run.
Server check took 4.691 seconds

Amanda Backup Client Hosts Check
ERROR: finsen: service selfcheck: selfcheck: Error opening pipe to child: Too many open files
ERROR: finsen: service /usr/local/libexec/amanda/selfcheck failed: pid 5457 exited with code 1
Client check: 1 host checked in 130.727 seconds.  2 problems found.

(brought to you by Amanda 3.3.3)

The new DLE in fact did cause the retained snapshot to change by
one DLE, in alpha order. It is (re)verified that this is not random
and is tied to list position.

So much for the Solaris run time work-around.

export LD_PRELOAD_32 /usr/lib/extendedFILE.so.1

then run amcheck.

> amcheck finsen
ld.so.1: amcheck: warning: /usr/lib/extendedFILE.so.1: open failed: illegal insecure pathname
Amanda Tape Server Host Check
-----------------------------
Holding disk /lstripe: 546222 MB disk space available, using 546122 MB
slot 9: volume 'Finsen31'
FILE.so.1: open failed: illegal insecure pathname
ERROR: finsen: Application 'amgtar': can't run support command
ERROR: finsen: Application 'amgtar': ld.so.1: amgtar: warning: /usr/lib/extendedFILE.so.1: open failed: illegal insecure pathname
ERROR: finsen: Application 'amgtar': can't run support command
ERROR: finsen: Application 'amgtar': ld.so.1: amgtar: warning: /usr/lib/extendedFILE.so.1: open failed: illegal insecure pathname
ERROR: finsen: Application 'amgtar': can't run support command

Related to suid programs?

I don't want to make further changes before the weekend. I think
I'll implement auth="local" for amdump on Monday and see how it performs.

thank you,

Brian

On Wed, Jun 05, 2013 at 01:41:16PM -0400, Brian Cuttler wrote:
>
> Jean-Louis,
>
> Yes, I did find some information on a run time mechanism to
> increase the 256 file limit (file limit stored in unsigned character).
>
> The work-around employed requires the execution of /usr/lib/extendedFILE.so.1
> prior to the binary being executed.
>
> Following up on your maxcheck and Spindle number, I wonder if I
> couldn't automatically build an alternate disklist file with
> spindle number and swap it in and out. It would have to be done
> dynamically (since my disklist changes and making changes in
> multiple locations is error prone), but that can be scripted and
> called from cron.
>
> /* I need something that will handle both formats of DLE
>  *
> finsen /export2 zfs-snapshot2
> finsen /export/home-AZ /export/home {
>     user-tar2
>     include "./[A-Z]*"
>     }
>  *
>  */
>
> Since this is an amanda-client issue, rather than an amanda server
> issue, I need to ask you, how to execute this on the client-side
> before attempting to check the DLE list.
> Is there a way to invoke this from the amanda daemon?
>
> - Alternatively, if someone better versed than I am on the Solaris
>   inetd or in SMF knows how to insert the requisite command on the
>   client side - I would be appreciative if they would share their
>   information.
>
> thank you,
>
> Brian
>
>
> On Wed, Jun 05, 2013 at 11:54:35AM -0400, Jean-Louis Martineau wrote:
> > Brian,
> >
> > Can you increase the number of open files at the system level?
> >
> > amcheck checks all DLEs in parallel, you can try to add spindle (in the
> > disklist) to reduce parallelism but that can have a bad impact on dump
> > performance, so it is not a good workaround.
> >
> > You would like a maxcheck setting similar to maxdump, I put it in my
> > TODO list.
> >
> > Jean-Louis
> >
> > On 06/05/2013 11:05 AM, Brian Cuttler wrote:
> > >Hello amanda users,
> > >
> > >I just updated amanda 3.3.0 to 3.3.3 on a Solaris 10/x86 system.
> > >The system is both the server and the client, there are no other
> > >clients of this system.
> > >
> > >We have ~265 DLEs on this system (large zfs arrays and all
> > >samba shares are their own file systems and DLE, thank goodness
> > >I was able to talk my manager out of making all user directories
> > >their own DLE as well, though they are their own zfs f
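
Editor's sketch, not part of the thread: the quoted message above asks for something that handles both DLE formats (the one-line form and the form with an inline `{ ... }` dumptype block). A minimal Python illustration of recognizing both might look like the following. The function name `list_dles` is hypothetical, and the parsing covers only the two shapes shown in the message, not the full disklist grammar; in particular it treats any `#` as the start of a comment, which would be wrong for a quoted string containing `#`.

```python
def list_dles(disklist_text):
    """Return (host, diskname) pairs for each DLE found in disklist text.

    Handles the two shapes quoted in the thread:
      host diskname dumptype
      host diskname diskdevice {
          ...inline dumptype options...
      }
    This is an illustrative simplification, not Amanda's own parser.
    """
    entries = []
    in_block = False  # True while inside an inline "{ ... }" dumptype
    for raw in disklist_text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if in_block:
            # Skip option lines until the closing brace of the block.
            if line.startswith("}"):
                in_block = False
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a DLE line in either of the two shapes
        entries.append((fields[0], fields[1]))
        if fields[-1] == "{":
            in_block = True
    return entries
```

Calling `len(list_dles(open(path).read()))` on a disklist (with `path` supplied by the reader) would give a DLE count to compare against the roughly 265 entries mentioned in the thread.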
Re: amflush doesn't run. at all
On 06/07/2013 11:02 AM, Abilio Carvalho wrote:

Does this look familiar at all? The strace of "amflush BBPTape" at one
point just becomes an endless (literally. I think it cycles endlessly,
but I can't infer the pattern, it's too large) series of these. Not only
this machine, mind, every single DLE that's configured will show up in
the log like this:

stat("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120", {st_mode=S_IFREG|0600, st_size=1048576, ...}) = 0
stat("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120", {st_mode=S_IFREG|0600, st_size=1048576, ...}) = 0
open("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120", O_RDONLY) = 5
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
read(5, "AMANDA: CONT_FILE 20130417171040"..., 32768) = 32768
close(5^C)= 0

Can you print a little more so I can see what repeats? Does the number
(17120) increase?

You have a lot of files in the holding disk; you should use a larger
chunksize. Lots of files == slow, slower, ...

Jean-Louis

Thanks. This issue is driving me mad, and the point is fast approaching
where my boss just nixes amanda. (not its fault, really, I'm just bad at
this)

Cheers
Abilio

On May 16, 2013, at 2:00 PM, Jean-Louis Martineau wrote:

Not creating the debug files is strange, as it is one of the first
things it does.

Use strace or a debugger (gdb) to find where it hangs.

Jean-Louis

On 05/16/2013 05:16 AM, Abilio Carvalho wrote:

Hi. still slowly trying to get my backup system to work, it's going OK
but for some reason, whenever I get some dumps stuck in holding, I can't
get those to tape at all. amflush just seems to hang immediately after I
run it. amflush -fs also gives me exactly nothing to work with. Nothing
in any logs anywhere that I can see.

Similarly, amadmin holding list hangs in exactly the same way.

I'm definitely not good enough yet to debug this by myself, can anyone
help? with no logs or error messages or whatever I wouldn't even know
where to start.

Cheers
Abilio

**
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

This footnote also confirms that this email message has been swept by
Clearswift for the presence of computer viruses.

www.clearswift.com
**
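
Editor's sketch, not part of the thread: the advice above is that many small holding-disk files slow things down, and the strace output shows 1048576-byte chunk files. One way to see how many chunk files each holding-disk run directory contains is a short Python script like the one below. The function name `holding_summary` is hypothetical, and the script only assumes that the holding disk contains one subdirectory per run (as in `/amanda/holding/20130417171040/` above); it does not read or interpret Amanda's file headers.

```python
import os


def holding_summary(holding_dir):
    """Map each run subdirectory to (file_count, total_bytes).

    Only regular files directly inside each run directory are counted.
    """
    summary = {}
    for name in sorted(os.listdir(holding_dir)):
        run_dir = os.path.join(holding_dir, name)
        if not os.path.isdir(run_dir):
            continue  # ignore stray files at the top level
        sizes = []
        for entry in os.listdir(run_dir):
            path = os.path.join(run_dir, entry)
            if os.path.isfile(path):
                sizes.append(os.path.getsize(path))
        summary[name] = (len(sizes), sum(sizes))
    return summary
```

For example, `holding_summary("/amanda/holding")` would return a dictionary keyed by run timestamp. A very large file count with a small average size per file would be consistent with the chunksize being set too low.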
Re: amflush doesn't run. at all
Does this look familiar at all? The strace of "amflush BBPTape" at one
point just becomes an endless (literally. I think it cycles endlessly,
but I can't infer the pattern, it's too large) series of these. Not only
this machine, mind, every single DLE that's configured will show up in
the log like this:

stat("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120", {st_mode=S_IFREG|0600, st_size=1048576, ...}) = 0
stat("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120", {st_mode=S_IFREG|0600, st_size=1048576, ...}) = 0
open("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120", O_RDONLY) = 5
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
read(5, "AMANDA: CONT_FILE 20130417171040"..., 32768) = 32768
close(5^C)= 0

Thanks. This issue is driving me mad, and the point is fast approaching
where my boss just nixes amanda. (not its fault, really, I'm just bad at
this)

Cheers
Abilio

On May 16, 2013, at 2:00 PM, Jean-Louis Martineau wrote:

>
> Not creating the debug files is strange, as it is one of the first
> things it does.
>
> Use strace or a debugger (gdb) to find where it hangs.
>
> Jean-Louis
>
> On 05/16/2013 05:16 AM, Abilio Carvalho wrote:
>> Hi. still slowly trying to get my backup system to work, it's going OK but
>> for some reason, whenever I get some dumps stuck in holding, I can't get
>> those to tape at all. amflush just seems to hang immediately after I run it.
>> amflush -fs also gives me exactly nothing to work with. Nothing in any logs
>> anywhere that I can see.
>>
>> Similarly, amadmin holding list hangs in exactly the same way.
>>
>> I'm definitely not good enough yet to debug this by myself, can anyone help?
>> with no logs or error messages or whatever I wouldn't even know where to
>> start.
>>
>> Cheers
>> Abilio
Re: amanda 3.3.3 "too many files"
Jean-Louis,

I added a couple of switches to ls and got much more informative output.

[finsen]: /proc/734/fd > ls -F -C /proc/10832/fd
0=   1=   10   12|  13|  16|  17|  2=   20|  21|  3>   6|   8|

On Thu, Jun 06, 2013 at 11:09:20AM -0400, Jean-Louis Martineau wrote:
> On 06/05/2013 11:54 AM, Jean-Louis Martineau wrote:
> >Brian,
> >
> >Can you increase the number of open files at the system level?
> >
> >amcheck checks all DLEs in parallel, you can try to add spindle (in the
> >disklist) to reduce parallelism but that can have a bad impact on dump
> >performance, so it is not a good workaround.
>
> Forget that idea, adding spindle will not help.
>
> I think the problem is a file descriptor leak (files not closed), but it
> can be in any process.
> Can you monitor all opened files for all amanda processes?
> I don't know how to do it with Solaris, but you can 'ls /proc/PID/fd' on Linux.
> It will help to find which process leaks.
>
> Jean-Louis

---
Brian R Cuttler                 brian.cutt...@wadsworth.org
Computer Systems Support        (v) 518 486-1697
Wadsworth Center                (f) 518 473-6384
NYS Department of Health        Help Desk 518 473-0773
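
Editor's sketch, not part of the thread: the suggestion above is to watch the open file descriptors of every Amanda process by listing `/proc/PID/fd`. A small Python helper can do the counting for a set of PIDs so that a leaking process stands out. The function names `count_open_fds` and `fd_report` are hypothetical, and the `proc_root` parameter is an assumption added so the code can be pointed at a different directory; the only thing the code relies on is that `/proc/PID/fd` contains one entry per open descriptor, as the `ls` output above shows.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Return the number of entries in <proc_root>/<pid>/fd, or None.

    None is returned when the directory cannot be read, for example
    because the process has exited or permission is denied.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    try:
        return len(os.listdir(fd_dir))
    except OSError:
        return None


def fd_report(pids, proc_root="/proc"):
    """Return (pid, fd_count) pairs, highest count first.

    PIDs whose fd directory cannot be read are left out.
    """
    counts = []
    for pid in pids:
        n = count_open_fds(pid, proc_root)
        if n is not None:
            counts.append((pid, n))
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

Running `fd_report` repeatedly during an amcheck, with the PIDs taken from `ps`, and comparing the counts over time would show whether one process keeps accumulating descriptors toward the limit.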
amanda 3.3.3 not unwinding
Installed Amanda 3.3.3 on Solaris 10/x86 two days ago.

Both amdump runs since then did not complete normally, even though
they completed all DLEs and sent the report.

> amstatus finsen
Using /usr/local/etc/amanda/finsen/DailySet1/amdump
From Thu Jun 6 18:30:00 EDT 2013

finsen:/                     0    43283m estimate done
finsen:/export               0    20448m estimate done
finsen:/export/home-A        0      118m estimate done
finsen:/export/home-AZ       0        0m estimate done
< MANY LINES REMOVED>
finsen:hp10p/flyshare        0      403m estimate done
finsen:hp10p/grifadmin       0      783m estimate done
finsen:hp10p/hiu0 74691m estimate done
finsen:hp10p/hiu2            0    22763m estimate done
finsen:hp10p/ivcp            0    26015m estimate done
finsen:hp10p/virologypt      0       15m estimate done

SUMMARY          part      real  estimated
                           size       size
partition       : 265
estimated       : 265              3902078m
flush           :   0         0m
failed          :   0                    0m           (  0.00%)
wait for dumping:   0                    0m           (  0.00%)
dumping to tape :   0                    0m           (  0.00%)
dumping         :   0         0m         0m (  0.00%) (  0.00%)
dumped          :   0         0m         0m (  0.00%) (  0.00%)
wait for writing:   0         0m         0m (  0.00%) (  0.00%)
wait to flush   :   0         0m         0m (100.00%) (  0.00%)
writing to tape :   0         0m         0m (  0.00%) (  0.00%)
failed to tape  :   0         0m         0m (  0.00%) (  0.00%)
taped           :   0         0m         0m (  0.00%) (  0.00%)
12 dumpers idle : not-idle
taper status: Idle
taper qlen: 0
network free kps: 800
holding space   : 546122m (100.00%)
 0 dumpers busy : 0:00:05 (100.00%)  not-idle: 0:00:05 (100.00%)

We were left with processes that did not unwind.

> ps -ef | grep amanda
  amanda 16257 16256  13 18:30:01 ?       879:30 /usr/local/libexec/amanda/planner finsen --starttime 20130606183000
  amanda 16271 16258   0 18:30:01 ?         0:00 dumper11 finsen
  amanda 16267 16258   0 18:30:01 ?         0:00 dumper7 finsen
  amanda 16263 16258   0 18:30:01 ?         0:00 dumper3 finsen
  amanda 27743  8729   0 09:11:17 pts/14    0:00 -tcsh
  amanda 16270 16258   0 18:30:01 ?         0:00 dumper10 finsen
  amanda 16260 16258   0 18:30:01 ?         0:00 dumper0 finsen
  amanda 16262 16258   0 18:30:01 ?         0:00 dumper2 finsen
  amanda 27766 27743   0 09:11:45 pts/14    0:00 grep amanda
  amanda 16268 16258   0 18:30:01 ?         0:00 dumper8 finsen
  amanda 27765 27743   0 09:11:45 pts/14    0:00 ps -ef
  amanda 16256 16253   0 18:30:01 ?         0:00 /usr/local/bin/perl /usr/local/sbin/amdump finsen
  amanda 16259 16258   0 18:30:01 ?         0:00 /usr/local/bin/perl /usr/local/libexec/amanda/taper finsen
  amanda 16266 16258   0 18:30:01 ?         0:00 dumper6 finsen
  amanda 16264 16258   0 18:30:01 ?         0:00 dumper4 finsen
  amanda 16261 16258   0 18:30:01 ?         0:00 dumper1 finsen
  amanda 16258 16256   0 18:30:01 ?         0:00 /usr/local/libexec/amanda/driver finsen
  amanda 16253   541   0 18:30:01 ?         0:00 sh -c /usr/local/sbin/amdump finsen
  amanda 16265 16258   0 18:30:01 ?         0:00 dumper5 finsen
  amanda 16269 16258   0 18:30:01 ?         0:00 dumper9 finsen

This is new behavior since amanda 3.3.0, which was the previous
version on this system.

The Amanda server has only one client, itself.

I'm not sure where to even start unraveling this.

thank you,

Brian

---
Brian R Cuttler                 brian.cutt...@wadsworth.org
Computer Systems Support        (v) 518 486-1697
Wadsworth Center                (f) 518 473-6384
NYS Department of Health        Help Desk 518 473-0773