Re: amanda 3.3.3 "too many files"

2013-06-07 Thread Brian Cuttler

Jean-Louis,
Jon,

I've updated my amanda.conf to use auth="local" for the
dumptypes I have in use in my disklist.

> ulimit
unlimited

Per the Solaris instructions...
> echo 'rlim_fd_max/d' | mdb -k
rlim_fd_max:
rlim_fd_max:0
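A hedged note on the two checks above. In sh and ksh a bare 'ulimit' reports the file-size limit (it behaves like 'ulimit -f'), so 'unlimited' may say nothing about open files; the per-process descriptor limit is 'ulimit -n'. Likewise, mdb's lowercase '/d' format reads a 2-byte value, while 4-byte tunables such as rlim_fd_max are usually read with '/D', so the 0 shown may not be the real setting. The Python sketch below reads the descriptor limit from inside a process with the standard resource module; the helper names are illustrative. Per this thread, raising the rlimit alone does not lift the 256-descriptor stdio limit of 32-bit Solaris binaries.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) RLIMIT_NOFILE pair for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit the way the shell does."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def raise_nofile_soft_limit(target):
    """Raise the soft open-file limit toward target, never above the hard limit.

    Affects only this process and children started afterwards.
    Returns the (soft, hard) pair in effect after the call.
    """
    soft, hard = nofile_limits()
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and new_soft > soft:
        # May raise ValueError if the kernel caps descriptors below target.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return nofile_limits()


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

Run as the amanda user, this shows the limits that amcheck and its children would inherit from that login.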

> amcheck finsen
Amanda Tape Server Host Check
-
Holding disk /lstripe: 546222 MB disk space available, using 546122 MB
slot 9: volume 'Finsen31'
Will write to volume 'Finsen31' in slot 9.
NOTE: skipping tape-writable test
NOTE: info dir /usr/local/etc/amanda/finsen/DailySet1/curinfo/finsen/_export2_samba_maldi does not exist
NOTE: it will be created on the next run.
NOTE: index dir /usr/local/etc/amanda/finsen/DailySet1/index/finsen/_export2_samba_maldi does not exist
NOTE: it will be created on the next run.
Server check took 4.691 seconds

Amanda Backup Client Hosts Check

ERROR: finsen: service selfcheck: selfcheck: Error opening pipe to child: Too many open files
ERROR: finsen: service /usr/local/libexec/amanda/selfcheck failed: pid 5457 exited with code 1
Client check: 1 host checked in 130.727 seconds.  2 problems found.

(brought to you by Amanda 3.3.3)

The new DLE in fact did cause the retained snapshot to change by
one DLE, in alphabetical order. I have (re)verified that this is not
random; it is tied to list position.

So much for the Solaris run-time work-around.

export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1

Then I ran amcheck.
> amcheck finsen
ld.so.1: amcheck: warning: /usr/lib/extendedFILE.so.1: open failed: illegal insecure pathname
Amanda Tape Server Host Check
-
Holding disk /lstripe: 546222 MB disk space available, using 546122 MB
slot 9: volume 'Finsen31'

FILE.so.1: open failed: illegal insecure pathname
ERROR: finsen: Application 'amgtar': can't run support command
ERROR: finsen: Application 'amgtar': ld.so.1: amgtar: warning: /usr/lib/extendedFILE.so.1: open failed: illegal insecure pathname
ERROR: finsen: Application 'amgtar': can't run support command
ERROR: finsen: Application 'amgtar': ld.so.1: amgtar: warning: /usr/lib/extendedFILE.so.1: open failed: illegal insecure pathname
ERROR: finsen: Application 'amgtar': can't run support command

Related to suid programs?
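On the suid question: the Solaris runtime linker treats set-user-ID and set-group-ID programs as secure processes and restricts what LD_PRELOAD may load for them, which is a plausible source of the 'illegal insecure pathname' warnings. Before testing that theory, it helps to know which Amanda binaries are actually setuid or setgid. A small Python sketch, assuming the install locations seen in this thread (/usr/local/sbin and /usr/local/libexec/amanda); setid_programs is an illustrative name.

```python
import glob
import os
import stat


def setid_programs(paths):
    """Return the regular files in paths that have the setuid or setgid bit."""
    flagged = []
    for path in paths:
        try:
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            continue  # skip names that vanished or never existed
        if stat.S_ISREG(mode) and mode & (stat.S_ISUID | stat.S_ISGID):
            flagged.append(path)
    return sorted(flagged)


if __name__ == "__main__":
    # Install locations assumed from the paths quoted in this thread.
    candidates = glob.glob("/usr/local/sbin/am*") + glob.glob("/usr/local/libexec/amanda/*")
    for path in setid_programs(candidates):
        print(path)
```

If amcheck and the programs it starts show up in this list while the preload works for non-setuid binaries, that would support the suid explanation.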

I don't want to make further changes before the weekend; I think I'll
implement auth="local" for amdump on Monday and see how it performs.


thank you,

Brian




On Wed, Jun 05, 2013 at 01:41:16PM -0400, Brian Cuttler wrote:
> 
> Jean-Louis,
> 
> Yes, I did find some information on a run-time mechanism to
> increase the 256 file limit (the stdio FILE structure stores the
> descriptor in an unsigned char).
> 
> The work-around requires preloading /usr/lib/extendedFILE.so.1
> before the binary is executed.
> 
> Following up on your maxcheck and spindle number suggestions, I wonder
> if I couldn't automatically build an alternate disklist file with
> spindle numbers and swap it in and out. It would have to be done
> dynamically (since my disklist changes, and making changes in
> multiple locations is error-prone), but that can be scripted and
> called from cron.
> 
> /* I need something that will handle both formats of DLE
>  *
> finsen  /export2 zfs-snapshot2
> finsen  /export/home-AZ /export/home   {
>     user-tar2
>     include "./[A-Z]*"
> }
>  *
>  */
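A sketch of the cron-driven rewrite described above, in Python. It assumes the disklist(5) convention that the optional spindle number follows the dumptype, or the closing brace of an inline dumptype, and that existing entries carry no spindle, interface or trailing comment. DLEs are dealt round-robin across a hypothetical 'groups' count of spindle numbers; assuming Amanda does not run DLEs that share a spindle on one host at the same time, that caps concurrency at 'groups'. Both function names are illustrative. Note that Jean-Louis later advised that adding spindle will not help with this problem, so treat it only as an illustration of the script described here.

```python
import os


def add_spindles(lines, groups):
    """Append a spindle number (0..groups-1, round-robin) to every DLE.

    Handles one-line DLEs ("host disk dumptype") and inline dumptypes
    ("host disk device {" ... "}"), where the spindle goes on the
    closing-brace line. Assumes no existing spindle/interface fields
    and no trailing comments on DLE lines.
    """
    out = []
    in_block = False
    count = 0
    for line in lines:
        stripped = line.strip()
        if in_block:
            if stripped.startswith("}"):
                out.append(f"{line.rstrip()} {count % groups}")
                count += 1
                in_block = False
            else:
                out.append(line)  # dumptype options are copied unchanged
        elif not stripped or stripped.startswith("#"):
            out.append(line)  # blank lines and comments
        elif stripped.endswith("{"):
            in_block = True
            out.append(line)
        else:
            out.append(f"{line.rstrip()} {count % groups}")
            count += 1
    return out


def rewrite_disklist(src_path, dst_path, groups):
    """Write the spindle-numbered copy, swapping it in atomically."""
    with open(src_path) as f:
        lines = f.read().splitlines()
    tmp_path = dst_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write("\n".join(add_spindles(lines, groups)) + "\n")
    os.replace(tmp_path, dst_path)  # atomic rename on the same filesystem
```

Writing to a temporary file and renaming it means a run that starts mid-update sees either the old or the new disklist, never a half-written one.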
> 
> Since this is an amanda-client issue rather than an amanda-server
> issue, I need to ask you how to execute this on the client side
> before the DLE list is checked. Is there a way to invoke
> this from the amanda daemon?
> 
>  - Alternatively, if someone better versed than I am in Solaris
>    inetd or SMF knows how to insert the requisite command on the
>    client side, I would appreciate it if they would share that
>    information.
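One hedged way to get the preload onto the client side is a tiny wrapper that inetd (or the SMF service) starts in place of amandad: it adds LD_PRELOAD_32 to the environment and then execs the real daemon, so every child amandad starts inherits the setting. The sketch assumes amandad lives at /usr/local/libexec/amanda/amandad, by analogy with the libexec paths elsewhere in this thread; build_env and main are illustrative names. It does not change any resource limits, and setuid children may still refuse the preload, as the 'illegal insecure pathname' errors reported in this thread suggest.

```python
import os
import sys

# Assumed paths: the library is the one named in this thread; the amandad
# location is inferred from the /usr/local/libexec/amanda paths quoted here.
PRELOAD_LIB = "/usr/lib/extendedFILE.so.1"
AMANDAD = "/usr/local/libexec/amanda/amandad"


def build_env(base):
    """Return a copy of base with LD_PRELOAD_32 set (overwriting any old value)."""
    env = dict(base)
    env["LD_PRELOAD_32"] = PRELOAD_LIB
    return env


def main():
    # execve replaces this process, so the socket inetd passed on
    # stdin/stdout/stderr goes to amandad unchanged.
    os.execve(AMANDAD, [AMANDAD, *sys.argv[1:]], build_env(os.environ))
```

Installed as an executable script that ends with a call to main(), and named as the server program in the inetd or SMF entry, it would start amandad with the preload in place.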
> 
>   thank you,
> 
>   Brian
> 
> 
> On Wed, Jun 05, 2013 at 11:54:35AM -0400, Jean-Louis Martineau wrote:
> > Brian,
> > 
> > Can you increase the number of open files at the system level?
> > 
> > amcheck checks all DLEs in parallel; you can try to add spindle (in the 
> > disklist) to reduce parallelism, but that can have a bad impact on dump 
> > performance, so it is not a good workaround.
> > 
> > You would like a maxcheck setting similar to maxdump; I have put it on my 
> > TODO list.
> > 
> > Jean-Louis
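Jean-Louis's proposed maxcheck would cap how many DLE checks run at once, the way maxdump caps dumps. The sketch below is not Amanda code: a Python thread pool's max_workers stands in for that hypothetical setting, and the hypothetical fake_check holds one pipe (two descriptors) per check, so peak descriptor use is bounded by roughly twice maxcheck instead of growing with the number of DLEs.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def check_all(dles, check, maxcheck):
    """Run check(dle) for every DLE, at most maxcheck at a time.

    Results come back in input order. maxcheck mirrors the setting
    proposed above; it is not an existing Amanda parameter.
    """
    with ThreadPoolExecutor(max_workers=maxcheck) as pool:
        return list(pool.map(check, dles))


class Concurrency:
    """Track how many checks are running at once and the peak seen."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
        return False


tracker = Concurrency()


def fake_check(dle):
    """Stand-in for one DLE check: holds a pipe (two descriptors) briefly."""
    with tracker:
        read_fd, write_fd = os.pipe()
        try:
            time.sleep(0.01)
        finally:
            os.close(read_fd)
            os.close(write_fd)
    return f"{dle}: OK"
```

For example, check_all([f"dle{i}" for i in range(265)], fake_check, 8) runs the thread's roughly 265 DLEs with at most 8 checks, and so at most 16 pipe descriptors, in flight; tracker.peak reports the highest concurrency actually observed.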
> > 
> > On 06/05/2013 11:05 AM, Brian Cuttler wrote:
> > >Hello amanda users,
> > >
> > >I just updated amanda 3.3.0 to 3.3.3 on a Solaris 10/x86 system.
> > >The system is both the server and the client, there are no other
> > >clients of this system.
> > >
> > >We have ~265 DLEs on this system (large zfs arrays and all
> > >samba shares are their own file systems and DLE, thank goodness
> > >I was able to talk my manager out of making all user directories
> > >their own DLE as well, though they are their own zfs f

Re: amflush doesn't run. at all

2013-06-07 Thread Jean-Louis Martineau

On 06/07/2013 11:02 AM, Abilio Carvalho wrote:

> Does this look familiar at all? The strace of "amflush BBPTape" at one point 
> just becomes an endless (literally; I think it cycles endlessly, but I can't infer the 
> pattern, it's too large) series of these. Not only this machine, mind; every single DLE 
> that's configured will show up in the log like this:
> 
> 
> stat("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120",
>  {st_mode=S_IFREG|0600, st_size=1048576, ...}) = 0
> stat("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120",
>  {st_mode=S_IFREG|0600, st_size=1048576, ...}) = 0
> open("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120",
>  O_RDONLY) = 5
> fcntl(5, F_SETFD, FD_CLOEXEC)   = 0
> read(5, "AMANDA: CONT_FILE 20130417171040"..., 32768) = 32768
> close(5^C)= 0
Can you print a little more so I can see what repeats? Does the number
(17120) increase?


You have a lot of files in the holding disk; you should use a larger 
chunksize.

Lots of files == slow, slower, ...

Jean-Louis
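To see how much chunksize matters, count the files in a holding-disk run and work out how many chunk files a dump needs at a given chunksize. The strace shows 1 MiB (1048576-byte) chunk files. The Python helpers below are illustrative; chunksize itself is set in the holdingdisk section of amanda.conf.

```python
import os

MiB = 1024 ** 2
GiB = 1024 ** 3


def chunks_needed(dump_bytes, chunksize_bytes):
    """Chunk files a dump of dump_bytes occupies at a given chunksize (at least 1)."""
    if chunksize_bytes <= 0:
        raise ValueError("chunksize must be positive")
    return max(1, -(-dump_bytes // chunksize_bytes))  # ceiling division


def holding_summary(root):
    """Count files and total bytes under a holding-disk directory."""
    files = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            files += 1
            total += os.path.getsize(os.path.join(dirpath, name))
    return files, total
```

If the trailing .17120 in the file name is a chunk index (Jean-Louis's question suggests he wondered the same), that one dump is on the order of 17,000 one-MiB files, about 17 GiB; chunks_needed(17120 * MiB, GiB) gives 17 files at a 1 GiB chunksize. holding_summary("/amanda/holding/20130417171040") would report the actual file count and size for that run.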


> Thanks. This issue is driving me mad, and the point is fast approaching where 
> my boss just nixes amanda. (not its fault, really, I'm just bad at this)
> 
> Cheers
> Abilio
> 
> On May 16, 2013, at 2:00 PM, Jean-Louis Martineau  wrote:
> 
>> 
>> Not creating the debug files is strange, as it is one of the first things it does.
>> 
>> Use strace or a debugger (gdb) to find where it hangs.
>> 
>> Jean-Louis
>> 
>> On 05/16/2013 05:16 AM, Abilio Carvalho wrote:
>> 
>>> Hi. Still slowly trying to get my backup system to work. It's going OK, but for 
>>> some reason, whenever I get some dumps stuck in holding, I can't get those to 
>>> tape at all. amflush just seems to hang immediately after I run it. amflush -fs 
>>> also gives me exactly nothing to work with. Nothing in any logs anywhere that I 
>>> can see.
>>> 
>>> Similarly, amadmin holding list hangs in exactly the same way.
>>> 
>>> I'm definitely not good enough yet to debug this by myself; can anyone help? 
>>> With no logs or error messages or whatever, I wouldn't even know where to start.
>>> 
>>> Cheers
>>> Abilio
>>> **
>>> This email and any files transmitted with it are confidential and
>>> intended solely for the use of the individual or entity to whom they
>>> are addressed. If you have received this email in error please notify
>>> the system manager.
>>> 
>>> This footnote also confirms that this email message has been swept by
>>> Clearswift for the presence of computer viruses.
>>> 
>>> www.clearswift.com
>>> **





Re: amflush doesn't run. at all

2013-06-07 Thread Abilio Carvalho
Does this look familiar at all? The strace of "amflush BBPTape" at one point 
just becomes an endless (literally; I think it cycles endlessly, but I can't 
infer the pattern, it's too large) series of these. Not only this machine, mind; 
every single DLE that's configured will show up in the log like this:


stat("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120",
 {st_mode=S_IFREG|0600, st_size=1048576, ...}) = 0
stat("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120",
 {st_mode=S_IFREG|0600, st_size=1048576, ...}) = 0
open("/amanda/holding/20130417171040/.bbp.ch._dev_mapper_igtgrp-igtlv.0.17120",
 O_RDONLY) = 5
fcntl(5, F_SETFD, FD_CLOEXEC)   = 0
read(5, "AMANDA: CONT_FILE 20130417171040"..., 32768) = 32768
close(5^C)= 0


Thanks. This issue is driving me mad, and the point is fast approaching where 
my boss just nixes amanda. (not its fault, really, I'm just bad at this)

Cheers
Abilio

On May 16, 2013, at 2:00 PM, Jean-Louis Martineau  wrote:

> 
> Not creating the debug files is strange, as it is one of the first things it does.
> 
> Use strace or a debugger (gdb) to find where it hangs.
> 
> Jean-Louis
> 
> On 05/16/2013 05:16 AM, Abilio Carvalho wrote:
>> Hi. Still slowly trying to get my backup system to work. It's going OK, but 
>> for some reason, whenever I get some dumps stuck in holding, I can't get 
>> those to tape at all. amflush just seems to hang immediately after I run it. 
>> amflush -fs also gives me exactly nothing to work with. Nothing in any logs 
>> anywhere that I can see.
>> 
>> Similarly, amadmin holding list hangs in exactly the same way.
>> 
>> I'm definitely not good enough yet to debug this by myself; can anyone help? 
>> With no logs or error messages or whatever, I wouldn't even know where to 
>> start.
>> 
>> Cheers
>> Abilio
>> 
> 




Re: amanda 3.3.3 "too many files"

2013-06-07 Thread Brian Cuttler


Jean-Louis,

I added a couple of switches to ls and got much more informative output.

[finsen]: /proc/734/fd > ls -F -C /proc/10832/fd
0=  1=  10  12|  13|  16|  17|  2=  20|  21|  3>  6|  8|
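Solaris 'ls -F' marks sockets with '=', FIFOs (pipes) with '|' and doors with '>'; entries without a mark are regular files or devices. Read that way, the listing above holds 3 sockets (descriptors 0 to 2), 8 pipes, one door and one other entry. Since the original error was 'Error opening pipe to child', the pipe entries are probably the ones worth watching over time. A small Python helper that does the decoding; parse_ls_F is an illustrative name.

```python
from collections import Counter

# File-type indicators appended by Solaris 'ls -F' (see ls(1)).
# No indicator means a regular file or a device.
INDICATORS = {
    "=": "socket",
    "|": "FIFO",
    ">": "door",
    "@": "symlink",
    "/": "directory",
    "*": "executable",
}


def parse_ls_F(listing):
    """Turn 'ls -F' output for /proc/PID/fd into sorted (fd, kind) pairs."""
    entries = []
    for token in listing.split():
        suffix = token[-1] if token[-1] in INDICATORS else ""
        fd = int(token[: len(token) - len(suffix)])
        entries.append((fd, INDICATORS.get(suffix, "file or device")))
    return sorted(entries)


if __name__ == "__main__":
    # The listing for /proc/10832/fd quoted above.
    sample = "0=  1=  10  12|  13|  16|  17|  2=  20|  21|  3>  6|  8|"
    print(Counter(kind for _fd, kind in parse_ls_F(sample)))
```

Comparing these counts across several listings of the same PID shows whether the pipe count keeps climbing.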




On Thu, Jun 06, 2013 at 11:09:20AM -0400, Jean-Louis Martineau wrote:
> On 06/05/2013 11:54 AM, Jean-Louis Martineau wrote:
> >Brian,
> >
> >Can you increase the number of open files at the system level?
> >
> >amcheck checks all DLEs in parallel; you can try to add spindle (in the 
> >disklist) to reduce parallelism but that can have a bad impact on dump 
> >performance, so it is not a good workaround.
> 
> Forget that idea; adding spindle will not help.
> 
> I think the problem is a file descriptor leak (files not closed), but it 
> can be in any process.
> Can you monitor all open files for all amanda processes?
> I don't know how to do it on Solaris, but on Linux you can 'ls /proc/PID/fd'.
> It will help to find which process leaks.
> 
> Jean-Louis
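Jean-Louis's suggestion can be automated: count the entries in /proc/PID/fd for each Amanda process at two points in time and report which counts grew. The Python sketch below does that; it relies only on /proc/PID/fd, which both Linux and Solaris provide, and the function names are illustrative.

```python
import os


def fd_count(pid):
    """Open descriptors of pid via /proc/PID/fd, or None if gone or unreadable."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        return None


def snapshot(pids):
    """Map each still-readable pid to its current descriptor count."""
    counts = {}
    for pid in pids:
        n = fd_count(pid)
        if n is not None:
            counts[pid] = n
    return counts


def growth(before, after):
    """Pids present in both snapshots whose descriptor count went up, with the increase."""
    return {
        pid: after[pid] - before[pid]
        for pid in before.keys() & after.keys()
        if after[pid] > before[pid]
    }
```

Feeding it the PIDs from 'pgrep -u amanda' every minute or so during an amcheck run and printing growth() between consecutive snapshots should point at the process whose count keeps rising. On Solaris, 'pfiles PID' then gives a per-descriptor view of that process.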
---
   Brian R Cuttler brian.cutt...@wadsworth.org
   Computer Systems Support(v) 518 486-1697
   Wadsworth Center(f) 518 473-6384
   NYS Department of HealthHelp Desk 518 473-0773



amanda 3.3.3 not unwinding

2013-06-07 Thread Brian Cuttler

I installed Amanda 3.3.3 on Solaris 10/x86 two days ago.
I have found that neither of the amdump runs since then completed normally,

even though each completed all DLEs and sent the report.

> amstatus finsen
Using /usr/local/etc/amanda/finsen/DailySet1/amdump
From Thu Jun  6 18:30:00 EDT 2013



finsen:/                0  43283m estimate done
finsen:/export          0  20448m estimate done
finsen:/export/home-A   0    118m estimate done
finsen:/export/home-AZ  0      0m estimate done

 < MANY LINES REMOVED>

finsen:hp10p/flyshare   0    403m estimate done
finsen:hp10p/grifadmin  0    783m estimate done
finsen:hp10p/hiu        0  74691m estimate done
finsen:hp10p/hiu2       0  22763m estimate done
finsen:hp10p/ivcp       0  26015m estimate done
finsen:hp10p/virologypt 0     15m estimate done

SUMMARY           part      real estimated
                            size      size
partition       :  265
estimated       :  265            3902078m
flush           :    0        0m
failed          :    0                  0m           (  0.00%)
wait for dumping:    0                  0m           (  0.00%)
dumping to tape :    0                  0m           (  0.00%)
dumping         :    0        0m        0m (  0.00%) (  0.00%)
dumped          :    0        0m        0m (  0.00%) (  0.00%)
wait for writing:    0        0m        0m (  0.00%) (  0.00%)
wait to flush   :    0        0m        0m (100.00%) (  0.00%)
writing to tape :    0        0m        0m (  0.00%) (  0.00%)
failed to tape  :    0        0m        0m (  0.00%) (  0.00%)
taped           :    0        0m        0m (  0.00%) (  0.00%)
12 dumpers idle : not-idle
taper status: Idle
taper qlen: 0
network free kps:      800
holding space   :  546122m (100.00%)
 0 dumpers busy :  0:00:05  (100.00%)     not-idle:  0:00:05  (100.00%)

We were left with processes that did not unwind.

> ps -ef | grep amanda
  amanda 16257 16256  13 18:30:01 ? 879:30 /usr/local/libexec/amanda/planner finsen --starttime 20130606183000
  amanda 16271 16258   0 18:30:01 ?   0:00 dumper11 finsen
  amanda 16267 16258   0 18:30:01 ?   0:00 dumper7 finsen
  amanda 16263 16258   0 18:30:01 ?   0:00 dumper3 finsen
  amanda 27743  8729   0 09:11:17 pts/14  0:00 -tcsh
  amanda 16270 16258   0 18:30:01 ?   0:00 dumper10 finsen
  amanda 16260 16258   0 18:30:01 ?   0:00 dumper0 finsen
  amanda 16262 16258   0 18:30:01 ?   0:00 dumper2 finsen
  amanda 27766 27743   0 09:11:45 pts/14  0:00 grep amanda
  amanda 16268 16258   0 18:30:01 ?   0:00 dumper8 finsen
  amanda 27765 27743   0 09:11:45 pts/14  0:00 ps -ef
  amanda 16256 16253   0 18:30:01 ?   0:00 /usr/local/bin/perl /usr/local/sbin/amdump finsen
  amanda 16259 16258   0 18:30:01 ?   0:00 /usr/local/bin/perl /usr/local/libexec/amanda/taper finsen
  amanda 16266 16258   0 18:30:01 ?   0:00 dumper6 finsen
  amanda 16264 16258   0 18:30:01 ?   0:00 dumper4 finsen
  amanda 16261 16258   0 18:30:01 ?   0:00 dumper1 finsen
  amanda 16258 16256   0 18:30:01 ?   0:00 /usr/local/libexec/amanda/driver finsen
  amanda 16253   541   0 18:30:01 ?   0:00 sh -c /usr/local/sbin/amdump finsen
  amanda 16265 16258   0 18:30:01 ?   0:00 dumper5 finsen
  amanda 16269 16258   0 18:30:01 ?   0:00 dumper9 finsen
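One place to start unravelling is the parent/child structure and accumulated CPU time of the leftover processes. In the listing above, planner (PID 16257) has accumulated 879:30 of CPU time while every other Amanda process shows 0:00. The Python sketch below parses 'ps -eo pid,ppid,time,args' output, whose fixed columns are easier to split than 'ps -ef' (where STIME can contain a space), into a process tree; the function names are illustrative.

```python
from collections import defaultdict


def parse_cpu_time(text):
    """Convert a ps TIME value ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in text:
        day_part, text = text.split("-", 1)
        days = int(day_part)
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def parse_ps(output):
    """Parse 'ps -eo pid,ppid,time,args' output into {pid: (ppid, cpu_seconds, args)}."""
    procs = {}
    for line in output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # no command text; not interesting here
        pid, ppid, cpu, args = fields
        procs[int(pid)] = (int(ppid), parse_cpu_time(cpu), args)
    return procs


def children_of(procs):
    """Map each ppid to the sorted list of its child pids."""
    tree = defaultdict(list)
    for pid, (ppid, _cpu, _args) in sorted(procs.items()):
        tree[ppid].append(pid)
    return tree


def format_tree(procs, root):
    """Indented lines for root and its descendants, with CPU seconds."""
    tree = children_of(procs)
    lines = []

    def walk(pid, depth):
        _ppid, cpu, args = procs[pid]
        lines.append(f"{'  ' * depth}{pid} cpu={cpu}s {args}")
        for child in tree.get(pid, []):
            walk(child, depth + 1)

    walk(root, 0)
    return lines
```

Starting format_tree at the amdump PID (16256 above) would show planner and driver under amdump, with the dumpers and taper under driver, and make the one process burning CPU stand out.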


This is new behavior since amanda 3.3.0 which was the previous
version on this system.

Amanda server has only one client, itself.

I'm not sure where to even start unraveling this.


thank you,

Brian