Deb,

The cron issue happens both with and without the two failing clients.

All of my normal testing and manual runs are done with the two problem hosts disabled, because they are known problems. I suspect the large number of files on both of them is causing a timeout.

I am working on a separate solution to see if I can find a better way to back up those two hosts. They are both VMs, and one is notoriously slow at disk reads, in addition to having a large number of files in the directory that is backed up.
Two other similar VMs back up properly without issue.

Regards,
Seann

On 8/17/2015 3:05 PM, Debra S Baddorf wrote:
Does this include the two clients that fail? Do THEY also say that their 
estimates are complete? Or are they still working on estimates, and thus 
holding up the whole works? All of the estimates seem to need to finish 
before anybody gets to start.
Deb Baddorf
Fermilab

On Aug 17, 2015, at 2:33 PM, Seann <nombran...@tsukinokage.net> wrote:

All,

I am looking for a little direction on a problem that has cropped up for me 
recently.

I have a backup set that was created using Amanda 2.5 (the default on CentOS 
5.11) and ran very well, both manually and from the cron job I had set up for it.
It has approximately 13 hosts to back up, ranging from a single directory to a 
full system, and it ran with no issues on CentOS 5.11.
The basic setup uses hard drives as the backup media, with server-side 
compression to save space and GNU tar as the archive format.

Fast forward to today: I have upgraded the system to CentOS 7, which also 
upgraded Amanda to 3.3.3-13, and after rewriting some of the configuration 
files I got most of the backups to work.
Two systems, one backing up the web directory and the other backing up the full 
disk, fail every time.
When these two disklist entries are removed, the backup runs and takes 
approximately two and a half hours for the 8 other hosts (the other 3 hosts 
are currently offline and not in scope).

When the cron job kicks off at midnight, it runs for over 12 hours (I have 
etimeout set to one day, as the planner kept dying, saying it had timed out).
This is the same basic error that I get with the two failing backups 
mentioned above.

When the hung backup job is running, I see the dumpers and the main dump process 
running on the backup server, but nothing in the logs beyond the "backup job 
started" type of messages.
On all of the hosts, I don't see the client running, nor do I see any tar 
processes running.
There are also no clues in the logs as to which host the server is waiting on, 
and checking all the hosts in scope shows they are all in the same state: they 
have sent their estimates to the backup server and are waiting on the next 
phase.


Any help on this would be appreciated. Also, is there a better way of making 
sense of the logs (for example, something like Graylog2), and of reporting 
issues with Amanda 3.3?


Regards,
Seann




Tsukinokage.net Omaha, Nebraska


