Re: Expecting new tape...

2008-06-10 Thread Robert Kuropkat


Jon LaBadie wrote:
> On Mon, Jun 09, 2008 at 10:39:16PM -0400, Robert Kuropkat wrote:
>> Did something bad, but not sure what.  I had a tape with a bar code 
>> label but apparently no Amanda label.  So I labeled it and hoped the 
>> next backup would write to it.  It did not.  It was tape #28 and I had 
>> already passed that number in the sequence (Tape #30).  So I figured I 
>> did something goofy and just loaded the next set.  Unfortunately, it 
>> still says it needs a new tape.
>>
>> Unfortunately, I was confident I knew what I was doing, so I also made 
>> other changes while I was there.  I had taken the top 10 tapes out of 
>> the sequence reducing it from 100 to 90.  So I took the top ten entries 
>> out of the tapelist file and changed the tapecycle entry in amanda.conf 
>> to 90.
>>
>> I'm new to Amanda and inherited this setup so aside from posting every 
>> config file, I'm not sure what the relevant entries to post would be.
>>
> 
> Couple of things.  First the entries in the tapelist file can be
> active or inactive as indicated by the "reuse/noreuse" tag.  If
> you are getting a "need a new tape" message, the number of "reuse"
> tapes is less than the tapecycle value.
> 
> Second, tapecycle need not match the actual number of tapes
> in rotation.  It must be equal to or less.  By having it less,
> when a tape is damaged or archived (and marked "noreuse" in
> tapelist) you will not get the dreaded "need a new tape".
> 
> Third, amanda does not use the tapes in human defined numeric
> order.  If it originally saw the tapes in the order 1,3,5,99,
> that is the order it will expect them in the future.  And there
> are several ways the expected order can be affected later on.
> 
> So, even though tape 30 has been passed, tape 28's turn may
> be coming up.  Alternatively, as you labelled it just recently,
> it may be the "new tape" that amanda is seeking.  If there
> are 90 tapes listed as "reuse" in tapelist and tape 28 is the
> only one without a date last used, that is likely the case.
> 
> jl


ooh, I think you nailed it.  One of the reasons I was taking the top 10
out of the cycle was because I wanted to fill in some holes where tapes
had gotten damaged and because I had 6 sets of 15 and 1 set of 10 tapes.
Unfortunately, I had not yet added those tapes in so I had a tape cycle
of 90 and only 87 tapes in the tape list.

Running it now to see if that works.  If so, I will add a couple more
tapes at the top of the list again to prevent this in the future.
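
For anyone who wants to check for this mismatch before a run, here is a minimal sketch of the comparison Jon describes: count the tapelist entries tagged "reuse" and compare that with tapecycle. It assumes each tapelist line is whitespace-separated with the reuse tag in the third field; the labels, dates, and function names are made up for illustration, and Amanda's real tape selection logic is more involved than this.

```python
def count_reusable(tapelist_text):
    """Count tapelist entries whose third field is exactly 'reuse'.

    Assumed line layout (an assumption, not taken from the thread):
        <datestamp> <label> <reuse-tag>
    """
    count = 0
    for line in tapelist_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "reuse":
            count += 1
    return count


def needs_new_tape(tapelist_text, tapecycle):
    """True when fewer tapes are marked 'reuse' than tapecycle requires."""
    return count_reusable(tapelist_text) < tapecycle


# Hypothetical tapelist contents; a datestamp of 0 stands for a freshly
# labelled tape that has never been written.
sample = """\
20080609 DAILY-030 reuse
20080608 DAILY-029 reuse
0 DAILY-028 reuse
20080501 DAILY-027 no-reuse
"""

print(count_reusable(sample))     # 3
print(needs_new_tape(sample, 4))  # True: 3 reusable tapes < tapecycle of 4
```

In the situation above, the same check with 87 reusable entries and a tapecycle of 90 would report that a new tape is needed.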

Thanks!

Robert Kuropkat



[Amanda-users] confusing problem with NO-NEW-TAPE

2008-06-10 Thread nbarss

Jean-Louis took a look at my config, and it turns out I had a foolish error.  I 
had used the following lines from the default config:


# flush-threshold-dumped, flush-threshold-scheduled, taperflush, and autoflush
# are used to control tape utilization. See the amanda.conf (5) manpage for
# details on how they work. Taping will not start until all criteria are
# satisfied. Here are some examples:
# You want to keep the most recent dumps on holding disk, for faster recovery.
# Older dumps will be rotated to tape during each run.
flush-threshold-dumped   300 # (or more)
flush-threshold-scheduled   300 # (or more)
taperflush   300
autoflush     yes


This configuration instructs the system to not write any tape volumes until the 
holding disk contains 300 *percent* of a volume size.  I had understood it to 
mean 300 MB on disk.  I have no idea why I thought that.

Commenting out those lines allowed the backup to work as I had originally 
thought that it would.
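
To make the difference concrete, here is a small arithmetic sketch of what a percentage threshold means in bytes. The 100 GiB volume size and the function name are assumptions for illustration only; substitute the length from your own tapetype.

```python
GIB = 1024 ** 3


def flush_threshold_bytes(volume_size_bytes, threshold_percent):
    """Amount of data that must be on the holding disk before taping starts,
    when the threshold is a percentage of one volume's size."""
    return volume_size_bytes * threshold_percent // 100


volume = 100 * GIB  # hypothetical tape capacity
needed = flush_threshold_bytes(volume, 300)

print(needed // GIB)  # 300 (GiB of dumps on holding disk, not 300 MB)
```

With a threshold of 300 the holding disk has to accumulate three full volumes' worth of dumps before anything is written, which is why no tape was ever requested.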

I just thought that I would post this for the others that are having the same 
issue - read the manual...as they say.

Thanks for the help!





Re: dumper abort()ing occasionally

2008-06-10 Thread Jean-Louis Martineau

Douglas,

Many bugs are already fixed in the 2.5.1 tree, but we haven't made a release.
You can try the latest 2.5.1 snapshot from 
http://www.zmanda.com/community-builds.php

It is only bug fixes since the 2.5.1p3 release.

I don't remember if this bug was fixed.

Jean-Louis

Douglas K. Rand wrote:

Once or twice a week my amanda backups are failing when a dumper exits
on signal 6, SIGABRT:

   Jun  5 23:27:38 scotch kernel: pid 82566 (dumper), uid 0: exited on signal 6
   Jun  8 19:53:43 scotch kernel: pid 96672 (dumper), uid 0: exited on signal 6

In looking at the source there clearly are calls to abort() in several
places. I'm assuming that there is an overflow problem with file
descriptors, that 4294967295 isn't a valid FD?

  driver: event_register: Invalid file descriptor 4294967295

I'm running FreeBSD 6.3 with Amanda 2.5.1p3 from ports. 


There isn't much information in the log:
  




Re: dumper abort()ing occasionally

2008-06-10 Thread Douglas K. Rand
Ian> Are there any details regarding this issue in the /tmp/amanda
Ian> debug files for this dumper?

Doug> Nothing that I saw. Here is the dumper.*.debug file for that
Doug> pid. I also uploaded all of the debug files for that run from
Doug> the server to:

Doug>   http://meridian-enviro.com/rand/amanda/

Dustin> I can't look at the logs at that URL -- the Apache user
Dustin> doesn't have read permission on the files themselves.

Doh!  I checked that the index worked, not that the files were
readable. Sorry. Fixed.

Dustin> The debug logs do show the client connection timing out,
Dustin> though.  It's likely that this condition is what is tickling
Dustin> the dumper bug, and since 2.5.1 is no longer maintained, the
Dustin> solution is to stop tickling the bug :).  See if you can
Dustin> figure out why that connection is timing out -- busy network?
Dustin> Downed client?  Network partition?

Well, in this particular case it was because the system being backed
up froze. (I think the motherboard is failing.) Usually when this
happens it is not due to a crashed system.

I see that the FreeBSD port for Amanda is still at 2.5.1 (I'm usually
lazy and assume that if I'm up to date to the port I'm up to date with
the software.) I'll see about getting the port upgraded to 2.6.0.

Thanks for the help.


Nominate Amanda for SourceForge Community Choice Awards

2008-06-10 Thread Dustin J. Mitchell
Reminder: nominations for SourceForge's community choice awards close
soon.  If you haven't already, please take a moment to nominate
Amanda!

  http://sourceforge.net/community/cca08-nominate?group_id=120

You can nominate in multiple categories -- after your first
nomination, simply click on the link "nominate this project in another
category."

Dustin

-- 
Storage Software Engineer
http://www.zmanda.com


Re: dumper abort()ing occasionally

2008-06-10 Thread Dustin J. Mitchell
On Mon, Jun 9, 2008 at 5:11 PM, Douglas K. Rand
<[EMAIL PROTECTED]> wrote:
> Once or twice a week my amanda backups are failing when a dumper exits
> on signal 6, SIGABRT:
>
>   Jun  5 23:27:38 scotch kernel: pid 82566 (dumper), uid 0: exited on signal 6
>   Jun  8 19:53:43 scotch kernel: pid 96672 (dumper), uid 0: exited on signal 6
>
> In looking at the source there clearly are calls to abort() in several
> places. I'm assuming that there is an overflow problem with file
> descriptors, that 4294967295 isn't a valid FD?
>
>  driver: event_register: Invalid file descriptor 4294967295

That large integer is also known as -1.  I'm guessing that when the
dumper exits unexpectedly, the driver gets an EOF from its file
descriptor and sets that fd to -1, but then incorrectly tries to
re-register it with the event system.  The pre-2.6.0 event system was
a careful balancing act, but in this case it seems to have handled the
error correctly.
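
As a quick illustration of why 4294967295 and -1 are the same value, here is a short sketch of the 32-bit reinterpretation. It assumes the descriptor is a 32-bit int that was printed as unsigned; the helper names are made up for this example and are not from the Amanda source.

```python
def as_unsigned32(n):
    """Reinterpret an integer as an unsigned 32-bit value."""
    return n & 0xFFFFFFFF


def as_signed32(n):
    """Reinterpret an integer as a signed 32-bit (two's complement) value."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n


print(as_unsigned32(-1))        # 4294967295
print(as_signed32(4294967295))  # -1
```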

The problem is to figure out why dumper aborted.  Most (all?) abort
calls in Amanda are through the error() macro, which should log a
message to the debug log as well.  But looking at the debug logs you
sent, I see no such thing.  I can't look at the logs at that URL --
the Apache user doesn't have read permission on the files themselves.

The debug logs do show the client connection timing out, though.  It's
likely that this condition is what is tickling the dumper bug, and
since 2.5.1 is no longer maintained, the solution is to stop tickling
the bug :).  See if you can figure out why that connection is timing
out -- busy network?  Downed client?  Network partition?

Dustin

-- 
Storage Software Engineer
http://www.zmanda.com