Re: amcheck 0 problems, but timeout on reply pipe

Chris Hoogendyk Thu, 22 Jan 2015 10:44:23 -0800

Thanks.

It happens that the 16th was the last day before students and faculty started arriving back. However, I would have thought that the three day weekend would have been quieter. That server does have some large storage areas that had caused problems in the past. It was the Amanda server. Now I have a new Ubuntu server that is starting to take over, and the first thing I did was to build Amanda server on the new server to start using the new LTO6 NEO200. So, now that server is not backing itself up, but being backed up by the new server.

Unfortunately, the stuff in /tmp/amanda/client/daily gets trimmed. I have only back to the 18th, and since I don't back up /tmp, there is no way to get them.

I think I will take Jean-Louis' suggestion of using a "faster estimate method: calcsize or server." While Deb's suggestion of bumping etimeout would probably work as well, it seems it would be more fragile. My etimeout was 900, and just now I figured "what the heck" and changed it to 2000. I care less about delays than about getting the backups. I will still go ahead with the faster estimate method on the larger DLEs, since that will both avoid the delays and get the backups done.
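
For a rough sense of what the etimeout change buys, here is a small Python sketch of the arithmetic. It relies on the amanda.conf documentation's description of etimeout: a positive value is an allowance per DLE, so the planner's total wait for one host is etimeout times the number of DLEs on that host, while a negative value is taken as a total for the whole host. The DLE count of 24 is only an assumption, picked because 24 x 900 seconds lines up with the roughly 6 hour wait mentioned in this thread; the function names are made up for illustration and are not part of Amanda.

```python
def host_estimate_budget(etimeout: int, dle_count: int) -> int:
    """Seconds the planner would wait for all estimates from one host.

    Assumes the documented amanda.conf behaviour: a positive etimeout is
    an allowance per DLE, a negative one is a total for the whole host.
    """
    if etimeout < 0:
        return -etimeout
    return etimeout * dle_count


def as_hours_minutes(seconds: int) -> str:
    """Format a number of seconds as e.g. '6h00m'."""
    hours, remainder = divmod(seconds, 3600)
    return f"{hours}h{remainder // 60:02d}m"


# Hypothetical DLE count for the problem host (not stated in the thread).
DLE_COUNT = 24

for etimeout in (900, 2000):
    budget = host_estimate_budget(etimeout, DLE_COUNT)
    print(f"etimeout {etimeout}: planner waits up to {as_hours_minutes(budget)}")
```

Under those assumptions, raising etimeout from 900 to 2000 a bit more than doubles the time the planner will wait on that host, which is why a faster estimate method on the large DLEs looks like the sturdier fix.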

I'm inclined to suggest that Amanda back up what it can and report individual DLEs that have issues. I don't know if that would be an easy or sensible change in its planner strategy, but it makes sense to me at first blush.



On 1/21/15 6:25 PM, Nathan Stratton Treadway wrote:

On Wed, Jan 21, 2015 at 16:39:33 -0500, Chris Hoogendyk wrote:

If some of the DLEs were fully estimated in short time, would those
also fail just because other DLEs on the same host caused long time
delays?

Yes -- to over-simplify a bit, Amanda waits for all the estimates from a
particular machine to complete before proceeding with any dumps from
that machine...

I just find it odd that things were working smoothly up to the 16th
and then consistently and completely failing after the 16th.

Can you go back and see how long the estimate was taking before the
16th?

If it was nowhere near 6 hours, then probably something suddenly made it
stop working (e.g. a hung NFS mount, as Joi mentioned).

If it was just a few minutes under 6 hours, then maybe the file count
just grew enough that it tipped over the estimate timeout, in which case
bumping the timeout in the config might be enough to get things working
again with the minimum of changes.  (However, 6 hours seems like a long
time to be waiting for the estimate phase, so switching to a different
estimate method might make sense in terms of speeding your overall
run....)

                                                        Nathan

----------------------------------------------------------------------------
Nathan Stratton Treadway  -  [email protected]  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
  GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
  Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239


--
---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 347 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<[email protected]>

---------------

Erdös 4
