On 1/6/20 4:34 PM, Debra S Baddorf wrote:
On Jan 6, 2020, at 3:13 PM, Chris Hoogendyk <hoogen...@bio.umass.edu> wrote:


On 9/5/19 6:48 PM, Nathan Stratton Treadway wrote:
On Thu, Sep 05, 2019 at 14:12:29 -0400, Chris Hoogendyk wrote:
 From various pieces of information, I decided there were two runs
from August 31 and September 1st that were hung and their tapers
were holding the drives. amcleanup -k said:

    amcleanup: no unprocessed logfile to clean up
    amcleanup: /usr/local/sbin/amcleanupdisk stderr: amcleanupdisk: Can't kill a non-numeric process ID at /usr/local/share/perl/5.22.1/Amanda/Holding.pm line 244.

    amcleanup: /usr/local/sbin/amcleanupdisk stderr:

On Thu, Sep 05, 2019 at 17:17:51 -0400, Chris Hoogendyk wrote:
amanda? (or amcleanup being able to deal with multiple instances for
that matter?) Is that a bug? Or just development that was never
completed? And how difficult would it be to revise the code to do this?
Unfortunately I think only Jean-Louis really knew the answer to that,
but looking at the code for amcleanup it doesn't appear to make any
attempt to deal with multiple instances.

More generally, amcleanup simply looks for a "log" symlink in the
"logdir" directory, and processes the log.<DATESTAMP> pointed to by
that.  As far as I understand, that symlink is created each time amdump
starts, pointing to that instance's log file.

So, as soon as some new parallel instance starts, there's no longer any
"log" symlink pointing to the earlier instance(s)'s log file(s).  If
that latest instance then terminates cleanly (as, for example, was
probably the case for the instance at your site which gave up when it
couldn't find any available tape drives), then the "log" symlink will
continue to point to a "completed" log... even though earlier instances
are still out there running (or died without a clean shutdown).

I haven't tried it myself, but based on what I am reading it looks like
the next time you run into this situation, you should be able to
manually update the "log" symlink to point to the log.* file for a
still-running instance before you run "amcleanup", thus allowing that
particular instance to get cleaned up.  If you did this once for each
still-running instance, theoretically you'd end up with everything
properly killed and Amanda email reports for each one, etc....
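
A minimal sketch of that manual step, in Python. It assumes only what is
described above: a symlink (here "log") inside the logdir that should point
at one run's log file. repoint_symlink is a hypothetical helper, not part of
Amanda, and the file names in any call are illustrative. It has not been
tried against a real Amanda installation.

```python
import os

def repoint_symlink(logdir, link_name, target_name):
    """Point logdir/link_name at target_name, replacing any existing link.

    Hypothetical helper, not part of Amanda. Returns the previous target
    (or None) so the link can be put back afterwards.
    """
    link_path = os.path.join(logdir, link_name)
    if not os.path.exists(os.path.join(logdir, target_name)):
        raise FileNotFoundError(target_name)
    previous = os.readlink(link_path) if os.path.islink(link_path) else None
    # Create the new link under a temporary name, then rename it over the
    # old one, so the link never disappears in between.
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(target_name, tmp_path)
    os.replace(tmp_path, link_path)
    return previous
```

The returned previous target makes it easy to restore the link to the most
recent run once the older instance has been dealt with.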

(But note that you would need to make sure there was at least enough
free space on the holding disk that the "pid" files could be created
successfully, or you run into that "can't kill a non-numeric process ID"
bug....)

I suspect a "real" fix for this situation would involve some
re-architecting of the whole parallel-instances situation....

For example, in addition to the simple "log", "amdump", "amdump.1",
"amflush", and "amflush.1" symlinks currently used, perhaps there should
also be "<prefix>.<DATESTAMP>.running" symlink created at the start of
the run, and then removed as part of the end-of-run cleanup.  That way,
both amstatus and amcleanup could just search for *.running symlinks as
a way to detect still-running (or uncleanly shut down) instances.
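
To make the idea concrete, here is a rough Python sketch of the detection
side only. The "<prefix>.<DATESTAMP>.running" markers do not exist in Amanda
today; they are the proposal above, and find_running_instances is a
hypothetical name.

```python
import glob
import os

def find_running_instances(logdir):
    """Return (prefix, datestamp) pairs for each '<prefix>.<DATESTAMP>.running'
    marker found in logdir. The marker naming is the proposal, not current
    Amanda behaviour."""
    found = []
    for path in sorted(glob.glob(os.path.join(logdir, "*.running"))):
        parts = os.path.basename(path).split(".")
        # Expect exactly three parts: prefix, numeric datestamp, "running".
        if len(parts) == 3 and parts[1].isdigit():
            found.append((parts[0], parts[1]))
    return found
```

A cleanup tool could then loop over the returned pairs and handle each
instance in turn instead of relying on a single "log" symlink.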

But obviously that involves changing all the places where these files
and symlinks are initially created and where they are cleaned up... so
I'm not sure how hard that would be.

                                                Nathan

hmm.

Well, this situation came up again, and that didn't actually work.

I had two jobs running, one started Saturday evening and one started Sunday evening. Both 
holding disks were 100% full. I ran amstatus and found that the "current" run 
was flushing a reasonably large DLE and that there should be more than sufficient space 
left on the tape. Half an hour later, I checked again, and the numbers were identical. No 
progress. The web interface for the tape library showed both tape drives idle, but with 
appropriate tapes loaded. So, I issued an `amcleanup -k daily`. That, of course, worked 
fine. I got a report for the Sunday night run and all its processes were gone.

I tried switching the log symlink to point to the Saturday night log file and 
then running amcheck. That didn't work. So, I tried also changing the amdump 
symlink to the Saturday night amdump file. The two together gave me the output 
for the Saturday night run. It showed that the tape was full; and, with the 
holding disks also full, 5 DLEs were waiting for dumping, one had failed, and 
Amanda was simply hung waiting. Assuming that the symlinks were doing the job, 
I issued an `amcleanup -k daily`. That seemed to work. The processes were 
killed and a report was sent. However, the report was virtually empty. So 
something else is missing in terms of symlinks or coding or something.

Report from the two day old run is below.

Then I ran amcheck and everything looked fine, but amflush ran into the "can't kill 
a non-numeric process ID" error. I found several empty pid files on the holding disks and 
removed them. Then amflush launched all right. However, it only offered me the last two 
runs to flush files from. The holding disks show directories for a couple of other dates. 
How does that happen? How does one clean that up?
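
A small Python sketch for locating those empty pid files before removing
them by hand. It assumes the files are literally named "pid" and sit
somewhere below the holding disk root; that naming is an assumption here, and
find_empty_pid_files is a hypothetical helper, not an Amanda tool. It only
reports paths and deletes nothing.

```python
import os

def find_empty_pid_files(holding_root):
    """List zero-length files named 'pid' below holding_root.

    The file name 'pid' is an assumption; adjust it to match the holding
    disk layout actually in use.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(holding_root):
        if "pid" in filenames:
            path = os.path.join(dirpath, "pid")
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```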

Running Amanda version 3.5.1 on Ubuntu 16.04 Server LTS.


--
---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geosciences Departments
(*) \(*) -- 315 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<hoogen...@bio.umass.edu>


This is too simplistic, but when amflush shows nothing it is willing to flush,
I frequently delete things from the holding areas …. especially when their
date is significantly old.
      rm -fr   <specifics>/amanda/daily/20191002*
does the trick.  Amanda never asks me about them or any such.
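
A cautious variant of the same idea, sketched in Python: list the holding
subdirectories whose names start with a date older than some cutoff, and
review the list before removing anything. It assumes directory names begin
with a YYYYMMDD datestamp, as in the 20191002* example above;
old_holding_dirs is a hypothetical helper and deletes nothing itself.

```python
import os
from datetime import datetime, timedelta

def old_holding_dirs(holding_dir, now, max_age_days):
    """Return subdirectories of holding_dir whose name starts with a
    YYYYMMDD date more than max_age_days before 'now'."""
    cutoff = now - timedelta(days=max_age_days)
    old = []
    for name in sorted(os.listdir(holding_dir)):
        path = os.path.join(holding_dir, name)
        if not os.path.isdir(path):
            continue
        if len(name) < 8 or not name[:8].isdigit():
            continue  # not a datestamp-named directory
        try:
            stamp = datetime.strptime(name[:8], "%Y%m%d")
        except ValueError:
            continue  # eight digits, but not a valid calendar date
        if stamp < cutoff:
            old.append(path)
    return old
```

Calling old_holding_dirs(path, datetime.now(), 30) would list candidates more
than 30 days old. It is worth making sure no amdump or amflush is still
running before acting on the result.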

Deb Baddorf
Fermilab


Yup. I have been doing that as well. I was hoping for a better solution.


