On Fri, Nov 16, 2018 at 12:53:53 -0500, Chris Hoogendyk wrote:
> Only a little puzzled that I got zero responses to my question(s)
> posted on 10/30. Maybe this is just pushing into an area that needs
> an answer from JLM? Anyone else?

Yes... I'm guessing that at the moment no one but JLM really understands
how it is all supposed to work (or what additional functionality was
planned but never implemented).


(I will mention that back in Nov 2017 [after a discussion here on this
list under the Subject heading "amflush run while amdump underway tries
to flush .tmp files"], JLM did implement additional "pid locks" to
eliminate some problems with simultaneous Amanda processes [git commit
8e32004764d8105772670ac5b36f2acd13501379, in particular]... but that fix
doesn't seem to have made it into the 3.4 branch....

I don't believe any of the issues you asked about have been changed,
but given that the parallelism you are talking about was new and
not-much-tested in 3.4, I would expect v3.5 to have somewhat fewer
parallelism-related bugs overall....)


> 
> 
> On 10/30/18 3:46 PM, Chris Hoogendyk wrote:
> >Today, my afternoon cron that runs amcheck reported no tapes
> >available on the LTO6 (it only has one drive). However, last
> >night's backup had completed mid morning. When I looked at the
> >processes, I saw quite a few dumpers and a taper running. So, I
> >ran an amstatus. That showed about 96 DLEs in a state "wait for"
> >either flushing or writing. The date for the backup indicated that
> >it had started 5 days ago, filled both the LTO6 (administrative
> >backups) and the LTO7 (research backups), and then hung. When I
> >ran an amcleanup -k, it said that there was no unprocessed log
> >file to clean up. hmm. Back to looking at the processes. I issued
> >a kill on the parent of all the dumpers, looked at the processes
> >again, and so on until it was "cleaned up." I ended up with an
> >email report out of that. I tried amstatus again, and it reported
> >that it failed to open the amdump_log file amdump.1. Then I went
> >back and did an amcheck again to see if that would be alright. It
> >seemed to be. However, looking at
> >/usr/local/etc/amanda/daily/log/, I found that there was an
> >amdump.2 symlink to amdump.20180905233001 and an amdump.3 symlink
> >to amdump.20180904233002, but, indeed, no amdump.1. Furthermore,
> >neither of those two files exists. The amdump files have been
> >trimmed to October (12 through 29).
> >
> >So, what's the deal here?
> >
> >Why does amanda hang under those circumstances?

(Wild guess, but do you have any "interactivity" enabled in your
config?  If so, and the taperscan algorithm can't find a volume to use,
all Amanda processes related to that amdump run will just sit there and
wait for the interactivity request to be satisfied or aborted.

In this case, the hang isn't because of parallelism, but obviously it
would cause parallelism to happen when your next night's cron job kicked
off.)


> >
> >Why doesn't amcleanup see the older run?

My impression is that amcleanup just looks for a symlink named "log" in
the log directory... and presumably that "log" symlink no longer existed
after later amdump runs (i.e. the one you refer to when you say "last
night's backup had completed mid morning") had run to completion.

> >
> >If I happened to have two instances of amanda backup running (with
> >all their subprocesses), why wouldn't amstatus report that?

amstatus seems to just look for one of the following files in the log
directory: amdump, amflush, amflush.1, amdump.1 -- and it reports on the
first one it finds.  As far as I understand the code, it makes no
attempt to detect the situation where more than one amdump/amflush run
is occurring at the same time.

You can use the -f parameter to tell it to use a different log file
(but you have to notice yourself that that log file exists).

(Note that amstatus relies on processing the log file; it doesn't go
searching for running Amanda subprocesses or anything.)
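
Since amstatus stops at the first file it finds, it can help to list every
candidate that is present before deciding which one to pass to -f. A minimal
Python sketch of that check follows. The candidate names and their order are
taken from the description above, not verified against any particular Amanda
release, and find_status_logs is a hypothetical helper, not part of Amanda.

```python
import os

# Candidate names, in the order described above (taken from this thread,
# not verified against any particular Amanda release).
CANDIDATES = ["amdump", "amflush", "amflush.1", "amdump.1"]


def find_status_logs(log_dir):
    """Return (name, is_dangling) for each candidate present in log_dir."""
    found = []
    for name in CANDIDATES:
        path = os.path.join(log_dir, name)
        if os.path.lexists(path):  # true even for a broken symlink
            # exists() follows symlinks, so it is False for a dangling link
            found.append((name, not os.path.exists(path)))
    return found


if __name__ == "__main__":
    # Log directory from the original post; adjust for your configuration.
    for name, dangling in find_status_logs("/usr/local/etc/amanda/daily/log"):
        print(name, "(dangling symlink)" if dangling else "")
```

If more than one name is reported, more than one run may have left a log
behind, and each can be examined separately with -f.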

> >
> >Do amstatus and amcleanup understand the parallelism and the
> >possibility of multiple backups running at once?
> >
> >It seems they should behave a little like amflush, which will tell
> >you what date backups are represented in the holding space and
> >allow you to choose what you want to flush.

(It doesn't look like any such functionality has been implemented
yet...)

> >
> >And, should I delete those two hanging symlinks? Where did they come from 
> >anyway?
> >

I doubt they will hurt anything (but deleting them seems harmless,
too)... but now I am curious if they are still there or if they got
cleaned up somewhere along the way since your original post?


As far as I can see in the code, amcleanup will rename each existing
amdump.<N> to amdump.<N+1> prior to renaming the current "amdump" file
to "amdump.1" ... but amdump itself doesn't seem to care anything about
previously-existing amdump.<N> files, and simply creates a new "amdump.1"
symlink pointing to the amdump.DATESTAMP file that it just created as
part of wrapping up the run.
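
To make the renaming sequence concrete, here is a rough Python sketch of the
rotation as described above. It is an illustration of that description only,
not Amanda's actual code. The function name rotate_amdump_logs is
hypothetical, and so is the rule used to tell rotation numbers apart from
datestamped logs (a suffix of at most four digits counts as a rotation
number).

```python
import os
import re

# Assumption: rotation suffixes are short numbers (amdump.1, amdump.2, ...),
# while run logs carry a 14-digit datestamp (amdump.20180905233001).
ROTATED = re.compile(r"amdump\.([1-9]\d{0,3})")


def rotate_amdump_logs(log_dir):
    """Shift amdump.<N> to amdump.<N+1>, then rename amdump to amdump.1."""
    matches = (ROTATED.fullmatch(name) for name in os.listdir(log_dir))
    numbers = sorted((int(m.group(1)) for m in matches if m), reverse=True)
    # Highest number first, so no rename overwrites a file still to be moved.
    for n in numbers:
        os.rename(
            os.path.join(log_dir, f"amdump.{n}"),
            os.path.join(log_dir, f"amdump.{n + 1}"),
        )
    current = os.path.join(log_dir, "amdump")
    if os.path.lexists(current):  # also true if "amdump" is a broken symlink
        os.rename(current, os.path.join(log_dir, "amdump.1"))
```

Under this scheme, an amdump.2 or amdump.3 can only appear when an older
amdump.1 (and amdump.2) already existed at the time of the rotation, which
fits the idea that the stray symlinks date from earlier amcleanup runs.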

So, if I'm correct, it seems like the amdump.2 and amdump.3 symlinks
were created by amcleanup at some point, but possibly they are left over
from a long time ago since probably nothing else ever looks at or
touches them.  (If they still exist, I guess looking at their m- and
ctimes [e.g. "stat amdump.2"] might tell you how long ago they were
actually created and when they were renamed to their current name.)
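
The same check can be scripted if there are several links to look at. Below
is a small Python sketch that reports, for one path, where the link points,
whether the target is missing, and the link's own mtime and ctime. The helper
name describe_link is hypothetical. It uses lstat so that the timestamps are
those of the symlink itself rather than of its target.

```python
import os
import time


def describe_link(path):
    """Summarize a (possibly dangling) symlink without following it."""
    st = os.lstat(path)  # lstat inspects the link itself, not its target
    is_link = os.path.islink(path)
    return {
        "target": os.readlink(path) if is_link else None,
        # exists() follows the link, so it is False when the target is gone
        "dangling": is_link and not os.path.exists(path),
        "mtime": time.ctime(st.st_mtime),
        "ctime": time.ctime(st.st_ctime),
    }


if __name__ == "__main__":
    log_dir = "/usr/local/etc/amanda/daily/log"  # path from the original post
    for name in ("amdump.2", "amdump.3"):
        path = os.path.join(log_dir, name)
        if os.path.lexists(path):
            print(name, describe_link(path))
```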

                                                        Nathan

----------------------------------------------------------------------------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
