On Fri, Nov 16, 2018 at 12:53:53 -0500, Chris Hoogendyk wrote:
> Only a little puzzled that I got zero responses to my question(s)
> posted on 10/30. Maybe this is just pushing into an area that needs
> an answer from JLM? Anyone else?

Yes... I'm guessing that at the moment no one but JLM really understands
how it is all supposed to work (or what additional functionality was
planned but never implemented).

(I will mention that back in Nov 2017 [after a discussion here on this
list under the Subject heading "amflush run while amdump underway tries
to flush .tmp files"], JLM did implement additional "pid locks" to
eliminate some problems with simultaneous Amanda processes [git commit
8e32004764d8105772670ac5b36f2acd13501379, in particular]... but that fix
doesn't seem to have made it into the 3.4 branch.... I don't believe
any of the issues you asked about have been changed, but given that the
parallelism you are talking about was new and not-much-tested in 3.4, I
would expect v3.5 to have somewhat fewer parallelism-related bugs
overall....)

>
>
> On 10/30/18 3:46 PM, Chris Hoogendyk wrote:
> >Today, my afternoon cron that runs amcheck reported no tapes
> >available on the LTO6 (it only has one drive). However, last
> >night's backup had completed mid morning. When I looked at the
> >processes, I saw quite a few dumpers and a taper running. So, I
> >ran an amstatus. That showed about 96 DLUs in a state "wait for"
> >either flushing or writing. The date for the backup indicated that
> >it had started 5 days ago, filled both the LTO6 (administrative
> >backups) and the LTO7 (research backups), and then hung. When I
> >ran an amcleanup -k, it said that there was no unprocessed log
> >file to clean up. hmm. Back to looking at the processes. I issued
> >a kill on the parent of all the dumpers, looked at the processes
> >again, and so on until it was "cleaned up." I ended up with an
> >email report out of that. I tried amstatus again, and it reported
> >that it failed to open the amdump_log file amdump.1. Then I went
> >back and did an amcheck again to see if that would be alright. It
> >seemed to be.
> >However, looking at
> >/usr/local/etc/amanda/daily/log/, I found that there was an
> >amdump.2 symlink to amdump.20180905233001 and an amdump.3 symlink
> >to amdump.20180904233002, but, indeed, no amdump.1. Furthermore,
> >neither of those two files exist. The amdump files have been
> >trimmed to October (12 through 29).
> >
> >So, what's the deal here?
> >
> >Why does amanda hang under those circumstances?

(Wild guess, but do you have any "interactivity" enabled in your config?
If so, and the taperscan algorithm can't find a volume to use, all
Amanda processes related to that amdump run will just sit there and wait
for the interactivity request to be satisfied or aborted. In this case,
the hang isn't because of parallelism, but obviously it would cause
parallelism to happen when your next night's cron job kicked off.)

> >
> >Why doesn't amcleanup see the older run?

My impression is that amcleanup just looks for a symlink named "log" in
the log directory... and presumably that "log" symlink no longer existed
after later amdump runs (i.e. the one you refer to when you say "last
night's backup had completed mid morning") had run to completion.

> >
> >If I happened to have two instances of amanda backup running (with
> >all their subprocesses), why wouldn't amstatus report that?

amstatus seems to just look for one of the following files in the log
directory:
  amdump, amflush, amflush.1, amdump.1
-- and it reports on the first one it finds. As far as I understand the
code, it makes no attempt to detect the situation where more than one
amdump/amflush run is occurring at the same time.

You can use the -f parameter to tell it to use a different log file (but
you have to notice yourself that that log file exists).

(Note that amstatus relies on processing the log file; it doesn't go
searching for running Amanda subprocesses or anything.)

> >
> >Do amstatus and amcleanup understand the parallelism and the
> >possibility of multiple backups running at once?
> >
> >It seems they should behave a little like amflush, which will tell
> >you what date backups are represented in the holding space and
> >allow you to choose what you want to flush.

(It doesn't look like any such functionality has been implemented
yet...)

> >
> >And, should I delete those two hanging symlinks? Where did they come from
> >anyway?
> >

I doubt they will hurt anything (but deleting them seems harmless,
too)... but now I am curious if they are still there or if they got
cleaned up somewhere along the way since your original post?

As far as I can see in the code, amcleanup will rename each existing
amdump.<N> to amdump.<N+1> prior to renaming the current "amdump" file
to "amdump.1" ... but amdump itself doesn't seem to care anything about
previously-existing amdump.<N> files, and simply creates a new
"amdump.1" symlink pointing to the amdump.DATESTAMP file that it just
created as part of wrapping up the run.

So, if I'm correct, it seems like the amdump.2 and amdump.3 symlinks
were created by amcleanup at some point, but possibly they are left over
from a long time ago since probably nothing else ever looks at or
touches them. (If they still exist, I guess looking at their m- and
ctimes [e.g. "stat amdump.2"] might tell you how long ago they were
actually created and when they were renamed to their current name.)

							Nathan

----------------------------------------------------------------------------
Nathan Stratton Treadway  -  natha...@ontko.com  -  Mid-Atlantic region
Ray Ontko & Co.  -  Software consulting services  -   http://www.ontko.com/
 GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt   ID: 1023D/ECFB6239
 Key fingerprint = 6AD8 485E 20B9 5C71 231C  0C32 15F3 ADCD ECFB 6239
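
P.S. In case it saves some typing, the "stat amdump.2" check suggested
above can be run over every amdump.<N> symlink at once. What follows is
only a rough sketch in Python and not anything that ships with Amanda:
the function names inspect_amdump_links and print_report are made up for
illustration, and the log directory is whatever path you pass in. It
uses lstat so that the times reported belong to the symlink itself
rather than to its target, and it flags links whose target no longer
exists.

```python
import os
import re
import time
from pathlib import Path

# Matches amdump.<N>; the is_symlink() test below is what keeps the
# regular amdump.DATESTAMP files out of the result.
LINK_NAME = re.compile(r"amdump\.(\d+)")


def inspect_amdump_links(log_dir):
    """Return one dict per amdump.<N> symlink in log_dir, ordered by N."""
    found = []
    for entry in Path(log_dir).iterdir():
        match = LINK_NAME.fullmatch(entry.name)
        if match and entry.is_symlink():
            found.append((int(match.group(1)), entry))

    records = []
    for _, entry in sorted(found, key=lambda pair: pair[0]):
        info = entry.lstat()  # times of the link itself, not its target
        records.append({
            "name": entry.name,
            "target": os.readlink(entry),
            "dangling": not entry.exists(),  # exists() follows the link
            "mtime": info.st_mtime,
            "ctime": info.st_ctime,
        })
    return records


def print_report(log_dir):
    """Print a one-line summary for each amdump.<N> symlink."""
    fmt = "%Y-%m-%d %H:%M:%S"
    for rec in inspect_amdump_links(log_dir):
        state = "DANGLING" if rec["dangling"] else "ok"
        mtime = time.strftime(fmt, time.localtime(rec["mtime"]))
        ctime = time.strftime(fmt, time.localtime(rec["ctime"]))
        print(f'{rec["name"]} -> {rec["target"]} [{state}] '
              f'mtime={mtime} ctime={ctime}')
```

Calling print_report("/usr/local/etc/amanda/daily/log") would list the
links in that directory. The script only reads; whether to delete
anything it flags as dangling is still your call.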