Hi Stefan, thanks for your detailed explanation! But, what's more important, a BIG THANKS for your continuous help, effort, engagement, ...!
Stay healthy!

Cheers.

l8er
manfred

On Mon, 27 Apr 2020, 17:49:52 +0200, Stefan Botter wrote:
> Hi Packmans,
>
> I know, it is lame to self-reply, ... but anyhow ...
> a tl;dr is at the end of the mail :)
>
> On Monday, 20 April 2020, 15:21:42 CEST, Stefan Botter wrote:
> ...
> > I hope, I can give more insight in the next few days.
>
> Now is "next few days".
>
> What happened?
> The initial problem arose during the evening hours of Apr 1st, when a
> rather unusual blackout hit the part of town, where my servers are
> hosted.
> I have a UPS, but it supports for 8-10 minutes only, and the blackout
> lasted 30 minutes. There should be emergency power by means of a diesel
> generator (which by-the-way was scheduled to be replaced the following
> weekend, but this is postponed due to COVID-19), but for unknown reason
> the generator did not kick in. I could restart everything Thursday
> morning.
>
> A secondary problem surfaced, it affected the whole system badly, and I
> have been rather clueless until today.
> PMBS runs along my personal VMs as a VMware guest on my lab system (two
> ESXi hosts). The lab is setup according to best practices, with two
> network facing switches, and two separate switches for storage. The
> storage device is a Synology DS620 with 4 1TB SSDs, connected via iSCSI.
> Backup is done inside the storage network to a separate DS216+II, and
> until Apr 10th was done by Synology's Advanced Backup for Business,
> which basically does snapshots of the VMs, and copies the changed blocks
> to the backup storage space.
> Since the blackout every time backup ran, at least one of the ESXi hosts
> froze or lost network connectivity.
> Since Apr 15th PMBS is now backed up by simple means of rsync, there is
> one backup copy created daily. This does not seem to put such a heavy
> strain on the network.
> I am still contemplating a versioned backup with rdiff-backup, which I
> use regularly with my other machines, but I am not sure, if my available
> backup space will be sufficient, and how long backup runs take on PMBS.
> So this is on the "maybe-ToDo-list".
>
> Still I did not know the cause of the lock-ups.
> By chance I discovered an almost similar behavior with network
> interruptions early last week, when upon a download of a VM image to my
> home system network connectivity was lost. It recovered automagically
> after 10-30 minutes, and was reproducible.
>
> Over the course of the weekend and today I managed to investigate
> further, and found that one of the network add-in cards in one of the
> servers acted strangely under load. I reconfigured the ESXi servers to
> use the lan-on-mainboard (LOM) adapters only, and am now more convinced,
> that the system runs stable again.
> I have some spare quad-port cards lying around, and will replace the
> thought-to-be-defective adapters some time in the future, to have the
> lab again conforming to best practices, but for now everything should
> work without frequent interruptions.
>
> As the world-wide COVID-19 calamity and the now emergency-emerging ;)
> changes to schooling environment is putting a heavy demand for immediate
> action by the school's IT, I have been having rather few time to work on
> "personal fun", it took a while longer to resolve the branching issue,
> which caused this thread. The cause of the reported errors were based on
> the frequent unwanted shutdowns, which left some state-recording files
> for sourceserver and schedulers with binary garbage at the end.
>
> I thought it was a good idea to document the events and
> sort-of-solution, for you to enjoy, and me to remember, as I will
> probably forget what happened and what I did in a few weeks :)
>
> tl;dr: everything should work again without frequent interruptions.
>
> Greetings,
>
> Stefan
> --
> Stefan Botter zu Hause
> Bremen
_______________________________________________
Packman mailing list
Packman@links2linux.de
http://lists.links2linux.de/cgi-bin/mailman/listinfo/packman