I would like to express my sincere thanks to omd for eir continued support of the mailing lists. I hope that everyone recognizes how much hard work it takes to manage web servers. I would be interested in awarding em the patent title "Technical Hero of Agora", or a lesser patent title if a heroic one cannot be agreed upon. What do others think on the matter?
On Sun, Jun 24, 2018 at 8:23 PM, comex <com...@gmail.com> wrote:
> So, 11 days ago I received an email from Ørjan saying that the lists
> weren't working. As I responded at the time (and e then forwarded to
> a-d):
>
>> Some Python processes were in an infinite loop eating all the CPU.
>> Bizarrely, I think the cause is some kind of bug in Debian's patched
>> Python, as the symptoms seem similar to
>> https://bugs.python.org/issue14903… apparently it can be caused by a
>> transient out-of-memory condition.
>>
>> I'm not really sure how to deal with this. For now, I killed the
>> processes, so hopefully things should respond again…
>
> Yesterday I got another email from Ørjan saying that agora-official
> wasn't working, which I managed to forget to act on until P.
> Scholasticus sent me a separate message an hour ago about the same
> issue. Once again, sorry about the delay.
>
> As it turned out, agora-official was in an… interesting state. The
> configuration seemed to have reverted to 2013, when I took over the
> lists from Taral. The admin password didn't work, the owner email was
> set to tar...@gmail.com, and the member list was missing new players.
> Also, the list URL had reverted to www.agoranomic.org from
> mailman.agoranomic.org, which explains P. Scholasticus's observation
> that agora-official didn't appear in the list of mailing lists
> (https://mailman.agoranomic.org/cgi-bin/mailman/listinfo): that page
> has a filter based on whether the request URL matches the list domain.
>
> How could this be? Well, when I took over the lists I installed a
> slightly newer version of Mailman than Taral had been using, which
> switched the format and name of its list configuration files, from
> "config.db" to "config.pck". Whenever Mailman runs, it preferentially
> tries to load config.pck; if config.pck isn't valid but config.db is
> valid, it automatically converts the data to the new format and writes
> that out as config.pck, but doesn't delete config.db. Somehow, the
> bugged Python processes must have corrupted config.pck, causing
> Mailman to re-migrate the config.db that had been sitting unmodified
> in the configuration directory since 2013. (What a great database
> format, that can get corrupted by one process getting wedged,
> apparently without being provoked by any hardware failure.)
>
> Good thing I have backups…
>
> …or so I thought, until I learned that backups had stopped working
> almost exactly a year ago, when as part of a system upgrade, the duply
> package was updated to a version with a backwards-incompatible change
> [1] to the configuration format. …Oops. I really, really need to set
> up some kind of status dashboard for my personal servers, so I can get
> notified when things go wrong, rather than at best having the cron
> daemon send a message to a mailbox I don't read.
>
> So this is bona fide data loss. Luckily, the list *archives* for
> agora-official seem to be intact; only the configuration and member
> list is affected. (But anyone who subscribed since 2013 would have
> found themselves unable to log in to the archives.) I should be able
> to copy the configuration from agora-business, at the cost of messing
> up anyone who had a different subscription state on agora-official and
> agora-business. However, I haven't done that yet; for now I've just
> disabled agora-official. (Right now I have something I need to do,
> but I should have time to finish dealing with this soon.)
>
> This shouldn't have affected agora-business or agora-discussion (other
> than the initial unresponsiveness caused by the Python processes
> spinning, which should have stopped when I killed them), so if there
> have been delivery problems with those lists, they had a different
> cause. I will investigate this too when I have a chance.
>
> I apologize for letting everyone down.
>
> [1]
> https://www.guyrutenberg.com/2017/10/12/duply-credential-error-when-using-amazon-s3/
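For anyone curious how a single corrupted file could roll the list back to 2013, here is a minimal Python sketch of the load-and-migrate fallback described in the quoted message. It is a simplified model under stated assumptions, not Mailman's actual code: the file names match the message, but `load_list_config`, `demo`, and the example addresses are hypothetical, and JSON stands in for the legacy config.db format.

```python
import json
import os
import pickle
import tempfile

# File names from the message. Assumption: pickle models the newer config.pck
# format and JSON stands in for the legacy config.db format.
NEW_NAME = "config.pck"
OLD_NAME = "config.db"


def _try_load_new(path):
    """Return the config dict from the new-format file, or None if unusable."""
    try:
        with open(path, "rb") as f:
            data = pickle.load(f)
    except Exception:  # missing, truncated, or garbage file all count as invalid
        return None
    return data if isinstance(data, dict) else None


def _try_load_old(path):
    """Return the config dict from the legacy file, or None if unusable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except Exception:
        return None
    return data if isinstance(data, dict) else None


def load_list_config(list_dir):
    """Hypothetical loader: prefer config.pck, else migrate config.db."""
    new_path = os.path.join(list_dir, NEW_NAME)
    old_path = os.path.join(list_dir, OLD_NAME)

    config = _try_load_new(new_path)
    if config is not None:
        return config

    # New-format file is missing or corrupt: fall back to the legacy file.
    config = _try_load_old(old_path)
    if config is None:
        raise RuntimeError("no valid configuration found")

    # Migrate by copying, not moving: config.db stays on disk, so any later
    # corruption of config.pck silently resurrects these stale settings.
    with open(new_path, "wb") as f:
        pickle.dump(config, f)
    return config


def demo():
    """Replay the failure; returns (owner_before, owner_after)."""
    with tempfile.TemporaryDirectory() as d:
        old_path = os.path.join(d, OLD_NAME)
        new_path = os.path.join(d, NEW_NAME)
        # 2013: only the legacy file exists; the first load migrates it.
        with open(old_path, "w", encoding="utf-8") as f:
            json.dump({"owner": "old-owner@example.com", "members": ["a"]}, f)
        config = load_list_config(d)
        # Later changes are saved in the new format only.
        config.update(owner="new-owner@example.com", members=["a", "b"])
        with open(new_path, "wb") as f:
            pickle.dump(config, f)
        before = load_list_config(d)["owner"]
        # A wedged process leaves config.pck truncated (empty here).
        with open(new_path, "wb"):
            pass
        after = load_list_config(d)["owner"]
        return before, after


if __name__ == "__main__":
    print(demo())  # ('new-owner@example.com', 'old-owner@example.com')
```

In this model the root problem is that migration copies rather than moves: had config.db been renamed or removed after a successful conversion, a corrupted config.pck would have caused a loud failure instead of a quiet rollback to years-old settings.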