seems wedged again? sorry for the bad news Shane, thanks for all the work on fixing it
On Mon, Mar 18, 2019 at 4:02 PM shane knapp <skn...@berkeley.edu> wrote:

> ok, i dug through the logs and noticed that rsyslogd was dropping messages
> due to imuxsock being spammed by postfix... which i then tracked down to
> our installation of fail2ban being incorrectly configured and attempting to
> send IP ban/unban status emails to 'em...@example.com'.
>
> since we're a university, and especially one w/a reputation like ours, we
> are constantly under attack. the logs of the attempted dictionary attacks
> would astound you in their size and scope. since we have so many ban/unban
> actions happening for all of these unique IP addresses, each of which
> generates an email that was directed to an invalid address, we ended up
> w/well over 100M of plain-text messages waiting in the mail queue. postfix
> was continually trying to send these messages, which was causing the system
> to behave strangely, including breaking rsyslogd.
>
> so, i disabled email reports in fail2ban, restarted the impacted services,
> picked my sysadmin's brain and then purged the mail queue (when was the
> last time anyone actually used postfix?). jenkins now seems to be behaving
> (maybe?).
>
> i'm not entirely sure that this will fix the strange GUI hangs, but all
> reports i found on stackoverflow and other sites detail strange system
> behavior across the board when rsyslogd starts dropping messages. at the
> very least we won't be (potentially) losing system-level log messages
> anymore, which might actually help me track down what's happening if
> jenkins gets wedged again.
>
> and finally, the obligatory IT Crowd clip:
> https://www.youtube.com/watch?v=5UT8RkSmN4k
>
> shane (who expects jenkins to crash within 5 minutes of this email going
> out)
>
> On Fri, Mar 15, 2019 at 8:22 PM Sean Owen <sro...@gmail.com> wrote:
>
>> It's not responding again. Is there any way to kick it harder? I know
>> it's well understood but this means not much can be merged in Spark
>>
>> On Fri, Mar 15, 2019 at 12:08 PM shane knapp <skn...@berkeley.edu> wrote:
>> >
>> > well, that box rebooted in record time! we're back up and building.
>> >
>> > and as always, i'll keep a close eye on things today... jenkins
>> > usually works great, until it doesn't. :\
>> >
>> > On Fri, Mar 15, 2019 at 9:52 AM shane knapp <skn...@berkeley.edu> wrote:
>> >>
>> >> as some of you may have noticed, jenkins got itself in a bad state
>> >> multiple times over the past couple of weeks. usually restarting the
>> >> service is sufficient, but it appears that i need to hit it w/the
>> >> reboot hammer.
>> >>
>> >> jenkins will be down for the next 20-30 minutes as the node reboots
>> >> and jenkins spins back up. i'll reply here w/any updates.
>> >>
>> >> shane
>> >> --
>> >> Shane Knapp
>> >> UC Berkeley EECS Research / RISELab Staff Technical Lead
>> >> https://rise.cs.berkeley.edu
>> >
>> >
>> >
>> > --
>> > Shane Knapp
>> > UC Berkeley EECS Research / RISELab Staff Technical Lead
>> > https://rise.cs.berkeley.edu
>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu