I'd say Wireshark is too low-level for the problem we have here. We are talking about too many HTTP requests for specific URLs.
I can think of two main measures:
- Trigger an alarm (an e-mail notification?) if a specific User-Agent accounts for more than a given share of the queries we receive in a given day on the services we care about.
- Offer plots to see how queries by User-Agent evolve over the last couple of months (or couple of years).

Aleix

On Thu, Mar 3, 2022 at 9:59 AM Ben Cooksley <bcooks...@kde.org> wrote:

> On Thu, Mar 3, 2022 at 8:41 AM Aleix Pol <aleix...@kde.org> wrote:
>
>> (dropping the distros list)
>>
>> @sysadmin have you been able to look into any tools we devs can have to
>> make sure this situation doesn't repeat in the future?
>>
>
> Hi Aleix,
>
> To be honest i've been struggling to think of ways that we could detect
> this on the server side prior to it becoming a massive issue.
> By the time an issue is evident server side it is usually much too late.
>
> The main tools i'd usually recommend would be the standard tools you would
> use for monitoring the network activity of any application - such as
> Wireshark.
>
> Is there something you were thinking of specifically in terms of us being
> able to provide?
>
> Thanks,
> Ben
>
>>
>> Aleix
>>
>> On Thu, Feb 10, 2022 at 1:10 PM Aleix Pol <aleix...@kde.org> wrote:
>>
>>> On Thu, Feb 10, 2022 at 11:05 AM Ben Cooksley <bcooks...@kde.org> wrote:
>>> >
>>> > On Thu, Feb 10, 2022 at 8:20 AM Aleix Pol <aleix...@kde.org> wrote:
>>> >>
>>> >> [Snip]
>>> >>
>>> >> We still haven't discussed here is how to prevent this problem from
>>> >> happening again.
>>> >>
>>> >> If we don't have information about what is happening, we cannot fix problems.
>>> >
>>> >
>>> > Part of the issue here is that the problem only came to Sysadmin attention very recently, when the system ran out of disk space as a result of growing log files.
>>> > It was at that point we realised we had a serious problem.
>>> >
>>> > Prior to that the system load hadn't climbed to dangerous levels (> number of CPU cores) and Apache was keeping up with the traffic, so none of our other monitoring was tripped.
>>> >
>>> > If you have any thoughts on what sort of information you are thinking of that would be helpful.
>>>
>>> We could have plots of the amount of queries we get with a KNewStuff/*
>>> user-agent over time and their distribution.
>>>
>>> > It would definitely be helpful though to know when new software is going to be released that will be interacting with the servers as we will then be able to monitor for abnormalities.
>>>
>>> We make big announcements of every Plasma release... (?)
>>>
>>> >> Is there anything that could be done in this front? The issue here
>>> >> could have been addressed months ago, we just never knew it was
>>> >> happening.
>>> >
>>> >
>>> > One possibility that did occur to me today would be for us to integrate some kind of killswitch that our applications would check on first initialisation of functionality that talks to KDE.org servers.
>>> > This would allow us to disable the functionality in question on user systems.
>>> >
>>> > The check would only be done on first initialization to keep load low, while still ensuring all users eventually are affected by the killswitch (as they will eventually need to logout/reboot for some reason or another).
>>> >
>>> > The killswitch would probably work best if it had some kind of version check in it so we could specify which versions are disabled.
>>> > That would allow for subsequent updates - once delivered by distributions - to restore the functionality (while leaving it disabled for those who haven't updated).
>>>
>>> The file we are serving here effectively is the kill switch to all of
>>> KNewStuff.
>>>
>>> Aleix
>>>
>>
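
To make the alarm idea from the top of this mail a bit more concrete, here is a rough sketch of the check I have in mind. It assumes the servers write Apache's "combined" access log format, which I have not verified. The function names, the 50% threshold and the path prefix are all made up for illustration, not anything we have today.

```python
import re
from collections import Counter

# Assumption: Apache "combined" log format, where the User-Agent is the
# last quoted field. Lines that do not match (including ones with escaped
# quotes in the request or agent) are simply skipped.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def agent_shares(lines, path_prefix=""):
    """Return {user_agent: fraction of requests} for paths under path_prefix."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        parts = match.group("request").split()
        # The request field looks like "GET /path HTTP/1.1".
        if len(parts) < 2 or not parts[1].startswith(path_prefix):
            continue
        counts[match.group("agent")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {agent: n / total for agent, n in counts.items()}


def agents_over_threshold(lines, threshold=0.5, path_prefix=""):
    """List the user agents whose share of requests is at least threshold."""
    shares = agent_shares(lines, path_prefix)
    return sorted(agent for agent, share in shares.items() if share >= threshold)
```

Something like this could run once a day over the previous day's log for the URLs we care about and send a mail whenever `agents_over_threshold` returns anything. The per-agent shares it computes are also the numbers we would want to plot over time.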