Re: Critical Denial of Service bugs in Discover

Aleix Pol Thu, 03 Mar 2022 03:49:22 -0800

I'd say wireshark is too low level for what the problem is here. We are
talking about having too many HTTP requests for specific URLs.


I can think two main measures:
- Trigger an alarm (an e-mail notification?) if there's a specific
UserAgent that has a specific portion of the queries we have in a specific
day in the services we care about.
- Offer plots to see how queries by UserAgent evolve over the last couple
of months (or couple of years).

Aleix


On Thu, Mar 3, 2022 at 9:59 AM Ben Cooksley <bcooks...@kde.org> wrote:

> On Thu, Mar 3, 2022 at 8:41 AM Aleix Pol <aleix...@kde.org> wrote:
>
>> (dropping the distros list)
>>
>> @sysadmin have you been able to look into any tools we devs can have to
>> make sure this situation doesn't repeat in the future?
>>
>
> Hi Aleix,
>
> To be honest i've been struggling to think of ways that we could detect
> this on the server side prior to it becoming a massive issue.
> By the time an issue is evident server side it is usually much too late.
>
> The main tools i'd usually recommend would be the standard tools you would
> use for monitoring the network activity of any application - such as
> Wireshark.
>
> Is there something you were thinking of specifically in terms of us being
> able to provide?
>
> Thanks,
> Ben
>
>
>>
>> Aleix
>>
>> On Thu, Feb 10, 2022 at 1:10 PM Aleix Pol <aleix...@kde.org> wrote:
>>
>>> On Thu, Feb 10, 2022 at 11:05 AM Ben Cooksley <bcooks...@kde.org> wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Feb 10, 2022 at 8:20 AM Aleix Pol <aleix...@kde.org> wrote:
>>> >>
>>> >> [Snip]
>>> >>
>>> >> We still haven't discussed here is how to prevent this problem from
>>> >> happening again.
>>> >>
>>> >> If we don't have information about what is happening, we cannot fix
>>> problems.
>>> >
>>> >
>>> > Part of the issue here is that the problem only came to Sysadmin
>>> attention very recently, when the system ran out of disk space as a result
>>> of growing log files.
>>> > It was at that point we realised we had a serious problem.
>>> >
>>> > Prior to that the system load hadn't climbed to dangerous levels (>
>>> number of CPU cores) and Apache was keeping up with the traffic, so none of
>>> our other monitoring was tripped.
>>> >
>>> > If you have any thoughts on what sort of information you are thinking
>>> of that would be helpful.
>>>
>>> We could have plots of the amount of queries we get with a KNewStuff/*
>>> user-agent over time and their distribution.
>>>
>>> > It would definitely be helpful though to know when new software is
>>> going to be released that will be interacting with the servers as we will
>>> then be able to monitor for abnormalities.
>>>
>>> We make big announcements of every Plasma release... (?)
>>>
>>> >> Is there anything that could be done in this front? The issue here
>>> >> could have been addressed months ago, we just never knew it was
>>> >> happening.
>>> >
>>> >
>>> > One possibility that did occur to me today would be for us to
>>> integrate some kind of killswitch that our applications would check on
>>> first initialisation of functionality that talks to KDE.org servers.
>>> > This would allow us to disable the functionality in question on user
>>> systems.
>>> >
>>> > The check would only be done on first initialization to keep load low,
>>> while still ensuring all users eventually are affected by the killswitch
>>> (as they will eventually need to logout/reboot for some reason or another).
>>> >
>>> > The killswitch would probably work best if it had some kind of version
>>> check in it so we could specify which versions are disabled.
>>> > That would allow for subsequent updates - once delivered by
>>> distributions - to restore the functionality (while leaving it disabled for
>>> those who haven't updated).
>>>
>>> The file we are serving here effectively is the kill switch to all of
>>> KNewStuff.
>>>
>>> Aleix
>>>
>>

Re: Critical Denial of Service bugs in Discover

Reply via email to