Re: [HelenOS-devel] async not fast enough for crazy mouse (was: Framebuffer problems)

Ján Veselý Sun, 21 Apr 2013 06:53:09 -0700

On Sat, Apr 20, 2013 at 5:23 PM, Jiří Zárevúcky
<[email protected]> wrote:
>>>
>>> I know that using separate fibrils may be beneficial for some drivers,
>>> but I still think the handling of notifications is currently broken. The
>>> problem is not in the size of the stack (although allocating whole page
>>> when you just need to process a few bytes seems like a waste of
>>> resources if you ask me), but the real problem is that there is *no
>>> upper bound* on the number of fibrils spawned. In most cases you just
>>> need a few fibrils anyway, because you have a limited number of things
>>> you do in a driver (e.g. for audio you might have as many fibrils as
>>> there are channels you are handling, but this number is low and doesn't
>>> change much over time).
>>
>> I don't see a reason why there should be an upper bound on the number
>> of fibrils used. The only problem I see is that failure to create new fibril 
>> is not handled gracefully. In general I don't see a high number of
>> fibrils as a problem,
>>
>
> The fact that a simple driver exhausts all available memory any time the
> system is not fast enough to process all the data the hardware throws at
> it seems like a pretty serious defect. You can't just say that the
> system needs to meet hard real-time constraints, regardless of how
> lenient they might be, in order not to collapse. That is just not a
> sane way of designing a general-purpose system.


You have to say that in order to have any form of guarantee.
Otherwise it's best effort (which is fine), and you are playing
catch-up, relying on the fact that such a situation never (or rarely)
happens in the real world. The problem of IRQ storms is well known and
there are different ways to handle it (like switching to polling mode
instead of interrupts).

Our problem is a bit different: PS/2 was not designed to handle 1000
events per second. It raises an interrupt for every single byte, which
is the most striking inefficiency in this case. Your ping_pong test
showed that your system is capable of handling circa 6.5k round-trip
messages per second even if it uses the "efficient way" (does not
spawn fibrils). I agree that spawning fibrils probably lowers this
number significantly. ps2mouse needs at least 6000 messages just to send
the events to the input service. This number gets higher if you follow
every event to the end (movement on screen). We are already on very
thin ice trying to handle it.

The reason I suggested the USB interface is that it works differently.
It receives all movement data in one go and the driver only asks for
more data when it's done processing the previous event. The point is,
there is a reason your mouse needs at least full-speed USB.

There is no way for the ps2 driver to know whether an interrupt event
can be safely dropped. Dropping one part of a message desyncs and breaks
the ps2 protocol, so it makes no sense to drop interrupts while there
are resources to handle them. Even if you replace the implicit fibril
queue with, say, an endless fifo, the problem does not disappear; it
just gets harder to hit (try a slower machine). Given the extremes of
this situation I would argue it's hard to hit already. Relying on the
kernel message queue (by using a single fibril) would just change the
type of resource that runs out.
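
To illustrate the desync, here is a minimal sketch of a decoder for the
standard 3-byte PS/2 mouse packet (buttons and sign bits in byte 0, X
delta in byte 1, Y delta in byte 2). This is not the HelenOS ps2mouse
code; the names ps2_decoder_t, ps2_event_t, ps2_feed and ps2_decode are
made up for the example, and it assumes no resynchronization logic at
all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state, not taken from the HelenOS sources. */
typedef struct {
	uint8_t buf[3];   /* bytes of the packet collected so far */
	size_t count;     /* number of valid bytes in buf */
} ps2_decoder_t;

typedef struct {
	int dx;
	int dy;
	unsigned buttons;
} ps2_event_t;

/* Feed one byte; returns 1 and fills *ev once three bytes have arrived. */
static int ps2_feed(ps2_decoder_t *d, uint8_t byte, ps2_event_t *ev)
{
	assert(d->count < 3);
	d->buf[d->count++] = byte;
	if (d->count < 3)
		return 0;

	d->count = 0;
	ev->buttons = d->buf[0] & 0x07;
	/* Bits 4 and 5 of byte 0 are the sign bits of the 9-bit deltas. */
	ev->dx = (int) d->buf[1] - ((d->buf[0] & 0x10) ? 256 : 0);
	ev->dy = (int) d->buf[2] - ((d->buf[0] & 0x20) ? 256 : 0);
	return 1;
}

/* Decode a byte stream into at most max events; returns the event count. */
static size_t ps2_decode(const uint8_t *bytes, size_t n,
    ps2_event_t *out, size_t max)
{
	ps2_decoder_t d = { .count = 0 };
	size_t produced = 0;

	for (size_t i = 0; i < n && produced < max; i++) {
		if (ps2_feed(&d, bytes[i], &out[produced]))
			produced++;
	}
	return produced;
}
```

Feeding the stream 0x08 5 0 0x08 5 0 yields two events with dx = 5.
If the second byte is lost, the decoder sees 0x08 0 0x08 5 0 and
reports a single bogus event with dy = 8, because the header of the
next packet is taken for a Y delta. Every later packet is shifted the
same way until something forces a resync.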

Yes, I mean to say that you need a sufficiently fast machine to handle
the workload produced by the attached hardware. A 10G NIC under full
load will stress even today's systems. Our problem is a smaller-scale
version of this situation. (486-class hw WILL have a hard time
processing 4000 irqs per second).

It makes sense to argue that we need something that guarantees
sequential processing of interrupt messages for ps2 and probably the
serial port too (unless it uses a DMA buffer or internal FIFOs, in which
case it does not matter). It is required if we want to use more than
one thread to handle driver fibrils.

Other than that I see only one problem: the ability to handle lack of
resources gracefully. The rest is just a performance improvement to
handle a rare situation that is pathological to begin with.

There are two things to do:
1) fix fibril creation so that it won't kill the task if there is not
enough memory. This will result in an unusable mouse and probably
stuck/ignored keys, but should prevent the driver from being killed.

2) add a single-fibril interrupt handling mode to guarantee serial
handling and hope that the added benefit of better performance is good
enough to remedy your situation. If the performance is still not good,
it will result in an unusable mouse and stuck/ignored keys.
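
A rough sketch of what both points amount to on the driver side: a
fixed-size FIFO drained in order by a single consumer, where running out
of room is reported to the caller instead of killing anything. This is
plain C, not the HelenOS async framework; byte_queue_t, queue_push,
queue_pop and QUEUE_CAPACITY are hypothetical names and the capacity is
arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 8   /* arbitrary bound chosen for illustration */

/* Hypothetical bounded FIFO, drained in order by a single consumer. */
typedef struct {
	uint8_t data[QUEUE_CAPACITY];
	size_t head;      /* index of the oldest stored byte */
	size_t count;     /* number of bytes currently stored */
	size_t dropped;   /* bytes refused because the queue was full */
} byte_queue_t;

static void queue_init(byte_queue_t *q)
{
	assert(q != NULL);
	q->head = 0;
	q->count = 0;
	q->dropped = 0;
}

/* Producer side: never allocates; reports failure when out of room. */
static bool queue_push(byte_queue_t *q, uint8_t byte)
{
	if (q->count == QUEUE_CAPACITY) {
		q->dropped++;
		return false;
	}
	q->data[(q->head + q->count) % QUEUE_CAPACITY] = byte;
	q->count++;
	return true;
}

/* Consumer side: returns bytes strictly in arrival order. */
static bool queue_pop(byte_queue_t *q, uint8_t *byte)
{
	if (q->count == 0)
		return false;
	*byte = q->data[q->head];
	q->head = (q->head + 1) % QUEUE_CAPACITY;
	q->count--;
	return true;
}
```

Note that a refused byte still desyncs the ps2 stream, so this only
turns "driver killed" into "mouse misbehaves"; it does not make the
overload go away.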

Your setup is a nice test case, so implementing 1) first might be a
good idea. I'll try to have a look at both when I get some time.


Jan

_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel
