Re: [HelenOS-devel] async not fast enough for crazy mouse (was: Framebuffer problems)

Martin Sucha Sat, 20 Apr 2013 04:41:07 -0700

On 04/19/2013 08:18 PM, Ján Veselý wrote:
> On Fri, Apr 19, 2013 at 6:52 PM, Martin Sucha <[email protected]> wrote:
>> On 04/19/2013 05:49 PM, Jiří Zárevúcky wrote:
>>>> have you tried tester pingpong?
>>>> it should give an idea about ipc perf limits.
>>>>
>>>
>>> 6490 rt/s
>> The pingpong test doesn't create a new fibril for each received IPC
>> message, whereas notification handling does. Is this necessary? Why
>> don't we have just one fibril handling all notfications just like there
>> is a single fibril handling messages from a client? This would reduce
>> the (unnecessary) overhead of creating/destroying a fibril for each
>> message without any drawback I can think of (the async framework doesn't
>> do any routing for notifications anyway).
>>
>> So the notification handler fibril/function might look like this
>> (compare with connection handler function):
>> void handle_notification(void) {
>>   loop {
>>     callid = async_get_notification(&call);
>>     somehow_handle_the_notification(callid, call);
>>   }
>> }
> 
> using separate fibrils for each interrupt allows you to handle them in
> different paths and asynchronously. In this case it might not be
> necessary (you still have different paths for ps2a and ps2b), but some
> device drivers might make use of it. Interrupts that arrived later can
> be handled sooner/faster. examles that come to mind:
> Full duplex audio that handles recording and playback separately
> (different fragment sizes, formats,...).
> graphics cards that need to handle vblank interrupts ASAP without
> waiting for output connect/disconnect handlers.
> 
> ideally there would be more ways to handle interrupts, one fibril,
> #cpu fibrils, dynamic fibrils, so that each driver can select the most
> suitable handler
> 
> in reality I don't think this is a real problem now (after the stack
> size was reduced to 1 page).


I know that using separate fibrils may be beneficial for some drivers,
but I still think the handling of notifications is currently broken. The
problem is not in the size of the stack (although allocating whole page
when you just need to process a few bytes seems like a waste of
resources if you ask me), but the real problem is that there is *no
upper bound* on the number of fibrils spawned. In most cases you just
need a few fibrils anyway, because you have a limited number of things
you do in a driver (e.g. for audio you might have as many fibrils as
there are channels you are handling, but this number is low and doesn't
change much over time).

The async framework doesn't know how the notifications are handled and
shouldn't spawn new fibril for each notification, but (as you also
pointed out) the driver should choose the most suitable strategy. I
don't think that we should have different handlers in the async
framework, as the driver may easily implement whatever policy is best in
its loop handling notifications (i.e. it may route the notifications to
different fibrils, enqueue them in a buffer, etc.).

> Interrupts that arrived later can be handled sooner/faster.
Well, just spawning a new fibril does not help by itself. You either
need to have multiple kernel threads or wait for some condition (so
there is a fibril switch) to actually handle the interrupts in different
order. But once you start waiting for some condition in an interrupt
handler (e.g. a buffer you are writing to becomes full), it is very
likely that the other fibrils handling the same interrupt type will
become blocked as well.

> the buggy situation combines extremely
> slow host (emulated noKVM), with extremely intensive workload, in
> reality ps2 is limited to max 200 Hz (usually 120Hz), and if the mouse
> was connected via USB it would require less ipc messages.
Yes this is what triggers the bugs (the behaviour is not a single bug
actually), but it does not mean it should not be fixed.

The driver gets killed because it spawns new fibrils at a rate greater
than at which they are executed. One bug is that it actually spawns so
much fibrils, the other that the function to spawn a fibril crashes the
task if the allocation is not possible.

Note that the spawning of new fibrils is used as a rather inefficient
unbounded queue here (the i8042 driver runs in a single kernel thread,
just like all the other drivers do).
The i8042 driver actually relies on the fact that notification fibrils
are scheduled sequentially, because the I/O port is read in the kernel
and the read data is delivered in the notification itself. Spawn a new
kernel thread in the driver or change the implementation of how the
fibrils get scheduled and the data might be appended in the wrong order
depending on which fibril gets executed first (the same bug is present
in my ns8250 modifications).

So, I still think that the proper solution is not to spawn a new fibril
for each notification. And changing this behaviour is not a mere
optimization -- it fixes a design flaw in notification handling.

Martin Sucha


_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel

Re: [HelenOS-devel] async not fast enough for crazy mouse (was: Framebuffer problems)

Reply via email to