> Can you update the commit message to match our usual format? And below
> the first line include a description of exactly how the fix works. If
> this is a workaround that should be removed in the future, please also
> add a COMPAT comment in the code linking to the issue it is working
> around.
Sure

> You moved some things around, but as far as I can tell the actual
> change is to switch from 'return timeout' to 'timer.add_task(timeout,
> ...)'.
Yes it is, but I did not want to have this timeout handling in two places, so 
I coalesced it into one single function.

> From what you wrote on the lua-http issue report I understand this is
> to prevent rescheduling the timer (which would be immediately
> executed) but create a new one instead (which will be executed on the
> next "tick").
Yes

> However...... I don't see that this is actually the case. In Prosody
> `return timeout` from a timer callback already creates a new timer. It
> doesn't reschedule the current one. So I feel like I'm missing
> something. Any ideas? Also, what network backend are you experiencing
> the issue with? Each has slightly different handling of timers.
I'm using the epoll backend and here it does reschedule the current one.
net/server_epoll.lua line 133 inserts a new timer using the variable "elapsed" 
which is invariant in the whole timer loop --> returning 0.0 from the timer 
sets the new timer's timestamp to "elapsed".
That timer gets peeked later in the loop and thus the if in line 124 does not 
trigger (peek > elapsed), which makes the loop execute the rescheduled timer 
again without ever leaving the timer loop at all.
Adding a new timer (with zero timeout) through the timer api will not use the 
"frozen" elapsed variable, but a new call to monotonic() to calculate the 
timeout, which will be biggern than "elapsed" and will be executed only after 
the next epoll() call.

net/server_select.lua does not have this behaviour as far as I can tell.

> All this said, I'm not opposed to merging the patch (with the changes
> mentioned above) because I'm fairly confident nobody but you is using
> net.cqueues right now and it's a critical production issue for you.
> But I'd still like to understand the problem and solution to make sure
> we're not missing something (e.g. a deeper bug in Prosody's timer
> handling).
Sure, maybe a better fix would be to fix net/server_epoll.lua to not allow 
creating endless loops through timers at all.
Initially I didn't want to patch server_epoll because I was not sure what 
assumptions the prosody codebase makes about the network backend module.
But given that server_select does not exhibit the same behaviour, patching 
server_epoll would bring the epoll backend back in line with the select based 
one.
That makes my cqueues patch obsolete and I propose the following change of 
line 132 instead:
-local next_time = elapsed+ret;
+local next_time = monotonic()+ret;

-tmolitor



-- 
You received this message because you are subscribed to the Google Groups 
"prosody-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to prosody-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/prosody-dev/5240858.JIUQPJlfFm%40laptop.

Reply via email to