> Can you update the commit message to match our usual format? And below > the first line include a description of exactly how the fix works. If > this is a workaround that should be removed in the future, please also > add a COMPAT comment in the code linking to the issue it is working > around. Sure
> You moved some things around, but as far as I can tell the actual > change is to switch from 'return timeout' to 'timer.add_task(timeout, > ...)'. Yes it is, but I did not want to have this timeout handling in two places, so I coalesced it into one single function. > From what you wrote on the lua-http issue report I understand this is > to prevent rescheduling the timer (which would be immediately > executed) but create a new one instead (which will be executed on the > next "tick"). Yes > However...... I don't see that this is actually the case. In Prosody > `return timeout` from a timer callback already creates a new timer. It > doesn't reschedule the current one. So I feel like I'm missing > something. Any ideas? Also, what network backend are you experiencing > the issue with? Each has slightly different handling of timers. I'm using the epoll backend and here it does reschedule the current one. net/server_epoll.lua line 133 inserts a new timer using the variable "elapsed" which is invariant in the whole timer loop --> returning 0.0 from the timer sets the new timer's timestamp to "elapsed". That timer gets peeked later in the loop and thus the if in line 124 does not trigger (peek > elapsed), which makes the loop execute the rescheduled timer again without ever leaving the timer loop at all. Adding a new timer (with zero timeout) through the timer api will not use the "frozen" elapsed variable, but a new call to monotonic() to calculate the timeout, which will be biggern than "elapsed" and will be executed only after the next epoll() call. net/server_select.lua does not have this behaviour as far as I can tell. > All this said, I'm not opposed to merging the patch (with the changes > mentioned above) because I'm fairly confident nobody but you is using > net.cqueues right now and it's a critical production issue for you. > But I'd still like to understand the problem and solution to make sure > we're not missing something (e.g. a deeper bug in Prosody's timer > handling). Sure, maybe a better fix would be to fix net/server_epoll.lua to not allow creating endless loops through timers at all. Initially I didn't want to patch server_epoll because I was not sure what assumptions the prosody codebase makes about the network backend module. But given that server_select does not exhibit the same behaviour, patching server_epoll would bring the epoll backend back in line with the select based one. That makes my cqueues patch obsolete and I propose the following change of line 132 instead: -local next_time = elapsed+ret; +local next_time = monotonic()+ret; -tmolitor -- You received this message because you are subscribed to the Google Groups "prosody-dev" group. To unsubscribe from this group and stop receiving emails from it, send an email to prosody-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/prosody-dev/5240858.JIUQPJlfFm%40laptop.