On 15/02/19 08:48 +0100, Jan Friesse wrote:
> Ulrich Windl wrote:
>> IMHO any process running at real-time priorities must make sure
>> that it consumes the CPU only for the short moments that are really
>> critical to be performed in time.
Pardon me, Ulrich, but something is off about this, especially if meant
in general.

Even if the OS infrastructure were entirely happy with switching
scheduling parameters constantly and at a furious rate (I assume there
may be quite a penalty in the overhead of reconfiguring the schedulers
involved), the truly time-critical sections do not appear to be easily
separable (if at all) in the overall code flow of a single-threaded
program like corosync: everything is time-critical in a sense (token and
other timeouts keep ticking), and offloading side/non-critical tasks for
asynchronous processing is likely not on corosync's roadmap, given the
historical move away from multithreading (retained only for logging,
which needs an extra precaution to prevent priority inversion, a threat
that will generally always exist when processes of unequal priority
interface, even transitively). The way around multithreading is to have
a separate worker process with IPC of some sort, but with that, you only
add more overhead and complexity around the additionally managed queues
(+ possibly priority inversion yet again).

BTW, regarding the "must make sure" part: barring self-supervision of
some sort (new complexity + overhead), that is an inherent problem of
fixed-priority scheduling assignment.
I've recently been raising awareness of the (Linux-specific) *deadline
scheduler* [1,2], which:

- has an even higher hierarchical priority than the SCHED_RR policy
  (making the latter possibly ineffective, which would not be very
  desirable, I guess)

- may better express not only the actual requirements, but also an
  upper bound, "under normal circumstances, using reasonably scoped HW
  for the task" (speaking of hypothetical defaults now, possibly
  user-configurable and/or influenced by the timeouts actually
  configured at the corosync level), on how much CPU run-time shall be
  allowed for the process in absolute terms, possibly preventing the
  said livelock scenarios (the process gets throttled when the budget
  is exceeded, presumably speeding up the loss of the token and the
  subsequent fencing)

Note that in systemd deployments, it would be customary for the service
launcher (unit file executor) to expose this as yet another
user-customizable wrapping around the actual run, but support for this
particular scheduling policy is currently missing [3].

>> Specifically having some code that performs poorly (for various
>> reasons) is absolutely _not_ a candidate to be run with real-time
>> priorities to fix the bad performance!

You've managed to flip an (as far as I can tell) isolated occurrence of
evidently buggy behaviour into a generalized statement about the
performance of the pieces of SW involved. If things were that bad, we
would constantly hear that there is not enough room left for the actual
clustered resources, and I am not aware of that.

By buggy behaviour I mean: the logs from https://clbin.com/9kOUM and the
past bug fix https://github.com/ClusterLabs/libqb/commit/2a06ffecd seem
to have something in common, such as high load as a surrounding
circumstance, and a missed event/job (presumably on a socket, fd=15 in
the log, since that one never gets handled even when there is no other
input event).
I guess another look is needed at _poll_and_add_to_jobs_ (not sure why
it appears without the leading/trailing underscore in the provided gdb
backtrace [snipped]:

>>> Thread 1 (Thread 0x7f6fd43c7b80 (LWP 16242)):
>>> #0  0x00007f6fd31c5183 in epoll_wait () from /lib64/libc.so.6
>>> #1  0x00007f6fd3b3dea8 in poll_and_add_to_jobs () from /lib64/libqb.so.0
>>> #2  0x00007f6fd3b2ed93 in qb_loop_run () from /lib64/libqb.so.0
>>> #3  0x000055592d62ff78 in main ()

) and at its use.

>> So if corosync is using 100% CPU in real-time, this says something
>> about the code quality in corosync IMHO.

... or in any other library involved (primary suspect: libqb), down to
the kernel level. And keep in mind that no piece of nontrivial SW is
bug-free, especially when the reproducer requires a rather specific
environment that nobody prioritizes, including those tasked with
quality assurance.

>> Also SCHED_RR is even more cooperative than SCHED_FIFO, and another
>> interesting topic is which of the 100 real-time priorities to
>> assign to which process. (I've written some C code that allows one to
>> select the scheduling mechanism and the priority via a command-line
>> argument, so the user and not the program is responsible if the
>> system locks up. Maybe corosync should think about something
>> similar.)
>
> And this is exactly why the corosync option -p (-P) exists (in 3.x
> these were moved to corosync.conf as sched_rr/priority).

>> Personally I also think that a program that sends megabytes of XML
>> as a realtime-priority task through the network is broken by design:
>> If you care about response time, minimize the data and processing
>> required before using real-time priorities.

This is already partially done (big XML chunks are compressed before
sending) on the pacemaker side. The next reasonable step there would be
to move towards one of the nicely wrapped binary formats (e.g.
Protocol Buffers or FlatBuffers [4]), but that is a speculative
long-term direction, and the core XML data interchange will surely be
retained for a long, long time for compatibility reasons. Other than
that, corosync does not interpret the transferred data, and conversely,
the pacemaker daemons do not run with realtime priorities.

>>>> Edwin Török <edvin.to...@citrix.com> 14.02.19 18.34 Uhr
>>> [...]
>>>
>>> This appears to be a priority inversion problem, if corosync runs
>>> as realtime then everything it needs (timers...) should be
>>> realtime as well, otherwise running as realtime guarantees we'll
>>> miss the watchdog deadline, instead of guaranteeing that we
>>> process the data before the deadline.

This may not be an immediate priority inversion problem per se, but
rather (seemingly) a rare bug (presumably in libqb, see the other
similar one above) accented by the fixed-priority (only very lightly
upper-bounded) realtime scheduling, and by the fact that all this
somehow manages to collide with processes as vital as those required for
actual network packet delivery, IIUIC (which yields some conclusions
about putting VPNs etc. into the mix). Not sure whether this class of
problems would, in general, be at least partially solved by deadline
scheduling (a word used twice in the above excerpt, out of curiosity)
with some reasonable parameters.

>>> [...]
>>>
>>> Also would it be possible for corosync to avoid hogging the CPU in
>>> libqb?

... or possibly (having no proof) for either side not to end up with
inconsistent event tracking, which may slow down (if not prevent) any
further progress; see the similar libqb issue referenced above.

>>> (Our hypothesis is that if softirqs are not processed then timers
>>> wouldn't work for processes on that CPU either)

Interesting. Anyway, thanks for sharing your observations.

>>> [...]
[1] https://lwn.net/Articles/743740/
[2] https://lwn.net/Articles/743946/
[3] https://github.com/systemd/systemd/issues/10034
[4] https://bugs.clusterlabs.org/show_bug.cgi?id=5376#c3

-- 
Jan (Poki)
_______________________________________________
Users mailing list: Users@clusterlabs.org
https://lists.clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org