On Fri, Oct 25, 2013 at 3:02 PM, David Lang <[email protected]> wrote:
> On Fri, 25 Oct 2013, Pavel Levshin wrote:
>
>> 25.10.2013 16:28, David Lang:
>>
>>> no matter what we do with global/state variables, this will be
>>> problematic because the two log messages may end up being processed by
>>> different threads: the first log message may be at the end of a batch
>>> for thread 1 while the second is at the beginning of a batch processed
>>> by thread 2, and therefore gets processed first.
>>>
>>> SIMD can be thought of as effectively spawning one thread per message
>>> in the batch for each statement. They aren't OS threads, but the
>>> resulting race conditions are similar (except that the 'odd' thing
>>> happens regularly).
>>>
>>> But even without SIMD, threading and batching will break the same use
>>> cases, just not as consistently.
>>>
>>>
>>> I think a statement along the lines of the following is going to be
>>> needed in any case, and with such expectations for state variables set
>>> (as opposed to global variables, which as you say bring special
>>> expectations), more restrictions may not be needed.
>>>
>>> Rsyslog makes very heavy use of threads and out-of-order log
>>> processing for performance. This results in two major limitations for
>>> the use of state variables:
>>>
>>> Changes to state variables may not be visible when processing log
>>> messages that arrive 'shortly' after the log message that triggers the
>>> change, or they may be visible when processing log messages that
>>> arrived 'shortly' before the log message that triggers the change. On
>>> systems handling hundreds of thousands of logs per second, 'shortly'
>>> can be +/- a few thousand log messages depending on the configuration,
>>> generally still within a small fraction of a second.
>>>
>>> State variables may be changed multiple times during the processing of
>>> a single log message as the result of processing other log messages.
>>>
>>> This means that state variables are not suitable for several use cases:
>>>
>>> Log correlation (if message A is followed immediately by message B)
>>>
>>> Counters ($/x = $/x + 1)
>>>
>>> State variables are useful for things that change infrequently, and
>>> where the results of the changes do not have to take effect
>>> immediately. Such uses include:
>>>
>>> setting per-system values at startup that never change
>>>
>>> toggling outputs (if debuglog then set $/debug=1; if $/debug==1 then
>>> <action>)
>>>
>>> redirecting outputs (if specialmessage then set
>>> $/filename=re_extract())
>>>
>>>
>>>
>>> you could implement the entire shadow system as you outlined and
>>> narrow the problem a little bit, but without eliminating threads and
>>> batching, I don't think you can eliminate the problems.
>>>
>>> So I think that rather than trying to "make global variables work", the
>>> better answer is to rename them, and implement a separate set of
>>> counter-related functions.
>>>
>>
>> Log correlation is impossible in general, because messages are
>> reordered. It has nothing to do with global/state variables.
>>
>
> much harder, not quite impossible. But in any case, not something that
> rsyslog wants to support.
>
> with SEC you can do something like
>
> set flag1 when you see message1
> set flag2 when you see message2
>
> if flag1 and flag2 then alert
>
> this only works if flags can be set to clear after time.
>
>
>> Precise counters are possible, if atomic operations are in place. SIMD
>> is no problem for statements like x=x+1, because there is no
>> concurrency between messages in a batch for a given statement. But
>> concurrent threads require true atomicity. Luckily enough, for counters
>> you need just one new atomic function.
>>
>
> I think you need two: you need a report_and_set() function as well as
> atomic_add()
>

the atomic_add() will leave the variable in a consistent state. So any
read will read a valid value.
If it is used exclusively, a read will always return a "valid" value.
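To make the two proposed counter functions concrete, here is a minimal C11 sketch, assuming a counter stored as a single 64-bit atomic integer. atomic_add() and report_and_set() are only the names proposed in this thread, not existing rsyslog functions; shared_counter, counter_atomic_add(), counter_report_and_set(), counter_read() and run_concurrent_increments() are hypothetical helpers made up for illustration. The point it shows: `$/x = $/x + 1` is a separate load and store, so two threads can read the same old value and one increment is lost, whereas atomic_fetch_add() does the read-modify-write as one indivisible step. report_and_set() maps naturally onto atomic_exchange(), which lets a reporting action fetch the count and reset it without dropping increments that land in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_WORKERS 16

/* Hypothetical shared counter: one 64-bit value updated atomically. */
typedef struct {
    _Atomic int64_t value;
} shared_counter;

/* Sketch of the proposed atomic_add(): adds delta, returns the new value.
 * atomic_fetch_add() returns the value *before* the addition. */
static int64_t counter_atomic_add(shared_counter *c, int64_t delta)
{
    return atomic_fetch_add(&c->value, delta) + delta;
}

/* Sketch of the proposed report_and_set(): returns the old value and
 * stores new_value in one indivisible step, so no concurrent add is lost. */
static int64_t counter_report_and_set(shared_counter *c, int64_t new_value)
{
    return atomic_exchange(&c->value, new_value);
}

/* A plain read never sees a torn value, only one that an add/set stored. */
static int64_t counter_read(shared_counter *c)
{
    return atomic_load(&c->value);
}

/* Each worker stands in for a processing thread bumping the counter. */
struct worker_arg {
    shared_counter *counter;
    int increments;
};

static void *worker(void *p)
{
    struct worker_arg *arg = p;
    for (int i = 0; i < arg->increments; i++)
        counter_atomic_add(arg->counter, 1);
    return NULL;
}

/* Runs nthreads workers concurrently and returns the final count. */
static int64_t run_concurrent_increments(int nthreads, int increments)
{
    assert(nthreads > 0 && nthreads <= MAX_WORKERS);
    shared_counter c;
    atomic_init(&c.value, 0);
    struct worker_arg arg = { &c, increments };
    pthread_t tids[MAX_WORKERS];
    int started = 0;
    for (int t = 0; t < nthreads && t < MAX_WORKERS; t++) {
        if (pthread_create(&tids[t], NULL, worker, &arg) != 0)
            break; /* join only the threads that actually started */
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    return counter_read(&c);
}
```

Adding 5 and then 2 to a fresh counter returns 5 and 7, a following report_and_set to 0 returns 7, and four threads doing 100000 increments each end at exactly 400000. Replacing counter_atomic_add() with a separate load and store (the equivalent of `$/x = $/x + 1`) would make that total unpredictable under contention.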
>
>> If the shadow system were implemented as described, the problem could
>> be relaxed greatly. Of course, some words of warning should be said.
>>
>
> does it relax it enough to be worth keeping the same syntax since it
> breaks in so many other corner cases?

I still can't see where it is so seriously broken, but I've finally run
out of time to try to settle this.

> or is it better to completely split off the counter use-case into
> something only accessed via functions and not have the counters show up
> as normal looking variables?
>

OK, unfortunately, we are going in circles and probably all of this is not
really worth the effort. I need to think a bit more about it and, most
importantly, talk to the folks over here who would need to support that
beast.

My gut feeling is that we are back to Tuesday morning, and I will remove
global variables and not introduce state vars. I really don't like
introducing such a nondeterministic and counter-intuitive beast (global
vars), no matter how good the use cases may look. Instead, I think I'll
merge in mmsequence, which keeps track of the load balancing issue ... and
will object to extending it to any case that does not play well with the
current engine (like setting and reading variables at will). Read-only
vars can be done via the lookup table system... and the rest must wait
until the engine is ready, whenever this is.

Sorry I found no better solution, but I need to finally table that beast.
It doesn't make sense to invest even more time in something that sounds
very controversial. Better to use that time towards engine enhancements in
general. But, again, it's a gut feeling and I'll verify over the weekend.

Thanks though for the discussion!

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog?
Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.

