On Fri, 25 Oct 2013, Pavel Levshin wrote:
25.10.2013 16:28, David Lang:
no matter what we do with global/state variables, this will be problematic
because the two log messages may end up being processed by different
threads: the first log message may be at the end of a batch for thread 1
while the second is at the beginning of a batch processed by thread 2, and
is therefore processed first.
SIMD can be thought of as effectively having a thread per message spawned
for each statement for every message in the batch. They aren't OS threads,
but the resulting race conditions are similar (just that the 'odd' thing
happens regularly)
But even without SIMD, threading and batching will break the same use
cases, just not as consistently.
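
To make the batch-boundary case concrete, here is a rough simulation in
Python (not rsyslog code). Everything in it is an assumption made for
illustration: the helper names, the batch size of 4, and the round-robin
loop that stands in for two worker threads running at the same speed. Real
scheduling is less regular, but the effect is the same.

```python
from collections import deque


def split_into_batches(messages, batch_size):
    """Cut the arrival-ordered message list into consecutive batches."""
    return [messages[i:i + batch_size]
            for i in range(0, len(messages), batch_size)]


def simulate_workers(batches, num_workers):
    """Deterministic stand-in for worker threads.

    Batch i is handed to worker i % num_workers. The workers then take
    turns handling one message each, which models threads that run in
    parallel at the same speed. Returns the order of processing.
    """
    queues = [deque() for _ in range(num_workers)]
    for i, batch in enumerate(batches):
        queues[i % num_workers].extend(batch)

    processed = []
    while any(queues):
        for queue in queues:
            if queue:
                processed.append(queue.popleft())
    return processed


# "A" arrives immediately before "B", but "A" is the last message of the
# first batch and "B" is the first message of the second batch.
messages = ["m1", "m2", "m3", "A", "B", "m6", "m7", "m8"]
order = simulate_workers(split_into_batches(messages, 4), num_workers=2)
print(order)
```

In this run "B" is handled long before "A", even though it arrived after
it, so anything "A" sets in a state variable is not there yet when "B" is
evaluated.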
I think a statement along the lines of the following is going to be needed
in any case, and with such expectations for state variables set (as opposed
to global variables, which as you say bring special expectations), more
restrictions may not be needed.
Rsyslog makes very heavy use of threads and out-of-order log
processing for performance. This results in two major limitations for the
use of state variables:
Changes to state variables may not be visible when processing log
messages that arrive 'shortly' after the log message that triggers the
change, or they may be visible when processing log messages that arrived
'shortly' before the log message that triggers the change. On systems
handling hundreds of thousands of logs per second, 'shortly' can be +- a
few thousand log messages depending on the configuration, generally still
within a small fraction of a second.
State variables may be changed multiple times during the processing of a
single log message as a result of the processing of other log messages.
This means that state variables are not suitable for several use cases:
Log correlation (if message A is followed immediately by message B)
Counters ($/x = $/x + 1)
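
The counter case fails because $/x = $/x + 1 is a read followed by a
write, and other messages can be processed in between. A small Python
sketch of that lost update, with the interleaving forced by hand so it is
repeatable (the function name and the dict standing in for the variable
store are made up for this example):

```python
def interleaved_increments(start, workers):
    """Run 'x = x + 1' once per worker with the worst-case interleaving.

    Every worker reads the shared value before any worker writes it
    back, which is what can happen when several messages are being
    processed at the same time.
    """
    shared = {"x": start}

    # Step 1: each worker evaluates the right-hand side ($/x + 1).
    computed = [shared["x"] + 1 for _ in range(workers)]

    # Step 2: each worker stores its result, overwriting the others.
    for value in computed:
        shared["x"] = value

    return shared["x"]


# Three messages each tried to add 1, but only one increment survives.
print(interleaved_increments(0, 3))
```

Three increments were requested and the variable ends up at 1, not 3.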
State variables are useful for things that change infrequently, and where
the results of the changes do not have to take effect immediately. Such
uses include:
setting per-system values at startup that never change
toggling outputs (if debuglog then set $/debug=1; if $/debug==1 then
<action>)
redirecting outputs (if specialmessage then set
$/filename=re_extract())
you could implement the entire shadow system as you outlined and narrow the
problem a little bit, but without eliminating threads and batching, I don't
think you can eliminate the problems.
So I think that rather than trying to "make global variables work", the
better answer is to rename them, and implement a separate set of counter
related functions.
Log correlation is impossible in general, because messages are reordered. It
has nothing to do with global/state variables.
much harder, not quite impossible. But in any case, not something that rsyslog
wants to support.
with SEC you can do something like
set flag1 when you see message1
set flag2 when you see message2
if flag1 and flag2 then alert
this only works if flags can be set to clear after a period of time.
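
A sketch of that flag idea in Python, to show why it tolerates reordering.
This is not SEC syntax or SEC's implementation; the class, the function
and the use of a plain number of seconds as the timestamp are all
assumptions for illustration.

```python
class ExpiringFlags:
    """Flags that count as set only for ttl_seconds after being set."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._set_at = {}

    def set(self, name, now):
        self._set_at[name] = now

    def clear(self, name):
        self._set_at.pop(name, None)

    def is_set(self, name, now):
        set_at = self._set_at.get(name)
        return set_at is not None and now - set_at <= self.ttl


def correlate(events, ttl_seconds):
    """events is a list of (timestamp, message) in processing order.

    Returns the timestamps at which the alert would fire.
    """
    flags = ExpiringFlags(ttl_seconds)
    alerts = []
    for timestamp, message in events:
        if message == "message1":
            flags.set("flag1", timestamp)
        elif message == "message2":
            flags.set("flag2", timestamp)

        if flags.is_set("flag1", timestamp) and flags.is_set("flag2", timestamp):
            alerts.append(timestamp)
            # Clear both flags so one pair produces only one alert.
            flags.clear("flag1")
            flags.clear("flag2")
    return alerts


print(correlate([(0, "message1"), (5, "message2")], ttl_seconds=10))
print(correlate([(0, "message2"), (3, "message1")], ttl_seconds=10))
print(correlate([(0, "message1"), (60, "message2")], ttl_seconds=10))
```

The first two calls alert whichever message is processed first, and the
third does not alert because flag1 has expired by the time message2 shows
up. Without the expiry, a stale flag1 would pair with any later message2.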
Precise counters are possible, if atomic operations are in place. SIMD is no
problem for statements like x=x+1, because there is no concurrency between
messages in a batch for a given statement. But concurrent threads require
true atomicity. Luckily enough, for counters you need just one new atomic
function.
I think you need two, you need a report_and_set() function as well as
atomic_add()
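
Roughly what I have in mind for the two functions, sketched in Python
rather than in rsyslog's C. The names atomic_add and report_and_set are
the ones used above; the semantics shown (report_and_set returns the
current value and stores a new one in a single step) and the use of a
lock instead of a hardware atomic are my assumptions.

```python
import threading


class Counter:
    """A counter that is only ever touched through two atomic functions."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def atomic_add(self, amount=1):
        """Add amount and return the new value as one indivisible step."""
        with self._lock:
            self._value += amount
            return self._value

    def report_and_set(self, new_value=0):
        """Return the current value and replace it with new_value.

        Doing both under one lock means no increment can slip in
        between the report and the reset and get lost.
        """
        with self._lock:
            old_value = self._value
            self._value = new_value
            return old_value


def run_workers(counter, workers, increments):
    """Have several threads each call atomic_add(1) 'increments' times."""
    def work():
        for _ in range(increments):
            counter.atomic_add(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


counter = Counter()
run_workers(counter, workers=4, increments=1000)
print(counter.report_and_set(0))  # total since the last report
print(counter.report_and_set(0))  # nothing counted since then
```

The reason for report_and_set is periodic reporting: reading the counter
and then resetting it as two separate steps would drop whatever was added
in between.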
If the shadow system were implemented as described, the problem could be
relaxed greatly. Of course, some words of warning should be said.
does it relax it enough to be worth keeping the same syntax since it breaks in
so many other corner cases? or is it better to completely split off the counter
use-case into something only accessed via functions and not have the counters
show up as normal looking variables?
David Lang