On Fri, 25 Oct 2013, Rainer Gerhards wrote:

On Fri, Oct 25, 2013 at 12:34 PM, David Lang <[email protected]> wrote:

On Fri, 25 Oct 2013, Rainer Gerhards wrote:

 On Fri, Oct 25, 2013 at 10:42 AM, David Lang <[email protected]> wrote:

 On Fri, 25 Oct 2013, Rainer Gerhards wrote:

 Hi all,


I thought out the details of what I have on my mind and think the solution
can work and support all known use cases. I've also managed to write it
down this morning:

http://blog.gerhards.net/2013/10/a-proposal-for-rsyslog-state-variables.html



I would appreciate if you could check it and see how the spec can be
technically broken or identify use cases which it will not be able to
handle.



One case I don't think it can handle is the current work that mmcount is
doing.

As I understand mmcount, it creates a counter for each appname that it
sees, and uses that to count how many times that appname has been seen.

the fact that you can use $/a!b lets you have the variables without having
the conflict with anything else,

but you end up with no way to know what variables exist

you can't output the entire set of counts without being able to use $/a

you can't even lookup the count for the current message's appname without
being able to use something like $a/{$appname}


 you are right, that use case breaks. Thanks! I'll review mmcount in more
depth now and see how it is actually used.

The ability to support subtrees can be added, but will be relatively
expensive.


I don't think you need normal full subtree support, you could get away
with a function like export($/a) to return the tree, this could be with
atomic locking if needed.

I think the hardest thing to do (although possibly the most valuable in
the long run) is the concept of $[/!.]a!{$b} to allow you to use a variable
as part of the name of another variable.
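
To make the indirection idea concrete, here is a rough sketch in Python (not rsyslog code). It assumes a path syntax where `{$name}` is replaced by the value of a message variable before the path is split on `!`; the functions `resolve_path` and `lookup` are hypothetical.

```python
import re

def resolve_path(path, msg_vars):
    """Expand each {$name} in `path` from msg_vars, then split on '!'."""
    expanded = re.sub(
        r"\{\$(\w+)\}",
        lambda m: str(msg_vars[m.group(1)]),
        path,
    )
    return expanded.split("!")

def lookup(tree, path, msg_vars):
    """Walk the nested dict; return None if any path component is missing."""
    node = tree
    for key in resolve_path(path, msg_vars):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

# Per-appname counters as mmcount-style state, keyed under "a".
global_vars = {"a": {"sshd": 17, "cron": 4}}
msg = {"appname": "sshd"}
count = lookup(global_vars, "a!{$appname}", msg)
```

A real implementation would also have to decide what happens when the substituted value itself contains `!`; this sketch ignores that case.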


right - but we need to think a bit of what we want to do with the *config*
file script. This was never intended to morph into a real programming
language with an underlying syslog library ;)

I agree, which is why I'm not suggesting foreach and similar :-)

Beside this general consideration, I agree (and tend to think that this
doesn't yet go totally over the edge). Indirection may not even be so hard to
implement -- but I want to stay focussed, I can't do this now as a
side-activity in any case (and global vars tell a bit about doing something
quickly as a side-activity ;)).

agreed, I do have an interesting case that popped up related to this

an app sends out structured log data of the format

[context@12345 environment="qa"][event@12345 <details>]

where 12345 is a category# relevant to the business and <details> is a lot of additional name=value pairs

which rsyslog converts to

$!RFC5424-SD!context@12345!environment = "qa"
$!RFC5424-SD!event@12345!<details...>

I would like to be able to say the equivalent of:

set $!environment = $!RFC5424-SD!context@*!environment = "qa"

and also be able to say extract the category value
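
A sketch of the wildcard lookup described here, in Python rather than rsyslog script. It assumes the structured data has already been parsed into a nested dict; `find_wildcard` is a hypothetical helper, and the glob matching comes from Python's standard `fnmatch` module.

```python
from fnmatch import fnmatchcase

def find_wildcard(tree, key_pattern, field):
    """Return (category, value) for the first key matching key_pattern
    whose subtree contains `field`, or (None, None) if nothing matches.

    The category is whatever follows the '@' in the matching key.
    """
    for key, child in tree.items():
        if fnmatchcase(key, key_pattern) and isinstance(child, dict) and field in child:
            category = key.split("@", 1)[1] if "@" in key else None
            return category, child[field]
    return None, None

# Stand-in for the $!RFC5424-SD subtree of one message.
sd = {
    "context@12345": {"environment": "qa"},
    "event@12345": {"detail": "x"},
}
category, environment = find_wildcard(sd, "context@*", "environment")
```

Here `category` is the part of the tag name that the application encoded after the `@`, and `environment` is the value that would be copied to $!environment.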

I suspect that when people start working with JSON a lot more, we're going to see cases where they do the same sort of silly thing and encode data into a JSON tag name :-(

 depending on how large the table of global variables ends up, I wonder if
it would be easier (and because of the simpler design, possibly even
faster) to just make a shadow copy of all global variables at the start of
the message processing (I'm thinking the RCU (read-copy-update) mechanism
may be a good fit for this)


I don't think so, as with each of these "copies" I would need to update
work flags (like modifiable) for all variables. Also, I need to know which
one is actually shadowed, so that would be a new flag (currently, it is
just a failed lookup on the shadow dictionary).


my thought is that you essentially shadow everything. each variable would
have a flag that in the master copy is modifiable.

for each message, you would

grab a pointer to the 'current' master table.

if all you do is read variables, you read them from the copy you have a
pointer to.

when you do an update, you copy the entire table, modify your copy and it
becomes the new master copy (with a different pointer), any other thread
has a pointer to the old copy and keeps using it (making the old master
become the 'shadow' version for all the other threads). once all threads
have finished a batch, old copies can be garbage collected (how to track
this is a different discussion that I will leave out for the moment for
simplicity)
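
The copy-on-update scheme described above can be sketched as follows. This is Python, not rsyslog code; the class `VarTable` and its methods are hypothetical, and Python's own garbage collector stands in for the "old copies can be garbage collected" step.

```python
import threading

class VarTable:
    """RCU-style table: readers use a snapshot, writers publish a new copy."""

    def __init__(self, initial):
        self._master = dict(initial)
        self._write_lock = threading.Lock()  # serializes writers only

    def snapshot(self):
        """Grab a pointer to the current master table (no copy, no lock)."""
        return self._master

    def update(self, name, value):
        """Copy the whole table, modify the copy, publish it as new master."""
        with self._write_lock:
            new_table = dict(self._master)
            new_table[name] = value
            self._master = new_table

table = VarTable({"path": "/var/log/a"})
old = table.snapshot()             # a thread in the middle of a batch
table.update("path", "/var/log/b")
new = table.snapshot()             # a thread starting a new batch
```

The thread holding `old` keeps seeing "/var/log/a" until it takes a new snapshot, which is exactly the shadow behaviour described above. Snapshots must be treated as read-only for this to work.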


That won't work, as it would not permit in-batch math. Assume 10 messages
and one counter var (currently at 1). Each message would update its own
shadow, going from 1 to 2. So after batch execution, each of these shadows
would have the value 2, whereas it would need to be 2, 3, 4, 5, ..., 11 for
the respective messages. So updates must actually be made to a global state.
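
The arithmetic of that example, spelled out as a small Python sketch (plain lists stand in for the per-message shadows; nothing here is rsyslog code):

```python
BATCH_SIZE = 10

# Per-message shadows: every message starts from the same value 1
# and increments only its own copy.
counter = 1
shadows = [counter for _ in range(BATCH_SIZE)]
shadows = [value + 1 for value in shadows]

# Shared global state: each message increments the one real counter
# and sees the result of all earlier messages in the batch.
global_counter = 1
seen = []
for _ in range(BATCH_SIZE):
    global_counter += 1
    seen.append(global_counter)
```

After the batch, `shadows` holds ten copies of 2, while `seen` holds 2 through 11.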

correct, this would not work for math (in batch or across threads)


atomic operations would have to lock the pointer to the current master
table, check to see if it matches the pointer the thread is using, if so,
continue normally, if not, it would have to update the variable based on
the current master, not its 'shadow' copy.


 My assumption (I should
have spelled this out) is also that we have a very low number of state
variable updates, and even a low number of read accesses to them.


I think this assumption is incorrect. I think there are four distinct
use-cases

1. no global variables in use

no updates, no reads. (obviously :-)

2. global variables used for configuration/path type capabilities

few, if any updates, extremely frequent reads.

3a. counters for reporting

extremely frequent updates, infrequent reads.

3b. counters for load balancing or sequence numbers

extremely frequent updates and reads.



sorry, my wording implied another thing. I actually did not mean the number
of update operations, but the number of variables involved.

ahh, ok, that makes a bit more sense.

the RCU approach works extremely well for cases #1 and #2, but not for #3a
and #3b

I'm actually thinking that we should possibly split the two use cases.

State Variables ($/) would be intended for infrequent updates (case #2)
and could be handled very well with the RCU style mechanism that I'm
talking about. If you use them for counters, performance will be poor and
you have no atomic operations

Counter Variables would only be accessed through function calls

atomic_set(counter='varname' value=number)
  sets value, returns oldvalue

atomic_add(counter='varname' step=number)
  modifies value, returns newvalue

atomic_report(counter='varname')
  returns value

Counter variables would not be shadowed, they would be true global
variables, and can be modified in unexpected ways between accesses. if you
want to use the value of a counter for something, you save its value into
a local/message variable. Because these do not look or act like normal
variables, there isn't the expectation that you can do math and multiple
actions on them.
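
A minimal sketch of the three counter functions proposed above, in Python with a single lock. The function names and return values follow the proposal; everything else is an assumption, in particular that a counter that was never set reads as 0.

```python
import threading

_counters = {}
_counters_lock = threading.Lock()

def atomic_set(counter, value):
    """Set the counter; return the old value (assumed 0 if unset)."""
    with _counters_lock:
        old = _counters.get(counter, 0)
        _counters[counter] = value
        return old

def atomic_add(counter, step=1):
    """Add step to the counter; return the new value."""
    with _counters_lock:
        new = _counters.get(counter, 0) + step
        _counters[counter] = new
        return new

def atomic_report(counter):
    """Return the current value without modifying it."""
    with _counters_lock:
        return _counters.get(counter, 0)

def worker():
    for _ in range(1000):
        atomic_add("msgs")

atomic_set("msgs", 0)
threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = atomic_report("msgs")
```

Because each call takes the lock for the whole read-modify-write, no increments are lost across threads, but two separate calls are not atomic as a pair, which matches the "save the value into a local/message variable" advice above.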


A general note: I have done already very large changes during the
refactoring. Thus, I would like to keep whatever I do as simple as
possible, to reduce any further bug potential. We can always optimize later
on. Also, I really have hard time constraints (getting harder each
day...). The simpler the idea, the faster I can code it up. It may become a
question between "I can do it" and "I put it on the TODO list and finally
disable global vars without replacement". So let's try to stay focussed on
the simple solution. Optimizations can always happen later, even months
later.

correct, and thinking about the problem more, the RCU stuff is not needed for 'correctness' here in any case, just saying that counters need to be done differently.

Just think that I originally wanted to improve imfile for those frequently
requested use cases (much more frequently requested than state variables!).
This is already dead for the next time, and I am not very happy with that.
So let's try to get to a very simple, but *working* solution. As a
side-note, I am not sure if I can work on any of this next week, so I would
have preferred to finish some things today, but I guess that's no longer
possible (and not so since yesterday, so it probably looked bad since
mid-week...).

my feeling is just to disallow math on global variables for the short to mid term.

David Lang