On Mon, 20 Apr 2009, Rainer Gerhards wrote:

> David,
>
> I'll start with some quick pointers. I think it makes sense to move the
> results of this discussion into a document, or alternatively into the wiki
> if you (or others) find that useful. I have to admit that I am a bit
> skeptical about the wiki; I think mail is better for discussion here. But I
> wanted to mention the option.
>
> Now on to the meat:
>
>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of [email protected]
>> Sent: Saturday, April 18, 2009 12:29 AM
>> To: rsyslog-users
>> Subject: [rsyslog] multi-message handling and databases
>>
>> the company that I work for has decided to sponsor multi-message queue
>> output capability; they have chosen to remain anonymous (I am posting
>> from my personal account)
>>
>> there are two parts to this.
>>
>> 1. the interaction between the output module and the queue
>>
>> 2. the configuration of the output module for its interaction with the
>> database
>>
>> for the first part (how the output module interacts with the queue), the
>> criteria are that
>>
>> 1. it needs to be able to maintain guaranteed delivery (even in the face
>> of crashes, assuming rsyslog is configured appropriately)
>>
>> 2. at low-volume times it must not wait for 'enough' messages to
>> accumulate, messages should be processed with as little latency as
>> possible
>>
>>
>>
>> to meet these criteria, the following is proposed:
>>
>> a configuration option to define the max number of messages to be
>> processed at once.
>>
>> the output module goes through the following loop
>
> This sentence covers much of the complexity of this change ;)
>
> The "problem" is that it is the other way around. It is not the output
> module that asks the queue engine for data; it is the queue engine that
> pushes data to the output module. While this sounds like a simple swap of
> roles, it has greater implications.
>
> ... especially if you think about the data flow. At this point, it may make
> sense to review the data flow. I have described it here:
>
> http://www.rsyslog.com/Article350.phtml

I will do this later today.

> Even if you don't listen to the presentation, the diagram is useful. In it,
> you can see that there are n queues, where n is 1 + the number of actions.
> The "1" queue is the main message queue. So each message first moves into
> the main queue, is dequeued there (in the push fashion described above), is
> run through the filter engine, and is then placed into the relevant action
> queues.
>
> So the new interface does not necessarily need to modify the main queue (but
> there is much benefit in doing so). But it must change the way action queues
> deliver messages. That, in turn, means that the new batch mode can only work
> if the action is configured to use an actual queueing mode (not the default
> "DIRECT" mode, where incoming messages are handed directly to the action
> processing without any in-memory buffering).
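The push-style data flow described above (main queue, filter engine, per-action queues) can be pictured with a minimal single-threaded C sketch. All names here (struct action, dispatch_main_queue, run_action_queue, do_action, ACTQ_CAP) are hypothetical and are not rsyslog's actual queue or module API; the point is only that the queue worker calls into the output module, and the output module never asks for data itself.

```c
#include <stddef.h>
#include <string.h>

#define ACTQ_CAP 64  /* hypothetical per-action queue capacity */

/* One configured action: its filter, its own action queue, and the output
 * module entry point that the queue engine calls (push, not pull). */
struct action {
    int  (*filter)(const char *msg);               /* nonzero = matches */
    void (*do_action)(const char *msg, void *ctx); /* hypothetical name */
    void  *ctx;
    const char *q[ACTQ_CAP];
    size_t qlen;
};

/* Main queue worker: run each dequeued message through the filter engine
 * and place it into the queue of every action whose filter matches.
 * Returns how many copies did not fit; a real queue would block or go
 * disk-assisted instead of dropping them. */
size_t dispatch_main_queue(const char **msgs, size_t n,
                           struct action *acts, size_t nacts)
{
    size_t dropped = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t a = 0; a < nacts; a++) {
            if (!acts[a].filter(msgs[i]))
                continue;
            if (acts[a].qlen < ACTQ_CAP)
                acts[a].q[acts[a].qlen++] = msgs[i];
            else
                dropped++;
        }
    return dropped;
}

/* Action queue worker: push every queued message into the output module
 * with one call per message, then empty the queue. */
void run_action_queue(struct action *act)
{
    for (size_t i = 0; i < act->qlen; i++)
        act->do_action(act->q[i], act->ctx);
    act->qlen = 0;
}

/* Toy filters and output used in the usage example. */
int match_all(const char *msg) { (void)msg; return 1; }
int match_error(const char *msg) { return strstr(msg, "error") != NULL; }
void count_msg(const char *msg, void *ctx) { (void)msg; ++*(size_t *)ctx; }
```

With a "match everything" action and an "error only" action, the messages {"boot ok", "disk error", "login"} end up as three entries in the first action queue and one in the second. A batch interface would change run_action_queue so that it hands several messages to the output module per call.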

hmm, I suspect that making the 'direct' mode able to do this IFF (if and
only if) all output modules support multi-message handling would be a win.

specifically, I expect to find that the locking needed to deliver a single
message is expensive enough that batching is a big win even for the simple
default case of writing to a file. I also expect to see wins when moving
events from the main queue to the action queues.

> So the approach is probably to enhance the queue object (which drives both
> the main and action queues) to support dequeueing multiple messages at once
> (which, as a side effect, will also greatly reduce locking conflicts).
> Under normal operations, this is relatively straightforward.

so far so good.
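One way to picture dequeueing several messages at once is the following C sketch using POSIX threads. It makes simplifying assumptions: a fixed-size in-memory ring, hypothetical names (batch_queue, bq_init, bq_enqueue, bq_dequeue_batch, QCAP), and no mark/commit step, so it shows only the locking pattern and none of the guaranteed-delivery bookkeeping. The consumer takes one lock for up to max messages instead of one lock per message, and it waits only until at least one message is present, so a lone message at a quiet time is handed out immediately.

```c
#include <pthread.h>
#include <stddef.h>

#define QCAP 1024  /* hypothetical fixed capacity for the sketch */

struct batch_queue {
    pthread_mutex_t mtx;
    pthread_cond_t  nonempty;
    const char     *ring[QCAP];
    size_t          head, len;
};

void bq_init(struct batch_queue *q)
{
    pthread_mutex_init(&q->mtx, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->len = 0;
}

/* Producer side: one lock round-trip per message. Returns 0, or -1 if full. */
int bq_enqueue(struct batch_queue *q, const char *msg)
{
    int rc = -1;
    pthread_mutex_lock(&q->mtx);
    if (q->len < QCAP) {
        q->ring[(q->head + q->len) % QCAP] = msg;
        q->len++;
        rc = 0;
        pthread_cond_signal(&q->nonempty);
    }
    pthread_mutex_unlock(&q->mtx);
    return rc;
}

/* Consumer side: block until at least one message is queued, then remove
 * up to max messages inside a single critical section. It never waits for
 * a full batch. Returns the number of messages copied into out. */
size_t bq_dequeue_batch(struct batch_queue *q, const char **out, size_t max)
{
    size_t n = 0;
    pthread_mutex_lock(&q->mtx);
    while (q->len == 0)
        pthread_cond_wait(&q->nonempty, &q->mtx);
    while (n < max && q->len > 0) {
        out[n++] = q->ring[q->head];
        q->head = (q->head + 1) % QCAP;
        q->len--;
    }
    pthread_mutex_unlock(&q->mtx);
    return n;
}
```

With five queued messages and max = 3, the first call returns 3 and the second returns 2, each under a single lock acquisition.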

> It gets messy when there are failures in the actions, and it gets very
> complex if we think about the various shutdown scenarios (not to mention
> disk-assisted queues actually running in DA mode). I have begun to look at
> these issues (part of today's and over-the-weekend thinking ;)), but this
> will probably need some more time to finally solve, plus some discussion, I
> guess...

would it simplify things significantly to say that multi-message output and
having multiple worker threads are mutually exclusive?

>>
>> X=max_messages
>>
>> if (messages in queue)
>>    mark that it is going to process the next X messages
>>    grab the messages
>>    format them for output
>>    attempt to deliver the messages
>>    if (messages delivered successfully)
>>      mark messages in the queue as delivered
>>      X=max_messages (reset X in case it was reduced due to delivery errors)
>>    else (delivering this batch failed: reset and try to deliver the
>>          first half)
>
> I think in our previous discussion (see the mailing list archive) we
> concluded that there is no value in retrying with half of the batch.

very possibly; I don't remember it.

not doing so will simplify the code considerably, but the advantages of 
retrying with half the batch are:

1. you deliver as much as you can

2. when you finally get stuck, you can pinpoint exactly which message you
were stuck on (in case the failure is caused by the data, say quotes in a
field that then gets formatted into a database statement, or slashes in
something that becomes a filename component)

your call

>>      unmark the messages that it tried to deliver (putting them back
>>      into the state where no delivery has been attempted)
>>      X=int(# messages attempted / 2)
>>      if (X == 0)
>>        unable to deliver a single message: run the existing
>>        single-message error process
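For illustration, the proposed loop translated into a C sketch. It assumes a hypothetical transactional deliver callback that either accepts a whole batch or rejects it, and it models "mark"/"unmark" by advancing the queue head only after a successful delivery. drain_queue, deliver_fn, error_fn, msg_queue, reject_bad, and count_and_skip are made-up names, and the existing single-message error process is stood in for by an on_error hook that either discards the message or stops draining.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical delivery callback: returns 0 if all n messages were
 * delivered as one unit (e.g. a single committed DB transaction),
 * nonzero if the batch as a whole failed. */
typedef int (*deliver_fn)(const char **msgs, size_t n, void *ctx);

/* Hypothetical hook for the existing single-message error process:
 * return 0 to discard the message and continue, nonzero to stop
 * draining and leave the message at the front of the queue. */
typedef int (*error_fn)(const char *msg, void *ctx);

/* Simplified in-memory queue: msgs[0..head) count as delivered. */
struct msg_queue {
    const char **msgs;
    size_t count;
    size_t head;
};

/* Drain the queue in batches of at most max_messages (0 is treated as 1).
 * A failed batch is not committed (head stays put, which plays the role
 * of "unmarking" it) and the first half is retried; when even a single
 * message fails, on_error decides what happens to it.
 * Returns the number of messages delivered. */
size_t drain_queue(struct msg_queue *q, size_t max_messages,
                   deliver_fn deliver, error_fn on_error, void *ctx)
{
    size_t delivered = 0;
    if (max_messages == 0)
        max_messages = 1;
    size_t x = max_messages;

    while (q->head < q->count) {
        size_t avail = q->count - q->head;
        size_t n = avail < x ? avail : x;   /* take what is there, no waiting */

        if (deliver(q->msgs + q->head, n, ctx) == 0) {
            q->head += n;                   /* mark the batch as delivered */
            delivered += n;
            x = max_messages;               /* reset X after a success */
        } else if (n > 1) {
            x = n / 2;                      /* retry the first half */
        } else if (on_error(q->msgs[q->head], ctx) == 0) {
            q->head += 1;                   /* bad message discarded */
            x = max_messages;
        } else {
            break;                          /* leave it queued and stop */
        }
    }
    return delivered;
}

/* Toy callback: rejects any batch containing the literal string "bad",
 * standing in for data the output module cannot handle. */
int reject_bad(const char **msgs, size_t n, void *ctx)
{
    (void)ctx;
    for (size_t i = 0; i < n; i++)
        if (strcmp(msgs[i], "bad") == 0)
            return -1;
    return 0;
}

/* Toy error hook: counts discarded messages in *(size_t *)ctx. */
int count_and_skip(const char *msg, void *ctx)
{
    (void)msg;
    ++*(size_t *)ctx;
    return 0;
}
```

With max_messages = 4 and the queue {"a", "b", "bad", "c", "d"}, the loop fails on the first batch of four, delivers "a" and "b", narrows the next failure down to "bad" as a batch of one and passes it to count_and_skip, then delivers "c" and "d", returning 4.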
>>
>>
>>
>> this approach is more complex than a simple 'wait for X messages, then
>> insert them all', but it has some significant advantages:
>>
>> 1. no waiting for 'enough' messages to accumulate before something gets
>> written
>>
>> 2. if you have one bad message, it will transmit all the good messages
>> before the bad one, then error out only on the bad one before picking up
>> with the ones after it.
>
> This needs to be specified more precisely. Again, I think our prior
> conclusion was that this would not make much sense. After all, if e.g. an
> SQL statement in the template is invalid, how should it recover? If the SQL
> statement is correct, why should it fail forever? Or should we drop a
> message if it fails after n attempts? (OK, we can do that already ;)) It is
> also hard to do for non-transactional outputs.

as noted above, I'm thinking of cases where the data in a particular log
message is something it shouldn't be, in a way that causes problems for the
output module

for databases this could be quotes

for file output with dynamic files you could get a hostname or program 
that has a slash (or ../../../../../../etc/shadow) in it.

in theory these should all be detected by the module and scrubbed before
being submitted; in practice bugs happen (especially if/when rsyslog starts
dealing with Unicode messages), and being able to pinpoint 'this is the
message that I was unable to deal with' is very helpful.
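For the dynamic-file case, such scrubbing could look like the following C sketch. scrub_path_component is a hypothetical helper, not an existing rsyslog function, and it assumes the value is used as exactly one path component: every '/' becomes '_', and a result of ".", "..", or the empty string is replaced with "_". It covers only the file-name case, not database quoting.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: make `in` safe to use as ONE component of a dynamic
 * file name. Every '/' becomes '_', and the special names "." and ".."
 * (and an empty result) become "_" so they cannot walk up the tree.
 * Writes at most outsz bytes including the terminating NUL. */
void scrub_path_component(const char *in, char *out, size_t outsz)
{
    size_t i = 0;
    if (outsz == 0)
        return;
    for (; in[i] != '\0' && i + 1 < outsz; i++)
        out[i] = (in[i] == '/') ? '_' : in[i];
    out[i] = '\0';

    /* Truncation can also produce "." or "..", so check the final result. */
    if (out[0] == '\0' || strcmp(out, ".") == 0 || strcmp(out, "..") == 0) {
        if (outsz >= 2) {
            out[0] = '_';
            out[1] = '\0';
        }
    }
}
```

With this, a hostname of "../../etc/shadow" becomes ".._.._etc_shadow", a single file name that stays inside the target directory.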

with a vector interface, another option would be to allow the output
module to report back how many of the submitted messages it successfully
delivered. that way any 'retry half' type logic could live in the module,
and be used only where it makes sense. for a file output module, if you ran
out of disk space partway through the write, it could report the number of
messages that it successfully wrote.
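A minimal C sketch of that vector interface, assuming the module returns how many messages, counted from the front of the batch, it wrote completely. The "disk" is a fixed-size memory buffer so that running out of space partway through is easy to simulate; struct sink, sink_write_batch, and push_batch are hypothetical names, not rsyslog APIs.

```c
#include <stddef.h>
#include <string.h>

/* Toy output target standing in for a file on a nearly full disk. */
struct sink {
    char   *buf;
    size_t  used, cap;
};

/* Hypothetical vector entry point: write msgs[0..n) in order, each followed
 * by '\n', and return how many were written completely before the first
 * one that did not fit. */
size_t sink_write_batch(struct sink *s, const char **msgs, size_t n)
{
    size_t done = 0;
    while (done < n) {
        size_t len = strlen(msgs[done]) + 1;   /* message + '\n' */
        if (s->cap - s->used < len)
            break;                             /* "disk full" */
        memcpy(s->buf + s->used, msgs[done], len - 1);
        s->buf[s->used + len - 1] = '\n';
        s->used += len;
        done++;
    }
    return done;
}

/* Queue side: hand over up to max messages starting at *head, then commit
 * only what the module reports as written; the rest stays queued. */
size_t push_batch(struct sink *s, const char **queue, size_t *head,
                  size_t count, size_t max)
{
    size_t avail = count - *head;
    size_t n = avail < max ? avail : max;
    size_t written = sink_write_batch(s, queue + *head, n);
    *head += written;
    return written;
}
```

With an 8-byte sink and the messages "abc", "de", "fgh", the module writes "abc\nde\n", reports 2, and the queue head advances to 2 so that "fgh" stays queued for the next attempt. The queue needs no halving logic of its own; the module decides how far it got.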

as I said before, your call.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
