On Mon, 20 Apr 2009, Rainer Gerhards wrote:

>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of [email protected]
>>
>> On Mon, 20 Apr 2009, Rainer Gerhards wrote:
>>
>>> David,
>>>
>>> I start with some quick pointers. I think it makes sense to move the results
>>> of this discussion into a document - or alternatively move it to the wiki, if
>>> you (or others) find this useful. I have to admit that I am a bit skeptical
>>> about the wiki, I guess mail is better for discussion here. But I wanted to
>>> mention this option.
>>>
>>> Now on to the meat:
>>>
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:rsyslog-
>>>> [email protected]] On Behalf Of [email protected]
>>
>>
>> hmm, I suspect that having the 'direct' mode able to do this IFF (if
>> and only if) all output modules are able to do the multi-message
>> handling would be a win.
>
> You can't do that, because if it is in direct mode, there always is at most
> one message inside the queue. You cannot operate on the main message queue
> "batch", as this is not yet filtered, so you do not know which message is for
> which action. So, from the action perspective, nothing is queued at this
> point. Thus, you need a queue running in a real queue mode. I hope it will
> become more clear once you have looked at the data flow (otherwise I need to
> write some big overview about it...).

I had not thought about the filtering issue

>>
>> specifically I expect to find that the locking process to deliver a
>> single message is expensive enough
>
> This is handled by the main queue batch. So even in direct mode, we have the
> benefit from the locking code improvement (I agree, potentially a *very big*
> gain). I guess you currently think of a single big queue inside rsyslog,
> which is the wrong picture. We have chained queues and you always need to
> look at which part of the message processing works on which queues. Very
> important implications!

this is a big difference. yes, I was thinking that there was one big queue
(unless you defined action queues explicitly), I'll pay very careful
attention to the tutorial and let you know if it explains this.

>> that it's a big win even for the simple
>> default case of writing to a file. I also expect to see wins for moving
>> events from the main queue to the action queues.
>
> Yup, thus the direct mode of the action queue does not affect the main queue
> at all (and in direct mode we have no locking in the action queues, why should
> we ... nothing needs to be synchronized if you just stick the message into
> the output...)

and if you have multiple output threads?

>>> It gets messy when there is failure in the actions and it gets very complex
>>> if we think about the various shutdown scenarios (not to mention disk
>>> assisted queues actually running in DA mode). I have begun to look at these
>>> issues (part of today's and over-the-weekend thinking ;)), but this will
>>> probably need some more time to finally solve - plus some discussion, I
>>> guess...
>>
>> would it simplify things significantly to say that the multi-message
>> output and having multiple worker threads are exclusive?
>
> Unlikely (but I don't like to totally rule it out, probability less than 5%)

Ok, not an issue then

>>
>>>>
>>>> X=max_messages
>>>>
>>>> if (messages in queue)
>>>>   mark that it is going to process the next X messages
>>>>   grab the messages
>>>>   format them for output
>>>>   attempt to deliver the messages
>>>>   if (messages delivered successfully)
>>>>     mark messages in the queue as delivered
>>>>     X=max_messages (reset X in case it was reduced due to delivery
>>>>     errors)
>>>>   else (delivering this batch failed, reset and try to deliver the
>>>>   first half)
>>>
>>> I think, in our previous discussion (mailing list archive), we concluded that
>>> there is no value in re-trying with half of the batch.
>>
>> very possibly, I'm not remembering it.
>>
>> not doing so will simplify the code considerably, but the advantages of
>> retrying with half the batch are:
>>
>> 1. you deliver as much as you can
>>
>> 2. when you finally get stuck, you can pinpoint directly what message you
>> were stuck on (in case you have a failure based on the data, say quotes in
>> something that then gets formatted into a database, or slashes in
>> something that becomes a filename component)
>>
>> your call
>
> I need to refer you back to our previous discussion. Unfortunately, it was
> private. I dug the link out and sent it via private mail. Sorry all others,
> please stand by a little moment. If I have not read it wrong, it boiled down
> to we have no non-transactional sources that were problematic and we had not
> identified cases where it would be useful to retry with fewer elements.
>
> I'd provide a more complete description, but that would probably take me
> another 2...4 hours, and I hope to get around (yes, it was a reeeaaaly long
> discussion). David, if you like to quote anything from me, feel free to do
> so.
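
for reference, the retry-with-half-the-batch idea from the pseudocode quoted above could look roughly like the sketch below. this is only an illustration, not rsyslog code: the function name deliver_with_halving, the deliver callback and the use of a plain Python list as the queue are all assumptions made for the example. on failure it halves the batch, on success it resets X to max_messages, and when a batch of one message fails it stops and reports the position of that message.

```python
from typing import Callable, Sequence, Tuple


def deliver_with_halving(
    queue: Sequence[str],
    deliver: Callable[[Sequence[str]], bool],
    max_messages: int,
) -> Tuple[int, bool]:
    """Hypothetical sketch of batch delivery with halving on failure.

    `deliver` is an assumed callback that returns True when the whole
    batch was accepted by the output and False otherwise.

    Returns (delivered, stuck). When stuck is True, queue[delivered]
    is the single message the output refused.
    """
    delivered = 0          # messages at the front of the queue already delivered
    x = max_messages       # current batch size (X in the pseudocode)
    while delivered < len(queue):
        batch = queue[delivered:delivered + x]
        if deliver(batch):
            delivered += len(batch)
            x = max_messages           # reset X in case it was reduced
        elif len(batch) == 1:
            return delivered, True     # pinpointed the offending message
        else:
            x = len(batch) // 2        # retry with the first half
    return delivered, False


if __name__ == "__main__":
    messages = [f"msg{i}" for i in range(10)]

    # Assumed failure model: the output rejects any batch containing "msg6".
    def fake_output(batch: Sequence[str]) -> bool:
        return "msg6" not in batch

    count, stuck = deliver_with_halving(messages, fake_output, max_messages=8)
    if stuck:
        print("stuck on:", messages[count])
```

the point of the sketch is item 2 above: everything before the bad message gets delivered, and the index returned tells you which message the output choked on. the cost is the extra failed delivery attempts while the batch shrinks.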

I'll dig through this today and tonight and review this

to be clear, I'm mostly concerned about the debugging/troubleshooting
issues (which one of these 1000 messages made the database complain..).
but I guess this can be addressed by stopping rsyslog and restarting it
with a smaller batch size until you track it down. it should be rare
enough to make that tolerable.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

