Hi JF, thanks for the note. Of course, it triggers a couple of responses (see below).
Rainer > -----Original Message----- > From: [EMAIL PROTECTED] [mailto:rsyslog- > [EMAIL PROTECTED] On Behalf Of Jan-Frode Myklebust > Sent: Sunday, March 09, 2008 11:05 PM > To: [email protected] > Subject: Re: [rsyslog] wiki with various application logging examples > > On 2008-03-09, Rainer Gerhards <[EMAIL PROTECTED]> wrote: > > > > Now let me take on the imfile example. The key point - at least IMHO > - > > is that there is no single line in rsyslog core's code that has been > > added in support of imfile. And, more importantly, if imfile would go > > away, not a single line of code could be removed. So the imfile > plugin > > (project) does neither add complexity nor code nor other overhead to > > rsyslog core. So can it be evil? > > It's evil that you're forcing me to upgrade to rsyslog v3.x to take > advantage of it ;-) That's what triggered my previous post.. I want > to be tracking non-syslog logfiles, and if imfile/plugins where more > in the unix philosophy of small tools that chain easily.. I might have > been able to pick it from v3.x and use it on stable systems. I've see the smily but let me do a couple of comments. First of all, the "philosophy of small tools that chain easily" implies that they chain via a pipe. This is great for a lot of applications, but it has its drawbacks. A plain pipe is a simplex, relatively loosely coupled IPC method. So if one part of the pipe dies, other parts will learn about it eventually, but not at the same instant and, most importantly, they do not exactly know what was processed and what not. It's pretty much the same thing as plain TCP transport, which pretends reliability but still has a few windows of exposure where message loss may occur (see http://www.monitorware.com/en/workinprogress/selp.txt section 2.4 and, yes, rsyslog is victim to this as long as we don't have a full RFC 3195 implementation). So relying on plain pipes is not exactly my premier communications method if I would like to implement a reliable syslogd (and "reliable" is the "r" in rsyslog). One can circumvent the problem by defining an app-level protocol on top of the pipe, working with app-level acks. HOWEVER, with that you would run into the pretty same situation, namely that you could not use a new version together with an old engine. One can circumvent that, too, but only as far as the old engine has implemented such methodology. And v1/2 do not have this (due to time constraints, rsyslog is still only about half finished...). There is some other reason that makes me avoid pipes. If I do an app-level ack, I need to do at least 4 system calls to pass a single message: 1. input write to pipe 2. engine read msg from pipe 3. engine post processing stat (ACK) to pipe 4. input read ACK from pipe If you do a little bit of math, you'll see with how many user/kernel space transitions you end up, plus how many cycles are needed to run the necessary housekeeping code inside the kernel and libraries. I don't sum them up now, but I am pretty sure that I can completely process the message in less time than is need just for the IPC in that case. That probably is fine if you look at a low-end workstation syslogd, but you don't want to have this overhead if you aim for an enterprise solutions capable of handling massive data input. Also, if everything goes to the system log socket, there is little you can multithread. But we need to be able to multithread as much as possible. If I write a new engine (as I currently do), I would like to see it well working for at least the next 10 years. If I think about hardware trends during that period, it is clear that a single core will become not much faster than it is today. But the number of cores will greatly increase. In order to utilize that, an application must be able to run on as many threads as possible - all with reasonable overhead, of course. So my conclusion is that rsyslog must be able to run massively multithreaded for the high-end use case. This also prohibits using primary interfaces which cannot easily be multithreaded. Finally there is the issue of flow control. Rsyslog *does* flow control, and will do more advanced flow control in the future. Especially with world-dominating UDP syslog it is vitally important to do flow control, because UDP cannot be flow controlled. Sound strange? Well... If we can not flow control UDP syslog, we need to apply intelligent and adoptable flow control that flow controls sources who can (tcp syslog, rfc 3195 and of course file data!) so that buffer space is kept for those precious UDP message which are lost if we can not buffer them at the right instant. If you think this through, you'll see that this requires different level and methods of flow control, depending on the source [so far, I see three levels: can not (UDP), can somewhat (TCP, local sockets), can easily (log files and other sources that generate data themselves)]. Having a native interface greatly reduces code complexity and thus program reliability when it comes at implementing these features. Also, it would require specialized plugins in any case, you couldn't do it with a simple "pipe me in approach" (well, ... some things yes, but at a complexity cost). Also, while I too believe in the Unix approach of small tools, I also think it is important that the average user is able to configure it. Rsyslog aims not only at being enterprise-class but at the some time aims at being easy to use for the novice. Novices don't understand complex scripting to get the job done. I think relying too much on complex glue doesn't help getting the job done. To come back to your orginal post, rsyslogd *should* of course support the Unix way of piping. It looks like there is a small plugin missing to read natively from a pipe. However, I never got a request to implement it. I guess most people use the logger trick to accomplish that task. I know syslog-ng can natively read from pipes, but have not yet considered this important enough given the lack of requests from the community. If you like such a plugin, it's probably a good idea to speak up now ;) > > > > from the small shell script you provided, it looks like there is a > > problem if > > > > a) script is in sleep period > > b) data is appended to text file > > c) text file is rotated > > d) new lines are written to text file > > e) script awake for new polling loop > > > > On a quick look, it looks like the data written in b) will never make > it > > to the syslogd. Imfile handles that. > > Yes, you're right. Cool that imfile handles it. > > > > On the mail output case (though I need to be a bit brief as dinner is > > approaching ;)): I actually intend to add an email output plugin. > > The quote was about being able to *read* mail: I was replying here to Michael Biebel. It looks I was a bit too much in hurry to point this out. Sorry... > > "Every program attempts to expand until it can read mail. > Those programs which cannot so expand are replaced by ones > which can." > > and I was thinking it not too far fetched since "splunk" can do it (it > can > download email messages via IMAP, index them and create alerts on > suspicious > content). In short: receiving email is very low on my agenda. Keep in mind that I already have architecture and at least partly written such a beast on Windows: http://www.monitorware.com/en/Product/product_comparision.php The email question never was in much demand. But if demand comes up, it for sure is not a big thing to add it... (and, of course, you already can do it today with a bit of scripting, the right mailbox rules and logger -- but that isn't appealing to most folks and is one reason I tend to write plugins ;)). > It might seem like you want to take rsyslog in that > direction, i.e. > your complete eventlogd&alertSystem fork of rsyslogd that can read any > input > (syslog, other-logfile, email, snmptraps) and analyze and alert on the > data. > > That's not what *I* want from a syslog server. I just want it to > reliably > collect and store the logs in an organized manner. Then I'll use other > tools to read and analyze them. ... and this is of course perfectly fine with me, too. However, if you look at the core engine needs, you'll see that the "do it all" eventprocessor and the "plain simple syslogd" have exactly the same needs - at least if you would like to extend the syslogd to be enterprise class. Take the queue engine in v3. Its complex. Actually horribly complex. I didn't like to include that complex beast, but it was the only clean solution to the need of being massively concurrent AND being able to queue data while a destination is down. Any other alternative IMHO would have been dirty and hackish - and in the long term much less maintainable. So I just did the right thing (hopefully), even though it was a really big effort and even though it probably will need a few more firedrills before it is really ready for prime time in all scenarios. Another alternative would have been to use one of the big enterprise class message queueing projects. But that would have created a dependency for such a system on each desktop - ouch... I hope I made the right compromise. Currently, the full queue engine is part of the core. If that turns out to be a problem, I can outsource that to a plugin, but that isn't currently very appealing to me. After all, it's "just" some memory overhead - if you don't need the advanced features, no code is executed to do that. The queue uses an internal driver model and simply configuration means simple code. Thus, the v3 queue engine is as reliable as v2, except if you use all those bells and whistles where I am sure currently a few bugs wait to be detected (even though the situation has much improved recently and will improve with each new feedback I receive). Another good example is the config file: of course, there is no need to have a scriptable configuration for a simple syslogd. But while thinking about the (necessary) expression support and a lot of user requests for a better to read config file format, I came to the conclusion that creating a scriptable format is actually the right route to take: http://rgerhards.blogspot.com/2008/02/introducing-rainerscript-and-some. html Anything else (IMHO) would again be less clean, less maintainable and, in this case, would even take longer to implement. So one might think it is evil to include a virtual machine inside a syslogd, but to me it is actually the least effort to implement things. Of course, you can rightly argue that all of this is over the board if you just want to have a plain local logger that takes messages from the local log socket, maybe UDP syslog and store it to local files. You are probably right. But in this case, you can still continue to use sysklogd. After all, rsyslog was spawned from it to create and advanced syslogd. So it comes at no surprise that I am adding features which may not be required for the simple use cases. ;) Let me conclude with two core points: - rsyslog core is as slim as possible, plugins are separate projects that extend the core; This means nobody is forced to run more code than actually required for his job - both a simple, but fairly enhanced syslogd as well as a full network Event processor share the same root engine needs Thus, rsyslog implements this core engine and I occasionally add a plugin here and there to take advantage of the core. Right now, rsyslog core is far from being finished, as are the plugins. At this time, I am working on getting the core right and doing the most requested plugins. When I am done with that, I'll look at the *real* advanced plugins for all kinds of things that users have interest in. I don't see any need to fork of a separate core project for that. In fact, I think it would be counter-productive as I would need to maintain two code bases and the newly forked project would always be able to do what rsyslogd does. So what would be the motivation to maintain another, feature-less project...? Anyhow, I may be totally wrong. Feedback on this topic is still highly appreciated (be it brief or elaborate ;)). I am probably abusing this thread to also tell you a bit about design decisions I have not yet communicated (sorry for that, so much to do, so few time...;)). Rainer > > > > -jf > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog

