Re: [rsyslog] wiki with various application logging examples

Rainer Gerhards Mon, 10 Mar 2008 00:53:02 -0700

Hi JF,

thanks for the note. Of course, it triggers a couple of responses (see
below).

Rainer

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:rsyslog-
> [EMAIL PROTECTED] On Behalf Of Jan-Frode Myklebust
> Sent: Sunday, March 09, 2008 11:05 PM
> To: [email protected]
> Subject: Re: [rsyslog] wiki with various application logging examples
> 
> On 2008-03-09, Rainer Gerhards <[EMAIL PROTECTED]> wrote:
> >
> > Now let me take on the imfile example. The key point - at least IMHO -
> > is that there is no single line in rsyslog core's code that has been
> > added in support of imfile. And, more importantly, if imfile would go
> > away, not a single line of code could be removed. So the imfile plugin
> > (project) does neither add complexity nor code nor other overhead to
> > rsyslog core. So can it be evil?
> 
> It's evil that you're forcing me to upgrade to rsyslog v3.x to take
> advantage of it ;-) That's what triggered my previous post.. I want
> to be tracking non-syslog logfiles, and if imfile/plugins were more
> in the unix philosophy of small tools that chain easily.. I might have
> been able to pick it from v3.x and use it on stable systems.

I've seen the smiley, but let me make a couple of comments. First of all, the
"philosophy of small tools that chain easily" implies that they chain
via a pipe. This is great for a lot of applications, but it has its
drawbacks. A plain pipe is a simplex, relatively loosely coupled IPC
method. So if one part of the pipe dies, other parts will learn about it
eventually, but not at the same instant and, most importantly, they do
not exactly know what was processed and what not. It's pretty much the
same thing as plain TCP transport, which pretends reliability but still
has a few windows of exposure where message loss may occur (see
http://www.monitorware.com/en/workinprogress/selp.txt section 2.4 and,
yes, rsyslog is victim to this as long as we don't have a full RFC 3195
implementation).

So relying on plain pipes is not exactly my premier communications
method if I would like to implement a reliable syslogd (and "reliable"
is the "r" in rsyslog). One can circumvent the problem by defining an
app-level protocol on top of the pipe, working with app-level acks.
HOWEVER, with that you would run into pretty much the same situation, namely
that you could not use a new version together with an old engine. One
can circumvent that, too, but only as far as the old engine has
implemented such a methodology. And v1/2 do not have this (due to time
constraints, rsyslog is still only about half finished...).

There is another reason that makes me avoid pipes. If I do an
app-level ack, I need at least 4 system calls to pass a single
message:

1. input writes message to pipe
2. engine reads message from pipe
3. engine writes post-processing status (ACK) to pipe
4. input reads ACK from pipe

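To make that cost concrete, here is a minimal Python sketch of such an app-level ACK exchange. It is only an illustration under stated assumptions, not rsyslog code: the function names are hypothetical, the "engine" runs in a thread instead of a separate process purely to keep the example self-contained, and because a plain pipe is simplex, it needs two pipes, one per direction. The four per-message system calls from the list above are marked in the comments (pipe setup and teardown are extra and would be done once in a real program).

```python
import os
import threading


def engine(msg_r: int, ack_w: int) -> None:
    """Hypothetical engine side: read one message, 'process' it, ACK it."""
    data = os.read(msg_r, 4096)           # syscall 2: engine reads message
    status = b"ACK" if data else b"NAK"   # stand-in for real processing
    os.write(ack_w, status)               # syscall 3: engine writes status


def send_with_ack(msg: bytes) -> bytes:
    """Hypothetical input side: send one message and wait for its ACK."""
    msg_r, msg_w = os.pipe()   # input -> engine
    ack_r, ack_w = os.pipe()   # engine -> input (a pipe is one-way only)
    t = threading.Thread(target=engine, args=(msg_r, ack_w))
    t.start()
    os.write(msg_w, msg)                  # syscall 1: input writes message
    reply = os.read(ack_r, 16)            # syscall 4: input reads ACK
    t.join()
    for fd in (msg_r, msg_w, ack_r, ack_w):
        os.close(fd)
    return reply
```

Calling send_with_ack(b"<13>test message") returns b"ACK"; every single message pays for two writes and two reads, each a user/kernel transition.
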
If you do a little bit of math, you'll see how many user/kernel space
transitions you end up with, plus how many cycles are needed to run the
necessary housekeeping code inside the kernel and libraries. I won't sum
them up now, but I am pretty sure that I can completely process the
message in less time than is needed just for the IPC in that case. That
is probably fine for a low-end workstation syslogd, but you don't want
this overhead if you aim for an enterprise solution capable of handling
massive data input.

Also, if everything goes to the system log socket, there is little you
can multithread. But we need to be able to multithread as much as
possible. If I write a new engine (as I currently do), I would like to
see it well working for at least the next 10 years. If I think about
hardware trends during that period, it is clear that a single core will
not become much faster than it is today. But the number of cores will
greatly increase. In order to utilize that, an application must be able
to run on as many threads as possible - all with reasonable overhead, of
course. So my conclusion is that rsyslog must be able to run massively
multithreaded for the high-end use case. This also prohibits using
primary interfaces which cannot easily be multithreaded.

Finally, there is the issue of flow control. Rsyslog *does* flow control,
and will do more advanced flow control in the future. Especially with
world-dominating UDP syslog it is vitally important to do flow control,
because UDP cannot be flow controlled. Sound strange? Well... Because we
cannot flow control UDP syslog, we need to apply intelligent and adaptable
flow control to those sources that can be flow controlled (TCP syslog,
RFC 3195 and, of course, file data!) so that buffer space is kept for
those precious UDP messages, which are lost if we cannot buffer them at
the right instant. If you think this through, you'll see that this
requires different levels and methods of flow control, depending on the
source [so far, I see three levels: can not (UDP), can somewhat (TCP,
local sockets), can easily (log files and other sources that generate
data themselves)].
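
The three levels could be modelled roughly as below. This is a hypothetical Python sketch, not rsyslog's actual algorithm: the class names and the 50%/80% thresholds are invented for illustration. The idea is simply that the sources that are easiest to pause are throttled first, so the remaining buffer space stays available for UDP, which is never throttled because it cannot be.

```python
from enum import Enum
from typing import Dict, Optional


class FlowControl(Enum):
    CANNOT = "cannot"      # UDP: the sender never learns we are busy
    SOMEWHAT = "somewhat"  # TCP, local sockets: we can stop reading for a while
    EASILY = "easily"      # log files etc.: the data simply waits to be read


# Hypothetical queue fill levels (fraction of capacity) at which each
# class of source is asked to pause. None means "never throttle".
THROTTLE_AT: Dict[FlowControl, Optional[float]] = {
    FlowControl.EASILY: 0.5,
    FlowControl.SOMEWHAT: 0.8,
    FlowControl.CANNOT: None,
}


def should_throttle(source: FlowControl, queued: int, capacity: int) -> bool:
    """Return True if input from `source` should be paused right now."""
    limit = THROTTLE_AT[source]
    if limit is None:
        return False  # UDP cannot be paused; it gets whatever space is left
    return queued >= limit * capacity
```

With a queue of 1000 entries holding 600 messages, a file reader is paused while a TCP sender is not; at 850 both are paused, and UDP is still accepted until the queue is actually full.
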

Having a native interface greatly reduces code complexity and thus
improves program reliability when it comes to implementing these
features. Also, it would require specialized plugins in any case; you
couldn't do it with a simple "pipe me in" approach (well... some things
yes, but at a complexity cost).

Also, while I too believe in the Unix approach of small tools, I also
think it is important that the average user is able to configure it.
Rsyslog aims not only at being enterprise-class but at the same time
aims at being easy to use for the novice. Novices don't understand
complex scripting to get the job done. I think relying too much on
complex glue doesn't help getting the job done.

To come back to your original post, rsyslogd *should* of course support
the Unix way of piping. It looks like there is a small plugin missing to
read natively from a pipe. However, I never got a request to implement
it. I guess most people use the logger trick to accomplish that task. I
know syslog-ng can natively read from pipes, but have not yet considered
this important enough given the lack of requests from the community. If
you like such a plugin, it's probably a good idea to speak up now ;)

> 
> 
> > from the small shell script you provided, it looks like there is a
> > problem if
> >
> > a) script is in sleep period
> > b) data is appended to text file
> > c) text file is rotated
> > d) new lines are written to text file
> > e) script awake for new polling loop
> >
> > On a quick look, it looks like the data written in b) will never make
> > it to the syslogd. Imfile handles that.
> 
> Yes, you're right. Cool that imfile handles it.
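
For the record, the usual way to close that window is to keep the old file open across the rotation and drain it before switching to the new one. The Python sketch below shows the idea. It is not imfile's code, the class name is hypothetical, and it assumes rotation by rename (mv, then create a new file). A copy-and-truncate rotation would need a different check, and partial last lines are not handled.

```python
from __future__ import annotations

import os


class FileFollower:
    """Hypothetical polling follower that survives rotation by rename."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.fh = open(path, "r")
        self.inode = os.fstat(self.fh.fileno()).st_ino

    def poll(self) -> list[str]:
        # Drain whatever was appended to the file we already hold open,
        # even if it has been renamed away in the meantime (step b).
        lines = self.fh.readlines()
        try:
            st = os.stat(self.path)
        except FileNotFoundError:
            return lines  # rotated, but the new file does not exist yet
        if st.st_ino != self.inode:
            # The path now names a new file: switch to it and read the
            # lines written after the rotation (step d).
            self.fh.close()
            self.fh = open(self.path, "r")
            self.inode = os.fstat(self.fh.fileno()).st_ino
            lines += self.fh.readlines()
        return lines
```

Walking through steps a) to e): the line appended in b) comes out of the still-open old handle, and the lines written in d) come from the newly opened file, all in the same poll.
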
> 
> 
> > On the mail output case (though I need to be a bit brief as dinner is
> > approaching ;)): I actually intend to add an email output plugin.
> 
> The quote was about being able to *read* mail:

I was replying here to Michael Biebel. It looks like I was in a bit too
much of a hurry to point this out. Sorry...

> 
>       "Every program attempts to expand until it can read mail.
>        Those programs which cannot so expand are replaced by ones
>        which can."
> 
> and I was thinking it not too far fetched since "splunk" can do it (it
> can download email messages via IMAP, index them and create alerts on
> suspicious content).

In short: receiving email is very low on my agenda. Keep in mind that I
have already architected, and at least partly written, such a beast on
Windows:

http://www.monitorware.com/en/Product/product_comparision.php

The email question never was in much demand. But if demand comes up, it
for sure is not a big thing to add it... (and, of course, you already
can do it today with a bit of scripting, the right mailbox rules and
logger -- but that isn't appealing to most folks and is one reason I
tend to write plugins ;)).

> It might seem like you want to take rsyslog in that direction, i.e.
> your complete eventlogd&alertSystem fork of rsyslogd that can read any
> input (syslog, other-logfile, email, snmptraps) and analyze and alert
> on the data.
> 
> That's not what *I* want from a syslog server. I just want it to
> reliably collect and store the logs in an organized manner. Then I'll
> use other tools to read and analyze them.

... and this is of course perfectly fine with me, too. However, if you
look at the core engine needs, you'll see that the "do it all"
eventprocessor and the "plain simple syslogd" have exactly the same
needs - at least if you would like to extend the syslogd to be
enterprise class.

Take the queue engine in v3. It's complex. Actually horribly complex. I
didn't like to include that complex beast, but it was the only clean
solution to the need of being massively concurrent AND being able to
queue data while a destination is down. Any other alternative IMHO would
have been dirty and hackish - and in the long term much less
maintainable. So I just did the right thing (hopefully), even though it
was a really big effort and even though it probably will need a few more
firedrills before it is really ready for prime time in all scenarios.
Another alternative would have been to use one of the big enterprise
class message queueing projects. But that would have created a
dependency for such a system on each desktop - ouch... I hope I made the
right compromise. Currently, the full queue engine is part of the core.
If that turns out to be a problem, I can outsource that to a plugin, but
that isn't currently very appealing to me. After all, it's "just" some
memory overhead - if you don't need the advanced features, no code is
executed to do that. The queue uses an internal driver model, and simple
configuration means simple code. Thus, the v3 queue engine is as
reliable as v2, except if you use all those bells and whistles where I
am sure currently a few bugs wait to be detected (even though the
situation has much improved recently and will improve with each new
feedback I receive).
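
The core requirement, many concurrent workers plus the ability to hold messages while a destination is down, can be illustrated with a deliberately tiny Python sketch. It is not the v3 queue engine (no disk assistance, no driver model, no ordering guarantee across workers), and Destination, worker and run_demo are hypothetical names chosen for the example.

```python
import queue
import threading


class Destination:
    """Hypothetical output that can be offline, e.g. a remote server."""

    def __init__(self) -> None:
        self.up = threading.Event()   # starts "down"
        self.received = []            # messages actually delivered
        self.lock = threading.Lock()

    def send(self, msg: str) -> bool:
        if not self.up.is_set():
            return False
        with self.lock:
            self.received.append(msg)
        return True


def worker(q: queue.Queue, dest: Destination) -> None:
    while True:
        msg = q.get()
        if msg is None:               # shutdown sentinel
            q.task_done()
            return
        while not dest.send(msg):     # destination down: keep the message
            dest.up.wait(timeout=0.1)
        q.task_done()


def run_demo(n_msgs: int = 100, n_workers: int = 4) -> int:
    q: queue.Queue = queue.Queue(maxsize=1000)  # bounded: producers block when full
    dest = Destination()
    threads = [threading.Thread(target=worker, args=(q, dest))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for i in range(n_msgs):
        q.put(f"msg {i}")             # accepted while the destination is down
    dest.up.set()                     # destination comes back
    q.join()                          # wait until every message is delivered
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return len(dest.received)
```

run_demo() delivers all 100 messages even though every one of them was queued while the destination was down; the hard parts of the real engine (persisting the queue, shutdown with pending data, many queue modes) are exactly what this sketch leaves out.
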

Another good example is the config file: of course, there is no need to
have a scriptable configuration for a simple syslogd. But while thinking
about the (necessary) expression support and a lot of user requests for
a more readable config file format, I came to the conclusion that
creating a scriptable format is actually the right route to take:

http://rgerhards.blogspot.com/2008/02/introducing-rainerscript-and-some.html

Anything else (IMHO) would again be less clean, less maintainable and,
in this case, would even take longer to implement. So one might think it
is evil to include a virtual machine inside a syslogd, but to me it is
actually the least effort to implement things.

Of course, you can rightly argue that all of this is overkill if
you just want to have a plain local logger that takes messages from the
local log socket, maybe UDP syslog, and stores them in local files. You are
probably right. But in this case, you can still continue to use
sysklogd. After all, rsyslog was spawned from it to create an advanced
syslogd. So it comes as no surprise that I am adding features which may
not be required for the simple use cases. ;)

Let me conclude with two core points:

- rsyslog core is as slim as possible, plugins are separate projects
  that extend the core; this means nobody is forced to run more code
  than actually required for his job

- a simple but fairly enhanced syslogd and a full network event
  processor both have the same core engine needs

Thus, rsyslog implements this core engine and I occasionally add a
plugin here and there to take advantage of the core. Right now, rsyslog
core is far from being finished, as are the plugins. At this time, I am
working on getting the core right and doing the most requested plugins.
When I am done with that, I'll look at the *real* advanced plugins for
all kinds of things that users are interested in. I don't see any need to
fork off a separate core project for that. In fact, I think it would be
counter-productive, as I would need to maintain two code bases and the
newly forked project would never be able to do more than rsyslogd does. So
what would be the motivation to maintain another, feature-less
project...?

Anyhow, I may be totally wrong. Feedback on this topic is still highly
appreciated (be it brief or elaborate ;)). I am probably abusing this
thread to also tell you a bit about design decisions I have not yet
communicated (sorry for that, so much to do, so little time...;)).

Rainer
> 
> 
> 
>   -jf
> 
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog