Hi Scott,

I will reply in more detail later today, but one thing quickly. This is
very good information. The wall code is ooooooooooold: it dates back to
sysklogd and has only been slightly modified. I have to admit it is not
much on my testing radar. So I assume you found the bug. I think I'll
need to redesign it. With the queue engine, forking off should not
really be necessary.
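
To illustrate the zombie part of the report: a child that has exited
stays in the process table until its parent waits for it. Below is a
minimal sketch of that mechanism. It is not the rsyslog wall code; the
function name spawn_and_reap is made up for illustration, and the child
simply exits where a real one would write to the terminals.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of rsyslog): fork a child that exits
 * immediately with child_status, then reap it so no zombie is left.
 * Returns the child's exit status, or -1 on error. */
int spawn_and_reap(int child_status)
{
    pid_t pid, r;
    int status;

    assert(child_status >= 0 && child_status <= 255);

    pid = fork();
    if (pid < 0)
        return -1;              /* fork failed, e.g. out of memory */
    if (pid == 0)
        _exit(child_status);    /* child: real work would go here */

    /* Parent: without this wait the exited child remains a zombie
     * ("Z" in ps) until the parent itself terminates. */
    do {
        r = waitpid(pid, &status, 0);
    } while (r == -1 && errno == EINTR);

    if (r != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A parent that must not block can instead call waitpid(-1, &status,
WNOHANG) periodically or from a SIGCHLD handler.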

I need to see whether that goes into v3-stable soon or is kept in the
devel tree. I am a bit hesitant to move it to stable, as it carries a
good chance of additional bugs...

Rainer

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:rsyslog-
> [EMAIL PROTECTED] On Behalf Of Scott Phuong
> Sent: Thursday, July 03, 2008 8:48 AM
> To: [email protected]
> Subject: Re: [rsyslog] rsyslog threads questions
> 
> Sorry for sending a lot of emails. I believe I have root-caused the
> issue in 3.17.5.  It appears that when the rsyslog.conf file does
> something like this:
> emerg.* *;mytemplate
> 
> The code executes wallmsg, which forks. This is how so many processes
> were being created in my scenario. Although one would hope that a
> system doesn't misbehave this badly, it nonetheless does happen.
> First, it seems that emergency messages should go to everyone who is
> logged in via whatever mechanism (e.g. telnet, console, stty, ssh,
> etc.); the message should appear on all those "screens". I see that
> it does not, even if I log just 1 message at this severity. I believe
> this is how syslog describes setting up a wall message and what it
> is. Is this a bug?
> 
> Lastly, when the child of the fork exits, it does not appear to be
> removed from Linux and still continues to eat up memory and is
> reported as a zombie. Things appear to clear up only when the worker
> thread has been "destroyed/deconstructed". I am not sure if this is
> an rsyslogd problem or a Linux or gcc problem. Any ideas?
> 
> Thanks,
> 
> Scott
> 
> 
> 
> On Wed, Jul 2, 2008 at 5:57 PM, Scott Phuong <[EMAIL PROTECTED]>
> wrote:
> > Hi,
> >
> > I've attached four files. Two of which are debug dumps, one is the
> > conf file and the last one is a test case scenario that constantly
> > fails on my end. I hope this gives a little more information.
> > Furthermore, the dumps are from 3.17.5 which is the "closest" version
> > to 3.18.0 that I was able to find.
> >
> > Both failed scenarios occur when lots of messages were being flooded
> > to rsyslogd at a very fast rate (look at logtest.c). The
> > my_arm_rsyslog_suicide_debug.txt run received a SIGSEGV, while
> > my_arm_rsyslog_sh_cannot_fork caused so many "Z    [rsyslogd]"
> > threads that they used up enough memory that even a command as
> > simple as 'ls -l' would not run from the command line. I think the
> > number of threads grew with the number of messages. In the latter
> > scenario, after killing logtest.c, it didn't look like those zombie
> > threads went away until I did a CTRL+C on rsyslogd, which was
> > running in the foreground since I use the "-dn" option.
> >
> > This is on an embedded system that runs significantly slower than a
> > desktop or laptop so maybe it would be harder to reproduce on a
> > regular computer. I looked at all the parameters that I believe could
> > affect this and believe for the most part the defaults are more than
> > adequate. The main message queue never looked like it hit the high
> > water mark but it did hit the lower one. So, I don't think messages
> > were being dropped (not sure) or an overflow condition occurred.
> >
> > The processor is ARM-based and it is using Linux kernel 2.6.16.12 and
> > compiled using GCC and the standard GNU C libraries version 3.4.5.
> > Rsyslog source code is cross-compiled using the following configure
> > line:
> >
> > ./configure --disable-zlib --disable-largefile
> >                       --enable-share=yes
> >                        --prefix=/
> >                        --host=arm-unknown-linux-gnu
> >                        ac_cv_func_malloc_0_nonnull=yes
> >                        ac_cv_func_realloc_0_nonnull=yes
> >                        ac_cv_func_lstat_dereferences_slashed_symlink=yes
> >                        ac_cv_func_stat_empty_string_bug=no
> >                        enable_debug=no
> >                        enable_rtinst=no
> >
> > Lastly, the logtest was executed with just the "-s" parameter. It is a
> > simple C file that I came up with.
> >
> > I took a look at the debug messages and it does not appear that new
> > threads are created via calls to wtpStartWrkr in wtp.c.
> >
> > If I can help solve this issue in any way, please let me know. I hope
> > I am not doing anything wrong here.
> >
> > Thanks,
> >
> > Scott
> >>
> >> On Wed, Jul 2, 2008 at 12:04 PM, Scott Phuong <[EMAIL PROTECTED]> wrote:
> >>> Hi Rainer,
> >>>
> >>> Thanks for your reply.  Looking at the default settings (from the
> >>> online help's configuration page), they are what I wanted. The main
> >>> message queue is set to a fixed-size array with at most 1 worker
> >>> thread created, and action queues are in direct mode which,
> >>> according to the queue document page, means that there will not be
> >>> a worker thread created.  Is my understanding correct? If yes, how
> >>> do I quickly check without using the -d option if the defaults are
> >>> set correctly? Or what do I look for in the debug messages that get
> >>> printed out to ensure this?
> >>>
> >>> You also mentioned that version 3.18.0 is probably going to be
> >>> released as the stable version next week. I see on the webpage there
> >>> is a 3.17.4 and 3.17.5. Are these two versions similar to 3.18.0?
> >>>
> >>> Also, how come I did not get your reply in my email inbox? My account
> >>> settings look correct.
> >>>
> >>> Thanks,
> >>>
> >>> Scott Phuong
> >>>
> >>> As for the syslog buffer size, that applies to syslogd and does not
> >>> apply to rsyslog.
> >>>
> >>>
> >>>
> >>> My configuration files do not change the Action queue or Worker queue
> >>> parameters at all. Looking at
> >>> On Wed, 2008-07-02 at 01:15 -0700, Scott Phuong wrote:
> >>>> Hi,
> >>>>
> >>>> I have 3.16.2 which was recently released. I see that under certain
> >>>> conditions rsyslogd spawns a lot of threads:
> >>>>  5949 root      11216 S   rsyslogd
> >>>>  5950 root      11216 S   rsyslogd
> >>>>  5951 root      11216 S   rsyslogd
> >>>>  5952 root      11216 S   rsyslogd
> >>>>  5953 root      11216 S   rsyslogd
> >>>>  5954 root      11216 S   rsyslogd
> >>>>  5985 root            Z   [rsyslogd]
> >>>>  6445 root            Z   [rsyslogd]
> >>>>
> >>>> I had to kill rsyslogd and restart it. The first invocation had a
> >>>> pid of 219 before it had to be killed. The second invocation, which
> >>>> you see above, starts with pid 5949. The difference is the number
> >>>> of zombie threads that were spawned by rsyslogd before I had to
> >>>> kill the first invocation of it.
> >>>
> >>> I have no explanation yet for the zombies. They should not happen and
> >>> so far I have never seen them. We may need to go through a debug log
> >>> (which will become very large) to find out what's going on.
> >>>
> >>>> The question is under what conditions does rsyslogd spawn a new
> >>>> thread/process and why was it a zombie?
> >>>
> >>> Unfortunately, there is no quick answer. A quick one may be: when it
> >>> needs them, based on queue watermark settings and based on your
> >>> configuration. But to really understand it, you need to read this doc:
> >>>
> >>> http://www.rsyslog.com/doc-queues.html
> >>>
> >>> The doc also describes all the knobs that you can use to control
> >>> thread creation. There are many ;)
> >>>
> >>>>  I am running rsyslogd in an
> >>>> embedded environment and not a regular laptop/desktop.
> >>>
> >>> Interesting use case...
> >>>
> >>>> In addition, I
> >>>> am using busybox and I believe the syslog buffer size is set to
> >>>
> >>> What do you mean by "syslog buffer size"? The length of a receive
> >>> buffer? It is 2K, thus single messages up to 2K are supported. It can
> >>> be changed by modifying the MAXLINE define. Note that stock syslogd
> >>> (and RFC3164) support only up to 1K.
> >>>
> >>>> something very low or perhaps none at all. Would this be a factor?
> >>>> Furthermore, I ran rsyslogd with -c3 and also without -c3 and it
> >>>> happens in both cases.
> >>>
> >>> The compatibility modes do not affect queue operation.
> >>>
> >>>> Are these issues already known and fixed in a later version? Sorry if
> >>>> I am asking the same questions or have the same issues as previous
> >>>> people but without the ability to search (or at least, I don't know
> >>>> how to) the archive, I don't know if my problem/questions have
> >>>> already been seen and/or resolved.
> >>>
> >>> If we need to find out about the zombies, we need to move on to the
> >>> current devel version. So I would give that a try in any case. 3.16.2
> >>> will (most probably) be replaced by 3.18.0 (based on the current beta)
> >>> next week. So I won't touch it any longer.
> >>>
> >>> Looking forward to your feedback,
> >>> Rainer
> >>>
> >>>>
> >>>> Thank you very much for your support.
> >>>>
> >>>> Scott
> >>>> _______________________________________________
> >>>> rsyslog mailing list
> >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>>
> >>
> >
