2012/3/12 Rainer Gerhards <[email protected]>:
> Thanks for the report. It looks that working on the stats module acutally
> broke the queue handler. This is now fixed:
>
> http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=16cc84fc699fc3f830b0c28d
> 677d6ae0daa35723
>
> I guess this is the problem you experience and would appreciate if you could
> try out the patch and report back.
Probably not. I notice the only change except those for debugging
purpose in commit 16cc84 was to save pThis->iQueueSize before
statsobj.AddCounter() and restore the saved value afterwards in
runtime/queue.c
+ /* we need to save the queue size, as the stats module
initializes it to 0! */
+ iQueueSizeSave = pThis->iQueueSize;
CHKiRet(statsobj.AddCounter(pThis->statsobj, UCHAR_CONSTANT("size"),
ctrType_Int, &pThis->iQueueSize));
+ pThis->iQueueSize = iQueueSizeSave;
The necessarity was introduced by commit 8d2f66 (link blow, more
discussion later) as of 5.8.7, which enforces the initialization in
runtime/statsobj.c
http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=8d2f66;hp=5ba3fc7238c49f8317ba4d694091d8a191744ed5
switch(ctrType) {
case ctrType_IntCtr:
ctr->val.pIntCtr = (intctr_t*) pCtr;
+ *(ctr->val.pIntCtr) = 0;
break;
case ctrType_Int:
ctr->val.pInt = (int*) pCtr;
+ *(ctr->val.pInt) = 0;
break;
}
addCtrToList(pThis, ctr);
Actually I am running 5.8.6 based code and verified no such
enforcement. This version even does not have counter issues supposed
to be fixed by 8d2f66, because the only AddCounter() invocations are:
in runtime/queue.c, where pThis->mutCtrEnqueued, pThis->ctrFull, and
pThis->ctrMaxqsize are intialized by STATSCOUNTER_INIT(), and the
other one, pThis->iQueueSize, which is also the one in 16cc84, should
already be initialized somewhere else by qqueueConstruct() or set by
qqueueSetProperty().
in plugins/imuxsock/imuxsock.c, where all of ctrSubmit,
ctrLostRatelimit and ctrNumRatelimiters are global variables which are
by default initialized to zero.
So I came to the conclusion that 16cc84 was not my case. In addition,
I suppose the enforcement in 8d2f66 is not a good way to fix the
problem about uninitialized counters. Callers of AddCounter() are
supposed to call, if necessary, STATSCOUNTER_INIT precedingly to do
initialization protected by mutex. The problems in new versions are
that plugins/imptcp/imptcp.c, plugins/imudp/imudp.c and tcpsrv.c miss
the call of STATSCOUNTER_INIT, while action.c,
plugins/imuxsock/imuxsock.c,
plugins/omelasticsearch/omelasticsearch.c, runtime/queue.c are OK. The
enforcement coexisting with STATSCOUNTER_INIT is confusing, and not
documented as well.
Anyway, I will verify it.
>
> Thanks,
> Rainer
>
>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of Kaiwang Chen
>> Sent: Monday, March 12, 2012 7:36 AM
>> To: rsyslog-users
>> Subject: [rsyslog] lots of queue files left in working directory
>>
>> Hi All,
>>
>> I have an rsyslog-5.8.6 with patch
>> http://git.adiscon.com/?p=rsyslog.git;a=commitdiff;h=791b16ce06d75944e33
>> 8a6e5fa14c0394bde6f1d,
>> as central log receiver accepting connections at udp/514, tcp/514 and
> uxsock,
>> and feeding to a mysql backend as well as /var/log/messages.
>> Last week I found the messages file, /var/log/messages, was empty, with
>> the last update from rotated archive /var/log/messages.1 being:
>>
>> Feb 29 16:23:39 host81 snmpd[324] Received SNMP packet(s) from UDP:
>> [ip_230] 55109
>>
>> I also observed that the working directory was holding lots of disk queue
>> files, # ls -l /var/spool/rsyslog/mq.00000* -h
>> -rw------- 1 root root 5.1M Feb 29 17:03 /var/spool/rsyslog/mq.00000001
>> -rw------- 1 root root 5.1M Feb 29 17:32 /var/spool/rsyslog/mq.00000002 ...
>> -rw------- 1 root root 5.1M Mar 12 12:37 /var/spool/rsyslog/mq.00000786
>> -rw------- 1 root root 110K Mar 12 12:37 /var/spool/rsyslog/mq.00000787
>>
>> with the first entry in /var/spool/rsyslog/mq.00000001 being:
>> <Obj:1:msg:1:
>> +iProtocolVersion:2:1:0:
>> +iSeverity:2:2:-1:
>> +iFacility:2:2:-1:
>> +msgFlags:2:2:48:
>> +ttGenTime:2:10:1330503819:
>> +tRcvdAt:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0:
>> +tTIMESTAMP:3:35:2:2012:2:29:16:23:39:131013:6:+:8:0:
>> +pszTAG:1:0::
>> +pszRawMsg:1:94:<30>1 2012-02-29T16:23:39+08:00 host81 snmpd 324 - -
>> Connection from UDP: [ip_230]:49203
>> :
>> +pszInputName:1:6:imptcp:
>> +pszRcvFrom:1:11:ip_81:
>> +pszRcvFromIP:1:11:ip_81:
>> +offMSG:2:2:-1:
>> >End
>>
>> So I think messages were queued rather than lost. And the head and tail of
>> disk queue were also observed as reported by "lsof -p <pid> -nP"
>> rsyslogd 25177 root 1w REG 8,1 112214 186663427
>> /var/spool/rsyslog/mq.00000787
>> rsyslogd 25177 root 70r REG 8,1 5243319 167918536
>> /var/spool/rsyslog/mq.00000001
>>
>>
>> An intuitive guess is the queue consumer was stuck for some reason, so I
>> check the mysqld, yes it accepts connection and allows writting. I tried
>> restarting the instance with the hope that disk queue would be consumed.
>> It's not. And I did another restart after having commented out the ommail
>> filtering rule as well as the debug file. To my astonishment, the queue
> head
>> and tail were no longer reported in "lsof -p <pid> -nP", it looks to me
> that
>> rsyslog has lost the disk queue?
>> Any clue to debug the problem? I wish the queue could be recovered.
>>
>>
>> Thanks,
>> Kaiwang
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
Thanks,
Kaiwang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/