Re: [rsyslog] Corrupt disk queue - known issues?

Otis Gospodnetić Fri, 21 Aug 2015 05:31:33 -0700

Btw. I think the main part that Ciprian wants to point out is:

*2631.119110261:main_json_token queue[DA]:Reg/w0: main_json_token
queue[DA]: DA queue is in emergency mode, disabling DA in parent*

*2631.119146744:main_json_token queue[DA]:Reg/w0: Called LogMsg, msg:
fatal error on disk queue 'main_json_token queue[DA]', emergency switch
to direct mode*

But I think the problem Ciprian is trying to point out here is this:

* there is a problem with the disk queue
* but rsyslog doesn't report it
* Ciprian saw it *only* because he restarted rsyslog in *debug* mode.

I would think that if there is a problem with the on-disk queue, rsyslog
would complain about it immediately and loudly, but I guess it doesn't?
Small bug?
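
Until rsyslog reports this on its own, an external check could compare the
queue's meta file with the data files actually on disk. The following is only
a rough sketch, not anything rsyslog ships: the helper names parse_qi and
missing_data_files are made up for illustration, the property layout is
inferred from the meta file dump quoted below, and the data file naming
(<pszFName>.<8-digit number>) is an assumption based on the file names in the
debug log.

```python
import os
import re

# Property lines in the meta file look like "+iCurrFNum:2:2:69:"
# (name, type code, value length, value), as in the dump quoted below.
PROP_RE = re.compile(r"^\+([\w.]+):(\d+):(\d+):(.*):$")


def parse_qi(text):
    """Return one dict of properties per serialized object in the meta file."""
    objects, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("<"):      # "<OPB:1:qqueue:1:" or "<Obj:1:strm:1:"
            current = {"_class": line.split(":")[2]}
        elif line.startswith(">End"):
            if current is not None:
                objects.append(current)
            current = None
        elif current is not None:
            m = PROP_RE.match(line)
            if m:
                current[m.group(1)] = m.group(4)
    return objects


def missing_data_files(qi_text, queue_dir):
    """List data files the stream objects point at that are absent on disk."""
    missing = []
    for obj in parse_qi(qi_text):
        if obj["_class"] != "strm":
            continue
        # Assumption: data files are named <pszFName>.<8-digit file number>.
        name = "%s.%08d" % (obj["pszFName"], int(obj["iCurrFNum"]))
        path = os.path.join(queue_dir, name)
        if not os.path.exists(path) and path not in missing:
            missing.append(path)
    return missing
```

Run from cron against the queue directory, a non-empty result from
missing_data_files would be a hint that the meta file refers to data files
that are gone, which matches what Ciprian found. It says nothing about
whether existing data files are intact.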

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Aug 21, 2015 at 7:27 AM, Ciprian Hacman <[email protected]
> wrote:

> I just started Rsyslog in debug mode to see if there is any issue, and found
> we have one. Maybe this can help:
>
> 2631.117432280:main_json_token queue[DA]:Reg/w0: wti 0x1348150: worker
> starting
>
> 2631.117464099:main_json_token queue[DA]:Reg/w0: DeleteProcessedBatch: we
> deleted 0 objects and enqueued 0 objects
>
> 2631.117492596:main_json_token queue[DA]:Reg/w0: doDeleteBatch: delete
> batch from store, new sizes: log 4530, phys 4530
>
> 2631.117522634:main_json_token queue[DA]:Reg/w0: strm 0x134b970: file 19
> read 0 bytes
>
> 2631.117551194:main_json_token queue[DA]:Reg/w0: strm 0x134b970: file 19
> EOF
>
> 2631.117580484:main_json_token queue[DA]:Reg/w0: strm 0x134b970: file
> 19(json_and_token_action) closing
>
> 2631.117626778:main_json_token queue[DA]:Reg/w0: file
> '/mnt/rsyslog/queues/json_and_token_action.00000070' opened as #-1 with
> mode 384
>
> 2631.117668522:main_json_token queue[DA]:Reg/w0: strm 0x134b970: open error
> 2, file '/mnt/rsyslog/queues/json_and_token_action.00000070': No such file
> or directory
>
> 2631.117700793:main_json_token queue[DA]:Reg/w0: objDeserialize error -2040
> during header processing - trying to recover
>
> 2631.117732844:main_json_token queue[DA]:Reg/w0: file
> '/mnt/rsyslog/queues/json_and_token_action.00000070' opened as #-1 with
> mode 384
>
> 2631.117764597:main_json_token queue[DA]:Reg/w0: strm 0x134b970: open error
> 2, file '/mnt/rsyslog/queues/json_and_token_action.00000070': No such file
> or directory
>
> 2631.117794460:main_json_token queue[DA]:Reg/w0: deserializer has possibly
> been able to re-sync and recover, state -2040
>
> 2631.117823674:main_json_token queue[DA]:Reg/w0: main_json_token queue[DA]:
> error -2040 dequeueing element - ignoring, but strange things may happen
>
> 2631.117852905:main_json_token queue[DA]:Reg/w0: main_json_token queue[DA]:
> got 'file not found' error -2040, queue defunct
>
> 2631.117882095:main_json_token queue[DA]:Reg/w0: strm 0x1349450: file
> 17(json_and_token_action) closing
>
> 2631.117911082:main_json_token queue[DA]:Reg/w0: strm 0x1349450: file
> 17(json_and_token_action) flush, buflen 0 (no need to flush)
>
> 2631.117942739:main_json_token queue[DA]:Reg/w0: strm 0x134b970: file
> -1(json_and_token_action) closing
>
> 2631.117974536:main_json_token queue[DA]:Reg/w0: strm 0x134a6e0: file
> 18(json_and_token_action) closing
>
> 2631.118004770:main_json_token queue[DA]:Reg/w0: strmCloseFile: deleting
> '/mnt/rsyslog/queues/json_and_token_action.00000069'
>
> 2631.119110261:main_json_token queue[DA]:Reg/w0: main_json_token queue[DA]:
> DA queue is in emergency mode, disabling DA in parent
>
> 2631.119146744:main_json_token queue[DA]:Reg/w0: Called LogMsg, msg: fatal
> error on disk queue 'main_json_token queue[DA]', emergency switch to direct
> mode
>
> 2631.119183273:main_json_token queue[DA]:Reg/w0: main Q: qqueueAdd: entry
> added, size now log 1, phys 9 entries
>
> 2631.119212733:main_json_token queue[DA]:Reg/w0: main Q: EnqueueMsg advised
> worker start
>
> rsyslogd: fatal error on disk queue 'main_json_token queue[DA]', emergency
> switch to direct mode [v8.12.0 try http://www.rsyslog.com/e/2040 ]
>
> 2631.119266265:main_json_token queue[DA]:Reg/w0: regular consumer finished,
> iret=-2183, szlog 0 sz phys 0
>
> 2631.119293213:main_json_token queue[DA]:Reg/w0: DDDD: wti 0x1348150:
> worker cleanup action instances
>
>
> The queue dir only contained a meta file, no data files:
>
> <OPB:1:qqueue:1:
>
> +iQueueSize:2:4:4530:
>
> +tVars.disk.sizeOnDisk:2:7:9517544:
>
> >End
>
> .
>
> <Obj:1:strm:1:
>
> +iCurrFNum:2:2:69:
>
> +pszFName:1:21:json_and_token_action:
>
> +iMaxFiles:2:8:10000000:
>
> +bDeleteOnClose:2:1:0:
>
> +sType:2:1:1:
>
> +tOperationsMode:2:1:2:
>
> +tOpenMode:2:3:384:
>
> +iCurrOffs:2:7:9517544:
>
> +inode:2:1:0:
>
> +bPrevWasNL:2:1:0:
>
> >End
>
> .
>
> <Obj:1:strm:1:
>
> +iCurrFNum:2:2:69:
>
> +pszFName:1:21:json_and_token_action:
>
> +iMaxFiles:2:8:10000000:
>
> +bDeleteOnClose:2:1:1:
>
> +sType:2:1:1:
>
> +tOperationsMode:2:1:1:
>
> +tOpenMode:2:3:384:
>
> +iCurrOffs:2:7:9517544:
>
> +inode:2:1:0:
>
> +bPrevWasNL:2:1:0:
>
> >End
>
> .
>
>
> Ciprian
> ---
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Fri, Aug 21, 2015 at 12:58 PM, Ciprian Hacman <
> [email protected]> wrote:
>
> > Hi David,
> >
> > We see Rsyslog starting to use a lot of memory when it cannot send data to
> > Elasticsearch.
> > We expected to see logs written to disk, but instead found a message on
> > startup that looked like this:
> >
> >> fatal error on disk queue 'main_nojson queue[DA]', emergency switch to
> >> direct mode [v8.11.0 try http://www.rsyslog.com/e/2040 ]
> >
> >
> > Permissions were correct on the queue files.
> >
> > Thanks,
> > Ciprian
> > --
> > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> > On Fri, Aug 21, 2015 at 12:34 PM, David Lang <[email protected]> wrote:
> >
> >> On Fri, 21 Aug 2015, Otis Gospodnetić wrote:
> >>
> >> Hi,
> >>>
> >>> Are there known situations where rsyslog disk queues can get corrupt?
> >>>
> >>> Sorry for such a high-level and open question, but we sometimes see disk
> >>> queue corruption and I wanted to see if anyone else sees that in certain
> >>> situations (e.g. when rsyslog is under pressure, when it is stopped in a
> >>> certain way, when it runs out of memory, or something along these lines)?
> >>>
> >>
> >> Well, if it crashes as it's writing data, or if it's trying to flush the
> >> queue to disk on shutdown and gets killed by -9 while it's doing so, I would
> >> expect problems (some distros send a kill -15, wait a bit and then do kill
> >> -9; if there's too much work to do in writing the memory queue to disk,
> >> rsyslog will be caught by this).
> >>
> >> Other than that, nothing specific to disk queues.
> >>
> >> what sort of corruption are you seeing?
> >>
> >> David Lang
> >> _______________________________________________
> >> rsyslog mailing list
> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> http://www.rsyslog.com/professional-services/
> >> What's up with rsyslog? Follow https://twitter.com/rgerhards
> >> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> >> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> >> DON'T LIKE THAT.
> >>
> >
> >
