Re: [rsyslog] RELP does not resume when target goes back after a few hours down
According to http://www.rsyslog.com/doc/v8-stable/configuration/actions.html

/If multiple retries fail, the interval is automatically extended. This is to prevent excessive resource use for retries. After each 10 retries, the interval is extended by itself. To be precise, the actual interval is (numRetries / 10 + 1) * $ActionResumeInterval. So after the 10th try, it by default is 60 and after the 100th try it is 330./

Does this mean the interval could grow until it overflows? Is this value a "per-message" setting, or is it shared by the whole queue? How will this behave on a high-throughput queue?

I am going to enable debug and try to see something. Any other feedback would be highly appreciated.

On 12/06/17 10:13, Peter Viskup wrote:
> Check the rsyslog error messages for "action 'NAME' suspended, next retry is"; the next message should be "action 'NAME' resumed". The options $ActionResumeInterval and $ActionResumeRetryCount need to be configured according to your expectations.
>
> More information in the documentation:
> http://www.rsyslog.com/doc/rsconf1_actionresumeinterval.html
> http://www.rsyslog.com/doc/v8-stable/tutorials/reliable_forwarding.html

___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
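As a rough aid to the overflow question in this message, the formula quoted from the documentation can be evaluated directly. This is only a sketch, not rsyslog's actual code: the function name `resume_interval` is made up for illustration, the division is assumed to be integer division, and the 30-second base is inferred from the documented figures (60 after the 10th try, 330 after the 100th). Whether rsyslog caps the value internally is not something this sketch can answer.

```python
def resume_interval(num_retries: int, base_interval: int = 30) -> int:
    """Evaluate (numRetries / 10 + 1) * $ActionResumeInterval.

    Assumptions (not confirmed by the thread): the division is integer
    division, and base_interval=30 matches the documented examples.
    """
    return (num_retries // 10 + 1) * base_interval


if __name__ == "__main__":
    # Print the retry interval, in seconds, at a few retry counts.
    for retries in (0, 10, 100, 1000, 10000):
        print(retries, resume_interval(retries))
```

Under these assumptions the interval grows linearly, by one base interval for every 10 failed retries, so reaching an extreme value would take a very large number of retries.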
Re: [rsyslog] RELP does not resume when target goes back after a few hours down
Check the rsyslog error messages for "action 'NAME' suspended, next retry is"; the next message should be "action 'NAME' resumed". The options $ActionResumeInterval and $ActionResumeRetryCount need to be configured according to your expectations.

More information in the documentation:
http://www.rsyslog.com/doc/rsconf1_actionresumeinterval.html
http://www.rsyslog.com/doc/v8-stable/tutorials/reliable_forwarding.html

--
Peter

On Mon, Jun 12, 2017 at 9:58 AM, mostolog--- via rsyslog wrote:
> Hi
>
> We have been running a RELP->KAFKA infrastructure for a few weeks now, and
> these tests have allowed us to detect issues and problems in our processing
> pipeline.
>
> Before planning imkafka tests and deployment, we have been running some
> fault-tolerance tests, and we have observed the same undesired behavior every
> time kafka goes down for a few hours on a high-traffic/throughput queue:
> the rsyslog /forwarder/ does not resume sending at all (even hours later) when
> kafka comes back online, until rsyslog is restarted.
>
> If I understood correctly, relp will retry forever if /resumeretrycount/ is
> set to -1, but with increasing wait periods between calls. Is that right?
> What's the maximum retry-timeout value? Is that value printed in impstats?
> Can that value be set in the config? Do you think this could be unrelated to
> our issue?
>
> Would enabling logging (on-demand debug) show what's going on here?
>
> We are using 8.25 on the forwarder and 8.27 on the receiver. Disk-assisted
> queues on both, with a discard limit on the origin. Here are the configs:
>
> Forwarder:
>
>     set $!appname = "group/unknown" ;
>     template(name="ajsontemplate" type="string"
>         string="<%pri%>%timegenerated:::date-rfc3339% %$myhostname% %$!appname%: %$!data%"
>     )
>     ruleset(name="fwdto"
>         queue.spoolDirectory="/var/spool/rsyslog"
>         queue.type="LinkedList"
>         queue.filename="ruleset_forward"
>         queue.maxdiskspace="768M"
>         queue.saveonshutdown="on"
>         queue.size="100"
>         queue.discardmark="90"
>         queue.highwatermark="60"
>         queue.lowwatermark="10"
>         queue.discardseverity="5"
>     ) {
>         set $!data!msg=$rawmsg;
>         action(
>             name="fwdto_action"
>             type="omrelp"
>             action.resumeRetryCount="-1"
>             target="receiver"
>             port="20514"
>             template="ajsontemplate"
>         )
>     }
>
> Receiver:
>
>     input(
>         port="20514"
>         type="imrelp"
>         name="imrelp"
>         ruleset="relp"
>     )
>
>     ruleset(
>         name="relp"
>         #queue.filename="relp" (do not delete this line; rsyslog.sh uses it to set a different queue on each node)
>         queue.maxdiskspace="1G"
>         queue.saveonshutdown="on"
>         queue.lowwatermark="10"
>         queue.highwatermark="60"
>         queue.size="100"
>         queue.type="LinkedList"
>     ) {
>         set $!host_received=$$myhostname;
>         set $!time_received=exec_template("time_received");
>         set $!host_forwarded=$hostname; #$fromhost
>         set $!time_forwarded=exec_template("time_reported");
>         # TODO permitSlashInProgramname (v8.25)
>         set $!group=field($syslogtag,"/",1);
>         set $!app=field($syslogtag,"/",2);
>         set $!app=replace($!app,":","");
>         set $!period=$$now;
>         set $.type="logs";
>         set $!time_generated=exec_template("time_reported");
>         action(
>             name="json"
>             cookie=""
>             type="mmjsonparse"
>         )
>         if $parsesuccess != "OK" then {
>             action(
>                 name="error"
>                 type="omfile"
>                 file="/logs/rsyslog-errors.log"
>             )
>             stop
>         }
>         action(
>             name="kafka"
>             action.resumeRetryCount="-1"
>             action.reportsuspension="on"
>             #action.reportSuspensionContinuation="on"
>             type="omkafka"
>             broker=["kafka:9092"]
>             dynatopic="on"
>             topic="topic"
>             partitions.auto="on"
>             dynatopic.cachesize="300"
>             template="json"
>             errorFile="/logs/rsyslog-kafka.json"
>         )
>     }
>
> Any idea? Shall I file a new issue?
>
> Regards
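The check suggested in this reply (look for "action 'NAME' suspended" and expect a later "action 'NAME' resumed") could be automated along these lines. This is a hedged sketch: the helper name `still_suspended` is hypothetical, and the pattern relies only on the two message fragments quoted in the reply, so the exact wording of real rsyslog messages on a given version may require adjusting the regular expression.

```python
import re

# Matches the fragments quoted in the reply: "action 'NAME' suspended"
# and "action 'NAME' resumed". The full message format is an assumption.
ACTION_EVENT = re.compile(r"action '([^']+)' (suspended|resumed)")


def still_suspended(lines):
    """Return the sorted names of actions whose last seen event is 'suspended'."""
    last_event = {}
    for line in lines:
        match = ACTION_EVENT.search(line)
        if match:
            name, event = match.groups()
            last_event[name] = event  # later lines overwrite earlier ones
    return sorted(name for name, event in last_event.items() if event == "suspended")


if __name__ == "__main__":
    import sys

    # Usage: python check_suspended.py < /path/to/rsyslog/messages
    for action_name in still_suspended(sys.stdin):
        print(action_name)
```

An action that shows up in the output was suspended and never logged a matching "resumed" message afterwards, which is the situation described in the original report.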
[rsyslog] RELP does not resume when target goes back after a few hours down
Hi

We have been running a RELP->KAFKA infrastructure for a few weeks now, and these tests have allowed us to detect issues and problems in our processing pipeline.

Before planning imkafka tests and deployment, we have been running some fault-tolerance tests, and we have observed the same undesired behavior every time kafka goes down for a few hours on a high-traffic/throughput queue: the rsyslog /forwarder/ does not resume sending at all (even hours later) when kafka comes back online, until rsyslog is restarted.

If I understood correctly, relp will retry forever if /resumeretrycount/ is set to -1, but with increasing wait periods between calls. Is that right? What's the maximum retry-timeout value? Is that value printed in impstats? Can that value be set in the config? Do you think this could be unrelated to our issue?

Would enabling logging (on-demand debug) show what's going on here?

We are using 8.25 on the forwarder and 8.27 on the receiver. Disk-assisted queues on both, with a discard limit on the origin. Here are the configs:

Forwarder:

    set $!appname = "group/unknown" ;
    template(name="ajsontemplate" type="string"
        string="<%pri%>%timegenerated:::date-rfc3339% %$myhostname% %$!appname%: %$!data%"
    )
    ruleset(name="fwdto"
        queue.spoolDirectory="/var/spool/rsyslog"
        queue.type="LinkedList"
        queue.filename="ruleset_forward"
        queue.maxdiskspace="768M"
        queue.saveonshutdown="on"
        queue.size="100"
        queue.discardmark="90"
        queue.highwatermark="60"
        queue.lowwatermark="10"
        queue.discardseverity="5"
    ) {
        set $!data!msg=$rawmsg;
        action(
            name="fwdto_action"
            type="omrelp"
            action.resumeRetryCount="-1"
            target="receiver"
            port="20514"
            template="ajsontemplate"
        )
    }

Receiver:

    input(
        port="20514"
        type="imrelp"
        name="imrelp"
        ruleset="relp"
    )

    ruleset(
        name="relp"
        #queue.filename="relp" (do not delete this line; rsyslog.sh uses it to set a different queue on each node)
        queue.maxdiskspace="1G"
        queue.saveonshutdown="on"
        queue.lowwatermark="10"
        queue.highwatermark="60"
        queue.size="100"
        queue.type="LinkedList"
    ) {
        set $!host_received=$$myhostname;
        set $!time_received=exec_template("time_received");
        set $!host_forwarded=$hostname; #$fromhost
        set $!time_forwarded=exec_template("time_reported");
        # TODO permitSlashInProgramname (v8.25)
        set $!group=field($syslogtag,"/",1);
        set $!app=field($syslogtag,"/",2);
        set $!app=replace($!app,":","");
        set $!period=$$now;
        set $.type="logs";
        set $!time_generated=exec_template("time_reported");
        action(
            name="json"
            cookie=""
            type="mmjsonparse"
        )
        if $parsesuccess != "OK" then {
            action(
                name="error"
                type="omfile"
                file="/logs/rsyslog-errors.log"
            )
            stop
        }
        action(
            name="kafka"
            action.resumeRetryCount="-1"
            action.reportsuspension="on"
            #action.reportSuspensionContinuation="on"
            type="omkafka"
            broker=["kafka:9092"]
            dynatopic="on"
            topic="topic"
            partitions.auto="on"
            dynatopic.cachesize="300"
            template="json"
            errorFile="/logs/rsyslog-kafka.json"
        )
    }

Any idea? Shall I file a new issue?

Regards