Re: [rsyslog] RELP does not resume when target goes back after a few hours down

2017-06-12 Thread mostolog--- via rsyslog

According to http://www.rsyslog.com/doc/v8-stable/configuration/actions.html

/If multiple retries fail, the interval is automatically extended. This 
is to prevent excessive resource use for retries. After each 10 
retries, the interval is extended by itself. To be precise, the actual 
interval is (numRetries / 10 + 1) * $ActionResumeInterval. So after the 
10th try, it by default is 60 and after the 100th try it is 330./
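
To make the quoted formula concrete, here is a minimal sketch of the 
arithmetic. It assumes integer division and a base interval of 30 
seconds, which is what the quoted figures of 60 and 330 imply. The 
function name is hypothetical and this is not rsyslog's actual code.

```python
def resume_interval(num_retries: int, base_interval: int = 30) -> int:
    """Seconds to wait before the next retry, per the quoted formula.

    base_interval stands in for $ActionResumeInterval; 30 is an assumption
    inferred from the quoted figures (60 after 10 tries, 330 after 100).
    """
    # Integer division: the multiplier grows by one after every 10 retries.
    return (num_retries // 10 + 1) * base_interval


if __name__ == "__main__":
    for n in (0, 9, 10, 100, 1000):
        print(f"retry {n}: wait {resume_interval(n)} s")
```

Under these assumptions the wait grows linearly with the retry count. 
The quoted paragraph does not say whether rsyslog caps it.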


Does this mean this interval could grow until it overflows? Is this value a 
"per-message" setting, or is it shared by the whole queue? How will this 
behave on a high-throughput queue?


I am going to enable debug logging to see whether it reveals anything.

Any other feedback would be highly appreciated.



On 12/06/17 10:13, Peter Viskup wrote:

Check the rsyslog error messages for "action 'NAME' suspended, next
retry is"; the next message should be "action 'NAME' resumed".
The options $ActionResumeInterval and $ActionResumeRetryCount need to
be configured according to your expectations.

More information in the documentation:
http://www.rsyslog.com/doc/rsconf1_actionresumeinterval.html
http://www.rsyslog.com/doc/v8-stable/tutorials/reliable_forwarding.html



___
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.


Re: [rsyslog] RELP does not resume when target goes back after a few hours down

2017-06-12 Thread Peter Viskup via rsyslog
Check the rsyslog error messages for "action 'NAME' suspended, next
retry is"; the next message should be "action 'NAME' resumed".
The options $ActionResumeInterval and $ActionResumeRetryCount need to
be configured according to your expectations.
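
The check described above can be scripted. The following is a rough
sketch that scans log lines and reports actions whose last recorded
state is "suspended". It matches only the two phrases quoted above; the
function names and the sample lines are hypothetical, and real rsyslog
messages may carry extra text around these phrases.

```python
import re
from typing import Dict, Iterable, List

# Matches only the phrases quoted above: action 'NAME' suspended / resumed.
STATE_RE = re.compile(r"action '([^']+)' (suspended|resumed)")


def last_action_states(lines: Iterable[str]) -> Dict[str, str]:
    """Return the most recent state seen for each action name."""
    states: Dict[str, str] = {}
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            # Later lines overwrite earlier ones, so the last state wins.
            states[match.group(1)] = match.group(2)
    return states


def still_suspended(lines: Iterable[str]) -> List[str]:
    """Names of actions that were suspended and never reported as resumed."""
    states = last_action_states(lines)
    return sorted(name for name, state in states.items() if state == "suspended")


if __name__ == "__main__":
    # Hypothetical sample lines built from the phrases quoted above.
    sample = [
        "action 'fwdto_action' suspended, next retry is ...",
        "action 'kafka' suspended, next retry is ...",
        "action 'kafka' resumed",
    ]
    print(still_suspended(sample))
```

An action that appears in the result was suspended and has not logged a
matching "resumed" message since.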

More information in the documentation:
http://www.rsyslog.com/doc/rsconf1_actionresumeinterval.html
http://www.rsyslog.com/doc/v8-stable/tutorials/reliable_forwarding.html

-- 
Peter


[rsyslog] RELP does not resume when target goes back after a few hours down

2017-06-12 Thread mostolog--- via rsyslog

Hi

We have been running a RELP->KAFKA infrastructure for a few weeks now, 
and these tests have allowed us to detect issues and problems in our 
processing pipeline.


Before planning imkafka tests and deployment, we have been running some 
fault-tolerance tests, and we have observed the same undesired behavior 
every time kafka goes down for a few hours on a high-traffic/throughput 
queue: the rsyslog /forwarder/ does not resume sending at all (even hours 
later) when kafka comes back online, until rsyslog is restarted.


If I understood correctly, relp will retry forever if /resumeretrycount/ 
is set to -1, but with increasing wait periods between attempts. Is that 
right? What's the maximum retry-timeout value? Is that value printed in 
impstats? Can that value be set in the config? Do you think this could be 
unrelated to our issue?


Would enabling logging (on-demand debug) show what's going on here?

We are using 8.25 on the forwarder and 8.27 on the receiver, with 
disk-assisted queues on both and a discard limit on the origin. Here are 
the configs:


Forwarder:

   set $!appname = "group/unknown" ;
   template(name="ajsontemplate" type="string"
       string="<%pri%>%timegenerated:::date-rfc3339% %$myhostname% %$!appname%: %$!data%"
   )
   ruleset(name="fwdto"
       queue.spoolDirectory="/var/spool/rsyslog"
       queue.type="LinkedList"
       queue.filename="ruleset_forward"
       queue.maxdiskspace="768M"
       queue.saveonshutdown="on"
       queue.size="100"
       queue.discardmark="90"
       queue.highwatermark="60"
       queue.lowwatermark="10"
       queue.discardseverity="5"
   ) {
       set $!data!msg=$rawmsg;
       action(
           name="fwdto_action"
           type="omrelp"
           action.resumeRetryCount="-1"
           target="receiver"
           port="20514"
           template="ajsontemplate"
       )
   }

Receiver:

   input(
       port="20514"
       type="imrelp"
       name="imrelp"
       ruleset="relp"
   )

   ruleset(
       name="relp"
       #queue.filename="relp" (do not delete this line; rsyslog.sh uses it to give each node a different queue)
       queue.maxdiskspace="1G"
       queue.saveonshutdown="on"
       queue.lowwatermark="10"
       queue.highwatermark="60"
       queue.size="100"
       queue.type="LinkedList"
   ) {
       set $!host_received=$$myhostname;
       set $!time_received=exec_template("time_received");
       set $!host_forwarded=$hostname; #$fromhost
       set $!time_forwarded=exec_template("time_reported");
       # TODO permitSlashInProgramname (v8.25)
       set $!group=field($syslogtag,"/",1);
       set $!app=field($syslogtag,"/",2);
       set $!app=replace($!app,":","");
       set $!period=$$now;
       set $.type="logs";
       set $!time_generated=exec_template("time_reported");
       action(
           name="json"
           cookie=""
           type="mmjsonparse"
       )
       if $parsesuccess != "OK" then {
           action(
               name="error"
               type="omfile"
               file="/logs/rsyslog-errors.log"
           )
           stop
       }
       action(
           name="kafka"
           action.resumeRetryCount="-1"
           action.reportsuspension="on"
           #action.reportSuspensionContinuation="on"
           type="omkafka"
           broker=["kafka:9092"]
           dynatopic="on"
           topic="topic"
           partitions.auto="on"
           dynatopic.cachesize="300"
           template="json"
           errorFile="/logs/rsyslog-kafka.json"
       )
   }


Any ideas? Should I file a new issue?
Regards
