We had configured large batches (8192) for the main queue, action queue,
and udp input module.  However, after a brief debugging session, it appears
that at least two of these settings are not being honored...

The database action queue seems to be deleting maximum batch sizes of 8 at
a time:

7968.410369112:SecInfra2ArchiveDB queue[DA]:Reg/w0: regular consumer
finished, iret=0, szlog 235270 sz phys 235278
7968.410545345:SecInfra2ArchiveDB queue[DA]:Reg/w0: DeleteProcessedBatch:
we deleted 8 objects and enqueued 0 objects
7968.410554591:SecInfra2ArchiveDB queue[DA]:Reg/w0: doDeleteBatch: delete
batch from store, new sizes: log 235270, phys 235270

And the main queue appears to be dequeueing in maximum chunks of 256:

7963.597767589:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 25 of 25 messages]
7963.616701318:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 38 of 38 messages]
7963.669420594:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 224 of 224 messages]
7963.731337893:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
7963.800962451:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
7963.857743064:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
7963.904639838:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
7963.985528145:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
...

I'm starting to think some of the options in our config file are incorrect
or no longer supported...?

Also, is there any specific keywords I should be looking for in the debug
log?  Or any specific debug options I should be turning on (i.e.
LogFuncFlow, PrintMutexAction, OutputTidToStderr, etc...)?


On Mon, Nov 6, 2017 at 4:54 PM, David Lang <[email protected]> wrote:

> On Mon, 6 Nov 2017, Steven Libby wrote:
>
> I've tried removing the ruleset altogether and just using an action to
>> minimize locking overhead (which I assume would be the same thing you're
>> referring to here).  But just to make sure, I also tried changing the
>> ruleset queue type to "Direct" and got the same result.
>>
>
> yes
>
> Also, most of the time during my testing I used resumeRetryCount="-1" but
>> mistakenly left it at "0" when I posted to this mailing list.  Reverting
>> back to "-1" still results in a backlog in the database action queue.
>> When
>> you say "...some insert attempts...are not being accepted by mysql...",
>> are
>> you implying that some insert attempts are actually being rejected by
>> mysql?
>>
>
> that was one possibility when I thought you were not retrying logs.
>
>  I have enabled the general query log with the highest level of
>> verbosity, and also installed the audit plugin, but I am unable to see any
>> indication that any insert attempts are being rejected.  I have yet to
>> enable the DBUG package yet for further investigation, but will try that
>> next to see if I can glean any new information.
>>
>
> are logs getting inserted one at a time or in large batches? try to
> increase batching, get a debug log for a bit on the rsyslog side and see if
> it's having to make multiple attempts to insert messages (for example, you
> try to deliver up to 8K messages at once, rsyslog could be rejecting this
> because the total request size is too large and rsyslog then tries to
> deliver 4K, 2K, 1K... messages until it gets small enough to be accepted)
>
> David Lang
>
>
> On a side note, I created a separate test table using the MyISAM storage
>> engine and the insert rate was improved, but still filled up the database
>> action queue within several minutes.  I've reduced the durability of
>> TokuDB
>> by disabling tokudb_commit_sync and tokudb_fsync_log_period global
>> variables, but there was zero noticeable performance difference here.
>> (Background: we use TokuDB for it's compression and secondary clustering
>> indexes, mainly).  I would expect to see excessive CPU utilization or Disk
>> I/O if the database was struggling to keep up with the rsyslog queue, but
>> perhaps that expectation is incorrect?
>>
>> Regards,
>> Steven
>>
>> On Sat, Nov 4, 2017 at 5:03 PM, David Lang <[email protected]> wrote:
>>
>> don't put a queue on the ruleset and the only item in the ruleset
>>>
>>> when you have to queues like this, you end up with lots of locking
>>> overhead and poor batch sizing.
>>>
>>> with retrycount=0 you never attempt to re-deliver a message that fails,
>>> if
>>> you set it to -1 it will retry forever
>>>
>>> try setting it to -1 and see if logs are getting stuck (delivery stops),
>>> if so, you are getting some insert attempts that are not being accepted
>>> by
>>> mysql and since you have retries disabled, the entire batch is being
>>> thrown
>>> away.
>>>
>>>
>>> On Fri, 3 Nov 2017, Steven Libby via rsyslog wrote:
>>>
>>> Date: Fri, 3 Nov 2017 14:53:17 -0400
>>>
>>>> From: Steven Libby via rsyslog <[email protected]>
>>>> To: [email protected]
>>>> Cc: Steven Libby <[email protected]>
>>>> Subject: [rsyslog] ommysql database queue bottleneck
>>>>
>>>>
>>>> We are running rsyslog 8.21.0 on a RHEL 6.7 box with 256GB of RAM and 96
>>>> cores, writing to a MySQL instance (Percona Server 5.6.28-76) using
>>>> TokuDB
>>>> as the storage engine.  This is our centralized log server, which
>>>> receives
>>>> messages from numerous network devices at an average rate of 7k messages
>>>> per second.  This message rate is fairly consistent throughout the
>>>> course
>>>> of the day.
>>>>
>>>> We noticed records missing from the database when comparing with the raw
>>>> records being written directly to a file on another server.  Based on
>>>> the
>>>> investigation so far, it appears the lost messages are being discarded
>>>> due
>>>> to queues filling up.
>>>>
>>>> After enabling the impstats module, we discovered that the bottleneck
>>>> appears to be in the database queue; but we're having trouble
>>>> determining
>>>> the exact cause of the bottleneck.  Database resource utilization is
>>>> relatively low (CPU utilization < 4%, Disk I/O < 1%, 5+GB free RAM) and
>>>> we've tuned the database configuration to remove any unnecessary
>>>> overhead.
>>>> We've made numerous configuration changes, both on the DB side and in
>>>> rsyslogd, and the queue seems to build at roughly the same rate
>>>> regardless
>>>> of of everything we've tried.
>>>>
>>>> We are successfully writing 3k-4k records per-second into the DB, but
>>>> the
>>>> action queue is also building at about that same rate.
>>>>
>>>> Here is the latest rsyslog configuration we've tried (below) cleaned up
>>>> for
>>>> the purpose of this message.  Most of the incoming messages arrive via
>>>> imudp.  We've tried adjusting the number of threads in that module, as
>>>> well
>>>> as the batchSize, and the number of messages coming in stays about the
>>>> same, regardless.  We've also tried different combinations of MainMsg
>>>> queue
>>>> worker threads and batch sizes, as well as similar adjustments in the
>>>> ruleset and action configurations.  We've gone through the changelogs up
>>>> to
>>>> rsyslog 8.31.0 and didn't see anything to indicate that upgrading would
>>>> help with the performance.  We've also gone through most of the online
>>>> documentation at rsyslog.com/doc/v8-stable/ and read some articles by
>>>> David
>>>> Long regarding performance improvements and the work he did with Intuit.
>>>>
>>>> At this point, any additional help or guidance from the professionals
>>>> would
>>>> be greatly appreciated.  Thanks for taking the time to read this...
>>>>
>>>>
>>>> #rsyslog v8.x config file
>>>> $ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
>>>> $RepeatedMsgContainsOriginalMsg on
>>>> $OptimizeForUniprocessor off
>>>> $PreserveFQDN on
>>>> $WorkDirectory /HSD1/syslogQ
>>>>
>>>> $MainMsgQueueFileName mainq-SecInfra
>>>> $MainMsgQueueMaxDiskSpace 200g
>>>> $MainMsgQueueMaxFileSize 1g
>>>> $MainMsgQueueSaveOnShutdown on
>>>> $MainMsgQueueType LinkedList
>>>> $MainMsgQueueSize 10485760
>>>> $MainMsgQueueHighWatermark 10000000
>>>> $MainMsgQueueWorkerThreads 8
>>>> $MainMsgDequeueBatchSize 8192
>>>>
>>>> module(load="imudp" threads="8" batchSize="8192" timeRequery="10")
>>>> module(load="imptcp" threads="8" processOnPoller="off" )
>>>> module(load="imuxsock")
>>>> module(load="imklog")
>>>> module(load="immark")
>>>> module(load="ommysql")
>>>> module(load="impstats" interval="5" resetCounters="off" format="legacy"
>>>> log.syslog="off" log.file="/HSD1/locallog/pstat
>>>> s-SecInfra_2017-11-03.log")
>>>> module(load="mmutf8fix")
>>>>
>>>> $template SecInfraSyslogTmpl,"insert into SecInfraSyslog.syslogRaw
>>>> (recievedAt,deviceReportedTime,deviceTimezone,fromHost,
>>>> fromHostIpv4,hostName,facility,priority,localPriority,infoUnitId,
>>>> inputName,syslogTag,programName,appName,procId,msgId,
>>>> message,rawMessage)
>>>> values ('%timegenerated:::date-mysql%.%timegenerated:::date-
>>>> subseconds%','%timereported:::date-mysql%.%timereported:::
>>>> date-subseconds%','%timereported:::date-tzoffsdirection%%timereported:
>>>> ::date-tzoffshour%','%FROMHOST%','%fromhost-ip%','%
>>>> HOSTNAME%',%syslogfacility%,%syslogpriority%,'%PRI%',%iut%,
>>>> '%inputname%','%syslogtag%','%programname%','%APP-NAME%','%
>>>> PROCID%','%MSGID%','%msg%','%rawmsg%')",SQL
>>>>
>>>> ruleset(name="SecInfra"
>>>>  queue.saveonshutdown="on"
>>>>  queue.type="LinkedList"
>>>>  queue.filename="SecInfra-ruleQ"
>>>>  queue.highwatermark="1000000"
>>>>  queue.size="1310720"
>>>>  queue.dequeuebatchsize="8192"
>>>>  queue.maxdiskspace="200g"
>>>>  queue.maxfilesize="10g"
>>>>  queue.workerthreads="2"
>>>> ) {
>>>>  action(name="SecInfra2ArchiveDB"
>>>>    type="ommysql"
>>>>    server="127.0.0.1"
>>>>    db="SecInfraSyslog"
>>>>    uid="rsyslog"
>>>>    pwd="************"
>>>>    serverport="3306"
>>>>    template="SecInfraSyslogTmpl"
>>>>    queue.saveonshutdown="on"
>>>>    queue.type="LinkedList"
>>>>    queue.filename="SecInfraArchive-dbQ"
>>>>    queue.highwatermark="1000000"
>>>>    queue.size="1310720"
>>>>    queue.dequeuebatchsize="8192"
>>>>    queue.maxdiskspace="200g"
>>>>    queue.maxfilesize="10g"
>>>>    queue.workerthreads="2"
>>>>    action.resumeRetryCount="0"
>>>>    action.reportSuspension="on"
>>>>    action.reportSuspensionContinuation="on"
>>>>  )
>>>> }
>>>>
>>>> input(type="imudp" name="SecInfraUDP" address="************" port="514"
>>>> ruleset="SecInfra")
>>>> input(type="imtcp" name="SecInfraTCP" address="************" port="514"
>>>> ruleset="SecInfra")
>>>>
>>>> Regards,
>>>> Steven Libby
>>>> _______________________________________________
>>>> rsyslog mailing list
>>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>>> http://www.rsyslog.com/professional-services/
>>>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>>>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>>>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>>>> DON'T LIKE THAT.
>>>>
>>>>
>>>>
>>
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to