On Wed, 8 Nov 2017, Steven Libby wrote:

I can see where it is deleting from the database action queue in chunks of
10,000 now:

6702.235489189:SecInfra2ArchiveDB queue:Reg/w0: DeleteProcessedBatch: we
deleted 10000 objects and enqueued 0 objects

good

I assume that means they are being inserted in chunks, too.

That's what I'm trying to find out. Not every module sends things out in chunks, and some have other limits that they hit, especially with large chunks.

 But watching
the processlist in mysql, it appears that they are still being inserted
one-at-a-time, albeit in a transaction, so maybe that's what is meant by
"batch" processing?

how many messages in a transaction? it should be large. Is there any chance that MySQL has trouble with that many messages in a transaction?

David Lang

I see lots of entries like the following, but these appear to originate
from the main queue, not the db action queue.  Is that a correct
interpretation?

...
6702.597844624:main Q:Reg/w0  : SecInfra2ArchiveDB queue: EnqueueMsg
advised worker start
6702.597854174:main Q:Reg/w0  : action 'SecInfra2ArchiveDB': set suspended
state to 0
6702.597861111:main Q:Reg/w0  : processBATCH: next msg 649: <14>1
2017-11-07T15:31:42.255-05:00 JF-ITE-PM2-00C RT_FLOW -
RT_FLOW_SESSION_CREATE [[email protected] source-address="10.24
6702.597867433:main Q:Reg/w0  :
6702.597880487:SecInfra2ArchiveDB queue:Reg/w0: dnscache: entry
0x7febc4017ed0 found
6702.597894312:main Q:Reg/w0  : ACTION 0 [ommysql:action(type="ommysql"
...)]
6702.597907683:main Q:Reg/w0  : executing action 0
6702.597919469:main Q:Reg/w0  : action 'SecInfra2ArchiveDB': called,
logging to ommysql (susp 0/0, direct q 0)
...

Thanks,
Steven Libby

On Wed, Nov 8, 2017 at 4:32 PM, David Lang <[email protected]> wrote:

are the inserts into MySQL happening in large chunks now?

if the queue is filling, it should be dequeueing the messages in large
batches and submitting them to MySQL in batches, if the batches to MySQL
are too small, you could have trouble keeping up.

look in the debug logs for how many message are processed between calls to
insert them into MySQL.

David Lang

 On Wed, 8 Nov 2017, Steven Libby wrote:

Date: Wed, 8 Nov 2017 16:26:08 -0500
From: Steven Libby <[email protected]>
To: David Lang <[email protected]>
Cc: Steven Libby via rsyslog <[email protected]>
Subject: Re: [rsyslog] ommysql database queue bottleneck


I've converted the legacy $MainQueue% options over to the new RainerScript
style and confirmed the queues (main and action) are dequeueing in the
proper designated chunk sizes now.

I've seen nothing else in the rsyslog debug log which would indicate a
problem, but then again I might be missing something.  Any help or
guidance
on how to properly debug this bottleneck would be greatly appreciated.

Thanks,
Steven Libby

On Tue, Nov 7, 2017 at 2:06 PM, Steven Libby <[email protected]>
wrote:

We had configured large batches (8192) for the main queue, action queue,
and udp input module.  However, after a brief debugging session, it
appears
that at least two of these settings are not being honored...

The database action queue seems to be deleting maximum batch sizes of 8
at
a time:

7968.410369112:SecInfra2ArchiveDB queue[DA]:Reg/w0: regular consumer
finished, iret=0, szlog 235270 sz phys 235278
7968.410545345:SecInfra2ArchiveDB queue[DA]:Reg/w0:
DeleteProcessedBatch:
we deleted 8 objects and enqueued 0 objects
7968.410554591:SecInfra2ArchiveDB queue[DA]:Reg/w0: doDeleteBatch:
delete
batch from store, new sizes: log 235270, phys 235270

And the main queue appears to be dequeueing in maximum chunks of 256:

7963.597767589:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 25 of 25 messages]
7963.616701318:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 38 of 38 messages]
7963.669420594:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 224 of 224 messages]
7963.731337893:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
7963.800962451:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
7963.857743064:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
7963.904639838:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
7963.985528145:main Q:Reg/w0  : END batch execution phase, entering to
commit phase [processed 256 of 256 messages]
...

I'm starting to think some of the options in our config file are
incorrect
or no longer supported...?

Also, is there any specific keywords I should be looking for in the debug
log?  Or any specific debug options I should be turning on (i.e.
LogFuncFlow, PrintMutexAction, OutputTidToStderr, etc...)?


On Mon, Nov 6, 2017 at 4:54 PM, David Lang <[email protected]> wrote:

On Mon, 6 Nov 2017, Steven Libby wrote:

I've tried removing the ruleset altogether and just using an action to

minimize locking overhead (which I assume would be the same thing
you're
referring to here).  But just to make sure, I also tried changing the
ruleset queue type to "Direct" and got the same result.


yes

Also, most of the time during my testing I used resumeRetryCount="-1"
but

mistakenly left it at "0" when I posted to this mailing list.
Reverting
back to "-1" still results in a backlog in the database action queue.
When
you say "...some insert attempts...are not being accepted by mysql...",
are
you implying that some insert attempts are actually being rejected by
mysql?


that was one possibility when I thought you were not retrying logs.

 I have enabled the general query log with the highest level of

verbosity, and also installed the audit plugin, but I am unable to see
any
indication that any insert attempts are being rejected.  I have yet to
enable the DBUG package yet for further investigation, but will try
that
next to see if I can glean any new information.


are logs getting inserted one at a time or in large batches? try to
increase batching, get a debug log for a bit on the rsyslog side and
see if
it's having to make multiple attempts to insert messages (for example,
you
try to deliver up to 8K messages at once, rsyslog could be rejecting
this
because the total request size is too large and rsyslog then tries to
deliver 4K, 2K, 1K... messages until it gets small enough to be
accepted)

David Lang


On a side note, I created a separate test table using the MyISAM storage

engine and the insert rate was improved, but still filled up the
database
action queue within several minutes.  I've reduced the durability of
TokuDB
by disabling tokudb_commit_sync and tokudb_fsync_log_period global
variables, but there was zero noticeable performance difference here.
(Background: we use TokuDB for it's compression and secondary
clustering
indexes, mainly).  I would expect to see excessive CPU utilization or
Disk
I/O if the database was struggling to keep up with the rsyslog queue,
but
perhaps that expectation is incorrect?

Regards,
Steven

On Sat, Nov 4, 2017 at 5:03 PM, David Lang <[email protected]> wrote:

don't put a queue on the ruleset and the only item in the ruleset


when you have to queues like this, you end up with lots of locking
overhead and poor batch sizing.

with retrycount=0 you never attempt to re-deliver a message that
fails,
if
you set it to -1 it will retry forever

try setting it to -1 and see if logs are getting stuck (delivery
stops),
if so, you are getting some insert attempts that are not being
accepted
by
mysql and since you have retries disabled, the entire batch is being
thrown
away.


On Fri, 3 Nov 2017, Steven Libby via rsyslog wrote:

Date: Fri, 3 Nov 2017 14:53:17 -0400

From: Steven Libby via rsyslog <[email protected]>
To: [email protected]
Cc: Steven Libby <[email protected]>
Subject: [rsyslog] ommysql database queue bottleneck


We are running rsyslog 8.21.0 on a RHEL 6.7 box with 256GB of RAM and
96
cores, writing to a MySQL instance (Percona Server 5.6.28-76) using
TokuDB
as the storage engine.  This is our centralized log server, which
receives
messages from numerous network devices at an average rate of 7k
messages
per second.  This message rate is fairly consistent throughout the
course
of the day.

We noticed records missing from the database when comparing with the
raw
records being written directly to a file on another server.  Based on
the
investigation so far, it appears the lost messages are being
discarded
due
to queues filling up.

After enabling the impstats module, we discovered that the bottleneck
appears to be in the database queue; but we're having trouble
determining
the exact cause of the bottleneck.  Database resource utilization is
relatively low (CPU utilization < 4%, Disk I/O < 1%, 5+GB free RAM)
and
we've tuned the database configuration to remove any unnecessary
overhead.
We've made numerous configuration changes, both on the DB side and in
rsyslogd, and the queue seems to build at roughly the same rate
regardless
of of everything we've tried.

We are successfully writing 3k-4k records per-second into the DB, but
the
action queue is also building at about that same rate.

Here is the latest rsyslog configuration we've tried (below) cleaned
up
for
the purpose of this message.  Most of the incoming messages arrive
via
imudp.  We've tried adjusting the number of threads in that module,
as
well
as the batchSize, and the number of messages coming in stays about
the
same, regardless.  We've also tried different combinations of MainMsg
queue
worker threads and batch sizes, as well as similar adjustments in the
ruleset and action configurations.  We've gone through the changelogs
up
to
rsyslog 8.31.0 and didn't see anything to indicate that upgrading
would
help with the performance.  We've also gone through most of the
online
documentation at rsyslog.com/doc/v8-stable/ and read some articles
by
David
Long regarding performance improvements and the work he did with
Intuit.

At this point, any additional help or guidance from the professionals
would
be greatly appreciated.  Thanks for taking the time to read this...


#rsyslog v8.x config file
$ActionFileDefaultTemplate RSYSLOG_TraditionalFileFormat
$RepeatedMsgContainsOriginalMsg on
$OptimizeForUniprocessor off
$PreserveFQDN on
$WorkDirectory /HSD1/syslogQ

$MainMsgQueueFileName mainq-SecInfra
$MainMsgQueueMaxDiskSpace 200g
$MainMsgQueueMaxFileSize 1g
$MainMsgQueueSaveOnShutdown on
$MainMsgQueueType LinkedList
$MainMsgQueueSize 10485760
$MainMsgQueueHighWatermark 10000000
$MainMsgQueueWorkerThreads 8
$MainMsgDequeueBatchSize 8192

module(load="imudp" threads="8" batchSize="8192" timeRequery="10")
module(load="imptcp" threads="8" processOnPoller="off" )
module(load="imuxsock")
module(load="imklog")
module(load="immark")
module(load="ommysql")
module(load="impstats" interval="5" resetCounters="off"
format="legacy"
log.syslog="off" log.file="/HSD1/locallog/pstat
s-SecInfra_2017-11-03.log")
module(load="mmutf8fix")

$template SecInfraSyslogTmpl,"insert into SecInfraSyslog.syslogRaw
(recievedAt,deviceReportedTime,deviceTimezone,fromHost,
fromHostIpv4,hostName,facility,priority,localPriority,infoUnitId,
inputName,syslogTag,programName,appName,procId,msgId,message
,rawMessage)
values ('%timegenerated:::date-mysql%.%timegenerated:::date-
subseconds%','%timereported:::date-mysql%.%timereported:::
date-subseconds%','%timereported:::date-tzoffsdirection%%
timereported:
::date-tzoffshour%','%FROMHOST%','%fromhost-ip%','%
HOSTNAME%',%syslogfacility%,%syslogpriority%,'%PRI%',%iut%,
'%inputname%','%syslogtag%','%programname%','%APP-NAME%','%
PROCID%','%MSGID%','%msg%','%rawmsg%')",SQL

ruleset(name="SecInfra"
 queue.saveonshutdown="on"
 queue.type="LinkedList"
 queue.filename="SecInfra-ruleQ"
 queue.highwatermark="1000000"
 queue.size="1310720"
 queue.dequeuebatchsize="8192"
 queue.maxdiskspace="200g"
 queue.maxfilesize="10g"
 queue.workerthreads="2"
) {
 action(name="SecInfra2ArchiveDB"
   type="ommysql"
   server="127.0.0.1"
   db="SecInfraSyslog"
   uid="rsyslog"
   pwd="************"
   serverport="3306"
   template="SecInfraSyslogTmpl"
   queue.saveonshutdown="on"
   queue.type="LinkedList"
   queue.filename="SecInfraArchive-dbQ"
   queue.highwatermark="1000000"
   queue.size="1310720"
   queue.dequeuebatchsize="8192"
   queue.maxdiskspace="200g"
   queue.maxfilesize="10g"
   queue.workerthreads="2"
   action.resumeRetryCount="0"
   action.reportSuspension="on"
   action.reportSuspensionContinuation="on"
 )
}

input(type="imudp" name="SecInfraUDP" address="************"
port="514"
ruleset="SecInfra")
input(type="imtcp" name="SecInfraTCP" address="************"
port="514"
ruleset="SecInfra")

Regards,
Steven Libby
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a
myriad
of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if
you
DON'T LIKE THAT.








_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.

Reply via email to