On Wed, 22 Apr 2009, Rainer Gerhards wrote: > Hi Ken, > > glad to have you here. I am a bit silent at the moment, because I am not a > real database guy and so I am primarily listening to any information that is > incoming. If you have a couple of minutes, it would be useful to review this > thread here: > > http://lists.adiscon.net/pipermail/rsyslog/2009-April/002003.html > > ...one comment inline below... > >> -----Original Message----- >> From: [email protected] [mailto:rsyslog- >> [email protected]] On Behalf Of Kenneth Marshall >> >> David, >> >> Okay, I am now subscribed to the mailing list. We are currently using >> rsyslog-3.20.x. As far as implementing a prototype of the prepared >> statement, I was sidetracked by other duties and have not had a chance >> to do anything but an initial evaluation. As far as the rsyslog >> internal >> escaping, it looked simplest to create another template type like the >> current SQL and STDSQL that indicated that escaping was not needed >> and/or that prepared statements should be used. > > To disable escaping, simply do not use SQL or STDSQL. However, the db outputs > currently require this option (easy to disable), because I cannot see how it > will work (with the existing code) without escaping. > > Any idea is most welcome.
when using prepared statement escaping is not needed. according to Ken's message below he found that the overhead of doing the escaping was significant. I don't see why this should be the case, but if it requires making an extra copy of the string I guess it's possible I plan to get a setup togeather in the next couple of days that will let me do some testing of the options on the database side. David Lang > Rainer >> >> Regards, >> Ken >> >> On Tue, Apr 21, 2009 at 08:51:37AM -0700, [email protected] wrote: >>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: >>> >>>> On Tue, Apr 21, 2009 at 08:37:54AM -0700, [email protected] wrote: >>>>> Kenneth, >>>>> could you join the discussion on the rsyslog mailing list? >>>>> rsyslog-users <[email protected]> >>>>> >>>>> I'm surprised to hear you say that rsyslog can already do batch >> inserts >>>>> and >>>>> am interested in how you did that. >>>>> >>>>> what sort of insert rate did you mange to get? >>>>> >>>>> David Lang >>>>> >>>> David, >>>> >>>> I would be happy to join the discussion. I did not mean to say >>>> that rsyslog currently supported batch inserts, just that the >>>> pieces that provide "stand-by queuing" could be used to manage >>>> batching inserts. >>> >>> I've changed the to list to the rsyslog users list. >>> >>> currently the stand-by queuing still handles messages one at a time. >>> however a sponser has been found to pay to changing the rsyslog >> internals >>> to allow for multiple messages to be handled at once, which is what >>> triggered some of this discussion. >>> >>> which version of rsyslog are you working with? >>> >>> when you modified rsyslog to do prepared statement (to avoid the >> escaping >>> and parsing) did you hard-code the prepared statement? what other >> changes >>> did you make? >>> >>> David Lang >>> >>>> Cheers, >>>> Ken >>>> >>>>> On Tue, 21 Apr 2009, Kenneth Marshall wrote: >>>>> >>>>>> Date: Tue, 21 Apr 2009 08:33:30 -0500 >>>>>> From: Kenneth Marshall <[email protected]> >>>>>> To: Richard Huxton <[email protected]> >>>>>> Cc: [email protected], Stephen Frost <[email protected]>, >>>>>> Greg Smith <[email protected]>, pgsql- >> [email protected] >>>>>> Subject: Re: [PERFORM] performance for high-volume log insertion >>>>>> Hi, >>>>>> >>>>>> I just finished reading this thread. We are currently working on >>>>>> setting up a central log system using rsyslog and PostgreSQL. It >>>>>> works well once we patched the memory leak. We also looked at what >>>>>> could be done to improve the efficiency of the DB interface. On >> the >>>>>> rsyslog side, moving to prepared queries allows you to remove the >>>>>> escaping that needs to be done currently before attempting to >>>>>> insert the data into the SQL backend as well as removing the >> parsing >>>>>> and planning time from the insert. This is a big win for high >> insert >>>>>> rates, which is what we are talking about. The escaping process is >>>>>> also a big CPU user in rsyslog which then hands the escaped string >>>>>> to the backend which then has to undo everything that had been >> done >>>>>> and parse/plan the resulting query. This can use a surprising >> amount >>>>>> of additional CPU. Even if you cannot support a general prepared >>>>>> query interface, by specifying what the query should look like you >>>>>> can handle much of the low-hanging fruit query-wise. >>>>>> >>>>>> We are currently using a date based trigger to use a new partition >>>>>> each day and keep 2 months of logs currently. This can be usefully >>>>>> managed on the backend database, but if rsyslog supported changing >>>>>> the insert to the new table on a time basis, the CPU used by the >>>>>> trigger to support this on the backend could be reclaimed. This >>>>>> would be a win for any DB backend. As you move to the new >> partition, >>>>>> issuing a truncate to clear the table would simplify the DB >> interfaces. >>>>>> >>>>>> Another performance enhancement already mentioned, would be to >>>>>> allow certain extra fields in the DB to be automatically populated >>>>>> as a function of the log messages. For example, logging the mail >> queue >>>>>> id for messages from mail systems would make it much easier to >> locate >>>>>> particular mail transactions in large amounts of data. >>>>>> >>>>>> To sum up, eliminating the escaping in rsyslog through the use of >>>>>> prepared queries would reduce the CPU load on the DB backend. >> Batching >>>>>> the inserts will also net you a big performance increase. Some DB- >> based >>>>>> applications allow for the specification of several types of >> queries, >>>>>> one for single inserts and then a second to support multiple >> inserts >>>>>> (copy). Rsyslog already supports the queuing pieces to allow you >> to >>>>>> batch inserts. Just some ideas. >>>>>> >>>>>> Regards, >>>>>> Ken >>>>>> >>>>>> >>>>>> On Tue, Apr 21, 2009 at 09:56:23AM +0100, Richard Huxton wrote: >>>>>>> [email protected] wrote: >>>>>>>> On Tue, 21 Apr 2009, Stephen Frost wrote: >>>>>>>>> * [email protected] ([email protected]) wrote: >>>>>>>>>> while I fully understand the 'benchmark your situation' need, >> this >>>>>>>>>> isn't >>>>>>>>>> that simple. >>>>>>>>> >>>>>>>>> It really is. You know your application, you know it's primary >> use >>>>>>>>> cases, and probably have some data to play with. You're >> certainly in >>>>>>>>> a >>>>>>>>> much better situation to at least *try* and benchmark it than >> we are. >>>>>>>> rsyslog is a syslog server. it replaces (or for debian and >> fedora, has >>>>>>>> replaced) your standard syslog daemon. it recieves log messages >> from >>>>>>>> every >>>>>>>> app on your system (and possibly others), filters, maniulates >> them, >>>>>>>> and >>>>>>>> then stores them somewhere. among the places that it can store >> the >>>>>>>> logs >>>>>>>> are database servers (native support for MySQL, PostgreSQL, and >>>>>>>> Oracle. >>>>>>>> plus libdbi for others) >>>>>>> >>>>>>> Well, from a performance standpoint the obvious things to do are: >>>>>>> 1. Keep a connection open, do NOT reconnect for each log- >> statement >>>>>>> 2. Batch log statements together where possible >>>>>>> 3. Use prepared statements >>>>>>> 4. Partition the tables by day/week/month/year (configurable I >> suppose) >>>>>>> >>>>>>> The first two are vital, the third takes you a step further. The >> fourth >>>>>>> is >>>>>>> a long-term admin thing. >>>>>>> >>>>>>> And possibly >>>>>>> 5. Have two connections, one for fatal/error etc and one for >> info/debug >>>>>>> level log statements (configurable split?). Then you can use the >>>>>>> synchronous_commit setting on the less important ones. Might buy >> you >>>>>>> some >>>>>>> performance on a busy system. >>>>>>> >>>>>>> http://www.postgresql.org/docs/8.3/interactive/runtime-config- >> wal.html#RUNTIME-CONFIG-WAL-SETTINGS >>>>>>> >>>>>>>> other apps then search and report on the data after it is >> stored. what >>>>>>>> apps?, I don't know either. pick your favorite reporting tool >> and >>>>>>>> you'll >>>>>>>> be a step ahead of me (I don't know a really good reporting >> tool) >>>>>>>> as for sample data, you have syslog messages, just like I do. so >> you >>>>>>>> have >>>>>>>> the same access to data that I have. >>>>>>>> how would you want to query them? how would people far less >>>>>>>> experianced >>>>>>>> that you want to query them? >>>>>>>> I can speculate that some people would do two columns (time, >>>>>>>> everything >>>>>>>> else), others will do three (time, server, everything else), and >>>>>>>> others >>>>>>>> will go further (I know some who would like to extract IP >> addresses >>>>>>>> embedded in a message into their own column). some people will >> index >>>>>>>> on >>>>>>>> the time and host, others will want to do full-text searches of >>>>>>>> everything. >>>>>>> >>>>>>> Well, assuming it looks much like traditional syslog, I would do >>>>>>> something >>>>>>> like: (timestamp, host, facility, priority, message). It's easy >> enough >>>>>>> to >>>>>>> stitch back together if people want that. >>>>>>> >>>>>>> PostgreSQL's full-text indexing is quite well suited to logfiles >> I'd >>>>>>> have >>>>>>> thought, since it knows about filenames, urls etc already. >>>>>>> >>>>>>> If you want to get fancy, add a msg_type column and one >> subsidiary >>>>>>> table >>>>>>> for each msg_type. So - you might have smtp_connect_from >> (hostname, >>>>>>> ip_addr). A set of perl regexps can match and extract the fields >> for >>>>>>> these >>>>>>> extra tables, or you could do it with triggers inside the >> database. I >>>>>>> think >>>>>>> it makes sense to do it in the application. Easier for users to >>>>>>> contribute >>>>>>> new patterns/extractions. Meanwhile, the core table is untouched >> so you >>>>>>> don't *need* to know about these extra tables. >>>>>>> >>>>>>> If you have subsidiary tables, you'll want to partition those too >> and >>>>>>> perhaps stick them in their own schema (logs200901, logs200902 >> etc). >>>>>>> >>>>>>> -- >>>>>>> Richard Huxton >>>>>>> Archonet Ltd >>>>>>> >>>>>>> -- >>>>>>> Sent via pgsql-performance mailing list >>>>>>> ([email protected]) >>>>>>> To make changes to your subscription: >>>>>>> http://www.postgresql.org/mailpref/pgsql-performance >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Sent via pgsql-performance mailing list >>>>> ([email protected]) >>>>> To make changes to your subscription: >>>>> http://www.postgresql.org/mailpref/pgsql-performance >>>>> >>>> >>> >> _______________________________________________ >> rsyslog mailing list >> http://lists.adiscon.net/mailman/listinfo/rsyslog >> http://www.rsyslog.com > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com > _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

