Re: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd)

Rainer Gerhards Thu, 23 Apr 2009 23:54:39 -0700

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Friday, April 24, 2009 8:45 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
> loginsertion(fwd)
> > So it looks like my three-call (beginBatch, pushData, endBatch) calling
> > interface can probably work. I need to work on how non-transactional
> > outputs can convey what they have committed, but the basic interface
> > looks rather good.
> 
> yes, although there is benefit in making these not be separate exec
> statements but instead sending them to the database as you go along (I

Definitely, but I'd consider this an implementation detail. If it is worth
it, every plugin in question may implement this mode. I'd also say it is not
too much work, depending on what "too much" means to you ;)

> don't know the library well enough to know how to do a non-blocking call
> like this) or crafting one long string and sending it all at once. even if
> the pieces are generated by separate write calls on the network
> filehandle, with a TCP datastream (and a fast sender), the number of
> round-trips may be far fewer than you think (what you create as separate
> exec statements
> 
> my earlier 4-part proposal (start, mid, stop, data) is _slightly_ more
> flexible in that it has the mid/join variable, allowing for something to
> appear between points of data, but not at the end.
> 
> i.e.
> 
> insert into table X values (),();
> 
> your 3-part version would end up with an extra , at the end.
> 
> while this isn't critical it is an easy way to gain about another factor
> of 10
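
To put a number on that "factor of 10": the small sketch below simply divides
the timings from Glenn Maynard's benchmark, quoted further down in this
message. The figures are copied from his post; the dictionary layout and the
speedup() helper are only illustrative, and reading David's factor of 10 as the
step from separate inserts in one transaction to multi-row inserts is an
assumption on my part.

```python
# Timings in seconds for 40000 inserts, copied from the quoted benchmark.
timings = {
    "separate inserts, no transaction": 21.21,
    "separate inserts, same transaction": 1.89,
    "40 inserts, 100 rows/insert": 0.18,
    "one 40000-value insert": 0.16,
    "40 prepared inserts, 100 rows/insert": 0.15,
    "COPY (text)": 0.10,
    "COPY (binary)": 0.10,
}


def speedup(slow, fast):
    """Return how many times faster `fast` is than `slow` (illustrative helper)."""
    return timings[slow] / timings[fast]


# Gain from wrapping the separate inserts in one transaction.
txn_gain = speedup("separate inserts, no transaction",
                   "separate inserts, same transaction")

# Further gain from batching 100 rows into each INSERT statement.
batch_gain = speedup("separate inserts, same transaction",
                     "40 inserts, 100 rows/insert")

print(f"single transaction: {txn_gain:.1f}x")
print(f"multi-row inserts:  {batch_gain:.1f}x")
```

Both steps come out at roughly an order of magnitude each on this particular
test (single-column, unindexed table); real workloads will shift the ratios.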

I'd draw a subtle line here. I think what you propose is valid and right, but
it is not something that belongs in the output plugin interface.

Let's use my triplet (beginBatch, pushData, endBatch) for a while. On top of
that calling interface, the plugin can add strings in its configuration (NOT
an interface issue!). So it could use the calling interface as follows:

beginBatch:
   emit start

pushData:
   if not first element in batch
      emit mid
   emit data

endBatch:
   emit stop

The question now is if there should be support in the core engine for the

   if not first element in batch
      emit mid

functionality. I am not sure whether any plugins other than databases could
use it. So far, I doubt it (the file writer no, forwarding no, snmp no,
email? Not sure, but I don't think so). If it is just a db thing, it does
not belong in the core.
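
To make the start/mid/stop idea concrete, here is a minimal sketch in Python
(not rsyslog code, and not a proposed API). The class name SqlBatchWriter, the
method names and the sample strings are all hypothetical; the point is only
that the "mid" handling can live entirely inside the plugin, on top of the
three-call interface.

```python
class SqlBatchWriter:
    """Hypothetical output plugin that builds one statement per batch."""

    def __init__(self, start, mid, stop):
        # start/mid/stop would come from the plugin's own configuration.
        self.start = start
        self.mid = mid
        self.stop = stop
        self.parts = []
        self.count = 0

    def begin_batch(self):
        # beginBatch: emit start
        self.parts = [self.start]
        self.count = 0

    def push_data(self, data):
        # pushData: emit mid before every element except the first
        if self.count > 0:
            self.parts.append(self.mid)
        self.parts.append(data)
        self.count += 1

    def end_batch(self):
        # endBatch: emit stop and hand back the assembled string
        self.parts.append(self.stop)
        return "".join(self.parts)


writer = SqlBatchWriter("insert into X values ", ",", ";")
writer.begin_batch()
for row in ("('a')", "('b')", "('c')"):
    writer.push_data(row)
statement = writer.end_batch()
print(statement)
```

With three elements this yields a single statement with no trailing comma. The
rows are pre-formatted literals purely for illustration; a real database plugin
would have to escape or parameterize the values.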

Rainer

> 
> David Lang
> 
> > Rainer
> >
> >> David Lang
> >>
> >>> Feedback is appreciated.
> >>>
> >>> Rainer
> >>>
> >>>> -----Original Message-----
> >>>> From: [email protected] [mailto:rsyslog-
> >>>> [email protected]] On Behalf Of Rainer Gerhards
> >>>> Sent: Thursday, April 23, 2009 4:38 PM
> >>>> To: rsyslog-users
> >>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
> >>>> loginsertion(fwd)
> >>>>
> >>>> That's interesting. As a side-activity, I am thinking about a new output
> >>>> module interface. Especially given the discussion on the postgres list,
> >>>> but also some other thoughts about other modules (e.g. omtcp or the file
> >>>> output), I tend to use an approach that permits both string-based as well
> >>>> as API-based (API as in libpq) ways of doing things. I have not really
> >>>> designed anything, but the rough idea is that each plugin needs three
> >>>> entry points:
> >>>>
> >>>> - start batch
> >>>> - process single message
> >>>> - end batch
> >>>>
> >>>> Then, the plugin can decide itself what it wants to do and when. Most
> >>>> importantly, this calling interface works well for string-based
> >>>> transactions as well as API-based ones.
> >>>>
> >>>> For the output file writer, for example, I envision that over time it
> >>>> will have its own write buffer (for various reasons, for example I am also
> >>>> discussing zipped writing with some folks). With this interface, I can put
> >>>> everything into the buffer, write out if needed but not if there is no
> >>>> immediate need, but I can make sure that I write out when the "end batch"
> >>>> entry point is called.
> >>>>
> >>>> As I said, it is not really thought out yet, but maybe a starting
> >>>> point. So feedback is appreciated.
> >>>>
> >>>> Rainer
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: [email protected] [mailto:rsyslog-
> >>>>> [email protected]] On Behalf Of [email protected]
> >>>>> Sent: Wednesday, April 22, 2009 10:11 PM
> >>>>> To: rsyslog-users
> >>>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log
> >>>>> insertion(fwd)
> >>>>>
> >>>>> from the postgres performance mailing list, relative speeds of different
> >>>>> ways of inserting data.
> >>>>>
> >>>>> I've asked if the 'separate inserts' mode is separate round trips or many
> >>>>> inserts in one round trip.
> >>>>>
> >>>>> based on this it looks like prepared statements make a difference, but not
> >>>>> so much that other techniques (either a single statement or a copy) aren't
> >>>>> comparable (or better) options.
> >>>>>
> >>>>> David Lang
> >>>>>
> >>>>> ---------- Forwarded message ----------
> >>>>> Date: Wed, 22 Apr 2009 15:33:21 -0400
> >>>>> From: Glenn Maynard <[email protected]>
> >>>>> To: [email protected]
> >>>>> Subject: Re: [PERFORM] performance for high-volume log insertion
> >>>>>
> >>>>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <[email protected]> wrote:
> >>>>>> Yes, as I believe was mentioned already, planning time for inserts is
> >>>>>> really small.  Parsing time for inserts when there's little parsing that
> >>>>>> has to happen also isn't all *that* expensive and the same goes for
> >>>>>> conversions from textual representations of data to binary.
> >>>>>>
> >>>>>> We're starting to re-hash things, in my view.  The low-hanging fruit is
> >>>>>> doing multiple things in a single transaction, either by using COPY,
> >>>>>> multi-value INSERTs, or just multiple INSERTs in a single transaction.
> >>>>>> That's absolutely step one.
> >>>>>
> >>>>> This is all well-known, covered information, but perhaps some numbers
> >>>>> will help drive this home.  40000 inserts into a single-column,
> >>>>> unindexed table; with predictable results:
> >>>>>
> >>>>> separate inserts, no transaction: 21.21s
> >>>>> separate inserts, same transaction: 1.89s
> >>>>> 40 inserts, 100 rows/insert: 0.18s
> >>>>> one 40000-value insert: 0.16s
> >>>>> 40 prepared inserts, 100 rows/insert: 0.15s
> >>>>> COPY (text): 0.10s
> >>>>> COPY (binary): 0.10s
> >>>>>
> >>>>> Of course, real workloads will change the weights, but this is more or
> >>>>> less the magnitude of difference I always see--batch your inserts into
> >>>>> single statements, and if that's not enough, skip to COPY.
> >>>>>
> >>>>> --
> >>>>> Glenn Maynard
> >>>>>
> >>>>> --
> >>>>> Sent via pgsql-performance mailing list (pgsql-
> >>>>> [email protected])
> >>>>> To make changes to your subscription:
> >>>>> http://www.postgresql.org/mailpref/pgsql-performance
> >>>> _______________________________________________
> >>>> rsyslog mailing list
> >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >>>> http://www.rsyslog.com
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
