Re: [rsyslog] [PERFORM] performance for high-volume log insertion (fwd)

Rainer Gerhards Fri, 24 Apr 2009 00:02:03 -0700

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of Rainer Gerhards
> Sent: Friday, April 24, 2009 8:55 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log insertion (fwd)
> 
> 
> > -----Original Message-----
> > From: [email protected] [mailto:rsyslog-
> > [email protected]] On Behalf Of [email protected]
> > Sent: Friday, April 24, 2009 8:45 AM
> > To: rsyslog-users
> > Subject: Re: [rsyslog] [PERFORM] performance for high-volume log insertion (fwd)
> >
> > > So it looks like my three-call (beginBatch, pushData, endBatch) calling
> > > interface can probably work. I need to work on how non-transactional
> > > outputs can convey what they have committed, but the basic interface
> > > looks rather good.
> >
> > Yes, although there is benefit in making these not be separate exec
> > statements, but instead sending them to the database as you go along
> > (I
> 
> Definitely, but I'd consider this an implementation detail. If it is
> worth it, every plugin in question may implement this mode. I'd also
> say it is not too much work, depending on what "too much" means to you ;)
> 
> > don't know the library well enough to know how to do a non-blocking
> > call like this) or crafting one long string and sending it all at once.
> > Even if the pieces are generated by separate write calls on the network
> > filehandle, with a TCP datastream (and a fast sender), the number of
> > round-trips may be far fewer than you think (what you create as
> > separate exec statements
> >
> > My earlier 4-part proposal (start, mid, stop, data) is _slightly_ more
> > flexible in that it has the mid/join variable, allowing for something
> > to appear between data items, but not at the end.
> >
> > e.g.
> >
> > insert into X values (...),(...);
> >
> > Your 3-part version would end up with an extra comma at the end.
> >
> > While this isn't critical, it is an easy way to gain about another
> > factor of 10.
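A minimal sketch of the 4-part string builder described above, in Python. The helper names `build_statement` and `build_statement_triplet` are hypothetical, and the rows are assumed to be already-escaped SQL value lists; the point is only where the separator ends up.

```python
def build_statement(start, mid, stop, rows):
    """4-part builder (hypothetical helper): start + row1 + mid + row2 ... + stop.

    `mid` only ever appears *between* rows, never after the last one.
    """
    return start + mid.join(rows) + stop


def build_statement_triplet(start, stop, rows):
    """Naive 3-part variant (hypothetical helper): every row carries its own
    trailing separator, so one extra separator lands right before `stop`."""
    return start + "".join(row + "," for row in rows) + stop


rows = ["('a')", "('b')", "('c')"]
print(build_statement("insert into X values ", ",", ";", rows))
print(build_statement_triplet("insert into X values ", ";", rows))
```

Joining on `mid` is what avoids the trailing comma; a 3-part interface can only get the same result if the plugin itself remembers whether it is on the first element.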
> 
> I'd draw a subtle line here. I think what you propose is valid and
> right, but it is not something that belongs in the output plugin
> interface.


An additional clarification: you talk about string building, I talk about
callbacks. Both things need to go together, but I think we are talking about
separate entities. I currently think we need a triplet for the callbacks, but
a quadruple for the string builder.

I am not yet convinced that we need to put the string builder (using the
quadruple) into the core.

> 
> Let's use my triplet (beginBatch, pushData, endBatch) for a while. On
> top of that calling interface, the plugin can add strings in its
> configuration (NOT an interface issue!). So it could use the calling
> interface as follows:
> 
> beginBatch:
>    emit start
> 
> pushData:
>    if not first element in batch
>       emit mid
>    emit data
> 
> endBatch:
>    emit stop
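The pseudocode above could look roughly like this as a plugin sketch in Python. `SqlBatchPlugin` and its `send` callback are hypothetical; beginBatch/pushData/endBatch are spelled begin_batch/push_data/end_batch, and start/mid/stop are plain strings taken from the plugin's own configuration rather than from the interface.

```python
class SqlBatchPlugin:
    """Hypothetical output plugin. The core only calls begin_batch,
    push_data and end_batch; start/mid/stop come from plugin config."""

    def __init__(self, start, mid, stop, send):
        self.start, self.mid, self.stop = start, mid, stop
        self.send = send      # callable that ships one finished statement
        self._parts = []
        self._first = True

    def begin_batch(self):
        self._parts = [self.start]          # emit start
        self._first = True

    def push_data(self, data):
        if not self._first:                 # if not first element in batch
            self._parts.append(self.mid)    #    emit mid
        self._parts.append(data)            # emit data
        self._first = False

    def end_batch(self):
        self._parts.append(self.stop)       # emit stop
        self.send("".join(self._parts))


sent = []
plugin = SqlBatchPlugin("insert into X values ", ",", ";", sent.append)
plugin.begin_batch()
for row in ["('a')", "('b')"]:
    plugin.push_data(row)
plugin.end_batch()
print(sent[0])
```

Here the "first element" bookkeeping lives entirely inside the plugin, which is what keeps it out of the core.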
> 
> The question now is whether there should be support in the core engine
> for the
>
>    if not first element in batch
>       add mid
>
> functionality. I am not sure if any plugins other than databases could
> use it. So far, I doubt it (not the file writer, not forwarding, not
> snmp; email? Not sure, but I don't think so). If it is just a db thing,
> it does not belong in the core.
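For comparison, core support for the "if not first element, add mid" step might amount to little more than the core tracking the first element and passing a flag to pushData. This is a hypothetical variant of the interface (the `is_first` parameter, `run_batch` loop and `JoiningPlugin` class are all assumptions), shown only to make the trade-off concrete.

```python
from typing import Iterable, Protocol


class BatchOutput(Protocol):
    """Hypothetical interface variant: the core tells push_data whether the
    message is the first one in the current batch."""

    def begin_batch(self) -> None: ...
    def push_data(self, data: str, is_first: bool) -> None: ...
    def end_batch(self) -> None: ...


def run_batch(plugin: BatchOutput, messages: Iterable[str]) -> None:
    """Core-side loop that tracks 'first element in batch' for the plugin."""
    plugin.begin_batch()
    first = True
    for msg in messages:
        plugin.push_data(msg, is_first=first)
        first = False
    plugin.end_batch()


class JoiningPlugin:
    """A database-style plugin that uses the flag only to place `mid`."""

    def __init__(self, start: str, mid: str, stop: str):
        self.start, self.mid, self.stop = start, mid, stop
        self.buf = ""
        self.statements = []

    def begin_batch(self) -> None:
        self.buf = self.start

    def push_data(self, data: str, is_first: bool) -> None:
        if not is_first:
            self.buf += self.mid
        self.buf += data

    def end_batch(self) -> None:
        self.statements.append(self.buf + self.stop)


plugin = JoiningPlugin("insert into X values ", ",", ";")
run_batch(plugin, ["('a')", "('b')"])
```

The cost is an extra parameter that every plugin must accept, including outputs such as the file writer or forwarding that would simply ignore it.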
> 
> Rainer
> 
> >
> > David Lang
> >
> > > Rainer
> > >
> > >> David Lang
> > >>
> > >>> Feedback is appreciated.
> > >>>
> > >>> Rainer
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: [email protected] [mailto:rsyslog-
> > >>>> [email protected]] On Behalf Of Rainer Gerhards
> > >>>> Sent: Thursday, April 23, 2009 4:38 PM
> > >>>> To: rsyslog-users
> > >>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log insertion (fwd)
> > >>>>
> > >>>> That's interesting. As a side-activity, I am thinking about a new
> > >>>> output module interface. Especially given the discussion on the
> > >>>> postgres list, but also some other thoughts about other modules
> > >>>> (e.g. omtcp or the file output), I tend toward an approach that
> > >>>> permits both string-based and API-based (API as in libpq) ways of
> > >>>> doing things. I have not really designed anything, but the rough
> > >>>> idea is that each plugin needs three entry points:
> > >>>>
> > >>>> - start batch
> > >>>> - process single message
> > >>>> - end batch
> > >>>>
> > >>>> Then, the plugin can decide for itself what it wants to do and
> > >>>> when. Most importantly, this calling interface works well for
> > >>>> string-based transactions as well as API-based ones.
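A rough sketch of the three entry points as an abstract interface, plus one API-based implementation that maps a batch onto one database transaction. The method names are hypothetical renderings of the list above, and Python's built-in sqlite3 module stands in for libpq only so the example runs without a PostgreSQL server; the connection is opened with `isolation_level=None` so the plugin controls BEGIN/COMMIT itself.

```python
import sqlite3
from abc import ABC, abstractmethod


class OutputModule(ABC):
    """The three entry points from the proposal (hypothetical names)."""

    @abstractmethod
    def begin_batch(self) -> None: ...

    @abstractmethod
    def process_message(self, msg: str) -> None: ...

    @abstractmethod
    def end_batch(self) -> None: ...


class SqliteOutput(OutputModule):
    """API-based example: one transaction per batch. sqlite3 is a stand-in
    for libpq; expects a connection opened with isolation_level=None."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")

    def begin_batch(self) -> None:
        self.conn.execute("BEGIN")            # start of batch = start of transaction

    def process_message(self, msg: str) -> None:
        self.conn.execute("INSERT INTO log (msg) VALUES (?)", (msg,))

    def end_batch(self) -> None:
        self.conn.execute("COMMIT")           # end of batch = commit


conn = sqlite3.connect(":memory:", isolation_level=None)
out = SqliteOutput(conn)
out.begin_batch()
for m in ["a", "b", "c"]:
    out.process_message(m)
out.end_batch()
count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
```

A string-based plugin would implement the same three methods but accumulate text instead of issuing calls, which is the flexibility the paragraph above describes.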
> > >>>>
> > >>>> For the output file writer, for example, I envision that over time
> > >>>> it will have its own write buffer (for various reasons; for example,
> > >>>> I am also discussing zipped writing with some folks). With this
> > >>>> interface, I can put everything into the buffer and write it out
> > >>>> only when there is an immediate need, while still making sure that
> > >>>> everything is written out when the "end batch" entry point is called.
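A sketch of the buffered file writer idea, assuming a hypothetical `BufferedFileOutput` class and a simple byte-count threshold as the "immediate need" condition; the real trigger could just as well be a timer or a compressor's block size.

```python
import io


class BufferedFileOutput:
    """Hypothetical file output: messages collect in memory, are written
    when the buffer reaches `limit` bytes, and are always flushed when the
    batch ends."""

    def __init__(self, fileobj, limit=4096):
        self.fileobj = fileobj
        self.limit = limit
        self.buf = bytearray()
        self.writes = 0  # counts real write calls, for illustration only

    def _flush(self):
        if self.buf:
            self.fileobj.write(bytes(self.buf))
            self.fileobj.flush()
            self.buf.clear()
            self.writes += 1

    def begin_batch(self):
        pass  # nothing to prepare for a plain file

    def push_data(self, msg: str):
        self.buf += (msg + "\n").encode("utf-8")
        if len(self.buf) >= self.limit:  # write out only if needed
            self._flush()

    def end_batch(self):
        self._flush()  # guarantee everything is on disk at end of batch


out = io.BytesIO()
w = BufferedFileOutput(out, limit=16)
w.begin_batch()
for m in ["one", "two", "three", "four", "five"]:
    w.push_data(m)
w.end_batch()
```

With a 16-byte limit, the first four messages trigger one write and the remaining one is written at end of batch, so five messages cost two write calls.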
> > >>>>
> > >>>> As I said, it is not really thought out yet, but maybe it is a
> > >>>> starting point. So feedback is appreciated.
> > >>>>
> > >>>> Rainer
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: [email protected] [mailto:rsyslog-
> > >>>>> [email protected]] On Behalf Of [email protected]
> > >>>>> Sent: Wednesday, April 22, 2009 10:11 PM
> > >>>>> To: rsyslog-users
> > >>>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log insertion (fwd)
> > >>>>>
> > >>>>> From the postgres performance mailing list: relative speeds of
> > >>>>> different ways of inserting data.
> > >>>>>
> > >>>>> I've asked whether the 'separate inserts' mode means separate
> > >>>>> round trips or many inserts in one round trip.
> > >>>>>
> > >>>>> Based on this, it looks like prepared statements make a difference,
> > >>>>> but not so much that other techniques (either a single statement or
> > >>>>> a COPY) aren't comparable (or better) options.
> > >>>>>
> > >>>>> David Lang
> > >>>>>
> > >>>>> ---------- Forwarded message ----------
> > >>>>> Date: Wed, 22 Apr 2009 15:33:21 -0400
> > >>>>> From: Glenn Maynard <[email protected]>
> > >>>>> To: [email protected]
> > >>>>> Subject: Re: [PERFORM] performance for high-volume log insertion
> > >>>>>
> > >>>>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <[email protected]> wrote:
> > >>>>>> Yes, as I believe was mentioned already, planning time for inserts
> > >>>>>> is really small.  Parsing time for inserts when there's little
> > >>>>>> parsing that has to happen also isn't all *that* expensive, and the
> > >>>>>> same goes for conversions from textual representations of data to
> > >>>>>> binary.
> > >>>>>>
> > >>>>>> We're starting to re-hash things, in my view.  The low-hanging
> > >>>>>> fruit is doing multiple things in a single transaction, either by
> > >>>>>> using COPY, multi-value INSERTs, or just multiple INSERTs in a
> > >>>>>> single transaction.  That's absolutely step one.
> > >>>>>
> > >>>>> This is all well-known, covered information, but perhaps some
> > >>>>> numbers will help drive this home.  40000 inserts into a
> > >>>>> single-column, unindexed table; with predictable results:
> > >>>>>
> > >>>>> separate inserts, no transaction: 21.21s
> > >>>>> separate inserts, same transaction: 1.89s
> > >>>>> 40 inserts, 100 rows/insert: 0.18s
> > >>>>> one 40000-value insert: 0.16s
> > >>>>> 40 prepared inserts, 100 rows/insert: 0.15s
> > >>>>> COPY (text): 0.10s
> > >>>>> COPY (binary): 0.10s
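To put the figures above side by side, the ratios can be computed directly from Glenn's timings (no new measurements are involved). Going from separate inserts in one transaction to 100-row inserts is roughly the "another factor of 10" mentioned earlier in the thread.

```python
# Timings in seconds for 40000 rows, copied from the benchmark above.
TIMINGS = {
    "separate inserts, no transaction": 21.21,
    "separate inserts, same transaction": 1.89,
    "40 inserts, 100 rows/insert": 0.18,
    "one 40000-value insert": 0.16,
    "40 prepared inserts, 100 rows/insert": 0.15,
    "COPY (text)": 0.10,
    "COPY (binary)": 0.10,
}
ROWS = 40000

baseline = TIMINGS["separate inserts, no transaction"]
speedup = {name: baseline / secs for name, secs in TIMINGS.items()}
rows_per_sec = {name: ROWS / secs for name, secs in TIMINGS.items()}

for name, secs in TIMINGS.items():
    print(f"{name:38} {secs:6.2f}s {speedup[name]:7.1f}x "
          f"{rows_per_sec[name]:9.0f} rows/s")
```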
> > >>>>>
> > >>>>> Of course, real workloads will change the weights, but this is more
> > >>>>> or less the magnitude of difference I always see: batch your inserts
> > >>>>> into single statements, and if that's not enough, skip to COPY.
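Since COPY is the end point of that advice, here is a sketch of producing PostgreSQL's COPY text format (tab-separated columns, newline-terminated rows, backslash escapes, `\N` for NULL) from a batch of rows. It escapes backslash, tab, newline and carriage return, which are the characters that would otherwise be read as escapes or delimiters with the default tab delimiter; the function names are hypothetical. The resulting string could then be streamed to the server, for example with psycopg2's `cursor.copy_expert("COPY x FROM STDIN", io.StringIO(payload))`.

```python
def copy_text_field(value):
    """Encode one column value for COPY text format (hypothetical helper)."""
    if value is None:
        return r"\N"                      # default NULL marker
    s = str(value)
    return (s.replace("\\", "\\\\")      # backslash first, so later escapes survive
             .replace("\t", "\\t")       # column delimiter
             .replace("\n", "\\n")       # row delimiter
             .replace("\r", "\\r"))


def copy_text_rows(rows):
    """Tab-join each row's fields and terminate every row with a newline."""
    return "".join(
        "\t".join(copy_text_field(v) for v in row) + "\n" for row in rows
    )


payload = copy_text_rows([("host1", "msg one"), ("host2", None)])
```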
> > >>>>>
> > >>>>> --
> > >>>>> Glenn Maynard
> > >>>>>
> > >>>>> --
> > >>>>> Sent via pgsql-performance mailing list (pgsql-[email protected])
> > >>>>> To make changes to your subscription:
> > >>>>> http://www.postgresql.org/mailpref/pgsql-performance
> > >>>> _______________________________________________
> > >>>> rsyslog mailing list
> > >>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
> > >>>> http://www.rsyslog.com
