> -----Original Message-----
> From: [email protected] [mailto:rsyslog-[email protected]] On Behalf Of Rainer Gerhards
> Sent: Friday, April 24, 2009 8:55 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] [PERFORM] performance for high-volumeloginsertion(fwd)
>
> > -----Original Message-----
> > From: [email protected] [mailto:rsyslog-[email protected]] On Behalf Of [email protected]
> > Sent: Friday, April 24, 2009 8:45 AM
> > To: rsyslog-users
> > Subject: Re: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd)
> >
> > > So it looks like my three-call (beginBatch, pushData, EndBatch) calling interface
> > > can probably work. I need to work on how non-transactional outputs can convey
> > > what they have committed, but the basic interface looks rather good.
> >
> > yes, although there is benefit in making these not be separate exec
> > statements but instead sending them to the database as you go along (I
>
> Definitely, but I'd consider this an implementation detail. If it is worth
> it, every plugin in question may implement this mode. I'd also say it is not
> too much work, depending on what "too much" means to you ;)
>
> > don't know the library well enough to know how to do a non-blocking call
> > like this) or crafting one long string and sending it all at once. even if
> > the pieces are generated by separate write calls on the network
> > filehandle, with a TCP datastream (and a fast sender), the number of
> > round-trips may be far fewer than you think (what you create as separate
> > exec statements
> >
> > my earlier 4-part proposal (start, mid, stop, data) is _slightly_ more
> > flexible in that it has the mid/join variable, allowing for something to
> > appear between points of data, but not at the end.
> >
> > i.e.
> >
> > insert into table X values (),();
> >
> > your 3-part version would end up with an extra , at the end.
> >
> > while this isn't critical it is an easy way to gain about another factor
> > of 10
>
> I'd draw a subtle line here. I think what you propose is valid and right, but
> it is not something that belongs in the output plugin interface.
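To make the trailing-separator point concrete, here is a minimal Python sketch of the two string-building variants discussed above. The function names and the sample rows are illustrative assumptions, not rsyslog code; the table name X is taken from the example in the quoted text.

```python
def build_four_part(start, mid, stop, rows):
    """Four-part form: 'mid' goes between rows, never after the last one."""
    return start + mid.join(rows) + stop


def build_three_part(start, stop, rows, sep):
    """Three-part form: every row carries its own separator,
    so one separator is left over before 'stop'."""
    return start + "".join(row + sep for row in rows) + stop


# Hypothetical sample data, already formatted as SQL value tuples.
rows = ["('msg one')", "('msg two')"]

four = build_four_part("insert into X values ", ",", ";", rows)
three = build_three_part("insert into X values ", ";", rows, ",")

print(four)
print(three)
```

The first string is a well-formed multi-row insert; the second ends in ",;", which is the extra comma mentioned in the quoted message.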
An additional clarification: you talk about string building, I talk about callbacks. Both things need to go together, but I think we are talking about separate entities. I currently think we need a triplet for the callbacks, but a quadruple for the string builder. I am not yet convinced that we need to put the string builder (using the quadruple) into the core.

>
> Let's use my triplet (beginBatch, pushData, endBatch) for a while. On top of
> that calling interface, the plugin can add strings in its configuration (NOT
> an interface issue!). So it could use the calling interface as follows:
>
> beginBatch:
>     emit start
>
> pushData:
>     if not first element in batch
>         emit mid
>     emit data
>
> endBatch:
>     emit stop
>
> The question now is if there should be support in the core engine for the
>
>     if not first element in batch
>         add mid
>
> functionality. I am not sure if there are other plugins but databases that
> could use it. So far, I doubt this (the file writer not, forwarding not, snmp
> not, email? Not sure, but don't think so). If it is just a db thing, it does
> not belong in the core.
>
> Rainer
>
> >
> > David Lang
> >
> > > Rainer
> > >
> > >> David Lang
> > >>
> > >>> Feedback is appreciated.
> > >>>
> > >>> Rainer
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: [email protected] [mailto:rsyslog-[email protected]] On Behalf Of Rainer Gerhards
> > >>>> Sent: Thursday, April 23, 2009 4:38 PM
> > >>>> To: rsyslog-users
> > >>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume loginsertion(fwd)
> > >>>>
> > >>>> That's interesting. As a side-activity, I am thinking about a new output
> > >>>> module interface. Especially given the discussion on the postgres list, but
> > >>>> also some other thoughts about other modules (e.g.
> > >>>> omtcp or the file output),
> > >>>> I tend to use an approach that permits both string-based as well as API-based
> > >>>> (API as in libpq) ways of doing things. I have not really designed anything,
> > >>>> but the rough idea is that each plugin needs three entry points:
> > >>>>
> > >>>> - start batch
> > >>>> - process single message
> > >>>> - end batch
> > >>>>
> > >>>> Then, the plugin can decide itself what it wants to do and when. Most
> > >>>> importantly, this calling interface works well for string-based transactions
> > >>>> as well as API-based ones.
> > >>>>
> > >>>> For the output file writer, for example, I envision that over time it will
> > >>>> have its own write buffer (for various reasons, for example I am also
> > >>>> discussing zipped writing with some folks). With this interface, I can put
> > >>>> everything into the buffer, write out if needed but not if there is no
> > >>>> immediate need, but I can make sure that I write out when the "end batch"
> > >>>> entry point is called.
> > >>>>
> > >>>> As I said, it is not really thought out yet, but maybe a starting point. So
> > >>>> feedback is appreciated.
> > >>>>
> > >>>> Rainer
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: [email protected] [mailto:rsyslog-[email protected]] On Behalf Of [email protected]
> > >>>>> Sent: Wednesday, April 22, 2009 10:11 PM
> > >>>>> To: rsyslog-users
> > >>>>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log insertion(fwd)
> > >>>>>
> > >>>>> from the postgres performance mailing list, relative speeds of different
> > >>>>> ways of inserting data.
> > >>>>>
> > >>>>> I've asked if the 'separate inserts' mode is separate round trips or many
> > >>>>> inserts in one round trip.
> > >>>>>
> > >>>>> based on this it looks like prepared statements make a difference, but not
> > >>>>> so much that other techniques (either a single statement or a copy) aren't
> > >>>>> comparable (or better) options.
> > >>>>>
> > >>>>> David Lang
> > >>>>>
> > >>>>> ---------- Forwarded message ----------
> > >>>>> Date: Wed, 22 Apr 2009 15:33:21 -0400
> > >>>>> From: Glenn Maynard <[email protected]>
> > >>>>> To: [email protected]
> > >>>>> Subject: Re: [PERFORM] performance for high-volume log insertion
> > >>>>>
> > >>>>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <[email protected]> wrote:
> > >>>>>> Yes, as I believe was mentioned already, planning time for inserts is
> > >>>>>> really small. Parsing time for inserts when there's little parsing that
> > >>>>>> has to happen also isn't all *that* expensive and the same goes for
> > >>>>>> conversions from textual representations of data to binary.
> > >>>>>>
> > >>>>>> We're starting to re-hash things, in my view. The low-hanging fruit is
> > >>>>>> doing multiple things in a single transaction, either by using COPY,
> > >>>>>> multi-value INSERTs, or just multiple INSERTs in a single transaction.
> > >>>>>> That's absolutely step one.
> > >>>>>
> > >>>>> This is all well-known, covered information, but perhaps some numbers
> > >>>>> will help drive this home.
> > >>>>> 40000 inserts into a single-column,
> > >>>>> unindexed table; with predictable results:
> > >>>>>
> > >>>>> separate inserts, no transaction:     21.21s
> > >>>>> separate inserts, same transaction:    1.89s
> > >>>>> 40 inserts, 100 rows/insert:           0.18s
> > >>>>> one 40000-value insert:                0.16s
> > >>>>> 40 prepared inserts, 100 rows/insert:  0.15s
> > >>>>> COPY (text):                           0.10s
> > >>>>> COPY (binary):                         0.10s
> > >>>>>
> > >>>>> Of course, real workloads will change the weights, but this is more or
> > >>>>> less the magnitude of difference I always see--batch your inserts into
> > >>>>> single statements, and if that's not enough, skip to COPY.
> > >>>>>
> > >>>>> --
> > >>>>> Glenn Maynard
> > >>>>>
> > >>>>> --
> > >>>>> Sent via pgsql-performance mailing list (pgsql-[email protected])
> > >>>>> To make changes to your subscription:
> > >>>>> http://www.postgresql.org/mailpref/pgsql-performance
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
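As a rough illustration of how the callback triplet from this thread could drive a single multi-row insert, here is a minimal Python sketch. It is not rsyslog code: the class name, method names and table layout are hypothetical, and the standard-library sqlite3 module stands in for PostgreSQL so that the example runs without a server. The start, mid, data and stop strings play the role of the plugin-level configuration, and the "first element" check lives in the plugin rather than in the core.

```python
import sqlite3


class BatchInserter:
    """Hypothetical output plugin: three callbacks plus configured strings."""

    def __init__(self, conn, start, mid, data, stop):
        self.conn = conn
        self.start, self.mid, self.data, self.stop = start, mid, data, stop
        self.parts = []    # pieces of the statement being built
        self.params = []   # one bound parameter per pushed message

    def begin_batch(self):
        # emit start
        self.parts = [self.start]
        self.params = []

    def push_data(self, msg):
        # if not first element in batch: emit mid
        if self.params:
            self.parts.append(self.mid)
        # emit data (a placeholder; the value itself is bound separately)
        self.parts.append(self.data)
        self.params.append(msg)

    def end_batch(self):
        # emit stop, then send the whole batch as one statement
        self.parts.append(self.stop)
        if self.params:
            with self.conn:  # commits on success, rolls back on error
                self.conn.execute("".join(self.parts), self.params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (msg TEXT)")

out = BatchInserter(conn, "INSERT INTO log (msg) VALUES ", ",", "(?)", ";")
out.begin_batch()
for m in ["first", "second", "third"]:
    out.push_data(m)
out.end_batch()

statement = "".join(out.parts)
count = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print(statement)
print(count)
```

Binding the values as parameters instead of pasting them into the string is a choice made for this sketch only; the thread itself does not say how values would be escaped.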

