> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Friday, April 24, 2009 7:57 AM
> To: rsyslog-users
> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
> loginsertion(fwd)
>
> On Fri, 24 Apr 2009, Rainer Gerhards wrote:
>
> > Another innocent question:
> >
> > Let's say I used an exec() API exclusively. Now let me assume that I do, on
> > the *same* database connection, this calling sequence:
> >
> > exec("begin transaction")
> > exec("insert ...")
> > exec("insert ...")
> > exec("insert ...")
> > exec("insert ...")
> > exec("insert ...")
> > exec("insert ...") [Point A]
> > exec("commit")
> >
> > Is it safe to assume that this will result in a performance benefit (I know
> > that it causes more network traffic than necessary, but that's not my point -
> > I just talk of speedup). Will this performance speedup be considerable (along
> > the magnitude of 20 vs. 3 seconds for a given sequence?).
>
> Yes, this speedup would be considerable
>
> from the message at the bottom it would be on the order of
>
> >>> separate inserts, no transaction: 21.21s
> >>> separate inserts, same transaction: 1.89s

I read this, just wanted some reconfirmation.

>
> there is still another order of magnitude gain to be had by going to the
> copy (and eliminating the extra round trips)
>
> >>> COPY (text): 0.10s

Definitely, but let's tackle the 90% issue first.

>
> a copy looks something like
>
> copy to table X from STDIN
> data
> data
> data
>
>
> > Also, even more importantly, does this really mean they are all in one
> > transaction?
>
> yes.
>
> > In particular, what happens if the connection breaks at [Point
> > A], e.g. by the network connection going down for an extended period of time.
> > Is it safe to assume that then everything will be rolled back?
>
> yes, every one of them would disappear.
>

So it looks like my three-call (beginBatch, pushData, EndBatch) calling
interface can probably work. I need to work on how non-transactional outputs
can convey what they have committed, but the basic interface looks rather good.

Rainer

> David Lang
>
> > Feedback is appreciated.
> >
> > Rainer
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:rsyslog-
> >> [email protected]] On Behalf Of Rainer Gerhards
> >> Sent: Thursday, April 23, 2009 4:38 PM
> >> To: rsyslog-users
> >> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
> >> loginsertion(fwd)
> >>
> >> That's interesting. As a side-activity, I am thinking about a new output
> >> module interface. Especially given the discussion on the postgres list, but
> >> also some other thoughts about other modules (e.g. omtcp or the file output),
> >> I tend to use an approach that permits both string-based as well as API-based
> >> (API as in libpq) ways of doing things. I have not really designed anything,
> >> but the rough idea is that each plugin needs three entry points:
> >>
> >> - start batch
> >> - process single message
> >> - end batch
> >>
> >> Then, the plugin can decide itself what it wants to do and when.
> >> Most importantly, this calling interface works well for string-based
> >> transactions as well as API-based ones.
> >>
> >> For the output file writer, for example, I envision that over time it will
> >> have its own write buffer (for various reasons, for example I am also
> >> discussing zipped writing with some folks). With this interface, I can put
> >> everything into the buffer, write out if needed but not if there is no
> >> immediate need, but I can make sure that I write out when the "end batch"
> >> entry point is called.
> >>
> >> As I said, it is not really thought out yet, but maybe a starting point. So
> >> feedback is appreciated.
> >>
> >> Rainer
> >>
> >>> -----Original Message-----
> >>> From: [email protected] [mailto:rsyslog-
> >>> [email protected]] On Behalf Of [email protected]
> >>> Sent: Wednesday, April 22, 2009 10:11 PM
> >>> To: rsyslog-users
> >>> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log
> >>> insertion(fwd)
> >>>
> >>> from the postgres performance mailing list, relative speeds of different
> >>> ways of inserting data.
> >>>
> >>> I've asked if the 'separate inserts' mode is separate round trips or many
> >>> inserts in one round trip.
> >>>
> >>> based on this it looks like prepared statements make a difference, but not
> >>> so much that other techniques (either a single statement or a copy) aren't
> >>> comparable (or better) options.
> >>>
> >>> David Lang
> >>>
> >>> ---------- Forwarded message ----------
> >>> Date: Wed, 22 Apr 2009 15:33:21 -0400
> >>> From: Glenn Maynard <[email protected]>
> >>> To: [email protected]
> >>> Subject: Re: [PERFORM] performance for high-volume log insertion
> >>>
> >>> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <[email protected]> wrote:
> >>>> Yes, as I believe was mentioned already, planning time for inserts is
> >>>> really small.
> >>>> Parsing time for inserts when there's little parsing that
> >>>> has to happen also isn't all *that* expensive and the same goes for
> >>>> conversions from textual representations of data to binary.
> >>>>
> >>>> We're starting to re-hash things, in my view. The low-hanging fruit is
> >>>> doing multiple things in a single transaction, either by using COPY,
> >>>> multi-value INSERTs, or just multiple INSERTs in a single transaction.
> >>>> That's absolutely step one.
> >>>
> >>> This is all well-known, covered information, but perhaps some numbers
> >>> will help drive this home. 40000 inserts into a single-column,
> >>> unindexed table; with predictable results:
> >>>
> >>> separate inserts, no transaction: 21.21s
> >>> separate inserts, same transaction: 1.89s
> >>> 40 inserts, 100 rows/insert: 0.18s
> >>> one 40000-value insert: 0.16s
> >>> 40 prepared inserts, 100 rows/insert: 0.15s
> >>> COPY (text): 0.10s
> >>> COPY (binary): 0.10s
> >>>
> >>> Of course, real workloads will change the weights, but this is more or
> >>> less the magnitude of difference I always see--batch your inserts into
> >>> single statements, and if that's not enough, skip to COPY.
> >>>
> >>> --
> >>> Glenn Maynard
> >>>
> >>> --
> >>> Sent via pgsql-performance mailing list (pgsql-
> >>> [email protected])
> >>> To make changes to your subscription:
> >>> http://www.postgresql.org/mailpref/pgsql-performance
> >> _______________________________________________
> >> rsyslog mailing list
> >> http://lists.adiscon.net/mailman/listinfo/rsyslog
> >> http://www.rsyslog.com
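
For illustration only, here is a minimal sketch of the three-call interface
discussed in this thread (start batch, process single message, end batch),
mapped onto one database transaction. It is not rsyslog code. It assumes
Python's sqlite3 module as a stand-in for PostgreSQL/libpq so that it runs
without a server, and the class name, method names, table name and
drop_connection() helper are all hypothetical. Closing the connection before
the commit plays the role of the connection breaking at [Point A].

```python
import os
import sqlite3
import tempfile


class BatchedSqlOutput:
    """Hypothetical output plugin with three entry points (illustrative names)."""

    def __init__(self, path):
        # isolation_level=None stops the sqlite3 module from issuing implicit
        # BEGINs, so the transaction boundaries are exactly the ones sent below.
        self.conn = sqlite3.connect(path, isolation_level=None)
        self.conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")

    def begin_batch(self):
        # corresponds to exec("begin transaction") in the thread
        self.conn.execute("BEGIN")

    def push_data(self, msg):
        # corresponds to one exec("insert ...") call
        self.conn.execute("INSERT INTO log (msg) VALUES (?)", (msg,))

    def end_batch(self):
        # corresponds to exec("commit")
        self.conn.execute("COMMIT")

    def drop_connection(self):
        # Assumption: closing without a commit stands in for a broken
        # connection at [Point A]; the open transaction is never committed.
        self.conn.close()


def count_rows(path):
    """Count committed rows using a fresh, independent connection."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    finally:
        conn.close()


def demo():
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "log.db")
        out = BatchedSqlOutput(path)

        # A complete batch: six inserts inside one transaction, then commit.
        out.begin_batch()
        for i in range(6):
            out.push_data(f"message {i}")
        out.end_batch()
        committed = count_rows(path)

        # An interrupted batch: the connection goes away before the commit,
        # so none of these rows should be visible afterwards.
        out.begin_batch()
        for i in range(6):
            out.push_data(f"lost {i}")
        out.drop_connection()
        after_break = count_rows(path)

        return committed, after_break
```

Calling demo() returns the row count after the committed batch and the row
count after the interrupted one. The two numbers should be equal, since the
second batch is rolled back as a whole. A non-transactional output would need
some other way to report how much of a batch it has actually written, which
is the open point mentioned above.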

