That's interesting. As a side activity, I am thinking about a new output module interface. Especially given the discussion on the postgres list, but also some other thoughts about other modules (e.g. omtcp or the file output), I tend toward an approach that permits both string-based and API-based (API as in libpq) ways of doing things. I have not really designed anything yet, but the rough idea is that each plugin needs three entry points:
- start batch
- process single message
- end batch

Then the plugin can decide for itself what it wants to do and when. Most importantly, this calling interface works well for string-based transactions as well as API-based ones. For the output file writer, for example, I envision that over time it will get its own write buffer (for various reasons; for example, I am also discussing zipped writing with some folks). With this interface, I can put everything into the buffer, write out when needed (but not when there is no immediate need), and make sure everything is written out when the "end batch" entry point is called. As I said, it is not really thought out yet, but maybe a starting point. Feedback is appreciated.

Rainer

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
> Sent: Wednesday, April 22, 2009 10:11 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log insertion (fwd)
>
> From the postgres performance mailing list: relative speeds of
> different ways of inserting data.
>
> I've asked if the 'separate inserts' mode is separate round trips or
> many inserts in one round trip.
>
> Based on this it looks like prepared statements make a difference, but
> not so much that other techniques (either a single statement or a
> copy) aren't comparable (or better) options.
>
> David Lang
>
> ---------- Forwarded message ----------
> Date: Wed, 22 Apr 2009 15:33:21 -0400
> From: Glenn Maynard <[email protected]>
> To: [email protected]
> Subject: Re: [PERFORM] performance for high-volume log insertion
>
> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <[email protected]>
> wrote:
> > Yes, as I believe was mentioned already, planning time for inserts
> > is really small. Parsing time for inserts when there's little
> > parsing that has to happen also isn't all *that* expensive, and the
> > same goes for conversions from textual representations of data to
> > binary.
> >
> > We're starting to re-hash things, in my view. The low-hanging fruit
> > is doing multiple things in a single transaction, either by using
> > COPY, multi-value INSERTs, or just multiple INSERTs in a single
> > transaction. That's absolutely step one.
>
> This is all well-known, covered information, but perhaps some numbers
> will help drive this home. 40000 inserts into a single-column,
> unindexed table, with predictable results:
>
> separate inserts, no transaction: 21.21s
> separate inserts, same transaction: 1.89s
> 40 inserts, 1000 rows/insert: 0.18s
> one 40000-value insert: 0.16s
> 40 prepared inserts, 1000 rows/insert: 0.15s
> COPY (text): 0.10s
> COPY (binary): 0.10s
>
> Of course, real workloads will change the weights, but this is more or
> less the magnitude of difference I always see: batch your inserts into
> single statements, and if that's not enough, skip to COPY.
>
> --
> Glenn Maynard
>
> --
> Sent via pgsql-performance mailing list
> ([email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

