That's interesting. As a side activity, I am thinking about a new output
module interface. Given the discussion on the postgres list, and also some
thoughts about other modules (e.g. omtcp or the file output), I am leaning
toward an approach that permits both string-based and API-based (API as in
libpq) ways of doing things. I have not really designed anything yet, but
the rough idea is that each plugin provides three entry points:

- start batch
- process single message
- end batch

The plugin can then decide for itself what it wants to do and when. Most
importantly, this calling interface works equally well for string-based
transactions and for API-based ones.
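
A minimal sketch of how such a three-entry-point interface could look,
written in Python for brevity (rsyslog itself is C). Nothing here is an
existing rsyslog or libpq API: all class, method, and table names
(OutputPlugin, execute_prepared, "logs", ...) are hypothetical, and the
connection object is a stand-in that only records calls.

```python
from abc import ABC, abstractmethod


class OutputPlugin(ABC):
    """Hypothetical plugin interface with the three entry points."""

    @abstractmethod
    def begin_batch(self): ...

    @abstractmethod
    def process_message(self, msg): ...

    @abstractmethod
    def end_batch(self): ...


class StringSqlPlugin(OutputPlugin):
    """String-based: collects rows and emits one multi-row INSERT per batch."""

    def __init__(self, execute):
        self._execute = execute  # assumed: callable taking one SQL string
        self._rows = []

    def begin_batch(self):
        self._rows = []

    def process_message(self, msg):
        escaped = msg.replace("'", "''")  # naive quoting, illustration only
        self._rows.append(f"('{escaped}')")

    def end_batch(self):
        if self._rows:
            sql = "INSERT INTO logs (msg) VALUES " + ", ".join(self._rows) + ";"
            self._execute(sql)
        self._rows = []


class ApiPlugin(OutputPlugin):
    """API-based: one transaction per batch, one driver call per message."""

    def __init__(self, conn):
        self._conn = conn  # assumed: object with begin/execute_prepared/commit

    def begin_batch(self):
        self._conn.begin()

    def process_message(self, msg):
        self._conn.execute_prepared("insert_log", (msg,))

    def end_batch(self):
        self._conn.commit()


class FakeConnection:
    """Stand-in for a database connection; it only records what was called."""

    def __init__(self):
        self.calls = []

    def begin(self):
        self.calls.append("begin")

    def execute_prepared(self, name, params):
        self.calls.append((name, params))

    def commit(self):
        self.calls.append("commit")


def run_batch(plugin, messages):
    """What the core would do: bracket the messages with start/end batch."""
    plugin.begin_batch()
    for msg in messages:
        plugin.process_message(msg)
    plugin.end_batch()
```

The caller (run_batch) is identical for both plugins; only the plugin
decides whether a batch becomes one big statement or a transaction around
individual API calls.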

For the file output writer, for example, I envision that over time it will
have its own write buffer (for various reasons; for example, I am also
discussing zipped writing with some folks). With this interface, I can put
everything into the buffer and write it out only when there is an immediate
need, while making sure that I always write out when the "end batch" entry
point is called.
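
The buffering behaviour described above could be sketched roughly as
follows, again in Python and with hypothetical names (BufferedFileOutput,
max_buffered, flush_count are made up for illustration). The assumption is
that "immediate need" simply means the buffer has reached a size limit.

```python
import io


class BufferedFileOutput:
    """Hypothetical file output with its own write buffer."""

    def __init__(self, sink, max_buffered=4096):
        self._sink = sink          # any binary file-like object
        self._max = max_buffered   # assumed flush threshold in bytes
        self._buf = bytearray()
        self.flush_count = 0       # only here to make the behaviour visible

    def begin_batch(self):
        pass  # nothing to prepare for a plain file

    def process_message(self, msg):
        self._buf += msg.encode("utf-8") + b"\n"
        # Write out early only if there is an immediate need (buffer full).
        if len(self._buf) >= self._max:
            self._flush()

    def end_batch(self):
        # Always write out whatever is still buffered when the batch ends.
        self._flush()

    def _flush(self):
        if self._buf:
            self._sink.write(bytes(self._buf))
            self._sink.flush()
            self._buf.clear()
            self.flush_count += 1


if __name__ == "__main__":
    sink = io.BytesIO()
    out = BufferedFileOutput(sink, max_buffered=10)
    out.begin_batch()
    for line in ["abc", "defghi", "z"]:
        out.process_message(line)
    out.end_batch()
    print(sink.getvalue(), out.flush_count)
```

Because the sink is just a binary file-like object, a compressed stream
(for instance one opened with gzip.open(path, "wb")) could be passed in
without changing the plugin.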

As I said, this is not really thought out yet, but it may be a starting
point, so feedback is appreciated.

Rainer

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of [email protected]
> Sent: Wednesday, April 22, 2009 10:11 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] [PERFORM] performance for high-volume log
> insertion(fwd)
> 
> from the postgres performance mailing list, relative speeds of
> different ways of inserting data.
> 
> I've asked if the 'separate inserts' mode is separate round trips or
> many inserts in one round trip.
> 
> based on this it looks like prepared statements make a difference, but
> not so much that other techniques (either a single statement or a copy)
> aren't comparable (or better) options.
> 
> David Lang
> 
> ---------- Forwarded message ----------
> Date: Wed, 22 Apr 2009 15:33:21 -0400
> From: Glenn Maynard <[email protected]>
> To: [email protected]
> Subject: Re: [PERFORM] performance for high-volume log insertion
> 
> On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <[email protected]>
> wrote:
> > Yes, as I believe was mentioned already, planning time for inserts is
> > really small.  Parsing time for inserts when there's little parsing
> > that has to happen also isn't all *that* expensive and the same goes
> > for conversions from textual representations of data to binary.
> >
> > We're starting to re-hash things, in my view.  The low-hanging fruit
> > is doing multiple things in a single transaction, either by using
> > COPY, multi-value INSERTs, or just multiple INSERTs in a single
> > transaction.  That's absolutely step one.
> 
> This is all well-known, covered information, but perhaps some numbers
> will help drive this home.  40000 inserts into a single-column,
> unindexed table; with predictable results:
> 
> separate inserts, no transaction: 21.21s
> separate inserts, same transaction: 1.89s
> 40 inserts, 100 rows/insert: 0.18s
> one 40000-value insert: 0.16s
> 40 prepared inserts, 100 rows/insert: 0.15s
> COPY (text): 0.10s
> COPY (binary): 0.10s
> 
> Of course, real workloads will change the weights, but this is more or
> less the magnitude of difference I always see--batch your inserts into
> single statements, and if that's not enough, skip to COPY.
> 
> --
> Glenn Maynard
> 
> --
> Sent via pgsql-performance mailing list (pgsql-
> [email protected])
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com