Re: [rsyslog] [PERFORM] performance for high-volume log insertion (fwd)

Rainer Gerhards Thu, 23 Apr 2009 22:34:21 -0700

Another innocent question:

Let's say I used an exec() API exclusively. Now let me assume that I do, on
the *same* database connection, this calling sequence:


exec("begin transaction")
exec("insert ...") 
exec("insert ...")
exec("insert ...")
exec("insert ...")
exec("insert ...")
exec("insert ...")   [Point A]
exec("commit")

Is it safe to assume that this will result in a performance benefit? (I know
that it causes more network traffic than necessary, but that's not my point;
I am only talking about speedup.) Will this speedup be considerable, on the
order of 20 vs. 3 seconds for a given sequence?
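
To make the two calling patterns concrete, here is a minimal sketch that can
be timed locally. It rests on loud assumptions: it uses Python's built-in
sqlite3 module as a stand-in for PostgreSQL/libpq, and the table name `log`
and the helpers `timed_inserts` and `compare` are made up for illustration.
Absolute timings will not carry over to PostgreSQL; only the shape of the
comparison (one commit per insert vs. one commit per batch) does.

```python
import os
import sqlite3
import tempfile
import time


def timed_inserts(path, n, one_transaction):
    """Insert n rows with separate exec-style calls; return (seconds, rows)."""
    # isolation_level=None: the module issues no implicit BEGIN, so every
    # statement commits on its own unless we open a transaction ourselves.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    conn.execute("DELETE FROM log")  # start each run from an empty table
    start = time.perf_counter()
    if one_transaction:
        conn.execute("BEGIN")
    for i in range(n):
        conn.execute("INSERT INTO log (msg) VALUES (?)", (f"message {i}",))
    if one_transaction:
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    rows = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    conn.close()
    return elapsed, rows


def compare(n=200):
    """Run both patterns against a throwaway on-disk database (hypothetical)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.db")
        separate = timed_inserts(path, n, one_transaction=False)
        batched = timed_inserts(path, n, one_transaction=True)
    return separate, batched


if __name__ == "__main__":
    (t_sep, _), (t_batch, _) = compare()
    print(f"separate commits: {t_sep:.3f}s, one transaction: {t_batch:.3f}s")
```

Both runs insert the same rows through the same connection-level calls; the
only difference is whether a begin/commit pair brackets the loop.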

Also, even more importantly, does this really mean they are all in one
transaction? In particular, what happens if the connection breaks at [Point
A], e.g. because the network connection goes down for an extended period of
time? Is it safe to assume that everything will then be rolled back?
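
The question can be turned into a small experiment. The sketch below is only
an illustration under stated assumptions: it again uses Python's built-in
sqlite3 module in place of PostgreSQL, closing the connection without a
commit stands in for the connection breaking at [Point A], and the names
`rows_after_session` and `demo` are hypothetical. It does not prove how a
PostgreSQL server reacts to a network outage; it shows the check one would
run against the real server.

```python
import os
import sqlite3
import tempfile


def rows_after_session(path, commit):
    """Run the begin/insert sequence, optionally skip the commit, then
    reconnect and count the rows a fresh connection can see."""
    conn = sqlite3.connect(path, isolation_level=None)  # no implicit BEGIN
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    conn.execute("DELETE FROM log")
    conn.execute("BEGIN")
    for i in range(6):
        conn.execute("INSERT INTO log (msg) VALUES (?)", (f"message {i}",))
    # [Point A]
    if commit:
        conn.execute("COMMIT")
    conn.close()  # without COMMIT this stands in for the connection dropping

    check = sqlite3.connect(path)
    count = check.execute("SELECT COUNT(*) FROM log").fetchone()[0]
    check.close()
    return count


def demo():
    """Return (rows visible after a lost session, rows after a commit)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "txn.db")
        lost = rows_after_session(path, commit=False)
        kept = rows_after_session(path, commit=True)
    return lost, kept


if __name__ == "__main__":
    print(demo())
```

In this stand-in, the session that never reaches the commit leaves no rows
behind, while the committed one leaves all six.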

Feedback is appreciated.

Rainer

> -----Original Message-----
> From: [email protected] [mailto:rsyslog-
> [email protected]] On Behalf Of Rainer Gerhards
> Sent: Thursday, April 23, 2009 4:38 PM
> To: rsyslog-users
> Subject: Re: [rsyslog] [PERFORM] performance for high-volume
> loginsertion(fwd)
> 
> That's interesting. As a side-activity, I am thinking about a new output
> module interface. Especially given the discussion on the postgres list, but
> also some other thoughts about other modules (e.g. omtcp or the file output),
> I tend to use an approach that permits both string-based as well as API-based
> (API as in libpq) ways of doing things. I have not really designed anything,
> but the rough idea is that each plugin needs three entry points:
> 
> - start batch
> - process single message
> - end batch
> 
> Then, the plugin can decide itself what it wants to do and when. Most
> importantly, this calling interface works well for string-based transactions
> as well as API-based ones.
> 
> For the output file writer, for example, I envision that over time it will
> have its own write buffer (for various reasons, for example I am also
> discussing zipped writing with some folks). With this interface, I can put
> everything into the buffer, write out if needed but not if there is no
> immediate need but I can make sure that I write out when the "end batch"
> entry point is called.
> 
> As I said, it is not really thought out yet, but maybe a starting point. So
> feedback is appreciated.
> 
> Rainer
> 
> > -----Original Message-----
> > From: [email protected] [mailto:rsyslog-
> > [email protected]] On Behalf Of [email protected]
> > Sent: Wednesday, April 22, 2009 10:11 PM
> > To: rsyslog-users
> > Subject: Re: [rsyslog] [PERFORM] performance for high-volume log
> > insertion(fwd)
> >
> > from the postgres performance mailing list, relative speeds of different
> > ways of inserting data.
> >
> > I've asked if the 'separate inserts' mode is separate round trips or many
> > inserts in one round trip.
> >
> > based on this it looks like prepared statements make a difference, but not
> > so much that other techniques (either a single statement or a copy) aren't
> > comparable (or better) options.
> >
> > David Lang
> >
> > ---------- Forwarded message ----------
> > Date: Wed, 22 Apr 2009 15:33:21 -0400
> > From: Glenn Maynard <[email protected]>
> > To: [email protected]
> > Subject: Re: [PERFORM] performance for high-volume log insertion
> >
> > On Wed, Apr 22, 2009 at 8:19 AM, Stephen Frost <[email protected]>
> > wrote:
> > > Yes, as I believe was mentioned already, planning time for inserts is
> > > really small.  Parsing time for inserts when there's little parsing that
> > > has to happen also isn't all *that* expensive and the same goes for
> > > conversions from textual representations of data to binary.
> > >
> > > We're starting to re-hash things, in my view.  The low-hanging fruit is
> > > doing multiple things in a single transaction, either by using COPY,
> > > multi-value INSERTs, or just multiple INSERTs in a single transaction.
> > > That's absolutely step one.
> >
> > This is all well-known, covered information, but perhaps some numbers
> > will help drive this home.  40000 inserts into a single-column,
> > unindexed table; with predictable results:
> >
> > separate inserts, no transaction: 21.21s
> > separate inserts, same transaction: 1.89s
> > 40 inserts, 100 rows/insert: 0.18s
> > one 40000-value insert: 0.16s
> > 40 prepared inserts, 100 rows/insert: 0.15s
> > COPY (text): 0.10s
> > COPY (binary): 0.10s
> >
> > Of course, real workloads will change the weights, but this is more or
> > less the magnitude of difference I always see--batch your inserts into
> > single statements, and if that's not enough, skip to COPY.
> >
> > --
> > Glenn Maynard
> >
> > --
> > Sent via pgsql-performance mailing list (pgsql-
> > [email protected])
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-performance
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com
