Jumping into that threat again... I am thinking on how to enhance the engine so that fastest-possible database writes (actually, any output) are possible. However, I come across a couple of points. I would like to do so in the most generic way. Let me quote those message parts that I have specific questions on (out of sequence, thus I preserve the full message below - if you need more context).
> I made a small Python prototype to do something similar to what you > propose, with no batches, but committing each 1000 entries. The speedup > I got by introducing batches was about a factor 50. And the statement > was already prepared. Could you check what actually brings most of the speedup - the batches or the prepared statement. I am thinking along the lines of using batches but not prepared statements, as in this sample begin insert ... insert ... insert ... insert ... end Does this offer dramatic improvement? How much more improvement does the prepared statements offer (My hope is that you can quickly modify the Python prototype to provide a rough idea). This question stems back to me wondering if it is worth to rewrite all existing DB plugins to use prepared statements. Batching, as described above, can be done with far less modification to the plugin. And second question. Let's envision that the rsyslog core could provide you with multiple data records at once. For the case given above, I could still simply pass in a single - now longer - string (that makes it that attractive for the other db plugins). However, that does not work for the omoracle interface. Let's say the new interface we created is a "vector interface" as it provide each data item as part of a one-dimensional vector (or tuple). Then, it would look most natural to me if we extend this to "matrix interface", where you receive a tuple of tuples (or a two-dimensional structure that "feels" much like a SQL result set). What that be useful for you? Or, the other way around, what would you consider an optimal interface to your plugin if the rsyslog core would provide batching support? Feedback, from everyone interested, is highly appreciated and useful. Thanks, Rainer > -----Original Message----- > From: [email protected] [mailto:rsyslog- > [email protected]] On Behalf Of Luis Fernando Muñoz Mejías > Sent: Thursday, April 02, 2009 5:21 PM > To: [email protected] > Subject: Re: [rsyslog] RFC: On rsyslog output modules and support for > batchoperations > > RB, > > > Oi. :) Sorry I'm late to the game. > > Your contribution is appreciated. :) > > > Forgive me - my database-performance-fu and oracle-fu are not > terribly > > strong, I may make a fool of myself here. What is the performance > > gain of making a prepared statement over just executing raw > > statements? > > The statement is parsed only once, so you save the overhead of parsing > and doing an execution plan for each execution, which will be > identical. And I expect to insert hundreds of entries per second. :) > > All you have to do is pass the arguments. > > > CREATE PROCEDURE zazz AS > > insert into foo(field1, field2, field3) values(:val1, :val2, > > :val3); SET TRANSACTION; zazz("foo", "bar", "baz"); zazz("foo1", > > "bar1", "baz1"); zazz("foo2", "bar2", "baz2"); COMMIT; > > > > -- over > > > > SET TRANSACTION; > > INSERT INTO foo(field1, field2, field3) values("foo", > > "bar", "baz"); > > INSERT INTO foo(field1, field2, field3) values("foo1", > > "bar1", "baz1"); > > INSERT INTO foo(field1, field2, field3) > > values("foo2", "bar2", "baz2"); COMMIT; > > > With this code, Oracle (any DB, actually) needs to parse each insert, > and then choose the execution plan that looks best once. > > What you get by preparing the statement and using batches is that the > client (rsyslog core) will store these triplets: > > (foo, bar, baz) (foo1, bar1, baz1) (foo2, bar2, baz2) > > and when you've hit a limit (say, you're on (foo1000, bar1000, > baz1000)) > send them all to the server at once (thus calling only once to > doAction, > calling only once to the Oracle interface), who will blindly execute > the > statement without wasting a single cycle on parsing or evaluating > execution plans: it's already done. > > > Perhaps that's not even what you're doing. > > For the moment I'm doing > > BEGIN > INSERT INTO foo(field1, field2, field3) values("foo", "bar", "baz"); > COMMIT; > BEGIN > INSERT INTO foo(field1, field2, field3) values("foo1", "bar1", "baz1"); > COMMIT; > > You can already imagine the overhead involved. Actually, all DB-based > modules on rsyslog do the same. > > > I know there are other considerations and niceties with procedures, > > It's not even a stored procedure, it's on the client doing > communicating > many times versus only one with the DB. > > > but the latter syntax would still allow for batched transactions > while > > enabling rsyslog to do the dirty work of formatting the query and not > > necessitating exposure of internal structures. > > > Indeed, I want rsyslog doing most of the work for me. But the overhead > involved in parsing and evaluating execution plans is unacceptable on > my > context. So I'm looking here for the balance between rsyslog doing work > for me and rsyslog performing as good as I need it. Perhaps exposing > the > structures is not a good idea, either. > > > IMHO, database output modules should still pretty much blindly > execute > > whatever SQL rsyslog hands them, be that wrapped in a transaction or > > not. > > > Yes and no. Yes, rsyslog should be the one who tells the statement to > be > executed. But there is no need for rsyslog to repeat that statement for > each entry (millions per day). Doing it at initialization time is > enough. > > I made a small Python prototype to do something similar to what you > propose, with no batches, but committing each 1000 entries. The speedup > I got by introducing batches was about a factor 50. And the statement > was already prepared. > > Cheers. > -- > Luis Fernando Muñoz Mejías > [email protected] > > _______________________________________________ > rsyslog mailing list > http://lists.adiscon.net/mailman/listinfo/rsyslog > http://www.rsyslog.com _______________________________________________ rsyslog mailing list http://lists.adiscon.net/mailman/listinfo/rsyslog http://www.rsyslog.com

