On Fri, 17 Apr 2009, Luis Fernando Mu?oz Mej?as wrote:
> Hi,
>
>> I am thinking on how to enhance the engine so that fastest-possible
>> database writes (actually, any output) are possible. However, I come
>> across a couple of points. I would like to do so in the most generic
>> way. Let me quote those message parts that I have specific questions
>> on (out of sequence, thus I preserve the full message below - if you
>> need more context).
>>
>> > I made a small Python prototype to do something similar to what you
>> > propose, with no batches, but committing each 1000 entries. The
>> > speedup I got by introducing batches was about a factor 50. And the
>> > statement was already prepared.
>>
>> Could you check what actually brings most of the speedup - the batches
>> or the prepared statement. I am thinking along the lines of using
>> batches but not prepared statements, as in this sample
>>
>> begin insert ... insert ... insert ... insert ... end
>
> I'll do, but please note that
>
> begin
> execute(unprepared_insert_statement)
> execute(unprepared_insert_statement)
> execute(unprepared_insert_statement)
> execute(unprepared_insert_statement)
> commit
>
> Needs 4 message exchanges with the server. OTOH:
>
> <client>
> push (@batch, $item);
> push (@batch, $item);
> push (@batch, $item);
> push (@batch, $item);
> <send to server>
> begin
> execute_many (insert_statement, @batch)
> commit
>
> Requires only one, so the network overhead is *way* smaller. This is
> true not only of Oracle, but also of PostgreSQL, and I suppose MySQL
> provides similar API.
as a strawman, and thinking of databases in general (not any particular
database), I see the needs for the database interface as being able to be
generisized down to a set of config variables something like
DBtype
value: one of "oracle|postgres|mysql|libdbi"
purpose: determine which low-level communication library is used to talk
to the DB
DBinit
value: string
purpose: any initialization that needs to be done when first connecting
to the database (sanity checks to make sure the DB has the correct schema,
initializing sql functions, authentication, etc)
DBstart
value: string
purpose: fixed text ahead of any message content
DBjoin
value: string
purpose: fixed text used to join two messages togeather
DBend
value: string
purpose: fixed text used to end a message to the server
DBmessage
value: rsyslog template
purpose: format an individual message for the database
examples
example 1 existing single-message handling
DBinit=""
DBstart=""
DBjoin=""
DBend=""
DBmessage="insert into table logs values ('$server','$timestamp','$msg');"
resulting statement
insert into table logs values ('server1','$timestamp',$'msg');
example 2 prepared statement
DBinit=""
DBstart=""
DBjoin="\n"
DBend="begin; execute_many (insert_statement, @batch); commit"
DBmessage="push (@batch, '$item');"
resulting statement
push (@batch, 'item1');
push (@batch, 'item2');
push (@batch, 'item3');
push (@batch, 'item4');
begin; execute_many (insert_statement, @batch); commit
example 3 multiple inserts in one statment
DBinit=""
DBstart="insert into table logs values "
DBjoin=", "
DBend=";"
DBmessage="('$server','$timestamp','$msg')"
resulting statement
insert into table logs values ('server1','time1','message1'),
('server2','time2','message2'), ('server3','time3','message3'),
('server4','time4','message4');
example 4 multiple inserts in one transaction
DBinit=""
DBstart="begin;\n"
DBjoin="\n"
DBend="\ncommit;"
DBmessage="insert into table logs values ('$server','$time','$message'); "
resulting statement
begin;
insert into table logs values ('server1','time1','message1');
insert into table logs values ('server2','time2','message2');
insert into table logs values ('server3','time3','message3');
insert into table logs values ('server4','time4','message4');
commit;
I don't happen to know the syntax to define a stored procedure off the top
of my head or I would give you an example of that (which would use the
DBinit to define the stored procedure)
postgres has a 'copy' command, where you tell it that you are going to
follow with many lines of content to insert (which is significantly faster
than insert statements, even batched up)
I believe that this 5-variable set can handle just about every variation
in putting things in the database, and as such would allow the database
drivers themselves to be greatly simplified.
thoughts?
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com