On Mon, 20 Apr 2009, Rainer Gerhards wrote:

> Sorry for the silence, been thinking quite a bit ;)

no problem

>> -----Original Message-----
>> From: [email protected] [mailto:rsyslog-
>> [email protected]] On Behalf Of [email protected]
>> Sent: Saturday, April 18, 2009 1:29 AM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] RFC: On rsyslog output modules and support for
>> batchoperations
>>
>
>
> ...
>
>> as a strawman, and thinking of databases in general (not any particular
>> database), I see the needs for the database interface as being able to
>> be
>> generisized down to a set of config variables something like
>>
>> DBtype
>>    value:   one of "oracle|postgres|mysql|libdbi"
>>    purpose: determine which low-level communication library is used to
>> talk
>> to the DB
>>
>>
>> DBinit
>>    value:   string
>>    purpose: any initialization that needs to be done when first
>> connecting
>> to the database (sanity checks to make sure the DB has the correct
>> schema,
>> initializing sql functions, authentication, etc)
>>
>> DBstart
>>    value:   string
>>    purpose: fixed text ahead of any message content
>>
>> DBjoin
>>    value:   string
>>    purpose: fixed text used to join two messages togeather
>>
>> DBend
>>    value:   string
>>    purpose: fixed text used to end a message to the server
>>
>> DBmessage
>>    value:   rsyslog template
>>    purpose: format an individual message for the database
>>
>>
>> examples
>>
>> example 1 existing single-message handling
>>
>> DBinit=""
>> DBstart=""
>> DBjoin=""
>> DBend=""
>> DBmessage="insert into table logs values
>> ('$server','$timestamp','$msg');"
>>
>> resulting statement
>>
>> insert into table logs values ('server1','$timestamp',$'msg');
>>
>> example 2 prepared statement
>>
>> DBinit=""
>> DBstart=""
>> DBjoin="\n"
>> DBend="begin; execute_many (insert_statement, @batch); commit"
>> DBmessage="push (@batch, '$item');"
>>
>> resulting statement
>>
>> push (@batch, 'item1');
>> push (@batch, 'item2');
>> push (@batch, 'item3');
>> push (@batch, 'item4');
>> begin; execute_many (insert_statement, @batch); commit
>
>
> There is a problem with this example - and that is that each database
> provides its own API for prepared statements. Brief look tells the are quite
> similar (good!) but it also tells that you can not work with "just strings"
> (bad!). So the approach involves more than just crafting the right strings.
>
> Prepared statements, however, are quite useful (not only from a performance
> perspective), so it would definitely be a plus to have them.

every database that I have seen (including Oracle) has had the ability to 
create prepared statements and stored procedures from the text-based 
database tool, so I'm not understanding why working with 'just strings' 
isn't enough. could you explain more?

also, where prepared statements are good, stored procedures are probably 
better.

>>
>>
>> example 3 multiple inserts in one statment
>>
>> DBinit=""
>> DBstart="insert into table logs values "
>> DBjoin=", "
>> DBend=";"
>> DBmessage="('$server','$timestamp','$msg')"
>>
>> resulting statement
>>
>> insert into table logs values ('server1','time1','message1'),
>> ('server2','time2','message2'), ('server3','time3','message3'),
>> ('server4','time4','message4');
>>
>>
>> example 4 multiple inserts in one transaction
>>
>> DBinit=""
>> DBstart="begin;\n"
>> DBjoin="\n"
>> DBend="\ncommit;"
>> DBmessage="insert into table logs values
>> ('$server','$time','$message'); "
>>
>> resulting statement
>>
>> begin;
>> insert into table logs values ('server1','time1','message1');
>> insert into table logs values ('server2','time2','message2');
>> insert into table logs values ('server3','time3','message3');
>> insert into table logs values ('server4','time4','message4');
>> commit;
>>
>>
>> I don't happen to know the syntax to define a stored procedure off the
>> top
>> of my head or I would give you an example of that (which would use the
>> DBinit to define the stored procedure)
>
> I think this is quite similar to ordinary SQL, at least in those engines that
> I used in the past (not sure about MySQL, which IMHO is not really a
> full-blown SQL engine).

agreed.

>>
>> postgres has a 'copy' command, where you tell it that you are going to
>> follow with many lines of content to insert (which is significantly
>> faster
>> than insert statements, even batched up)
>
> There are other optimizations possible. MySQL, for example, has a mode where
> you permit the database to do inserts via a lazy writer, obviously at the
> risk of either consistency or even reliability (don't get the details
> together yet). Other engines, I guess, have other/additional optimizations.

correct, but you could set options like that in the DBinit string.

>> I believe that this 5-variable set can handle just about every
>> variation
>> in putting things in the database, and as such would allow the database
>> drivers themselves to be greatly simplified.
>
> I think what this really boils down is the design of a DB "superdriver" which
> provides core functionality we expect in many/most/at least the most
> relevant{Oracle, Postgres, MSSQL?, MySql) engines and that provides a
> minidriver layer, where the engine-specific functionality is actually linked
> in. This can greatly reduce the effort required to write a DB driver and it
> can also reduce the effort required to maintain currently existing drivers.
>
> But, still, there is some effort to do. Maybe we could achieve the same goal
> with a set of macros, that would boild down to an elegance vs. effort
> decision.
>
> Besides that, it is interesting to note that we can solve almost everything
> EXCEPT the prepared statements with the string-only method, and this is why I
> am so interested in the actual gain by prepared statements. I agree there is
> gain, I just wonder if it is sufficiently large to add the extra complexity
> (I'd expect that the far majority can be achieved by batching inside a single
> transaction, but even then, with 1000 insert statements you need to re-create
> the execution plan 1000 times...).
>
> It also boils down to what is wiser: start with a new DB abstraction or start
> with trying to make the queue support batches.

definantly making the queue support batches ;-)

as I see it, that will benifit all output modules, not just the DB ones. 
and you are the only person who can do the queue support while there are 
others who can (and do) work on the DB modules themselves.

I would expect it to take a bit of 'discussion' between the different DB 
folks for them to all agree on any new abstraction, no it's not something 
that can be started immediatly in any case.

David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

Reply via email to