Hi,

> I am thinking on how to enhance the engine so that fastest-possible
> database writes (actually, any output) are possible. However, I come
> across a couple of points. I would like to do so in the most generic
> way. Let me quote those message parts that I have specific questions
> on (out of sequence, thus I preserve the full message below - if you
> need more context).
>
> > I made a small Python prototype to do something similar to what you
> > propose, with no batches, but committing each 1000 entries. The
> > speedup I got by introducing batches was about a factor 50. And the
> > statement was already prepared.
>
> Could you check what actually brings most of the speedup - the batches
> or the prepared statement. I am thinking along the lines of using
> batches but not prepared statements, as in this sample
>
> begin insert ...  insert ...  insert ...  insert ...  end

I will, but please note that

begin
execute(unprepared_insert_statement)
execute(unprepared_insert_statement)
execute(unprepared_insert_statement)
execute(unprepared_insert_statement)
commit

needs four message exchanges with the server, one per insert. OTOH:

<client>
push (@batch, $item);
push (@batch, $item);
push (@batch, $item);
push (@batch, $item);
<send to server>
begin
execute_many (insert_statement, @batch)
commit

requires only one, so the network overhead is *way* smaller. This is
true not only of Oracle, but also of PostgreSQL, and I suppose MySQL
provides a similar API.
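
To make the two call patterns above concrete, here is a minimal Python
sketch. It uses the standard sqlite3 module purely as a stand-in (an
assumption of mine, not what omoracle uses); sqlite3 runs in-process, so
it shows only the shape of the calls, not the network savings. The table
name "log" and the helper names are made up for illustration.

```python
import sqlite3

INSERT_SQL = "INSERT INTO log (host, msg) VALUES (?, ?)"  # hypothetical table

def make_db():
    """Create an in-memory database with the illustrative 'log' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (host TEXT, msg TEXT)")
    return conn

def insert_one_by_one(conn, rows):
    """One execute() call per row, all inside a single transaction.
    With a networked driver, each call would be its own exchange."""
    with conn:  # commits on success, rolls back on error
        for row in rows:
            conn.execute(INSERT_SQL, row)

def insert_batched(conn, rows):
    """A single executemany() call that hands over the whole batch."""
    with conn:
        conn.executemany(INSERT_SQL, rows)

def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]

if __name__ == "__main__":
    batch = [("host%d" % i, "message %d" % i) for i in range(1000)]
    for insert in (insert_one_by_one, insert_batched):
        conn = make_db()
        insert(conn, batch)
        print(insert.__name__, count_rows(conn))
```

Both functions end up with the same rows; the difference is only in how
many calls cross the client/driver boundary.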

I'll try to verify where the hottest spot is, anyway.

> And second question. Let's envision that the rsyslog core could
> provide you with multiple data records at once.

That would be *great*.

> For the case given above, I could still simply pass in a single - now
> longer - string (that makes it that attractive for the other db
> plugins). However, that does not work for the omoracle interface.

For omoracle it's not good, indeed. Also, I don't think you want to
maintain yet another way of passing messages to modules. IMHO, we have
two orthogonal use cases:

a) the module wants all messages one by one and is happy with it (all
modules but omoracle).

b) the module wants to handle the properties in big batches (omoracle).

IMHO, this is flexible enough for new developers to choose between easy
and fast.
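
As a rough illustration of cases a) and b), here is a small Python
sketch of how a core could serve both kinds of module from the same
batch. Everything in it (the class names, process_one, process_batch,
dispatch) is hypothetical and is not rsyslog's actual module interface,
which is written in C.

```python
class SimpleModule:
    """Case a): wants messages one by one."""
    def __init__(self):
        self.seen = []

    def process_one(self, message):
        self.seen.append(message)

class BatchModule:
    """Case b): wants the whole batch at once."""
    def __init__(self):
        self.batches = []

    def process_batch(self, messages):
        self.batches.append(list(messages))

def dispatch(module, messages):
    """Hand the batch over in the form the module asked for.

    A module that defines process_batch() gets the whole batch in one
    call; any other module gets one process_one() call per message.
    """
    handler = getattr(module, "process_batch", None)
    if handler is not None:
        handler(messages)
    else:
        for message in messages:
            module.process_one(message)

if __name__ == "__main__":
    queue = ["msg 1", "msg 2", "msg 3"]
    easy, fast = SimpleModule(), BatchModule()
    dispatch(easy, queue)
    dispatch(fast, queue)
    print(len(easy.seen), len(fast.batches))
```

The point of the sketch is that the core keeps a single queueing path,
and only the last step differs per module.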

> Let's say the new interface we created is a "vector interface" as it
> provides each data item as part of a one-dimensional vector (or
> tuple). Then, it would look most natural to me if we extend this to
> "matrix interface", where you receive a tuple of tuples (or a
> two-dimensional structure that "feels" much like a SQL result set).

Indeed, that's what I have to maintain in omoracle. If I could offload
it to rsyslog's core it would be even better.

> Would that be useful for you? Or, the other way around, what
> would you consider an optimal interface to your plugin if the rsyslog
> core would provide batching support?
>
The matrix-like structure is the one I need, indeed. :)
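
To show what I mean by matrix-like, here is a tiny Python sketch: one
inner tuple per message, one position per property. The property names
and values are invented for the example. The to_columns helper is also
my own illustration; it assumes a consumer that wants one array per
column rather than one tuple per row.

```python
def to_columns(matrix):
    """Transpose a tuple of row tuples into a tuple of column tuples."""
    return tuple(zip(*matrix))

if __name__ == "__main__":
    # Rows are messages; columns are (host, severity, text). Made-up data.
    matrix = (
        ("host1", "info", "service started"),
        ("host2", "err", "disk full"),
    )
    hosts, severities, texts = to_columns(matrix)
    print(hosts)
    print(severities)
    print(texts)
```

Row-wise, the structure can go straight into an execute_many-style call;
column-wise, it can feed per-column bind arrays.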

Cheers.
-- 
Luis Fernando Muñoz Mejías

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
