Hello Tomas,

>> The results are as follows:
>>
>>  * 1 thread, 33 runs, median tps (the average is consistent):
>>  - no logging:        22062
>>  - separate logging:  19360  (-12.2%)
>>  - merged logging:    19326  (-12.4%, not significantly different from separate)

> Interesting. What hardware is this?

A Dell PowerEdge R720 with two Intel Xeon E5-2660 processors at 2.20GHz (8 cores and 16 threads per processor, so 32 threads in total), running Linux 3.13 (Ubuntu trusty).

> I wouldn't be surprised by this behavior on a multi-socket system, [...]

There are 2 sockets.

>> So my overall conclusion is:
>>
>> (1) The simple thread-shared file approach would spare pgbench the
>> heavy merge-sort post-processing code, for a reasonable cost.

> No it wouldn't - you're missing the fact that the proposed approach
> (shared file + fprintf) only works with the raw transaction log.
>
> It does not work with the aggregated log - the threads would have to
> track the progress of the other threads somehow, in a very non-trivial
> way (e.g. what if one of the threads executes a long query, and thus
> does not send its results for a long time?).

The counters are updated when the transaction is finished anyway?

> Another option would be to update shared aggregated results, but that
> requires locking.

That is what I had in mind. ISTM that the locking impact would be much lower than for logging: the data are locked only for a counter update. If the counters are per-thread, a conflict can only occur when the data are gathered for actual logging, which would be rather infrequent. Even if the counters are shared, the lock would be held for such a short time that the conflict probability stays low. So I do not see this as a significant performance issue.
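
To illustrate, here is a minimal sketch of the per-thread counter idea; the names (AggVals, note_transaction, flush_interval) are invented for the example and are not pgbench's actual structures:

    /* Per-thread aggregate counters, merged under a mutex only when an
     * aggregation interval ends; invented names, not pgbench code. */
    #include <pthread.h>

    typedef struct
    {
        long    count;     /* transactions in the current interval */
        double  sum_lat;   /* summed latencies, for the average */
    } AggVals;

    static AggVals shared_agg;   /* gathered results, written under lock */
    static pthread_mutex_t agg_mutex = PTHREAD_MUTEX_INITIALIZER;

    /* each thread accumulates locally, without any locking... */
    static void note_transaction(AggVals *local, double latency)
    {
        local->count++;
        local->sum_lat += latency;
    }

    /* ... and only takes the lock when the interval is over */
    static void flush_interval(AggVals *local)
    {
        pthread_mutex_lock(&agg_mutex);
        shared_agg.count   += local->count;
        shared_agg.sum_lat += local->sum_lat;
        pthread_mutex_unlock(&agg_mutex);
        local->count = 0;
        local->sum_lat = 0.0;
    }

The lock is taken once per aggregation interval per thread, not once per transaction, which is why the conflict probability should stay negligible.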

>> (2) The feature would not be available for the thread-emulation with
>> this approach, but I do not see this as a particular issue as I
>> think that it is pretty much only dead code and a maintenance burden.

> I'm not willing to investigate that, nor am I willing to implement
> another feature that works only sometimes (I've done that in the past,
> and I find it a bad practice).

Hmmm. Keeping an obsolete feature that significantly constrains how other features can be implemented, so basically a maintenance burden, does not look like best practice *either*.

> If someone else is willing to try to eliminate the thread emulation, I
> won't object to that.

Hmmm. I'll try to trigger a discussion in another thread to test the idea.

> But as I pointed out above, a simple fprintf is not going to work for
> the aggregated log - solving that will need more code (e.g. maintaining
> aggregated results for all threads, requiring additional locking, etc.).

The code for that is probably simple and short. My wish is to avoid external merge-sort post-processing if possible, as it is not especially cheap anyway, neither in code nor in time.

>> (3) Optimizing doLog from its current fprintf-based implementation
>> may be a good thing.

> That's probably true. The simplest thing we can do right now is
> buffering the data into larger chunks and writing those chunks.
> That amortizes the costs of locking.

If it is buffered process-wide, that would mean more locking. If it is buffered per thread, that would result in out-of-order logs. Would that be an issue? It would be fine with me.
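
Something like the following sketch, where each thread keeps its own chunk and only takes the lock once per flush; the names (LogBuffer, log_line, ...) are made up for the example:

    /* Per-thread chunked logging: one lock acquisition per chunk instead
     * of one per line; invented names, not pgbench code. */
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    #define LOG_CHUNK 8192

    typedef struct
    {
        char    data[LOG_CHUNK];
        size_t  used;
    } LogBuffer;

    static FILE *logfile;   /* shared by all threads */
    static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

    static void log_flush(LogBuffer *buf)
    {
        pthread_mutex_lock(&log_mutex);
        fwrite(buf->data, 1, buf->used, logfile);
        pthread_mutex_unlock(&log_mutex);
        buf->used = 0;
    }

    /* append one full line (assumed shorter than LOG_CHUNK); lines from
     * one thread stay contiguous, but may be out of order across threads */
    static void log_line(LogBuffer *buf, const char *line)
    {
        size_t  len = strlen(line);

        if (buf->used + len > LOG_CHUNK)
            log_flush(buf);
        memcpy(buf->data + buf->used, line, len);
        buf->used += len;
    }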

> Using O_APPEND, as suggested by Andres, seems like a promising idea.

I tried that with a shared file handle, but the impact seemed negligible. The figures I reported above used it, btw.

I also tried opening the same file in append mode from all threads, with positive performance effects, but then flushes did not occur at line boundaries, so there was some mangling in the result.
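
The mangling could presumably be avoided by making sure each line reaches the kernel as a single write(2) on the O_APPEND descriptor: stdio may flush its buffer mid-line, while POSIX appends each write atomically. A rough sketch of that, with invented names (open_log, log_printf):

    /* One write() per full line on an O_APPEND descriptor, so concurrent
     * appends from several threads should not split a line on POSIX
     * systems; invented names, not pgbench code. */
    #include <fcntl.h>
    #include <stdarg.h>
    #include <stdio.h>
    #include <unistd.h>

    static int open_log(const char *path)
    {
        return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    }

    static void log_printf(int fd, const char *fmt, ...)
    {
        char    line[1024];   /* truncation ignored in this sketch */
        va_list ap;
        int     len;

        va_start(ap, fmt);
        len = vsnprintf(line, sizeof(line), fmt, ap);
        va_end(ap);

        /* the whole formatted line goes out in a single write() */
        if (len > 0)
            write(fd, line, (size_t) len);
    }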

--
Fabien.

