Hello Andres,

With your worst-case figure and some rounding, it looks like this:

  #threads    collision probability    performance impact
    2             1/40                    1/3200
    4             1/7                     1/533
    8             0.7                     < 0.01 (about 1%)

This suggests that, with a pessimistic fprintf overhead ratio (read-only
load), the impact would remain small even with 8 threads doing 20000 tps each.

> I think math like this mostly disregards hardware realities.

Hmm. In my mind, doing the maths helps to understand what may be going on.

Note that it does not preclude checking afterwards that it does indeed correspond to reality :-)

The key point suggested by the maths is that if p*t << 1, where p is the fprintf share of the transaction time and t the number of threads, all is (or seems) well.
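Under the pairwise model that appears to underlie the table (an assumed reconstruction: about t(t-1)/2 · p collisions, each costing p/2 of a transaction on average), the impact is bounded by a function of p·t alone:

```latex
\mathrm{impact}(t) \approx \frac{t(t-1)}{2}\, p \cdot \frac{p}{2}
  = \frac{t(t-1)\, p^{2}}{4}
  < \frac{(p\,t)^{2}}{4}
```

With p = 1/40 and t = 8, p·t = 0.2 and the bound is 0.04/4 = 0.01, which is the "< 0.01" of the table.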

> You don't need actual lock contention to notice overhead.

The overhead assumed is 1/40 of the transaction time, from Tomas' measurements. Given the ~18000 tps (we are probably talking about an in-memory read-only load on the same host), a pgbench transaction seems to take about 0.06 ms, and fprintf about 0.0015 ms (1.5 µs).
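The arithmetic behind these figures, taking the ~18000 tps as one stream of back-to-back transactions:

```latex
T_{\mathrm{tx}} \approx \frac{1\ \mathrm{s}}{18000} \approx 55.6\ \mu\mathrm{s} \approx 0.06\ \mathrm{ms},
\qquad
T_{\mathrm{fprintf}} \approx \frac{T_{\mathrm{tx}}}{40} \approx \frac{60\ \mu\mathrm{s}}{40} = 1.5\ \mu\mathrm{s}
```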

> - Frequently acquiring an *uncontended* lock that resides in another socket's cache, and whose cacheline is dirty, requires relatively expensive cross-CPU transfers. That's all besides the overhead of doing the lock operation itself. A lock; xaddl; or whatever you end up using has a significant cost in itself. It implies a bus lock and cache flush, which is far from free.

OK, I did not assume an additional "lock cost". Do you have a figure? A quick search suggested a figure of around 100 ns for "lightweight mutexes", but the test conditions were unclear. If that is about right, adding this overhead does not change the above maths much.

> Additionally, we're quite possibly talking about more than 8 threads. I've frequently used pgbench with hundreds of threads, for good reasons imo.

Good for you. I do not have access to a host on which this would make sense :-)
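With hundreds of threads the p*t << 1 condition no longer holds at these rates. Taking a hypothetical t = 100, and assuming each thread could still sustain ~18000 tps with p = 1/40 and 1.5 µs per fprintf (a strong assumption on any host), p·t can be read as the fraction of time the shared output lock would need to be held; above 1, the log writes themselves, not the transactions, cap the aggregate throughput:

```latex
p\,t = \frac{100}{40} = 2.5 > 1,
\qquad
\mathrm{tps}_{\max} \approx \frac{1}{T_{\mathrm{fprintf}}} = \frac{1}{1.5\ \mu\mathrm{s}} \approx 6.7 \times 10^{5}
  < 100 \times 18000 = 1.8 \times 10^{6}
```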

> That all said, it's far from guaranteed that there's an actual problem
> here. If done right, i.e. the expensive formatting of the string is
> separated from the locked output to the kernel, it might end up being
> acceptable.

That is what I would like to assess. Formatting with snprintf into a local buffer (to avoid mallocing anything) and then calling fputs/write/whatever would probably help reduce the "contention" probability, if not the actual overhead.

--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)