Hello,

Yes but for a third thread (each on a physical core) it will be 1/40 +
1/40 and so on up to roughly 40/40 for 40 cores.

That is why I proposed a formula which depends on the number of threads.
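To make the dependence on the thread count concrete, here is a minimal sketch. It is not the formula from my earlier mail; it assumes, for illustration only, that each thread holds the lock for an independent fraction busy_fraction of its time. The function name and parameters are made up for this example.

```c
#include <assert.h>

/*
 * Probability that at least one of the other (nthreads - 1) threads holds
 * the lock when a thread tries to take it.
 * Assumption (illustrative): each thread holds the lock for an independent
 * fraction busy_fraction of its time.
 */
double collision_probability(double busy_fraction, int nthreads)
{
    double none_holding = 1.0;   /* probability that no other thread holds it */

    assert(nthreads >= 1);
    assert(busy_fraction >= 0.0 && busy_fraction <= 1.0);

    for (int i = 1; i < nthreads; i++)
        none_holding *= 1.0 - busy_fraction;

    return 1.0 - none_holding;
}
```

With busy_fraction = 1/40, three threads give a value just under 2/40, which matches the "1/40 + 1/40" sum quoted above. Under this independence assumption the plain sum is an upper bound, and it overstates the probability as the number of threads grows.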

[...] But they aren't constant, only close. It may or may not show up in this case, but I've noticed that the collision rate is often a lot higher than the probability would suggest. I'm not sure why.

If so, I would suggest that the probability is wrong, and try to understand why :-)

Moreover, they will write to the same cache lines for every fprintf,
and this is very, very bad even without atomic operations.

We're talking about transactions that involve network messages and possibly
disk I/Os on the server, so some cache issues within pgbench would not,
a priori, be the main performance driver.

Sure, but:
- good measurement is hard, and adding locking in fprintf makes
its timing noisier.

This really depends on the probability of lock collisions. If it is small enough, the impact would be negligible.

- it's against 'good practices' for scalable code.
Trivial code can show that the elapsed time for as few as four cores writing to the same cache line in a loop, without locking or synchronization, is greater than the elapsed time for running these four loops sequentially on one core. If they write to different cache lines, it scales linearly.
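The kind of trivial code referred to above could look like the following sketch. It is an illustration, not code from pgbench: the names, the four-thread count and the 64-byte cache line size are all assumptions. Each thread increments only its own counter, with no lock and no atomic operation; the only difference between the two layouts is whether the counters share a cache line.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>

#define NTHREADS   4
#define CACHE_LINE 64   /* assumed cache line size in bytes */

/* Packed layout: the four counters sit together in one cache line. */
alignas(CACHE_LINE) volatile long shared_line[NTHREADS];

/* Padded layout: the alignment puts each counter in its own cache line. */
struct padded_counter {
    alignas(CACHE_LINE) volatile long value;
};
struct padded_counter own_line[NTHREADS];

struct job {
    volatile long *counter;
    long iterations;
};

static void *bump(void *arg)
{
    struct job *job = arg;

    for (long i = 0; i < job->iterations; i++)
        (*job->counter)++;   /* plain increment: no lock, no atomic */
    return NULL;
}

/*
 * Run NTHREADS threads, each incrementing its own counter, using the padded
 * layout if padded is nonzero. Returns the sum of all counters after the
 * threads have been joined, or -1 if a thread could not be created.
 */
long run_counters(int padded, long iterations)
{
    pthread_t threads[NTHREADS];
    struct job jobs[NTHREADS];
    int started = 0;
    long total = 0;

    assert(iterations >= 0);

    for (int i = 0; i < NTHREADS; i++) {
        jobs[i].counter = padded ? &own_line[i].value : &shared_line[i];
        jobs[i].iterations = iterations;
        *jobs[i].counter = 0;
        if (pthread_create(&threads[i], NULL, bump, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += *jobs[i].counter;
    }
    return started == NTHREADS ? total : -1;
}
```

To compare the two layouts, time run_counters(0, n) against run_counters(1, n) for a large n, for instance with clock_gettime(CLOCK_MONOTONIC, ...) around each call. The actual ratio depends on the hardware and on how the threads are scheduled across cores.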

I'm not arguing about general scalability principles, which may or may not be relevant to the case at hand.

I'm discussing whether the proposed feature can be implemented much more simply with a mutex instead of the current proposal, which is on the heavy side and thus induces more maintenance effort later.

Now I agree that if there is a mutex, it must be held as briefly as possible and must not hinder performance significantly for pertinent use cases. Note that the overhead evaluation by Tomas is pessimistic, as it only involves read-only transactions for which all transaction details are logged. Note also that if you have 1000 cores to run pgbench and locking may be an issue, you could still use the per-thread logs.

The current discussion suggests that each thread should prepare the string off-lock (say with some sprintf) and then only lock when sending the string. This looks reasonable, but still needs to be validated (i.e. the lock time would indeed be very small with respect to the transaction time).
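A minimal sketch of that approach, to show how short the critical section can be. This is not the patch under discussion: log_transaction, log_mutex and the line format are hypothetical names and choices made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical mutex protecting the shared log file. */
static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: format one log line off-lock, then take the mutex
 * only for the write itself. Returns 0 on success, -1 on error.
 */
int log_transaction(FILE *logfile, int client_id, long txn_no, double latency_us)
{
    char line[128];
    int len;
    size_t written;

    assert(logfile != NULL);

    /* Off-lock: formatting only touches this thread's stack buffer. */
    len = snprintf(line, sizeof(line), "%d %ld %.0f\n",
                   client_id, txn_no, latency_us);
    if (len < 0 || (size_t) len >= sizeof(line))
        return -1;   /* encoding error or truncated line */

    /* Critical section: a single buffered write of the prepared line. */
    pthread_mutex_lock(&log_mutex);
    written = fwrite(line, 1, (size_t) len, logfile);
    pthread_mutex_unlock(&log_mutex);

    return written == (size_t) len ? 0 : -1;
}
```

Whether the time spent under the lock is really negligible compared to a transaction would still have to be measured, as said above.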

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)