On 1/15/22 06:12, Fujii Masao wrote:
On 2022/01/12 1:07, Tomas Vondra wrote:
I explored the idea of using page LSN a bit
Many thanks!
The patch from 22/12 simply checks if the change should/would wait for
sync replica, and if yes it WAL-logs the sequence increment. There's a
couple problems with this, unfortunately:
Yes, you're right.
So I think this approach is not really an improvement over WAL-logging
every increment. But there's a better way, I think - we don't need to
generate WAL, we just need to ensure we wait for it to be flushed at
transaction end in RecordTransactionCommit().
That is, instead of generating more WAL, simply update XactLastRecEnd
and then ensure RecordTransactionCommit flushes/waits etc. Attached is a
patch doing that - the changes in sequence.c are trivial, changes in
RecordTransactionCommit simply ensure we flush/wait even without XID
(this actually raises some additional questions that I'll discuss in a
separate message in this thread).
This approach (and also my previous proposal) seems to assume that the
value returned from nextval() should not be used until the transaction
executing that nextval() has been committed successfully. But I'm not
sure how many applications follow this assumption. Some application
might use the return value of nextval() instantly before issuing commit
command. Some might use the return value of nextval() executed in
rollbacked transaction.
IMO any application that assumes data from uncommitted transactions is
outright broken and we should not try to fix that because it's quite
futile (and likely will affect well-behaving applications).
The issue I'm trying to fix in this thread is much narrower - we don't
actually meet the guarantees for committed transactions (that only did
nextval without generating any WAL).
If we want to avoid duplicate sequence value even in those cases, ISTM
that the transaction needs to wait for WAL flush and sync rep before
nextval() returns the value. Of course, this might cause other issues
like performance decrease, though.
Right, something like that. But that'd hurt well-behaving applications,
because by doing the wait earlier (in nextval, not at commit) it
increases the probability of waiting.
FWIW I'm not against improvements in this direction, but it goes way
beyong fixing the original issue.
On btrfs, it looks like this (the numbers next to nextval are the cache
size, with 1 being the default):
client test master log-all page-lsn log-all page-lsn
-------------------------------------------------------------------
1 insert 829 807 802 97% 97%
nextval/1 16491 814 16465 5% 100%
nextval/32 24487 16462 24632 67% 101%
nextval/64 24516 24918 24671 102% 101%
nextval/128 32337 33178 32863 103% 102%
client test master log-all page-lsn log-all page-lsn
-------------------------------------------------------------------
4 insert 1577 1590 1546 101% 98%
nextval/1 45607 1579 21220 3% 47%
nextval/32 68453 49141 51170 72% 75%
nextval/64 66928 65534 66408 98% 99%
nextval/128 83502 81835 82576 98% 99%
The results seem clearly better, I think.
Thanks for benchmarking this! I agree that page-lsn is obviously better
than log-all.
For "insert" there's no drop at all (same as before), because as soon as
a transaction generates any WAL, it has to flush/wait anyway.
And for "nextval" there's a drop, but only with 4 clients, and it's much
smaller (53% instead of 97%). And increasing the cache size eliminates
even that.
Yes, but 53% drop would be critial for some applications that don't want
to increase the cache size for some reasons. So IMO it's better to
provide the option to enable/disable that page-lsn approach.
I disagree. This drop applies only to extremely simple transactions -
once the transaction does any WAL write, it disappears. Even if the
transaction does only a couple reads, it'll go away. I find it hard to
believe there's any serious application doing this.
So I think we should get it reliable (to not lose data after commit)
first and then maybe see if we can improve this.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company