On Fri, 17 Jan 2020 at 01:09, Alexey Kondratov <a.kondra...@postgrespro.ru> wrote: > > > I think we shouldn't touch the paths used by replication protocol. And > > don't we focus on how we make a change of a replication slot from SQL > > interface persistent? It seems to me that generaly we don't need to > > save dirty slots other than checkpoint, but the SQL function seems > > wanting the change to be saved immediately.
PLEASE do not make the streaming replication interface force flushes! The replication interface should not immediately flush changes to the slot replay position on advance. It should be marked dirty and left to be flushed by the next checkpoint. Doing otherwise potentially introduces a lot of unnecessary fsync()s and may have an unpleasant impact on performance. Clients of the replication protocol interface should be doing their own position tracking on the client side. They should not ever be relying on the server side restart position for correctness, since it can go backwards on crash and restart. Any that do rely on it are incorrect. I should propose a docs change that explains how the server and client restart position tracking interacts on both phy and logical rep since it's not really covered right now and naïve client implementations will be wrong. I don't really care if the SQL interface forces an immediate flush since it's never going to have good performance anyway. It's already impossible to write a strictly correct and crash safe client with the SQL interface. Adding forced flushing won't make that any better or worse. The SQL interface advances the slot restart position and marks the slot dirty *before the client has confirmed receipt of the data and flushed it to disk*. So there's a data loss window. If the client disconnects or crashes before all the data from that function call is safely flushed to disk it may lose the data, then be unable to fetch it again from the server because of the restart_lsn position advance. Really, we should add a "no_advance_position" option to the SQL interface, then expect the client to call a second function that explicitly advances the restart_lsn and confirmed_flush_lsn. Otherwise no SQL interface client can be crashsafe.