On 22.10.2019 20:22, Tomas Vondra wrote:
On Tue, Oct 22, 2019 at 11:01:48AM +0530, Dilip Kumar wrote:
On Tue, Oct 22, 2019 at 10:46 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
  In general, yours and Alexey's test results
show that there is merit in having workers apply such transactions.
  OTOH, as noted above [1], we are also worried about the performance
of rollbacks if we follow that approach.  I am not sure how much we
need to worry about rollbacks if commits are faster, but can we think
of recording the changes in memory and only writing them to a file if
the changes are above a certain threshold?  I think that might help
save I/O in many cases.  I am not very sure how much additional
workers can help if we do that, but they might still help.  I think we
need to do some tests and experiments to figure out the best
approach.  What do you think?
I agree with that point.  I think we might need to make some small
changes and run tests to see what the best method is for handling the
streamed changes at the subscriber end.


Tomas, Alexey, do you have any thoughts on this matter?  I think it is
important that we figure out the way to proceed with this patch.

[1] - https://www.postgresql.org/message-id/b25ce80e-f536-78c8-d5c8-a5df3e230785%40postgrespro.ru



I think the patch should do the simplest thing possible, i.e. what it
does today. Otherwise we'll never get it committed.


I have to agree with Tomas that keeping things as simple as possible should be the main priority right now. Otherwise, the entire patch set will go through the next release cycle without being committed even partially. At the same time, it solves an important problem from my perspective: it moves I/O overhead from the primary to the replica by streaming large transactions, which is a nice feature to have.

Later it would be possible to replace the logical apply worker with a bgworkers pool in a separate patch, if we decide that it is a viable solution. Anyway, regarding Amit's questions:

- I doubt that maintaining a separate buffer on the apply side before spilling to disk would help enough. We already have ReorderBuffer with the logical_work_mem limit, and if we exceeded that limit on the sender side, then most probably we will exceed it on the applier side as well, unless this new buffer is made significantly larger than logical_work_mem to keep multiple open xacts in memory (a sketch of the buffering idea follows this list).

- I still think that we should optimize the database for commits, not rollbacks. A bgworkers pool is dramatically slower for a rollbacks-only load, while being at least twice as fast for a commits-only one (a toy model of this asymmetry is also sketched below). I do not know how it will perform under a real-life load, but this drawback may be unacceptable for a general-purpose database like Postgres.

- Tomas' implementation of streaming with spilling does not have this bias between commits and aborts. However, it shows a noticeable performance drop (~5x slower than master [1]) for a large transaction consisting of many small rows, though it is not an order of magnitude slower.
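
To make the first point concrete, here is a minimal sketch of the buffering idea as I read it: keep streamed changes in memory and start spilling only after a size threshold is crossed. All names here (ApplyChangeBuffer, apply_buffer_add, SPILL_THRESHOLD) are invented for illustration and are not from the patch set:

/* Hypothetical sketch, not code from the patch set. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SPILL_THRESHOLD (64 * 1024)     /* invented per-xact limit */

typedef struct ApplyChangeBuffer
{
    char   *data;        /* in-memory accumulation of change records */
    size_t  used;
    size_t  allocated;
    FILE   *spill_file;  /* NULL until the threshold is crossed */
} ApplyChangeBuffer;

static void
apply_buffer_add(ApplyChangeBuffer *buf, const char *change, size_t len)
{
    if (buf->spill_file == NULL && buf->used + len > SPILL_THRESHOLD)
    {
        /* Threshold crossed: open a spill file, flush what we have. */
        buf->spill_file = tmpfile();
        fwrite(buf->data, 1, buf->used, buf->spill_file);
        buf->used = 0;
    }

    if (buf->spill_file != NULL)
    {
        /* Already spilling: append directly to the file. */
        fwrite(change, 1, len, buf->spill_file);
        return;
    }

    /* Still below the threshold: keep the change in memory. */
    if (buf->used + len > buf->allocated)
    {
        size_t newsize = buf->allocated ? buf->allocated : 1024;

        while (newsize < buf->used + len)
            newsize *= 2;
        buf->data = realloc(buf->data, newsize);
        buf->allocated = newsize;
    }
    memcpy(buf->data + buf->used, change, len);
    buf->used += len;
}

int
main(void)
{
    ApplyChangeBuffer buf = {0};
    char change[128];

    memset(change, 'x', sizeof(change));

    /* 1000 x 128 B = 128 kB total, so this crosses the 64 kB limit. */
    for (int i = 0; i < 1000; i++)
        apply_buffer_add(&buf, change, sizeof(change));

    printf("spilled to disk: %s\n", buf.spill_file ? "yes" : "no");

    /* On abort we just free the buffer; tmpfile() cleans up on close. */
    free(buf.data);
    if (buf.spill_file)
        fclose(buf.spill_file);
    return 0;
}

Note that an abort stays cheap here in both modes: we drop the buffer or close the temporary file without ever having touched the tables.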

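And a toy cost model for the second point, the commit/abort asymmetry of eager apply versus spilling. The weights, and the assumption that the pool fully overlaps apply with streaming, are invented; only the shape of the tradeoff matters:

/* Toy model, not a benchmark; all numbers are made up. */
#include <stdio.h>

#define APPLY_COST 10   /* executing one change against the database */
#define SPILL_COST  1   /* appending one change to a spill file */

int
main(void)
{
    long nchanges = 1000000;    /* one large streamed transaction */

    /*
     * bgworkers pool: changes are applied eagerly while the transaction
     * is still streaming, so almost no work is left at commit time, but
     * an abort throws away everything that was already applied.
     */
    long pool_commit = 0;
    long pool_abort = nchanges * APPLY_COST;

    /*
     * Spilling: nothing is applied until the commit record arrives, so
     * the full apply cost is paid after commit, while an abort merely
     * unlinks the spill file.
     */
    long spill_commit = nchanges * APPLY_COST;
    long spill_abort = nchanges * SPILL_COST;

    printf("commit: pool=%ld spill=%ld\n", pool_commit, spill_commit);
    printf("abort:  pool=%ld spill=%ld\n", pool_abort, spill_abort);
    return 0;
}
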
Another thing is that about a year ago I found some problems with MVCC/visibility and fixed them somehow [1]. If I understand correctly, Tomas adapted some of those fixes into his patch set, but I think that this part should be carefully reviewed again. I would be glad to check it, but right now I am a little confused by all the patch set variants in the thread. Which one is the latest? Does it still depend on 2PC decoding?

[1] https://www.postgresql.org/message-id/flat/40c38758-04b5-74f4-c963-cf300f9e5dff%40postgrespro.ru#98d06fefc88122385dacb2f03f7c30f7


Thanks for moving this patch forward!

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


