On Thu, Jan 19, 2023 at 3:44 PM shveta malik <shveta.ma...@gmail.com> wrote: > > On Thu, Jan 19, 2023 at 11:11 AM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > On Wed, Jan 18, 2023 at 12:09 PM Amit Kapila <amit.kapil...@gmail.com> > > wrote: > > > > > > On Fri, Jan 13, 2023 at 11:50 AM Peter Smith <smithpb2...@gmail.com> > > > wrote: > > > > > > > > Here are some review comments for patch v79-0002. > > > > > > > > > > So, this is about the latest > > > v84-0001-Stop-extra-worker-if-GUC-was-changed. > > > > > > > > > > > I feel this patch just adds more complexity for almost no gain: > > > > - reducing the 'max_apply_workers_per_suibscription' seems not very > > > > common in the first place. > > > > - even when the GUC is reduced, at that point in time all the workers > > > > might be in use so there may be nothing that can be immediately done. > > > > - IIUC the excess workers (for a reduced GUC) are going to get freed > > > > naturally anyway over time as more transactions are completed so the > > > > pool size will reduce accordingly. > > > > > > > > > > I am still not sure if it is worth pursuing this patch because of the > > > above reasons. I don't think it would be difficult to add this even at > > > a later point in time if we really see a use case for this. > > > Sawada-San, IIRC, you raised this point. What do you think? > > > > > > The other point I am wondering is whether we can have a different way > > > to test partial serialization apart from introducing another developer > > > GUC (stream_serialize_threshold). One possibility could be that we can > > > have a subscription option (parallel_send_timeout or something like > > > that) with some default value (current_timeout used in the patch) > > > which will be used only when streaming = parallel. Users may want to > > > wait for more time before serialization starts depending on the > > > workload (say when resource usage is high on a subscriber-side > > > machine, or there are concurrent long-running transactions that can > > > block parallel apply for a bit longer time). I know with this as well > > > it may not be straightforward to test the functionality because we > > > can't be sure how many changes would be required for a timeout to > > > occur. This is just for brainstorming other options to test the > > > partial serialization functionality. > > > > > > > Apart from the above, we can also have a subscription option to > > specify parallel_shm_queue_size (queue_size used to determine the > > queue between the leader and parallel worker) and that can be used for > > this purpose. Basically, configuring it to a smaller value can help in > > reducing the test time but still, it will not eliminate the need for > > dependency on timing we have to wait before switching to partial > > serialize mode. I think this can be used in production as well to tune > > the performance depending on workload. > > > > Yet another way is to use the existing parameter logical_decode_mode > > [1]. If the value of logical_decoding_mode is 'immediate', then we can > > immediately switch to partial serialize mode. This will eliminate the > > dependency on timing. The one argument against using this is that it > > won't be as clear as a separate parameter like > > 'stream_serialize_threshold' proposed by the patch but OTOH we already > > have a few parameters that serve a different purpose when used on the > > subscriber. For example, 'max_replication_slots' is used to define the > > maximum number of replication slots on the publisher and the maximum > > number of origins on subscribers. Similarly, > > wal_retrieve_retry_interval' is used for different purposes on > > subscriber and standby nodes. > > > > [1] - https://www.postgresql.org/docs/devel/runtime-config-developer.html > > > > -- > > With Regards, > > Amit Kapila. > > Hi Amit, > > On rethinking the complete model, what I feel is that the name > logical_decoding_mode is not something which defines modes of logical > decoding. We, I think, picked it based on logical_decoding_work_mem. > As per current implementation, the parameter 'logical_decoding_mode' > tells what happens when work-memory used by logical decoding reaches > its limit. So it is in-fact 'logicalrep_workmem_vacate_mode' or
Minor correction in what I said earlier: As per current implementation, the parameter 'logical_decoding_mode' more or less tells how to deal with workmem i.e. to keep it buffering with txns until it reaches its limit or immediately vacate it. > 'logicalrep_trans_eviction_mode'. So if it is set to immediate, > meaning vacate the work-memory immediately or evict transactions > immediately. Add buffered means the reverse (i.e. keep on buffering > transactions until we reach a limit). Now coming to subscribers, we > can reuse the same parameter. On subscriber as well, shared-memory > queue could be considered as its workmem and thus the name > 'logicalrep_workmem_vacate_mode' might look more relevant. > > On publisher: > logicalrep_workmem_vacate_mode=immediate, buffered. > > On subscriber: > logicalrep_workmem_vacate_mode=stream_serialize (or if we want to > keep immediate here too, that will also be fine). > > Thoughts? > And I am assuming it is possible to change the GUC name before the > coming release. If not, please let me know, we can brainstorm other > ideas. > > thanks > Shveta thanks Shveta