RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-19 Thread kuroda.hay...@fujitsu.com
Dear Hou, Thanks for updating the patch! Followings are my comments. === 01. applyparallelworker.c - SIZE_STATS_MESSAGE ``` /* * There are three fields in each message received by the parallel apply * worker: start_lsn, end_lsn and send_time. Because we have updated these * statistics in the

RE: TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File: "reorderbuffer.c", Line: 927, PID: 568639)

2022-10-17 Thread kuroda.hay...@fujitsu.com
Dear Sawada-san, Amit, > IIUC Change-2 is required in v16 and HEAD but not mandatory in v15 and > v14. The reason why we need Change-2 is that there is a case where we > mark only subtransactions as containing catalog change while not doing > that for its top-level transaction. In v15 and v14,

RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-17 Thread kuroda.hay...@fujitsu.com
Dear Osumi-san, > I mainly followed the steps there and > replaced the command "SELECT" for the remote table at 6-9 with "INSERT" > command. > Then, after waiting for few seconds, the "COMMIT" succeeded like below output, > even after the server stop of the worker side. > Additionally, the last

RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-17 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san, > Might be on slight different direction, but it looks to me a bit too > much to use WaitEventSet to check only if a socket is live or not. > > A quick search in the tree told me that we could use pqSocketCheck() > instead, and I think it would be the something that "could

RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-17 Thread kuroda.hay...@fujitsu.com
Dear Önder, Thanks for giving suggestions! > Still, the reestablish mechanism can be further simplified with > WL_SOCKET_CLOSED event such as the following (where we should probably > rename pgfdw_connection_check_internal): Sounds reasonable. I think it may be included in this patch. I will

RE: Add mssing test to test_decoding/meson.build

2022-10-13 Thread kuroda.hay...@fujitsu.com
Dear Michael, > Thanks, applied. This was an oversight of 7f13ac8, and the CI accepts > the test. I confirmed your commit. Great thanks! Best Regards, Hayato Kuroda FUJITSU LIMITED

Add mssing test to test_decoding/meson.build

2022-10-12 Thread kuroda.hay...@fujitsu.com
Hi hackers, I found that the test catalog_change_snapshot was missed in test_decoding/meson.build file. PSA the patch to fix it. Best Regards, Hayato Kuroda FUJITSU LIMITED 0001-add-missing-test-to-test_decoding-meson.build.patch Description:

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-12 Thread kuroda.hay...@fujitsu.com
Dear Önder, Thanks for updating the patch! I think your saying seems reasonable. I have no comments anymore now. Thanks for updating so quickly. Best Regards, Hayato Kuroda FUJITSU LIMITED

RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-10-12 Thread kuroda.hay...@fujitsu.com
Dear Sawada-san, > FYI, as I just replied to the related thread[1], the assertion failure > in REL14 and REL15 can be fixed by the patch proposed there. So I'd > like to see how the discussion goes. Regardless of this proposed fix, > the patch proposed by Kuroda-san is required for HEAD, REL14,

RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-10-11 Thread kuroda.hay...@fujitsu.com
Dear Sawada-san, Thank you for reviewing HEAD patch! PSA v3 patch. > +# Test that we can force the top transaction to do timetravel when one of sub > +# transactions needs that. This is necessary when we restart decoding > from RUNNING_XACT > +# without the wal to associate subtransaction to its

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-11 Thread kuroda.hay...@fujitsu.com
Dear Önder, Thank you for updating the patch! > It is not about CREATE INDEX being async. It is about pg_stat_all_indexes > being async. If we do not wait, the tests become flaky, because sometimes > the update has not been reflected in the view immediately. Make sense, I forgot how stats

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-10 Thread kuroda.hay...@fujitsu.com
Dear Önder, Thanks for updating the patch! I checked yours and almost good. Followings are just cosmetic comments. === 01. relation.c - GetCheapestReplicaIdentityFullPath ``` * The reason that the planner would not pick partial indexes and indexes * with only expressions

RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-06 Thread kuroda.hay...@fujitsu.com
Dear Hou, > Thanks for the suggestion. > > I tried to add a WaitLatch, but it seems affect the performance > because the Latch might not be set when leader send some > message to parallel apply worker which means it will wait until > timeout. Yes, currently it leader does not notify anything.

RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-06 Thread kuroda.hay...@fujitsu.com
Dear Hou, I put comments for v35-0001. 01. catalog.sgml ``` + Controls how to handle the streaming of in-progress transactions: + f = disallow streaming of in-progress transactions, + t = spill the changes of in-progress transactions to + disk and apply at once after the

RE: Perform streaming logical transactions by background workers and parallel apply

2022-10-06 Thread kuroda.hay...@fujitsu.com
Dear Amit, > Can't we use WaitLatch in the case of SHM_MQ_WOULD_BLOCK as we are > using it for the same case at some other place in the code? We can use > the same nap time as we are using in the leader apply worker. I'm not sure whether such a short nap time is needed or not. Because unlike

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-10-05 Thread kuroda.hay...@fujitsu.com
Dear Önder, Thank you for updating the patch! At first I replied to your comments. > My thinking on those functions is that they should probably stay > in src/backend/replication/logical/relation.c. My main motivation is that > those functions are so much tailored to the purposes of this file

RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread kuroda.hay...@fujitsu.com
Dear Önder, > As far as I can see this patch is mostly useful for detecting the failures > on the initial remote command. This is especially common when the remote > server does a failover/switchover and postgres_fdw uses a cached connection > to access to the remote server. Sounds reasonable.

RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread kuroda.hay...@fujitsu.com
Dear Önder, Thank you for being interest to my patch! Your suggestions will be included to newer version. > In other words, what is the trade-off for calling > pgfdw_connection_check_internal() inside GetConnection() when we are about > to use a "cached" connection? I think that might simplify

RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread kuroda.hay...@fujitsu.com
Dear Andres, > This seems to reliably fail on windows. See Thanks for reporting. Actually this feature cannot be used on Windows machine. To check the status of each socket that connects to the foreign server, the socket event WL_SOCKET_CLOSED is used. The event is only enabled on some OSes, and

RE: Question: test "aggregates" failed in 32-bit machine

2022-09-29 Thread kuroda.hay...@fujitsu.com
Dear Tom, > Hmm, we're not seeing any such failures in the buildfarm's 32-bit > animals, so there must be some additional condition needed to make > it happen. Can you be more specific? Hmm, I was not sure about additional conditions, sorry. I could reproduce with followings steps: $ git

RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-29 Thread kuroda.hay...@fujitsu.com
Dear Hou, Thanks for updating patch. I will review yours soon, but I reply to your comment. > > 04. applyparallelworker.c - LogicalParallelApplyLoop() > > > > ``` > > + shmq_res = shm_mq_receive(mqh, , , false); > > ... > > + if (ConfigReloadPending) > > +

Question: test "aggregates" failed in 32-bit machine

2022-09-29 Thread kuroda.hay...@fujitsu.com
Hi hackers, While running `make check LANC=C` with 32-bit virtual machine, I found that it was failed at "aggregates". PSA the a1b3bca1_regression.diffs. IIUC that part has been added by db0d67db. I checked out the source, tested, and got same result. PSA the db0d67db_regression.diffs I'm not

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-27 Thread kuroda.hay...@fujitsu.com
Dear Önder: Thank you for updating patch! Your documentation seems OK, and I could not find any other places to be added Followings are my comments. 01 relation.c - general Many files are newly included. I was not sure but some codes related with planner may be able to move to

RE: [small patch] Change datatype of ParallelMessagePending from "volatile bool" to "volatile sig_atomic_t"

2022-09-27 Thread kuroda.hay...@fujitsu.com
Dear Michael, > Yeah, at least as of the cancel callback psql_cancel_callback() that > handle_sigint() would call on SIGINT as this is set by psql. So it > does not seem right to use a boolean rather than a sig_atomic_t in > this case, as well. PSA fix patch. Note that

RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-27 Thread kuroda.hay...@fujitsu.com
Dear Wang, Followings are comments for your patchset. 0001 01. launcher.c - logicalrep_worker_stop_internal() ``` + + Assert(LWLockHeldByMe(LogicalRepWorkerLock)); + ``` I think it should be Assert(LWLockHeldByMeInMode(LogicalRepWorkerLock, LW_SHARED)) because the lock is

RE: [small patch] Change datatype of ParallelMessagePending from "volatile bool" to "volatile sig_atomic_t"

2022-09-26 Thread kuroda.hay...@fujitsu.com
Dear Michael, > Done this one. I have scanned the code, but did not notice a similar > mistake. I found your commit, thanks! > It is worth noting that we have only one remaining "volatile > bool" in the headers now. Maybe you mentioned about sigint_interrupt_enabled, and it also seems to be

RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-26 Thread kuroda.hay...@fujitsu.com
Dear Wang, Thanks for updating patch!... but cfbot says that it cannot be accepted [1]. I thought the header should be included, like miscadmin.h. [1]: https://cirrus-ci.com/task/5909508684775424 Best Regards, Hayato Kuroda FUJITSU LIMITED

[small patch] Change datatype of ParallelMessagePending from "volatile bool" to "volatile sig_atomic_t"

2022-09-26 Thread kuroda.hay...@fujitsu.com
Hi hackers, While reviewing [1], I and Amit noticed that a flag ParallelMessagePending is defined as "volatile bool", but other flags set by signal handlers are defined as "volatile sig_atomic_t". The datatype has been defined in standard C, and it says that variables referred by signal

RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-22 Thread kuroda.hay...@fujitsu.com
Hi Amit, > > I checked other flags that are set by signal handlers, their datatype > > seemed to > be sig_atomic_t. > > Is there any reasons that you use normal bool? It should be changed if not. > > > > It follows the logic similar to ParallelMessagePending. Do you see any > problem with it?

RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-22 Thread kuroda.hay...@fujitsu.com
Dear Wang, Thanks for updating the patch! Followings are comments for v33-0001. === libpqwalreceiver.c 01. inclusion ``` +#include "catalog/pg_subscription.h" ``` We don't have to include it because the analysis of parameters is done at caller. === launcher.c 02. logicalrep_worker_launch()

RE: [Proposal] Add foreign-server health checks infrastructure

2022-09-21 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Thanks for checking! > These failed to be applied to the master branch cleanly. Could you update > them? PSA rebased patches. I reviewed my myself and they contain changes. E.g., move GUC-related code to option.c. > + this option relies on kernel events exposed by Linux,

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-20 Thread kuroda.hay...@fujitsu.com
> One last thing - do you think there is any need to mention this > behaviour in the pgdocs, or is OK just to be a hidden performance > improvement? FYI - I put my opinion. We have following sentence in the logical-replication.sgml: ``` ... If the table does not have any suitable key, then it

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-19 Thread kuroda.hay...@fujitsu.com
Dear Önder, Thanks for updating the patch! I will check it later. Currently I just reply to your comments. > Hmm, I couldn't realize this comment earlier. So you suggest "slow" here > refers to the additional function call "GetRelationIdentityOrPK"? If so, yes > I'll update that. Yes I meant

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

2022-09-15 Thread kuroda.hay...@fujitsu.com
Dear Önder, Thank you for proposing good feature. I'm also interested in the patch, So I started to review this. Followings are initial comments. === For execRelation.c 01. RelationFindReplTupleByIndex() ``` /* Start an index scan. */ InitDirtySnapshot(snap); - scan =

RE: logical replication restrictions

2022-09-14 Thread kuroda.hay...@fujitsu.com
Hi, Sorry for noise but I found another bug. When the 032_apply_delay.pl is modified like following, the test will be always failed even if my patch is applied. ``` # Disable subscription. worker should die immediately. -$node_subscriber->safe_psql('postgres', - "ALTER SUBSCRIPTION tap_sub

RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-13 Thread kuroda.hay...@fujitsu.com
Hi, > > 01. filename > > The word-ordering of filename seems not good > > because you defined the new worker as "parallel apply worker". > > > > I think in the future we may have more files for apply work (like > applyddl.c for DDL apply work), so it seems okay to name all apply > related files

RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-13 Thread kuroda.hay...@fujitsu.com
Dear Hou-san, > I will dig your patch more, but I send partially to keep the activity of the > thread. More minor comments about v28. === About 0002 For 015_stream.pl 14. check_parallel_log ``` +# Check the log that the streamed transaction was completed successfully +# reported by

RE: Perform streaming logical transactions by background workers and parallel apply

2022-09-12 Thread kuroda.hay...@fujitsu.com
Dear Hou-san, Thank you for updating the patch! Followings are comments for v28-0001. I will dig your patch more, but I send partially to keep the activity of the thread. === For applyparallelworker.c 01. filename The word-ordering of filename seems not good because you defined the new worker

RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-06 Thread kuroda.hay...@fujitsu.com
Dear Amit, Thanks for giving comments! > Did you get this new assertion failure after you applied the patch for > the first failure? Because otherwise, how can you reach it with the > same test case? The first failure is occurred only in the HEAD, so I did not applied the first patch to REL14

RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-06 Thread kuroda.hay...@fujitsu.com
> I was not sure what's the proper way to fix it. > The solution I've thought at first was transporting all invalidations from > sub to top > like ReorderBufferTransferSnapToParent(), > but I do not know its side effect. Moreover, how do we deal with > ReorderBufferChange? > Should we transfer

RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-06 Thread kuroda.hay...@fujitsu.com
Dear hackers, > I agreed both that DEBUG2 messages are still useful but we should not > change the condition for output. So I prefer the idea suggested by Amit. PSA newer patch, which contains the fix and test. > > I have not verified but I think we need to backpatch this till 14 > > because

RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-02 Thread kuroda.hay...@fujitsu.com
> > I'm basically fine, too. But this is a bug that needs back-patching > > back to 10. > > > > I have not verified but I think we need to backpatch this till 14 > because prior to that in DecodeCommit, we use to set the top-level txn > as having catalog changes based on the if there are

RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-01 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san, Dilip, Thank you for replying! > > It seems that SnapBuildCommitTxn() is already taking care of adding > > the top transaction to the committed transaction if any subtransaction > > has the catalog changes, it has just missed setting the flag so I > > think just setting the

RE: test_decoding assertion failure for the loss of top-sub transaction relationship

2022-09-01 Thread kuroda.hay...@fujitsu.com
Hi Hackers, > Therefore, this leads to the failure for the assert that can check > the consistency that when one sub transaction modifies the catalog, > its top transaction should be marked so as well. > > I feel we need to remember the relationship between top transaction and sub > transaction

RE: patch: Add missing descriptions for rmgr APIs

2022-08-29 Thread kuroda.hay...@fujitsu.com
> Your observation seems correct to me but you have not updated the > comment for the mask. Is there a reason for the same? Oh, it seems that I attached wrong one. There is no reason. PSA the newer version. Best Regards, Hayato Kuroda FUJITSU LIMITED v2-0001-add-a-missing-comment.patch

patch: Add missing descriptions for rmgr APIs

2022-08-28 Thread kuroda.hay...@fujitsu.com
Hi hackers, While reading codes related with logical decoding, I thought that following comment in rmgrlist.h is not consistent. > /* symbol name, textual name, redo, desc, identify, startup, cleanup */ This comment describes a set of APIs that the resource manager should have, but functions

RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-23 Thread kuroda.hay...@fujitsu.com
Dear Wang, Followings are my comments about v23-0003. Currently I do not have any comments about 0002 and 0004. 09. general It seems that logicalrep_rel_mark_parallel_apply() is always called when relations are opened on the subscriber-side, but is it really needed? There checks are required

RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-22 Thread kuroda.hay...@fujitsu.com
Dear Wang, Thank you for updating the patch! Followings are comments about v23-0001 and v23-0005. v23-0001 01. logical-replication.sgml + + When the streaming mode is parallel, the finish LSN of + failed transactions may not be logged. In that case, it may be necessary to + change the

RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-09 Thread kuroda.hay...@fujitsu.com
Hi Wang, > 6.a > > It seems that the upper line represents the apply background worker, but I > think > last_msg_send_time and last_msg_receipt_time should be null. > Is it like initialization mistake? I checked again about the issue. Attributes worker->last_send_time, worker->last_recv_time,

RE: Perform streaming logical transactions by background workers and parallel apply

2022-08-09 Thread kuroda.hay...@fujitsu.com
Dear Wang, Thanks for updating patch sets! Followings are comments about v20-0001. 1. config.sgml ``` Specifies maximum number of logical replication workers. This includes both apply workers and table synchronization workers. ``` I think you can add a

RE: Perform streaming logical transactions by background workers and parallel apply

2022-07-27 Thread kuroda.hay...@fujitsu.com
Dear Wang, I found further comments about the test code. 11. src/test/regress/sql/subscription.sql ``` -- fail - streaming must be boolean CREATE SUBSCRIPTION regress_testsub CONNECTION 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = false, streaming = foo); ``` The comment

RE: Support load balancing in libpq

2022-07-27 Thread kuroda.hay...@fujitsu.com
Dear Jelte, > With plain Postgres this assumption is probably correct. But the main reason > I'm interested in this patch was because I would like to be able to load > balance across the workers in a Citus cluster. These workers are all > primaries. > Similar usage would likely be possible with

RE: Perform streaming logical transactions by background workers and parallel apply

2022-07-27 Thread kuroda.hay...@fujitsu.com
Dear Wang-san, Hi, I'm also interested in the patch and I started to review this. Followings are comments about 0001. 1. terminology In your patch a new worker "apply background worker" has been introduced, but I thought it might be confused because PostgreSQL has already the worker

RE: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger

2022-07-15 Thread kuroda.hay...@fujitsu.com
Dear Hou-san, > Thanks for having a look. It was a bit difficult to add a test for this. > Because we currently don't have a user function which can return these > collected ObjectAddresses for ALTER TABLE. And It seems we don't have tests > for > already collected ObjectAddresses as well :( >

RE: Support load balancing in libpq

2022-07-14 Thread kuroda.hay...@fujitsu.com
Dear Jelte, I like your idea. But do we have to sort randomly even if target_session_attr is set to 'primary' or 'read-write'? I think this parameter can be used when all listed servers have same data, and we can assume that one of members is a primary and others are secondary. In this case

RE: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger

2022-07-14 Thread kuroda.hay...@fujitsu.com
Hi, > > I noticed that we didn't collect the ObjectAddress returned by > > ATExec[Attach|Detach]Partition. I think collecting this information can > > make it > > easier for users to get the partition OID of the attached or detached table > > in > > the event trigger. So how about collecting it

RE: Multi-Master Logical Replication

2022-06-14 Thread kuroda.hay...@fujitsu.com
Dear Takahashi-san, Thanks for giving feedbacks! > > I don't know if it requires the kind of code you are thinking but I > > agree that it is worth considering implementing it as an extension. > > I think the other advantage to implement as an extension is that users could > install the

RE: Multi-Master Logical Replication

2022-06-06 Thread kuroda.hay...@fujitsu.com
Dear hackers, I found another use-case for LRG. It might be helpful for migration. LRG for migration -- LRG may be helpful for machine migration, OS upgrade, or PostgreSQL itself upgrade. Assumes that users want to migrate database to other environment,

RE: Multi-Master Logical Replication

2022-05-18 Thread kuroda.hay...@fujitsu.com
Hi hackers, I created a small PoC. Please see the attached patches. REQUIREMENT Before patching them, patches in [1] must also be applied. DIFFERENCES FROM PREVIOUS DESCRIPTIONS * LRG is now implemented as SQL functions, not as a contrib module. * New tables are added as system catalogs.

RE: Multi-Master Logical Replication

2022-04-28 Thread kuroda.hay...@fujitsu.com
Dear Laurenz, Thank you for your interest in our works! > I am missing a discussion how replication conflicts are handled to > prevent replication from breaking or the databases from drifting apart. Actually we don't have plans for developing the feature that avoids conflict. We think that it

RE: Handle infinite recursion in logical replication setup

2022-04-07 Thread kuroda.hay...@fujitsu.com
Dear Peter, > FYI, here is a test script that is using the current patch (v6) to > demonstrate a way to share table data between different numbers of > nodes (up to 5 of them here). Thanks for sharing your script! It's very helpful for us. While reading your script, however, I had a question

RE: Logical replication timeout problem

2022-03-28 Thread kuroda.hay...@fujitsu.com
Dear Amit, Wang, > > I think maybe we do not need to deal with this use case. > > The maximum number of table columns allowed by PG is 1600 > > (macro MaxHeapAttributeNumber), and after loop through all columns in the > > function logicalrep_write_tuple, the function OutputPluginWrite will be >

RE: Logical replication timeout problem

2022-03-27 Thread kuroda.hay...@fujitsu.com
Dear Wang-san, Thank you for updating! ...but it also cannot be applied to current HEAD because of the commit 923def9a533. Your patch seems to conflict the adding an argument of logicalrep_write_insert(). It allows specifying columns to publish by skipping some columns in

RE: Logical replication timeout problem

2022-03-24 Thread kuroda.hay...@fujitsu.com
Dear Amit, > It seems by mistake you have removed the changes from pgoutput_message > and pgoutput_truncate functions. I have added those back. > Additionally, I made a few other changes: (a) moved the function > UpdateProgress to pgoutput.c as it is not used outside it, (b) change > the new

RE: Handle infinite recursion in logical replication setup

2022-03-14 Thread kuroda.hay...@fujitsu.com
Dear Vignesh, > Thanks for kind explanation. > I read above and your doc in 0002, and I put some comments. I forgot a comment about 0002 doc. 5. create_subscription.sgml - about your example Three possibilities were listed in the doc, but I was not sure about b) case. In the situation Node1

RE: Handle infinite recursion in logical replication setup

2022-03-14 Thread kuroda.hay...@fujitsu.com
Dear Vignesh, Thank you for updating your patch! > Let's consider an existing Multi master logical replication setup > between Node1 and Node2 that is created using the following steps: > a) Node1 - Publication publishing employee table - pub1 > b) Node2 - Subscription subscribing from

RE: Handle infinite recursion in logical replication setup

2022-03-11 Thread kuroda.hay...@fujitsu.com
Hi Vegnesh, While considering about second problem, I was very confusing about it. I'm happy if you answer my question. > To handle this if user has specified only_local option, we could throw > a warning or error out while creating subscription in this case, we > could have a column

RE: Logical replication timeout problem

2022-03-08 Thread kuroda.hay...@fujitsu.com
Dear Wang, Thank you for updating! > > Do we need adding a test for them? I think it can be added to 100_bugs.pl. > > Actually I tried to send PoC, but it does not finish to implement that. > > I'll send if it is done. > I'm not sure if it is worth it. > Because the reproduced test of this bug

RE: Logical replication timeout problem

2022-03-08 Thread kuroda.hay...@fujitsu.com
Dear Wang, Thank you for updating the patch! Good self-reviewing. > And I looked into the function WalSndUpdateProgress. I found function > WalSndUpdateProgress try to record the time of some message(by function > LagTrackerWrite) sent to subscriber, such as in function pgoutput_commit_txn.

RE: Handle infinite recursion in logical replication setup

2022-03-06 Thread kuroda.hay...@fujitsu.com
Dear Vignesh, > I felt changing only_local option might be useful for the user while > modifying the subscription like setting it with a different set of > publications. Changes for this are included in the v2 patch attached > at [1]. +1, thanks. I'll post if I notice something to say. > Shall

RE: Handle infinite recursion in logical replication setup

2022-03-06 Thread kuroda.hay...@fujitsu.com
Dear Peter, > > So, why does the patch use syntax option 1? IMU it might be useful for the following case. Assuming that multi-master configuration with node1, node2. Node1 has a publication pub1 and a subscription sub2, node2 has pub2 and sub1. From that situation, please consider that new

RE: Logical replication timeout problem

2022-03-04 Thread kuroda.hay...@fujitsu.com
Dear Wang, > Some codes were added in ReorderBufferProcessTXN() for treating DDL, My mailer went wrong, so I'll put comments again. Sorry. Some codes were added in ReorderBufferProcessTXN() for treating DDL, but I doubted updating accept_writes is needed. IMU, the parameter is read by

RE: [Proposal] Add foreign-server health checks infrastructure

2022-03-03 Thread kuroda.hay...@fujitsu.com
Hi Hackers, > It's not happy, but I'm not sure about a good solution. I made a timer > reschedule > if connection lost had detected. But if queries in the transaction are quite > short, > catching SIGINT may be fail. Attached uses another way: sets pending flags again if DoingCommandRead is

RE: Logical replication timeout problem

2022-03-03 Thread kuroda.hay...@fujitsu.com
Dear Wang, > Attach the new patch. [suggestion by Kuroda-San] > 1. Fix the typo. > 2. Improve comment style. > 3. Fix missing consideration. > 4. Add comments to clarifies above functions and function calls. Thank you for updating the patch! I confirmed they were fixed. ```

RE: Handle infinite recursion in logical replication setup

2022-03-01 Thread kuroda.hay...@fujitsu.com
Hi Vignesh, > In logical replication, currently Walsender sends the data that is > generated locally and the data that are replicated from other > instances. This results in infinite recursion in circular logical > replication setup. Thank you for good explanation. I understand that this fix can

RE: Logical replication timeout problem

2022-02-28 Thread kuroda.hay...@fujitsu.com
Dear Wang, > Attached a new patch that addresses following improvements I have got so far > as > comments: > 1. Consider other changes that need to be skipped(truncate, DDL and function > calls pg_logical_emit_message). [suggestion by Ajin, Amit] > (Add a new function SendKeepaliveIfNecessary

RE: Logical replication timeout problem

2022-02-24 Thread kuroda.hay...@fujitsu.com
Dear Wang, Thank you for teaching some backgrounds about the patch. > According to our discussion, we need to send keepalive messages to subscriber > when skipping changes. > One approach is that **for each skipped change**, we try to send keepalive > message by calculating whether a timeout

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-23 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Thank you for your quick reviewing! I attached new version. I found previous patches have wrong name. Sorry. > The connection check timer is re-scheduled repeatedly even while the backend > is > in idle state or is running a local transaction that doesn't access to any >

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread kuroda.hay...@fujitsu.com
Cfbot is still angry because of missing PGDLLIMPORT, so attached. Best Regards, Hayato Kuroda FUJITSU LIMITED v12_0001_add_checking_infrastracture.patch Description: v12_0001_add_checking_infrastracture.patch v12_0002_add_health_check.patch Description: v12_0002_add_health_check.patch

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, > cfbot is reporting that the 0002 patch fails to be applied cleanly. Could you > update > the patch? > http://cfbot.cputube.org/patch_37_3388.log Thanks for reporting and sorry for inconvenience. I repo was not latest version. Attached can be applied to 52e4f0c Best Regards,

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san, Fujii-san, > > I understood here as removing following mechanism from core: > > > > * disable timeout at end of tx. > > * skip if held off or read commands > > I think we're on the same page. Anyway query cancel interrupt is > ignored while rading input. > > > > - If an

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, > Isn't it a very special case where many FDWs use their own user timeouts? > Could > you tell me the assumption that you're thinking, especially how many FDWs are > working? I came up with the case like star schema, which postgres database connects data store. If each dbms are

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-17 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san, > I think we just don't need to add the special timeout kind to the > core. postgres_fdw can use USER_TIMEOUT and it would be suffiction to > keep running health checking regardless of transaction state then fire > query cancel if disconnection happens. As I said in the

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-16 Thread kuroda.hay...@fujitsu.com
> I understood here as removing following mechanism from core: > > * disable timeout at end of tx. While reading again and this part might be wrong. Sorry for inconvenience. But anyway some codes should be (re)moved from core, right? Best Regards, Hayato Kuroda FUJITSU LIMITED

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-16 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san, Thank you for giving your suggestions. I want to confirm your saying. > FWIW, I'm not sure this feature necessarily requires core support > dedicated to FDWs. The core have USER_TIMEOUT feature already and > FDWs are not necessarily connection based. It seems better if FDWs

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-15 Thread kuroda.hay...@fujitsu.com
Dear Hackers, I found patches we depend have been committed, so rebased. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=50e570a59e7f86bb41f029a66b781fc79b8d50f1 In this version there is a little bit change in part of postgres_fdw. A system checking by

RE: Logical replication timeout problem

2022-02-08 Thread kuroda.hay...@fujitsu.com
Dear Wang, Thank you for making a patch. I applied your patch and confirmed that codes passed regression test. I put a short reviewing: ``` + static int skipped_changes_count = 0; + /* +* Conservatively, at least 150,000 changes can be skipped in 1s. +* +*

RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-02 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Thank you for good suggestions. > This logic sounds complicated to me. I'm afraid that FDW developers may a bit > easily misunderstand the logic and make the bug in their FDW. > Isn't it simpler to just disable the timeout in core whenever the transaction > ends > whether

RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-31 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Thank you for reviewing! I attached the latest version. > When more than one FDWs are used, even if one FDW calls this function to > disable the timeout, its remote-server-check-callback can still be called. Is > this > OK? > Please imagine the case where two FDWs are used and

RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-20 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Zhihong, I attached the latest version. The biggest change is that callbacks are no longer un-registered at the end of transactions. FDW developer must enable or disable timeout instead, via new APIs. The timer will be turned on when: * new GUC is >= 0, and * anyone calls

RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-20 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Thank you for your interest! I'll post new version within several days. > > Yeah, remote-checking timeout will be enable even ifa local transaction is > opened. > > In my understanding, postgres cannot distinguish whether opening > > transactions > > are using only local object

RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-19 Thread kuroda.hay...@fujitsu.com
Dear Zhihong, Thank you for reviewing! And I have to apologize for the late. I'll post new patchset later. > + UnregisterAllCheckingRemoteServersCallback(); > > UnregisterAllCheckingRemoteServersCallback > -> UnregisterAllCheckingRemoteServersCallbacks Will fix. > +

RE: Allow escape in application_name

2021-12-27 Thread kuroda.hay...@fujitsu.com
Dear Sawada-san, > Good idea. But the application_name is actually truncated to 63 > characters (NAMEDATALEN - 1)? If so, we need to do substring(... for > 63) instead. Yeah, the parameter will be truncated as one less than NAMEDATALEN: ``` max_identifier_length (integer) Reports the maximum

RE: Allow escape in application_name

2021-12-26 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, > Attached is the patch that adds the regression test for > postgres_fdw.application_name. Could you review this? > > As Horiguchi-san suggested at [1], I split the test into two, and then > tweaked them > as follows. > > 1. Set application_name option of a server option to

RE: Allow escape in application_name

2021-12-26 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Horiguchi-san, I confirmed that the feature was committed but reverted the test. Now I'm checking buildfarm. But anyway I want to say thank you for your contribution! Best Regards, Hayato Kuroda FUJITSU LIMITED

RE: [PATCH] pg_stat_toast v0.3

2021-12-21 Thread kuroda.hay...@fujitsu.com
Dear Gunnar, > The attached v0.3 includes > * columns "storagemethod" and "compressmethod" added as per Hayato > Kuroda's suggestion I prefer your implementation that referring another system view. > * gathering timing information Here is a minor comment: I'm not sure when we should start to

RE: [PATCH] pg_stat_toast

2021-12-19 Thread kuroda.hay...@fujitsu.com
Dear Gunnar, > postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text); > postgres=# INSERT INTO test SELECT > i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM > generate_series(0,10) x(i); > postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public'; > -[

RE: Allow escape in application_name

2021-12-16 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Sorry I forgot replying your messages. > >>if (strcmp(keywords[i], "application_name") == 0 && > >>values[i] != NULL && *(values[i]) != '\0') > > > > I'm not sure but do we have a case that values[i] becomes NULL > > even if

RE: Allow escape in application_name

2021-12-16 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, > Thanks for the review! At first I pushed 0001 patch. I found your commit. Thanks! > BTW, 0002 patch adds the regression test that checks > pg_stat_activity.application_name. But three months before, we added the > similar > test when introducing postgres_fdw.application_name

RE: Allow escape in application_name

2021-12-15 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Thank you for updating! I read your patches and I have only one comment. > if (strcmp(keywords[i], "application_name") == 0 && > values[i] != NULL && *(values[i]) != '\0') I'm not sure but do we have a case that values[i]

  1   2   3   >