On Wed, Jan 25, 2023 at 11:49:27AM -0500, Tom Lane wrote:
> Right. I fixed some other infelicities and pushed it.
Thanks!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
On Wed, Jan 25, 2023 at 04:12:00PM +0900, Kyotaro Horiguchi wrote:
> At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart
> wrote in
>> Here is a first attempt at a patch. I scanned through all the existing
>> uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
>> else
Kyotaro Horiguchi writes:
> At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart
> wrote in
>> Here is a first attempt at a patch. I scanned through all the existing
>> uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
>> else that needed adjusting.
> There seems to
At Tue, 24 Jan 2023 10:42:17 -0800, Nathan Bossart
wrote in
> On Tue, Jan 24, 2023 at 01:13:55PM -0500, Tom Lane wrote:
> > Either that comment needs to be rewritten or we need to invent some
> > more macros.
>
> Here is a first attempt at a patch. I scanned through all the existing
> uses of
On Tue, Jan 24, 2023 at 01:13:55PM -0500, Tom Lane wrote:
> Either that comment needs to be rewritten or we need to invent some
> more macros.
Here is a first attempt at a patch. I scanned through all the existing
uses of InvalidDsaPointer and DSM_HANDLE_INVALID and didn't notice anything
else
Nathan Bossart writes:
> IMO ideally there should be a DSA_HANDLE_INVALID and DSHASH_HANDLE_INVALID
> for use with dsa_handle and dshash_table_handle, respectively. But your
> patch does seem like an improvement.
Yeah, particularly given that dsa.h says
/*
* The handle for a dsa_area is
On Tue, Jan 24, 2023 at 02:55:07AM +, houzj.f...@fujitsu.com wrote:
> I noticed one minor thing in this commit.
>
> -
> LogicalRepCtx->last_start_dsh = DSM_HANDLE_INVALID;
> -
>
> The code takes the last_start_dsh as dsm_handle, but it seems it is a
> dsa_pointer.
> " typedef dsa_pointer
On Monday, January 23, 2023 3:13 AM Tom Lane wrote:
Hi,
>
> Nathan Bossart writes:
> > On Tue, Jan 10, 2023 at 10:59:14AM +0530, Amit Kapila wrote:
> >> I haven't looked in detail but isn't it better to explain somewhere
> >> in the comments that it achieves to rate limit the restart of
On Sun, Jan 22, 2023 at 02:12:54PM -0500, Tom Lane wrote:
> I pushed v17 with some mostly-cosmetic changes, including more comments.
Thanks!
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart writes:
> On Tue, Jan 10, 2023 at 10:59:14AM +0530, Amit Kapila wrote:
>> I haven't looked in detail but isn't it better to explain somewhere in
>> the comments that it achieves to rate limit the restart of workers in
>> case of error and allows them to restart immediately in case
On Tue, Jan 10, 2023 at 10:59:14AM +0530, Amit Kapila wrote:
> I haven't looked in detail but isn't it better to explain somewhere in
> the comments that it achieves to rate limit the restart of workers in
> case of error and allows them to restart immediately in case of
> subscription parameter
On Sat, Jan 7, 2023 at 6:15 AM Nathan Bossart wrote:
>
> On Fri, Jan 06, 2023 at 05:31:26PM -0500, Tom Lane wrote:
>
> > Attached is a rebased 0003, just to keep the cfbot happy.
> > I'm kind of wondering whether 0003 is worth the complexity TBH,
> > but in any case I ran out of time to look at
rebased for cfbot
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index cf220c3bcb..7661c0c86e 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1996,6 +1996,16 @@ postgres
On Fri, Jan 06, 2023 at 05:31:26PM -0500, Tom Lane wrote:
> I've pushed 0001 and 0002, which seem pretty uncontroversial.
Thanks!
> Attached is a rebased 0003, just to keep the cfbot happy.
> I'm kind of wondering whether 0003 is worth the complexity TBH,
> but in any case I ran out of time to
Nathan Bossart writes:
> I found some additional places that should remove the last-start time from
> the hash table. I've added those in v14.
I've pushed 0001 and 0002, which seem pretty uncontroversial.
Attached is a rebased 0003, just to keep the cfbot happy.
I'm kind of wondering whether
I found some additional places that should remove the last-start time from
the hash table. I've added those in v14.
On Fri, Jan 06, 2023 at 10:30:18AM +0530, Amit Kapila wrote:
> On Thu, Jan 5, 2023 at 10:49 PM Nathan Bossart
> wrote:
>> On Thu, Jan 05, 2023 at 11:34:37AM +0530, Amit Kapila
On Thu, Jan 5, 2023 at 10:49 PM Nathan Bossart wrote:
>
> On Thu, Jan 05, 2023 at 11:34:37AM +0530, Amit Kapila wrote:
> > On Thu, Jan 5, 2023 at 10:16 AM Nathan Bossart
> > wrote:
> >> In v12, I moved the restart for two_phase mode to the end of
> >> process_syncing_tables_for_apply() so that
On Thu, Jan 05, 2023 at 09:29:24AM -0800, Nathan Bossart wrote:
> On Thu, Jan 05, 2023 at 10:57:58AM +0530, Amit Kapila wrote:
>> True, if we want we can use dshash for this.
>
> I'll look into this.
Here is an attempt at using dshash. This is quite a bit cleaner since we
don't need garbage
On Thu, Jan 05, 2023 at 10:57:58AM +0530, Amit Kapila wrote:
> True, if we want we can use dshash for this.
I'll look into this.
> The garbage collection
> mechanism used in the patch seems odd to me as that will remove/add
> entries to the hash table even when the corresponding subscription is
On Thu, Jan 05, 2023 at 11:34:37AM +0530, Amit Kapila wrote:
> On Thu, Jan 5, 2023 at 10:16 AM Nathan Bossart
> wrote:
>> In v12, I moved the restart for two_phase mode to the end of
>> process_syncing_tables_for_apply() so that we don't need to rely on another
>> iteration of the loop.
>
>
On Thu, Jan 5, 2023 at 10:16 AM Nathan Bossart wrote:
>
> On Wed, Jan 04, 2023 at 08:12:37PM -0800, Nathan Bossart wrote:
> > On Thu, Jan 05, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
> >> But there doesn't appear to be any guarantee that the result for
> >> AllTablesyncsReady() will change
On Thu, Jan 5, 2023 at 6:19 AM Nathan Bossart wrote:
>
> On Wed, Jan 04, 2023 at 10:12:19AM -0800, Nathan Bossart wrote:
> > From the discussion thus far, it sounds like the alternatives are to 1) add
> > a global flag that causes wal_retrieve_retry_interval to be bypassed for
> > all workers or
On Wed, Jan 04, 2023 at 08:12:37PM -0800, Nathan Bossart wrote:
> On Thu, Jan 05, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
>> But there doesn't appear to be any guarantee that the result for
>> AllTablesyncsReady() will change between the time it is invoked
>> earlier in the function and at
On Thu, Jan 05, 2023 at 09:09:12AM +0530, Amit Kapila wrote:
> On Wed, Jan 4, 2023 at 11:03 PM Nathan Bossart
> wrote:
>> On Wed, Jan 04, 2023 at 09:41:47AM +0530, Amit Kapila wrote:
>> > If so, we probably also need to
>> > ensure that table_states_valid is marked false probably via
>> >
On Wed, Jan 4, 2023 at 11:03 PM Nathan Bossart wrote:
>
> On Wed, Jan 04, 2023 at 09:41:47AM +0530, Amit Kapila wrote:
> > I am not sure if I understand the problem you are trying to solve with
> > this part of the patch. Are you worried that after we mark some of the
> > relation's state as
On Wed, Jan 04, 2023 at 10:12:19AM -0800, Nathan Bossart wrote:
> From the discussion thus far, it sounds like the alternatives are to 1) add
> a global flag that causes wal_retrieve_retry_interval to be bypassed for
> all workers or to 2) add a hash map in the launcher and a
> restart_immediately
On Wed, Jan 04, 2023 at 10:57:43AM +0530, Amit Kapila wrote:
> On Tue, Jan 3, 2023 at 11:40 PM Nathan Bossart
> wrote:
>> My approach was to add a variable to LogicalRepWorker that indicated
>> whether a worker needed to be restarted immediately. While this is a
>> little weird because the
On Wed, Jan 04, 2023 at 09:41:47AM +0530, Amit Kapila wrote:
> I am not sure if I understand the problem you are trying to solve with
> this part of the patch. Are you worried that after we mark some of the
> relation's state as READY, all the table syncs are in the READY state
> but we will not
On Tue, Jan 3, 2023 at 11:40 PM Nathan Bossart wrote:
>
> On Tue, Jan 03, 2023 at 11:03:32AM +0530, Amit Kapila wrote:
> > On Thu, Dec 15, 2022 at 4:47 AM Nathan Bossart
> > wrote:
> >> On Wed, Dec 14, 2022 at 02:02:58PM -0500, Tom Lane wrote:
> >> > Maybe we could have workers that are exiting
On Tue, Jan 3, 2023 at 11:51 PM Nathan Bossart wrote:
>
> On Tue, Jan 03, 2023 at 11:43:59AM +0530, Amit Kapila wrote:
> > On Wed, Dec 7, 2022 at 11:42 PM Nathan Bossart
> > wrote:
> >> After sleeping on this, I think we can do better. IIUC we can simply check
> >> for AllTablesyncsReady() at
On Tue, Jan 03, 2023 at 11:43:59AM +0530, Amit Kapila wrote:
> On Wed, Dec 7, 2022 at 11:42 PM Nathan Bossart
> wrote:
>> After sleeping on this, I think we can do better. IIUC we can simply check
>> for AllTablesyncsReady() at the end of process_syncing_tables_for_apply()
>> and wake up the
On Tue, Jan 03, 2023 at 11:03:32AM +0530, Amit Kapila wrote:
> On Thu, Dec 15, 2022 at 4:47 AM Nathan Bossart
> wrote:
>> On Wed, Dec 14, 2022 at 02:02:58PM -0500, Tom Lane wrote:
>> > Maybe we could have workers that are exiting for that reason set a
>> > flag saying "please restart me without
On Wed, Dec 7, 2022 at 11:42 PM Nathan Bossart wrote:
>
> On Wed, Dec 07, 2022 at 02:07:11PM +0300, Melih Mutlu wrote:
> > Do we also need to wake up all sync workers too? Even if not, I'm not
> > actually sure whether doing that would harm anything though.
> > Just asking since currently the
On Thu, Dec 15, 2022 at 4:47 AM Nathan Bossart wrote:
>
> On Wed, Dec 14, 2022 at 02:02:58PM -0500, Tom Lane wrote:
> > Maybe we could have workers that are exiting for that reason set a
> > flag saying "please restart me without delay"?
>
> That helps a bit, but there are still delays when
On Sun, Dec 18, 2022 at 03:36:07PM -0800, Nathan Bossart wrote:
> This seems to have somehow broken the archiving tests on Windows, so
> obviously I owe some better analysis here. I didn't see anything obvious
> in the logs, but I will continue to dig.
On Windows, WaitForWALToBecomeAvailable()
On Thu, Dec 15, 2022 at 02:47:21PM -0800, Nathan Bossart wrote:
> I tried setting wal_retrieve_retry_interval to 1ms for all TAP tests
> (similar to what was done in 2710ccd), and I noticed that the recovery
> tests consistently took much longer. Upon further inspection, it looks
> like the same
I tried setting wal_retrieve_retry_interval to 1ms for all TAP tests
(similar to what was done in 2710ccd), and I noticed that the recovery
tests consistently took much longer. Upon further inspection, it looks
like the same (or a very similar) race condition described in e5d494d's
commit message
On Wed, Dec 14, 2022 at 02:02:58PM -0500, Tom Lane wrote:
> Maybe we could have workers that are exiting for that reason set a
> flag saying "please restart me without delay"?
That helps a bit, but there are still delays when starting workers for new
subscriptions. I think we'd need to create a
Nathan Bossart writes:
> On Wed, Dec 14, 2022 at 01:23:18PM -0500, Tom Lane wrote:
>> Oh. What in the world is the rationale for that?
> My assumption is that this is meant to avoid starting workers as fast as
> possible if they repeatedly crash.
I can see the point of rate-limiting if the
On Wed, Dec 14, 2022 at 01:23:18PM -0500, Tom Lane wrote:
> Nathan Bossart writes:
>> I'm reasonably certain the launcher is already signaled like you describe.
>> It'll just wait to start new workers if it's been less than
>> wal_retrieve_retry_interval milliseconds since the last time it
Nathan Bossart writes:
> I'm reasonably certain the launcher is already signaled like you describe.
> It'll just wait to start new workers if it's been less than
> wal_retrieve_retry_interval milliseconds since the last time it started
> workers.
Oh. What in the world is the rationale for that?
On Wed, Dec 14, 2022 at 12:42:32PM -0500, Tom Lane wrote:
> Nathan Bossart writes:
>> My first thought is that the latter two uses should be moved to a new
>> parameter, and the apply launcher should store the last start time for each
>> apply worker like the apply workers do for the table-sync
Nathan Bossart writes:
> My first thought is that the latter two uses should be moved to a new
> parameter, and the apply launcher should store the last start time for each
> apply worker like the apply workers do for the table-sync workers. In any
> case, it probably makes sense to lower this
On Tue, Dec 13, 2022 at 04:41:05PM -0800, Nathan Bossart wrote:
> On Tue, Dec 13, 2022 at 07:20:14PM -0500, Tom Lane wrote:
>> I certainly don't think that "wake the apply launcher every 1ms"
>> is a sane configuration. Unless I'm missing something basic about
>> its responsibilities, it should
On Tue, Dec 13, 2022 at 07:20:14PM -0500, Tom Lane wrote:
> I certainly don't think that "wake the apply launcher every 1ms"
> is a sane configuration. Unless I'm missing something basic about
> its responsibilities, it should seldom need to wake at all in
> normal operation.
This parameter
Nathan Bossart writes:
> On Tue, Dec 13, 2022 at 06:32:08PM -0500, Tom Lane wrote:
>> I've not chased it further than that, but I venture that the apply
>> launcher also needs a kick in the pants, and/or there needs to be
>> an interlock to ensure that it doesn't wake until after the old
>> apply
On Tue, Dec 13, 2022 at 06:32:08PM -0500, Tom Lane wrote:
> Before, there was up to 1 second (with multiple "SELECT count(1) = 0"
> probes from the test script) between the ALTER SUBSCRIPTION command
> and the "apply worker will restart" log entry. That wait is pretty
> well zapped, but instead
Nathan Bossart writes:
> After sleeping on this, I think we can do better. IIUC we can simply check
> for AllTablesyncsReady() at the end of process_syncing_tables_for_apply()
> and wake up the logical replication workers (which should just consiѕt of
> setting the current process's latch) if we
On Wed, Dec 07, 2022 at 02:07:11PM +0300, Melih Mutlu wrote:
> Do we also need to wake up all sync workers too? Even if not, I'm not
> actually sure whether doing that would harm anything though.
> Just asking since currently the patch wakes up all workers including sync
> workers if any still
Hi,
> Actually, that's not quite right. The sync worker will wake up the apply
> worker to change the state from SYNCDONE to READY. AllTablesyncsReady()
> checks that all tables are READY, so we need to wake up all the workers
> when an apply worker changes the state to READY. Each worker
On Tue, Dec 06, 2022 at 11:25:51AM -0800, Nathan Bossart wrote:
> On Tue, Dec 06, 2022 at 07:44:46PM +0300, Melih Mutlu wrote:
>> - When the state is SYNCDONE and the apply worker has to wake up to change
>> the state to READY.
>>
>> I think we already call logicalrep_worker_wakeup_ptr wherever
Thanks for reviewing!
On Tue, Dec 06, 2022 at 07:44:46PM +0300, Melih Mutlu wrote:
> Is it really necessary to wake logical workers up when renaming other than
> subscription or publication? address.objectId will be a valid subid only
> when renaming a subscription.
Oops, that is a mistake. I
Hi Nathan,
@@ -410,6 +411,12 @@ ExecRenameStmt(RenameStmt *stmt)
> stmt->newname);
> table_close(catalog, RowExclusiveLock);
>
> + /*
> + * Wake up the logical replication workers to handle this
> + * change quickly.
> + */
> + LogicalRepWorkersWakeupAtCommit(address.objectId);
Is it
On Thu, Dec 01, 2022 at 04:21:30PM -0800, Nathan Bossart wrote:
> Okay, here is a new version of the patch. This seems to clear up
> everything that I could find via the tests.
I cleaned up the patch a bit.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From
On Tue, Nov 29, 2022 at 09:04:41PM -0800, Nathan Bossart wrote:
> I don't mind fixing it! There are a couple more I'd like to track down
> before posting another revision.
Okay, here is a new version of the patch. This seems to clear up
everything that I could find via the tests.
Thanks to
On Wed, Nov 30, 2022 at 05:27:40PM +1300, Thomas Munro wrote:
> On Wed, Nov 30, 2022 at 5:23 PM Thomas Munro wrote:
>> On Wed, Nov 30, 2022 at 5:10 PM Nathan Bossart
>> wrote:
>> > I spent some more time on the prevent-unnecessary-wakeups patch for
>> > logical/worker.c that I've been alluding
Dear Nathan,
> I spent some more time on the prevent-unnecessary-wakeups patch for
> logical/worker.c that I've been alluding to in this thread, and I found a
> few more places where we depend on the worker periodically waking up. This
> seems to be a common technique, so I'm beginning to wonder
On Wed, Nov 30, 2022 at 5:23 PM Thomas Munro wrote:
> On Wed, Nov 30, 2022 at 5:10 PM Nathan Bossart
> wrote:
> > I spent some more time on the prevent-unnecessary-wakeups patch for
> > logical/worker.c that I've been alluding to in this thread, and I found a
> > few more places where we depend
On Wed, Nov 30, 2022 at 5:10 PM Nathan Bossart wrote:
> I spent some more time on the prevent-unnecessary-wakeups patch for
> logical/worker.c that I've been alluding to in this thread, and I found a
> few more places where we depend on the worker periodically waking up. This
> seems to be a
I spent some more time on the prevent-unnecessary-wakeups patch for
logical/worker.c that I've been alluding to in this thread, and I found a
few more places where we depend on the worker periodically waking up. This
seems to be a common technique, so I'm beginning to wonder whether these
changes
On Thu, Nov 24, 2022 at 05:26:27AM +, Hayato Kuroda (Fujitsu) wrote:
> I have no comments for the v3 patch.
Thanks for reviewing! Does anyone else have thoughts on the patch?
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Dear Nathan,
Thank you for updating the patch!
> In v3, I moved the call to LogicalRepWorkersWakeupAtCommit() to the end of
> the function. This should avoid waking up workers in some cases where it's
> unnecessary (e.g., if ALTER SUBSCRIPTION ERRORs in a subtransaction), but
> there are still
On Tue, Nov 22, 2022 at 04:59:28PM +0530, Amit Kapila wrote:
> On Tue, Nov 22, 2022 at 6:11 AM Nathan Bossart
> wrote:
>> While working on avoiding unnecessary wakeups in logical/worker.c (as was
>> done for walreceiver.c in 05a7be9), I noticed that the tests began taking
>> much longer. This
On Tue, Nov 22, 2022 at 07:25:36AM +, houzj.f...@fujitsu.com wrote:
> On Tuesday, November 22, 2022 2:49 PM Hayato Kuroda (Fujitsu)
>
>> Thanks for updating! It becomes better. Further comments:
>>
>> 01. AlterSubscription()
>>
>> ```
>> +LogicalRepWorkersWakeupAtCommit(subid);
>> +
>>
On Tue, Nov 22, 2022 at 6:11 AM Nathan Bossart wrote:
>
> While working on avoiding unnecessary wakeups in logical/worker.c (as was
> done for walreceiver.c in 05a7be9), I noticed that the tests began taking
> much longer. This seems to be caused by the reduced frequency of calls to
>
On Tuesday, November 22, 2022 2:49 PM Hayato Kuroda (Fujitsu)
>
> Dear Nathan,
>
> > I think you are correct. I did it this way in v2. I've also moved
> > the bulk of the logic to logical/worker.c.
>
> Thanks for updating! It becomes better. Further comments:
>
> 01. AlterSubscription()
>
On Tuesday, November 22, 2022 1:39 PM Nathan Bossart
wrote:
> On Tue, Nov 22, 2022 at 03:03:52AM +, Hayato Kuroda (Fujitsu) wrote:
> > Just One comment: IIUC the statement "ALTER SUBSCRIPTION" can be
> > executed inside the transaction. So if two subscriptions are altered
> > in the same
Dear Nathan,
> I think you are correct. I did it this way in v2. I've also moved the
> bulk of the logic to logical/worker.c.
Thanks for updating! It becomes better. Further comments:
01. AlterSubscription()
```
+ LogicalRepWorkersWakeupAtCommit(subid);
+
```
Currently subids will be
On Tue, Nov 22, 2022 at 03:03:52AM +, Hayato Kuroda (Fujitsu) wrote:
> Just One comment: IIUC the statement "ALTER SUBSCRIPTION" can be executed
> inside the transaction. So if two subscriptions are altered in the same
> transaction, only one of them will awake. Is it expected behavior?
>
> I
Hi Nathan,
I have done almost same thing locally for [1], but I thought your code seemed
better.
Just One comment: IIUC the statement "ALTER SUBSCRIPTION" can be executed
inside the transaction. So if two subscriptions are altered in the same
transaction, only one of them will awake. Is it
On Tue, Nov 22, 2022 at 03:16:05PM +1300, Thomas Munro wrote:
> Maybe a comment to explain why a single variable is enough?
This crossed my mind shortly after sending my previous message. Looking
closer, I see that several types of ALTER SUBSCRIPTION do not call
PreventInTransactionBlock(), so a
On Tue, Nov 22, 2022 at 1:41 PM Nathan Bossart wrote:
> On my machine, the attached patch
> improved 'check-world -j8' run time by ~12 seconds (from 3min 8sec to 2min
> 56 sec) and src/test/subscription test time by ~17 seconds (from 139
> seconds to 122 seconds).
Nice!
Maybe a comment to
72 matches
Mail list logo