Re: Restart pg_usleep when interrupted

2024-08-30 Thread Sami Imseih
> From this discussion, there is desire for a sleep function that:
> 1/ Sleeps for the full duration of the requested time
> 2/ Continues to handle important interrupts during the sleep
> 
> While something like CF 5118 will take care of point #1,


> I'm not sure, even with CF entry 5118, nanosleep() could be interrupted. But I
> agree that the leader won't be interrupted by PqMsg_Progress anymore.

Correct.


> 1. we should still implement the "1 Hz" stuff as 1.1/ it could be useful if CF
> 5118 gets committed and we move to WaitLatchUs() and 2.2/ it won't break 
> anything
> if CF gets committed and we don't move to WaitLatchUs(). For 1.1 it would 
> still
> prevent the leader to be waked up too frequently by the parallel workers.

Yes, regardless of any changes that may occur in the future that change the 
behaior
of pg_usleep, preventing a leader from being woken up too frequently is
good to have. The "1 Hz" work is still good to have.


> 2. still discuss the "need" and get a consensus regarding a sleep that could
> ensure the wait duration (in some cases and depending of the reason why the
> process is waked up).

Unless there is objection, I will withdraw the CF [1] entry for this patch next 
week.

This discussion however should be one of the points that CF 5118 must deal with.
That is, 5118 will change the behavior of pg_usleep when dealing with interrupts
previously signaled by SIGUSR1. 


[1] https://commitfest.postgresql.org/49/5161/

Regards, 

Sami







Re: Restart pg_usleep when interrupted

2024-08-21 Thread Bertrand Drouvot
Hi,

On Tue, Aug 20, 2024 at 02:25:19PM -0500, Sami Imseih wrote:
> > As it looks like we have a consensus that reducing the number of interrupts 
> > also 
> > makes sense, I just provided a rebase version of the 1 Hz version (see [0], 
> > that
> > also makes clear in the doc that the new field might show slightly old 
> > values).
> 
> That makes sense. However, I suspect the "1 Hz" work code will no longer
> be needed if CF entry 5118 [1] mentioned by Thomas [2] a few days back 
> goes through. Maybe this extra work can be removed if [1] goes through.
> What do you think?

Yeah agree that the "1 Hz" work wouldn't be needed anymore if the CF entry 5118
goes through *and* if vacuum delays keep doing pg_usleep() (as the leader won't
receive SIGUSR1 from the parallel workers while executing nanosleep() anymore).
It could still receive SIGHUP or such but that's outside of the PqMsg_Progress
case though.

> With regards to CF 5118 and what it means to the current discussion, below
> are my thoughts.
> 
> I tested with both CF 5118 [1] and the cost-delay tracking patch. With that 
> in place, 
> pg_usleep is able to sleep the full requested, as mentioned by Thomas [3]. 
> This is
> because certain interrupts like Parallel Message and others are not signaled
> by SIGUSR1 any longer, but with latches.
>

Yeah.
 
> From this discussion, there is desire for a sleep function that:
> 1/ Sleeps for the full duration of the requested time
> 2/ Continues to handle important interrupts during the sleep
> 
> While something like CF 5118 will take care of point #1,

I'm not sure, even with CF entry 5118, nanosleep() could be interrupted. But I
agree that the leader won't be interrupted by PqMsg_Progress anymore.

> it will not deal
> with #2. Also, the v11 [4] patch does not do #2 either. So I think
> in the sleep loop, we need a C_F_I call. The same type of loop can
> also be used to call WaitForSingleObject.
> 
> If CF 5118 gets committed, will still need similar loop that calls C_F_I, 
> but the function will need to call WaitLatchUs [5].
> 
> Thoughts?

If CF 5118 gets committed, then I think we would need to move to using 
WaitLatchUs()
to react to urgent request. We'd also need to find a way to ensure that we'd
wait for the full duration of the requested time depending of the reason why we
waked up (well, only if we agree that 1/ is needed and I'm not sure we got a
consensus).

So I think that:

1. we should still implement the "1 Hz" stuff as 1.1/ it could be useful if CF
5118 gets committed and we move to WaitLatchUs() and 2.2/ it won't break 
anything
if CF gets committed and we don't move to WaitLatchUs(). For 1.1 it would still
prevent the leader to be waked up too frequently by the parallel workers.

2. still discuss the "need" and get a consensus regarding a sleep that could
ensure the wait duration (in some cases and depending of the reason why the
process is waked up).

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-20 Thread Sami Imseih
> As it looks like we have a consensus that reducing the number of interrupts 
> also 
> makes sense, I just provided a rebase version of the 1 Hz version (see [0], 
> that
> also makes clear in the doc that the new field might show slightly old 
> values).

That makes sense. However, I suspect the "1 Hz" work code will no longer
be needed if CF entry 5118 [1] mentioned by Thomas [2] a few days back 
goes through. Maybe this extra work can be removed if [1] goes through.
What do you think?

With regards to CF 5118 and what it means to the current discussion, below
are my thoughts.

I tested with both CF 5118 [1] and the cost-delay tracking patch. With that in 
place, 
pg_usleep is able to sleep the full requested, as mentioned by Thomas [3]. This 
is
because certain interrupts like Parallel Message and others are not signaled
by SIGUSR1 any longer, but with latches.

>From this discussion, there is desire for a sleep function that:
1/ Sleeps for the full duration of the requested time
2/ Continues to handle important interrupts during the sleep

While something like CF 5118 will take care of point #1, it will not deal
with #2. Also, the v11 [4] patch does not do #2 either. So I think
in the sleep loop, we need a C_F_I call. The same type of loop can
also be used to call WaitForSingleObject.

If CF 5118 gets committed, will still need similar loop that calls C_F_I, 
but the function will need to call WaitLatchUs [5].

Thoughts?

--
Sami 


[1] https://commitfest.postgresql.org/49/5118/
[2] 
https://www.postgresql.org/message-id/CA%2BhUKG%2Bf-nEc_SowDLW1JMUa6Of5sCK-JZ%3Dv-KhL1xgXk83fiw%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/CA%2BhUKGKpo3fM%3DrnfdxHqt%2BjNGh_zUNcL1ob4hMsb%3DjFfKn9Aag%40mail.gmail.com
[4] 
https://www.postgresql.org/message-id/e3297b5e-0b33-4f45-afcd-4b00ba0b3547%40gmail.com
[5] 
https://www.postgresql.org/message-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=lufwpds...@mail.gmail.com








Re: Restart pg_usleep when interrupted

2024-08-20 Thread Bertrand Drouvot
Hi,

On Tue, Aug 13, 2024 at 11:40:27AM -0500, Nathan Bossart wrote:
> On Tue, Aug 13, 2024 at 11:07:46AM -0500, Imseih (AWS), Sami wrote:
> > Having to add special handling to space out instrumentation
> > directly in vacuum_delay_point seems very odd to me. I don't
> > think vacuum_delay_point should have to worry about this.
> > 
> > Also,
> > 1/ what is an appropriate interval to collect these stats?
> > 2/ What if there are other callers in the future that wish
> > to instrument parallel vacuum workers? they will need to implement
> > similar logic.
> 
> None of this seems intractable to me.  1 Hz seems like an entirely
> reasonable place to start, and it is very easy to change (or to even make
> configurable).  pg_stat_progress_vacuum might show slightly old values in
> this column, but that should be easy enough to explain in the docs if we
> are really concerned about it.  If other callers want to do something
> similar, maybe we should add a more generic implementation in
> backend_progress.c.
> 

As it looks like we have a consensus that reducing the number of interrupts 
also 
makes sense, I just provided a rebase version of the 1 Hz version (see [0], that
also makes clear in the doc that the new field might show slightly old values).

[0]: 
https://www.postgresql.org/message-id/ZsSQnS9OW9EWSOk4%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-19 Thread Bertrand Drouvot
Hi,

On Thu, Aug 15, 2024 at 04:13:29PM -0500, Nathan Bossart wrote:
> On Wed, Aug 14, 2024 at 06:00:06AM +, Bertrand Drouvot wrote:
> > I gave it more thoughts and I don't think we have to choose between the two.
> > The 1 Hz approach reduces the number of interrupts and Sami's patch 
> > provides a
> > way to get "accurate" delay in case of interrupts. I think both have their 
> > own
> > benefit. 
> 
> Is it really that important to delay with that level of accuracy?  In most
> cases, the chances of actually interrupting a given vacuum delay point are
> pretty small.  Even in the extreme scenario you tested with ~350K
> interrupts in a 19 minute vacuum, you only saw a 10-15% difference in total
> time.  I wouldn't say I'm diametrically opposed to this patch, but I do
> think we need to carefully consider whether it's worth the extra code.
>

I'm not 100% sure that it is worth it but on OTOH I'm thinking that could be the
case for someone that cares enough to change the cost delay from say 2ms to 3ms.

I mean, given the granularity we expose in the cost delay, one could expect to
get "accurate" delay. The doc is cautious enough to mention that "such delays 
may
not be measured accurately on older platforms" which makes me think that could
be worth to implement it.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-17 Thread Thomas Munro
On Sun, Aug 18, 2024 at 11:12 AM Thomas Munro  wrote:
> I guess both of these issues go away in practice if
> CF #5118 goes in.

To be more precise, if you just keep doing pg_usleep() the issue goes
away, and likewise for posix_fallocate() it goes away...  But if you
switch to WaitLatchUs() so you can handle latch wakeups in vacuum
delays, which really you should because the latch might be an urgent
request for you to CHECK_FOR_INTERRUPTS(), because another backend is
waiting for all backends to service a ProcSignalBarrier (we need a new
name for that), well now you'll get wakeups, so you're back to square
one if someone is sending them very fast.




Re: Restart pg_usleep when interrupted

2024-08-17 Thread Thomas Munro
On Wed, Aug 14, 2024 at 9:30 AM Nathan Bossart  wrote:
> Another concern is the huge number of PqMsg_Progress messages sent by
> parallel workers with that approach.  In Bertrand's tests, he was seeing
> nearly 350K interrupts for a ~19 minute vacuum (~300 interrupts per
> second).  That seems a bit extreme to me.  I don't see how anyone could
> possibly need stats about vacuum delays with that level of accuracy.

I suspect CF #5118 would fix lots of cases of ProcSignal() senders
going berserk, because it deletes SendProcSignal(), and introduces
SendInterrupt(), which calls SetLatch(), which doesn't send a signal if
the latch is already set.  Even if the latch is not already set, it
only sends a signal if the latch is currently being waited on
("maybe_sleeping" flag).  Even when it sends a signal, it goes to a
signalfd, kqueue or NT event flag on common platforms.

Of course that is only talking about the receiving side.  I'm sure we
can improve the senders too.  There's nothing we can do about NOTIFY,
because that's under user control, but that PqMsg_Progress case sounds
pretty bad, and the recovery conflict system could probably be made
more precise in its logic about who to wake up and when, etc.

Other backends going bananas with SendProcSignal() is the reason
dsm_impl_posix_resize() has to block signals while calling
posix_fallocate().  Unlike nanosleep(), which you can fix by tracking
remaining time, posix_fallocate() is all-or-nothing, it has no way to
report partial progress, so it must therefore undo its work if
interrupted, so its EINTR retry loop could get stuck forever when
other backends are trigger-happy with signals, which was a real
production issue.  I guess both of these issues go away in practice if
CF #5118 goes in.




Re: Restart pg_usleep when interrupted

2024-08-17 Thread Sami Imseih
> time. I wouldn't say I'm diametrically opposed to this patch, but I do
> think we need to carefully consider whether it's worth the extra code.

FWIW, besides the patch that Bertrand is proposing [1], there is another 
parallel 
vacuum case being discussed to allow for parallel heap scan [2].

Being able to support both instrumentation of sleep time by a parallel workers
and ensuring that actual sleep times are as close as possible to the 
requested times is a good think, IMO.

> Separately, I've been wondering whether it's worth allowing the sleep to be
> interrupted in certain cases, such as SIGINT and SIGTERM. That should
> address one of Heikki's points.

An idea may be to check for pending interrupts inside the 
pg_usleep_non_interruptible nanosleep loop. If there is a
pending interrupt and the interrupt is QueryCancelPending or
ClientConnectionLost, we can break out immediately.

I am not sure yet how this can work for Windows, since for
this patch, we are using a simple SleepEx call which is 
non-interruptible anyhow. 

Is it worth the effort and even more code to deal with specific
Interrupts for such short sleeps ( less than 100ms for vacuum at most )?

I am also thinking that pg_usleep_non_interruptuble routine should have
a cap on the sleep time allowed. That cap can be 100ms to match the 
max vacuum_cost_delay. This will prevent anyone from trying to use
this API for much longer sleeps.

What do you think?

[1] 
https://www.postgresql.org/message-id/flat/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
[2] 
https://www.postgresql.org/message-id/CAD21AoAEfCNv-GgaDheDJ%2Bs-p_Lv1H24AiJeNoPGCmZNSwL1YA%40mail.gmail.com


Regards,

Sami






Re: Restart pg_usleep when interrupted

2024-08-15 Thread Nathan Bossart
On Wed, Aug 14, 2024 at 06:00:06AM +, Bertrand Drouvot wrote:
> I gave it more thoughts and I don't think we have to choose between the two.
> The 1 Hz approach reduces the number of interrupts and Sami's patch provides a
> way to get "accurate" delay in case of interrupts. I think both have their own
> benefit. 

Is it really that important to delay with that level of accuracy?  In most
cases, the chances of actually interrupting a given vacuum delay point are
pretty small.  Even in the extreme scenario you tested with ~350K
interrupts in a 19 minute vacuum, you only saw a 10-15% difference in total
time.  I wouldn't say I'm diametrically opposed to this patch, but I do
think we need to carefully consider whether it's worth the extra code.

Separately, I've been wondering whether it's worth allowing the sleep to be
interrupted in certain cases, such as SIGINT and SIGTERM.  That should
address one of Heikki's points.

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-13 Thread Bertrand Drouvot
Hi,

On Tue, Aug 13, 2024 at 04:30:46PM -0500, Nathan Bossart wrote:
> On Tue, Aug 13, 2024 at 01:12:30PM -0500, Imseih (AWS), Sami wrote:
> >> None of this seems intractable to me.  1 Hz seems like an entirely
> >> reasonable place to start, and it is very easy to change (or to even make
> >> configurable).  pg_stat_progress_vacuum might show slightly old values in
> >> this column, but that should be easy enough to explain in the docs if we
> >> are really concerned about it.  If other callers want to do something
> >> similar, maybe we should add a more generic implementation in
> >> backend_progress.c.
> >> 
> > I don't know if I agree. Making vacuum sleep more robust to handle
> > interrupts seems like a cleaner general solution than to add
> > even more code to handle this case or have to explain the behavior of
> > cost delay instrumentation in docs.
> 
> Another concern is the huge number of PqMsg_Progress messages sent by
> parallel workers with that approach.  In Bertrand's tests, he was seeing
> nearly 350K interrupts for a ~19 minute vacuum (~300 interrupts per
> second).  That seems a bit extreme to me.  I don't see how anyone could
> possibly need stats about vacuum delays with that level of accuracy.
> 

I gave it more thoughts and I don't think we have to choose between the two.
The 1 Hz approach reduces the number of interrupts and Sami's patch provides a
way to get "accurate" delay in case of interrupts. I think both have their own
benefit. 

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-13 Thread Sami Imseih
On Tue, Aug 13, 2024 at 4:30 PM Nathan Bossart 
wrote:

> On Tue, Aug 13, 2024 at 01:12:30PM -0500, Imseih (AWS), Sami wrote:
> >> None of this seems intractable to me.  1 Hz seems like an entirely
> >> reasonable place to start, and it is very easy to change (or to even
> make
> >> configurable).  pg_stat_progress_vacuum might show slightly old values
> in
> >> this column, but that should be easy enough to explain in the docs if we
> >> are really concerned about it.  If other callers want to do something
> >> similar, maybe we should add a more generic implementation in
> >> backend_progress.c.
> >>
> > I don't know if I agree. Making vacuum sleep more robust to handle
> > interrupts seems like a cleaner general solution than to add
> > even more code to handle this case or have to explain the behavior of
> > cost delay instrumentation in docs.
>
> Another concern is the huge number of PqMsg_Progress messages sent by
> parallel workers with that approach.  In Bertrand's tests, he was seeing
> nearly 350K interrupts for a ~19 minute vacuum (~300 interrupts per
> second).  That seems a bit extreme to me.  I don't see how anyone could
> possibly need stats about vacuum delays with that level of accuracy.
>
> --
> nathan


> Fair point. If there is a clear benefit to spacing out the vacuum sleep
delay instrumentation, that could be taken up in that thread. This will
reduce the interrupts,  but not eliminate them.

There could still be benefit to ensure that vacuum sleeps can deal
with interrupts and sleep the requested time consistently.

Regards,

Sami


Re: Restart pg_usleep when interrupted

2024-08-13 Thread Nathan Bossart
On Tue, Aug 13, 2024 at 01:12:30PM -0500, Imseih (AWS), Sami wrote:
>> None of this seems intractable to me.  1 Hz seems like an entirely
>> reasonable place to start, and it is very easy to change (or to even make
>> configurable).  pg_stat_progress_vacuum might show slightly old values in
>> this column, but that should be easy enough to explain in the docs if we
>> are really concerned about it.  If other callers want to do something
>> similar, maybe we should add a more generic implementation in
>> backend_progress.c.
>> 
> I don't know if I agree. Making vacuum sleep more robust to handle
> interrupts seems like a cleaner general solution than to add
> even more code to handle this case or have to explain the behavior of
> cost delay instrumentation in docs.

Another concern is the huge number of PqMsg_Progress messages sent by
parallel workers with that approach.  In Bertrand's tests, he was seeing
nearly 350K interrupts for a ~19 minute vacuum (~300 interrupts per
second).  That seems a bit extreme to me.  I don't see how anyone could
possibly need stats about vacuum delays with that level of accuracy.

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-13 Thread Imseih (AWS), Sami

None of this seems intractable to me.  1 Hz seems like an entirely
reasonable place to start, and it is very easy to change (or to even make
configurable).  pg_stat_progress_vacuum might show slightly old values in
this column, but that should be easy enough to explain in the docs if we
are really concerned about it.  If other callers want to do something
similar, maybe we should add a more generic implementation in
backend_progress.c.


I don't know if I agree. Making vacuum sleep more robust to handle
interrupts seems like a cleaner general solution than to add
even more code to handle this case or have to explain the behavior of
cost delay instrumentation in docs.


Regards,


Sami




Re: Restart pg_usleep when interrupted

2024-08-13 Thread Nathan Bossart
On Tue, Aug 13, 2024 at 11:07:46AM -0500, Imseih (AWS), Sami wrote:
> Having to add special handling to space out instrumentation
> directly in vacuum_delay_point seems very odd to me. I don't
> think vacuum_delay_point should have to worry about this.
> 
> Also,
> 1/ what is an appropriate interval to collect these stats?
> 2/ What if there are other callers in the future that wish
> to instrument parallel vacuum workers? they will need to implement
> similar logic.

None of this seems intractable to me.  1 Hz seems like an entirely
reasonable place to start, and it is very easy to change (or to even make
configurable).  pg_stat_progress_vacuum might show slightly old values in
this column, but that should be easy enough to explain in the docs if we
are really concerned about it.  If other callers want to do something
similar, maybe we should add a more generic implementation in
backend_progress.c.

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-13 Thread Imseih (AWS), Sami

Please disregards this point from the last reply:


"""

2/ What if there are other callers in the future that wish
to instrument parallel vacuum workers? they will need to implement
similar logic.

"""

I misspoke about this and this point does not matter since only
vacuum sleep matters for this current discussion.


Regards,

Sami




Re: Restart pg_usleep when interrupted

2024-08-13 Thread Imseih (AWS), Sami



On 8/13/24 10:57 AM, Nathan Bossart wrote:

On Tue, Aug 13, 2024 at 10:47:51AM -0500, Imseih (AWS), Sami wrote:

On 8/13/24 10:09 AM, Nathan Bossart wrote:

On Mon, Aug 12, 2024 at 05:35:08PM -0500, Imseih (AWS), Sami wrote:

Skimming the last few messages of that thread [0], it looks like Bertrand
is exploring ways to avoid so many interrupts.  I guess the unavoidable
question is whether this work is still worthwhile given that improvement.

The way the instrumentation in [0] dealt with interrupts was too complex,
which is why it seemed better to handle the restart the remainder of the
sleep in the sleep function

Can you elaborate on how it is too complex?


[0] made vacuum_delay_point more complex as it has to
instrument cost_delay at an interval to reduce the number
of interrupts to the leader.

Sure, but looking at the patch [0], it adds maybe an extra 10 lines of code
to limit the reports to 1 Hz.  That doesn't strike me as too complex...

[0] 
https://postgr.es/m/ZnlPZZZJCRu/8fka%40ip-10-97-1-34.eu-west-3.compute.internal

Perhaps "complex" may not be the correct way to describe it.

Having to add special handling to space out instrumentation
directly in vacuum_delay_point seems very odd to me. I don't
think vacuum_delay_point should have to worry about this.

Also,
1/ what is an appropriate interval to collect these stats?
2/ What if there are other callers in the future that wish
to instrument parallel vacuum workers? they will need to implement
similar logic.

Regards,

Sami




Re: Restart pg_usleep when interrupted

2024-08-13 Thread Nathan Bossart
On Tue, Aug 13, 2024 at 10:47:51AM -0500, Imseih (AWS), Sami wrote:
> On 8/13/24 10:09 AM, Nathan Bossart wrote:
>> On Mon, Aug 12, 2024 at 05:35:08PM -0500, Imseih (AWS), Sami wrote:
>> > > Skimming the last few messages of that thread [0], it looks like Bertrand
>> > > is exploring ways to avoid so many interrupts.  I guess the unavoidable
>> > > question is whether this work is still worthwhile given that improvement.
>> > The way the instrumentation in [0] dealt with interrupts was too complex,
>> > which is why it seemed better to handle the restart the remainder of the
>> > sleep in the sleep function
>> Can you elaborate on how it is too complex?
>> 
> [0] made vacuum_delay_point more complex as it has to
> instrument cost_delay at an interval to reduce the number
> of interrupts to the leader.

Sure, but looking at the patch [0], it adds maybe an extra 10 lines of code
to limit the reports to 1 Hz.  That doesn't strike me as too complex...

[0] 
https://postgr.es/m/ZnlPZZZJCRu/8fka%40ip-10-97-1-34.eu-west-3.compute.internal

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-13 Thread Imseih (AWS), Sami



On 8/13/24 10:09 AM, Nathan Bossart wrote:

On Mon, Aug 12, 2024 at 05:35:08PM -0500, Imseih (AWS), Sami wrote:

Skimming the last few messages of that thread [0], it looks like Bertrand
is exploring ways to avoid so many interrupts.  I guess the unavoidable
question is whether this work is still worthwhile given that improvement.

The way the instrumentation in [0] dealt with interrupts was too complex,
which is why it seemed better to handle the restart the remainder of the
sleep in the sleep function

Can you elaborate on how it is too complex?


[0] made vacuum_delay_point more complex as it has to
instrument cost_delay at an interval to reduce the number
of interrupts to the leader.

On the other hand, with allowing the sleep to deal with
interrupts,no additional logic to space out instrumentation
is required.


Regards,

Sami


[0] https://commitfest.postgresql.org/49/5027/




Re: Restart pg_usleep when interrupted

2024-08-13 Thread Nathan Bossart
On Mon, Aug 12, 2024 at 05:35:08PM -0500, Imseih (AWS), Sami wrote:
>> Skimming the last few messages of that thread [0], it looks like Bertrand
>> is exploring ways to avoid so many interrupts.  I guess the unavoidable
>> question is whether this work is still worthwhile given that improvement.
>
> The way the instrumentation in [0] dealt with interrupts was too complex,
> which is why it seemed better to handle the restart the remainder of the
> sleep in the sleep function

Can you elaborate on how it is too complex?

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-12 Thread Bertrand Drouvot
Hi,

On Mon, Aug 12, 2024 at 05:04:02PM -0500, Nathan Bossart wrote:
> On Tue, Aug 13, 2024 at 12:04:28AM +0300, Heikki Linnakangas wrote:
> >> In the case of the patch being proposed by Bertrand, the number of
> >> interrupts > will be much more frequent as parallel workers would send a
> >> message
> > to the leader
> >> to update the vacuum delay counters every vacuum_delay_point call.
> > 
> > Hmm, I wonder if that's a good design, if it results in a lot of interrupts.
> 
> Skimming the last few messages of that thread [0], it looks like Bertrand
> is exploring ways to avoid so many interrupts.

Yeah, that was mainly to avoid the side effect of the interrupts making the 
vacuum faster as compared to the master branch (as in [0], the leader is not
honouring the delays when the parallel workers report their delayed time): that
could be noticeable depending of the amount of work and the number of parallel
workers involved in the vacuum.

That could be solved thanks to this thread. Once this thread is over (and 
whatever
the outcome is), I'll resume my testing as far the cost delay report is 
concerned.

> I guess the unavoidable
> question is whether this work is still worthwhile given that improvement.

The delay not being honored already affects the vacuum since we allow a parallel
worker to report progress to the leader (see [1]). The interrupts are far less
frequent (as compare to [1]) though.

[0]: https://commitfest.postgresql.org/49/5027/
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f1889729dd

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-12 Thread Imseih (AWS), Sami




Skimming the last few messages of that thread [0], it looks like Bertrand
is exploring ways to avoid so many interrupts.  I guess the unavoidable
question is whether this work is still worthwhile given that improvement.

The way the instrumentation in [0] dealt with interrupts was too complex,
which is why it seemed better to handle the restart the remainder of the
sleep in the sleep function

On the patch itself: Making the sleeps in vacuum uninterruptible means that
vacuum will be more slow to respond to interrupts. If you SIGTERM a vacuum
process, or hit CTRL-C, you *would* want to exit the sleep ASAP.

Since the delay will typically be pretty small (2 milliseconds by default
for autovacuum), I'm assuming this won't ordinarily be noticeable.  But I
do think it is an important consideration.


At most, the sleep will be 100ms for vacuum.



Tom raised that concern earlier in this thread 
(https://www.postgresql.org/message-id/2100439.1719610468%40sss.pgh.pa.us), 
but it seems the discussion wandered off to the details of how to do 
the sleep, and left that unaddressed.


Doing something like pg_sleep, using WaitLatch [1], was explored. 
However this

does not support microsecond sleeps which was allowed in 720de00af49
Thomas shared WaitLatchUs [2], which supports higher precision sleeps, but
it requires epoll_pwait(2) on the system, thus it's not very portable.
Using nanosleep with remain time and checking for drift was the most 
portable

approach.

Regards,

Sami

[0] https://commitfest.postgresql.org/49/5027/
[1] 
https://www.postgresql.org/message-id/67072E39-3B4E-4240-8373-AC45E23721E7%40gmail.com
[2] 
https://www.postgresql.org/message-id/CA+hUKGKVbJE59JkwnUj5XMY+-rzcTFciV9vVC7i=lufwpds...@mail.gmail.com






Re: Restart pg_usleep when interrupted

2024-08-12 Thread Nathan Bossart
On Tue, Aug 13, 2024 at 12:04:28AM +0300, Heikki Linnakangas wrote:
>> In the case of the patch being proposed by Bertrand, the number of
>> interrupts > will be much more frequent as parallel workers would send a
>> message
> to the leader
>> to update the vacuum delay counters every vacuum_delay_point call.
> 
> Hmm, I wonder if that's a good design, if it results in a lot of interrupts.

Skimming the last few messages of that thread [0], it looks like Bertrand
is exploring ways to avoid so many interrupts.  I guess the unavoidable
question is whether this work is still worthwhile given that improvement.

> On the patch itself: Making the sleeps in vacuum uninterruptible means that
> vacuum will be more slow to respond to interrupts. If you SIGTERM a vacuum
> process, or hit CTRL-C, you *would* want to exit the sleep ASAP.

Since the delay will typically be pretty small (2 milliseconds by default
for autovacuum), I'm assuming this won't ordinarily be noticeable.  But I
do think it is an important consideration.

[0] https://commitfest.postgresql.org/49/5027/

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-12 Thread Heikki Linnakangas
I'm trying to understand what the point of this patch is, so I went to 
read this thread from the beginning:



In the proposal by Bertrand [1] to implement vacuum cost delay tracking
in pg_stat_progress_vacuum, it was discovered that the vacuum cost delay
ends early on the leader process of a parallel vacuum due to parallel workers > 
reporting progress on index vacuuming, which was introduced in 17
with commit [2]. With this patch, everytime a parallel worker
completes a vacuum index, it will send a completion message to the leader.


Ok, so we might sometimes skip the sleep, if an interrupt is received. I 
agree that's a bit sloppy, but probably won't make any difference in 
practice.



The facility that allows a parallel worker to report progress to the leader was
introduced in commit [3].

In the case of the patch being proposed by Bertrand, the number of interrupts > will be much more frequent as parallel workers would send a message 

to the leader

to update the vacuum delay counters every vacuum_delay_point call.


Hmm, I wonder if that's a good design, if it results in a lot of interrupts.

On the patch itself: Making the sleeps in vacuum uninterruptible means 
that vacuum will be more slow to respond to interrupts. If you SIGTERM a 
vacuum process, or hit CTRL-C, you *would* want to exit the sleep ASAP.


Tom raised that concern earlier in this thread 
(https://www.postgresql.org/message-id/2100439.1719610468%40sss.pgh.pa.us), 
but it seems the discussion wandered off to the details of how to do the 
sleep, and left that unaddressed.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Restart pg_usleep when interrupted

2024-08-12 Thread Imseih (AWS), Sami

As cfbot points out [0], this is missing a Windows/frontend implementation.

[0] https://cirrus-ci.com/task/6393555868975104


Looks like the pgsleep implementation is missing the
#ifndef WIN32

I followed what is done in pg_usleep.

v11 should now build on Windows, hopefully.

Strangely, v10 build on a Windows machine I have locally.


Regards,

Sami


From 60a6058c1960246b9fd431398bd68a77a0247a9a Mon Sep 17 00:00:00 2001
From: Sami Imseih 
Date: Mon, 12 Aug 2024 15:47:55 -0500
Subject: [PATCH v11 1/1] vaccum_delay with absolute time nanosleep

---
 src/backend/commands/vacuum.c|  2 +-
 src/backend/port/win32/signal.c  | 10 ++
 src/include/port.h   |  1 +
 src/include/portability/instr_time.h | 10 ++
 src/port/pgsleep.c   | 49 
 5 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab202..4c4698 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2384,7 +2384,7 @@ vacuum_delay_point(void)
msec = vacuum_cost_delay * 4;
 
pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
-   pg_usleep(msec * 1000);
+   pg_usleep_non_interruptible(msec * 1000);
pgstat_report_wait_end();
 
/*
diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index 285cb611b4..edcb181215 100644
--- a/src/backend/port/win32/signal.c
+++ b/src/backend/port/win32/signal.c
@@ -73,6 +73,16 @@ pg_usleep(long microsec)
}
 }
 
+/*
+ * pg_usleep_non_interruptible --- delay the specified number of microseconds.
+ *
+ * Unlike pg_usleep, this relies on a non-interruptible sleep.
+ */
+void
+pg_usleep_non_interruptible(long microsec)
+{
+   SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE);
+}
 
 /* Initialization */
 void
diff --git a/src/include/port.h b/src/include/port.h
index c740005267..c8ff23e5ee 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -162,6 +162,7 @@ extern int  pg_disable_aslr(void);
 
 /* Portable delay handling */
 extern void pg_usleep(long microsec);
+extern void pg_usleep_non_interruptible(long microsec);
 
 /* Portable SQL-like case-independent comparisons and conversions */
 extern int pg_strcasecmp(const char *s1, const char *s2);
diff --git a/src/include/portability/instr_time.h 
b/src/include/portability/instr_time.h
index e66ecf34cd..6e4b0f1b17 100644
--- a/src/include/portability/instr_time.h
+++ b/src/include/portability/instr_time.h
@@ -36,6 +36,10 @@
  *
  * INSTR_TIME_GET_NANOSEC(t)   convert t to int64 (in nanoseconds)
  *
+ * INSTR_TIME_ADD_MICROSEC(x,t)add t (in microseconds) to x
+ *
+ * INSTR_TIME_IS_GREATER(x,y)  is x greater than y?
+ *
  * Note that INSTR_TIME_SUBTRACT and INSTR_TIME_ACCUM_DIFF convert
  * absolute times to intervals.  The INSTR_TIME_GET_xxx operations are
  * only useful on intervals.
@@ -194,4 +198,10 @@ GetTimerFrequency(void)
 #define INSTR_TIME_GET_MICROSEC(t) \
(INSTR_TIME_GET_NANOSEC(t) / NS_PER_US)
 
+#define INSTR_TIME_ADD_MICROSEC(x,t) \
+   ((x).ticks += (t) * NS_PER_US)
+
+#define INSTR_TIME_IS_GREATER(x,y) \
+   ((x).ticks > (y).ticks)
+
 #endif /* INSTR_TIME_H */
diff --git a/src/port/pgsleep.c b/src/port/pgsleep.c
index 1284458bfc..63523a6e41 100644
--- a/src/port/pgsleep.c
+++ b/src/port/pgsleep.c
@@ -14,6 +14,8 @@
 
 #include 
 
+#include "portability/instr_time.h"
+
 /*
  * In a Windows backend, we don't use this implementation, but rather
  * the signal-aware version in src/backend/port/win32/signal.c.
@@ -54,4 +56,51 @@ pg_usleep(long microsec)
}
 }
 
+/*
+ * pg_usleep_non_interruptible --- delay the specified number of microseconds.
+ *
+ * Unlike pg_usleep, this function continues the delay in case of an
+ * interrupt.
+ */
+void
+pg_usleep_non_interruptible(long microsec)
+{
+   /*
+* We allow nanosleep to handle interrupts and retry with the remaining
+* time. However, frequent interruptions and restarts of the nanosleep
+* calls can substantially lead to drift in the time when the sleep
+* finally completes. To deal with this, we break out of the loop 
whenever
+* the current time is past the expected end time of the sleep.
+*/
+   if (microsec > 0)
+   {
+#ifndef WIN32
+   struct timespec delay;
+   struct timespec remain;
+   instr_time  end_time;
+
+   INSTR_TIME_SET_CURRENT(end_time);
+   INSTR_TIME_ADD_MICROSEC(end_time, microsec);
+
+   delay.tv_sec = microsec / 100L;
+   delay.tv_nsec = (microsec % 100L) * 1000;
+
+   while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
+   {
+   instr_time  current_

Re: Restart pg_usleep when interrupted

2024-08-12 Thread Nathan Bossart
On Mon, Aug 12, 2024 at 03:56:18PM +, Bertrand Drouvot wrote:
> Thanks, v10 LGTM.

As cfbot points out [0], this is missing a Windows/frontend implementation.

[0] https://cirrus-ci.com/task/6393555868975104

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-12 Thread Bertrand Drouvot
Hi,

On Mon, Aug 12, 2024 at 10:19:56AM -0500, Sami Imseih wrote:
> v10 attached in the previous message addresses 
> Bertrands last comment and replaces “This” with “this"
> 

Thanks, v10 LGTM.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-12 Thread Sami Imseih
My email client attached the last response for
some reason :( 

v10 attached in the previous message addresses 
Bertrands last comment and replaces “This” with “this"

Regards,

Sami 



Re: Restart pg_usleep when interrupted

2024-08-12 Thread Sami Imseih
> 
> + * Unlike pg_usleep, This function continues
> 
> s/This/this/?
> 
> Apart from the above, LGTM.
> 



v10-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


Thanks for this catch. Uploaded v10.

Regards,

Sami 



Re: Restart pg_usleep when interrupted

2024-08-11 Thread Bertrand Drouvot
Hi,

On Sat, Aug 10, 2024 at 08:27:56AM -0500, Sami Imseih wrote:

> 
> 
> v9 has some has some minor corrections to the comments.
> 

Thanks!

1 ===

+ * Unlike pg_usleep, This function continues

s/This/this/?

Apart from the above, LGTM.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
mazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-10 Thread Sami Imseih


v9-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


v9 has some has some minor corrections to the comments.


Regards,

Sami

Re: Restart pg_usleep when interrupted

2024-08-09 Thread Sami Imseih


v8-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data

> 
> Yeah, I had the same thought in [1], so +1.
> 
> [1]: 
> https://www.postgresql.org/message-id/ZpDhS4nFX66ItAze%40ip-10-97-1-34.eu-west-3.compute.internal
> 

The intention ( see start of the thread ) was to make this a general API,
but I was not sure if there are use cases outside of vacuum.c. 

In v8, I moved the function to pgsleep.c/signals.c and called it 
pg_usleep_non_interruptible.
The function, unlike vacuum_sleep, will take in micros as an argument to match 
with pg_usleep.

Regards

Sami




Re: Restart pg_usleep when interrupted

2024-08-09 Thread Bertrand Drouvot
Hi,

On Fri, Aug 09, 2024 at 02:03:55PM -0500, Nathan Bossart wrote:
> On Thu, Aug 08, 2024 at 03:01:20PM -0500, Nathan Bossart wrote:
> > Thanks.  This one looks pretty good to me, and so I plan to commit it in
> > the near future unless anyone voices concerns about the approach.
> 
> As I am preparing this for commit, I'm wondering whether it makes sense to
> name the new function vacuum_sleep() and keep it private to vacuum.c.
> Nothing about this function is terribly specific to vacuum, and it's not
> inconceivable that it might be useful elsewhere.  Perhaps we should move it
> to pgsleep.c and rename it to something to the effect of
> pg_usleep_non_interruptable().

Yeah, I had the same thought in [1], so +1.

[1]: 
https://www.postgresql.org/message-id/ZpDhS4nFX66ItAze%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-09 Thread Nathan Bossart
On Thu, Aug 08, 2024 at 03:01:20PM -0500, Nathan Bossart wrote:
> Thanks.  This one looks pretty good to me, and so I plan to commit it in
> the near future unless anyone voices concerns about the approach.

As I am preparing this for commit, I'm wondering whether it makes sense to
name the new function vacuum_sleep() and keep it private to vacuum.c.
Nothing about this function is terribly specific to vacuum, and it's not
inconceivable that it might be useful elsewhere.  Perhaps we should move it
to pgsleep.c and rename it to something to the effect of
pg_usleep_non_interruptable().

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-08 Thread Nathan Bossart
On Wed, Aug 07, 2024 at 04:22:22PM -0500, Sami Imseih wrote:
> Please see v7.

Thanks.  This one looks pretty good to me, and so I plan to commit it in
the near future unless anyone voices concerns about the approach.

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-08 Thread Nathan Bossart
On Thu, Aug 08, 2024 at 06:49:27AM +, Bertrand Drouvot wrote:
>  2.2 CLOCK_REALTIME: Its time represents seconds and nanoseconds since the 
> Epoch.
> It means that we´re currently about 237 years away from the limit. So even,
> if we were to say add 2 "recents" of them we are still about 183 years away 
> from
> the limit.

I hope to be retired before then.

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-07 Thread Bertrand Drouvot
Hi,

On Wed, Aug 07, 2024 at 09:36:59AM -0500, Nathan Bossart wrote:
> Also, do we need to worry about overflow here?  It looks like the rest of
> instr_time.h is oblivious about overflow, so maybe this is better discussed
> in a separate thread...

Yeah, a separate thread would be better.

FWIW and just out of curiosity:

1. it seems to me that most of the time (always?) we are manipulating 
instr_time(s)
as interval(s) which (with int64) gives “space” for about 292 years interval 
time
measurement (if my maths are correct).

2. for "absolute" manipulation (if any) it would depend of the PG_INSTR_CLOCK.

A "man clock_gettime" mentions:

 2.1 CLOCK_MONOTONIC: on Linux, time since the system was booted. Not sure what
the longest Linux uptime record is but can't be more than since the 90's.

 2.2 CLOCK_REALTIME: Its time represents seconds and nanoseconds since the 
Epoch.
It means that we’re currently about 237 years away from the limit. So even,
if we were to say add 2 "recents" of them we are still about 183 years away from
the limit.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-07 Thread Sami Imseih


v7-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


> 
> On Wed, Aug 07, 2024 at 06:00:53AM +, Bertrand Drouvot wrote:
>> +SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE);
> 
> I think this deserves a comment.
> 

Done

>> +#define INSTR_TIME_ADD_MICROSEC(x,t) \
>> +((x).ticks += t * NS_PER_US)
> 
> I'd add parentheses around "t" to ensure any expressions given as "t" are
> evaluated first.
> 

Done


> Also, do we need to worry about overflow here?  It looks like the rest of
> instr_time.h is oblivious about overflow, so maybe this is better discussed
> in a separate thread...
> 

I agree, this needs to be handled in a different thread.

Please see v7.


Regards,

Sami

Re: Restart pg_usleep when interrupted

2024-08-07 Thread Nathan Bossart
On Wed, Aug 07, 2024 at 06:00:53AM +, Bertrand Drouvot wrote:
> + SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE);

I think this deserves a comment.

> +#define INSTR_TIME_ADD_MICROSEC(x,t) \
> + ((x).ticks += t * NS_PER_US)

I'd add parentheses around "t" to ensure any expressions given as "t" are
evaluated first.

Also, do we need to worry about overflow here?  It looks like the rest of
instr_time.h is oblivious about overflow, so maybe this is better discussed
in a separate thread...

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-08-07 Thread Bertrand Drouvot
Hi,

On Wed, Aug 07, 2024 at 09:11:19AM -0500, Sami Imseih wrote:
> 
> 
> > On Aug 7, 2024, at 1:00 AM, Bertrand Drouvot  
> > wrote:
> > 
> > add t (in microseconds) to x”
> 
> 
> I was attempting to be more verbose in the comment,
> but what you have above matches the format of
> the other comments. I am ok with your revision. 
> 

Thanks!

I'm marking the CF entry [1], as RFC.

[1]: https://commitfest.postgresql.org/49/5161/

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-07 Thread Sami Imseih


> On Aug 7, 2024, at 1:00 AM, Bertrand Drouvot  
> wrote:
> 
> add t (in microseconds) to x”


I was attempting to be more verbose in the comment,
but what you have above matches the format of
the other comments. I am ok with your revision. 

Regards.

Sami 



Re: Restart pg_usleep when interrupted

2024-08-06 Thread Bertrand Drouvot
Hi,

On Tue, Aug 06, 2024 at 12:36:44PM -0500, Sami Imseih wrote:

> 
> 
> v5 takes care of the latest comments by Bertrand.
> 

Thanks!

Please find attached a rebase version (due to 39a138fbef) and in passing I 
changed
one comment:

"add t in microseconds to a instr_time" -> "add t (in microseconds) to x"

Does that make sense to you?

That said, it looks good to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>From 619d4fea1eada6fb65fec673bc9600f8502f9b00 Mon Sep 17 00:00:00 2001
From: Bertrand Drouvot 
Date: Wed, 7 Aug 2024 05:04:10 +
Subject: [PATCH v6] vaccum_delay with absolute time nanosleep

---
 src/backend/commands/vacuum.c| 47 +++-
 src/include/portability/instr_time.h | 10 ++
 2 files changed, 56 insertions(+), 1 deletion(-)
  81.6% src/backend/commands/
  18.3% src/include/portability/

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 48f8eab202..0ed188a083 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -116,6 +116,51 @@ static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params,
 static double compute_parallel_delay(void);
 static VacOptValue get_vacoptval_from_boolean(DefElem *def);
 static bool vac_tid_reaped(ItemPointer itemptr, void *state);
+static void vacuum_sleep(double msec);
+
+static
+void
+vacuum_sleep(double msec)
+{
+	long		microsec = msec * 1000;
+
+	if (microsec > 0)
+	{
+#ifndef WIN32
+		/*
+		 * We allow nanosleep to handle interrupts and retry with the
+		 * remaining time. However, frequent interruptions and restarts of the
+		 * nanosleep calls can substantially lead to drift in the time when
+		 * the sleep finally completes. To deal with this, we break out of the
+		 * loop whenever the current time is past the expected end time of the
+		 * sleep.
+		 */
+		struct timespec delay;
+		struct timespec remain;
+		instr_time	end_time;
+
+		INSTR_TIME_SET_CURRENT(end_time);
+		INSTR_TIME_ADD_MICROSEC(end_time, microsec);
+
+		delay.tv_sec = microsec / 100L;
+		delay.tv_nsec = (microsec % 100L) * 1000;
+
+		while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
+		{
+			instr_time	current_time;
+
+			INSTR_TIME_SET_CURRENT(current_time);
+
+			if (INSTR_TIME_IS_GREATER(current_time, end_time))
+break;
+
+			delay = remain;
+		}
+#else
+		SleepEx((microsec < 500 ? 1 : (microsec + 500) / 1000), FALSE);
+#endif
+	}
+}
 
 /*
  * GUC check function to ensure GUC value specified is within the allowable
@@ -2384,7 +2429,7 @@ vacuum_delay_point(void)
 			msec = vacuum_cost_delay * 4;
 
 		pgstat_report_wait_start(WAIT_EVENT_VACUUM_DELAY);
-		pg_usleep(msec * 1000);
+		vacuum_sleep(msec);
 		pgstat_report_wait_end();
 
 		/*
diff --git a/src/include/portability/instr_time.h b/src/include/portability/instr_time.h
index e66ecf34cd..754ea77c43 100644
--- a/src/include/portability/instr_time.h
+++ b/src/include/portability/instr_time.h
@@ -36,6 +36,10 @@
  *
  * INSTR_TIME_GET_NANOSEC(t)		convert t to int64 (in nanoseconds)
  *
+ * INSTR_TIME_ADD_MICROSEC(x,t)		add t (in microseconds) to x
+ *
+ * INSTR_TIME_IS_GREATER(x,y)		is x greater than y?
+ *
  * Note that INSTR_TIME_SUBTRACT and INSTR_TIME_ACCUM_DIFF convert
  * absolute times to intervals.  The INSTR_TIME_GET_xxx operations are
  * only useful on intervals.
@@ -194,4 +198,10 @@ GetTimerFrequency(void)
 #define INSTR_TIME_GET_MICROSEC(t) \
 	(INSTR_TIME_GET_NANOSEC(t) / NS_PER_US)
 
+#define INSTR_TIME_ADD_MICROSEC(x,t) \
+	((x).ticks += t * NS_PER_US)
+
+#define INSTR_TIME_IS_GREATER(x,y) \
+	((x).ticks > (y).ticks)
+
 #endif			/* INSTR_TIME_H */
-- 
2.34.1



Re: Restart pg_usleep when interrupted

2024-08-06 Thread Sami Imseih


v5-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


v5 takes care of the latest comments by Bertrand.

Regards,

Sami 

Re: Restart pg_usleep when interrupted

2024-08-06 Thread Bertrand Drouvot
Hi,

On Mon, Aug 05, 2024 at 03:07:34PM -0500, Sami Imseih wrote:

> 
> > yeah, we already have a few macros that access the .ticks, so maybe we 
> > could add
> > 2 new ones, say:
> > 
> > 1. INSTR_TIME_ADD_MS(t1, msec)
> > 2. INSTR_TIME_IS_GREATER(t1, t2)
> > 
> > I think the less operations is done in the while loop the better.
> > 
> 
> See v4. it includes 2 new instr_time.h macros to simplify the 
> code insidethe while loop.

Thanks!

1 ===

+#define INSTR_TIME_IS_GREATER(x,y) \
+   ((bool) (x).ticks > (y).ticks)

Around parentheses are missing, that should be ((bool) ((x).ticks > (y).ticks)).
I did not pay attention to it initially but found it was the culprit of breaks
not occuring (while my test case produces some).

That said, I don't think the cast is necessary here and that we could get rid of
it. 

2 ===

What about providing a quick comment about the 2 new macros in header of 
instr_time.h? (like it is done for the others macros)

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-08-05 Thread Sami Imseih


v4-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data

> yeah, we already have a few macros that access the .ticks, so maybe we could 
> add
> 2 new ones, say:
> 
> 1. INSTR_TIME_ADD_MS(t1, msec)
> 2. INSTR_TIME_IS_GREATER(t1, t2)
> 
> I think the less operations is done in the while loop the better.
> 

See v4. it includes 2 new instr_time.h macros to simplify the 
code insidethe while loop.

Regards,

Sami

Re: Restart pg_usleep when interrupted

2024-07-29 Thread Bertrand Drouvot
Hi,

On Mon, Jul 29, 2024 at 06:15:49PM -0500, Sami Imseih wrote:
> > On Jul 26, 2024, at 3:27 AM, Bertrand Drouvot 
> >  wrote:
> > 3 ===
> > 
> > I gave more thoughts and I think it can be simplified a bit to reduce the
> > number of operations in the while loop.
> > 
> > What about relying on a "absolute" time that way:
> > 
> > instr_time absolute;
> >absolute.ticks = start_time.ticks + msec * 100;
> > 
> > and then in the while loop:
> > 
> >while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
> >{
> >instr_time current_time;
> >INSTR_TIME_SET_CURRENT(current_time);
> > 
> >if (current_time.ticks > absolute.ticks)
> >{
> >break;
> 
> While I agree this code is cleaner, myy hesitation there is we don’t 
> have any other place in which we access .ticks directly and the 
> common practice is to use the intsr_time.h APIs.

yeah, we already have a few macros that access the .ticks, so maybe we could add
2 new ones, say:

1. INSTR_TIME_ADD_MS(t1, msec)
2. INSTR_TIME_IS_GREATER(t1, t2)

I think the less operations is done in the while loop the better.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-07-29 Thread Sami Imseih



> On Jul 26, 2024, at 3:27 AM, Bertrand Drouvot  
> wrote:
> 
> Hi,
> 
> On Thu, Jul 25, 2024 at 05:27:15PM -0500, Sami Imseih wrote:
>> I am attaching v3 of the patch which addresses the comments made
>> earlier by Bertrand about the comment in the patch [1].
> 
> Thanks!
> 
> Looking at it:
> 
> 1 ===
> 
> +   struct instr_time start_time;
> 
> I think we can get rid of the "struct" keyword here.
> 
> 2 ===
> 
> +   struct instr_time current_time;
> +   struct instr_time elapsed_time;
> 
> Same as above.

Will fix those 2.

> 
> 3 ===
> 
> I gave more thoughts and I think it can be simplified a bit to reduce the
> number of operations in the while loop.
> 
> What about relying on a "absolute" time that way:
> 
>   instr_time absolute;
>absolute.ticks = start_time.ticks + msec * 100;
> 
> and then in the while loop:
> 
>while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
>{
>instr_time current_time;
>INSTR_TIME_SET_CURRENT(current_time);
> 
>if (current_time.ticks > absolute.ticks)
>{
>break;

While I agree this code is cleaner, myy hesitation there is we don’t 
have any other place in which we access .ticks directly and the 
common practice is to use the intsr_time.h APIs.


What do you think?


Regards,

Sami 






Re: Restart pg_usleep when interrupted

2024-07-26 Thread Bertrand Drouvot
Hi,

On Thu, Jul 25, 2024 at 05:27:15PM -0500, Sami Imseih wrote:
> I am attaching v3 of the patch which addresses the comments made
> earlier by Bertrand about the comment in the patch [1].

Thanks!

Looking at it:

1 ===

+   struct instr_time start_time;

I think we can get rid of the "struct" keyword here.

2 ===

+   struct instr_time current_time;
+   struct instr_time elapsed_time;

Same as above.

3 ===

I gave more thoughts and I think it can be simplified a bit to reduce the
number of operations in the while loop.

What about relying on a "absolute" time that way:

instr_time absolute;
absolute.ticks = start_time.ticks + msec * 100;

and then in the while loop:

while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
{
instr_time current_time;
INSTR_TIME_SET_CURRENT(current_time);

if (current_time.ticks > absolute.ticks)
{
break;

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-07-25 Thread Sami Imseih


v3-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


attaching the patch again. Something is strange with my email client.

Regards,

Sami

Re: Restart pg_usleep when interrupted

2024-07-25 Thread Sami Imseih
I am attaching v3 of the patch which addresses the comments madeearlier by Bertrand about the comment in the patch [1]. Also I will stick withvacuum_sleep as the name as the function will be inside vacuum.c. I am notsure we should make this function available outside of vacuum, but I would liketo hear other thoughts.Also, earlier in the thread, Alvaro mentions what happensif the sleep time is 0 [2]. In that case, we do not do anything as we checkif sleep time is > 0 microseconds before proceeding with the sleep[1] https://www.postgresql.org/message-id/ZpDhS4nFX66ItAze%40ip-10-97-1-34.eu-west-3.compute.internal[2] https://www.postgresql.org/message-id/202407120939.pr6wpjffmxov%40alvherre.pgsqlRegards,Sami

v3-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


Re: Restart pg_usleep when interrupted

2024-07-15 Thread Bertrand Drouvot
Hi,

On Mon, Jul 15, 2024 at 12:20:29PM -0500, Nathan Bossart wrote:
> On Fri, Jul 12, 2024 at 03:39:57PM -0500, Sami Imseih wrote:
> > but Bertrand found long drifts [2[ which I could not reproduce.
> > To safeguard the long drifts, continue to use the &remain time with an 
> > additional safeguard to make sure the actual sleep does not exceed the 
> > requested sleep time.
> 
> Bertrand, does this safeguard fix the long drifts you saw?

Yeah, it was the case with the first version using the safeguard (see [1]) and
it's also the case with the last one shared in [2].

[1]: 
https://www.postgresql.org/message-id/Zo0UdeE3i9d0Wt5E%40ip-10-97-1-34.eu-west-3.compute.internal
[2]: 
https://www.postgresql.org/message-id/C18017A1-EDFD-4F2F-BCA1-0572D4CCC92B%40gmail.com
 

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-07-15 Thread Nathan Bossart
On Fri, Jul 12, 2024 at 03:39:57PM -0500, Sami Imseih wrote:
> but Bertrand found long drifts [2[ which I could not reproduce.
> To safeguard the long drifts, continue to use the &remain time with an 
> additional safeguard to make sure the actual sleep does not exceed the 
> requested sleep time.

Bertrand, does this safeguard fix the long drifts you saw?

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-07-12 Thread Sami Imseih

> What does your testing show when you don't have
> the extra check, i.e.,
> 
>   struct timespec delay;
>   struct timespec remain;
> 
>   delay.tv_sec = microsec / 100L;
>   delay.tv_nsec = (microsec % 100L) * 1000;
> 
>   while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
>   delay = remain;
> 


This is similar to the first attempt [1], 

+pg_usleep_handle_interrupt(long microsec, bool force)
 {
if (microsec > 0)
{
 #ifndef WIN32
struct timespec delay;
+   struct timespec remaining;
 
delay.tv_sec = microsec / 100L;
delay.tv_nsec = (microsec % 100L) * 1000;
-   (void) nanosleep(&delay, NULL);
+
+   if (force)
+   while (nanosleep(&delay, &remaining) == -1 && errno == EINTR)
+   delay = remaining;


but Bertrand found long drifts [2[ which I could not reproduce.
To safeguard the long drifts, continue to use the &remain time with an 
additional safeguard to make sure the actual sleep does not exceed the 
requested sleep time.

[1] 
https://www.postgresql.org/message-id/7D50DC5B-80C6-47B5-8DA8-A6C68A115EE5%40gmail.com
[2] 
https://www.postgresql.org/message-id/ZoPC5IeP4k7sZpki%40ip-10-97-1-34.eu-west-3.compute.internal


Regards,

Sami 



Re: Restart pg_usleep when interrupted

2024-07-12 Thread Nathan Bossart
On Fri, Jul 12, 2024 at 12:14:56PM -0500, Sami Imseih wrote:
> 1/ TimestampDifference has a dependency on gettimeofday, 
> while my proposal utilizes clock_gettime. There are old discussions
> that did not reach a conclusion comparing both mechanisms. 
> My main conclusion from these hacker discussions [1], [2] and other 
> online discussions on the topic is clock_gettime should replace
> getimeofday when possible. Precision is the main reason.
> 
> 2/ It no longer uses the remain time. I think the remain time
> is still required here. I did a unrealistic stress test which shows 
> the original proposal can handle frequent interruptions much better.

My comment was mostly about coding style and not about gettimeofday()
versus clock_gettime().  What does your testing show when you don't have
the extra check, i.e.,

struct timespec delay;
struct timespec remain;

delay.tv_sec = microsec / 100L;
delay.tv_nsec = (microsec % 100L) * 1000;

while (nanosleep(&delay, &remain) == -1 && errno == EINTR)
delay = remain;

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-07-12 Thread Sami Imseih
> 
> I'm imagining something like this:
> 
>struct timespec delay;
>TimestampTz end_time;
> 
>end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);
> 
>do
>{
>longsecs;
>int microsecs;
> 
>TimestampDifference(GetCurrentTimestamp(), end_time,
>&secs, µsecs);
> 
>delay.tv_sec = secs;
>delay.tv_nsec = microsecs * 1000;
> 
>} while (nanosleep(&delay, NULL) == -1 && errno == EINTR);
> 

I do agree that this is cleaner code, but I am not sure I like this.


1/ TimestampDifference has a dependency on gettimeofday, 
while my proposal utilizes clock_gettime. There are old discussions
that did not reach a conclusion comparing both mechanisms. 
My main conclusion from these hacker discussions [1], [2] and other 
online discussions on the topic is clock_gettime should replace
getimeofday when possible. Precision is the main reason.

2/ It no longer uses the remain time. I think the remain time
is still required here. I did a unrealistic stress test which shows 
the original proposal can handle frequent interruptions much better.

#1 in one session kicked off a vacuum

set vacuum_cost_delay = 10;
set vacuum_cost_limit = 1;
set client_min_messages = log;
update large_tbl set version = 1;
vacuum (verbose, parallel 4) large_tbl;

#2 in another session, ran a loop to continually
interrupt the vacuum leader. This was during the
“heap scan” phase of the vacuum.

PID=< pid of vacuum leader >
while :
do
kill -USR1 $PID
done


Using the proposed loop with the remainder, I noticed that
the actual time reported remains close to the requested
delay time.

LOG:  10.00,10.013420
LOG:  10.00,10.011188
LOG:  10.00,10.010860
LOG:  10.00,10.014839
LOG:  10.00,10.004542
LOG:  10.00,10.006035
LOG:  10.00,10.012230
LOG:  10.00,10.014535
LOG:  10.00,10.009645
LOG:  10.00,10.000817
LOG:  10.00,10.002162
LOG:  10.00,10.011721
LOG:  10.00,10.011655

Using the approach mentioned by Nathan, there
are large differences between requested and actual time.

LOG:  10.00,17.801778
LOG:  10.00,12.795450
LOG:  10.00,11.793723
LOG:  10.00,11.796317
LOG:  10.00,13.785993
LOG:  10.00,11.803775
LOG:  10.00,15.782767
LOG:  10.00,31.783901
LOG:  10.00,19.792440
LOG:  10.00,21.795795
LOG:  10.00,18.800412
LOG:  10.00,16.782886
LOG:  10.00,10.795197
LOG:  10.00,14.79
LOG:  10.00,29.806556
LOG:  10.00,18.810784
LOG:  10.00,11.804956
LOG:  10.00,24.809812
LOG:  10.00,25.815600
LOG:  10.00,22.809493
LOG:  10.00,22.790908
LOG:  10.00,19.699097
LOG:  10.00,23.795613
LOG:  10.00,24.797078

Let me know what you think?

[1] https://www.postgresql.org/message-id/flat/31856.1400021891%40sss.pgh.pa.us
[2] 
https://www.postgresql.org/message-id/flat/E1cO7fR-0003y0-9E%40gemulon.postgresql.org



Regards,

Sami 



Re: Restart pg_usleep when interrupted

2024-07-12 Thread Alvaro Herrera
On 2024-Jul-11, Nathan Bossart wrote:

> I'm imagining something like this:
> 
> struct timespec delay;
> TimestampTz end_time;
> 
> end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);
> 
> do
> {
> longsecs;
> int microsecs;
> 
> TimestampDifference(GetCurrentTimestamp(), end_time,
> &secs, µsecs);
> 
> delay.tv_sec = secs;
> delay.tv_nsec = microsecs * 1000;
> 
> } while (nanosleep(&delay, NULL) == -1 && errno == EINTR);

This looks nicer.  We could deal with clock drift easily (in case the
sysadmin winds the clock back) by testing that tv_sec+tv_nsec is not
higher than the initial time to sleep.  I don't know how common this
situation is nowadays, but I remember debugging a system years ago where
autovacuum was sleeping for a very long time because of that.  I can't
remember now if we did anything in the code to cope, or just told
sysadmins not to do that anymore :-)

FWIW my (Linux's) nanosleep() manpage contains this note:

   If  the interval specified in req is not an exact multiple of the granu‐
   larity underlying clock (see time(7)), then the interval will be rounded
   up  to the next multiple.  Furthermore, after the sleep completes, there
   may still be a delay before the CPU becomes free to once  again  execute
   the calling thread.

It's not clear to me what happens if the time to sleep is zero, so maybe
there should be a "if tv_sec == 0 && tv_nsec == 0 then break" statement
at the bottom of the loop, to quit without sleeping one more time than
needed.

For Windows, this [1] looks like an interesting and possibly relevant
read (though maybe SleepEx already does what we want to do here.)

[1] 
https://randomascii.wordpress.com/2020/10/04/windows-timer-resolution-the-great-rule-change/

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)




Re: Restart pg_usleep when interrupted

2024-07-12 Thread Bertrand Drouvot
Hi,

On Thu, Jul 11, 2024 at 10:15:41AM -0500, Sami Imseih wrote:
> 
> > I did a few tests with the patch and did not see any "large" drifts like the
> > ones observed above.
> 
> Thanks for testing.
> 
> > I think it could be "simplified" by making use of instr_time instead of 
> > timespec
> > for current and absolute. Then I think it would be enough to compare their
> > ticks.
> 
> Correct I attached a v2 of this patch that uses instr_time to check the 
> elapsed
> time and break out of the loop. It needs some more benchmarking.

Thanks!

Outside of Nathan's comment:

1 ===

+* However, since nanosleep is susceptible to time drift when 
interrupted
+* frequently, we add a safeguard to break out of the nanosleep 
whenever the

I'm not sure that "nanosleep is susceptible to time drift when interrupted 
frequently"
is a correct wording.

What about?

"
However, since the time between frequent interruptions and restarts of the
nanosleep calls can substantially lead to drift in the time when the sleep
finally completes, we add"

2 ===

+static void vacuum_sleep(double msec);

What about a more generic name that could be used outside of vacuum?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-07-11 Thread Nathan Bossart
On Thu, Jul 11, 2024 at 01:10:25PM -0500, Sami Imseih wrote:
>> I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
>> the bottom of the while loop to avoid needing this extra check.  
> 
> Can you elaborate further? I am not sure how this will work since delay is a 
> timespec 
> and elapsed time is an instr_time. 
> 
> Also, in every iteration of the loop, the delay must be set to the remaining 
> time. The
> purpose of the elapsed_time is to make sure that we don´t surpass requested 
> time
> delay as an additional safeguard.

I'm imagining something like this:

struct timespec delay;
TimestampTz end_time;

end_time = TimestampTzPlusMilliseconds(GetCurrentTimestamp(), msec);

do
{
longsecs;
int microsecs;

TimestampDifference(GetCurrentTimestamp(), end_time,
&secs, µsecs);

delay.tv_sec = secs;
delay.tv_nsec = microsecs * 1000;

} while (nanosleep(&delay, NULL) == -1 && errno == EINTR);

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-07-11 Thread Sami Imseih


> 
> I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
> the bottom of the while loop to avoid needing this extra check.  

Can you elaborate further? I am not sure how this will work since delay is a 
timespec 
and elapsed time is an instr_time. 

Also, in every iteration of the loop, the delay must be set to the remaining 
time. The
purpose of the elapsed_time is to make sure that we don’t surpass requested time
delay as an additional safeguard.

> Also, I
> think we need some commentary about why we want to retry after an interrupt
> in this case.

I will elaborate further in the comments for the next revision.


Regards,

Sami 





Re: Restart pg_usleep when interrupted

2024-07-11 Thread Nathan Bossart
+   /*
+* We allow nanosleep to handle interrupts and retry with the 
remaining time.
+* However, since nanosleep is susceptible to time drift when 
interrupted
+* frequently, we add a safeguard to break out of the nanosleep 
whenever the
+* total time of the sleep exceeds the requested sleep time. 
Using nanosleep
+* is a more portable approach than clock_nanosleep.
+*/

I'm curious why we wouldn't just subtract "elapsed_time" from "delay" at
the bottom of the while loop to avoid needing this extra check.  Also, I
think we need some commentary about why we want to retry after an interrupt
in this case.

-- 
nathan




Re: Restart pg_usleep when interrupted

2024-07-11 Thread Sami Imseih

> 
> Correct I attached a v2 of this patch that uses instr_time to check the 
> elapsed
> time and break out of the loop. It needs some more benchmarking.

My email client has an issue sending attachments it seems. Reattaching 

v2-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


Regards,

Sami

Re: Restart pg_usleep when interrupted

2024-07-11 Thread Sami Imseih
I did a few tests with the patch and did not see any "large" drifts like theones observed above.Thanks for testing.I think it could be "simplified" by making use of instr_time instead of timespecfor current and absolute. Then I think it would be enough to compare theirticks.Correct I attached a v2 of this patch that uses instr_time to check the elapsedtime and break out of the loop. It needs some more benchmarking.Since sub-millisecond sleep times are not guaranteed as suggested bythe vacuum_cost_delay docs ( see below ), an alternative ideais to use clock_nanosleep for vacuum delay when it’s available, elsefallback to WaitLatch.Wouldn't that increase even more the cases where sub-millisecond won't beguaranteed?Yes, nanosleep is going to provide the most coverage as it’s widely available.Regards,Sami

v2-0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


Re: Restart pg_usleep when interrupted

2024-07-09 Thread Bertrand Drouvot
Hi,

On Fri, Jul 05, 2024 at 11:49:45AM -0500, Sami Imseih wrote:
> 
> > With 50 indexes and 10 parallel workers I can see things like:
> > 
> > 2024-07-02 08:22:23.789 UTC [2189616] LOG:  expected 1.00, actual 
> > 239.378368
> > 2024-07-02 08:22:24.575 UTC [2189616] LOG:  expected 0.10, actual 
> > 224.331737
> > 2024-07-02 08:22:25.363 UTC [2189616] LOG:  expected 1.30, actual 
> > 230.462793
> > 2024-07-02 08:22:26.154 UTC [2189616] LOG:  expected 1.00, actual 
> > 225.980803
> > 
> > Means we waited more than the max allowed cost delay (100ms).
> > 
> > With 49 parallel workers, it's worst as I can see things like:
> > 
> > 2024-07-02 08:26:36.069 UTC [2189807] LOG:  expected 1.00, actual 
> > 1106.790136
> > 2024-07-02 08:26:36.298 UTC [2189807] LOG:  expected 1.00, actual 
> > 218.148985
> > 
> > The first actual wait time is about 1 second (it has been interrupted about
> > 16300 times during this second).
> > 
> > To avoid this drift, the nanosleep() man page suggests to use 
> > clock_nanosleep()
> > with an absolute time value, that might be another idea to explore.
> > 
> > [1]: 
> > https://www.postgresql.org/message-id/flat/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
> > 
> 
> 
> A more portable approach which could be to continue using nanosleep and
> add checks to ensure that nanosleep exists whenever
> it goes past an absolute time. This was suggested by Bertrand in an offline
> conversation. I am not yet fully convinced of this idea, but posting the patch
> that implements this idea for anyone interested in looking.

Thanks! 

I did a few tests with the patch and did not see any "large" drifts like the
ones observed above.

As far the patch, not thoroughly review (as it's still one option among others
being discussed)):

+   struct timespec current;
+   float time_diff;
+
+   clock_gettime(PG_INSTR_CLOCK, ¤t);
+
+   time_diff = (absolute.tv_sec - current.tv_sec) + (absolute.tv_nsec 
- current.tv_nsec) / 10.0;

I think it could be "simplified" by making use of instr_time instead of timespec
for current and absolute. Then I think it would be enough to compare their
ticks.

> Since sub-millisecond sleep times are not guaranteed as suggested by
> the vacuum_cost_delay docs ( see below ), an alternative idea
> is to use clock_nanosleep for vacuum delay when it’s available, else
> fallback to WaitLatch.

Wouldn't that increase even more the cases where sub-millisecond won't be
guaranteed?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-07-05 Thread Sami Imseih
> 
> A more portable approach which could be to continue using nanosleep and
> add checks to ensure that nanosleep exists whenever
> it goes past an absolute time. This was suggested by Bertrand in an offline
> conversation. I am not yet fully convinced of this idea, but posting the patch
> that implements this idea for anyone interested in looking.

oops, forgot to attach the patch. Here it is.

0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


Re: Restart pg_usleep when interrupted

2024-07-05 Thread Sami Imseih
With 50 indexes and 10 parallel workers I can see things like:2024-07-02 08:22:23.789 UTC [2189616] LOG:  expected 1.00, actual 239.3783682024-07-02 08:22:24.575 UTC [2189616] LOG:  expected 0.10, actual 224.3317372024-07-02 08:22:25.363 UTC [2189616] LOG:  expected 1.30, actual 230.4627932024-07-02 08:22:26.154 UTC [2189616] LOG:  expected 1.00, actual 225.980803Means we waited more than the max allowed cost delay (100ms).With 49 parallel workers, it's worst as I can see things like:2024-07-02 08:26:36.069 UTC [2189807] LOG:  expected 1.00, actual 1106.7901362024-07-02 08:26:36.298 UTC [2189807] LOG:  expected 1.00, actual 218.148985The first actual wait time is about 1 second (it has been interrupted about16300 times during this second).To avoid this drift, the nanosleep() man page suggests to use clock_nanosleep()with an absolute time value, that might be another idea to explore.[1]: https://www.postgresql.org/message-id/flat/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internalI could not reproduce the same time you drift you observed on mymachine, so I am guessing the time drift could be worse on certainplatforms than others.I also looked into the WaitLatchUs patch proposed by Thomas in [1]and since my system does have epoll_pwait(2) available, I could notachieve the sub-millisecond wait times. A more portable approach which could be to continue using nanosleep andadd checks to ensure that nanosleep exists wheneverit goes past an absolute time. This was suggested by Bertrand in an offlineconversation. I am not yet fully convinced of this idea, but posting the patchthat implements this idea for anyone interested in looking.Since sub-millisecond sleep times are not guaranteed as suggested bythe vacuum_cost_delay docs ( see below ), an alternative ideais to use clock_nanosleep for vacuum delay when it’s available, elsefallback to WaitLatch."While vacuum_cost_delay can be set to fractional-millisecond values, such delays may not be measured accurately on older platforms”[1] https://www.postgresql.org/message-id/CA%2BhUKGKVbJE59JkwnUj5XMY%2B-rzcTFciV9vVC7i%3DLUfWPds8Xw%40mail.gmail.comRegards, Sami

0001-vaccum_delay-with-absolute-time-nanosleep.patch
Description: Binary data


Re: Restart pg_usleep when interrupted

2024-07-02 Thread Bertrand Drouvot
Hi,

On Fri, Jun 28, 2024 at 05:50:20PM -0500, Sami Imseih wrote:
> 
> Thanks for the feedback!
> 
> > On Jun 28, 2024, at 4:34 PM, Tom Lane  wrote:
> > 
> > Sami Imseih  writes:
> >> Reattaching the patch.
> > 
> > I feel like this is fundamentally a wrong solution, for the reasons
> > cited in the comment for pg_usleep: long sleeps are a bad idea
> > because of the resulting uncertainty about whether we'll respond to
> > interrupts and such promptly.  An example here is that if we get
> > a query cancel interrupt, we should probably not insist on finishing
> > out the current sleep before responding.
> 
> The case which brought up this discussion is the pg_usleep that 
> is called within the vacuum_delay_point being interrupted. 
> 
> When I read the same code comment you cited, it sounded to me 
> that “long sleeps” are those that are in seconds or minutes. The longest 
> vacuum delay allowed is 100ms.

I think that with the proposed patch the actual wait time can be "long".

Indeed, the time between the interruptions and restarts of the nanosleep() call
will lead to drift (as mentioned in the nanosleep() man page). So, with a large
number of interruptions, the actual wait time could be way longer than the
expected wait time.

To put numbers, I did a few tests with the patch (and with v2 shared in [1]):

cost delay is 1ms and cost limit is 10.

With 50 indexes and 10 parallel workers I can see things like:

2024-07-02 08:22:23.789 UTC [2189616] LOG:  expected 1.00, actual 239.378368
2024-07-02 08:22:24.575 UTC [2189616] LOG:  expected 0.10, actual 224.331737
2024-07-02 08:22:25.363 UTC [2189616] LOG:  expected 1.30, actual 230.462793
2024-07-02 08:22:26.154 UTC [2189616] LOG:  expected 1.00, actual 225.980803

Means we waited more than the max allowed cost delay (100ms).

With 49 parallel workers, it's worst as I can see things like:

2024-07-02 08:26:36.069 UTC [2189807] LOG:  expected 1.00, actual 
1106.790136
2024-07-02 08:26:36.298 UTC [2189807] LOG:  expected 1.00, actual 218.148985

The first actual wait time is about 1 second (it has been interrupted about
16300 times during this second).

To avoid this drift, the nanosleep() man page suggests to use clock_nanosleep()
with an absolute time value, that might be another idea to explore.

[1]: 
https://www.postgresql.org/message-id/flat/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Restart pg_usleep when interrupted

2024-07-01 Thread Sami Imseih

> 
>> Therefore, rather than "improving" pg_usleep (and uglifying its API),
>> the right answer is to fix parallel vacuum leaders to not depend on
>> pg_usleep in the first place.  A better idea might be to use
>> pg_sleep() or equivalent code.
> 
> Yes, that is a good idea to explore and it will not require introducing
> an awkward new API. I will look into using something similar to
> pg_sleep.


Looking through the history of the sleep in vacuum_delay_point, commit
720de00af49 replaced WaitLatch with pg_usleep to allow for microsecond
sleep precision [1]. 

Thomas has proposed a WaitLatchUs implementation in [2], but I have not
yet tried it. 

So I see there are 2 possible options here to deal with the interrupt of a 
parallel vacuum leader when a message is sent by a parallel vacuum worker. 

Option 1/ something like my initial proposal which is
to create a function similar to pg_usleep that is able to deal with
interrupts in a sleep. This could be a function scoped only to vacuum.c,
so it can only be used for vacuum delay purposes.

—— 
Option 2/ to explore the WaitLatchUs implementation by
Thomas which will give both a latch implementation for a sleep with
the microsecond precision.

It is worth mentioning that if we do end up using WaitLatch(Us) inside
vacuum_delay_point, it will need to set only WL_TIMEOUT and 
WL_EXIT_ON_PM_DEATH.

i.e.
(void) WaitLatch(MyLatch, WL_TIMEOUT| WL_EXIT_ON_PM_DEATH,
   msec
  WAIT_EVENT_VACUUM_DELAY);

This way it is not interrupted by a WL_LATCH_SET when a message
is set by a parallel worker.

——

Ultimately, I think option 2 may be worth a closer look as it is a cleaner
and safer approach, to detect a postmaster death.


Thoughts?


[1] 
https://postgr.es/m/CAAKRu_b-q0hXCBUCAATh0Z4Zi6UkiC0k2DFgoD3nC-r3SkR3tg%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CA%2BhUKGKVbJE59JkwnUj5XMY%2B-rzcTFciV9vVC7i%3DLUfWPds8Xw%40mail.gmail.com
 

Re: Restart pg_usleep when interrupted

2024-06-28 Thread Sami Imseih

Thanks for the feedback!

> On Jun 28, 2024, at 4:34 PM, Tom Lane  wrote:
> 
> Sami Imseih  writes:
>> Reattaching the patch.
> 
> I feel like this is fundamentally a wrong solution, for the reasons
> cited in the comment for pg_usleep: long sleeps are a bad idea
> because of the resulting uncertainty about whether we'll respond to
> interrupts and such promptly.  An example here is that if we get
> a query cancel interrupt, we should probably not insist on finishing
> out the current sleep before responding.

The case which brought up this discussion is the pg_usleep that 
is called within the vacuum_delay_point being interrupted. 

When I read the same code comment you cited, it sounded to me 
that “long sleeps” are those that are in seconds or minutes. The longest 
vacuum delay allowed is 100ms.


> Therefore, rather than "improving" pg_usleep (and uglifying its API),
> the right answer is to fix parallel vacuum leaders to not depend on
> pg_usleep in the first place.  A better idea might be to use
> pg_sleep() or equivalent code.

Yes, that is a good idea to explore and it will not require introducing
an awkward new API. I will look into using something similar to
pg_sleep.

Regards,

Sami




Re: Restart pg_usleep when interrupted

2024-06-28 Thread Thomas Munro
On Sat, Jun 29, 2024 at 9:34 AM Tom Lane  wrote:
> Therefore, rather than "improving" pg_usleep (and uglifying its API),
> the right answer is to fix parallel vacuum leaders to not depend on
> pg_usleep in the first place.  A better idea might be to use
> pg_sleep() or equivalent code.

In case it's useful for someone looking into that, in earlier
discussions we figured out that it is possible to have high resolution
timeouts AND support latch multiplexing on Linux, FreeBSD, macOS:

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BhC9mFx8tEcBsyo7-cAfWgtbRy1eDizeFuff2K7T%3D4bA%40mail.gmail.com




Re: Restart pg_usleep when interrupted

2024-06-28 Thread Tom Lane
Sami Imseih  writes:
> Reattaching the patch.

I feel like this is fundamentally a wrong solution, for the reasons
cited in the comment for pg_usleep: long sleeps are a bad idea
because of the resulting uncertainty about whether we'll respond to
interrupts and such promptly.  An example here is that if we get
a query cancel interrupt, we should probably not insist on finishing
out the current sleep before responding.

Therefore, rather than "improving" pg_usleep (and uglifying its API),
the right answer is to fix parallel vacuum leaders to not depend on
pg_usleep in the first place.  A better idea might be to use
pg_sleep() or equivalent code.

regards, tom lane




Re: Restart pg_usleep when interrupted

2024-06-28 Thread Sami Imseih

> We want patches to be in the pgsql-hackers archives, not temporarily
> accessible via some link.
> 
> …Robert
> 

Moving to another email going forward.

Reattaching the patch.




0001-Handle-Sleep-interrupts.patch
Description: Binary data


Regards,

Sami Imseih
Amazon Web Services

Re: Restart pg_usleep when interrupted

2024-06-28 Thread Imseih (AWS), Sami
> I think you need to find a way to disable this "Attachment protected
> by Amazon" thing:

Yes, I am looking into that. I only noticed after sending the email.

Sorry about that.

Sami 







Re: Restart pg_usleep when interrupted

2024-06-28 Thread Robert Haas
Hi,

I think you need to find a way to disable this "Attachment protected
by Amazon" thing:

http://postgr.es/m/01000190606e3d2a-116ead16-84d2-4449-8d18-5053da66b1f4-000...@email.amazonses.com

We want patches to be in the pgsql-hackers archives, not temporarily
accessible via some link.

...Robert