Re: Small and unaffected typo in pg_logical_slot_get_changes_guts()

2022-02-16 Thread Kasahara Tatsuhito
2022年2月17日(木) 10:56 Michael Paquier :
>
> On Wed, Feb 16, 2022 at 01:25:09PM +0900, Kasahara Tatsuhito wrote:
> > Remove all references to tuplestore_donestoring() except for the header.
>
> Looks fine, thanks.  This has no impact on Melanie's patch posted on
> [1], so applied after tweaking the comment in tuplestore.h.
>
> [1]: 
> https://www.postgresql.org/message-id/20220106005748.GT14051%40telsasoft.com
Thanks!

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Small and unaffected typo in pg_logical_slot_get_changes_guts()

2022-02-15 Thread Kasahara Tatsuhito
Hi!

2022年2月16日(水) 11:42 Michael Paquier :
>
> On Wed, Feb 16, 2022 at 11:27:58AM +0900, Kasahara Tatsuhito wrote:
> > Since commit dd04e958c8b03c0f0512497651678c7816af3198,
> > tuplestore_donestoring() seems to be left only for compatibility, so
> > it looks like it can be removed from the core code, except for the
> > header (tuplestore.h).
>
> FWIW, it looks sensible to me to just do the cleanup you are
> suggesting here on HEAD, aka keeping the compatibility macro in the
> header, but wiping out any references of it in the C code.  That
> reduces the chances of copy-pasting it around for no effect.  Would
> you like to write a patch?
Thanks for the reply. The patch is attached.
Remove all references to tuplestore_donestoring() except for the header.

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


remove_tuplestore_donestoring.patch
Description: Binary data


Small and unaffected typo in pg_logical_slot_get_changes_guts()

2022-02-15 Thread Kasahara Tatsuhito
Hi,

I noticed a small typo in pg_logical_slot_get_changes_guts().

==
--- a/src/backend/replication/logical/logicalfuncs.c
+++ b/src/backend/replication/logical/logicalfuncs.c
@@ -295,7 +295,7 @@ pg_logical_slot_get_changes_guts(FunctionCallInfo fcinfo, bool confirm, bool bin
CHECK_FOR_INTERRUPTS();
}

-   tuplestore_donestoring(tupstore);
+   tuplestore_donestoring(p->tupstore);
==

As above, I think tuplestore_donestoring() should take p->tupstore as
its argument.
Fortunately, tuplestore_donestoring() is defined as ((void) 0) and
does not do anything, so this typo has no effect.
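
For reference, the compatibility macro kept in tuplestore.h is just a
no-op, which is why the wrong argument is harmless (the comment wording
below is paraphrased, but the macro body is as in the header):

/* kept only so that existing callers of the old API keep compiling */
#define tuplestore_donestoring(state)	((void) 0)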

However, it would be better to fix the typo, or to remove the
tuplestore_donestoring() call entirely, to avoid confusion.

Since commit dd04e958c8b03c0f0512497651678c7816af3198,
tuplestore_donestoring() seems to be left only for compatibility, so
it looks like it can be removed from the core code, except for the
header (tuplestore.h).

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Error code for checksum failure in origin.c

2021-08-30 Thread Kasahara Tatsuhito
On Mon, Aug 30, 2021 at 5:35 PM Amit Kapila  wrote:
>
> On Fri, Aug 27, 2021 at 12:47 PM Daniel Gustafsson  wrote:
> >
> > > On 27 Aug 2021, at 06:32, Amit Kapila  wrote:
> >
> > > I think we need to backpatch this till 9.6 as this is introduced by
> > > commit 5aa2350426. Any objections to that?
> >
> > No, that seems appropriate.
> >
>
> Pushed.
Thanks!

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Error code for checksum failure in origin.c

2021-08-26 Thread Kasahara Tatsuhito
On Fri, Aug 27, 2021 at 1:32 PM Amit Kapila  wrote:
>
> On Thu, Aug 26, 2021 at 4:11 PM Daniel Gustafsson  wrote:
> >
> > > On 26 Aug 2021, at 12:03, Amit Kapila  wrote:
> > >
> > > On Thu, Aug 26, 2021 at 3:18 PM Kasahara Tatsuhito
> > >  wrote:
> > >>
> > >> Hi.
> > >>
> > >> In the code in src/backend/replication/logical/origin.c,
> > >> the error code "ERRCODE_CONFIGURATION_LIMIT_EXCEEDED" is given
> > >> when a checksum check results in an error,
> > >> but "ERRCODE_ DATA_CORRUPTED" seems to be more appropriate.
> > >>
> > >
> > > +1. Your observation looks correct to me and the error code suggested
> > > by you seems appropriate. We use ERRCODE_DATA_CORRUPTED in
> > > ReadTwoPhaseFile() for similar error.
> >
> > Agreed, +1 for changing this.
> >
>
> I think we need to backpatch this till 9.6 as this is introduced by
> commit 5aa2350426.
+1

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Error code for checksum failure in origin.c

2021-08-26 Thread Kasahara Tatsuhito
Hi.

In src/backend/replication/logical/origin.c, the error code
"ERRCODE_CONFIGURATION_LIMIT_EXCEEDED" is used when a checksum check
fails, but "ERRCODE_DATA_CORRUPTED" seems more appropriate.


diff --git a/src/backend/replication/logical/origin.c
b/src/backend/replication/logical/origin.c
index 2c191de..65dcd03 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -796,7 +796,7 @@ StartupReplicationOrigin(void)
FIN_CRC32C(crc);
if (file_crc != crc)
ereport(PANIC,
-   (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
+   (errcode(ERRCODE_DATA_CORRUPTED),
 errmsg("replication slot checkpoint
has wrong checksum %u, expected %u",
crc, file_crc)));
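
For context, the check above is near the end of StartupReplicationOrigin(),
where a CRC is accumulated over the on-disk state while reading the
checkpoint file and then compared with the stored value. A condensed
sketch of that pattern (not the exact file-reading loop; disk_state and
file_crc come from the surrounding origin.c code) looks like this:

	pg_crc32c	crc;
	ReplicationStateOnDisk disk_state;

	INIT_CRC32C(crc);
	/* each chunk read from the file is folded into the running CRC */
	COMP_CRC32C(crc, &disk_state, sizeof(disk_state));
	FIN_CRC32C(crc);

	if (file_crc != crc)
		ereport(PANIC,
				(errcode(ERRCODE_DATA_CORRUPTED),
				 errmsg("replication slot checkpoint has wrong checksum %u, expected %u",
						crc, file_crc)));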

Thoughts?

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: There doesn't seem to be any case where PQputCopyEnd() returns 0

2021-02-06 Thread Kasahara Tatsuhito
On Fri, Feb 5, 2021 at 11:01 PM Fujii Masao  wrote:
>
>
>
> On 2021/02/05 16:52, Kasahara Tatsuhito wrote:
> > Hi,
> >
> > The following is written in the comments of PQputCopyEnd().
> >
> >   (snip)
> >   * Returns 1 if successful, 0 if data could not be sent (only possible
> >   * in nonblock mode), or -1 if an error occurs.
> >   (snip)
> >
> > The PQputCopyEnd() section of the manual (libpq.sgml) describes the 
> > following.
> >
> > The result is 1 if the termination message was sent; or in
> > nonblocking mode, this may only indicate that the termination
> > message was successfully queued.  (In nonblocking mode, to be
> > certain that the data has been sent, you should next wait for
> > write-ready and call PQflush, repeating until it
> > returns zero.)  Zero indicates that the function could not queue
> > the termination message because of full buffers; this will only
> > happen in nonblocking mode.  (In this case, wait for
> > write-ready and try the PQputCopyEnd call
> > again.)  If a hard error occurs, -1 is returned; you can use
> > PQerrorMessage to retrieve details.
> >
> >
> > These say that 0 may be returned if nonblocking mode is used, but
> > there doesn't seem to be any case where 0 is returned in the code of
> > PQputCopyEnd().
>
> I found the past discussion [1] about this issue.
>
> [1]
> https://www.postgresql.org/message-id/CA+Tgmobjj+0modbnmjy7ezeBFOBo9d2mAVcSPkzLx4LtZmc==g...@mail.gmail.com
Oh, thank you.
That cleared up what was unclear to me.

Best regards,

>
> Regards,
>
> --
> Fujii Masao
> Advanced Computing Technology Center
> Research and Development Headquarters
> NTT DATA CORPORATION



-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




There doesn't seem to be any case where PQputCopyEnd() returns 0

2021-02-04 Thread Kasahara Tatsuhito
Hi,

The following is written in the comments of PQputCopyEnd().

 (snip)
 * Returns 1 if successful, 0 if data could not be sent (only possible
 * in nonblock mode), or -1 if an error occurs.
 (snip)

The PQputCopyEnd() section of the manual (libpq.sgml) describes the following.

   The result is 1 if the termination message was sent; or in
   nonblocking mode, this may only indicate that the termination
   message was successfully queued.  (In nonblocking mode, to be
   certain that the data has been sent, you should next wait for
   write-ready and call PQflush, repeating until it
   returns zero.)  Zero indicates that the function could not queue
   the termination message because of full buffers; this will only
   happen in nonblocking mode.  (In this case, wait for
   write-ready and try the PQputCopyEnd call
   again.)  If a hard error occurs, -1 is returned; you can use
   PQerrorMessage to retrieve details.


These say that 0 may be returned if nonblocking mode is used, but
there doesn't seem to be any case where 0 is returned in the code of
PQputCopyEnd().

I may have missed something, but is it a mistake in the comments or
documentation?

Or should it return 0 when sending a COPY exit message fails
in non-blocking mode, like this?

@@ -2370,7 +2370,7 @@ PQputCopyEnd(PGconn *conn, const char *errormsg)
/* Send COPY DONE */
if (pqPutMsgStart('c', false, conn) < 0 ||
pqPutMsgEnd(conn) < 0)
-   return -1;
+   return pqIsnonblocking(conn) ? 0 : -1;
}

/*
@@ -2399,7 +2399,7 @@ PQputCopyEnd(PGconn *conn, const char *errormsg)
if (pqPutMsgStart(0, false, conn) < 0 ||
pqPutnchar("\\.\n", 3, conn) < 0 ||
pqPutMsgEnd(conn) < 0)
-   return -1;
+   return pqIsnonblocking(conn) ? 0 : -1;
}
}
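
For what it's worth, here is a minimal sketch (not part of the proposed
change, and using only documented libpq calls) of how a nonblocking-mode
caller would be expected to drive PQputCopyEnd() if it really could
return 0: retry after waiting for write-ready, then flush until
PQflush() reports that the queue is empty.

#include <libpq-fe.h>
#include <sys/select.h>

static int
wait_for_write_ready(PGconn *conn)
{
	fd_set		wfds;
	int			sock = PQsocket(conn);

	FD_ZERO(&wfds);
	FD_SET(sock, &wfds);
	return (select(sock + 1, NULL, &wfds, NULL, NULL) < 0) ? -1 : 0;
}

static int
finish_copy_nonblocking(PGconn *conn)
{
	int			r;

	/* Retry while the termination message cannot even be queued (returns 0). */
	while ((r = PQputCopyEnd(conn, NULL)) == 0)
	{
		if (wait_for_write_ready(conn) < 0)
			return -1;
	}
	if (r < 0)
		return -1;				/* hard error; see PQerrorMessage(conn) */

	/* In nonblocking mode the message may still be queued; flush it out. */
	while ((r = PQflush(conn)) > 0)
	{
		if (wait_for_write_ready(conn) < 0)
			return -1;
	}
	return (r == 0) ? 0 : -1;
}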


Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Get memory contexts of an arbitrary backend process

2020-12-25 Thread Kasahara Tatsuhito
Hi,

On Thu, Dec 10, 2020 at 10:48 AM torikoshia  wrote:
>
> On 2020-12-04 19:16, torikoshia wrote:
> > On 2020-12-03 10:36, Tom Lane wrote:
> >> Fujii Masao  writes:
> >>> I'm starting to study how this feature behaves. At first, when I
> >>> executed
> >>> the following query, the function never returned. ISTM that since
> >>> the autovacuum launcher cannot respond to the request of memory
> >>> contexts dump, the function keeps waiting infinity. Is this a bug?
> >>> Probably we should exclude non-backend proceses from the target
> >>> processes to dump? Sorry if this was already discussed.
> >>
> >>>  SELECT pg_get_backend_memory_contexts(pid) FROM
> >>> pg_stat_activity;
> >
> > Thanks for trying it!
> >
> > It was not discussed explicitly, and I was going to do it later
> > as commented.
> >
> >> +/* TODO: Check also whether backend or not. */
> >
> >>
> >> FWIW, I think this patch is fundamentally unsafe.  It's got a
> >> lot of the same problems that I complained about w.r.t. the
> >> nearby proposal to allow cross-backend stack trace dumping.
> >> It does avoid the trap of thinking that it can do work in
> >> a signal handler, but instead it supposes that it can do
> >> work involving very high-level objects such as shared hash tables
> >> in anyplace that might execute CHECK_FOR_INTERRUPTS.  That's
> >> never going to be safe: the only real expectation the system
> >> has is that CHECK_FOR_INTERRUPTS is called at places where our
> >> state is sane enough that a transaction abort can clean up.
> >> Trying to do things like taking LWLocks is going to lead to
> >> deadlocks or worse.  We need not even get into the hard questions,
> >> such as what happens when one process or the other exits
> >> unexpectedly.
> >
> > Thanks for reviewing!
> >
> > I may misunderstand something, but the dumper works not at
> > CHECK_FOR_INTERRUPTS but during the client read, i.e.,
> > ProcessClientReadInterrupt().
> >
> > Is it also unsafe?
> >
> >
> > BTW, since there was a comment that the shared hash table
> > used too much memory, I'm now rewriting this patch not to use
> > the shared hash table but a simpler static shared memory struct.
>
> Attached a rewritten patch.
Thanks for updating the patch.

However, when I applied the patch to the current HEAD and ran make, I
got an error due to duplicate OIDs.
You need to rebase the patch.

> Accordingly, I also slightly modified the basic design as below.
>
> ---
> # Communication flow between the dumper and the requester
> - (1) When requesting memory context dumping, the dumper changes
> the struct on the shared memory state from 'ACCEPTABLE' to
> 'REQUESTING'.
> - (2) The dumper sends the signal to the dumper process and wait on
> the latch.
> - (3) When the dumper noticed the signal, it changes the state to
> 'DUMPING'.
> - (4) When the dumper completes dumping, it changes the state to
> 'DONE' and set the latch.
> - (5) The requestor reads the dump file and shows it to the user.
> Finally, the requestor removes the dump file and reset the shared
> memory state to 'ACCEPTABLE'.
>
> # Query cancellation
> - When the requestor cancels dumping, e.g. signaling using ctrl-C,
> the requestor changes the state of the shared memory to 'CANCELING'.
> - The dumper checks the state when it tries to change the state to
> 'DONE' at (4), and if the state is 'CANCELING', it initializes the
> dump file and reset the shared memory state to 'ACCEPTABLE'.
>
> # Cleanup dump file and the shared memory
> - In the normal case, the dumper removes the dump file and resets
> the shared memory entry as described in (5).
> - When something like query cancellation or process termination
> happens on the dumper after (1) and before (3), in other words,
> the state is 'REQUESTING', the requestor does the cleanup.
> - When something happens on the dumper or the requestor after (3)
> and before (4), in other words, the state is 'DUMPING', the dumper
> does the cleanup. Specifically, if the requestor cancels the query,
> it just changes the state to 'CANCELING' and the dumper notices it
> and cleans up things later.
> OTOH, when the dumper fails to dump, it cleans up the dump file and
> reset the shared memory state.
> - When something happens on the requestor after (4), i.e., the state
> is 'DONE', the requestor does the cleanup.
> - In the case of receiving SIGKILL or power failure, all dump files
> are removed in the crash recovery process.
> ---
If the dumper is terminated before it dumps, the requestor appears to
loop forever because the status of mcxtdumpShmem never changes.
The following are the steps to reproduce:

 - session1
     BEGIN; LOCK TABLE t;
 - session2
     SELECT * FROM t; -- wait
 - session3
     select pg_get_target_backend_memory_contexts(); -- wait
 - session1
     select pg_terminate_backend(); -- kill session2
 - session3 waits forever.

Therefore, you may need to set mcxtdumpShmem->dump_status to
MCXTDUMPSTATUS_CANCELING
or other 

Re: autovac issue with large number of tables

2020-12-08 Thread Kasahara Tatsuhito
On Wed, Dec 9, 2020 at 12:01 AM Fujii Masao  wrote:
>
>
>
> On 2020/12/04 12:21, Kasahara Tatsuhito wrote:
> > Hi,
> >
> > On Thu, Dec 3, 2020 at 9:09 PM Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/12/03 11:46, Kasahara Tatsuhito wrote:
> >>> On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada  
> >>> wrote:
> >>>>
> >>>> On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao  
> >>>> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 2020/12/02 12:53, Masahiko Sawada wrote:
> >>>>>> On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada  
> >>>>>> wrote:
> >>>>>>>
> >>>>>>> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao 
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2020/12/01 16:23, Masahiko Sawada wrote:
> >>>>>>>>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
> >>>>>>>>>  wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao 
> >>>>>>>>>>  wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 2020/11/30 10:43, Masahiko Sawada wrote:
> >>>>>>>>>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
> >>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi, Thanks for you comments.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao 
> >>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao 
> >>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> >>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada 
> >>>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
> >>>>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada 
> >>>>>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> >>>>>>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
&

Re: autovac issue with large number of tables

2020-12-03 Thread Kasahara Tatsuhito
Hi,

On Thu, Dec 3, 2020 at 9:09 PM Fujii Masao  wrote:
>
>
>
> On 2020/12/03 11:46, Kasahara Tatsuhito wrote:
> > On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada  
> > wrote:
> >>
> >> On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao  
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 2020/12/02 12:53, Masahiko Sawada wrote:
> >>>> On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada  
> >>>> wrote:
> >>>>>
> >>>>> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao 
> >>>>>  wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2020/12/01 16:23, Masahiko Sawada wrote:
> >>>>>>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao 
> >>>>>>>>  wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 2020/11/30 10:43, Masahiko Sawada wrote:
> >>>>>>>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
> >>>>>>>>>>  wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi, Thanks for you comments.
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao 
> >>>>>>>>>>>  wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao 
> >>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> >>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada 
> >>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
> >>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada 
> >>>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> >>>>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> >>>>>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two 
> >>>>>>>>>>>>>>>>>>>>> probes of the stats
> >>>>>>>>>>>>>>>>>>>>> data.  First probe the existing stats data, and if it 
> >>>>>>>>>>>>>>>>>>>>> shows the table to
> >>>>>>>>>>>>>>>>>>>>> be al

Re: autovac issue with large number of tables

2020-12-02 Thread Kasahara Tatsuhito
On Wed, Dec 2, 2020 at 7:11 PM Masahiko Sawada  wrote:
>
> On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao  
> wrote:
> >
> >
> >
> > On 2020/12/02 12:53, Masahiko Sawada wrote:
> > > On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada  
> > > wrote:
> > >>
> > >> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao  
> > >> wrote:
> > >>>
> > >>>
> > >>>
> > >>> On 2020/12/01 16:23, Masahiko Sawada wrote:
> > >>>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
> > >>>>  wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao 
> > >>>>>  wrote:
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On 2020/11/30 10:43, Masahiko Sawada wrote:
> > >>>>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
> > >>>>>>>  wrote:
> > >>>>>>>>
> > >>>>>>>> Hi, Thanks for you comments.
> > >>>>>>>>
> > >>>>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao 
> > >>>>>>>>  wrote:
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao 
> > >>>>>>>>>>  wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> > >>>>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada 
> > >>>>>>>>>>>>  wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
> > >>>>>>>>>>>>>  wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada 
> > >>>>>>>>>>>>>>  wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> > >>>>>>>>>>>>>>>  wrote:
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> > >>>>>>>>>>>>>>>>  wrote:
> > >>>>>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two 
> > >>>>>>>>>>>>>>>>>> probes of the stats
> > >>>>>>>>>>>>>>>>>> data.  First probe the existing stats data, and if it 
> > >>>>>>>>>>>>>>>>>> shows the table to
> > >>>>>>>>>>>>>>>>>> be already vacuumed, return immediately.  If not, *then* 
> > >>>>>>>>>>>>>>>>>> force a stats
> > >>>>>>>>>>>>>>>>>> re-read, and check a second time.
> > >>>>>>>>>>>>>>>>> Does the above mean that the second and subsequent 
> > >>>>>>>>>>>>>>>>> table_recheck_autovac()
> > >>>>>>>>>>>>>>>>> will be improved to first check using the previo

Re: autovac issue with large number of tables

2020-12-02 Thread Kasahara Tatsuhito
Hi

On Wed, Dec 2, 2020 at 3:33 PM Fujii Masao  wrote:
>
>
>
> On 2020/12/02 12:53, Masahiko Sawada wrote:
> > On Tue, Dec 1, 2020 at 5:31 PM Masahiko Sawada  
> > wrote:
> >>
> >> On Tue, Dec 1, 2020 at 4:32 PM Fujii Masao  
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 2020/12/01 16:23, Masahiko Sawada wrote:
> >>>> On Tue, Dec 1, 2020 at 1:48 PM Kasahara Tatsuhito
> >>>>  wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao 
> >>>>>  wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 2020/11/30 10:43, Masahiko Sawada wrote:
> >>>>>>> On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>> Hi, Thanks for you comments.
> >>>>>>>>
> >>>>>>>> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao 
> >>>>>>>>  wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao 
> >>>>>>>>>>  wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> >>>>>>>>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada 
> >>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
> >>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada 
> >>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> >>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> >>>>>>>>>>>>>>>>  wrote:
> >>>>>>>>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two 
> >>>>>>>>>>>>>>>>>> probes of the stats
> >>>>>>>>>>>>>>>>>> data.  First probe the existing stats data, and if it 
> >>>>>>>>>>>>>>>>>> shows the table to
> >>>>>>>>>>>>>>>>>> be already vacuumed, return immediately.  If not, *then* 
> >>>>>>>>>>>>>>>>>> force a stats
> >>>>>>>>>>>>>>>>>> re-read, and check a second time.
> >>>>>>>>>>>>>>>>> Does the above mean that the second and subsequent 
> >>>>>>>>>>>>>>>>> table_recheck_autovac()
> >>>>>>>>>>>>>>>>> will be improved to first check using the previous 
> >>>>>>>>>>>>>>>>> refreshed statistics?
> >>>>>>>>>>>>>>>>> I think that certainly works.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC
> >>>>>>>>&

Re: autovac issue with large number of tables

2020-11-30 Thread Kasahara Tatsuhito
Hi,

On Mon, Nov 30, 2020 at 8:59 PM Fujii Masao  wrote:
>
>
>
> On 2020/11/30 10:43, Masahiko Sawada wrote:
> > On Sun, Nov 29, 2020 at 10:34 PM Kasahara Tatsuhito
> >  wrote:
> >>
> >> Hi, Thanks for you comments.
> >>
> >> On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao  
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
> >>>> Hi,
> >>>>
> >>>> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao 
> >>>>  wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> >>>>>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada 
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
> >>>>>>>  wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada 
> >>>>>>>>  wrote:
> >>>>>>>>>
> >>>>>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> >>>>>>>>>  wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> >>>>>>>>>>  wrote:
> >>>>>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of 
> >>>>>>>>>>>> the stats
> >>>>>>>>>>>> data.  First probe the existing stats data, and if it shows the 
> >>>>>>>>>>>> table to
> >>>>>>>>>>>> be already vacuumed, return immediately.  If not, *then* force a 
> >>>>>>>>>>>> stats
> >>>>>>>>>>>> re-read, and check a second time.
> >>>>>>>>>>> Does the above mean that the second and subsequent 
> >>>>>>>>>>> table_recheck_autovac()
> >>>>>>>>>>> will be improved to first check using the previous refreshed 
> >>>>>>>>>>> statistics?
> >>>>>>>>>>> I think that certainly works.
> >>>>>>>>>>>
> >>>>>>>>>>> If that's correct, I'll try to create a patch for the PoC
> >>>>>>>>>>
> >>>>>>>>>> I still don't know how to reproduce Jim's troubles, but I was able 
> >>>>>>>>>> to reproduce
> >>>>>>>>>> what was probably a very similar problem.
> >>>>>>>>>>
> >>>>>>>>>> This problem seems to be more likely to occur in cases where you 
> >>>>>>>>>> have
> >>>>>>>>>> a large number of tables,
> >>>>>>>>>> i.e., a large amount of stats, and many small tables need VACUUM at
> >>>>>>>>>> the same time.
> >>>>>>>>>>
> >>>>>>>>>> So I followed Tom's advice and created a patch for the PoC.
> >>>>>>>>>> This patch will enable a flag in the table_recheck_autovac 
> >>>>>>>>>> function to use
> >>>>>>>>>> the existing stats next time if VACUUM (or ANALYZE) has already 
> >>>>>>>>>> been done
> >>>>>>>>>> by another worker on the check after the stats have been updated.
> >>>>>>>>>> If the tables continue to require VACUUM after the refresh, then a 
> >>>>>>>>>> refresh
> >>>>>>>>>> will be required instead of using the existing statistics.
> >>>>>>>>>>
> >>>>>>>>>> I did simple test with HEAD and HEAD + this PoC patch.
> >>>>>>>>>> The tests were conducted in two cases.
> >>>>>>>>>> (I changed few configurations. see attached scripts)
> >>>>>>>>

Re: autovac issue with large number of tables

2020-11-29 Thread Kasahara Tatsuhito
Hi, thanks for your comments.

On Fri, Nov 27, 2020 at 9:51 PM Fujii Masao  wrote:
>
>
>
> On 2020/11/27 18:38, Kasahara Tatsuhito wrote:
> > Hi,
> >
> > On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao  
> > wrote:
> >>
> >>
> >>
> >> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> >>> On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada  
> >>> wrote:
> >>>>
> >>>> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
> >>>>  wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada  
> >>>>> wrote:
> >>>>>>
> >>>>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> >>>>>>  wrote:
> >>>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> >>>>>>>  wrote:
> >>>>>>>>> I wonder if we could have table_recheck_autovac do two probes of 
> >>>>>>>>> the stats
> >>>>>>>>> data.  First probe the existing stats data, and if it shows the 
> >>>>>>>>> table to
> >>>>>>>>> be already vacuumed, return immediately.  If not, *then* force a 
> >>>>>>>>> stats
> >>>>>>>>> re-read, and check a second time.
> >>>>>>>> Does the above mean that the second and subsequent 
> >>>>>>>> table_recheck_autovac()
> >>>>>>>> will be improved to first check using the previous refreshed 
> >>>>>>>> statistics?
> >>>>>>>> I think that certainly works.
> >>>>>>>>
> >>>>>>>> If that's correct, I'll try to create a patch for the PoC
> >>>>>>>
> >>>>>>> I still don't know how to reproduce Jim's troubles, but I was able to 
> >>>>>>> reproduce
> >>>>>>> what was probably a very similar problem.
> >>>>>>>
> >>>>>>> This problem seems to be more likely to occur in cases where you have
> >>>>>>> a large number of tables,
> >>>>>>> i.e., a large amount of stats, and many small tables need VACUUM at
> >>>>>>> the same time.
> >>>>>>>
> >>>>>>> So I followed Tom's advice and created a patch for the PoC.
> >>>>>>> This patch will enable a flag in the table_recheck_autovac function 
> >>>>>>> to use
> >>>>>>> the existing stats next time if VACUUM (or ANALYZE) has already been 
> >>>>>>> done
> >>>>>>> by another worker on the check after the stats have been updated.
> >>>>>>> If the tables continue to require VACUUM after the refresh, then a 
> >>>>>>> refresh
> >>>>>>> will be required instead of using the existing statistics.
> >>>>>>>
> >>>>>>> I did simple test with HEAD and HEAD + this PoC patch.
> >>>>>>> The tests were conducted in two cases.
> >>>>>>> (I changed few configurations. see attached scripts)
> >>>>>>>
> >>>>>>> 1. Normal VACUUM case
> >>>>>>> - SET autovacuum = off
> >>>>>>> - CREATE tables with 100 rows
> >>>>>>> - DELETE 90 rows for each tables
> >>>>>>> - SET autovacuum = on and restart PostgreSQL
> >>>>>>> - Measure the time it takes for all tables to be VACUUMed
> >>>>>>>
> >>>>>>> 2. Anti wrap round VACUUM case
> >>>>>>> - CREATE brank tables
> >>>>>>> - SELECT all of these tables (for generate stats)
> >>>>>>> - SET autovacuum_freeze_max_age to low values and restart 
> >>>>>>> PostgreSQL
> >>>>>>> - Consumes a lot of XIDs by using txid_curent()
> >>>>>>> - Measure the time it takes for all tables to be VACUUMed
> >>>>>>>
> >>>>>>> For each test case, the following results were obtaine

Re: autovac issue with large number of tables

2020-11-27 Thread Kasahara Tatsuhito
On Fri, Nov 27, 2020 at 5:22 PM Masahiko Sawada  wrote:
>
> On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao  
> wrote:
> >
> >
> >
> > On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> > > On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada  
> > > wrote:
> > >>
> > >> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
> > >>  wrote:
> > >>>
> > >>> Hi,
> > >>>
> > >>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada  
> > >>> wrote:
> > >>>>
> > >>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> > >>>>  wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> > >>>>>  wrote:
> > >>>>>>> I wonder if we could have table_recheck_autovac do two probes of 
> > >>>>>>> the stats
> > >>>>>>> data.  First probe the existing stats data, and if it shows the 
> > >>>>>>> table to
> > >>>>>>> be already vacuumed, return immediately.  If not, *then* force a 
> > >>>>>>> stats
> > >>>>>>> re-read, and check a second time.
> > >>>>>> Does the above mean that the second and subsequent 
> > >>>>>> table_recheck_autovac()
> > >>>>>> will be improved to first check using the previous refreshed 
> > >>>>>> statistics?
> > >>>>>> I think that certainly works.
> > >>>>>>
> > >>>>>> If that's correct, I'll try to create a patch for the PoC
> > >>>>>
> > >>>>> I still don't know how to reproduce Jim's troubles, but I was able to 
> > >>>>> reproduce
> > >>>>> what was probably a very similar problem.
> > >>>>>
> > >>>>> This problem seems to be more likely to occur in cases where you have
> > >>>>> a large number of tables,
> > >>>>> i.e., a large amount of stats, and many small tables need VACUUM at
> > >>>>> the same time.
> > >>>>>
> > >>>>> So I followed Tom's advice and created a patch for the PoC.
> > >>>>> This patch will enable a flag in the table_recheck_autovac function 
> > >>>>> to use
> > >>>>> the existing stats next time if VACUUM (or ANALYZE) has already been 
> > >>>>> done
> > >>>>> by another worker on the check after the stats have been updated.
> > >>>>> If the tables continue to require VACUUM after the refresh, then a 
> > >>>>> refresh
> > >>>>> will be required instead of using the existing statistics.
> > >>>>>
> > >>>>> I did simple test with HEAD and HEAD + this PoC patch.
> > >>>>> The tests were conducted in two cases.
> > >>>>> (I changed few configurations. see attached scripts)
> > >>>>>
> > >>>>> 1. Normal VACUUM case
> > >>>>>- SET autovacuum = off
> > >>>>>- CREATE tables with 100 rows
> > >>>>>- DELETE 90 rows for each tables
> > >>>>>- SET autovacuum = on and restart PostgreSQL
> > >>>>>- Measure the time it takes for all tables to be VACUUMed
> > >>>>>
> > >>>>> 2. Anti wrap round VACUUM case
> > >>>>>- CREATE brank tables
> > >>>>>- SELECT all of these tables (for generate stats)
> > >>>>>- SET autovacuum_freeze_max_age to low values and restart 
> > >>>>> PostgreSQL
> > >>>>>- Consumes a lot of XIDs by using txid_curent()
> > >>>>>- Measure the time it takes for all tables to be VACUUMed
> > >>>>>
> > >>>>> For each test case, the following results were obtained by changing
> > >>>>> autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10.
> > >>>>> Also changing num of tables to 1000, 5000, 1 and 2.
> > >>>>>
> > >>>>> Due to the poor VM environment (2 VCPU/4 GB), the results are a 
> > &

Re: autovac issue with large number of tables

2020-11-27 Thread Kasahara Tatsuhito
Hi,

On Fri, Nov 27, 2020 at 1:43 AM Fujii Masao  wrote:
>
>
>
> On 2020/11/26 10:41, Kasahara Tatsuhito wrote:
> > On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada  
> > wrote:
> >>
> >> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada  
> >>> wrote:
> >>>>
> >>>> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> >>>>  wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> >>>>>  wrote:
> >>>>>>> I wonder if we could have table_recheck_autovac do two probes of the 
> >>>>>>> stats
> >>>>>>> data.  First probe the existing stats data, and if it shows the table 
> >>>>>>> to
> >>>>>>> be already vacuumed, return immediately.  If not, *then* force a stats
> >>>>>>> re-read, and check a second time.
> >>>>>> Does the above mean that the second and subsequent 
> >>>>>> table_recheck_autovac()
> >>>>>> will be improved to first check using the previous refreshed 
> >>>>>> statistics?
> >>>>>> I think that certainly works.
> >>>>>>
> >>>>>> If that's correct, I'll try to create a patch for the PoC
> >>>>>
> >>>>> I still don't know how to reproduce Jim's troubles, but I was able to 
> >>>>> reproduce
> >>>>> what was probably a very similar problem.
> >>>>>
> >>>>> This problem seems to be more likely to occur in cases where you have
> >>>>> a large number of tables,
> >>>>> i.e., a large amount of stats, and many small tables need VACUUM at
> >>>>> the same time.
> >>>>>
> >>>>> So I followed Tom's advice and created a patch for the PoC.
> >>>>> This patch will enable a flag in the table_recheck_autovac function to 
> >>>>> use
> >>>>> the existing stats next time if VACUUM (or ANALYZE) has already been 
> >>>>> done
> >>>>> by another worker on the check after the stats have been updated.
> >>>>> If the tables continue to require VACUUM after the refresh, then a 
> >>>>> refresh
> >>>>> will be required instead of using the existing statistics.
> >>>>>
> >>>>> I did simple test with HEAD and HEAD + this PoC patch.
> >>>>> The tests were conducted in two cases.
> >>>>> (I changed few configurations. see attached scripts)
> >>>>>
> >>>>> 1. Normal VACUUM case
> >>>>>- SET autovacuum = off
> >>>>>- CREATE tables with 100 rows
> >>>>>- DELETE 90 rows for each tables
> >>>>>- SET autovacuum = on and restart PostgreSQL
> >>>>>- Measure the time it takes for all tables to be VACUUMed
> >>>>>
> >>>>> 2. Anti wrap round VACUUM case
> >>>>>- CREATE brank tables
> >>>>>- SELECT all of these tables (for generate stats)
> >>>>>- SET autovacuum_freeze_max_age to low values and restart PostgreSQL
> >>>>>- Consumes a lot of XIDs by using txid_curent()
> >>>>>- Measure the time it takes for all tables to be VACUUMed
> >>>>>
> >>>>> For each test case, the following results were obtained by changing
> >>>>> autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10.
> >>>>> Also changing num of tables to 1000, 5000, 1 and 2.
> >>>>>
> >>>>> Due to the poor VM environment (2 VCPU/4 GB), the results are a little 
> >>>>> unstable,
> >>>>> but I think it's enough to ask for a trend.
> >>>>>
> >>>>> ===
> >>>>> [1.Normal VACUUM case]
> >>>>>   tables:1000
> >>>>>autovacuum_max_workers 1:   (HEAD) 20 sec VS (with patch)  20 sec
> >>>>>autovacuum_max_workers 2:   (HEAD) 18 sec VS (with patch)  16 sec
> >>>>>autovacuum_ma

Re: autovac issue with large number of tables

2020-11-25 Thread Kasahara Tatsuhito
On Wed, Nov 25, 2020 at 8:46 PM Masahiko Sawada  wrote:
>
> On Wed, Nov 25, 2020 at 4:18 PM Kasahara Tatsuhito
>  wrote:
> >
> > Hi,
> >
> > On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> > > >  wrote:
> > > > > > I wonder if we could have table_recheck_autovac do two probes of 
> > > > > > the stats
> > > > > > data.  First probe the existing stats data, and if it shows the 
> > > > > > table to
> > > > > > be already vacuumed, return immediately.  If not, *then* force a 
> > > > > > stats
> > > > > > re-read, and check a second time.
> > > > > Does the above mean that the second and subsequent 
> > > > > table_recheck_autovac()
> > > > > will be improved to first check using the previous refreshed 
> > > > > statistics?
> > > > > I think that certainly works.
> > > > >
> > > > > If that's correct, I'll try to create a patch for the PoC
> > > >
> > > > I still don't know how to reproduce Jim's troubles, but I was able to 
> > > > reproduce
> > > > what was probably a very similar problem.
> > > >
> > > > This problem seems to be more likely to occur in cases where you have
> > > > a large number of tables,
> > > > i.e., a large amount of stats, and many small tables need VACUUM at
> > > > the same time.
> > > >
> > > > So I followed Tom's advice and created a patch for the PoC.
> > > > This patch will enable a flag in the table_recheck_autovac function to 
> > > > use
> > > > the existing stats next time if VACUUM (or ANALYZE) has already been 
> > > > done
> > > > by another worker on the check after the stats have been updated.
> > > > If the tables continue to require VACUUM after the refresh, then a 
> > > > refresh
> > > > will be required instead of using the existing statistics.
> > > >
> > > > I did simple test with HEAD and HEAD + this PoC patch.
> > > > The tests were conducted in two cases.
> > > > (I changed few configurations. see attached scripts)
> > > >
> > > > 1. Normal VACUUM case
> > > >   - SET autovacuum = off
> > > >   - CREATE tables with 100 rows
> > > >   - DELETE 90 rows for each tables
> > > >   - SET autovacuum = on and restart PostgreSQL
> > > >   - Measure the time it takes for all tables to be VACUUMed
> > > >
> > > > 2. Anti wrap round VACUUM case
> > > >   - CREATE brank tables
> > > >   - SELECT all of these tables (for generate stats)
> > > >   - SET autovacuum_freeze_max_age to low values and restart PostgreSQL
> > > >   - Consumes a lot of XIDs by using txid_curent()
> > > >   - Measure the time it takes for all tables to be VACUUMed
> > > >
> > > > For each test case, the following results were obtained by changing
> > > > autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10.
> > > > Also changing num of tables to 1000, 5000, 1 and 2.
> > > >
> > > > Due to the poor VM environment (2 VCPU/4 GB), the results are a little 
> > > > unstable,
> > > > but I think it's enough to ask for a trend.
> > > >
> > > > ===
> > > > [1.Normal VACUUM case]
> > > >  tables:1000
> > > >   autovacuum_max_workers 1:   (HEAD) 20 sec VS (with patch)  20 sec
> > > >   autovacuum_max_workers 2:   (HEAD) 18 sec VS (with patch)  16 sec
> > > >   autovacuum_max_workers 3:   (HEAD) 18 sec VS (with patch)  16 sec
> > > >   autovacuum_max_workers 5:   (HEAD) 19 sec VS (with patch)  17 sec
> > > >   autovacuum_max_workers 10:  (HEAD) 19 sec VS (with patch)  17 sec
> > > >
> > > >  tables:5000
> > > >   autovacuum_max_workers 1:   (HEAD) 77 sec VS (with patch)  78 sec
> > > >   autovacuum_max_workers 2:   (HEAD) 61 sec VS (with patch)  43 sec
> > > >   autovacuum_max_workers 3:   (HEAD) 38 sec VS (with patch)  38 sec
> > > >   autovacuum_max_workers 5:   (HEAD) 45 sec VS (wi

Re: autovac issue with large number of tables

2020-11-24 Thread Kasahara Tatsuhito
Hi,

On Wed, Nov 25, 2020 at 2:17 PM Masahiko Sawada  wrote:
>
> On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
>  wrote:
> >
> > Hi,
> >
> > On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
> >  wrote:
> > > > I wonder if we could have table_recheck_autovac do two probes of the 
> > > > stats
> > > > data.  First probe the existing stats data, and if it shows the table to
> > > > be already vacuumed, return immediately.  If not, *then* force a stats
> > > > re-read, and check a second time.
> > > Does the above mean that the second and subsequent table_recheck_autovac()
> > > will be improved to first check using the previous refreshed statistics?
> > > I think that certainly works.
> > >
> > > If that's correct, I'll try to create a patch for the PoC
> >
> > I still don't know how to reproduce Jim's troubles, but I was able to 
> > reproduce
> > what was probably a very similar problem.
> >
> > This problem seems to be more likely to occur in cases where you have
> > a large number of tables,
> > i.e., a large amount of stats, and many small tables need VACUUM at
> > the same time.
> >
> > So I followed Tom's advice and created a patch for the PoC.
> > This patch will enable a flag in the table_recheck_autovac function to use
> > the existing stats next time if VACUUM (or ANALYZE) has already been done
> > by another worker on the check after the stats have been updated.
> > If the tables continue to require VACUUM after the refresh, then a refresh
> > will be required instead of using the existing statistics.
> >
> > I did simple test with HEAD and HEAD + this PoC patch.
> > The tests were conducted in two cases.
> > (I changed few configurations. see attached scripts)
> >
> > 1. Normal VACUUM case
> >   - SET autovacuum = off
> >   - CREATE tables with 100 rows
> >   - DELETE 90 rows for each tables
> >   - SET autovacuum = on and restart PostgreSQL
> >   - Measure the time it takes for all tables to be VACUUMed
> >
> > 2. Anti wrap round VACUUM case
> >   - CREATE brank tables
> >   - SELECT all of these tables (for generate stats)
> >   - SET autovacuum_freeze_max_age to low values and restart PostgreSQL
> >   - Consumes a lot of XIDs by using txid_curent()
> >   - Measure the time it takes for all tables to be VACUUMed
> >
> > For each test case, the following results were obtained by changing
> > autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10.
> > Also changing num of tables to 1000, 5000, 1 and 2.
> >
> > Due to the poor VM environment (2 VCPU/4 GB), the results are a little 
> > unstable,
> > but I think it's enough to ask for a trend.
> >
> > ===
> > [1.Normal VACUUM case]
> >  tables:1000
> >   autovacuum_max_workers 1:   (HEAD) 20 sec VS (with patch)  20 sec
> >   autovacuum_max_workers 2:   (HEAD) 18 sec VS (with patch)  16 sec
> >   autovacuum_max_workers 3:   (HEAD) 18 sec VS (with patch)  16 sec
> >   autovacuum_max_workers 5:   (HEAD) 19 sec VS (with patch)  17 sec
> >   autovacuum_max_workers 10:  (HEAD) 19 sec VS (with patch)  17 sec
> >
> >  tables:5000
> >   autovacuum_max_workers 1:   (HEAD) 77 sec VS (with patch)  78 sec
> >   autovacuum_max_workers 2:   (HEAD) 61 sec VS (with patch)  43 sec
> >   autovacuum_max_workers 3:   (HEAD) 38 sec VS (with patch)  38 sec
> >   autovacuum_max_workers 5:   (HEAD) 45 sec VS (with patch)  37 sec
> >   autovacuum_max_workers 10:  (HEAD) 43 sec VS (with patch)  35 sec
> >
> >  tables:1
> >   autovacuum_max_workers 1:   (HEAD) 152 sec VS (with patch)  153 sec
> >   autovacuum_max_workers 2:   (HEAD) 119 sec VS (with patch)   98 sec
> >   autovacuum_max_workers 3:   (HEAD)  87 sec VS (with patch)   78 sec
> >   autovacuum_max_workers 5:   (HEAD) 100 sec VS (with patch)   66 sec
> >   autovacuum_max_workers 10:  (HEAD)  97 sec VS (with patch)   56 sec
> >
> >  tables:2
> >   autovacuum_max_workers 1:   (HEAD) 338 sec VS (with patch)  339 sec
> >   autovacuum_max_workers 2:   (HEAD) 231 sec VS (with patch)  229 sec
> >   autovacuum_max_workers 3:   (HEAD) 220 sec VS (with patch)  191 sec
> >   autovacuum_max_workers 5:   (HEAD) 234 sec VS (with patch)  147 sec
> >   autovacuum_max_workers 10:  (HEAD) 320 sec VS (with patch)  113 sec
> >
> > [2.Anti wrap round VACUUM case]
> >  tables:1000
> >   autovacuum_max_workers 1:   (HEAD) 19 sec VS (with pa

Re: Add a description to the documentation that toast_tuple_target affects "Main"

2020-10-12 Thread Kasahara Tatsuhito
Hi,

On Fri, Oct 9, 2020 at 5:44 PM Shinya Okano  wrote:
>
> Hi,
>
> Regarding the toast_tuple_target parameter of CREATE TABLE, the
> documentation says that it only affects External or Extended, but it
> actually affects the compression of Main as well.
>
> The attached patch modifies the document to match the actual behavior.
+1

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Get memory contexts of an arbitrary backend process

2020-10-01 Thread Kasahara Tatsuhito
Hi,

On Fri, Sep 25, 2020 at 4:28 PM torikoshia  wrote:
> Thanks for all your comments, I updated the patch.
Thanks for updating the patch.
I did a brief test and code review.

> I added a shared hash table consisted of minimal members
> mainly for managing whether the file is dumped or not.
> Some members like 'loc' seem useful in the future, but I
> haven't added them since it's not essential at this point.
Yes, that would be good.

+/*
+ * Since we allow only one session can request to dump
memory context at
+ * the same time, check whether the dump files already exist.
+ */
+while (stat(dumpfile, _tmp) == 0 || stat(tmpfile, _tmp) == 0)
+{
+pg_usleep(100L);
+}

If pg_get_backend_memory_contexts() is executed by two or more
sessions at the same time, this approach cannot make them run exclusively.
Currently it seems to cause a crash when doing so.
This is easy to reproduce and can be done as follows.

[session-1]
BEGIN;
LOCK TABLE t1;
  [Session-2]
  BEGIN;
  LOCK TABLE t1; <- waiting
[Session-3]
select * FROM pg_get_backend_memory_contexts();
  [Session-4]
  select * FROM pg_get_backend_memory_contexts();

If you issue commit or abort at session-1, you will get a SEGV.

Instead of checking for the existence of the file, it might be better
to use a hash (mcxtdumpHash) entry with an LWLock.

+if (proc == NULL)
+{
+ereport(WARNING,
+(errmsg("PID %d is not a PostgreSQL server
process", dst_pid)));
+return (Datum) 1;
+}

Shouldn't it clear the hash entry before returning?

+/* Wait until target process finished dumping file. */
+while (!entry->is_dumped)
+{
+CHECK_FOR_INTERRUPTS();
+pg_usleep(1L);
+}

If the target is killed and exits before dumping the memory
information, you're stuck in an infinite loop here.
So how about making sure that any process that has to stop before
doing a memory dump changes the status of the hash entry
(entry->is_dumped) before stopping, so that the caller can detect it?
I'm not sure whether it's the best way, but you might want to use
something like the on_shmem_exit callback.
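
As a very rough sketch of that idea (McxtDumpEntry and McxtDumpHashLock
are assumptions about the patch's shared-memory layout, not existing
code; only on_shmem_exit() and the LWLock/Datum helpers are existing
APIs):

/* Hypothetical cleanup callback: make sure a backend that exits before
 * dumping still unblocks the requestor polling entry->is_dumped. */
static void
mcxtdump_cleanup_on_exit(int code, Datum arg)
{
	McxtDumpEntry *entry = (McxtDumpEntry *) DatumGetPointer(arg);

	LWLockAcquire(McxtDumpHashLock, LW_EXCLUSIVE);
	entry->is_dumped = true;	/* or set a dedicated "aborted" status */
	LWLockRelease(McxtDumpHashLock);
}

/* registered once the target backend accepts a dump request */
on_shmem_exit(mcxtdump_cleanup_on_exit, PointerGetDatum(entry));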

In the current design, if the caller stops processing before reading
the dumped file, you will have an orphaned file.
It looks like the following.

[session-1]
BEGIN;
LOCK TABLE t1;
  [Session-2]
  BEGIN;
  LOCK TABLE t1; <- waiting
[Session-3]
select * FROM pg_get_backend_memory_contexts();

If you cancel or terminate session-3 and then issue commit or abort
at session-1, you will get orphaned files in pg_memusage.

So if only one session at a time is allowed to request a dump, this
function could call pg_memusage_reset() before sending the signal.

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Get memory contexts of an arbitrary backend process

2020-09-10 Thread Kasahara Tatsuhito
Hi,

On Thu, Sep 10, 2020 at 8:53 PM torikoshia  wrote:
>
> On 2020-09-04 21:46, Tomas Vondra wrote:
> > On Fri, Sep 04, 2020 at 11:47:30AM +0900, Kasahara Tatsuhito wrote:
> >> On Fri, Sep 4, 2020 at 2:40 AM Tom Lane  wrote:
> >>> Kasahara Tatsuhito  writes:
> >>> > Yes, but it's not only for future expansion, but also for the
> >>> > usability and the stability of this feature.
> >>> > For example, if you want to read one dumped file multiple times and 
> >>> > analyze it,
> >>> > you will want the ability to just read the dump.
> >>>
> >>> If we design it to make that possible, how are we going to prevent
> >>> disk
> >>> space leaks from never-cleaned-up dump files?
> >> In my thought, with features such as a view that allows us to see a
> >> list of dumped files,
> >> it would be better to have a function that simply deletes the dump
> >> files associated with a specific PID,
> >> or to delete all dump files.
> >> Some files may be dumped with unexpected delays, so I think the
> >> cleaning feature will be necessary.
> >> ( Also, as the pgsql_tmp file, it might better to delete dump files
> >> when PostgreSQL start.)
> >>
> >> Or should we try to delete the dump file as soon as we can read it?
> >>
> >
> > IMO making the cleanup a responsibility of the users (e.g. by exposing
> > the list of dumped files through a view and expecting users to delete
> > them in some way) is rather fragile.
> >
> > I don't quite see what's the point of designing it this way. It was
> > suggested this improves stability and usability of this feature, but
> > surely making it unnecessarily complex contradicts both points?
> >
> > IMHO if the user needs to process the dump repeatedly, what's
> > preventing
> > him/her from storing it in a file, or something like that? At that
> > point
> > it's clear it's up to them to remove the file. So I suggest to keep the
> > feature as simple as possible - hand the dump over and delete.
Yeah, it might be better to avoid making the user responsible for removal.

I think it's fine to have an interface for deleting dump files in an
emergency, but I agree that, basically, users shouldn't need to be
aware of the existence or deletion of dump files.

> +1.
> If there are no other objections, I'm going to accept this
> suggestion.
So +1

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: autovac issue with large number of tables

2020-09-10 Thread Kasahara Tatsuhito
Therefore, we expect this patch [1] to be committed for its original
purpose, as well as to improve autovacuum from v14 onwards.

Hi,

On Fri, Sep 4, 2020 at 7:50 PM Kasahara Tatsuhito
 wrote:
>
> Hi,
>
> On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
>  wrote:
> > > I wonder if we could have table_recheck_autovac do two probes of the stats
> > > data.  First probe the existing stats data, and if it shows the table to
> > > be already vacuumed, return immediately.  If not, *then* force a stats
> > > re-read, and check a second time.
> > Does the above mean that the second and subsequent table_recheck_autovac()
> > will be improved to first check using the previous refreshed statistics?
> > I think that certainly works.
> >
> > If that's correct, I'll try to create a patch for the PoC
>
> I still don't know how to reproduce Jim's troubles, but I was able to 
> reproduce
> what was probably a very similar problem.
>
> This problem seems to be more likely to occur in cases where you have
> a large number of tables,
> i.e., a large amount of stats, and many small tables need VACUUM at
> the same time.
>
> So I followed Tom's advice and created a patch for the PoC.
> This patch will enable a flag in the table_recheck_autovac function to use
> the existing stats next time if VACUUM (or ANALYZE) has already been done
> by another worker on the check after the stats have been updated.
> If the tables continue to require VACUUM after the refresh, then a refresh
> will be required instead of using the existing statistics.
>
> I did simple test with HEAD and HEAD + this PoC patch.
> The tests were conducted in two cases.
> (I changed few configurations. see attached scripts)
>
> 1. Normal VACUUM case
>   - SET autovacuum = off
>   - CREATE tables with 100 rows
>   - DELETE 90 rows for each tables
>   - SET autovacuum = on and restart PostgreSQL
>   - Measure the time it takes for all tables to be VACUUMed
>
> 2. Anti wrap round VACUUM case
>   - CREATE brank tables
>   - SELECT all of these tables (for generate stats)
>   - SET autovacuum_freeze_max_age to low values and restart PostgreSQL
>   - Consumes a lot of XIDs by using txid_curent()
>   - Measure the time it takes for all tables to be VACUUMed
>
> For each test case, the following results were obtained by changing
> autovacuum_max_workers parameters to 1, 2, 3(def) 5 and 10.
> Also changing num of tables to 1000, 5000, 1 and 2.
>
> Due to the poor VM environment (2 VCPU/4 GB), the results are a little 
> unstable,
> but I think it's enough to ask for a trend.
>
> ===
> [1.Normal VACUUM case]
>  tables:1000
>   autovacuum_max_workers 1:   (HEAD) 20 sec VS (with patch)  20 sec
>   autovacuum_max_workers 2:   (HEAD) 18 sec VS (with patch)  16 sec
>   autovacuum_max_workers 3:   (HEAD) 18 sec VS (with patch)  16 sec
>   autovacuum_max_workers 5:   (HEAD) 19 sec VS (with patch)  17 sec
>   autovacuum_max_workers 10:  (HEAD) 19 sec VS (with patch)  17 sec
>
>  tables:5000
>   autovacuum_max_workers 1:   (HEAD) 77 sec VS (with patch)  78 sec
>   autovacuum_max_workers 2:   (HEAD) 61 sec VS (with patch)  43 sec
>   autovacuum_max_workers 3:   (HEAD) 38 sec VS (with patch)  38 sec
>   autovacuum_max_workers 5:   (HEAD) 45 sec VS (with patch)  37 sec
>   autovacuum_max_workers 10:  (HEAD) 43 sec VS (with patch)  35 sec
>
>  tables:1
>   autovacuum_max_workers 1:   (HEAD) 152 sec VS (with patch)  153 sec
>   autovacuum_max_workers 2:   (HEAD) 119 sec VS (with patch)   98 sec
>   autovacuum_max_workers 3:   (HEAD)  87 sec VS (with patch)   78 sec
>   autovacuum_max_workers 5:   (HEAD) 100 sec VS (with patch)   66 sec
>   autovacuum_max_workers 10:  (HEAD)  97 sec VS (with patch)   56 sec
>
>  tables:2
>   autovacuum_max_workers 1:   (HEAD) 338 sec VS (with patch)  339 sec
>   autovacuum_max_workers 2:   (HEAD) 231 sec VS (with patch)  229 sec
>   autovacuum_max_workers 3:   (HEAD) 220 sec VS (with patch)  191 sec
>   autovacuum_max_workers 5:   (HEAD) 234 sec VS (with patch)  147 sec
>   autovacuum_max_workers 10:  (HEAD) 320 sec VS (with patch)  113 sec
>
> [2.Anti wrap round VACUUM case]
>  tables:1000
>   autovacuum_max_workers 1:   (HEAD) 19 sec VS (with patch) 18 sec
>   autovacuum_max_workers 2:   (HEAD) 14 sec VS (with patch) 15 sec
>   autovacuum_max_workers 3:   (HEAD) 14 sec VS (with patch) 14 sec
>   autovacuum_max_workers 5:   (HEAD) 14 sec VS (with patch) 16 sec
>   autovacuum_max_workers 10:  (HEAD) 16 sec VS (with patch) 14 sec
>
>  tables:5000
>   autovacuum_max_workers 1:   (HEAD) 69 sec VS (with patch) 69 sec
>   autovacuum_max_workers 

Re: autovac issue with large number of tables

2020-09-04 Thread Kasahara Tatsuhito
Hi,

On Wed, Sep 2, 2020 at 2:10 AM Kasahara Tatsuhito
 wrote:
> > I wonder if we could have table_recheck_autovac do two probes of the stats
> > data.  First probe the existing stats data, and if it shows the table to
> > be already vacuumed, return immediately.  If not, *then* force a stats
> > re-read, and check a second time.
> Does the above mean that the second and subsequent table_recheck_autovac()
> will be improved to first check using the previous refreshed statistics?
> I think that certainly works.
>
> If that's correct, I'll try to create a patch for the PoC

I still don't know how to reproduce Jim's troubles, but I was able to reproduce
what was probably a very similar problem.

This problem seems to be more likely to occur in cases where you have
a large number of tables,
i.e., a large amount of stats, and many small tables need VACUUM at
the same time.

So I followed Tom's advice and created a PoC patch.
This patch adds a flag to table_recheck_autovac() so that the next
recheck reuses the existing stats when the current check (done right
after a stats refresh) finds that VACUUM (or ANALYZE) has already been
done by another worker.
If a table still appears to require VACUUM when checked against the
existing stats, the stats are refreshed again instead of being reused.
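
To make the intended control flow easier to follow, here is a rough
sketch of the idea (table_needs_vacuum_from_stats() is a simplified
stand-in for the existing recheck logic, not a real function; the
actual patch works inside table_recheck_autovac() in autovacuum.c,
where autovac_refresh_stats() is the expensive stats re-read):

static bool reuse_stats = false;

static bool
recheck_table_needs_vacuum(Oid relid)
{
	bool		dovacuum;

	if (!reuse_stats)
		autovac_refresh_stats();	/* costly when there are many tables */

	dovacuum = table_needs_vacuum_from_stats(relid);

	if (!dovacuum)
	{
		/* another worker already did the work; trust cached stats next time */
		reuse_stats = true;
		return false;
	}

	if (reuse_stats)
	{
		/* cached stats may be stale; force a re-read and check once more */
		reuse_stats = false;
		autovac_refresh_stats();
		dovacuum = table_needs_vacuum_from_stats(relid);
	}

	return dovacuum;
}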

I did a simple test with HEAD and HEAD + this PoC patch.
The tests were conducted in two cases.
(I changed a few configuration parameters; see the attached scripts.)

1. Normal VACUUM case
  - SET autovacuum = off
  - CREATE tables with 100 rows
  - DELETE 90 rows from each table
  - SET autovacuum = on and restart PostgreSQL
  - Measure the time it takes for all tables to be VACUUMed

2. Anti-wraparound VACUUM case
  - CREATE blank tables
  - SELECT all of these tables (to generate stats)
  - SET autovacuum_freeze_max_age to a low value and restart PostgreSQL
  - Consume a lot of XIDs by using txid_current()
  - Measure the time it takes for all tables to be VACUUMed

For each test case, the following results were obtained by changing the
autovacuum_max_workers parameter to 1, 2, 3 (default), 5 and 10.
I also changed the number of tables to 1000, 5000, 10000 and 20000.

Due to the poor VM environment (2 vCPU / 4 GB), the results are a little unstable,
but I think they are enough to show a trend.
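
For reference, the normal VACUUM case boils down to something like this
(a sketch only; the table names and the \gexec trick are illustrative,
the attached scripts are what I actually ran):

===
ALTER SYSTEM SET autovacuum = off;
SELECT pg_reload_conf();
-- create many small tables and delete most of their rows
SELECT format('CREATE TABLE t%s AS SELECT generate_series(1, 100) AS c', i)
  FROM generate_series(1, 1000) AS i \gexec
SELECT format('DELETE FROM t%s WHERE c <= 90', i)
  FROM generate_series(1, 1000) AS i \gexec
ALTER SYSTEM SET autovacuum = on;
-- restart PostgreSQL and measure the time until
-- pg_stat_user_tables.last_autovacuum is set for every table
===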

===
[1.Normal VACUUM case]
 tables:1000
  autovacuum_max_workers 1:   (HEAD) 20 sec VS (with patch)  20 sec
  autovacuum_max_workers 2:   (HEAD) 18 sec VS (with patch)  16 sec
  autovacuum_max_workers 3:   (HEAD) 18 sec VS (with patch)  16 sec
  autovacuum_max_workers 5:   (HEAD) 19 sec VS (with patch)  17 sec
  autovacuum_max_workers 10:  (HEAD) 19 sec VS (with patch)  17 sec

 tables:5000
  autovacuum_max_workers 1:   (HEAD) 77 sec VS (with patch)  78 sec
  autovacuum_max_workers 2:   (HEAD) 61 sec VS (with patch)  43 sec
  autovacuum_max_workers 3:   (HEAD) 38 sec VS (with patch)  38 sec
  autovacuum_max_workers 5:   (HEAD) 45 sec VS (with patch)  37 sec
  autovacuum_max_workers 10:  (HEAD) 43 sec VS (with patch)  35 sec

 tables:10000
  autovacuum_max_workers 1:   (HEAD) 152 sec VS (with patch)  153 sec
  autovacuum_max_workers 2:   (HEAD) 119 sec VS (with patch)   98 sec
  autovacuum_max_workers 3:   (HEAD)  87 sec VS (with patch)   78 sec
  autovacuum_max_workers 5:   (HEAD) 100 sec VS (with patch)   66 sec
  autovacuum_max_workers 10:  (HEAD)  97 sec VS (with patch)   56 sec

 tables:20000
  autovacuum_max_workers 1:   (HEAD) 338 sec VS (with patch)  339 sec
  autovacuum_max_workers 2:   (HEAD) 231 sec VS (with patch)  229 sec
  autovacuum_max_workers 3:   (HEAD) 220 sec VS (with patch)  191 sec
  autovacuum_max_workers 5:   (HEAD) 234 sec VS (with patch)  147 sec
  autovacuum_max_workers 10:  (HEAD) 320 sec VS (with patch)  113 sec

[2.Anti-wraparound VACUUM case]
 tables:1000
  autovacuum_max_workers 1:   (HEAD) 19 sec VS (with patch) 18 sec
  autovacuum_max_workers 2:   (HEAD) 14 sec VS (with patch) 15 sec
  autovacuum_max_workers 3:   (HEAD) 14 sec VS (with patch) 14 sec
  autovacuum_max_workers 5:   (HEAD) 14 sec VS (with patch) 16 sec
  autovacuum_max_workers 10:  (HEAD) 16 sec VS (with patch) 14 sec

 tables:5000
  autovacuum_max_workers 1:   (HEAD) 69 sec VS (with patch) 69 sec
  autovacuum_max_workers 2:   (HEAD) 66 sec VS (with patch) 47 sec
  autovacuum_max_workers 3:   (HEAD) 59 sec VS (with patch) 37 sec
  autovacuum_max_workers 5:   (HEAD) 39 sec VS (with patch) 28 sec
  autovacuum_max_workers 10:  (HEAD) 39 sec VS (with patch) 29 sec

 tables:10000
  autovacuum_max_workers 1:   (HEAD) 139 sec VS (with patch) 138 sec
  autovacuum_max_workers 2:   (HEAD) 130 sec VS (with patch)  86 sec
  autovacuum_max_workers 3:   (HEAD) 120 sec VS (with patch)  68 sec
  autovacuum_max_workers 5:   (HEAD)  96 sec VS (with patch)  41 sec
  autovacuum_max_workers 10:  (HEAD)  90 sec VS (with patch)  39 sec

 tables:20000
  autovacuum_max_workers 1:  

Re: Get memory contexts of an arbitrary backend process

2020-09-03 Thread Kasahara Tatsuhito
On Fri, Sep 4, 2020 at 2:40 AM Tom Lane  wrote:
> Kasahara Tatsuhito  writes:
> > Yes, but it's not only for future expansion, but also for the
> > usability and the stability of this feature.
> > For example, if you want to read one dumped file multiple times and analyze 
> > it,
> > you will want the ability to just read the dump.
>
> If we design it to make that possible, how are we going to prevent disk
> space leaks from never-cleaned-up dump files?
In my view, along with a view that lets us see the list of dumped files,
it would be better to have a function that simply deletes the dump files
associated with a specific PID, or deletes all dump files.
Some files may be dumped with unexpected delays, so I think a cleanup
feature will be necessary.
(Also, as with pgsql_tmp files, it might be better to delete dump files
when PostgreSQL starts.)

Or should we try to delete the dump file as soon as we can read it?
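
If we go the explicit-cleanup route, the interface I have in mind is
something like this (the view and function names are hypothetical):

===
-- list the dump files that currently exist (hypothetical view)
SELECT pid, dump_file, dump_time FROM pg_backend_memory_dumps;

-- remove the dump files of one backend, or all of them (hypothetical function)
SELECT pg_remove_backend_memory_dump(10123);
SELECT pg_remove_backend_memory_dump(NULL);   -- NULL means all dump files
===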

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Get memory contexts of an arbitrary backend process

2020-09-03 Thread Kasahara Tatsuhito
Hi,

On Thu, Sep 3, 2020 at 3:38 PM torikoshia  wrote:
> >> - Currently, "the signal transmission for dumping memory
> >> information"
> >> and "the read & output of dump information "
> >> are on the same interface, but I think it would be better to
> >> separate them.
> >> How about providing the following three types of functions for
> >> users?
> >> - send a signal to specified pid
> >> - check the status of the signal sent and received
> >> - read the dumped information
>
> Is this for future extensibility to make it possible to get
> other information like the current execution plan which was
> suggested by Pavel?
Yes, but it's not only for future expansion; it is also for the
usability and stability of this feature.
For example, if you want to read one dumped file multiple times and analyze it,
you will want the ability to just read the dump.
Moreover, when it takes a long time from receiving the signal to producing
the dump output, or the dump output itself takes a long time, users can
investigate where the time is spent if each step is separated.

> If so, I agree with considering extensibility, but I'm not
> sure it's necessary whether providing these types of
> functions for 'users'.
Of course, it is possible, and may be necessary, to provide a wrapped
sequence of steps from sending a signal to reading the dump files.
But IMO, some users would like to perform the signal transmission,
state management and dump file reading steps separately.
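
Concretely, the split interface could look something like this (all of
these names are hypothetical):

===
SELECT pg_signal_backend_memory_dump(10123);          -- 1. send the signal
SELECT * FROM pg_backend_memory_dump_status(10123);   -- 2. received? dumped?
SELECT * FROM pg_read_backend_memory_dump(10123);     -- 3. read the dumped contexts
===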

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: autovac issue with large number of tables

2020-09-01 Thread Kasahara Tatsuhito
Hi,

On Wed, Aug 12, 2020 at 2:46 AM Tom Lane  wrote:
> So I think Kasahara-san's point is that the shared memory stats collector
> might wipe out those costs, depending on how it's implemented.  (I've not
> looked at that patch in a long time either, so I don't know how much it'd
> cut the reader-side costs.  But maybe it'd be substantial.)
Thanks for your clarification, that's what I wanted to say.
Sorry for the lack of explanation.

> I think the real issue here is autovac_refresh_stats's insistence that it
> shouldn't throttle pgstats re-reads in workers.
I agree that.

> I wonder if we could have table_recheck_autovac do two probes of the stats
> data.  First probe the existing stats data, and if it shows the table to
> be already vacuumed, return immediately.  If not, *then* force a stats
> re-read, and check a second time.
Does the above mean that the second and subsequent calls to table_recheck_autovac()
will be improved to first check using the previously refreshed statistics?
I think that certainly works.

If that's correct, I'll try to create a patch for the PoC.

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Get memory contexts of an arbitrary backend process

2020-08-31 Thread Kasahara Tatsuhito
Hi,

On Mon, Aug 31, 2020 at 8:22 PM torikoshia  wrote:
> As discussed in the thread[1], it'll be useful to make it
> possible to get the memory contexts of an arbitrary backend
> process.
+1

> Attached PoC patch makes pg_get_backend_memory_contexts()
> display meory contexts of the specified PID of the process.
Thanks, it's a very good patch for discussion.

> It doesn't display contexts of all the backends but only
> the contexts of specified process.
Or we can run "SELECT (pg_get_backend_memory_contexts(pid)).* FROM
pg_stat_activity WHERE ...",
so I don't think it's a big deal.

> The rough idea of implementation is like below:
>
>1. send  a signal to the specified process
>2. signaled process dumps its memory contexts to a file
>3. read the dumped file and display it to the user
I agree with the overview of the idea.
Here are some comments and questions.

- Currently, "the signal transmission for dumping memory information"
and "the read & output of dump information "
  are on the same interface, but I think it would be better to separate them.
  How about providing the following three types of functions for users?
  - send a signal to specified pid
  - check the status of the signal sent and received
  - read the dumped information
- How about managing the status of the signal send/receive and dump
  operations in a shared hash or something similar?
  (A rough sketch of such an entry is shown after this list.)
  Sending and receiving signals, dumping memory information, and
  referencing dump information all work asynchronously.
  Therefore, it would be good to have management information to check
  the status of each step.
  A simple idea is:
   - when we send a dump signal to a PID, first record the following
     information in the shared hash:
       pid (the specified pid)
       loc (dump location; currently it might simply be ASAP)
       recv (did the pid process receive the signal? initially false)
       dumped (did the pid process dump its memory information? initially false)
   - the specified process receives the signal, updates the status in the
     shared hash, then dumps at the specified location.
   - when the specified process finishes dumping the memory information,
     it updates the status in the shared hash.
- Does it allow one process to output multiple dump files?
  At present it appears to overwrite the previous dump, but I thought it
  would be good to be able to generate multiple dump files in different
  phases (e.g., the planning phase and the execution phase) in the future.
- How is the dump file cleaned up?
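
For the shared-hash idea above, the entry I am picturing is roughly the
following (the struct and field names are only for illustration):

===
/* hypothetical entry of the shared hash, keyed by the target PID */
typedef struct McxtDumpEntry
{
    pid_t   pid;        /* target backend (hash key) */
    int     loc;        /* where/when to dump; currently just "as soon as possible" */
    bool    recv;       /* has the target process received the signal? */
    bool    dumped;     /* has the target process finished writing the dump? */
} McxtDumpEntry;
===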

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-08-19 Thread Kasahara Tatsuhito
On 2020/08/20 0:01, Tom Lane wrote:
> The only situation I could imagine where this would have any use is
> where there is long-term (cross-query) bloat in, say, CacheMemoryContext
Yeah, in cases where a very large number of sessions are connected to
the DB for very long periods of time, the memory consumption of the
back-end processes may increase slowly, and such a feature is useful
for analysis.

And, as Fujii said, this feature is very useful for seeing which contexts
are consuming a lot of memory and for narrowing down the causes.

On Thu, Aug 20, 2020 at 11:18 AM Fujii Masao
 wrote:
> On 2020/08/20 10:43, Andres Freund wrote:
> > Hi,
> > Even just being able to see the memory usage in a queryable way is a
> > huge benefit.
>
> +1
+1

I think this feature is very useful in environments where gdb is not
available or access to the server logs is limited.
It is also cumbersome to extract and analyze the memory information
from very large server logs.


> > I totally agree that it's not *enough*, but in contrast to you I think
> > it's a good step. Subsequently we should add a way to get any backends
> > memory usage.
> > It's not too hard to imagine how to serialize it in a way that can be
> > easily deserialized by another backend. I am imagining something like
> > sending a procsignal that triggers (probably at CFR() time) a backend to
> > write its own memory usage into pg_memusage/ or something roughly
> > like that.
>
> Sounds good. Maybe we can also provide the SQL-callable function
> or view to read pg_memusage/, to make the analysis easier.
+1

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-08-07 Thread Kasahara Tatsuhito
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

I tested the latest patch
(0007-Adding-a-function-exposing-memory-usage-of-local-backend.patch)
with the latest PG version (199cec9779504c08aaa8159c6308283156547409),
and the test passed.
It looks good to me.

The new status of this patch is: Ready for Committer


Re: autovac issue with large number of tables

2020-07-31 Thread Kasahara Tatsuhito
Hi,

On Tue, Jul 28, 2020 at 3:49 AM Jim Nasby  wrote:
> I'm in favor of trying to improve scheduling (especially allowing users
> to control how things are scheduled), but that's a far more invasive
> patch. I'd like to get something like this patch in without waiting on a
> significantly larger effort.

BTW, have you tried the patch suggested in the thread below?

https://www.postgresql.org/message-id/20180629.173418.190173462.horiguchi.kyotaro%40lab.ntt.co.jp

The above is a suggestion to manage statistics in shared memory rather
than in a file, and I think this feature may mitigate your problem.
That feature has yet another performance challenge, but it might be
worth a try.
The above patch will also require a great deal of effort to get into
PostgreSQL core, but I'm curious to see how well it works for this
problem.

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-07-30 Thread Kasahara Tatsuhito
Hi,

On Fri, Jul 10, 2020 at 5:32 PM torikoshia  wrote:
> - whether information for identifying parent-child relation is necessary
> or it's an overkill
I think it's important to understand the parent-child relationship of
the contexts.
Personally, I often want to know the following two things:

- In which life cycle is the target context? (Does it remain as long as
the process lives, or is it per query?)
- Does the target context belong to the correct (parent) context?

> - if this information is necessary, memory address is suitable or other
> means like assigning unique numbers are required
IMO, if each context can be uniquely identified (or easily guessed) by
"name" and "ident",
then I don't think the address information is necessary.
Instead, I like the approach that directly shows the parent's context
name, as in the 0005 patch.

Best regards

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

2020-07-10 Thread Kasahara Tatsuhito
Hi,

On Wed, Jul 8, 2020 at 9:40 PM Bharath Rupireddy
 wrote:
> One way, we could solve the above problem is that, upon firing the new
> foreign query from local backend using the cached connection,
> (assuming the remote backend that was cached in the local backed got
> killed for some reasons), instead of failing the query in the local
> backend, upon detecting error from the remote backend, we could just
> delete the cached old entry and try getting another connection to
> remote backend, cache it and proceed to submit the query. This has to
> happen only at the beginning of remote xact.
+1.

I think this is a very useful feature.
In an environment with connection pooling on the local side, if the remote
server has a failover or switchover,
this feature would prevent unexpected errors in local queries after
the remote server recovers.

I haven't looked at the code in detail yet, some comments here.

1. To keep the previous behavior (and performance), how about allowing
the user to specify
   whether or not to retry, as a GUC parameter or a foreign server option
   (sketched below)?

2. How about emitting a LOG message when a retry succeeds, so we can tell
   that the retry feature worked and how often retries happen?
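
For point 1, I am imagining something like the following (the option name
connection_retry is hypothetical):

===
-- hypothetical per-server switch to keep the previous behavior when desired
ALTER SERVER foreign_server OPTIONS (ADD connection_retry 'off');
===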

> I couldn't think of adding a test case to the existing postgres_fdw
> regression test suite with an automated scenario of the remote backend
> getting killed.

Couldn't you confirm this by adding a test case like the following?
===
BEGIN;
-- Generate a connection to remote
SELECT * FROM ft1 LIMIT 1;

-- retrieve the pid of the remote backend for postgres_fdw and kill it
-- (another unique identifier, e.g. fdw_retry_check instead of postgres_fdw,
-- could be used for the application name)
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
backend_type = 'client backend' AND application_name = 'postgres_fdw';

-- COMMIT, so the next query should succeed if connection retry works
COMMIT;
SELECT * FROM ft1 LIMIT 1;
===

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-06-26 Thread Kasahara Tatsuhito
Hi,

On Fri, Jun 26, 2020 at 3:42 PM Bharath Rupireddy
 wrote:
> While going through the mail chain on relation, plan and catalogue
> caching [1], I'm thinking on the lines that is there a way to know the
> current relation, plan and catalogue cache sizes? If there is a way
> already,  please ignore this and it would be grateful if someone point
> me to that.
AFAIK, so far the only way to get statistics on a PostgreSQL backend's
internal local memory usage is to call MemoryContextStats() via gdb and
output the information to the log.
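
(For example, like this, where 10123 is an arbitrary backend PID; the output
goes to stderr and therefore usually ends up in the server log:)

===
$ gdb -p 10123
(gdb) call MemoryContextStats(TopMemoryContext)
(gdb) detach
(gdb) quit
===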

> If there is no such way to know the cache sizes and other info such as
> statistics, number of entries, cache misses, hits etc.  can the
> approach discussed here be applied?
I think the answer is partially yes.

> If the user knows the cache statistics and other information, may be
> we can allow user to take appropriate actions such as allowing him to
> delete few entries through a command or some other way.
Yeah, one of the purposes of the features we are discussing here is to
use them in such situations.

Regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Creating a function for exposing memory usage of backend process

2020-06-25 Thread Kasahara Tatsuhito
Hi !

On Thu, Jun 18, 2020 at 12:56 PM Fujii Masao
 wrote:
> Agreed. The feature to view how local memory contexts are used in
> each process is very useful!
+1

> >=# SELECT * FROM pg_stat_get_backend_memory_context(PID);
>
> I'm afraid that this interface is not convenient when we want to monitor
> the usages of local memory contexts for all the processes. For example,
> I'd like to monitor how much memory is totally used to store prepared
> statements information. For that purpose, I wonder if it's more convenient
> to provide the view displaying the memory context usages for
> all the processes.
How about separating a function that examines memory consumption
trends for all processes and a function that examines memory
consumption for a particular phase of a particular process?

For the former, as Fujii said, the function would show limited output
information for each context type. Every process would calculate and
output the information while it is idle.

I think the latter is useful for debugging and other purposes.
For example, imagine preparing a function for registration like the following.
=# SELECT pg_stat_get_backend_memory_context_regist (pid, context,
max_children, calc_point)

pid: A target process
context: The top level of the context of interest
max_children: Maximum number of output for the target context's children
 (similar to MemoryContextStatsInternal()'s max_children)
calc_point: Single or multiple position(s) to calculate and output
context information
 (Existing hooks such as planner_hook, executor_start, etc.. could be used. )

This function tells the target PID to output the information of the
specified context at the specified calc_point.
When the target process reaches the calc_point, it calculates and
outputs the context information once, to a file or DSM.

(Currently PostgreSQL has no formal way of externally modifying the
parameters of a particular process, so that may need to be
implemented...)
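
For example (everything here is still hypothetical), registering a dump of
CacheMemoryContext to be taken when the planner hook fires might look like:

===
SELECT pg_stat_get_backend_memory_context_regist(10123, 'CacheMemoryContext',
                                                 10, 'planner_hook');
===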

Sometimes I want to know the memory usage in the planning phase or
elsewhere, together with the query string and/or plan tree, before the
target process goes idle.
So it would be nice to be able to retrieve memory usage at an arbitrary
point in time!

Regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: small improvement of the elapsed time for truncating heap in vacuum

2020-02-16 Thread Kasahara Tatsuhito
Hi,

On Mon, Feb 17, 2020 at 1:07 PM Masahiko Sawada
 wrote:
> For the patch, we can put also the declaration of ru0 into the loop.
Thanks for your reply.
Hmm, certainly that may be better.

I fixed the v2 patch and attached it.

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


reset_usage_v3.patch
Description: Binary data


Re: small improvement of the elapsed time for truncating heap in vacuum

2020-02-16 Thread Kasahara Tatsuhito
Hi,

On Fri, Feb 14, 2020 at 4:50 PM Fujii Masao  wrote:
> Regarding the patch, isn't it better to put pg_rusage_init() at the
> top of do loop block? If we do this, as a side-effect, we can get
> rid of pg_rusage_init() at the top of lazy_truncate_heap().
Thanks for your reply.
Yeah, it makes sense.

The attached patch moves pg_rusage_init() to the top of the do-loop block.

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


reset_usage_v2.patch
Description: Binary data


Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-07 Thread Kasahara Tatsuhito
On Fri, Feb 7, 2020 at 10:09 PM Fujii Masao  wrote:
> Pushed! Thanks!
Thanks Fujii.


--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-07 Thread Kasahara Tatsuhito
Hi,

On Fri, Feb 7, 2020 at 5:02 PM Fujii Masao  wrote:
> BTW, commit 147e3722f7 causing the issue changed currtid_byreloid()
> and currtid_byrelname() so that they also call table_beginscan().
> I'm not sure what those functions are, but probably we should fix them
> so that table_beginscan_tid() is called instead. Thought?
+1, sorry, I overlooked it.

Both functions are used to check whether a tid is valid for a given
relation name (or OID),
and both perform a tid scan internally.
So these functions should call table_beginscan_tid().

Perhaps it's unnecessary, but I will attach a patch.
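
The change would be something along these lines in each of the two functions
(a rough sketch; the attached patch is authoritative):

===
-   scan = table_beginscan(rel, snapshot, 0, NULL);
+   scan = table_beginscan_tid(rel, snapshot);
===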

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


fix_tidscan_issues_v6.patch
Description: Binary data


Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-06 Thread Kasahara Tatsuhito
Hi,

On Fri, Feb 7, 2020 at 1:27 PM Kyotaro Horiguchi
 wrote:
> It seems that nkeys and key are useless. Since every table_beginscan_*
> functions have distinct parameter sets, don't we remove them from
> table_beginscan_tid?
Yeah, actually, when calling table_beginscan_tid(), nkeys is set to 0
and *key is set to NULL,
so these are not used at the moment.

I removed unnecessary arguments from table_beginscan_tid().
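
After removing them, the helper looks roughly like this (a simplified sketch
of the tableam.h change):

===
static inline TableScanDesc
table_beginscan_tid(Relation rel, Snapshot snapshot)
{
    uint32      flags = SO_TYPE_TIDSCAN;

    return rel->rd_tableam->scan_begin(rel, snapshot, 0, NULL, NULL, flags);
}
===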

Attached is the v5 patch.


-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


fix_tidscan_issues_v5.patch
Description: Binary data


Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-06 Thread Kasahara Tatsuhito
Hi,

On Thu, Feb 6, 2020 at 11:01 PM Masahiko Sawada
 wrote:
>
> On Thu, 6 Feb 2020 at 19:12, Kasahara Tatsuhito
>  wrote:
> I've tested predicate locking including promotion cases with v3 patch
> and it works fine.
>
> +table_beginscan_tid(Relation rel, Snapshot snapshot,
> +   int nkeys, struct ScanKeyData *key)
> +{
> +   uint32  flags = SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;
>
> IIUC setting SO_ALLOW_STRAT, SO_ALLOW_SYNC and SO_ALLOW_PAGEMODE has
> no meaning during tid scan. I think v11 also should be the same.
Thanks for your check and the useful advice.
I was wondering if I should keep these flags, but I confirmed that I
can remove them from TidScan's flags.

> Why did you remove SO_TYPE_TIDSCAN from the previous version patch? I
> think it's better to add it and then we can set only SO_TYPE_TIDSCAN
> to the scan option of tid scan.
Because currently SO_TYPE_TIDSCAN is not used anywhere,
I thought it might be better to avoid adding a new flag.
However, as you said, this flag may be useful for future tid scan
features (like [1]).

Attached is the v4 patch. I re-added SO_TYPE_TIDSCAN to ScanOptions and
table_beginscan_tid(),
and removed the unnecessary SO_ALLOW_* flags.

Best regards,

[1]
https://www.postgresql.org/message-id/flat/CAKJS1f-%2BJJpm6B_NThUWzFv4007zAjObBXX1CBHE_bH9nOAvSw%40mail.gmail.com#1ae648acdc2df930b19218b6026135d3

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


fix_tidscan_issues_v4.patch
Description: Binary data


Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-06 Thread Kasahara Tatsuhito
HI,

On Thu, Feb 6, 2020 at 3:24 PM Fujii Masao  wrote:
> > I added a new (but minimal)  isolation test for the case of  tid scan.
> > (v12 and HEAD will be failed this test. v11 and HEAD with my patch
> > will be passed)
>
> Isn't this test scenario a bit overkill? We can simply test that
> as follows, instead.
> CREATE TABLE test_tidscan AS SELECT 1 AS id;
> BEGIN ISOLATION LEVEL SERIALIZABLE;
> SELECT * FROM test_tidscan WHERE ctid = '(0,1)';
> SELECT locktype, mode FROM pg_locks WHERE pid = pg_backend_pid() AND mode = 
> 'SIReadLock';
> COMMIT;
>
> In the expected file, the result of query looking at pg_locks
> should be matched with the following.
>
>   locktype |mode
> --+
>   tuple| SIReadLock
Thanks for your reply.
Hmm, that's simpler and might be a better way than adding an isolation test.

I added the above test case to regress/sql/tidscan.sql.
Attached is the patch.

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


fix_tidscan_issues_v3.patch
Description: Binary data


Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-05 Thread Kasahara Tatsuhito
Hi,

On Thu, Feb 6, 2020 at 11:48 AM Andres Freund  wrote:
> I think it'd be good if we could guard against b) via an isolation
> test. It's more painful to do that for a), due to the unreliability of
> stats at the moment (we have some tests, but they take a long time).
Thanks for your advice; agreed.

I added a new (but minimal) isolation test for the tid scan case.
(v12 and HEAD fail this test; v11 and HEAD with my patch
pass it.)

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


fix_tidscan_issues_v2.patch
Description: Binary data


Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-04 Thread Kasahara Tatsuhito
On Mon, Feb 3, 2020 at 6:20 PM Kasahara Tatsuhito
 wrote:
> Therefore, from v12, Tid scan not only increases the value of
> seq_scan, but also acquires a predicate lock.

Based on further investigation and Fujii's advice, I've summarized
this issue as follows.

Since commit 147e3722f7, Tid Scan came to
(A) increment the seq_scan counter in pg_stat_*_tables
and
(B) take a predicate lock on the entire relation.

(A) may be confusing to users, so I think it is better to fix it.
For (B), an unexpected serialization error occurs as follows, so
I think it should be fixed.

=
[Preparation]
CREATE TABLE tid_test (c1 int, c2 int);
INSERT INTO  tid_test SELECT generate_series(1,1000), 0;


[Session-1:]
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
  [Session-2:]
  BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE ;
[Session-1:]
SELECT * FROM tid_test WHERE ctid = '(0,1)';
  [Session-2:]
  SELECT * FROM tid_test WHERE ctid = '(1,1)';
[Session-1:]
INSERT INTO tid_test SELECT 1001, 10;
  [Session-2:]
  INSERT INTO tid_test SELECT 1001, 10;
[Session-1:]
COMMIT;
  [Session-2:]
  COMMIT;

Result:
(-v11): Both sessions could commit.
(v12-): Session-2 raised the following error, because a predicate lock
was taken on the entire table...

ERROR:  could not serialize access due to read/write dependencies
among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during
commit attempt.
HINT:  The transaction might succeed if retried.

=

The attached patch fixes both (A) and (B), so that the behavior of Tid Scan
goes back to the same as in v11 and earlier.
(As a result, this patch is the same as the one attached first.)

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


fix_tidscan_issues.patch
Description: Binary data


Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-03 Thread Kasahara Tatsuhito
On Mon, Feb 3, 2020 at 5:33 PM Fujii Masao  wrote:
> Thanks for explaining that! But heap_beginscan_internal() was really
> called in TidScan case?
Oh, you are right.
Tid Scan started calling table_beginscan from v12 (commit 147e3722f7).
So previously (v11 and earlier), Tid Scan probably never called heap_beginscan_internal().

Therefore, from v12, Tid Scan not only increases the value of
seq_scan but also acquires a predicate lock.

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-02-02 Thread Kasahara Tatsuhito
On Mon, Feb 3, 2020 at 4:22 PM Fujii Masao  wrote:
> On 2020/02/01 16:05, Kasahara Tatsuhito wrote:
> >  if (scan->rs_base.rs_flags & (SO_TYPE_SEQSCAN | SO_TYPE_SAMPLESCAN))
> >  {
> >  /*
> >   * Ensure a missing snapshot is noticed reliably, even if the
> >   * isolation mode means predicate locking isn't performed (and
> >   * therefore the snapshot isn't used here).
> >   */
> >  Assert(snapshot);
> >  PredicateLockRelation(relation, snapshot);
> >  }
> >
> > Therefore, it can not simply remove the SO_TYPE_SEQSCAN flag from a TID 
> > scan.
> > To keep the old behavior, I think it would be better to add a new
> > SO_TYPE_TIDSCAN flag and take a predicate lock on the entire relation.
>
> But in the old behavior, PredicateLockRelation() was not called in TidScan 
> case
> because its flag was not SO_TYPE_SEQSCAN. No?
No. Tid Scan called PredicateLockRelation() in both the previous and the
current versions.

In the current code (v12 and HEAD), Tid Scan has the SO_TYPE_SEQSCAN flag,
so PredicateLockRelation() is called for Tid Scan.
In the previous code (v11 and earlier), heap_beginscan_internal() checks
the is_bitmapscan flag, and if is_bitmapscan is false, it calls
PredicateLockRelation().

(- v11)
if (!is_bitmapscan)
PredicateLockRelation(relation, snapshot);

And for Tid Scan, is_bitmapscan is set to false, so
PredicateLockRelation() is called for Tid Scan.

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-01-31 Thread Kasahara Tatsuhito
On Thu, Jan 30, 2020 at 1:49 PM Kyotaro Horiguchi
 wrote:
> At Thu, 30 Jan 2020 13:30:56 +0900, Kasahara Tatsuhito 
>  wrote in
> > > TID scan   : yes , seq_scan, , 
> > Here is wrong, because TID scan came to have SO_TYPE_SEQSCAN flags
> > from commit 147e3722f7.
>
> It is reflectings the discussion below, which means TID scan doesn't
> have corresponding SO_TYPE_ value. Currently it is setting
> SO_TYPE_SEQSCAN by accedent.
Ah, sorry, I misunderstood.

Upon further investigation, the SO_TYPE_SEQSCAN flag is also used in
heap_beginscan() to determine whether a predicate lock should be taken on
the entire relation.

if (scan->rs_base.rs_flags & (SO_TYPE_SEQSCAN | SO_TYPE_SAMPLESCAN))
{
/*
 * Ensure a missing snapshot is noticed reliably, even if the
 * isolation mode means predicate locking isn't performed (and
 * therefore the snapshot isn't used here).
 */
Assert(snapshot);
PredicateLockRelation(relation, snapshot);
}

Therefore, we cannot simply remove the SO_TYPE_SEQSCAN flag from a TID scan.
To keep the old behavior, I think it would be better to add a new
SO_TYPE_TIDSCAN flag and still take a predicate lock on the entire relation.

Attached is the v2 patch, which changes the status to the following (the same
as v11 and earlier, but with the new SO_TYPE_TIDSCAN flag):

 increments
scan typetable_beginscan?, per scan, per tuple , SO_TYPE flags
=
TID scan   : yes , , , SO_TYPE_TIDSCAN

Is this an acceptable change for HEAD and v12?

Best regards,

-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


fix_tidscan_increments_seqscan_num_v2.patch
Description: Binary data


Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-01-29 Thread Kasahara Tatsuhito
Hi,

On Thu, Jan 30, 2020 at 10:55 AM Kyotaro Horiguchi
 wrote:
> At Wed, 29 Jan 2020 23:24:09 +0900, Fujii Masao  
> wrote in
> > On 2020/01/29 20:06, Kasahara Tatsuhito wrote:
> > > Although I'm not sure this behavior is really problem or not,
> > > it seems to me that previous behavior is more prefer.
> > > Is it worth to apply to HEAD and v12 branch ?
> >
> > I've not read the patch yet, but I agree that updating only seq_scan
> > but not seq_tup_read in Tid Scan sounds strange. IMO at least
> > both should be update together or neither should be updated.
>
> Basically agreed, but sample scan doesn't increment seq_scan but
> increments seq_tup_read.
Yeah, sample scan's behavior is also a bit annoying.


> From the view of the view pg_stat_*_tables, the counters moves as follows.
Thanks for your clarification.

> TID scan   : yes , seq_scan, , 
This line is wrong, because TID scan has had the SO_TYPE_SEQSCAN flag
since commit 147e3722f7.

So, currently (v12 and HEAD) the TID scan status is as follows:

 increments
scan typetable_beginscan?, per scan, per tuple , SO_TYPE flags
=
TID scan   : yes , seq_scan, , SO_TYPE_SEQSCAN

And my patch changes the status to the following (the same as v11 and earlier):

 increments
scan typetable_beginscan?, per scan, per tuple , SO_TYPE flags
=
TID scan   : yes , , , 


> I'd rather think that whatever calls table_beginscan should have
> corresponding SO_TYPE_* flags. (Note: index scan doesn't call it.)
Agreed.
It may be better to add a new flag such as SO_TYPE_TIDSCAN
and have it handle statistics updates and other things.
But that may be a bit overkill, since I just want to revert to the previous
behavior this time.

Best regards,
-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com




Re: Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-01-29 Thread Kasahara Tatsuhito
Hi.

The attached patch solves this problem.

This patch adds table_beginscan_tid() and calls it in TidListEval()
instead of table_beginscan().
table_beginscan_tid() is the same as table_beginscan() but does not set
SO_TYPE_SEQSCAN in the flags.

Although I'm not sure whether this behavior is really a problem or not,
it seems to me that the previous behavior is preferable.

Is it worth applying to HEAD and the v12 branch?

Best regards,

2020年1月27日(月) 14:35 Kasahara Tatsuhito :
>
> Hi, I noticed that since PostgreSQL 12, Tid scan increments value of
> pg_stat_all_tables.seq_scan. (but not seq_tup_read)
>
> The following is an example.
>
> CREATE TABLE t (c int);
> INSERT INTO t SELECT 1;
> SET enable_seqscan to off;
>
> (v12 -)
> =# EXPLAIN ANALYZE SELECT * FROM t WHERE ctid = '(0,1)';
> QUERY PLAN
> ---
>  Tid Scan on t  (cost=0.00..4.01 rows=1 width=4) (actual
> time=0.034..0.035 rows=1 loops=1)
>TID Cond: (ctid = '(0,1)'::tid)
>  Planning Time: 0.341 ms
>  Execution Time: 0.059 ms
> (4 rows)
>
> =# SELECT seq_scan, seq_tup_read FROM pg_stat_user_tables WHERE relname = 't';
>  seq_scan | seq_tup_read
> --+--
> 1 |0
> (1 row)
>
> (- v11)
> =# EXPLAIN ANALYZE SELECT * FROM t WHERE ctid = '(0,1)';
> QUERY PLAN
> ---
>  Tid Scan on t  (cost=0.00..4.01 rows=1 width=4) (actual
> time=0.026..0.027 rows=1 loops=1)
>TID Cond: (ctid = '(0,1)'::tid)
>  Planning Time: 1.003 ms
>  Execution Time: 0.068 ms
> (4 rows)
>
> postgres=# SELECT seq_scan, seq_tup_read FROM pg_stat_user_tables
> WHERE relname = 't';
>  seq_scan | seq_tup_read
> --+--
> 0 |0
> (1 row)
>
>
> Exactly, this change occurred from following commit.
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=147e3722f7e531f15ba389a4d518efe8cd0bd736)
> I think, from this commit, TidListEval() came to call
> table_beginscan() , so this increments would be happen.
>
> I'm not sure this change whether intention or not, it can confuse some users.
>
> Best regards,
> --
> NTT Open Source Software Center
> Tatsuhito Kasahara



-- 
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com


fix_tidscan_increments_seqscan_num.patch
Description: Binary data


Tid scan increments value of pg_stat_all_tables.seq_scan. (but not seq_tup_read)

2020-01-26 Thread Kasahara Tatsuhito
Hi, I noticed that since PostgreSQL 12, Tid scan increments value of
pg_stat_all_tables.seq_scan. (but not seq_tup_read)

The following is an example.

CREATE TABLE t (c int);
INSERT INTO t SELECT 1;
SET enable_seqscan to off;

(v12 -)
=# EXPLAIN ANALYZE SELECT * FROM t WHERE ctid = '(0,1)';
QUERY PLAN
---
 Tid Scan on t  (cost=0.00..4.01 rows=1 width=4) (actual
time=0.034..0.035 rows=1 loops=1)
   TID Cond: (ctid = '(0,1)'::tid)
 Planning Time: 0.341 ms
 Execution Time: 0.059 ms
(4 rows)

=# SELECT seq_scan, seq_tup_read FROM pg_stat_user_tables WHERE relname = 't';
 seq_scan | seq_tup_read
--+--
1 |0
(1 row)

(- v11)
=# EXPLAIN ANALYZE SELECT * FROM t WHERE ctid = '(0,1)';
QUERY PLAN
---
 Tid Scan on t  (cost=0.00..4.01 rows=1 width=4) (actual
time=0.026..0.027 rows=1 loops=1)
   TID Cond: (ctid = '(0,1)'::tid)
 Planning Time: 1.003 ms
 Execution Time: 0.068 ms
(4 rows)

postgres=# SELECT seq_scan, seq_tup_read FROM pg_stat_user_tables
WHERE relname = 't';
 seq_scan | seq_tup_read
--+--
0 |0
(1 row)


Exactly, this change was introduced by the following commit:
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=147e3722f7e531f15ba389a4d518efe8cd0bd736
I think that, from this commit, TidListEval() came to call
table_beginscan(), so this increment happens.

I'm not sure whether this change is intentional or not, but it can confuse
some users.

Best regards,
--
NTT Open Source Software Center
Tatsuhito Kasahara




small improvement of the elapsed time for truncating heap in vacuum

2019-08-12 Thread Kasahara Tatsuhito
Hi,

I got the following log messages when measuring the heap truncation
duration in a vacuum.

=
INFO:  "dst": suspending truncate due to conflicting lock request
INFO:  "dst": truncated 550073 to 101472 pages
DETAIL:  CPU: user: 0.35 s, system: 4.92 s, elapsed: 6.96 s
INFO:  "dst": truncated 101472 to 164 pages
DETAIL:  CPU: user: 0.35 s, system: 11.02 s, elapsed: 13.46 s
=

The above messages show that postgres detected an access to the table
during heap truncation, suspended the truncation,
and then resumed it after the access finished. The messages themselves are
not a problem.
But the "usage" and "elapsed (time)" values are a bit confusing.
The total truncation duration was about 13.5 s, but the log said 6.96 s
(before the suspend) + 13.46 s (the remainder).
# I confirmed the total truncation duration by elog debugging.

In lazy_truncate_heap(), pg_rusage_init() is only called once, at the
start of truncation.
So the log line for the last truncation phase shows the usage and elapsed
time accumulated over the whole truncation phase.
The attached patch makes pg_rusage_init() be called again after each
ereport() of heap truncation,
so the log messages will change like the following.

=
INFO:  "dst": suspending truncate due to conflicting lock request
INFO:  "dst": truncated 550073 to 108288 pages
DETAIL:  CPU: user: 0.20 s, system: 4.88 s, elapsed: 7.41 s
INFO:  "dst": truncated 108288 to 164 pages
DETAIL:  CPU: user: 0.00 s, system: 7.36 s, elapsed: 7.92 s
=
(Total truncating time was about 15.3s in above case)
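
For illustration, the idea in lazy_truncate_heap() is roughly the following
(heavily simplified; the attached patch is the real change):

===
PGRUsage    ru0;

pg_rusage_init(&ru0);
do
{
    /* ... wait for the lock, count nondeletable pages, truncate ... */

    ereport(elevel,
            (errmsg("\"%s\": truncated %u to %u pages",
                    RelationGetRelationName(onerel),
                    old_rel_pages, new_rel_pages),
             errdetail_internal("%s", pg_rusage_show(&ru0))));

    pg_rusage_init(&ru0);   /* proposed: restart the usage clock for the next round */
} while (/* a lock waiter interrupted us and more pages may be truncatable */ ...);
===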

Any thoughts?
Best regards,

-- 
Tatsuhito Kasahara
NTT Open Source Software Center


reset_usage.patch
Description: Binary data