Re: suboverflowed subtransactions concurrency performance optimize

2021-09-02 Thread Andrey Borodin
Sorry, for some reason Mail.app converted the message to HTML and the mailing 
list mangled that HTML into a mess. I'm resending the previous message as plain 
text. Sorry for the noise.

> On 31 Aug 2021, at 11:43, Pengchengliu  wrote:
> 
> Hi Andrey,
>  Thanks a lot for your reply and reference information.
> 
>  The default NUM_SUBTRANS_BUFFERS is 32. In my implementation, 
> local_cache_subtrans_pages can be adjusted dynamically.
>  If we configure local_cache_subtrans_pages as 64, every backend uses only 
> 64*8192=512KB of extra memory. 
>  So the local cache is similar to a first-level cache, and the subtrans SLRU is 
> the second-level cache.
>  And I think the extra memory is very well worth it. It really resolves the massive 
> subtrans stuck issue which I mentioned in a previous email.
> 
>  I have viewed the patch of [0] before. Adding GUC configuration parameters 
> for the SLRU buffers is very nice.
>  I think that for subtrans, its optimization is not enough. For 
> SubTransGetTopmostTransaction, we should get the SubtransSLRULock first, then 
> call SubTransGetParent in a loop,
>  to prevent acquiring/releasing SubtransSLRULock in the 
> SubTransGetTopmostTransaction->SubTransGetParent loop.
>  After I applied this patch, in which I optimized SubTransGetTopmostTransaction, 
> I still got the stuck result with my test case.

SubTransGetParent() acquires only a shared lock on SubtransSLRULock. The problem 
can arise only when someone reads a page from disk. But if you have a big enough 
cache, this will never happen. And this cache will be much smaller than 
512KB*max_connections.

If we really want to fix the exclusive SubtransSLRULock, I think the best option 
would be to split the SLRU control lock into an array of locks - one for each 
bank (as in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch). With 
this approach we would have to rename s/bank/partition/g for consistency with 
the lock and buffer partitions. I really liked having my own banks, but 
consistency is worth it anyway.
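
For illustration, here is a minimal sketch of what per-bank lock selection
could look like; SLRU_NBANKS, SlruBankControlLocks and SlruBankLock are
hypothetical names, not taken from the patch:

#include "storage/lwlock.h"

#define SLRU_NBANKS 8

static LWLock *SlruBankControlLocks[SLRU_NBANKS];

static inline LWLock *
SlruBankLock(int pageno)
{
	/* each SLRU page maps to exactly one bank, so backends touching
	 * pages in different banks never contend on the same control lock */
	return SlruBankControlLocks[pageno % SLRU_NBANKS];
}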

Thanks!

Best regards, Andrey Borodin.



Re: Skipping logical replication transactions on subscriber side

2021-09-02 Thread Greg Nancarrow
On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada  wrote:
>
> I've attached rebased patches. 0004 patch is not the scope of this
> patch. It's borrowed from another thread[1] to fix the assertion
> failure for newly added tests. Please review them.
>

BTW, these patches need rebasing (broken by recent commits, patches
0001, 0003 and 0004 no longer apply, and it's failing in the cfbot).


Regards,
Greg Nancarrow
Fujitsu Australia




RE: Improve logging when using Huge Pages

2021-09-02 Thread Shinoda, Noriyoshi (PN Japan FSIP)
Fujii-san, Julien-san

Thank you very much for your comment.
I followed your comments and changed the elog function to the ereport function, 
and also changed the log level. The output message is the same as in the case of 
non-HugePages memory acquisition failure. I did not simplify the error messages, 
as that would have complicated handling the preprocessor directives.

> I agree that the message should be promoted to a higher level.  But I 
> think we should also make that information available at the SQL level, 
> as the log files may be truncated / rotated before you need the info, 
> and it can be troublesome to find the information at the OS level, if 
> you're lucky enough to have OS access.

In the attached patch, I have added an internal GUC 'using_huge_pages' to show 
whether HugePages are in use. This parameter will be true only if the instance is 
using HugePages.
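
For illustration, here is a minimal sketch of how such an internal GUC could be
kept in sync once the server knows whether huge pages were obtained; only the
GUC name comes from the patch, the helper and its call site are assumptions:

#include "postgres.h"
#include "utils/guc.h"

/* hypothetical helper, e.g. called after shared memory is mapped */
static void
report_huge_pages_status(bool in_use)
{
	/* make "SHOW using_huge_pages" reflect reality */
	SetConfigOption("using_huge_pages", in_use ? "on" : "off",
					PGC_INTERNAL, PGC_S_OVERRIDE);
}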

Regards,
Noriyoshi Shinoda

-Original Message-
From: Fujii Masao [mailto:masao.fu...@oss.nttdata.com] 
Sent: Wednesday, September 1, 2021 2:06 AM
To: Julien Rouhaud ; Shinoda, Noriyoshi (PN Japan FSIP) 

Cc: pgsql-hack...@postgresql.org
Subject: Re: Improve logging when using Huge Pages



On 2021/08/31 15:57, Julien Rouhaud wrote:
> On Tue, Aug 31, 2021 at 1:37 PM Shinoda, Noriyoshi (PN Japan FSIP) 
>  wrote:
>>
>> In the current version, when GUC huge_pages=try, which is the default 
>> setting, no log is output regardless of the success or failure of the 
>> HugePages acquisition. If you want to output logs, you need to set 
>> log_min_messages=DEBUG3, but it will output a huge amount of extra logs.
>> With huge_pages=try setting, if the kernel parameter vm.nr_hugepages is not 
>> enough, the administrator will not notice that HugePages is not being used.
>> I think it should output a log if HugePages was not available.

+1

-   elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",
+   elog(WARNING, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m",

elog() should be used only for internal errors and low-level debug logging.
So per your proposal, elog() is not suitable here. Instead, ereport() should be 
used.

The log level should be LOG rather than WARNING because this message indicates 
the information about server activity that administrators are interested in.

The message should be updated so that it follows the Error Message Style Guide.
https://www.postgresql.org/docs/devel/error-style-guide.html 

With huge_pages=on, if shared memory fails to be allocated, an error message 
is currently reported. Even with huge_pages=try, shouldn't this error message be 
used, to simplify the code as follows?

 	errno = mmap_errno;
-	ereport(FATAL,
+	ereport((huge_pages == HUGE_PAGES_ON) ? FATAL : LOG,
 			(errmsg("could not map anonymous shared memory: %m"),
 			 (mmap_errno == ENOMEM) ?
 			 errhint("This error usually means that PostgreSQL's request "



> I agree that the message should be promoted to a higher level.  But I 
> think we should also make that information available at the SQL level, 
> as the log files may be truncated / rotated before you need the info, 
> and it can be troublesome to find the information at the OS level, if 
> you're lucky enough to have OS access.

+1

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


huge_pages_log_v2.diff
Description: huge_pages_log_v2.diff


Re: Read-only vs read only vs readonly

2021-09-02 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 22:07:02 +, "Bossart, Nathan"  
wrote in 
> On 9/2/21, 11:30 AM, "Magnus Hagander"  wrote:
> > I had a customer point out to me that we're inconsistent in how we
> > spell read-only. Turns out we're not as inconsistent as I initially
> > thought :), but that they did manage to spot the one actual log
> > message we have that writes it differently than everything else -- but
> > that broke their grepping...
> >
> > Almost everywhere we use read-only. Attached patch changes the one log
> > message where we didn't, as well as a few places in the docs for it. I
> > did not bother with things like comments in the code.
> > 
> > Two questions:
> >
> > 1. Is it worth fixing? Or just silly nitpicking?
> 
> It seems entirely reasonable to me to consistently use "read-only" in
> the log messages and documentation.
> 
> > 2. What about translations? This string exists in translations --
> > should we just "fix" it there, without touching the translated string?
> > Or try to fix both? Or leave it for the translators who will get a
> > diff on it?
> 
> I don't have a strong opinion, but if I had to choose, I would say to
> leave it to the translators to decide.

+1 for both.  As a translator, that kind of change seems usual.  Many
changes involving full stops, spacing, capitalization and so on happen
especially near release season, like now.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)

2021-09-02 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for your great works. Attached is the latest version.

> Thanks! What about updating the comments furthermore as follows?
> 
> -
> Use pgfdw_application_name as application_name if set.
> 
> PQconnectdbParams() processes the parameter arrays from start to end.
> If any key word is repeated, the last value is used. Therefore note that
> pgfdw_application_name must be added to the arrays after options of
> ForeignServer are, so that it can override application_name set in
> ForeignServer.
> -

It's friendlier than mine because it mentions
the specification of PQconnectdbParams(). 
Fixed to match yours.
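
As a side note, the last-value-wins rule is easy to demonstrate with a
standalone libpq program (a sketch; the connection values are illustrative
only):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
	/* "application_name" appears twice; PQconnectdbParams() processes
	 * the arrays from start to end, so the later value is used */
	const char *const keys[] = {"dbname", "application_name",
								"application_name", NULL};
	const char *const vals[] = {"postgres", "from_server_options",
								"from_guc", NULL};
	PGconn	   *conn = PQconnectdbParams(keys, vals, 0);

	if (PQstatus(conn) == CONNECTION_OK)
		printf("effective application_name is \"from_guc\"\n");
	PQfinish(conn);
	return 0;
}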

> + }
> + /* Use "postgres_fdw" as fallback_application_name */
> 
> It's better to add new empty line between these two lines.

Fixed.

> +-- Disconnect once because the value is used only when establishing
> connections
> +DO $$
> + BEGIN
> + PERFORM postgres_fdw_disconnect_all();
> + END
> +$$;
> 
> Why does DO command need to be used here to execute
> postgres_fdw_disconnect_all()? Instead, we can just execute
> "SELECT 1 FROM postgres_fdw_disconnect_all();"?

The DO command was used because I wanted to
ignore the return value of postgres_fdw_disconnect_all().
Currently this function returns false, but if other tests are modified,
some connections may remain and the function may start to return true.

I searched the SQL file and found that this function was already called
your way elsewhere.  Hence I fixed it.

> For test coverage, it's better to test at least the following three cases?
> 
> (1) appname is set in neither GUC nor foreign server
> -> "postgres_fdw" set in fallback_application_name is used
> (2) appname is set in foreign server, but not in GUC
> -> appname in foreign server is used
> (3) appname is set both in GUC and foreign server
>-> appname in GUC is used

I set four testcases:

(1) Sets neither GUC nor server option
(2) Sets server option, but not GUC
(3) Sets GUC but not server option
(4) Sets both GUC and server option

I confirmed it almost works fine, but I found that
fallback_application_name will never be used in our test environment.
This is because our test runner pg_regress sets PGAPPNAME to "pg_regress"
and libpq prefers the environment variable to fallback_application_name.
(I tried to control it with \setenv, but failed...)
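
To make that precedence concrete, here is a small sketch (not part of the
patch): libpq consults fallback_application_name only when application_name
is set neither in the connection parameters nor via PGAPPNAME.

#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	/* mimic what pg_regress does for every test process */
	setenv("PGAPPNAME", "pg_regress", 1);

	const char *const keys[] = {"dbname", "fallback_application_name", NULL};
	const char *const vals[] = {"postgres", "postgres_fdw", NULL};
	PGconn	   *conn = PQconnectdbParams(keys, vals, 0);

	/* the effective application_name is "pg_regress": the environment
	 * variable takes precedence over fallback_application_name */
	PQfinish(conn);
	return 0;
}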

> +SELECT FROM ft1 LIMIT 1;
> 
> "1" should be added just after SELECT in the above statement?
> Because postgres_fdw regression test basically uses "SELECT 1 FROM ..."
> in other places.

Fixed.

> + DefineCustomStringVariable("postgres_fdw.application_name",
> +"Sets the 
> application name. This is used when connects to the remote server.",
>
> What about simplifying this description as follows?
>
> ---
> Sets the application name to be used on the remote server.
> ---

+1.

> +   Configuration Parameters 
> +  
> 
> The empty characters just after  and before  should be removed?

I checked other sgml file and agreed. Fixed.

And I found that including string.h is no longer needed, so it has been removed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v07_0001_add_application_name_GUC.patch
Description: v07_0001_add_application_name_GUC.patch


Re: Possible missing segments in archiving on standby

2021-09-02 Thread Kyotaro Horiguchi
At Fri, 3 Sep 2021 02:06:45 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/09/02 10:16, Kyotaro Horiguchi wrote:
> > Ok, I agree that the reader-side needs an amendment.
> 
> Thanks for the review! Attached is the updated version of the patch.
> Based on my latest patch, I changed the startup process so that
> it creates an archive notification file of the streamed WAL segment
> including XLOG_SWITCH record if the notification file has not been
> created yet.

+   if (readSource == XLOG_FROM_STREAM &&
+   record->xl_rmid == RM_XLOG_ID &&
+   (record->xl_info & ~XLR_INFO_MASK) == XLOG_SWITCH)

readSource is the source at the time the startup process reads the record, and 
it could be different from the source at the time the record was written. We 
cannot know, at that point, where the record came from, so does the readSource 
condition work as expected?  If we had some trouble streaming just before, 
readSource at that time is likely to be XLOG_FROM_PG_WAL.

+			if (XLogArchivingAlways())
+				XLogArchiveNotify(xlogfilename, true);
+			else
+				XLogArchiveForceDone(xlogfilename);

This path is used both for crash and archive recovery. If we pass through there 
during crash recovery on a primary with archive_mode=on, the file could
be marked .done before actually being archived. On the other hand, with
archive_mode=always, the file could be re-marked .ready even after it
has already been archived.  Why isn't this XLogArchiveCheckDone?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Unused variable in TAP tests file

2021-09-02 Thread Amul Sul
A few TAP test files have a "tempdir_short" variable which isn't in
use. The attached patch removes it.

Regards,
Amul
From 0751895df64bcd6bc719933013edf1d76e31b784 Mon Sep 17 00:00:00 2001
From: Amul Sul 
Date: Fri, 3 Sep 2021 01:19:29 -0400
Subject: [PATCH] Remove unused variable

---
 src/bin/pg_ctl/t/002_status.pl   | 1 -
 src/bin/pg_dump/t/001_basic.pl   | 1 -
 src/bin/pg_dump/t/002_pg_dump.pl | 1 -
 src/bin/pg_dump/t/003_pg_dump_with_server.pl | 1 -
 src/test/modules/test_pg_dump/t/001_base.pl  | 1 -
 5 files changed, 5 deletions(-)

diff --git a/src/bin/pg_ctl/t/002_status.pl b/src/bin/pg_ctl/t/002_status.pl
index 56a06fafa3b..c6f4fac57b6 100644
--- a/src/bin/pg_ctl/t/002_status.pl
+++ b/src/bin/pg_ctl/t/002_status.pl
@@ -9,7 +9,6 @@ use TestLib;
 use Test::More tests => 3;
 
 my $tempdir   = TestLib::tempdir;
-my $tempdir_short = TestLib::tempdir_short;
 
 command_exit_is([ 'pg_ctl', 'status', '-D', "$tempdir/nonexistent" ],
 	4, 'pg_ctl status with nonexistent directory');
diff --git a/src/bin/pg_dump/t/001_basic.pl b/src/bin/pg_dump/t/001_basic.pl
index d1a7e1db405..d6731855eda 100644
--- a/src/bin/pg_dump/t/001_basic.pl
+++ b/src/bin/pg_dump/t/001_basic.pl
@@ -10,7 +10,6 @@ use TestLib;
 use Test::More tests => 82;
 
 my $tempdir   = TestLib::tempdir;
-my $tempdir_short = TestLib::tempdir_short;
 
 #
 # Basic checks
diff --git a/src/bin/pg_dump/t/002_pg_dump.pl b/src/bin/pg_dump/t/002_pg_dump.pl
index a4ee54d516f..223f60e3bcb 100644
--- a/src/bin/pg_dump/t/002_pg_dump.pl
+++ b/src/bin/pg_dump/t/002_pg_dump.pl
@@ -10,7 +10,6 @@ use TestLib;
 use Test::More;
 
 my $tempdir   = TestLib::tempdir;
-my $tempdir_short = TestLib::tempdir_short;
 
 ###
 # Definition of the pg_dump runs to make.
diff --git a/src/bin/pg_dump/t/003_pg_dump_with_server.pl b/src/bin/pg_dump/t/003_pg_dump_with_server.pl
index ba994aee823..a879ae28d8d 100644
--- a/src/bin/pg_dump/t/003_pg_dump_with_server.pl
+++ b/src/bin/pg_dump/t/003_pg_dump_with_server.pl
@@ -9,7 +9,6 @@ use TestLib;
 use Test::More tests => 3;
 
 my $tempdir   = TestLib::tempdir;
-my $tempdir_short = TestLib::tempdir_short;
 
 my $node = PostgresNode->new('main');
 my $port = $node->port;
diff --git a/src/test/modules/test_pg_dump/t/001_base.pl b/src/test/modules/test_pg_dump/t/001_base.pl
index ea7739d7254..17c404c81f2 100644
--- a/src/test/modules/test_pg_dump/t/001_base.pl
+++ b/src/test/modules/test_pg_dump/t/001_base.pl
@@ -10,7 +10,6 @@ use TestLib;
 use Test::More;
 
 my $tempdir   = TestLib::tempdir;
-my $tempdir_short = TestLib::tempdir_short;
 
 ###
 # This structure is based off of the src/bin/pg_dump/t test
-- 
2.18.0



Re: Estimating HugePages Requirements?

2021-09-02 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 16:46:56 +, "Bossart, Nathan"  
wrote in 
> On 9/2/21, 12:54 AM, "Michael Paquier"  wrote:
> > Thanks.  Anyway, we don't really need huge_pages_required on Windows,
> > do we?  The following docs of Windows tell what do to when using large
> > pages:
> > https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
> >
> > The backend code does that as in PGSharedMemoryCreate(), now that I
> > look at it.  And there is no way to change the minimum large page size
> > there as far as I can see because that's decided by the processor, no?
> > There is a case for shared_memory_size on Windows to be able to adjust
> > the sizing of the memory of the host, though.
> 
> Yeah, huge_pages_required might not serve much purpose for Windows.
> We could always set it to -1 for Windows if it seems like it'll do
> more harm than good.

I agreed to this.

> > At the end it would be nice to not finish with two GUCs.  Both depend
> > on the reordering of the actions done by the postmaster, so I'd be
> > curious to hear the thoughts of others on this particular point.
> 
> Of course.  It'd be great to hear others' thoughts on this stuff.

Honestly, I would be satisfied if the following error message
contained the required number of huge pages.

FATAL:  could not map anonymous shared memory: Cannot allocate memory
HINT:  This error usually means that PostgreSQL's request for a shared memory 
segment exceeded available memory, swap space, or huge pages. To reduce the 
request size (currently 148897792 bytes), reduce PostgreSQL's shared memory 
usage, perhaps by reducing shared_buffers or max_connections.

Or emit a different message if huge_pages=on.

FATAL: could not map anonymous shared memory from huge pages
HINT:  This usually means that PostgreSQL's request for huge pages exceeded the 
number available. The number of 2048kB huge pages required for the requested 
memory size (currently 148897792 bytes) is 71.


Returning to this feature, even if I were informed of that via a GUC, I
wouldn't add memory by looking at shared_memory_size.  Anyway, since
shared_buffers occupies almost all of the shared memory allocated to
postgres, we are not supposed to need such a precise adjustment of the
required size of shared memory.  On the other hand, the available number
of huge pages is configurable, and we need to set it as required.  That
said, it might seem to me a bit strange that there's only
huge_pages_required and not shared_memory_size, in the view of
comprehensiveness or completeness.  So my feeling at this point is "I
need only huge_pages_required but might want shared_memory_size just
for completeness".


By the way, I noticed that postgres -C huge_page_size shows 0; I think it
should show the number used for the calculation if we are to show
huge_pages_required.

I noticed that postgres -C shared_memory_size showed 137 (= 144703488)
whereas the error message above showed 148897792 bytes (142MB). So it
seems that something is forgotten while calculating
shared_memory_size.  As a consequence, launching postgres with
huge_pages_required (69 pages) set as vm.nr_hugepages ended up in the
"could not map anonymous shared memory" error.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: using an end-of-recovery record in all cases

2021-09-02 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 11:30:59 -0400, Robert Haas  wrote in 
> On Mon, Aug 9, 2021 at 3:00 PM Robert Haas  wrote:
> > I decided to try writing a patch to use an end-of-recovery record
> > rather than a checkpoint record in all cases.
> >
> > The first problem I hit was that GetRunningTransactionData() does
> > Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid)).
> >
> > Unfortunately we can't just relax the assertion, because the
> > XLOG_RUNNING_XACTS record will eventually be handed to
> > ProcArrayApplyRecoveryInfo() for processing ... and that function
> > contains a matching assertion which would in turn fail. It in turn
> > passes the value to MaintainLatestCompletedXidRecovery() which
> > contains yet another matching assertion, so the restriction to normal
> > XIDs here looks pretty deliberate. There are no comments, though, so
> > the reader is left to guess why. I see one problem:
> > MaintainLatestCompletedXidRecovery uses FullXidRelativeTo, which
> > expects a normal XID. Perhaps it's best to just dodge the entire issue
> > by skipping LogStandbySnapshot() if latestCompletedXid happens to be
> > 2, but that feels like a hack, because AFAICS the real problem is that
> > StartupXLog() doesn't agree with the rest of the code on whether 2 is
> > a legal case, and maybe we ought to be storing a value that doesn't
> > need to be computed via TransactionIdRetreat().
> 
> Anyone have any thoughts about this?

I tried to reproduce this, but just replacing the end-of-recovery
checkpoint request with issuing an end-of-recovery record didn't make
"make check-world" fail for me.  Do you have an idea of any other
requirement to cause that?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: [PATCH] support tab-completion for single quote input with equal sign

2021-09-02 Thread tanghy.f...@fujitsu.com
On Friday, September 3, 2021 2:14 AM, Jacob Champion  
wrote
>I applied your patch against HEAD (and did a clean build for good
>measure) but couldn't get the tab-completion you described -- on my
>machine, `PUBLICATION` still fails to complete. Tab completion is
>working in general, for example with the `SUBSCRIPTION` and
>`CONNECTION` keywords.
>
>Is there additional setup that I need to do?

Thanks for your check.

I applied the 0001 patch to HEAD (c95ede41) just now and it worked as I 
expected.
Did you leave a space between "dbname=postgres'" and [TAB]?

In the 0002 patch I added a TAP test for the scenario of single-quoted input 
with an equal sign. 
The test result is 'All tests successful.' on my machine. 
You can run the TAP tests as follows:
1. Apply the attached patch
2. Build the source with the option '--enable-tap-tests' (./configure 
--enable-tap-tests)
3. Move to the subdirectory 'src/bin/psql' 
4. Run 'make check' 

I'd appreciate it if you can share your test results with me.

Regards,
Tang


v2-0001-support-tab-completion-for-single-quote-input-wit.patch
Description:  v2-0001-support-tab-completion-for-single-quote-input-wit.patch



Re: PoC/WIP: Extended statistics on expressions

2021-09-02 Thread Justin Pryzby
On Wed, Sep 01, 2021 at 06:45:29PM +0200, Tomas Vondra wrote:
> However while polishing the second patch, I realized we're allowing
> statistics on expressions referencing system attributes. So this fails;
> 
> CREATE STATISTICS s ON ctid, x FROM t;
> 
> but this passes:
> 
> CREATE STATISTICS s ON (ctid::text), x FROM t;
> 
> IMO we should reject such expressions, just like we reject direct references
> to system attributes - patch attached.

Right, same as indexes.  +1

> Furthermore, I wonder if we should reject expressions without any Vars? This
> works now:
> 
> CREATE STATISTICS s ON (11::text) FROM t;
> 
> but it seems rather silly / useless, so maybe we should reject it.

To my surprise, this is also allowed for indexes...

But (maybe this is what I was remembering) it's prohibited to have a constant
expression as a partition key.

Expressions without a var seem like a case where the user did something
deliberately silly, and dissimilar from the case of making a stats expression
on a simple column - that seemed like it could be a legitimate
mistake/confusion (it's not unreasonable to write an extra parenthesis, but
it's strange if that causes it to behave differently).

I think it's not worth too much effort to prohibit this: if they're determined,
they can still write an expression with a var which is constant.  I'm not going
to say it's worth zero effort, though.

-- 
Justin




Re: improve pg_receivewal code

2021-09-02 Thread Bharath Rupireddy
On Thu, Sep 2, 2021 at 9:05 PM Ronan Dunklau  wrote:
> > 1) Fetch the server system identifier in the first RunIdentifySystem
> > call and use it to identify (via pg_receivewal's ReceiveXlogStream) any
> > unexpected changes that may happen in the server while pg_receivewal
> > is connected to it. This can be helpful in scenarios when
> > pg_receivewal tries to reconnect to the server (see the code around
> > pg_usleep with RECONNECT_SLEEP_TIME) but something unexpected has
> > happened in the server that changed its system identifier. Once
> > pg_receivewal establishes the connection to the server again,
> > ReceiveXlogStream has a code chunk to compare the system identifier
> > against the one that we received in the initial connection.
>
> I'm not sure what kind of problem could be happening here: if you were
> somehow routed to another server? Or if we "switched" the cluster listening
> on that port?

Yeah. Also, the pg_control file on the server can get corrupted for
whatever reason. This system identifier check is useful in case
pg_receivewal connects to the server again and again.
These things may sound overcautious, but there's nothing wrong with
using what ReceiveXlogStream provides. pg_basebackup already makes
use of this.
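
As a sketch of what the reconnect-time check amounts to (the helper below is
hypothetical; RunIdentifySystem() is the existing streamutil.c routine):

#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>
#include "streamutil.h"		/* RunIdentifySystem() */

/* hypothetical helper: after reconnecting, re-run IDENTIFY_SYSTEM and
 * compare against the system identifier saved at the first connection */
static bool
sysid_still_matches(PGconn *conn, const char *saved_sysid)
{
	char	   *sysid = NULL;
	bool		ok;

	if (!RunIdentifySystem(conn, &sysid, NULL, NULL, NULL))
		return false;
	ok = (strcmp(sysid, saved_sysid) == 0);
	free(sysid);
	return ok;
}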

> > 2) Move the RunIdentifySystem to identify timeline id and start LSN
> > from the server only if the pg_receivewal failed to get them from
> > FindStreamingStart. This way, an extra IDENTIFY_SYSTEM command is
> > avoided.
>
> That makes sense, even if we add another IDENTIFY_SYSTEM to check against the
> one set in the first place.
>
> > 3) Place the "replication connection shouldn't have any database name
> > associated" error check right after RunIdentifySystem so that we can
> > avoid fetching the wal segment size with RetrieveWalSegSize if at all we
> > were to fail with that error. This change is similar to what
> > pg_recvlogical.c does.
>
> Makes sense.
>
> > 4) Move the RetrieveWalSegSize to just before pg_receivewal.c enters
> > main loop to get the wal from the server. This avoids an unnecessary
> > query for pg_receivewal with "--create-slot" or "--drop-slot".
> > 5) Having an assertion after pg_receivewal has done a good amount of
> > work to find the start timeline and LSN might be helpful:
> > Assert(stream.timeline != 0 && stream.startpos != InvalidXLogRecPtr);
> >
> > Attaching a patch that does take care of above improvements. Thoughts?
>
> Overall I think it is good.

Thanks for your review.

> However, you have some formatting issues, here it mixes tabs and spaces:
>
> +   /*
> +* No valid data can be found in the existing output
> directory.
> +* Get start LSN position and current timeline ID from
> the server.
> +*/

My bad. I forgot to run "git diff --check" on the v1 patch.

> And here it is not formatted properly:
>
> +static char   *server_sysid = NULL;

Done.

Here's the v2 with above modifications.

Regards,
Bharath Rupireddy.


v2-0001-improve-pg_receivewal-code.patch
Description: Binary data


Re: a misbehavior of partition row movement (?)

2021-09-02 Thread Amit Langote
Hi Andrew,

On Fri, Sep 3, 2021 at 6:19 AM Andrew Dunstan  wrote:
> On 7/13/21 8:09 AM, Amit Langote wrote:
> > Unfortunately, I don’t think I’ll have time in this CF to solve some
> > very fundamental issues I found in the patch during the last cycle.
> > I’m fine with either marking this as RwF for now or move to the next CF.
>
> Amit, do you have time now to work on this?

I will take some time next week to take a fresh look at this and post an update.

Thank you.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: AIX: Symbols are missing in libpq.a

2021-09-02 Thread Noah Misch
On Wed, Sep 01, 2021 at 08:59:57AM +, REIX, Tony wrote:
> Here is a new patch, using export.txt whenever it exists.
> I have tested it with v13.4: it's OK.
> The patch for 14beta3 should be the same, since there was no change to 
> src/Makefile.shlib between v13 and v14.

Thanks.  This looks good.  I'm attaching what I intend to push, which just
adds a log message and some cosmetic changes compared to your version.  Here
are the missing symbols restored by the patch:

pg_encoding_to_char
pg_utf_mblen
pg_char_to_encoding
pg_valid_server_encoding
pg_valid_server_encoding_id

I was ambivalent about whether to back-patch to v13 or to stop at v14, but I
decided that v13 should have this change.  We should expect sad users when
libpq lacks a documented symbol.  Complaints about loss of undocumented
symbols (e.g. pqParseInput3) are unlikely, and we're even less likely to have
users opposing reintroduction of long-documented symbols.  An alternative
would be to have v13 merge the symbol lists, like your original proposal, so
we're not removing even undocumented symbols.  I doubt applications have
accrued dependencies on libpq-internal symbols in the year since v13 appeared,
particularly since those symbols are inaccessible on Linux.  Our AIX export
lists never included libpgport or libpgcommon symbols.
Author: Noah Misch 
Commit: Noah Misch 

AIX: Fix missing libpq symbols by respecting SHLIB_EXPORTS.

We make each AIX shared library export all globals found in .o files
that originate in the library.  That doesn't include symbols acquired by
-lpgcommon_shlib.  That is good on average, but it became a problem for
libpq when commit e6afa8918c461c1dd80c5063a950518fa4e950cd moved five
official libpq API symbols into src/common.  Fix this by implementing
the SHLIB_EXPORTS mechanism for AIX, so affected libraries export the
same symbols that they export on Linux.  This reintroduces symbols
pg_encoding_to_char, pg_utf_mblen, pg_char_to_encoding,
pg_valid_server_encoding, and pg_valid_server_encoding_id.  Back-patch
to v13, where the aforementioned commit first appeared.  While a minor
release is usually the wrong time to add or remove symbol exports in
libpq or libecpg, we should expect users to want each documented symbol.

Tony Reix

Discussion: 
https://postgr.es/m/pr3pr02mb6396742e2fc3e77d37a920bc86...@pr3pr02mb6396.eurprd02.prod.outlook.com

diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index 29a7f6d..6b79db1 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -329,7 +329,11 @@ $(shlib): $(OBJS) | $(SHLIB_PREREQS)
rm -f $(stlib)
$(LINK.static) $(stlib) $^
$(RANLIB) $(stlib)
+ifeq (,$(SHLIB_EXPORTS))
$(MKLDEXPORT) $(stlib) $(shlib) >$(exports_file)
+else
+	( echo '#! $(shlib)'; $(AWK) '/^[^#]/ {printf "%s\n",$$1}' $(SHLIB_EXPORTS) ) >$(exports_file)
+endif
$(COMPILER) -o $(shlib) $(stlib) -Wl,-bE:$(exports_file) $(LDFLAGS) 
$(LDFLAGS_SL) $(SHLIB_LINK)
rm -f $(stlib)
$(AR) $(AROPT) $(stlib) $(shlib)
diff --git a/src/port/README b/src/port/README
index c446b46..97f18a6 100644
--- a/src/port/README
+++ b/src/port/README
@@ -28,5 +28,5 @@ applications.
 from libpgport are linked first.  This avoids having applications
 dependent on symbols that are _used_ by libpq, but not intended to be
 exported by libpq.  libpq's libpgport usage changes over time, so such a
-dependency is a problem.  Windows, Linux, and macOS use an export list to
-control the symbols exported by libpq.
+dependency is a problem.  Windows, Linux, AIX, and macOS use an export
+list to control the symbols exported by libpq.


Re: replay of CREATE TABLESPACE eats data at wal_level=minimal

2021-09-02 Thread Noah Misch
On Thu, Sep 02, 2021 at 11:28:27AM -0400, Robert Haas wrote:
> On Wed, Aug 25, 2021 at 8:03 AM Robert Haas  wrote:
> > On Wed, Aug 25, 2021 at 1:21 AM Noah Misch  wrote:
> > > Sounds good.  I think the log message is the optimal place:
> >
> > Looks awesome.
> 
> Is there anything still standing in the way of committing this?

I pushed it as commit 97ddda8.




Re: pgstat_send_connstats() introduces unnecessary timestamp and UDP overhead

2021-09-02 Thread Laurenz Albe
On Wed, 2021-09-01 at 10:56 +0200, Laurenz Albe wrote:
> On Tue, 2021-08-31 at 21:16 -0700, Andres Freund wrote:
> > On 2021-09-01 05:39:14 +0200, Laurenz Albe wrote:
> > > On Tue, 2021-08-31 at 18:55 -0700, Andres Freund wrote:
> > > > > > On Tue, Aug 31, 2021 at 04:55:35AM +0200, Laurenz Albe wrote:
> > > > > > In view of that, how about doubling PGSTAT_STAT_INTERVAL to 1000
> > > > > > milliseconds?  That would mean slightly less up-to-date statistics,
> > > > > > but I doubt that that will be a problem.
> > > > 
> > > > I think it's not helpful. Still increases the number of messages 
> > > > substantially in workloads
> > > > with a lot of connections doing occasional queries. Which is common.
> > > 
> > > How come?  If originally you send table statistics every 500ms, and now 
> > > you send
> > > table statistics and session statistics every second, that should amount 
> > > to the
> > > same thing.  Where is my misunderstanding?
> > 
> > Consider the case of one query a second.
> 
> I guess I am too stupid.  I don't see it.

Finally got it.  That would send a message every second, and with connection 
statistics,
twice as many.

Here is my next suggestion for a band-aid to mitigate this problem:
Introduce a second, much longer interval for reporting session statistics.

--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -77,6 +77,8 @@
 #define PGSTAT_STAT_INTERVAL	500		/* Minimum time between stats file
 										 * updates; in milliseconds. */
 
+#define PGSTAT_CONSTAT_INTERVAL	6	/* interval to report connection statistics */
+
 #define PGSTAT_RETRY_DELAY		10		/* How long to wait between checks for a
 										 * new file; in milliseconds. */
 
@@ -889,8 +891,13 @@ pgstat_report_stat(bool disconnect)
 		!TimestampDifferenceExceeds(last_report, now, PGSTAT_STAT_INTERVAL))
 		return;
 
-	/* for backends, send connection statistics */
-	if (MyBackendType == B_BACKEND)
+	/*
+	 * For backends, send connection statistics, but only every
+	 * PGSTAT_CONSTAT_INTERVAL or when the backend terminates.
+	 */
+	if (MyBackendType == B_BACKEND &&
+		(TimestampDifferenceExceeds(last_report, now, PGSTAT_CONSTAT_INTERVAL) ||
+		 disconnect))
 		pgstat_send_connstats(disconnect, last_report, now);
 
 	last_report = now;

That should keep the extra load moderate, except for workloads with lots of 
tiny connections
(for which this may be the least of their problems).

Yours,
Laurenz Albe





Re: row filtering for logical replication

2021-09-02 Thread Peter Smith
On Wed, Sep 1, 2021 at 9:23 PM Euler Taveira  wrote:
>
> On Sun, Aug 29, 2021, at 11:14 PM, Peter Smith wrote:
...
> Peter, I'm still reviewing this new cache mechanism. I will provide feedback
> as soon as I integrate it into this recent modification.

Hi Euler, for your next version can you please also integrate the
tab-autocomplete change back into the main patch.

This autocomplete change was originally posted quite a few weeks ago
here [1] but seems to have been overlooked.
I've rebased it and it applies OK to your latest v27* set. PSA.

Thanks!
--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPuLoZuHD_A%3Dn8GshC84Nc%3D8guReDsTmV1RFsCYojssD8Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Add-tab-auto-complete-support-for-the-Row-Filter-.patch
Description: Binary data


Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-02 Thread Kyotaro Horiguchi
At Fri, 3 Sep 2021 01:01:43 +1200, Thomas Munro  wrote 
in 
> On Fri, Sep 3, 2021 at 12:44 AM Thomas Munro  wrote:
> > NtFileCreate()
> 
> Erm, that's spelled NtCreateFile.  I see Michael mentioned this
> before[1]; I don't think it's only available in kernel mode though,
> the docs[2] say "This function is the user-mode equivalent to the
> ZwCreateFile function", and other open source user space stuff is
> using it.  It's explicitly internal and subject to change though,
> hence my desire to avoid it.
> 
> [1] 
> https://www.postgresql.org/message-id/flat/a9c76882-27c7-9c92-7843-21d5521b70a9%40postgrespro.ru
> [2] 
> https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile

This might be stupid, but if a delete-pending file can obstruct something,
couldn't we change unlink on Windows to rename the file to a temporary
random name and then remove it?  We do something like that explicitly
during WAL file removal. (It may cause degradation on bulk file deletion,
and we may need a further fix so that such being-deleted files are
excluded while running a directory scan, though..)
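
A minimal sketch of the rename-then-unlink idea in plain Win32 calls (the
".deleted.<pid>" naming scheme is made up for illustration):

#include <windows.h>
#include <stdio.h>
#include <stdbool.h>

static bool
unlink_via_rename(const char *path)
{
	char		tmppath[MAX_PATH];

	/* move the file out of the way first, so that even if the delete
	 * stays pending, the original name is immediately free for reuse */
	snprintf(tmppath, sizeof(tmppath), "%s.deleted.%lu",
			 path, (unsigned long) GetCurrentProcessId());

	if (!MoveFileEx(path, tmppath, MOVEFILE_REPLACE_EXISTING))
		return false;
	return DeleteFile(tmppath) != 0;
}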

However, looking at [1], with that strategy there may be a case where
such "deleted" files are left around forever, though.


[1] 
https://www.postgresql.org/message-id/002101d79fc2%24c96dff60%245c49fe20%24%40gmail.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Estimating HugePages Requirements?

2021-09-02 Thread Michael Paquier
On Thu, Sep 02, 2021 at 04:46:56PM +, Bossart, Nathan wrote:
> Yeah, huge_pages_required might not serve much purpose for Windows.
> We could always set it to -1 for Windows if it seems like it'll do
> more harm than good.

I'd be fine with this setup in environments where there is no need for
it.

>> At the end it would be nice to not finish with two GUCs.  Both depend
>> on the reordering of the actions done by the postmaster, so I'd be
>> curious to hear the thoughts of others on this particular point.
> 
> Of course.  It'd be great to hear others' thoughts on this stuff.

Just to be clear here, the ordering of HEAD is that for the
postmaster:
- Load configuration.
- Handle -C config_param.
- checkDataDir(), to check permissions of the data dir, etc.
- checkControlFile(), to see if the control file exists.
- Switch to data directory as work dir.
- Lock file creation.
- Initial read of the control file (where the GUC data_checksums is
set).
- Register apply launcher
- shared_preload_libraries

With 0002, we have that:
- Load configuration.
- checkDataDir(), to check permissions of the data dir, etc.
- checkControlFile(), to see if the control file exists.
- Switch to data directory as work dir.
- Register apply launcher
- shared_preload_libraries
- Calculate the shmem GUCs (new step)
- Handle -C config_param.
- Lock file creation.
- Initial read of the control file (where the GUC data_checksums is
set).

One thing that would still be incorrect upon further review is that we'd
have data_checksums wrong with -C, meaning that the initial read of
the control file should be moved further up, and getting the control
file checks done before registering workers would be better.  Keeping
the lock file at the end would be fine AFAIK, but should we worry
about the interactions with _PG_init() here?

0001, that refactors the calculation of the shmem size into a
different routine, is fine as-is, so I'd like to move on and apply
it.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] test/ssl: rework the sslfiles Makefile target

2021-09-02 Thread Michael Paquier
On Thu, Sep 02, 2021 at 04:42:14PM +, Jacob Champion wrote:
> Done that way in v5. It's a lot of moved code, so I've kept it as two
> commits for review purposes.

Nice.  This is neat.  The split helps a lot to understand how you've
changed things from the original implementation.  As a whole, this
looks rather committable to me.

One small-ish comment that I have is about all the .config files we
have at the root of src/test/ssl/, bloating the whole directory.  I think
that it would be a bit cleaner to put all of them in a separate
subdirectory, say just config/ or conf/.
--
Michael


signature.asc
Description: PGP signature


Re: [PATCH] pg_ctl should not truncate command lines at 1024 characters

2021-09-02 Thread Tom Lane
Phil Krylov  writes:
> IMHO pg_ctl should not blindly truncate generated command lines at 
> MAXPGPATH (1024 characters) and then run that, resulting in:

Fair enough.

> The attached patch tries to fix it in the least intrusive way.

Seems reasonable.  We didn't have psprintf when this code was written,
but now that we do, it's hardly any more complicated to do it without
the length restriction.

> While we're at it, is it supposed that pg_ctl is a very short-lived 
> process and is therefore allowed to leak memory? I've noticed some 
> places where I would like to add a free() call.

I think that these free() calls you propose to add are a complete
waste of code space.  Certainly a free() right before an exit() call
is that; if anything, it's *delaying* recycling the memory space for
some useful purpose.  But no part of pg_ctl runs long enough for it
to be worth worrying about small leaks.

I do not find your proposed test case to be a useful expenditure
of test cycles, either.  If it ever fails, we'd learn nothing,
except that that particular platform has a surprisingly small
command line length limit.

regards, tom lane




Re: dup(0) fails on Ubuntu 20.04 and macOS 10.15 with 13.0

2021-09-02 Thread Tom Lane
Mario Emmenlauer  writes:
> The idea to switch to dup(2) sounds very good to me.

I poked at this some more, and verified that adding "fclose(stdin);"
at the head of PostmasterMain is enough to trigger the reported
failure.  However, after changing fd.c to dup stderr not stdin,
we can pass check-world even with that in place.  So that confirms
that there is no very good reason for the postmaster to require
stdin to be available.

Hence, I pushed the fix to make fd.c use stderr here.  I only
back-patched to v14, because given the lack of other complaints,
I couldn't quite justify touching stable branches for this.

regards, tom lane




Re: [PATCH] pg_ctl should not truncate command lines at 1024 characters

2021-09-02 Thread Phil Krylov

On 2021-09-03 00:36, Ranier Vilela wrote:

> The MSVC docs say that the limit for the command line is 32,767
> characters; while that's OK for me, wouldn't it be better to check this
> limit?


Well, it's ARG_MAX in POSIX, and ARG_MAX is defined as 256K in Darwin, 
512K in FreeBSD, 128K in Linux; _POSIX_ARG_MAX is defined as 4096 on all 
three platforms. Windows may differ too. Anyway, allocating even 128K 
of precious stack space is too much, which is why I suggest using 
psprintf(). As for checking any hard limit, I don't think it would have 
much value - somehow we got the original command line, thus it is 
supported by the system, so we can just pass it on.


-- Ph.




Re: prevent immature WAL streaming

2021-09-02 Thread Alvaro Herrera
On 2021-Sep-02, Kyotaro Horiguchi wrote:

> So, this is a crude PoC of that.

I had ended up with something very similar, except I was trying to cram
the flag in via the checkpoint record instead of hacking
AdvanceXLInsertBuffer().  I removed that stuff and merged both; here's
the result.

> 1. This patch is written on the current master, but it doesn't
>   interfare with the seg-boundary-memorize patch since it removes the
>   calls to RegisterSegmentBoundary.

I rebased on top of the revert patch.

> 2. Since xlogreader cannot emit a log message immediately, we don't
>   have a means to leave a log message informing that recovery met an
>   aborted partial continuation record. (In this PoC, it is done by
>   fprintf :p)

Shrug.  We can just use an #ifndef FRONTEND / elog(LOG).  (I didn't keep
this part, sorry.)

> 3. Maybe we need pg_waldump to show partial continuation records,
>   but I'm not sure how to realize that.

Ah yes, we'll need to fix that.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Las navajas y los monos deben estar siempre distantes"   (Germán Poo)
>From af9dba1c1523ea8b7963e01fdd5e357b142e69f8 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 31 Aug 2021 20:55:10 -0400
Subject: [PATCH v2 1/3] Revert "Avoid creating archive status ".ready" files
 too early"

This reverts commit 515e3d84a0b58b58eb30194209d2bc47ed349f5b.
---
 src/backend/access/transam/timeline.c|   2 +-
 src/backend/access/transam/xlog.c| 220 ++-
 src/backend/access/transam/xlogarchive.c |  17 +-
 src/backend/postmaster/walwriter.c   |   7 -
 src/backend/replication/walreceiver.c|   6 +-
 src/include/access/xlog.h|   1 -
 src/include/access/xlogarchive.h |   4 +-
 src/include/access/xlogdefs.h|   1 -
 8 files changed, 24 insertions(+), 234 deletions(-)

diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index acd5c2431d..8d0903c175 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -452,7 +452,7 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 	if (XLogArchivingActive())
 	{
 		TLHistoryFileName(histfname, newTLI);
-		XLogArchiveNotify(histfname, true);
+		XLogArchiveNotify(histfname);
 	}
 }
 
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24165ab03e..e51a7a749d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -724,18 +724,6 @@ typedef struct XLogCtlData
 	XLogRecPtr	lastFpwDisableRecPtr;
 
 	slock_t		info_lck;		/* locks shared variables shown above */
-
-	/*
-	 * Variables used to track segment-boundary-crossing WAL records.  See
-	 * RegisterSegmentBoundary.  Protected by segtrack_lck.
-	 */
-	XLogSegNo	lastNotifiedSeg;
-	XLogSegNo	earliestSegBoundary;
-	XLogRecPtr	earliestSegBoundaryEndPtr;
-	XLogSegNo	latestSegBoundary;
-	XLogRecPtr	latestSegBoundaryEndPtr;
-
-	slock_t		segtrack_lck;	/* locks shared variables shown above */
 } XLogCtlData;
 
 static XLogCtlData *XLogCtl = NULL;
@@ -932,7 +920,6 @@ static void RemoveXlogFile(const char *segname, XLogSegNo recycleSegNo,
 		   XLogSegNo *endlogSegNo);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
-static void RegisterSegmentBoundary(XLogSegNo seg, XLogRecPtr pos);
 static void CleanupBackupHistory(void);
 static void UpdateMinRecoveryPoint(XLogRecPtr lsn, bool force);
 static XLogRecord *ReadRecord(XLogReaderState *xlogreader,
@@ -1167,56 +1154,23 @@ XLogInsertRecord(XLogRecData *rdata,
 	END_CRIT_SECTION();
 
 	/*
-	 * If we crossed page boundary, update LogwrtRqst.Write; if we crossed
-	 * segment boundary, register that and wake up walwriter.
+	 * Update shared LogwrtRqst.Write, if we crossed page boundary.
 	 */
 	if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
 	{
-		XLogSegNo	StartSeg;
-		XLogSegNo	EndSeg;
-
-		XLByteToSeg(StartPos, StartSeg, wal_segment_size);
-		XLByteToSeg(EndPos, EndSeg, wal_segment_size);
-
-		/*
-		 * Register our crossing the segment boundary if that occurred.
-		 *
-		 * Note that we did not use XLByteToPrevSeg() for determining the
-		 * ending segment.  This is so that a record that fits perfectly into
-		 * the end of the segment causes the latter to get marked ready for
-		 * archival immediately.
-		 */
-		if (StartSeg != EndSeg && XLogArchivingActive())
-			RegisterSegmentBoundary(EndSeg, EndPos);
-
-		/*
-		 * Advance LogwrtRqst.Write so that it includes new block(s).
-		 *
-		 * We do this after registering the segment boundary so that the
-		 * comparison with the flushed pointer below can use the latest value
-		 * known globally.
-		 */
 		SpinLockAcquire(&XLogCtl->info_lck);
+		/* advance global request to include new block(s) */
 		if (XLogCtl->LogwrtRqst.Write < EndPos)
 			XLogCtl->LogwrtRqst.Write = EndPos;
 		/* update local result copy while I have the chance *

Re: [PATCH] pg_ctl should not truncate command lines at 1024 characters

2021-09-02 Thread Ranier Vilela
On Thu, Sep 2, 2021 at 18:36, Phil Krylov  wrote:

> Hello,
>
> Lacking a tool to edit postgresql.conf programmatically, people resort
> to passing cluster options on the command line. While passing all
> non-default options in this way may sound like an abuse of the feature,
> IMHO pg_ctl should not blindly truncate generated command lines at
> MAXPGPATH (1024 characters) and then run that, resulting in:
>
The MSVC docs say that the limit for the command line is 32,767 characters;
while that's OK for me, wouldn't it be better to check this limit?


> /bin/sh: Syntax error: end of file unexpected (expecting word)
> pg_ctl: could not start server
> Examine the log output.
>
> The attached patch tries to fix it in the least intrusive way.
>
> While we're at it, is it supposed that pg_ctl is a very short-lived
> process and is therefore allowed to leak memory? I've noticed some
> places where I would like to add a free() call.
>
+1 to add free.

regards,
Ranier Vilela


Re: Added schema level support for publication.

2021-09-02 Thread Peter Smith
On Thu, Sep 2, 2021 at 6:50 PM houzj.f...@fujitsu.com
 wrote:
>
> From Wed, Sep 1, 2021 2:36 PM Peter Smith  wrote:
> > Schema objects are not part of the publication. Currently only TABLES are
> > in publications, so I thought that \dRp+ output would just be the list of
> > "Tables" in the publication. Schemas would not even be displayed at all
> > (except in the table names).
>
> I think one use case of a schema-level publication is that it can
> automatically publish new tables created in the schema (same as an ALL
> TABLES publication). So, IMO, \dRp+ should output schema-level publications
> separately to make the user aware of them.

OK. That is a fair point.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Read-only vs read only vs readonly

2021-09-02 Thread Bossart, Nathan
On 9/2/21, 11:30 AM, "Magnus Hagander"  wrote:
> I had a customer point out to me that we're inconsistent in how we
> spell read-only. Turns out we're not as inconsistent as I initially
> thought :), but that they did manage to spot the one actual log
> message we have that writes it differently than everything else -- but
> that broke their grepping...
>
> Almost everywhere we use read-only. Attached patch changes the one log
> message where we didn't, as well as a few places in the docs for it. I
> did not bother with things like comments in the code.
> 
> Two questions:
>
> 1. Is it worth fixing? Or just silly nitpicking?

It seems entirely reasonable to me to consistently use "read-only" in
the log messages and documentation.

> 2. What about translations? This string exists in translations --
> should we just "fix" it there, without touching the translated string?
> Or try to fix both? Or leave it for the translators who will get a
> diff on it?

I don't have a strong opinion, but if I had to choose, I would say to
leave it to the translators to decide.

Nathan



Re: .ready and .done files considered harmful

2021-09-02 Thread Bossart, Nathan
On 9/2/21, 6:22 AM, "Dipesh Pandit"  wrote:
> I agree that multiple-files-per-readdir is cleaner and has the resilience
> of the current implementation. However, I have a few suggestions on the
> keep-trying-the-next-file approach patch shared in the previous thread.

Which approach do you think we should use?  I think we have decent
patches for both approaches at this point, so perhaps we should see if
we can get some additional feedback from the community on which one we
should pursue further.

> +   /* force directory scan the first time we call pgarch_readyXlog() */
> +   PgArchForceDirScan();
> +
>
> We should not force a directory scan in pgarch_ArchiverCopyLoop(). This gets
> called whenever the archiver wakes up from the wait state. This will result
> in a situation where the archiver performs a full directory scan despite
> having accurate information about the next anticipated log segment.
> Instead we can check if lastSegNo is initialized and continue the directory
> scan until it gets initialized in pgarch_readyXlog().

The problem I see with this is that pgarch_archiveXlog() might end up
failing.  If it does, we won't retry archiving the file until we do a
directory scan.  I think we could try to avoid forcing a directory
scan outside of these failure cases and archiver startup, but I'm not
sure it's really worth it.  When pgarch_readyXlog() returns false, it
most likely means that there are no .ready files present, so I'm not
sure we are gaining a whole lot by avoiding a directory scan in that
case.  I guess it might help a bit if there are a ton of .done files,
though.

> +   return lastSegNo;
> We should return "true" here.

Oops.  Good catch.

> I am wondering if we can add a log message for files which are
> archived as part of a directory scan. This will be useful for diagnostic
> purposes, to check whether the desired files get archived as part of a
> directory scan in special scenarios. I also think that we should add a
> few comments in pgarch_readyXlog().

I agree, but it should probably be something like DEBUG3 instead of
LOG.

> +   /*
> +* We must use <= because the archiver may have just completed a
> +* directory scan and found a later segment (but hasn't updated
> +* shared memory yet).
> +*/
> +   if (this_segno <= arch_segno)
> +   PgArchForceDirScan();
>
> I still think that we should use '<' operator here because
> arch_segno represents the segment number of the most recent
> .ready file found by the archiver. This gets updated in shared 
> memory only if archiver has successfully found a .ready file.
> In a normal scenario this_segno will be greater than arch_segno 
> whereas in cases where a .ready file is created out of order 
> this_segno may be less than arch_segno. I am wondering
> if there is a scenario where arch_segno is equal to this_segno
> unless I am missing something.

The pgarch_readyXlog() logic looks a bit like this:

1. Try to skip directory scan.  If that succeeds, we're done.
2. Do a directory scan.
3. If we found a regular WAL file, update PgArch and return
   what we found.

Let's say step 1 looks for WAL file 10, but 10.ready doesn't exist
yet.  The following directory scan ends up finding 11.ready.  Just
before we update the PgArch state, XLogArchiveNotify() is called and
creates 10.ready.  However, pgarch_readyXlog() has already decided to
return WAL segment 11 and update the state to look for 12 next.  If we
just used '<', we won't force a directory scan, and segment 10 will
not be archived until the next one happens.  If we use '<=', I don't
think we have the same problem.

I've also thought about another similar scenario.  Let's say step 1
looks for WAL file 10, but it doesn't exist yet (just like the
previous example).  The following directory scan ends up finding
12.ready, but just before we update PgArch, we create 11.ready.  In
this case, we'll still skip forcing a directory scan until 10.ready is
created later on.  I believe it all eventually works out as long as we
can safely assume that all files that should have .ready files will
eventually get them.
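
Put as a sketch, the check on the notify side therefore has to read as
follows (PgArchGetLastReadySegNo() is a hypothetical accessor for the shared
state; PgArchForceDirScan() is from the patch under discussion):

#include "access/xlogdefs.h"

static void
maybe_force_dir_scan(XLogSegNo this_segno)
{
	XLogSegNo	arch_segno = PgArchGetLastReadySegNo();

	/*
	 * "<=" rather than "<": the archiver may have just completed a
	 * directory scan and found a later segment without having updated
	 * shared memory yet.  With "<", a .ready file created for
	 * this_segno == arch_segno would not force a rescan and could be
	 * stranded until the next directory scan.
	 */
	if (this_segno <= arch_segno)
		PgArchForceDirScan();
}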

Nathan



[PATCH] pg_ctl should not truncate command lines at 1024 characters

2021-09-02 Thread Phil Krylov

Hello,

Lacking a tool to edit postgresql.conf programmatically, people resort 
to passing cluster options on the command line. While passing all 
non-default options in this way may sound like an abuse of the feature, 
IMHO pg_ctl should not blindly truncate generated command lines at 
MAXPGPATH (1024 characters) and then run that, resulting in:


/bin/sh: Syntax error: end of file unexpected (expecting word)
pg_ctl: could not start server
Examine the log output.

The attached patch tries to fix it in the least intrusive way.

While we're at it, is it supposed that pg_ctl is a very short-lived 
process and is therefore allowed to leak memory? I've noticed some 
places where I would like to add a free() call.


-- Ph.

From 6f2e70025208fa4d0034ddbe5254e2cb9759dd24 Mon Sep 17 00:00:00 2001
From: Phil Krylov 
Date: Thu, 2 Sep 2021 21:39:58 +0200
Subject: [PATCH] pg_ctl should not truncate command lines at 1024 characters

---
 src/bin/pg_ctl/pg_ctl.c| 47 ++
 src/bin/pg_ctl/t/001_start_stop.pl | 41 -
 2 files changed, 67 insertions(+), 21 deletions(-)

diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
index 7985da0a94..199d13c1fb 100644
--- a/src/bin/pg_ctl/pg_ctl.c
+++ b/src/bin/pg_ctl/pg_ctl.c
@@ -442,7 +442,7 @@ free_readfile(char **optlines)
 static pgpid_t
 start_postmaster(void)
 {
-	char		cmd[MAXPGPATH];
+	char	   *cmd;
 
 #ifndef WIN32
 	pgpid_t		pm_pid;
@@ -487,14 +487,15 @@ start_postmaster(void)
 	 * has the same PID as the current child process.
 	 */
 	if (log_file != NULL)
-		snprintf(cmd, MAXPGPATH, "exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
- exec_path, pgdata_opt, post_opts,
- DEVNULL, log_file);
+		cmd = psprintf("exec \"%s\" %s%s < \"%s\" >> \"%s\" 2>&1",
+	   exec_path, pgdata_opt, post_opts,
+	   DEVNULL, log_file);
 	else
-		snprintf(cmd, MAXPGPATH, "exec \"%s\" %s%s < \"%s\" 2>&1",
- exec_path, pgdata_opt, post_opts, DEVNULL);
+		cmd = psprintf("exec \"%s\" %s%s < \"%s\" 2>&1",
+	   exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	(void) execl("/bin/sh", "/bin/sh", "-c", cmd, (char *) NULL);
+	free(cmd);
 
 	/* exec failed */
 	write_stderr(_("%s: could not start server: %s\n"),
@@ -553,19 +554,21 @@ start_postmaster(void)
 		else
 			close(fd);
 
-		snprintf(cmd, MAXPGPATH, "\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
- comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
+		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" >> \"%s\" 2>&1\"",
+	   comspec, exec_path, pgdata_opt, post_opts, DEVNULL, log_file);
 	}
 	else
-		snprintf(cmd, MAXPGPATH, "\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
- comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
+		cmd = psprintf("\"%s\" /C \"\"%s\" %s%s < \"%s\" 2>&1\"",
+	   comspec, exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	if (!CreateRestrictedProcess(cmd, &pi, false))
 	{
 		write_stderr(_("%s: could not start server: error code %lu\n"),
 	 progname, (unsigned long) GetLastError());
+		free(cmd);
 		exit(1);
 	}
+	free(cmd);
 	/* Don't close command process handle here; caller must do so */
 	postmasterProcess = pi.hProcess;
 	CloseHandle(pi.hThread);
@@ -828,7 +831,7 @@ find_other_exec_or_die(const char *argv0, const char *target, const char *versio
 static void
 do_init(void)
 {
-	char		cmd[MAXPGPATH];
+	char	   *cmd;
 
 	if (exec_path == NULL)
 		exec_path = find_other_exec_or_die(argv0, "initdb", "initdb (PostgreSQL) " PG_VERSION "\n");
@@ -840,17 +843,19 @@ do_init(void)
 		post_opts = "";
 
 	if (!silent_mode)
-		snprintf(cmd, MAXPGPATH, "\"%s\" %s%s",
- exec_path, pgdata_opt, post_opts);
+		cmd = psprintf("\"%s\" %s%s",
+	   exec_path, pgdata_opt, post_opts);
 	else
-		snprintf(cmd, MAXPGPATH, "\"%s\" %s%s > \"%s\"",
- exec_path, pgdata_opt, post_opts, DEVNULL);
+		cmd = psprintf("\"%s\" %s%s > \"%s\"",
+	   exec_path, pgdata_opt, post_opts, DEVNULL);
 
 	if (system(cmd) != 0)
 	{
 		write_stderr(_("%s: database system initialization failed\n"), progname);
+		free(cmd);
 		exit(1);
 	}
+	free(cmd);
 }
 
 static void
@@ -2175,7 +2180,7 @@ set_starttype(char *starttypeopt)
 static void
 adjust_data_dir(void)
 {
-	char		cmd[MAXPGPATH],
+	char	   *cmd,
 filename[MAXPGPATH],
 			   *my_exec_path;
 	FILE	   *fd;
@@ -2207,18 +2212,20 @@ adjust_data_dir(void)
 		my_exec_path = pg_strdup(exec_path);
 
 	/* it's important for -C to be the first option, see main.c */
-	snprintf(cmd, MAXPGPATH, "\"%s\" -C data_directory %s%s",
-			 my_exec_path,
-			 pgdata_opt ? pgdata_opt : "",
-			 post_opts ? post_opts : "");
+	cmd = psprintf("\"%s\" -C data_directory %s%s",
+   my_exec_path,
+   pgdata_opt ? pgdata_opt : "",
+   post_opts ? post_opts : "");
 
 	fd = popen(cmd, "r");
 	if (fd == NULL || fgets(filename, sizeof(filename), fd) == NULL)
 	{
 		write_stderr(_("%s: could not determine the data directory using command \"%s\"\n"), progname, cmd);
+		free(cmd);
 		exit(1)

Re: a misbehavior of partition row movement (?)

2021-09-02 Thread Andrew Dunstan


On 7/13/21 8:09 AM, Amit Langote wrote:
>
>
>
> @Amit patch is not successfully applying, can you please rebase that. 
>
>
> Thanks for the reminder.
>
> Masahiko Sawada, it's been a bit long since you reviewed the
> patch, are you still interested to review that? 
>
>
> Unfortunately, I don’t think I’ll have time in this CF to solve some
> very fundamental issues I found in the patch during the last cycle. 
> I’m fine with either marking this as RwF for now or move to the next CF.


Amit, do you have time now to work on this?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-09-02 Thread Zhihong Yu
On Thu, Sep 2, 2021 at 12:11 PM Paul Martinez  wrote:

> On Wed, Sep 1, 2021 at 4:11 AM Daniel Gustafsson  wrote:
>
> > This patch no longer applies, can you please submit a rebased version?
> It
> > currently fails on catversion.h, to keep that from happening repeatedly
> you can
> > IMO skip that from the patch submission.
>
> Ah, understood. Will do that in the future. Attached are rebased patches,
> not
> including catversion.h changes, for both the ON UPDATE/DELETE case, and the
> ON DELETE only case.
>
> - Paul
>
Hi,
+   case RI_TRIGTYPE_DELETE:
+   queryno = is_set_null
+   ? RI_PLAN_ONDELETE_SETNULL_DOUPDATE
+   : RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE;

Should the new symbols be renamed?

RI_PLAN_ONDELETE_SETNULL_DOUPDATE -> RI_PLAN_ONDELETE_SETNULL_DODELETE
RI_PLAN_ONDELETE_SETDEFAULT_DOUPDATE -> RI_PLAN_ONDELETE_SETDEFAULT_DODELETE

Cheers


Re: Skipping logical replication transactions on subscriber side

2021-09-02 Thread Mark Dilger



> On Aug 30, 2021, at 12:06 AM, Masahiko Sawada  wrote:
> 
> I've attached rebased patches. 

Here are some review comments:

For the v12-0002 patch:

The documentation changes for ALTER SUBSCRIPTION .. RESET look strange to me.  
You grouped SET and RESET together, much like sql-altertable.html has them 
grouped, but I don't think it flows naturally here, as the two commands do not 
support the same set of parameters.  It might look better if you documented 
these separately.  It might also be good to order the parameters the same, so 
that the differences can more quickly be seen.

For the v12-0003 patch:

I believe this feature is needed, but it also seems like a very powerful 
foot-gun.  Can we do anything to make it less likely that users will hurt 
themselves with this tool?

I am thinking back to support calls I have attended.  When a production system 
is down, there is often some hesitancy to perform ad-hoc operations on the 
database, but once the decision has been made to do so, people try to get the 
whole process done as quickly as possible.  If multiple transactions on the 
publisher fail on the subscriber, they will do so in series, not in parallel.  
The process of clearing these errors will amount to copying the xid of each 
failed transaction to the ALTER SUBSCRIPTION ... SET (skip_xid = xxx) command 
and running it, then the next, then the next, and so on.  Perhaps the first couple 
times through the process, the customer will look to see that the failure is of 
the same type and on the same table, but after a short time they will likely 
just script something to clear the rest as quickly as possible.  In the heat of 
the moment, they may not include a check of the failure message, but merely a 
grep of the failing xid.

If the user could instead clear all failed transactions of the same type, that 
might make it less likely that they unthinkingly also skip subsequent errors of 
some different type.  Perhaps something like ALTER SUBSCRIPTION ... SET 
(skip_failures = 'duplicate key value violates unique constraint "test_pkey"')? 
 This is arguably a different feature request, and not something your patch is 
required to address, but I wonder how much we should limit people shooting 
themselves in the foot?  If we built something like this using your skip_xid 
feature, rather than instead of your skip_xid feature, would your feature need 
to be modified?

The docs could use some rewording, too:

+  If incoming data violates any constraints the logical replication
+  will stop until it is resolved. 

In my experience, logical replication doesn't stop, but instead goes into an 
infinite loop of retries.

+  The resolution can be done either
+  by changing data on the subscriber so that it doesn't conflict with
+  incoming change or by skipping the whole transaction.

I'm having trouble thinking of an example conflict where skipping a transaction 
would be better than writing a BEFORE INSERT trigger on the conflicting table 
which suppresses or redirects conflicting rows somewhere else.  Particularly 
for larger transactions containing multiple statements, suppressing the 
conflicting rows using a trigger would be less messy than skipping the 
transaction.  I think your patch adds a useful tool to the toolkit, but maybe 
we should mention more alternatives in the docs?  Something like, "changing the 
data on the subscriber so that it doesn't conflict with incoming changes, or 
dropping the conflicting constraint or unique index, or writing a trigger on 
the subscriber to suppress or redirect conflicting incoming changes, or as a 
last resort, by skipping the whole transaction"?

Perhaps I'm reading your phrase "changing the data on the subscriber" too 
narrowly.  To me, that means running DML (either a DELETE or an UPDATE) on the 
existing data in the table where the conflict arises.  These other options are 
DDL and do not easily come to mind when I read that phrase.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Skipping logical replication transactions on subscriber side

2021-09-02 Thread Mark Dilger


> On Aug 30, 2021, at 12:06 AM, Masahiko Sawada  wrote:
> 
> I've attached rebased patches.

Thanks for these patches, Sawada-san!

The first patch in your series, v12-0001, seems useful to me even before 
committing any of the rest.  I would like to integrate the new 
pg_stat_subscription_errors view it creates into regression tests for other 
logical replication features under development.

In particular, it can be hard to write TAP tests that need to wait for 
subscriptions to catch up or fail.  With your view committed, a new 
PostgresNode function to wait for catchup or for failure can be added, and then 
developers of different projects can all use that.  I am attaching a version of 
such a function, plus some tests of your patch (since it does not appear to 
have any).  Would you mind reviewing these and giving comments or including 
them in your next patch version?



0001-Adding-tests-of-subscription-errors.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [PATCH] Partial foreign key updates in referential integrity triggers

2021-09-02 Thread Paul Martinez
On Wed, Sep 1, 2021 at 4:11 AM Daniel Gustafsson  wrote:

> This patch no longer applies, can you please submit a rebased version?  It
> currently fails on catversion.h, to keep that from happening repeatedly you 
> can
> IMO skip that from the patch submission.

Ah, understood. Will do that in the future. Attached are rebased patches, not
including catversion.h changes, for both the ON UPDATE/DELETE case, and the
ON DELETE only case.

- Paul


referential-actions-set-cols-v3.patch
Description: Binary data


referential-actions-on-delete-only-set-cols-v2.patch
Description: Binary data


Re: automatically generating node support functions

2021-09-02 Thread Jacob Champion
On Tue, 2021-08-17 at 16:36 +0200, Peter Eisentraut wrote:
> Here is another set of preparatory patches that clean up various special 
> cases and similar in the node support.
> 
> 0001-Remove-T_Expr.patch
> 
> Removes unneeded T_Expr.
> 
> 0002-Add-COPY_ARRAY_FIELD-and-COMPARE_ARRAY_FIELD.patch
> 0003-Add-WRITE_INDEX_ARRAY.patch
> 
> These add macros to handle a few cases that were previously hand-coded.

These look sane to me.

> 0004-Make-node-output-prefix-match-node-structure-name.patch
> 
> Some nodes' output/read functions use a label that is slightly different 
> from their node name, e.g., "NOTIFY" instead of "NOTIFYSTMT".  This 
> cleans that up so that an automated approach doesn't have to deal with 
> these special cases.

Is there any concern about the added serialization length, or is that
trivial in practice? The one that particularly caught my eye is
RANGETBLENTRY, which was previously RTE. But I'm not very well-versed
in all the places these strings can be generated and stored.

> 0005-Add-Cardinality-typedef.patch
> 
> Adds a typedef Cardinality for double fields that store an estimated row 
> or other count.  Works alongside Cost and Selectivity.

Should RangeTblEntry.enrtuples also be a Cardinality?

--Jacob


Read-only vs read only vs readonly

2021-09-02 Thread Magnus Hagander
I had a customer point out to me that we're inconsistent in how we
spell read-only. Turns out we're not as inconsistent as I initially
thought :), but that they did manage to spot the one actual log
message we have that writes it differently than everything else -- but
that broke their grepping...

Almost everywhere we use read-only. Attached patch changes the one log
message where we didn't, as well as a few places in the docs for it. I
did not bother with things like comments in the code.

Two questions:

1. Is it worth fixing? Or just silly nitpicking?

2. What about translations? This string exists in translations --
should we just "fix" it there, without touching the translated string?
Or try to fix both? Or leave it for the translators who will get a
diff on it?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 2b2c70a26e..2f0def9b19 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -9602,7 +9602,7 @@ SCRAM-SHA-256$:&l
   
 
   
-   The pg_available_extensions view is read only.
+   The pg_available_extensions view is read-only.
   
  
 
@@ -9726,8 +9726,8 @@ SCRAM-SHA-256$:&l
   
 
   
-   The pg_available_extension_versions view is read
-   only.
+   The pg_available_extension_versions view is
+   read-only.
   
  
 
@@ -10042,7 +10042,7 @@ SCRAM-SHA-256$:&l
   
 
   
-   The pg_cursors view is read only.
+   The pg_cursors view is read-only.
   
 
  
@@ -11164,7 +11164,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx
   
 
   
-   The pg_prepared_statements view is read only.
+   The pg_prepared_statements view is read-only.
   
  
 
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index 22af7dbf51..c43f214020 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -1093,7 +1093,7 @@ primary_slot_name = 'node_a_slot'

 

-Read only transactions and transaction rollbacks need not wait for
+Read-only transactions and transaction rollbacks need not wait for
 replies from standby servers. Subtransaction commits do not wait for
 responses from standby servers, only top-level commits. Long
 running actions such as data loading or index building do not wait
@@ -1962,7 +1962,7 @@ LOG:  entering standby mode
 ... then some time later ...
 
 LOG:  consistent recovery state reached
-LOG:  database system is ready to accept read only connections
+LOG:  database system is ready to accept read-only connections
 
 
 Consistency information is recorded once per checkpoint on the primary.
@@ -2191,7 +2191,7 @@ HINT:  You can then restart the server after making the necessary configuration

 

-Currently, temporary table creation is not allowed during read only
+Currently, temporary table creation is not allowed during read-only
 transactions, so in some cases existing scripts will not run correctly.
 This restriction might be relaxed in a later release. This is
 both an SQL Standard compliance issue and a technical issue.
@@ -2290,7 +2290,7 @@ HINT:  You can then restart the server after making the necessary configuration
 
  Full knowledge of running transactions is required before snapshots
  can be taken. Transactions that use large numbers of subtransactions
- (currently greater than 64) will delay the start of read only
+ (currently greater than 64) will delay the start of read-only
  connections until the completion of the longest running write transaction.
  If this situation occurs, explanatory messages will be sent to the server log.
 
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index d358bbe4a6..cfdcb74221 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -526,7 +526,7 @@ ERROR:  could not serialize access due to concurrent update
 transaction sees a completely stable view of the database.  However,
 this view will not necessarily always be consistent with some serial
 (one at a time) execution of concurrent transactions of the same level.
-For example, even a read only transaction at this level may see a
+For example, even a read-only transaction at this level may see a
 control record updated to show that a batch has been completed but
 not see one of the detail records which is logically
 part of the batch because it read an earlier revision of the control
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9c2c98614a..63043ed8d1 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -5199,7 +5199,7 @@ sigusr1_handler(SIGNAL_ARGS)
 		PgStatPID = pgstat_start();
 
 		ereport(LOG,
-(errmsg("database system is ready to accept read only connections")));
+(errmsg("database system is re

Re: Is it safe to use the extended protocol with COPY?

2021-09-02 Thread Robert Haas
On Wed, Sep 1, 2021 at 2:43 PM Tom Lane  wrote:
> Daniele Varrazzo  writes:
> > Someone [1] has pointed out this conversation [2] which suggests that
> > COPY with extended protocol might break in the future.
>
> As was pointed out in that same thread, the odds of us actually
> breaking that case are nil.  I wouldn't recommend changing your
> code on this basis.

I agree that there doesn't seem to be any risk of a wire protocol
change in the near future, but it might still be a good idea to change
any code that does this on the grounds that the current wire protocol
makes reliable error handling impossible - unless you wait to send
Sync until you see how the server responds to the earlier messages.[1]

[1] 
https://www.postgresql.org/message-id/CA%2BTgmoa4eA%2BcPXaiGQmEBp9XisVd3ZE9dbvnbZEvx9UcMiw2tg%40mail.gmail.com

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] support tab-completion for single quote input with equal sign

2021-09-02 Thread Jacob Champion
On Fri, 2021-07-23 at 05:34 +, tanghy.f...@fujitsu.com wrote:
> On Thursday, July 22, 2021 1:05 PM, tanghy.f...@fujitsu.com 
>  wrote
> > I found a problem when using tab-completion as follows:
> > 
> > CREATE SUBSCRIPTION my_subscription 
> > CONNECTION 'host=localhost port=5432 dbname=postgres' [TAB]
> > 
> > The word 'PUBLICATION' couldn't be auto completed as expected.

Hello,

I applied your patch against HEAD (and did a clean build for good
measure) but couldn't get the tab-completion you described -- on my
machine, `PUBLICATION` still fails to complete. Tab completion is
working in general, for example with the `SUBSCRIPTION` and
`CONNECTION` keywords.

Is there additional setup that I need to do?

--Jacob


Re: Possible missing segments in archiving on standby

2021-09-02 Thread Fujii Masao



On 2021/09/02 10:16, Kyotaro Horiguchi wrote:

Ok, I agree that the reader-side needs an amendment.


Thanks for the review! Attached is the updated version of the patch.
Based on my latest patch, I changed the startup process so that
it creates an archive notification file of the streamed WAL segment
including an XLOG_SWITCH record if the notification file has not been created yet.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24165ab03e..6c407045dd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7576,6 +7576,34 @@ StartupXLOG(void)
 				/* Handle interrupt signals of startup process */
 				HandleStartupProcInterrupts();
 
+				/*
+				 * Create an archive notification file of a streamed WAL
+				 * segment if it includes an XLOG_SWITCH record and its
+				 * notification file has not been created yet. This is
+				 * necessary to handle the corner case that walreceiver may
+				 * fail to create such notification file if it exits after it
+				 * receives XLOG_SWITCH record but while it's receiving the
+				 * remaining bytes in the segment. Without this handling, WAL
+				 * archiving of the segment will be delayed until subsequent
+				 * checkpoint creates its notification file when removing it
+				 * even though it can be archived soon.
+				 */
+				if (readSource == XLOG_FROM_STREAM &&
+					record->xl_rmid == RM_XLOG_ID &&
+					(record->xl_info & ~XLR_INFO_MASK) == XLOG_SWITCH)
+				{
+					char		xlogfilename[MAXFNAMELEN];
+
+					XLogFileName(xlogfilename, curFileTLI, readSegNo, wal_segment_size);
+					if (!XLogArchiveIsReadyOrDone(xlogfilename))
+					{
+						if (XLogArchivingAlways())
+							XLogArchiveNotify(xlogfilename, true);
+						else
+							XLogArchiveForceDone(xlogfilename);
+					}
+				}
+
 				/*
 				 * Pause WAL replay, if requested by a hot-standby session via
 				 * SetRecoveryPause().
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 60de3be92c..5b07eef3aa 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -125,6 +125,7 @@ static void WalRcvDie(int code, Datum arg);
 static void XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len);
 static void XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr);
 static void XLogWalRcvFlush(bool dying);
+static void XLogWalRcvClose(XLogRecPtr recptr);
 static void XLogWalRcvSendReply(bool force, bool requestReply);
 static void XLogWalRcvSendHSFeedback(bool immed);
 static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime);
@@ -883,42 +884,11 @@ XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr)
 	{
 		int			segbytes;
 
-		if (recvFile < 0 || !XLByteInSeg(recptr, recvSegNo, wal_segment_size))
+		/* Close the current segment if it's completed */
+		XLogWalRcvClose(recptr);
+
+		if (recvFile < 0)
 		{
-			/*
-			 * fsync() and close current file before we switch to next one. We
-			 * would otherwise have to reopen this file to fsync it later
-			 */
-			if (recvFile >= 0)
-			{
-				char		xlogfname[MAXFNAMELEN];
-
-				XLogWalRcvFlush(false);
-
-				XLogFileName(xlogfname, recvFileTLI, recvSegNo, wal_segment_size);
-
-				/*
-				 * XLOG segment files will be re-read by recovery in startup
-				 * process soon, so we don't advise the OS to release cache
-				 * pages associated with the file like XLogFileClose() do

Re: Estimating HugePages Requirements?

2021-09-02 Thread Bossart, Nathan
On 9/2/21, 12:54 AM, "Michael Paquier"  wrote:
> Thanks.  Anyway, we don't really need huge_pages_required on Windows,
> do we?  The following docs of Windows tell what do to when using large
> pages:
> https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support
>
> The backend code does that as in PGSharedMemoryCreate(), now that I
> look at it.  And there is no way to change the minimum large page size
> there as far as I can see because that's decided by the processor, no?
> There is a case for shared_memory_size on Windows to be able to adjust
> the sizing of the memory of the host, though.

Yeah, huge_pages_required might not serve much purpose for Windows.
We could always set it to -1 for Windows if it seems like it'll do
more harm than good.

> At the end it would be nice to not finish with two GUCs.  Both depend
> on the reordering of the actions done by the postmaster, so I'd be
> curious to hear the thoughts of others on this particular point.

Of course.  It'd be great to hear others' thoughts on this stuff.

Nathan



Re: [Patch] ALTER SYSTEM READ ONLY

2021-09-02 Thread Robert Haas
On Tue, Aug 31, 2021 at 8:16 AM Amul Sul  wrote:
> Attached is the rebased version for the latest master head. Also,
> added tap tests to test some part of this feature and a separate patch
> to test recovery_end_command execution.

It looks like you haven't given any thought to writing that in a way
that will work on Windows?

> What is usual practice, can have a few tests in TAP and a few in
> pg_regress for the same feature?

Sure, there's no problem with that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: improve pg_receivewal code

2021-09-02 Thread Ronan Dunklau
Le lundi 30 août 2021, 09:32:05 CEST Bharath Rupireddy a écrit :
> Hi,
> 
> I see there's a scope to do following improvements to pg_receivewal.c:

Thank you Bharath for this patch.

> 
> 1) Fetch the server system identifier in the first RunIdentifySystem
> call and use it to identify (via pg_receivewal's ReceiveXlogStream) any
> unexpected changes that may happen in the server while pg_receivewal
> is connected to it. This can be helpful in scenarios when
> pg_receivewal tries to reconnect to the server (see the code around
> pg_usleep with RECONNECT_SLEEP_TIME) but something unexpected has
> happened in the server that changed its system identifier. Once
> pg_receivewal establishes the connection to the server again, then the
> ReceiveXlogStream has a code chunk to compare the system identifier
> that we received in the initial connection.

I'm not sure what kind of problem could be happening here: if you were
somehow routed to another server? Or if we "switched" the cluster listening
on that port?

> 2) Move the RunIdentifySystem to identify timeline id and start LSN
> from the server only if the pg_receivewal failed to get them from
> FindStreamingStart. This way, an extra IDENTIFY_SYSTEM command is
> avoided.

That makes sense, even if we add another IDENTIFY_SYSTEM to check against the 
one set in the first place.

> 3) Place the "replication connection shouldn't have any database name
> associated" error check right after RunIdentifySystem so that we can
> avoid fetching wal segment size with RetrieveWalSegSize if at all we
> were to fail with that error. This change is similar to what
> pg_recvlogical.c does.

Makes sense.

> 4) Move the RetrieveWalSegSize to just before pg_receivewal.c enters
> main loop to get the wal from the server. This avoids an unnecessary
> query for pg_receivewal with "--create-slot" or "--drop-slot".
> 5) Have an assertion after the pg_receivewal done a good amount of
> work to find start timeline and LSN might be helpful:
> Assert(stream.timeline != 0 && stream.startpos != InvalidXLogRecPtr);
> 
> Attaching a patch that does take care of above improvements. Thoughts?

Overall I think it is good.

However, you have some formatting issues, here it mixes tabs and spaces:

+   /*
+* No valid data can be found in the existing output 
directory.
+* Get start LSN position and current timeline ID from 
the server.
+*/

And here it is not formatted properly:

+static char   *server_sysid = NULL;



> 
> Regards,
> Bharath Rupireddy.


-- 
Ronan Dunklau






Re: using an end-of-recovery record in all cases

2021-09-02 Thread Robert Haas
On Mon, Aug 9, 2021 at 3:00 PM Robert Haas  wrote:
> I decided to try writing a patch to use an end-of-recovery record
> rather than a checkpoint record in all cases.
>
> The first problem I hit was that GetRunningTransactionData() does
> Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid)).
>
> Unfortunately we can't just relax the assertion, because the
> XLOG_RUNNING_XACTS record will eventually be handed to
> ProcArrayApplyRecoveryInfo() for processing ... and that function
> contains a matching assertion which would in turn fail. It in turn
> passes the value to MaintainLatestCompletedXidRecovery() which
> contains yet another matching assertion, so the restriction to normal
> XIDs here looks pretty deliberate. There are no comments, though, so
> the reader is left to guess why. I see one problem:
> MaintainLatestCompletedXidRecovery uses FullXidRelativeTo, which
> expects a normal XID. Perhaps it's best to just dodge the entire issue
> by skipping LogStandbySnapshot() if latestCompletedXid happens to be
> 2, but that feels like a hack, because AFAICS the real problem is that
> StartupXLog() doesn't agree with the rest of the code on whether 2 is
> a legal case, and maybe we ought to be storing a value that doesn't
> need to be computed via TransactionIdRetreat().

Anyone have any thoughts about this?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: replay of CREATE TABLESPACE eats data at wal_level=minimal

2021-09-02 Thread Robert Haas
On Wed, Aug 25, 2021 at 8:03 AM Robert Haas  wrote:
> On Wed, Aug 25, 2021 at 1:21 AM Noah Misch  wrote:
> > Sounds good.  I think the log message is the optimal place:
>
> Looks awesome.

Is there anything still standing in the way of committing this?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-09-02 Thread Robert Haas
On Thu, Sep 2, 2021 at 2:06 AM Dilip Kumar  wrote:
> 0003- The main patch for WAL logging the created database operation.

Andres pointed out that this approach ends up accessing relations
without taking a lock on them. It doesn't look like you did anything
about that.

+ /* Built-in oids are mapped directly */
+ if (classForm->oid < FirstGenbkiObjectId)
+ relfilenode = classForm->oid;
+ else if (OidIsValid(classForm->relfilenode))
+ relfilenode = classForm->relfilenode;
+ else
+ continue;

Am I missing something, or is this totally busted?

[rhaas pgsql]$ createdb
[rhaas pgsql]$ psql
psql (15devel)
Type "help" for help.

rhaas=# select oid::regclass from pg_class where relfilenode not in
(0, oid) and oid < 10000;
 oid
-
(0 rows)

rhaas=# vacuum full pg_attrdef;
VACUUM
rhaas=# select oid::regclass from pg_class where relfilenode not in
(0, oid) and oid < 10000;
  oid

 pg_attrdef_adrelid_adnum_index
 pg_attrdef_oid_index
 pg_toast.pg_toast_2604
 pg_toast.pg_toast_2604_index
 pg_attrdef
(5 rows)

  /*
+ * Now drop all buffers holding data of the target database; they should
+ * no longer be dirty so DropDatabaseBuffers is safe.

The way things worked before, this was true, but now AFAICS it's
false. I'm not sure whether that means that DropDatabaseBuffers() here
is actually unsafe or whether it just means that you haven't updated
the comment to explain the reason.

+ * Since we copy the file directly without looking at the shared buffers,
+ * we'd better first flush out any pages of the source relation that are
+ * in shared buffers.  We assume no new changes will be made while we are
+ * holding exclusive lock on the rel.

Ditto.

+ /* As always, WAL must hit the disk before the data update does. */

Actually, the way it's coded now, part of the on-disk changes are done
before WAL is issued, and part are done after. I doubt that's the
right idea. There's nothing special about writing the actual payload
bytes vs. the other on-disk changes (creating directories and files).
In any case the ordering deserves a better-considered comment than
this one.

+ XLogRegisterData((char *) PG_MAJORVERSION, nbytes);

Surely this is utterly pointless.

+ CopyDatabase(src_dboid, dboid, src_deftablespace, dst_deftablespace);
  PG_END_ENSURE_ERROR_CLEANUP(createdb_failure_callback,
  PointerGetDatum(&fparms));

I'd leave braces around the code for which we're ensuring error
cleanup even if it's just one line.

+ if (info == XLOG_DBASEDIR_CREATE)
  {
  xl_dbase_create_rec *xlrec = (xl_dbase_create_rec *) XLogRecGetData(record);

Seems odd to rename the record but not change the name of the struct.
I think I would be inclined to keep the existing record name, but if
we're going to change it we should be more thorough.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: shared-memory based stats collector

2021-09-02 Thread Jaime Casanova
On Mon, Jul 26, 2021 at 06:27:54PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2021-07-26 17:52:01 +0900, Kyotaro Horiguchi wrote:
> > > > Yeah, thank you very much for checking that. However, this patch is
> > > > now developed in Andres' GitHub repository.  So I'm at a loss what to
> > > > do for the failure..
> > > 
> > > I'll post a rebased version soon.
> > 
> > (Sorry if you feel being hurried, which I didn't meant to.)
> 
> No worries!
> 
> I had intended to post a rebase by now. But while I did mostly finish
> that (see [1]) I unfortunately encountered a new issue around
> partitioned tables, see [2]. Currently I'm hoping for a few thoughts on
> that thread about the best way to address the issues.
> 
> Greetings,
> 
> Andres Freund
> 
> [1] https://github.com/anarazel/postgres/tree/shmstat
> [2] 
> https://www.postgresql.org/message-id/20210722205458.f2bug3z6qzxzpx2s%40alap3.anarazel.de
> 
> 

Hi Andres,

Are you planning to post a rebase soon?

-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL




Re: mark the timestamptz variant of date_bin() as stable

2021-09-02 Thread John Naylor
On Wed, Sep 1, 2021 at 3:25 PM Tom Lane  wrote:
>
> John Naylor  writes:
> > On Wed, Sep 1, 2021 at 2:44 PM Tom Lane  wrote:
> >> I see that these two answers are both exactly multiples of 24 hours away
> >> from the given origin.  But if I'm binning on the basis of "days" or
> >> larger units, I would sort of expect to get local midnight, and I'm not
> >> getting that once I cross a DST boundary.
>
> > Hmm, that seems like a reasonable expectation. I can get local midnight
> > if I recast to timestamp:
>
> > # select date_bin('1 day', '2021-11-10 00:00 +00'::timestamptz::timestamp,
> > '2021-09-01 00:00 -04'::timestamptz::timestamp);
> >   date_bin
> > -
> >  2021-11-09 00:00:00
> > (1 row)
>
> Yeah, and then back to timestamptz if that's what you really need :-(
>
> > It's a bit unintuitive, though.
>
> Agreed.  If we keep it like this, adding some documentation around
> the point would be a good idea I think.

Having heard no votes on changing this behavior (and it would be a bit of
work), I'll start on a documentation patch. And I'll go ahead and re-mark
the function as immutable tomorrow barring objections.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: .ready and .done files considered harmful

2021-09-02 Thread Dipesh Pandit
Hi,

Thanks for the feedback.

> I attached two patches that demonstrate what I'm thinking this change
> should look like.  One is my take on the keep-trying-the-next-file
> approach, and the other is a new version of the multiple-files-per-
> readdir approach (with handling for "cheating" archive commands).  I
> personally feel that the multiple-files-per-readdir approach winds up
> being a bit cleaner and more resilient than the keep-trying-the-next-
> file approach.  However, the keep-trying-the-next-file approach will
> certainly be more efficient (especially for the extreme cases
> discussed in this thread), and I don't have any concrete concerns with
> this approach that seem impossible to handle.

I agree that multiple-files-per-readdir is cleaner and has the resilience of
the current implementation. However, I have a few suggestions on the
keep-trying-the-next-file approach patch shared in the previous thread.

+   /* force directory scan the first time we call pgarch_readyXlog() */
+   PgArchForceDirScan();
+

We should not force a directory scan in pgarch_ArchiverCopyLoop(). This gets
called whenever the archiver wakes up from the wait state. This will result
in a situation where the archiver performs a full directory scan despite
having accurate information about the next anticipated log segment.
Instead we can check if lastSegNo is initialized and continue the directory
scan until it gets initialized in pgarch_readyXlog().

+   return lastSegNo;
We should return "true" here.

I am thinking we could add a log message for files which are archived as
part of a directory scan. This will be useful for diagnostic purposes, to
check whether the desired files get archived as part of a directory scan in
special scenarios. I also think that we should add a few comments in
pgarch_readyXlog().

I have incorporated these changes and attached a patch
v1-0001-keep-trying-the-next-file-approach.patch.

+   /*
+* We must use <= because the archiver may have just completed a
+* directory scan and found a later segment (but hasn't updated
+* shared memory yet).
+*/
+   if (this_segno <= arch_segno)
+   PgArchForceDirScan();

I still think that we should use '<' operator here because
arch_segno represents the segment number of the most recent
.ready file found by the archiver. This gets updated in shared
memory only if archiver has successfully found a .ready file.
In a normal scenario this_segno will be greater than arch_segno
whereas in cases where a .ready file is created out of order
this_segno may be less than arch_segno. I am wondering
if there is a scenario where arch_segno is equal to this_segno
unless I am missing something.

Thanks,
Dipesh
From 4fb41105ba4ed9d02a2a551bb4f6cf693ec5c31e Mon Sep 17 00:00:00 2001
From: Dipesh Pandit 
Date: Thu, 2 Sep 2021 17:16:20 +0530
Subject: [PATCH] keep trying the next file approach

---
 src/backend/access/transam/xlog.c|  20 
 src/backend/access/transam/xlogarchive.c |  23 
 src/backend/postmaster/pgarch.c  | 195 ++-
 src/include/access/xlogdefs.h|   1 +
 src/include/postmaster/pgarch.h  |   4 +
 5 files changed, 216 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 24165ab..49caa61 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -9483,6 +9483,16 @@ CreateCheckPoint(int flags)
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr);
 
 	/*
+	 * Force the archiver to perform a directory scan.
+	 *
+	 * Ordinarily, this should not be needed, but it seems like a good idea
+	 * to make sure we scan the archive_status directory every once in a
+	 * while to make sure we haven't left anything behind.  Calling it here
+	 * ensures we do a directory scan at least once per checkpoint.
+	 */
+	PgArchForceDirScan();
+
+	/*
 	 * Make more log segments if needed.  (Do this after recycling old log
 	 * segments, since that may supply some of the needed files.)
 	 */
@@ -9848,6 +9858,16 @@ CreateRestartPoint(int flags)
 	RemoveOldXlogFiles(_logSegNo, RedoRecPtr, endptr);
 
 	/*
+	 * Force the archiver to perform a directory scan.
+	 *
+	 * Ordinarily, this should not be needed, but it seems like a good idea
+	 * to make sure we scan the archive_status directory every once in a
+	 * while to make sure we haven't left anything behind.  Calling it here
+	 * ensures we do a directory scan at least once per restartpoint.
+	 */
+	PgArchForceDirScan();
+
+	/*
 	 * Make more log segments if needed.  (Do this after recycling old log
 	 * segments, since that may supply some of the needed files.)
 	 */
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index b9c19b2..44630e7 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -492,6 +492,29 @@ XLogArchiveNotify(const char *xlog, bool nudge)
 	

Re: Fix pkg-config file for static linking

2021-09-02 Thread Filip Gospodinov

On 02.09.21 13:07, Peter Eisentraut wrote:

On 20.07.21 22:04, Filip Gospodinov wrote:
Anyway, this issue is orthogonal to my original patch. I'm proposing 
to hardcode -lpgcommon and -lpgport in Libs.private instead. Patch is 
attached.


Makes sense.  I think we could do it without hardcoding those library 
names, as in the attached patch.  But it comes out to the same result 
AFAICT.


That is my impression too. Enumerating them in a variable would just 
lead to an indirection. Please let me know if you'd still prefer a 
solution without hardcoding.





Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-02 Thread Thomas Munro
On Fri, Sep 3, 2021 at 12:44 AM Thomas Munro  wrote:
> NtFileCreate()

Erm, that's spelled NtCreateFile.  I see Michael mentioned this
before[1]; I don't think it's only available in kernel mode though,
the docs[2] say "This function is the user-mode equivalent to the
ZwCreateFile function", and other open source user space stuff is
using it.  It's explicitly internal and subject to change though,
hence my desire to avoid it.

[1] 
https://www.postgresql.org/message-id/flat/a9c76882-27c7-9c92-7843-21d5521b70a9%40postgrespro.ru
[2] 
https://docs.microsoft.com/en-us/windows/win32/api/winternl/nf-winternl-ntcreatefile




Re: corruption of WAL page header is never reported

2021-09-02 Thread Fujii Masao




On 2021/09/02 16:26, Kyotaro Horiguchi wrote:

I believe errmsg_buf is an interface for emitting error messages dedicated
to xlogreader, which doesn't have access to the elog facility, and
xlogreader doesn't (and shouldn't) expect non-xlogreader callback functions
to set the variable.  In that sense I don't think the originally proposed
patch is proper, for the reason that a non-xlogreader callback function may
set errmsg_buf.  This is what I meant by the word "modularity".

For that reason, in my second proposal I avoided calling
XLogReaderValidatePageHeader() at all while not in standby mode, because
calling the validator function in non-standby mode results in the
non-xlogreader function returning errmsg_buf.  Of course we could instead
always consume errmsg_buf in the function, but I don't like to shadow the
caller's task.


Understood. Thanks for clarifying this!


Does that make sense?


Yes, I'm fine with your latest patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Fix typo in comments

2021-09-02 Thread Fujii Masao




On 2021/09/02 20:54, houzj.f...@fujitsu.com wrote:

Hi,

When reviewing other patches, I noticed two typos:


Thanks! Both fixes look good to me.
Barring any objection, I will commit the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-02 Thread Thomas Munro
On Thu, Sep 2, 2021 at 11:12 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > I'm no expert, but not AFAICS.  We managed to delete the file while
> > some other backend had it open, which FILE_SHARE_DELETE allowed.  We
> > just can't open it or create a new file with the same name until it's
> > really gone (all handles closed).
>
> Right, but we shouldn't ever need to access such a file --- we couldn't do
> so on Unix, certainly.  So for the open() case, it's sufficient to return
> ENOENT, and the problem is only to make sure that that's what we return if
> the underlying error is ERROR_DELETE_PENDING.

Yeah.  The problem is that it still shows up in directory listings
AFAIK, so something like basebackup.c sees it, and even if it didn't,
it reads the directory, and then stats the files, and then opens the
files at different times.  The non-public API way to ask for the real
reason after such a failure would apparently be to call
NtFileCreate(), which can return STATUS_DELETE_PENDING.  I don't know
what complications that might involve, but I see now that we have code
that digs such non-public APIs out of ntdll.dll already (for long dead
OS versions only).  Hmm.

(Another thing you can't do is delete the directory that contains such
a file, which is a problem for DROP TABLESPACE and the reason I
developed the global barrier thing.)

> It's harder if the desire is to create a new file of the same name.
> I'm inclined to think that the best answer might be "if it hurts,
> don't do that".  We should not have such a case for ordinary relation
> files or WAL files, but maybe there are individual other cases where
> some redesign is indicated?

I guess GetNewRelFileNode()’s dilemma branch is an example; it'd
probably be nicer to users to treat a pending-deleted file as a
collision.

if (access(rpath, F_OK) == 0)
{
/* definite collision */
collides = true;
}
else
{
/*
 * Here we have a little bit of a dilemma: if errno is something
 * other than ENOENT, should we declare a collision and loop? In
 * practice it seems best to go ahead regardless of the errno.  If
 * there is a colliding file we will get an smgr failure when we
 * attempt to create the new relation file.
 */
collides = false;
}
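
For illustration, a hypothetical reworking of that branch (not from any
posted patch; is_delete_pending() is an assumed helper that would need one
of the NT-API tricks discussed in this thread):

#include <stdbool.h>
#include <unistd.h>

static bool
relfilenode_collides(const char *rpath)
{
    if (access(rpath, F_OK) == 0)
        return true;            /* definite collision */
#ifdef WIN32
    if (is_delete_pending(rpath))
        return true;            /* zombie file: loop and pick another OID */
#endif
    return false;               /* ENOENT etc.: assume the name is free */
}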




Re: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)

2021-09-02 Thread Fujii Masao




On 2021/09/02 18:27, kuroda.hay...@fujitsu.com wrote:

I added the following comments:

```diff
-   /* Use "postgres_fdw" as fallback_application_name. */
+   /*
+* Use pgfdw_application_name as application_name.
+*
+* Note that this check must be behind constructing generic 
options
+* because pgfdw_application_name has higher priority.
+*/
```


Thanks! What about updating the comments further as follows?

-
Use pgfdw_application_name as application_name if set.

PQconnectdbParams() processes the parameter arrays from start to end.
If any key word is repeated, the last value is used. Therefore note that
pgfdw_application_name must be added to the arrays after options of
ForeignServer are, so that it can override application_name set in
ForeignServer.
-
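
To make that concrete, a minimal standalone sketch of the last-value-wins
behavior (connection options are placeholders; illustration only):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* application_name is deliberately repeated; the later value wins */
    const char *keywords[] = {"dbname", "application_name",
                              "application_name", NULL};
    const char *values[]   = {"postgres", "from_foreign_server",
                              "from_pgfdw_guc", NULL};
    PGconn     *conn = PQconnectdbParams(keywords, values, 0);

    if (PQstatus(conn) == CONNECTION_OK)
        printf("server-side application_name is now from_pgfdw_guc\n");
    PQfinish(conn);
    return 0;
}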


Attached is the fixed version. 0002 will be rebased later.


Thanks for updating the patch!

+   }
+   /* Use "postgres_fdw" as fallback_application_name */

It's better to add new empty line between these two lines.

+-- Disconnect once because the value is used only when establishing connections
+DO $$
+   BEGIN
+   PERFORM postgres_fdw_disconnect_all();
+   END
+$$;

Why does a DO command need to be used here to execute
postgres_fdw_disconnect_all()? Instead, we can just execute
"SELECT 1 FROM postgres_fdw_disconnect_all();"?

For test coverage, it's better to test at least the following three cases?

(1) appname is set in neither GUC nor foreign server
   -> "postgres_fdw" set in fallback_application_name is used
(2) appname is set in foreign server, but not in GUC
   -> appname in foreign server is used
(3) appname is set both in GUC and foreign server
  -> appname in GUC is used

+SELECT FROM ft1 LIMIT 1;

"1" should be added just after SELECT in the above statement?
Because postgres_fdw regression test basically uses "SELECT 1 FROM ..."
in other places.

+   DefineCustomStringVariable("postgres_fdw.application_name",
+  "Sets the application name. This is used when connects to the remote server.",

What about simplifying this description as follows?

---
Sets the application name to be used on the remote server.
---

+   Configuration Parameters 
+  

The empty characters just after  and before  should be removed?


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: Skipping logical replication transactions on subscriber side

2021-09-02 Thread Greg Nancarrow
On Mon, Aug 30, 2021 at 5:07 PM Masahiko Sawada  wrote:
>
> I've attached rebased patches. 0004 patch is not the scope of this
> patch. It's borrowed from another thread[1] to fix the assertion
> failure for newly added tests. Please review them.
>

Some initial comments for the v12-0003 patch:

(1) Patch comment
"This commit introduces another way to skip the transaction in question."

I think it should further explain: "This commit introduces another way
to skip the transaction in question, other than manually updating the
subscriber's database or using pg_replication_origin_advance()."


doc/src/sgml/logical-replication.sgml
(2)

Suggested minor update:

BEFORE:
+   transaction that conflicts with the existing data.  When a conflict produce
+   an error, it is shown in
pg_stat_subscription_errors
+   view as follows:
AFTER:
+   transaction that conflicts with the existing data.  When a conflict produces
+   an error, it is recorded in the
pg_stat_subscription_errors
+   view as follows:

(3)
+   found from those outputs (transaction ID 740 in the above case).
The transaction

Shouldn't it be transaction ID 716?

(4)
+   can be skipped by setting skip_xid to
the subscription

Is it better to say here ... "on the subscription" ?

(5)
Just skipping a transaction could make a subscriber inconsistent, right?

Would it be better as follows?

BEFORE:
+   In either way, those should be used as a last resort. They skip the whole
+   transaction including changes that may not violate any constraint and easily
+   make subscriber inconsistent if a user specifies the wrong transaction ID or
+   the position of origin.

AFTER:
+   Either way, those transaction skipping methods should be used as a
last resort.
+   They skip the whole transaction, including changes that may not violate any
+   constraint.  They may easily make the subscriber inconsistent,
especially if a
+   user specifies the wrong transaction ID or the position of origin.

(6)
The grammar is not great in the following description, so here's a
suggested improvement:

BEFORE:
+  incoming change or by skipping the whole transaction.  This option
+  specifies transaction ID that logical replication worker skips to
+  apply.  The logical replication worker skips all data modification

AFTER:
+  incoming changes or by skipping the whole transaction.  This option
+  specifies the ID of the transaction whose application is to
be skipped
+  by the logical replication worker.  The logical replication worker
+  skips all data modification


src/backend/postmaster/pgstat.c
(7)
BEFORE:
+ * Tell the collector about clear the error of subscription.
AFTER:
+ * Tell the collector to clear the subscription error.


src/backend/replication/logical/worker.c
(8)
+ * subscription is invalidated and* MySubscription->skipxid gets
changed or reset.

There is a "*" after "and".

(9)
Do these lines really need to be moved up?

+ /* extract XID of the top-level transaction */
+ stream_xid = logicalrep_read_stream_start(s, &first_segment);
+

src/backend/postmaster/pgstat.c
(10)

+ bool m_clear; /* clear all fields except for last_failure and
+ * last_errmsg */

I think it should say: clear all fields except for last_failure,
last_errmsg and stat_reset_timestamp.


Regards,
Greg Nancarrow
Fujitsu Australia




Fix typo in comments

2021-09-02 Thread houzj.f...@fujitsu.com
Hi,

When reviewing other patches, I noticed two typos:

1.
src/backend/parser/gram.y
ALTER TABLE  ALTER [COLUMN]  RESET ( column_parameter = value [, 
... ] )

RESET cannot specify value.

2.
src/backend/utils/adt/xid8funcs.c
* Same as pg_current_xact_if_assigned() but doesn't assign a new xid if there

pg_current_xact_if_assigned() should be pg_current_xact_id()

Best regards,
Hou zhijie



0001-fix-typo-in-comments.patch
Description: 0001-fix-typo-in-comments.patch


Re: Postgres Win32 build broken?

2021-09-02 Thread Andrew Dunstan


On 9/1/21 8:01 PM, Ranier Vilela wrote:
> On Wed, Sep 1, 2021 at 19:49, Andrew Dunstan wrote:
>
>
> On 9/1/21 4:00 PM, Andrew Dunstan wrote:
> > On 8/31/21 9:52 PM, Michael Paquier wrote:
> >> On Tue, Aug 31, 2021 at 07:49:40PM -0300, Ranier Vilela wrote:
> >>> I'm not a perl specialist and it seems to me that the Win32 build is
> >>> broken.
> >>> The Win32 build is still important because of the 32-bit clients still
> >>> in use.
> >>> I'm investigating the problem.
> >> Being able to see the command you are using for build.pl, your
> >> buildenv.pl and/or config.pl, as well as your build dependencies
> >> should help to know what's wrong.
> >>
> >> MSVC builds are tested by various buildfarm members on a daily basis,
> >> and nothing is red.  I also have an x86 and x64 configuration with
> >> VS2015 that prove to work as of HEAD at de1d4fe, FWIW.  Now, by
> >> experience, one could say that N Windows PG developers finish with at
> >> least (N+1) different environments.  Basically Simon Riggs's theorem
> >> applied to Windows development..
> >
> >
> > I am seeing the same result as Ranier using VS2017 and VS 2019.
> >
> >
>
> But not with VS2013. If you need to build 32 bit client libraries,
> using
> an older VS release is probably your best bet.
>
> Thanks Andrew, but I finally got a workaround for the problem.
> set MSBFLAGS=/p:Platform="Win32"
>
> Now Postgres builds fine in 32 bits with the latest msvc (2019).
> Is it worth documenting this?
>
>


I think we should be able to detect this and do it automatically in
src/tools/msvc/build.pl, or possibly in the project files.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Added schema level support for publication.

2021-09-02 Thread Amit Kapila
On Thu, Sep 2, 2021 at 11:58 AM vignesh C  wrote:
>

Below are a few comments on v23. If you have already addressed anything
in v24, then ignore it.

1. The commit message says: A new system table "pg_publication_schema"
has been added, to maintain the schemas that the user wants to publish
through the publication.". The alternative name for this system table
could be "pg_publication_namespace". The reason why this alternative
comes to my mind is that the current system table to store schema
information is named pg_namespace. So shouldn't we be consistent here?
What do others think about this?

2. In function check_publication_add_schema(), the primary error
message should be "cannot add schema \"%s\" to publication". See
check_publication_add_relation() for similar error messages.

3.
+ObjectAddress
+publication_add_schema(Oid pubid, Oid schemaoid, bool if_not_exists)

Isn't it better to use 'schemaid' so that it is consistent with 'pubid'?

4.
ConvertPubObjSpecListToOidList()
{
..
+ schemaoid = linitial_oid(search_path);
+ nspname = get_namespace_name(schemaoid);
+ if (nspname == NULL) /* recently-deleted namespace? */
+ ereport(ERROR,
+ errcode(ERRCODE_UNDEFINED_SCHEMA),
+ errmsg("no schema has been selected"));
+
+ list_free(search_path);
..
}

You can free the memory for 'nspname' as that is not used afterward.

5.
+ schemaRels = GetSchemaPublicationRelations(schemaoid, PUBLICATION_PART_ALL);
+
+ /* Invalidate relcache so that publication info is rebuilt. */
+ InvalidatePublicationRels(schemaRels);

It is better to write this comment above
GetSchemaPublicationRelations, something like below:

+ /* Invalidate relcache so that publication info is rebuilt. */
+ schemaRels = GetSchemaPublicationRelations(schemaoid, PUBLICATION_PART_ALL);
+ InvalidatePublicationRels(schemaRels);

6.
+static List *
+GetPubPartitionOptionRelations(List *result, PublicationPartOpt pub_partopt,
+Oid relid)

I think it is better to name this function as
GetPublicationPartOptRelations as that way it will be more consistent
with existing functions and structure name PublicationPartOpt.

7. All the callers of PublicationAddSchemas() have a superuser check,
then why there is again a check of owner/superuser in that function?

8.
+/*
+ * Gets the list of FOR ALL TABLES IN SCHEMA publication oids associated with a
+ * specified schema oid
+ */
+List *
+GetSchemaPublications(Oid schemaid)

I find it a bit difficult to read this comment. Can we omit "FOR ALL
TABLES IN SCHEMA" from this comment?

9. In the doc patch
(v23-0003-Tests-and-documentation-for-schema-level-support), I see
below line:
   
-   To add a table to a publication, the invoking user must have ownership
-   rights on the table.  The FOR ALL TABLES clause requires
-   the invoking user to be a superuser.
+   To add a table/schema to a publication, the invoking user must have
+   ownership rights on the table/schema.  The FOR ALL TABLES
+   and FOR ALL TABLES IN SCHEMA clause requires the invoking
+   user to be a superuser.

Is it correct to specify the schema in the first line? AFAIU, all
forms of schema addition require superuser privilege.


-- 
With Regards,
Amit Kapila.




RE: Skipping logical replication transactions on subscriber side

2021-09-02 Thread houzj.f...@fujitsu.com
From Mon, Aug 30, 2021 3:07 PM Masahiko Sawada  wrote:
> I've attached rebased patches. 0004 patch is not the scope of this 
> patch. It's borrowed from another thread[1] to fix the assertion 
> failure for newly added tests. Please review them.

Hi,

I reviewed the 0002 patch and have a suggestion for it.

+   if (IsSet(opts.specified_opts, 
SUBOPT_SYNCHRONOUS_COMMIT))
+   {
+   
values[Anum_pg_subscription_subsynccommit - 1] =
+   CStringGetTextDatum("off");
+   
replaces[Anum_pg_subscription_subsynccommit - 1] = true;
+   }

Currently, the patch sets the default value outside of parse_subscription_options(),
but I think it might be more standard to set the value in
parse_subscription_options(). Like:

if (!is_reset)
{
...
+   }
+   else
+   opts->synchronous_commit = "off";

And then, we can set the value like:


values[Anum_pg_subscription_subsynccommit - 1] =

CStringGetTextDatum(opts.synchronous_commit);


Besides, instead of adding a switch case like the following:
+   case ALTER_SUBSCRIPTION_RESET_OPTIONS:
+   {

we can add a bool flag (isReset) in AlterSubscriptionStmt and check the flag
when invoking parse_subscription_options(). In this approach, the code can be
shorter.
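For illustration, the parse node might carry the flag like this (hypothetical sketch; the real AlterSubscriptionStmt has more fields, abbreviated here):

typedef struct AlterSubscriptionStmt
{
    NodeTag     type;
    char       *subname;        /* name of the subscription */
    List       *options;        /* list of DefElem nodes */
    bool        isReset;        /* true for ALTER SUBSCRIPTION ... RESET (...) */
} AlterSubscriptionStmt;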

Attached is a diff file based on v12-0002 which changes the code as suggested
above.

Best regards,
Hou zj


0001-diff-for-0002_patch
Description: 0001-diff-for-0002_patch


Re: [BUG]Update Toast data failure in logical replication

2021-09-02 Thread Ajin Cherian
On Wed, Aug 11, 2021 at 10:45 PM Dilip Kumar  wrote:
>
> Yeah we can avoid that by detecting any toasted replica identity key
> in HeapDetermineModifiedColumns, check the attached patch.
>

The patch applies cleanly, all tests pass, I tried out a few toast
combination tests and they seem to be working fine.
No review comments, the patch looks good to me.

regards,
Ajin Cherian
Fujitsu Australia




Re: pg_receivewal: remove extra conn = NULL; in StreamLog

2021-09-02 Thread Daniel Gustafsson
> On 1 Sep 2021, at 10:58, Bharath Rupireddy 
>  wrote:
> 
> On Sun, Aug 29, 2021 at 1:27 AM Daniel Gustafsson  wrote:
>> 
>>> On 28 Aug 2021, at 14:10, Bharath Rupireddy 
>>>  wrote:
>> 
>>> It seems there's a redundant assignment statement conn = NULL in
>>> pg_receivewal's StreamLog function. Attaching a tiny patch herewith.
>>> Thoughts?
>> 
>> Agreed, while harmless this is superfluous since conn is already set to NULL
>> after the PQfinish call a few lines up (which was added in a4205fa00d526c3).
>> Unless there are objections I’ll apply this tomorrow or Monday.
> 
> Thanks for picking this up. I added this to CF to not lose it in the
> wild - https://commitfest.postgresql.org/34/3317/

Pushed to master, and entry closed. Thanks.

--
Daniel Gustafsson   https://vmware.com/





Re: Online verification of checksums

2021-09-02 Thread Daniel Gustafsson
> On 9 Jul 2021, at 22:00, Ibrar Ahmed  wrote:

> I am changing the status to "Waiting on Author" based on the latest comments 
> of @David Steele 
> and secondly the patch does not apply cleanly.

This patch hasn't moved since being marked as WoA in the last CF and still
doesn't apply. Unless there is a new version brewing, it seems apt to close
this as RwF and await a new entry in a future CF.

--
Daniel Gustafsson   https://vmware.com/





Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-02 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Sep 2, 2021 at 10:31 PM Tom Lane  wrote:
>> That seems quite horrid :-(.  But if it works, doesn't that mean that
>> somewhere we are opening a problematic file without the correct
>> sharing flags?

> I'm no expert, but not AFAICS.  We managed to delete the file while
> some other backend had it open, which FILE_SHARE_DELETE allowed.  We
> just can't open it or create a new file with the same name until it's
> really gone (all handles closed).

Right, but we shouldn't ever need to access such a file --- we couldn't do
so on Unix, certainly.  So for the open() case, it's sufficient to return
ENOENT, and the problem is only to make sure that that's what we return if
the underlying error is ERROR_DELETE_PENDING.
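For illustration, that mapping could look like the following minimal sketch (not the actual src/port/open.c code; whether the Win32 layer surfaces ERROR_DELETE_PENDING at all, rather than folding it into ERROR_ACCESS_DENIED, is the open question in this thread):

#include <windows.h>
#include <errno.h>

/* Translate the interesting GetLastError() codes into errno values. */
static void
map_win32_open_error(void)
{
    switch (GetLastError())
    {
        case ERROR_FILE_NOT_FOUND:
        case ERROR_PATH_NOT_FOUND:
        case ERROR_DELETE_PENDING: /* file is as good as gone: say ENOENT */
            errno = ENOENT;
            break;
        case ERROR_ACCESS_DENIED:
            errno = EACCES;
            break;
        default:
            errno = EINVAL;     /* simplification; real code maps more codes */
            break;
    }
}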

It's harder if the desire is to create a new file of the same name.
I'm inclined to think that the best answer might be "if it hurts,
don't do that".  We should not have such a case for ordinary relation
files or WAL files, but maybe there are individual other cases where
some redesign is indicated?

regards, tom lane




Re: [PATCH] test/ssl: rework the sslfiles Makefile target

2021-09-02 Thread Andrew Dunstan


On 9/1/21 8:09 PM, Jacob Champion wrote:
> On Fri, 2021-08-27 at 15:02 +0900, Michael Paquier wrote:
>> On Fri, Aug 13, 2021 at 12:08:16AM +, Jacob Champion wrote:
>>> If _that's_ the case, then our build system is holding all of us
>>> hostage. Which is frustrating to me. Should I shift focus to help with
>>> that, first?
>> Fresh ideas in this area are welcome, yes.
> Since the sslfiles target is its own little island in the dependency
> graph (it doesn't need anything from Makefile.global), would it be
> acceptable to just move it to a standalone sslfiles.mk that the
> Makefile defers to? E.g.
>
> sslfiles:
> $(MAKE) -f sslfiles.mk
> Then we wouldn't have to touch Makefile.global at all, because
> sslfiles.mk wouldn't need to include it. This also reduces .NOTPARALLEL
> pollution as a bonus.
>

I had the same thought yesterday, so I like the idea :-)


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Fix pkg-config file for static linking

2021-09-02 Thread Peter Eisentraut

On 20.07.21 22:04, Filip Gospodinov wrote:
> Anyway, this issue is orthogonal to my original patch. I'm proposing to
> hardcode -lpgcommon and -lpgport in Libs.private instead. Patch is
> attached.


Makes sense.  I think we could do it without hardcoding those library 
names, as in the attached patch.  But it comes out to the same result 
AFAICT.
diff --git a/src/Makefile.shlib b/src/Makefile.shlib
index 29a7f6d38c..4b2a62fa14 100644
--- a/src/Makefile.shlib
+++ b/src/Makefile.shlib
@@ -400,7 +400,7 @@ endif # PORTNAME == cygwin || PORTNAME == win32
 # those that point inside the build or source tree.  Use sort to
 # remove duplicates.  Also record the -l flags necessary for static
 # linking, but not those already covered by Requires.private.
-   echo 'Libs.private: $(sort $(filter-out -L.% -L$(top_srcdir)/%,$(filter -L%,$(LDFLAGS) $(SHLIB_LINK)))) $(filter-out $(PKG_CONFIG_REQUIRES_PRIVATE:lib%=-l%),$(filter -l%,$(SHLIB_LINK)))' >>$@
+   echo 'Libs.private: $(sort $(filter-out -L.% -L$(top_srcdir)/%,$(filter -L%,$(LDFLAGS) $(SHLIB_LINK)))) $(filter-out $(PKG_CONFIG_REQUIRES_PRIVATE:lib%=-l%),$(filter -l%,$(SHLIB_LINK_INTERNAL) $(SHLIB_LINK)))' >>$@
 
 
 ##


Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-02 Thread Thomas Munro
On Thu, Sep 2, 2021 at 10:31 PM Tom Lane  wrote:
> Thomas Munro  writes:
> > A disruptive solution that works in my tests: we could reuse the
> > global barrier proposed in CF #2962.  If you see EACCES, ask every
> > backend to close all vfds at their next CFI() and wait for them all to
> > finish, and then retry.  If you get EACCES again it really means
> > EACCES, but you'll very probably get ENOENT.
>
> That seems quite horrid :-(.  But if it works, doesn't that mean that
> somewhere we are opening a problematic file without the correct
> sharing flags?

I'm no expert, but not AFAICS.  We managed to delete the file while
some other backend had it open, which FILE_SHARE_DELETE allowed.  We
just can't open it or create a new file with the same name until it's
really gone (all handles closed).
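For illustration, the whole sequence fits in a few Win32 calls (self-contained sketch; the file name is made up):

#include <windows.h>
#include <stdio.h>

int
main(void)
{
    /* "Backend A" holds the file open with full sharing, as PostgreSQL does. */
    HANDLE h1 = CreateFileA("victim.tmp", GENERIC_READ,
                            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                            NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

    /*
     * "Backend B" unlinks it: this succeeds thanks to FILE_SHARE_DELETE, but
     * the file only becomes delete-pending; the name is not released yet.
     */
    DeleteFileA("victim.tmp");

    /*
     * Reopening (or creating a new file with the same name) now fails with
     * ERROR_ACCESS_DENIED (5) until h1 is closed -- the EACCES seen above.
     */
    HANDLE h2 = CreateFileA("victim.tmp", GENERIC_READ,
                            FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                            NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

    if (h2 == INVALID_HANDLE_VALUE)
        printf("reopen failed: error %lu\n", GetLastError());

    CloseHandle(h1);            /* only now is the name finally released */
    return 0;
}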




Re: public schema default ACL

2021-09-02 Thread Peter Eisentraut

On 30.06.21 03:37, Noah Misch wrote:
> On Sat, Mar 27, 2021 at 11:41:07AM +0100, Laurenz Albe wrote:
>> On Sat, 2021-03-27 at 00:50 -0700, Noah Misch wrote:
>>> On Sat, Feb 13, 2021 at 04:56:29AM -0800, Noah Misch wrote:
>>>> I'm attaching the patch for $SUBJECT, which applies atop the four patches
>>>> from the two other threads below.  For convenience of testing, I've
>>>> included a rollup patch, equivalent to applying all five patches.
>>>
>>> I committed prerequisites from one thread, so here's a rebased rollup patch.
>>
>> I am happy to see this problem tackled!
>
> Rebased.  I've pushed all prerequisites, so there's no longer a distinct
> rollup patch.


I think this patch represents the consensus.

The documentation looks okay.  Some places still refer to PostgreSQL 13, 
which should now be changed to 14.


I tried a couple of upgrade scenarios and it appeared to do the right thing.

This patch is actually two separate changes: First, change the owner of 
the public schema to "pg_database_owner"; second, change the default 
privileges set on the public schema by initdb.  I was a bit surprised 
that the former hadn't already been done in PG14.  In any case, if there 
is still any doubt about the latter part, the former can surely go ahead 
separately if needed.






Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-02 Thread Tom Lane
Thomas Munro  writes:
> A disruptive solution that works in my tests: we could reuse the
> global barrier proposed in CF #2962.  If you see EACCES, ask every
> backend to close all vfds at their next CFI() and wait for them all to
> finish, and then retry.  If you get EACCES again it really means
> EACCES, but you'll very probably get ENOENT.

That seems quite horrid :-(.  But if it works, doesn't that mean that
somewhere we are opening a problematic file without the correct
sharing flags?

regards, tom lane




Re: stat() vs ERROR_DELETE_PENDING, round N + 1

2021-09-02 Thread Thomas Munro
On Thu, Sep 2, 2021 at 10:10 AM Thomas Munro  wrote:
> Perhaps we need some combination of the old way (that apparently knew
> how to detect pending deletes) and the new way (that knows about large
> files)?

I tried that, but as far as I can tell, the old approach didn't really
work either :-(

A disruptive solution that works in my tests: we could reuse the
global barrier proposed in CF #2962.  If you see EACCES, ask every
backend to close all vfds at their next CFI() and wait for them all to
finish, and then retry.  If you get EACCES again it really means
EACCES, but you'll very probably get ENOENT.
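A sketch of that retry, assuming the barrier kind from CF #2962 (EmitProcSignalBarrier() and WaitForProcSignalBarrier() are the existing ProcSignalBarrier APIs; PROCSIGNAL_BARRIER_SMGRRELEASE and the wrapper itself are hypothetical):

#include "postgres.h"
#include "storage/procsignal.h"

#include <fcntl.h>

static int
open_retry_on_eacces(const char *path, int flags, mode_t mode)
{
    int         fd = open(path, flags, mode);

    if (fd < 0 && errno == EACCES)
    {
        /* Ask every backend to close its vfds at the next CFI()... */
        uint64      gen = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE);

        /* ...wait for all of them to finish... */
        WaitForProcSignalBarrier(gen);

        /* ...and retry: a second EACCES now really means EACCES. */
        fd = open(path, flags, mode);
    }
    return fd;
}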

The cheapest solution would be to assume EACCES really means ENOENT,
but that seems unacceptably incorrect.

I suspect it might be possible to use underdocumented/unstable NtXXX()
interfaces to get at the information, but I don't know much about
that.

Is there another way that is cheap, correct and documented?




RE: Allow escape in application_name (was: [postgres_fdw] add local pid to fallback_application_name)

2021-09-02 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for reviewing!

> This GUC parameter should be set after the options of foreign server
> are set so that its setting can override the server-level ones.
> Isn't it better to comment this?

I added the following comments:

```diff
-   /* Use "postgres_fdw" as fallback_application_name. */
+   /*
+    * Use pgfdw_application_name as application_name.
+    *
+    * Note that this check must be behind constructing generic options
+    * because pgfdw_application_name has higher priority.
+    */
```

> Do we really need this check hook? Even without that, any non-ASCII characters
> in application_name would be replaced with "?" in the foreign PostgreSQL 
> server
> when it's passed to that.
> 
> On the other hand, if it's really necessary, application_name set in foreign
> server object also needs to be processed in the same way.

I added the check_hook because I wanted to make sure that the input for the
parse function contains only ASCII characters. But at this phase we don't
have such a function, so I removed it.
(Actually I'm not sure it is really needed; I will check before the next phase.)
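For reference, such a hook might look like the following sketch (the function name is hypothetical; the check-hook signature and GUC_check_errdetail() are the real GUC interfaces):

#include "postgres.h"
#include "utils/guc.h"

static bool
check_pgfdw_application_name(char **newval, void **extra, GucSource source)
{
    const char *p;

    if (*newval == NULL)
        return true;

    for (p = *newval; *p; p++)
    {
        /* reject any byte outside printable ASCII */
        if (*p < 32 || *p > 126)
        {
            GUC_check_errdetail("application name must contain only printable ASCII characters.");
            return false;
        }
    }
    return true;
}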

> Why is GUC_IS_NAME flag necessary?

I thought about it again and noticed that even if an extremely long string is
specified, it will be truncated on the server side. This check is duplicated,
so I removed it. Maybe such a check is the user's responsibility.

> postgres_fdw.application_name overrides application_name set in foreign server
> object.
> Shouldn't this information be documented?

I added descriptions to postgres_fdw.sgml.

> Isn't it better to have the regression test for this feature?

+1, added. The test SETs the parameter and checks pg_stat_activity.

Attached is the fixed version. 0002 will be rebased later.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


v06_0001_add_application_name_GUC.patch
Description: v06_0001_add_application_name_GUC.patch


Re: Column Filtering in Logical Replication

2021-09-02 Thread Alvaro Herrera
I think the WITH RECURSIVE query would be simpler and more performant if it
used pg_partition_tree and pg_partition_root.


-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"Porque Kim no hacía nada, pero, eso sí,
con extraordinario éxito" ("Kim", Kipling)




RE: Added schema level support for publication.

2021-09-02 Thread houzj.f...@fujitsu.com
From Wed, Sep 1, 2021 2:36 PM Peter Smith  wrote:
> Schema objects are not part of the publication. Currently only TABLES are in
> publications, so I thought that \dRp+ output would just be the list of "Tables"
> in the publication. Schemas would not even be displayed at all (except in the
> table name).

I think one use case of a schema level publication is that it can automatically
publish new tables created in the schema (same as an ALL TABLES publication). So,
IMO, \dRp+ should output schema level publications separately to make the user
aware of them.

Best regards,
Hou zj


Re: Column Filtering in Logical Replication

2021-09-02 Thread Alvaro Herrera
On 2021-Sep-02, Rahila Syed wrote:

> After thinking about this, I think it is best to remove the entire table
> from the publication if a column specified in the column filter is dropped
> from the table.

Hmm, I think it would be cleanest to give responsibility to the user: if
the column to be dropped is in the filter, then raise an error, aborting
the drop.  Then it is up to them to figure out what to do.




-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"El destino baraja y nosotros jugamos" (A. Schopenhauer)




RE: Skipping logical replication transactions on subscriber side

2021-09-02 Thread houzj.f...@fujitsu.com
From Mon, Aug 30, 2021 3:07 PM Masahiko Sawada  wrote:
> I've attached rebased patches. 0004 patch is not the scope of this patch. It's
> borrowed from another thread[1] to fix the assertion failure for newly added
> tests. Please review them.

Hi,

I reviewed the v12-0001 patch, here are some comments:

1)
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1441,7 +1441,6 @@ getinternalerrposition(void)
return edata->internalpos;
 }
 
-

This seems to be an unintended change in elog.c.

2)

+   TupleDescInitEntry(tupdesc, (AttrNumber) 10, "stats_reset",
+  TIMESTAMPTZOID, -1, 0);

The document doesn't mention the column "stats_reset".

3)

+typedef struct PgStat_StatSubErrEntry
+{
+   Oid         subrelid;       /* InvalidOid if the apply worker, otherwise
+                                * the table sync worker. hash table key. */

From the comments on subrelid, I think one subscription has only one apply
worker error entry, right? If so, can we move the apply error entry to
PgStat_StatSubEntry? With that approach, we don't need to build an inner
hash table when there are no table sync error entries. A sketch of that
layout follows.
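For illustration (hypothetical layout; field names follow the quoted patch):

typedef struct PgStat_StatSubEntry
{
    Oid         subid;          /* subscription OID, hash table key */

    /* the single apply worker error (the subrelid == InvalidOid case) */
    PgStat_StatSubErrEntry apply_error;

    /* created lazily, keyed by subrelid, only for table sync workers */
    HTAB       *sync_errors;
} PgStat_StatSubEntry;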

4)
Is it possible to add some test cases for the deletion of subscription error
entries?


Best regards,
Hou zj



Re: pg_receivewal starting position

2021-09-02 Thread Michael Paquier
On Thu, Sep 02, 2021 at 10:08:26AM +0200, Ronan Dunklau wrote:
> I could see the use for sending active_pid for use within pg_basebackup: at 
> least we could fail early if the slot is already in use. But at the same
> time, maybe it won't be in use anymore once we need it.

There is no real concurrency protection with this design.  You could
read that the slot is not active during READ_REPLICATION_SLOT just to
find out later, while pg_basebackup is streaming WAL, that it became
in use in-between.  And the backend-side protections would kick in at
this stage.

Hmm.  The logic doing the decision-making with pg_receivewal may
become more tricky when it comes to pg_replication_slots.wal_status,
max_slot_wal_keep_size and pg_replication_slots.safe_wal_size.  The
number of cases we'd like to consider directly impacts the amount of
data sent through READ_REPLICATION_SLOT.  That's not really different
from deciding on a failure, a success or a retry with active_pid at an
earlier or a later stage of a base backup.  pg_receivewal, on the
contrary, can just rely on what the backend tells it when it begins
streaming.  So I'd prefer keeping things simple and limiting the number
of fields to a minimum for this command.
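For illustration, a minimal client-side fetch along those lines could look like this sketch (READ_REPLICATION_SLOT is the command proposed in this thread; the result layout, with restart_lsn in the second column, is an assumption rather than the final patch):

#include "postgres_fe.h"
#include "access/xlogdefs.h"
#include "libpq-fe.h"
#include "pqexpbuffer.h"

static bool
GetSlotInformation(PGconn *conn, const char *slot_name,
                   XLogRecPtr *restart_lsn)
{
    PQExpBuffer query = createPQExpBuffer();
    PGresult   *res;
    uint32      hi,
                lo;
    bool        ok = false;

    appendPQExpBuffer(query, "READ_REPLICATION_SLOT %s", slot_name);
    res = PQexec(conn, query->data);

    /* one row expected; a NULL restart_lsn means "nothing to resume from" */
    if (PQresultStatus(res) == PGRES_TUPLES_OK &&
        PQntuples(res) == 1 &&
        !PQgetisnull(res, 0, 1) &&
        sscanf(PQgetvalue(res, 0, 1), "%X/%X", &hi, &lo) == 2)
    {
        *restart_lsn = ((uint64) hi << 32) | lo;
        ok = true;
    }

    PQclear(res);
    destroyPQExpBuffer(query);
    return ok;
}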
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-09-02 Thread Ronan Dunklau
Le jeudi 2 septembre 2021, 09:28:29 CEST Michael Paquier a écrit :
> On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
> >  wrote in If I read the patch
> > correctly the situation above is warned by the following message then
> > continue to the next step giving up to identify start position from slot
> > data.
> 
> Better to fall back to the past behavior if attempting to use a
> pg_receivewal >= 15 with a PG instance older than 14.
> 
> >> "server does not support fetching the slot's position, resuming from the
> >> current server position instead"
> > (The message looks a bit too long, though..)
> 
> Agreed.  Falling back to a warning is not the best answer we can have
> here, as there could be various failure types, and for some of them a
> hard failure is warranted:
> - Failure in the backend while running READ_REPLICATION_SLOT.  This
> should imply a hard failure, no?
> - Slot that does not exist.  In this case, we could fall back to the
> current write position of the server by default if the slot information
> cannot be retrieved.
> Something that's disturbing me in patch 0002 is that we would ignore
> the results of GetSlotInformation() if any error happens, even if
> there is a problem in the backend, like an OOM.  We should be careful
> about the semantics here.

Ok!

> 
> > However, if the operator doesn't know the server is old, pg_receivewal
> > starts streaming from unexpected position, which is a kind of
> > disaster. So I'm inclined to agree with Bharath, but rather I imagine
> > an option to explicitly specify how to determine the start position.
> > 
> > --start-source=[server,wal,slot]  specify starting-LSN source, default is
> > 
> >  trying all of them in the order of wal, slot, server.
> > 
> > I don't think the option needs to accept multiple values at once.
> 
> What is the difference between "wal" and "server"?  "wal" stands for
> the start position of the set of files stored in the location
> directory, and "server" is the location that we'd receive from the
> server?  I don't think that we need that because, when using a slot,
> we know that we can rely on the LSN that the slot retains for
> pg_receivewal as that should be the same point as what has been
> streamed last.  Could there be an argument instead for changing the
> default and relying on the slot information rather than scanning the
> local WAL archive path for the start point when using --slot?  When
> using pg_receivewal as a service, relying on a scan of the WAL archive
> directory if there is no slot and fallback to an invalid LSN if there
> is nothing is fine by me, but I think that just relying on the slot
> information is saner as the backend makes sure that nothing is
> missing.  That's also more useful when streaming changes from a single
> slot from multiple locations (stream to location 1 with a slot, stop
> pg_receivewal, stream to location 2 that completes 1 with the same
> slot).

One benefit I see in first trying to get it from the local WAL files is that
we may end up in a state where WAL has been flushed to disk but we couldn't
advance the replication slot. In that case it is better to resume from the
point on disk. Maybe taking the max(slot_lsn, local_file_lsn) would work best
for the use case you're describing; a sketch follows.
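For illustration (Max() and XLogRecPtrIsInvalid() are the usual PostgreSQL macros; the function itself is hypothetical):

#include "postgres_fe.h"
#include "access/xlogdefs.h"

static XLogRecPtr
choose_start_lsn(XLogRecPtr slot_lsn, XLogRecPtr local_file_lsn)
{
    if (XLogRecPtrIsInvalid(slot_lsn))
        return local_file_lsn;  /* no slot information: trust local files */
    if (XLogRecPtrIsInvalid(local_file_lsn))
        return slot_lsn;        /* empty directory: trust the slot */

    /* flushed-but-unacknowledged segments win over the slot position */
    return Max(slot_lsn, local_file_lsn);
}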

> 
> +pg_log_error("Slot \"%s\" is not a physical replication slot",
> + replication_slot);
> In 0003, the format of this error is not really project-like.
> Something like this would perhaps be more suitable:
> "cannot use the slot provided, physical slot expected."
> 
> I am not really convinced about the need to get the active state
> and the PID used in the backend when fetching the slot data,
> particularly if that's just for some frontend-side checks.  The
> backend has safeguards already for all that.

I could see the use for sending active_pid for use within pg_basebackup: at 
least we could fail early if the slot is already in use. But at the same time, 
maybe it won't be in use anymore once we need it.

> 
> While looking at that, I have applied de1d4fe to add
> PostgresNode::command_fails_like(), coming from 0003, and put my hands
> on 0001 as per the attached, as the starting point.  That basically
> comes down to all the points raised upthread, plus some tweaks for
> things I bumped into to get the semantics of the command to what looks
> like the right shape.

Thanks, I was about to send a new patchset with basically the same thing. It
would be nice to know when we are working on the same thing concurrently in
the future, to avoid duplicated effort. I'll rebase and send the updated
versions of patches 0002 and 0003 of my original proposal once we reach an
agreement over the behaviour / options of pg_receivewal.

Also, considering the number of different fields to be filled in by the
GetSlotInformation function, my local branch groups them into a dedicated
struct.

Re: Column Filtering in Logical Replication

2021-09-02 Thread Peter Smith
On Thu, Sep 2, 2021 at 7:21 AM Rahila Syed  wrote:
>
...
>
> Also added some more tests. Please find attached a rebased and updated patch.

I fetched and applied the v4 patch.

It applied cleanly, and the build and make check was OK.

But I encountered some errors running the TAP subscription tests, as follows:

...
t/018_stream_subxact_abort.pl .. ok
t/019_stream_subxact_ddl_abort.pl .. ok
t/020_messages.pl .. ok
t/021_column_filter.pl . 1/9
#   Failed test 'insert on column c is not replicated'
#   at t/021_column_filter.pl line 126.
#  got: ''
# expected: '1|abc'

#   Failed test 'update on column c is not replicated'
#   at t/021_column_filter.pl line 130.
#  got: ''
# expected: '1|abc'
# Looks like you failed 2 tests of 9.
t/021_column_filter.pl . Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/9 subtests
t/021_twophase.pl .. ok
t/022_twophase_cascade.pl .. ok
t/023_twophase_stream.pl ... ok
t/024_add_drop_pub.pl .. ok
t/100_bugs.pl .. ok

Test Summary Report
---
t/021_column_filter.pl   (Wstat: 512 Tests: 9 Failed: 2)
  Failed tests:  6-7
  Non-zero exit status: 2
Files=26, Tests=263, 192 wallclock secs ( 0.57 usr  0.09 sys + 110.17
cusr 25.45 csys = 136.28 CPU)
Result: FAIL
make: *** [check] Error 1

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Estimating HugePages Requirements?

2021-09-02 Thread Michael Paquier
On Wed, Sep 01, 2021 at 06:28:21PM +, Bossart, Nathan wrote:
> On 8/31/21, 11:54 PM, "Michael Paquier"  wrote:
>> Hmm.  I am not sure about the addition of huge_pages_required, knowing
>> that we would have shared_memory_size.  I'd rather let the calculation
>> part to the user with a scan of /proc/meminfo.
> 
> I included this based on some feedback from Andres upthread [0].  I
> went ahead and split the patch set into 3 pieces in case we end up
> leaving it out.

Thanks.  Anyway, we don't really need huge_pages_required on Windows,
do we?  The Windows documentation explains what to do when using large
pages:
https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support

The backend code does that as in PGSharedMemoryCreate(), now that I
look at it.  And there is no way to change the minimum large page size
there as far as I can see because that's decided by the processor, no?
There is a case for shared_memory_size on Windows to be able to adjust
the sizing of the memory of the host, though.

>> +#elif defined(WIN32)
>> +   hp_size = GetLargePageMinimum();
>> +#endif
>> +
>> +#if defined(MAP_HUGETLB) || defined(WIN32)
>> +   hp_required = (size_b / hp_size) + 1;
>> As of [1], there is the following description:
>> "If the processor does not support large pages, the return value is
>> zero."
>> So there is a problem here.
> 
> I've fixed this in v4.
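For illustration, the guarded computation being discussed might look like this sketch (GetLargePageMinimum() is the real Win32 call; the Linux branch and the function name are illustrative assumptions):

#ifdef WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#endif

static long
estimate_huge_pages(long shmem_bytes)
{
    long        hp_size = 0;

#ifdef WIN32
    hp_size = (long) GetLargePageMinimum();    /* 0 if unsupported */
#elif defined(MAP_HUGETLB)
    hp_size = 2 * 1024 * 1024;  /* assumption: default x86-64 hugepagesize;
                                 * real code would parse /proc/meminfo */
#endif

    if (hp_size <= 0)
        return 0;               /* large pages unsupported: report nothing */

    /* Round up rather than the off-by-one-prone (size / hp_size) + 1. */
    return (shmem_bytes + hp_size - 1) / hp_size;
}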

At the end it would be nice to not finish with two GUCs.  Both depend
on the reordering of the actions done by the postmaster, so I'd be
curious to hear the thoughts of others on this particular point.
--
Michael


signature.asc
Description: PGP signature


Re: pg_receivewal starting position

2021-09-02 Thread Michael Paquier
On Thu, Sep 02, 2021 at 02:45:54PM +0900, Kyotaro Horiguchi wrote:
> At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy 
>  wrote in 
> If I read the patch correctly, the situation above is warned about by the
> following message, and then it continues to the next step, giving up on
> identifying the start position from the slot data.

Better to fall back to the past behavior if attempting to use a
pg_receivewal >= 15 with a PG instance older than 14.

>> "server does not support fetching the slot's position, resuming from the
>> current server position instead"
> 
> (The message looks a bit too long, though..)

Agreed.  Falling back to a warning is not the best answer we can have
here, as there could be various failure types, and for some of them a
hard failure is warranted:
- Failure in the backend while running READ_REPLICATION_SLOT.  This
should imply a hard failure, no?
- Slot that does not exist.  In this case, we could fall back to the
current write position of the server by default if the slot information
cannot be retrieved.
Something that's disturbing me in patch 0002 is that we would ignore
the results of GetSlotInformation() if any error happens, even if
there is a problem in the backend, like an OOM.  We should be careful
about the semantics here.

> However, if the operator doesn't know the server is old, pg_receivewal
> starts streaming from unexpected position, which is a kind of
> disaster. So I'm inclined to agree with Bharath, but rather I imagine
> an option to explicitly specify how to determine the start position.
> 
> --start-source=[server,wal,slot]  specify starting-LSN source, default is
>  trying all of them in the order of wal, slot, server. 
> 
> I don't think the option needs to accept multiple values at once.

What is the difference between "wal" and "server"?  "wal" stands for
the start position of the set of files stored in the location
directory, and "server" is the location that we'd receive from the
server?  I don't think that we need that because, when using a slot,
we know that we can rely on the LSN that the slot retains for
pg_receivewal as that should be the same point as what has been
streamed last.  Could there be an argument instead for changing the
default and rely on the slot information rather than scanning the
local WAL archive path for the start point when using --slot?  When
using pg_receivewal as a service, relying on a scan of the WAL archive
directory if there is no slot and fallback to an invalid LSN if there
is nothing is fine by me, but I think that just relying on the slot
information is saner as the backend makes sure that nothing is
missing.  That's also more useful when streaming changes from a single
slot from multiple locations (stream to location 1 with a slot, stop
pg_receivewal, stream to location 2 that completes 1 with the same
slot).

+pg_log_error("Slot \"%s\" is not a physical replication slot",
+ replication_slot);
In 0003, the format of this error is not really project-like.
Something like this would perhaps be more suitable:
"cannot use the slot provided, physical slot expected."

I am not really convinced about the need to get the active state
and the PID used in the backend when fetching the slot data,
particularly if that's just for some frontend-side checks.  The
backend has safeguards already for all that.

While looking at that, I have applied de1d4fe to add 
PostgresNode::command_fails_like(), coming from 0003, and put my hands
on 0001 as per the attached, as the starting point.  That basically
comes down to all the points raised upthread, plus some tweaks for
things I bumped into to get the semantics of the command to what looks
like the right shape.
--
Michael
From adbd72f70ee3592965f2a52500820d1387dcbf85 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 2 Sep 2021 16:25:25 +0900
Subject: [PATCH v4] Add READ_REPLICATION_SLOT command

---
 src/include/nodes/nodes.h   |   1 +
 src/include/nodes/replnodes.h   |  10 ++
 src/backend/replication/repl_gram.y |  16 ++-
 src/backend/replication/repl_scanner.l  |   1 +
 src/backend/replication/walsender.c | 112 
 src/test/recovery/t/001_stream_rep.pl   |  47 +++-
 src/test/recovery/t/006_logical_decoding.pl |  15 ++-
 doc/src/sgml/protocol.sgml  |  66 
 src/tools/pgindent/typedefs.list|   1 +
 9 files changed, 266 insertions(+), 3 deletions(-)

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 6a4d82f0a8..5f78bdd573 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -495,6 +495,7 @@ typedef enum NodeTag
 * TAGS FOR REPLICATION GRAMMAR PARSE NODES (replnodes.h)
 */
T_IdentifySystemCmd,
+   T_ReadReplicationSlotCmd,
T_BaseBackupCmd,
T_CreateReplicationSlotCmd,
T_DropReplicationSlotCmd,
diff --git a/src/include/nodes/replnodes.h b/src/in

Re: corruption of WAL page header is never reported

2021-09-02 Thread Kyotaro Horiguchi
At Thu, 2 Sep 2021 14:44:31 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2021/09/02 13:17, Kyotaro Horiguchi wrote:
> > Did you read the comment just above?
> 
> Yes!

Glad to hear that, or..:p

> > xlog.c:12523
> >> * Check the page header immediately, so that we can retry immediately 
> >> if
> >> * it's not valid. This may seem unnecessary, because XLogReadRecord()
> >> * validates the page header anyway, and would propagate the failure up 
> >> to
> >> * ReadRecord(), which would retry. However, there's a corner case with
> >> * continuation records, if a record is split across two pages such that
> > So when not in standby mode, the same check is performed by xlogreader
> > which has the responsibility to validate the binary data read by
> > XLogPageRead. The page-header validation is a compromise to save a
> > specific case.
> 
> Yes, so XLogPageRead() can skip the validation check of the page header
> if not in standby mode. On the other hand, there is no problem if it
> still performs the validation check as it does now. No?

Practically yes, and it has always been like that as you say.

> > I don't think it is a good choice to conflate read-failure and header
> > validation failure from the view of modularity.
> 
> I don't think that the proposed change does that. But maybe I failed

It's about your idea in a recent mail, not about the proposed
patch(es).

> to get
> your point yet... Anyway the proposed change just tries to reset
> errormsg_buf whenever XLogPageRead() retries, whatever error happened
> before. Also if errormsg_buf is set at that moment, it's reported.

I believe errmsg_buf is an interface for emitting error messages dedicated
to xlogreader, which doesn't have access to the elog facility, and
xlogreader doesn't (and ought not to) expect non-xlogreader callback
functions to set the variable.  In that sense I don't think the originally
proposed patch is proper, for the reason that the non-xlogreader callback
function may set errmsg_buf.  This is what I meant by the word
"modularity".

For that reason, in my second proposal I avoided calling
XLogReaderValidatePageHeader() at all while not in standby mode, because
calling the validator function in non-standby mode results in the
non-xlogreader function setting errmsg_buf.  Of course we could instead
always consume errmsg_buf in that function, but I don't like to shadow
the caller's task.

Does that make sense?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pg_receivewal starting position

2021-09-02 Thread Ronan Dunklau
Le jeudi 2 septembre 2021, 08:42:22 CEST Bharath Rupireddy a écrit :
> On Thu, Sep 2, 2021 at 11:15 AM Kyotaro Horiguchi
> 
>  wrote:
> > At Wed, 1 Sep 2021 10:30:05 +0530, Bharath Rupireddy
> >  wrote in
> > > On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau wrote:
> > > > > 7) How about we let pg_receivewal use READ_REPLICATION_SLOT as an
> > > > > option?
> > > > 
> > > > From my point of view, I already expected it to use something like
> > > > that when using a replication slot. Maybe an option to turn it off
> > > > could be useful?
> > > IMO, pg_receivewal should have a way to turn off/on using
> > > READ_REPLICATION_SLOT. Imagine if the postgres server doesn't support
> > > READ_REPLICATION_SLOT (a lower version) but for some reason the
> > > pg_receivewal (running separately) is upgraded to a version that uses
> > > READ_REPLICATION_SLOT; knowing that the server doesn't support
> > > READ_REPLICATION_SLOT, why should the user let pg_receivewal run
> > > unnecessary code?
> > 
> > If I read the patch correctly the situation above is warned by the
> > following message then continue to the next step giving up to identify
> > start position from slot data.
> > 
> > > "server does not support fetching the slot's position, resuming from the
> > > current server position instead"
> > (The message looks a bit too long, though..)
> > 
> > However, if the operator doesn't know the server is old, pg_receivewal
> > starts streaming from unexpected position, which is a kind of
> > disaster. So I'm inclined to agree with Bharath, but rather I imagine
> > an option to explicitly specify how to determine the start position.
> > 
> > --start-source=[server,wal,slot]  specify starting-LSN source, default is
> > 
> >  trying all of them in the order of wal, slot, server.
> > 
> > I don't think the option needs to accept multiple values at once.
> 
> If --start-source = 'wal' fails, then pg_receivewal should show an
> error saying "cannot find start position from <>
> directory, try with "server" or "slot" for --start-source". We might
> end up having similar errors for other options as well. Isn't this going
> to create unnecessary complexity?
> 
> The existing way pg_receivewal fetches the start position, i.e. first from
> the WAL directory and then from the server start position, isn't known to
> the user at all; there is no verbose message and nothing specified in the
> docs. Why do we need to expose this with the --start-source option? IMO, we
> can keep it that way and just have a way to turn off the new behaviour that
> we are proposing here, i.e. fetching the start position from the slot's
> restart_lsn.

Then it should probably be documented. We write in the docs that it is
strongly recommended to use a replication slot, but do not mention how we
resume from what has already been processed.

If someone really cares about having control over how the start position is
defined instead of relying on the auto-detection, it would be wiser to add a
--startpos parameter similar to the --endpos one, which would override
everything else, instead of different flags for different behaviours.

Regards,

-- 
Ronan Dunklau