Wrong assert in TransactionGroupUpdateXidStatus

2019-12-10 Thread Dilip Kumar
While testing, my colleague Vignesh hit an assert in
TransactionGroupUpdateXidStatus, but it is not reproducible.  After
some analysis and code review, I have found the reason for it.

As shown in the code below, there is an assert in
TransactionGroupUpdateXidStatus which assumes that an overflowed
transaction can never get registered for the group update.  That is
not actually true, because when registering a transaction for the
group update we only check how many committed children it has, since
all aborted sub-transactions have already updated their status.  So if
a transaction once overflowed but later all of its children aborted
(i.e. the remaining committed children are <=
THRESHOLD_SUBTRANS_CLOG_OPT), it will still be registered for the
group update.

/*
* Overflowed transactions should not use group XID status update
* mechanism.
*/
Assert(!pgxact->overflowed);

A solution could be to either remove this assert or change it to
Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT);
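
For context, here is a simplified sketch of the registration-side check
(paraphrased from memory of clog.c; the exact condition in
TransactionIdSetPageStatus may differ), which shows why the overflow flag is
never consulted there:

    /*
     * Simplified sketch, not the literal source: eligibility for the group
     * update path is decided from the number of still-committed
     * subtransactions only, so a once-overflowed transaction whose
     * remaining committed children fit under the threshold still qualifies.
     */
    if (all_xact_same_page && xid == MyPgXact->xid &&
        nsubxids <= THRESHOLD_SUBTRANS_CLOG_OPT &&
        nsubxids == MyPgXact->nxids)
    {
        /* try the group XID status update; fall back to the normal path */
        if (TransactionGroupUpdateXidStatus(xid, status, lsn, pageno))
            return;
    }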

Note: I could not come up with a reproducible test case, as we cannot
ensure whether a backend will try the group update or not, because that
depends upon whether it gets the CLogControlLock or not.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Change-wrong-assert-while-group-updating-xid-stat.patch
Description: Binary data


Re: Windows UTF-8, non-ICU collation trouble

2019-12-10 Thread Noah Misch
On Tue, Dec 10, 2019 at 03:41:15PM +1300, Thomas Munro wrote:
> On Fri, Dec 6, 2019 at 8:33 PM Noah Misch  wrote:
> > On Fri, Dec 06, 2019 at 07:56:08PM +1300, Thomas Munro wrote:
> > > On Fri, Dec 6, 2019 at 7:34 PM Noah Misch  wrote:
> > > > We use system UTF-16 collation to implement UTF-8 collation on Windows.
> > > > The PostgreSQL security team received a report, from Timothy Kuun, that
> > > > this collation does not uphold the "symmetric law" and "transitive law"
> > > > that we require for btree operator classes.  The attached test program
> > > > demonstrates this.  http://www.delphigroups.info/2/62/478610.html quotes
> > > > reports of that problem going back eighteen years.  Most code points are
> > > > unaffected.  Indexing an affected code point using such a collation can
> > > > cause btree index scans to not find a row they should find and can make
> > > > a UNIQUE or PRIMARY KEY constraint admit a duplicate.  The security team
> > > > determined that this doesn't qualify as a security vulnerability, but
> > > > it's still a bug.
> > >
> > > Huh.  Does this apply in modern times?  Since Windows 10, I thought
> > > they adopted[1] CLDR data to drive that, the same definitions used (or
> > > somewhere in the process of being adopted by) GNU, Illumos, FreeBSD
> > > etc.  Basically, everyone gave up on trying to own this rats nest of a
> > > problem and deferred to the experts.
> >
> > Based on my test program, it applies to Windows Server 2016.  I didn't test
> > newer versions.
> 
> I ran a variation of your program on Appveyor's Studio/Server 2019
> image, and the result was the same: it thinks that cmp(s1, s2) == 0,
> cmp(s2, s3) == 0, but cmp(s1, s3) == 1, so the operator fails to be
> transitive.

If that test is captured in self-contained artifacts (a few config files, a
public git repository, etc.), could you share them?  If not, no need to
assemble such artifacts.  I probably won't use them, but I'd be curious to
browse them if you've already assembled them.

> > > If you can still get
> > > index-busting behaviour out of modern Windows collations, wouldn't
> > > that be a bug that someone can file against SQL Server, Windows etc
> > > and get fixed?
> >
> > Perhaps.  I wouldn't have high hopes, given the behavior's long tenure
> > and the risk of breaking a different set of applications.
> 
> I found a SQL Server test website[3] and tried to get it to do
> something strange, using "Windows" collations (the ones that are
> supposed to be compatible with CompareString() AKA strcoll(), much
> like our "libc" provider).  For Latin1_General_100_CI_AS_SC_UTF8 and
> Korean_100_CS_AS it insisted that cmp(s1, s2) == 1, cmp(s2, s3) == 0,
> cmp(s1, s3) == 1, while for Korean_90_CS_AS it said -1, 0, -1, all
> self-consistent answers, matching neither your results nor the results
> of other implementations.

This does suggest some set of CompareString* parameters is free from the
problem.  If that's right, we could offer collations based on that.  (I'm not
sure it would be worth offering; ICU may be enough.)

Thanks for this extensive testing.
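
For anyone who wants to poke at this locally, a minimal, self-contained
sketch of such a transitivity probe might look like the following (the
strings are placeholders; the actual affected code points are in the test
program attached upthread, which is not reproduced here):

    #include <windows.h>
    #include <stdio.h>

    /* Wrap CompareStringW() into a strcoll()-style -1/0/1 result. */
    static int
    cmp(LCID lcid, const wchar_t *a, const wchar_t *b)
    {
        int r = CompareStringW(lcid, 0, a, -1, b, -1);

        /* returns CSTR_LESS_THAN(1)/EQUAL(2)/GREATER_THAN(3), 0 on error */
        return (r == 0) ? -99 : r - 2;
    }

    int
    main(void)
    {
        /* Placeholders: substitute the affected code points here. */
        const wchar_t *s1 = L"a";
        const wchar_t *s2 = L"b";
        const wchar_t *s3 = L"c";
        LCID lcid = LOCALE_USER_DEFAULT;

        printf("cmp(s1,s2)=%d cmp(s2,s3)=%d cmp(s1,s3)=%d\n",
               cmp(lcid, s1, s2), cmp(lcid, s2, s3), cmp(lcid, s1, s3));
        /* Transitivity requires: if the first two are 0, the third is 0. */
        return 0;
    }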




Re: backup manifests

2019-12-10 Thread Rushabh Lathia
On Mon, Dec 9, 2019 at 2:52 PM Rushabh Lathia 
wrote:

>
> Thanks Jeevan for reviewing the patch and offline discussion.
>
> On Mon, Dec 9, 2019 at 11:15 AM Jeevan Chalke <
> jeevan.cha...@enterprisedb.com> wrote:
>
>>
>>
>> On Fri, Dec 6, 2019 at 12:05 PM Rushabh Lathia 
>> wrote:
>>
>>>
>>>
>>> On Fri, Dec 6, 2019 at 1:44 AM Robert Haas 
>>> wrote:
>>>
 On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia <
 rushabh.lat...@gmail.com> wrote:
 > Here is the whole stack of patches.

 I committed 0001, as that's just refactoring and I think (hope) it's
 uncontroversial. I think 0002-0005 need to be squashed together
 (crediting all authors properly and in the appropriate order) as it's
 quite hard to understand right now,
>>>
>>>
>>> Please find attached a single patch; I have tried to credit all
>>> the authors.
>>>
>>
>> I had a look over the patch and here are a few review comments:
>>
>> 1.
>> +if (pg_strcasecmp(manifest_checksum_algo, "SHA256") == 0)
>> +manifest_checksums = MC_SHA256;
>> +else if (pg_strcasecmp(manifest_checksum_algo, "CRC32C") ==
>> 0)
>> +manifest_checksums = MC_CRC32C;
>> +else if (pg_strcasecmp(manifest_checksum_algo, "NONE") == 0)
>> +manifest_checksums = MC_NONE;
>> +else
>> +ereport(ERROR,
>>
>> Is NONE a valid input? I think the default is "NONE" anyway, and thus there
>> is no need for it as an input. It would be better if we simply error out
>> when the input is neither "SHA256" nor "CRC32C".
>>
>> I believe you have done it this way because pg_basebackup always passes the
>> MANIFEST_CHECKSUMS '%s' string, which will contain "NONE" if no user input
>> is given. But I think passing it conditionally would be better, like we do
>> with maxrate_clause, for example.
>>
>> Well, this is what I think; feel free to ignore it, as I don't see any
>> correctness issue here.
>>
>>
> I would still keep this NONE, as it looks cleaner among the set of
> options given for the checksums.
>
>
>> 2.
>> +if (manifest_checksums != MC_NONE)
>> +{
>> +checksumbuflen = finalize_manifest_checksum(cCtx, checksumbuf);
>> +switch (manifest_checksums)
>> +{
>> +case MC_NONE:
>> +break;
>> +}
>>
>> Since the switch is within the "if (manifest_checksums != MC_NONE)"
>> condition, I don't think we need a case for MC_NONE here. Rather, we can
>> use a default case to error out.
>>
>>
> Yeah, with the new patch we don't have this part of the code.
>
>
>> 3.
>> +if (manifest_checksums != MC_NONE)
>> +{
>> +initialize_manifest_checksum(&cCtx);
>> +update_manifest_checksum(&cCtx, content, len);
>> +}
>>
>> @@ -1384,6 +1641,9 @@ sendFile(const char *readfilename, const char
>> *tarfilename, struct stat *statbuf
>>  intsegmentno = 0;
>>  char   *segmentpath;
>>  boolverify_checksum = false;
>> +ChecksumCtx cCtx;
>> +
>> +initialize_manifest_checksum(&cCtx);
>>
>>
>> I see that in a few cases you are calling
>> initialize/update_manifest_checksum() conditionally and at other places the
>> call is unconditional. It seems that calling them unconditionally will not
>> cause any issues, as the switch cases inside them do nothing and return
>> when manifest_checksums is MC_NONE.
>>
>>
> Fixed.
>
>
>> 4.
>> initialize/update/finalize_manifest_checksum() functions may be needed by
>> the validation patch as well, and thus I think these functions should not
>> depend on a global variable as such. Also, it would be good to keep them in
>> a file that is accessible to frontend-only code. Well, you can ignore these
>> comments with the argument that this refactoring can be done by the patch
>> adding validation support; I have no issue with that. Since both patches
>> are dependent and posted on the same email thread, I thought of noting that
>> observation.
>>
>>
> Makes sense; I just changed those APIs so that they don't have to
> access the global.
>
>
>> 5.
>> +switch (manifest_checksums)
>> +{
>> +case MC_SHA256:
>> +checksumlabel = "SHA256:";
>> +break;
>> +case MC_CRC32C:
>> +checksumlabel = "CRC32C:";
>> +break;
>> +case MC_NONE:
>> +break;
>> +}
>>
>> This code in AddFileToManifest() is executed for every file for which we
>> are adding an entry. However, the checksumlabel is going to remain the same
>> throughout. Can it be set just once and then used as is?
>>
>>
> Yeah, with the attached patch we no longer have this part of the code.
>
>
>> 6.
>> Can we avoid declaring manifest_checksums as a global variable?
>> I think for that, we need to pass it to every function and thus change the
>> function signatures of various functions. Currently, we pass
>> "StringInfo manifest" to all the requir

Re: Online checksums verification in the backend

2019-12-10 Thread Julien Rouhaud
On Tue, Dec 10, 2019 at 3:26 AM Michael Paquier  wrote:
>
> On Mon, Dec 09, 2019 at 07:02:43PM +0100, Julien Rouhaud wrote:
> > On Mon, Dec 9, 2019 at 5:21 PM Robert Haas  wrote:
> >> Some people might prefer notices, because you can get those while the
> >> thing is still running, rather than a result set, which you will only
> >> see when the query finishes. Other people might prefer an SRF, because
> >> they want to have the data in structured form so that they can
> >> postprocess it. Not sure what you mean by "more globally."
> >
> > I meant having the results available system-wide, not only to the
> > caller.  I think that emitting a log/notice level should always be
> > done on top on whatever other communication facility we're using.
>
> The problem with notices and logs is that they tend to be ignored.  Now I
> don't see a problem either in adding something into the logs which
> can be found later for parsing, on top of an SRF returned by the
> caller which includes all the corruption details, say with pgbadger
> or your friendly neighborhood grep.  I think that any backend function
> should also make sure to call pgstat_report_checksum_failure() to
> make a report visible at database level in the catalogs, so that it is
> possible to use that as a cheap high-level warning.  The details of
> the failures could always be dug out of the logs or the result of the
> function itself after finding out that something is wrong in
> pg_stat_database.

I agree that adding extra information in the logs and calling
pgstat_report_checksum_failure() is a must-do, and I have changed that
locally.  However, I doubt that the logs are the right place to find
the details of corrupted blocks.  There's no guarantee that the file
will be accessible to the DBA, nor that the content won't get
truncated by the time it's needed.  I really think that corruption is
important enough to justify a more specific location.
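
As a rough illustration of what I have in mind (purely a sketch — the
helper report_corruption() and the tuplestore plumbing are hypothetical,
not the patch's actual API), each corrupted block would be reported through
all three channels at once:

    /* hypothetical sketch: report one corrupted block everywhere it is useful */
    if (!block_checksum_ok)
    {
        ereport(WARNING,
                (errmsg("invalid page in block %u of relation \"%s\"",
                        blkno, RelationGetRelationName(relation))));
        pgstat_report_checksum_failure();      /* pg_stat_database counter */
        report_corruption(tupstore, relation, forknum, blkno); /* SRF output row */
    }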

> >> I guess one
> >> idea would be to provide a way to kick this off in the background via
> >> a background worker or similar and then have it put the results in a
> >> table. But that might fail if there are checksum errors in the
> >> catalogs themselves.
> >
> > Yes that's a concern.  We could maintain a list in (dynamic) shared
> > memory with a simple SQL wrapper to read the data, but that would be
> > lost with a crash/restart.  Or use
> > pgstat_report_checksum_failures_in_db(), modifying it to get an
> > relfilenode, bocknum and forknum and append that to some flat files,
> > hoping that it won't get corrupted either.
>
> If a lot of blocks are corrupted, that could bloat things.  Hence some
> retention policies would be necessary, and that's tricky to define and
> configure properly.  I'd tend to be in the school of just logging the
> information and be done with it, because that's simple and because you
> won't need to worry about any more configuration.

If the number of corrupted blocks becomes high enough to excessively
bloat things, it's likely that the instance is doomed anyway, so I'm
not especially concerned about it.




Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2019-12-10 Thread Etsuro Fujita
Hi Amul,

On Tue, Dec 10, 2019 at 3:49 PM amul sul  wrote:
> On Mon, Dec 9, 2019 at 3:08 PM amul sul  wrote:
>> Attached is the rebase version atop the latest master head(2d0fdfaccec).

Thanks for that!

> I have been through the changes you proposed in [1] -- the changes make
> sense to me, and I didn't see any unexpected behaviour in testing either.

Thanks for reviewing!  I'll merge the changes into the main patch,
then.  I don't see any issues in the latest version, but I think we
need to polish the patch, so I'll do that.

Best regards,
Etsuro Fujita




Re: backup manifests

2019-12-10 Thread Jeevan Chalke
On Tue, Dec 10, 2019 at 3:29 PM Rushabh Lathia 
wrote:

>
> Attaching another version of the 0002 patch, as my colleague Jeevan Chalke
> pointed out a few indentation problems in the 0002 patch I sent earlier.
> Fixed the same in the latest patch.
>

I had a look over the new patch and see no issues. Looks good to me.
Thanks for quickly fixing the review comments posted earlier.

However, here are a few minor comments:

1.
@@ -122,6 +133,7 @@ static long long int total_checksum_failures;
 /* Do not verify checksums. */
 static bool noverify_checksums = false;

+
 /*
  * The contents of these directories are removed or recreated during server
  * start so they are not included in backups.  The directories themselves
are


Please remove this unnecessary change.

The patch also needs an indentation (pgindent) run.

Thanks
-- 
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Start Walreceiver completely before shut down it on standby server.

2019-12-10 Thread jiankang liu
Start Walreceiver completely before shut down it on standby server.

The walreceiver is shut down when an invalid record is read in the WAL
streamed from the master.  We then retry from the archive/pg_wal again.

After that, we start the walreceiver in RequestXLogStreaming() and read
records from the WAL stream.  But before the walreceiver has started, we
read data from the file that was streamed over and left in pg_wal last
time, because walrcv->receivedUpto > RecPtr and that WAL is already
flushed to disk.  So we read the invalid record again, and what happens
next?  The walreceiver is shut down and the whole cycle repeats.

As a result, we keep reading the invalid record, starting the walreceiver
and shutting it down before it has started completely.

This patch fixes it by setting walrcv->receivedUpto to the starting point,
so that nothing can be read before the walreceiver has started and is
streaming.
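
In other words, the change boils down to something like the following in
RequestXLogStreaming() (a paraphrased sketch, not the attached diff itself;
the surrounding assignments are quoted from memory and may differ):

    /* sketch: always reset the flush position to the requested start point */
    walrcv->receiveStart = recptr;
    walrcv->receivedUpto = recptr;  /* previously left at the old, stale value */
    walrcv->receivedTLI = tli;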


start_walreceiver.diff
Description: Binary data


RE: [Proposal] Level4 Warnings show many shadow vars

2019-12-10 Thread Ranier Vilela
From: Alvaro Herrera
Sent: Monday, December 9, 2019 22:06

>I find this choice a bit ugly and even more confusing than the original.
>I'd change this to be just "segsize".
Ok.

>I would tend to name the GUC variable as if it were a global in the
>sense that I proposed in my previous response (ie. WalSegmentSize), but
>that creates a bit of a problem when one greps the source looking for
>reference to the GUCs.  Some GUCs do have CamelCase names and others
>don't; I've grown fond of the current style of using the same name for
>the variable as for the GUC itself, for grepping reasons.  So I'm not
>going to propose to do that.  But let's not make a function parameter
>have a name that vaguely suggests that it itself is a GUC.
I understand the ease of grepping.
But ideally, global names would by convention make it very clear that they
are global, something like pg_conn, gconn, or guc_synch_commit.
The exact prefix does not matter, as long as all the new variables, and the
existing ones that might have to be changed, adopt the agreed nomenclature.
That way, searching for global names would be easy.

>Here I propose to rename the global instead (to WalReceiverConn maybe).
>There's nothing about the name "wrconn" that suggests it's a global
>variable.  In any other place where the object is used as a local
>variable, I'd just use "conn".  Trying to be clever and adding a letter
>here or a letter there makes it *more* likely that you'll reference the
>wrong one in some function.
Again, it could be that name, WalReceiverConn, but nothing in it suggests it
is a global one.
For a project that makes extensive use of globals, it would help to have a
nomenclature defined at least for the prefix:
pg_WalReceiverConn or gWalReceiverConn, and if it is a GUC, guc_WalReceiverConn?

>I don't agree with this change very much.  I think "progname" in
>particular is a bit of a debacle right now but I don't think this is the
>best fix.  I'd leave this alone.
Ok. In such cases it doesn't hurt today, but for future-proofing it would be
better to fix everything, IMO.

>As before: let's rename the file-level static instead.  "sentPtr" is not
>a good name.
gsent_Ptr or pg_sentPtr?

regards,
Ranier Vilela



Re: backup manifests

2019-12-10 Thread Suraj Kharage
Hi,

Please find attached the patch for the backup validator implementation
(0004 patch).  This patch is based on Rushabh's latest patch for the backup
manifest.

Some functions are required on the client side as well, so I have moved
those functions and some data structures to a common place so that they are
accessible to both (0003 patch).

My colleague Rajkumar Raghuwanshi has prepared a WIP patch (0005) for the
TAP test cases, which is also attached.  As of now, test cases related to
tablespaces and the tar backup format are missing; I will continue working
on them and submit the complete patch.

With this mail, I have attached the complete patch stack for the backup
manifest and backup validation implementation.

Please let me know your thoughts on the same.

On Fri, Dec 6, 2019 at 1:44 AM Robert Haas  wrote:

> On Thu, Dec 5, 2019 at 11:22 AM Rushabh Lathia 
> wrote:
> > Here is the whole stack of patches.
>
> I committed 0001, as that's just refactoring and I think (hope) it's
> uncontroversial. I think 0002-0005 need to be squashed together
> (crediting all authors properly and in the appropriate order) as it's
> quite hard to understand right now, and that Suraj's patch to validate
> the backup should be included in the patch stack. It needs
> documentation. Also, we need, either in that patch or a separate, TAP
> tests that exercise this feature. Things we should try to check:
>
> - Plain format backups can be verified against the manifest.
> - Tar format backups can be verified against the manifest after
> untarring (this might be a problem; not sure there's any guarantee
> that we have a working "tar" command available).
> - Verification succeeds for all available checksums algorithms and
> also for no checksum algorithm (should still check which files are
> present, and sizes).
> - If we tamper with a backup by removing a file, adding a file, or
> changing the size of a file, the modification is detected even without
> checksums.
> - If we tamper with a backup by changing the contents of a file but
> not the size, the modification is detected if checksums are used.
> - Everything above still works if there is user-defined tablespace
> that contains a table.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>

-- 
--

Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.


0001-Backup-manifest-with-file-names-sizes-timestamps-opt.patch
Description: Binary data


0002_new_data_structure_v2.patch
Description: Binary data


0003-refactor.patch
Description: Binary data


0004-backup-validator.patch
Description: Binary data


0005-backup_validator_tap_test_WIP.patch
Description: Binary data


Re: xact_start for walsender & logical decoding not updated

2019-12-10 Thread Tomas Vondra

On Mon, Dec 09, 2019 at 04:04:40PM -0800, Andres Freund wrote:

Hi,

On 2019-12-10 00:44:09 +0100, Tomas Vondra wrote:

I think there's a minor bug in pg_stat_activity tracking of walsender
processes. The issue is that xact_start is only updated at the very
beginning when the walsender starts (so it's almost exactly equal to
backend_start) and then just flips between NULL and that value.

Reproducing this is trivial - just create a publication/subscription
with the built-in logical replication, and run arbitrary workload.
You'll see that the xact_start value never changes.

I think the right fix is calling SetCurrentStatementStartTimestamp()
right before StartTransactionCommand() in ReorderBufferCommit, per the
attached patch.



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



diff --git a/src/backend/replication/logical/reorderbuffer.c 
b/src/backend/replication/logical/reorderbuffer.c
index 53affeb877..5235fb31b8 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -1554,7 +1554,10 @@ ReorderBufferCommit(ReorderBuffer *rb, TransactionId xid,
if (using_subtxn)
BeginInternalSubTransaction("replay");
else
+   {
+   SetCurrentStatementStartTimestamp();
StartTransactionCommand();
+   }


I'm quite doubtful this is useful. To me this seems to do nothing but
add the overhead of timestamp computation - which isn't always that
cheap. I don't think you really can draw meaning from this?



I don't want to use this timestamp directly, but it does interfere with
monitoring of long-running transactions looking at pg_stat_activity.
With the current behavior, the walsender entries have ancient timestamps
and produce random blips in monitoring.  Of course, it's possible to edit
the queries to skip entries with backend_type = walsender, but that's a
bit inconvenient.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: xact_start for walsender & logical decoding not updated

2019-12-10 Thread Tomas Vondra

On Tue, Dec 10, 2019 at 09:42:17AM +0900, Kyotaro Horiguchi wrote:

At Tue, 10 Dec 2019 00:44:09 +0100, Tomas Vondra  
wrote in

Hi,

I think there's a minor bug in pg_stat_activity tracking of walsender
processes. The issue is that xact_start is only updated at the very
beginning when the walsender starts (so it's almost exactly equal to
backend_start) and then just flips between NULL and that value.

Reproducing this is trivial - just create a publication/subscription
with the built-in logical replication, and run arbitrary workload.
You'll see that the xact_start value never changes.

I think the right fix is calling SetCurrentStatementStartTimestamp()
right before StartTransactionCommand() in ReorderBufferCommit, per the
attached patch.


I'm not sure how useful xact_start is for walsender; we are not really
running a statement there.  Also, the autovacuum launcher starts
transactions without a valid statement timestamp, perhaps for the same
reason.



Maybe, but then maybe we should change it so that we don't report any
timestamps for such processes.


However, if we want to show something meaningful there, I think
commit_time might be more informative.  If we use
GetCurrentTimestamp(), StartTransaction() already has the same feature
for autonomous transactions.  I suppose we should handle them in a
unified way.



I don't think so. We have this information from the apply side, and this
is really about the *new* transaction started in reorderbuffer.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Unicode normalization test broken output

2019-12-10 Thread Peter Eisentraut

On 2019-12-09 23:22, Tom Lane wrote:

Peter Eisentraut  writes:

There appear to be several off-by-more-than-one errors in norm_test.c
print_wchar_str().  Attached is a patch to fix this (and make the output
a bit prettier).  Result afterwards:


I concur that this looks broken and your patch improves it.
But I'm not very happy about the remaining assumption that
we don't have to worry about characters above U+.  I'd
rather see it allocate 11 bytes per allowed pg_wchar, and
manage the string contents with something like

p += sprintf(p, "U+%04X ", *s);


Good point.  Fixed in attached patch.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From bece2b343999f1545f1ff2b5be060d7a0e22a91a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 10 Dec 2019 13:17:20 +0100
Subject: [PATCH v2] Fix output of Unicode normalization test

Several off-by-more-than-one errors caused the output in case of a
test failure to be unintelligible.

Discussion: 
https://www.postgresql.org/message-id/flat/6a7a8516-7d11-8fbd-0e8b-eadb4f0679eb%402ndquadrant.com
---
 src/common/unicode/norm_test.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/src/common/unicode/norm_test.c b/src/common/unicode/norm_test.c
index fee58a184a..de7d6a72c7 100644
--- a/src/common/unicode/norm_test.c
+++ b/src/common/unicode/norm_test.c
@@ -23,17 +23,20 @@ static char *
 print_wchar_str(const pg_wchar *s)
 {
 #define BUF_DIGITS 50
-   static char buf[BUF_DIGITS * 2 + 1];
+   static char buf[BUF_DIGITS * 11 + 1];
int i;
+   char   *p;
 
i = 0;
+   p = buf;
while (*s && i < BUF_DIGITS)
{
-   snprintf(&buf[i * 2], 3, "%04X", *s);
+   p += sprintf(p, "U+%04X ", *s);
i++;
s++;
}
-   buf[i * 2] = '\0';
+   *p = '\0';
+
return buf;
 }
 
@@ -67,9 +70,9 @@ main(int argc, char **argv)
if (pg_wcscmp(test->output, result) != 0)
{
printf("FAILURE (NormalizationTest.txt line %d):\n", 
test->linenum);
-   printf("input:\t%s\n", print_wchar_str(test->input));
-   printf("expected:\t%s\n", 
print_wchar_str(test->output));
-   printf("got\t%s\n", print_wchar_str(result));
+   printf("input:%s\n", print_wchar_str(test->input));
+   printf("expected: %s\n", print_wchar_str(test->output));
+   printf("got:  %s\n", print_wchar_str(result));
printf("\n");
exit(1);
}
-- 
2.24.0



Re: about allow_system_table_mods and SET STATISTICS

2019-12-10 Thread Peter Eisentraut

On 2019-12-05 00:16, Tom Lane wrote:

Seems reasonable.  The argument for making this an exception to
allow_system_table_mods was always more about expediency than logical
cleanliness.  After the recent changes I think it's okay to remove the
special case (especially since nobody has griped about it being broken).

However ... if we're not going to have that special case, couldn't
we get rid of the whole business of having a special permissions test?
What is it that ATSimplePermissions can't handle here?  The business
about requiring a colName certainly doesn't need to be done before the
ownership check (in fact, it could be left to execution, so we don't need
a prep function at all anymore).


Good point.  Done in the attached patch.

(If someone wanted to revive the original functionality, it would 
nowadays probably be easier to add a flag ATT_SYSTEM_TABLE to 
ATSimplePermissions(), so there is really no reason to keep the old 
function separate.)


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ca4b1eefade67af55fb16827fca9b1c3c7b1c942 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 10 Dec 2019 13:36:11 +0100
Subject: [PATCH v2] Remove ATPrepSetStatistics

It was once possible to do ALTER TABLE ... SET STATISTICS on system
tables without allow_system_table_mods.  This was changed apparently by
accident between PostgreSQL 9.1 and 9.2, but a code comment still
claimed this was possible.  Without that functionality, having a
separate ATPrepSetStatistics() is useless, so use the generic
ATSimplePermissions() instead and move the remaining custom code into
ATExecSetStatistics().
---
 src/backend/commands/tablecmds.c | 57 
 1 file changed, 14 insertions(+), 43 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 5440eb9015..fdc02d9f7f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -386,8 +386,6 @@ static ObjectAddress ATExecAddIdentity(Relation rel, const 
char *colName,
 static ObjectAddress ATExecSetIdentity(Relation rel, const char *colName,
   Node 
*def, LOCKMODE lockmode);
 static ObjectAddress ATExecDropIdentity(Relation rel, const char *colName, 
bool missing_ok, LOCKMODE lockmode);
-static void ATPrepSetStatistics(Relation rel, const char *colName, int16 
colNum,
-   Node *newValue, 
LOCKMODE lockmode);
 static ObjectAddress ATExecSetStatistics(Relation rel, const char *colName, 
int16 colNum,

 Node *newValue, LOCKMODE lockmode);
 static ObjectAddress ATExecSetOptions(Relation rel, const char *colName,
@@ -3948,9 +3946,9 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
pass = AT_PASS_COL_ATTRS;
break;
case AT_SetStatistics:  /* ALTER COLUMN SET STATISTICS */
+   ATSimplePermissions(rel, ATT_TABLE | ATT_MATVIEW | 
ATT_INDEX | ATT_PARTITIONED_INDEX | ATT_FOREIGN_TABLE);
ATSimpleRecursion(wqueue, rel, cmd, recurse, lockmode);
-   /* Performs own permission checks */
-   ATPrepSetStatistics(rel, cmd->name, cmd->num, cmd->def, 
lockmode);
+   /* No command-specific prep needed */
pass = AT_PASS_MISC;
break;
case AT_SetOptions: /* ALTER COLUMN SET ( options ) 
*/
@@ -6702,45 +6700,7 @@ ATExecDropIdentity(Relation rel, const char *colName, 
bool missing_ok, LOCKMODE
 
 /*
  * ALTER TABLE ALTER COLUMN SET STATISTICS
- */
-static void
-ATPrepSetStatistics(Relation rel, const char *colName, int16 colNum, Node 
*newValue, LOCKMODE lockmode)
-{
-   /*
-* We do our own permission checking because (a) we want to allow SET
-* STATISTICS on indexes (for expressional index columns), and (b) we 
want
-* to allow SET STATISTICS on system catalogs without requiring
-* allowSystemTableMods to be turned on.
-*/
-   if (rel->rd_rel->relkind != RELKIND_RELATION &&
-   rel->rd_rel->relkind != RELKIND_MATVIEW &&
-   rel->rd_rel->relkind != RELKIND_INDEX &&
-   rel->rd_rel->relkind != RELKIND_PARTITIONED_INDEX &&
-   rel->rd_rel->relkind != RELKIND_FOREIGN_TABLE &&
-   rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
-   ereport(ERROR,
-   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
-errmsg("\"%s\" is not a table, materialized 
view, index, or foreign table",
-   RelationGetRelationName(rel;
-
-   /*
-* We allow refer

Re: proposal: minscale, rtrim, btrim functions for numeric

2019-12-10 Thread Karl O. Pinc
On Tue, 10 Dec 2019 07:11:59 +0100
Pavel Stehule  wrote:

> út 10. 12. 2019 v 0:03 odesílatel Karl O. Pinc  napsal:
> > I also wonder whether all the trim_scale() tests
> > are now necessary, but not enough to make any suggestions.

> I don't think so - tests should not be minimalistic; there can be some
> redundancy to cover some less probable side effects of future
> changes. Moreover, there is a small symmetry with the min_scale tests - and
> as a third argument, I sometimes use the tests (the result part) as
> "documentation".

Fine with me.

Tests pass against HEAD.  Docs build and look good.
Patch looks good to me.

I'm marking it ready for a committer.

Thanks for the work.

Regards,

Karl 
Free Software:  "You don't pay back, you pay forward."
 -- Robert A. Heinlein




Re: BufFileRead() error signalling

2019-12-10 Thread Ibrar Ahmed
You are checking file->dirty twice: first before calling the function, and
then within the function itself.  The same goes for the Assert.  For example:
size_t
BufFileRead(BufFile *file, void *ptr, size_t size)
{
    size_t      nread = 0;
    size_t      nthistime;

    if (file->dirty)
    {
        BufFileFlush(file);
        Assert(!file->dirty);
    }

static void
BufFileFlush(BufFile *file)
{
    if (file->dirty)
        BufFileDumpBuffer(file);
    Assert(!file->dirty);
The new status of this patch is: Waiting on Author


RE: [Proposal] Level4 Warnings show many shadow vars

2019-12-10 Thread Ranier Vilela
New version of the global patch, with the considerations applied.
Unfortunately WalReceiverConn cannot be used because it is currently the
typedef name for the structure.
I switched to WalReceiverConnection; it is long, but it looks good.
There is no consensus yet on a proper name for RedoRecPtr, so it is still
lastRedoRecPtr.

regards,
Ranier Vilela

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 6bc1a6b46d..83be23d77b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -892,8 +892,8 @@ static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static void PreallocXlogFiles(XLogRecPtr endptr);
 static void RemoveTempXlogFiles(void);
-static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr);
-static void RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr);
+static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastRedoRecPtr, XLogRecPtr endptr);
+static void RemoveXlogFile(const char *segname, XLogRecPtr lastRedoRecPtr, XLogRecPtr endptr);
 static void UpdateLastRemovedPtr(char *filename);
 static void ValidateXLOGDirectoryStructure(void);
 static void CleanupBackupHistory(void);
@@ -2299,7 +2299,7 @@ assign_checkpoint_completion_target(double newval, void *extra)
  * XLOG segments? Returns the highest segment that should be preallocated.
  */
 static XLogSegNo
-XLOGfileslop(XLogRecPtr RedoRecPtr)
+XLOGfileslop(XLogRecPtr lastRedoRecPtr)
 {
 	XLogSegNo	minSegNo;
 	XLogSegNo	maxSegNo;
@@ -2311,9 +2311,9 @@ XLOGfileslop(XLogRecPtr RedoRecPtr)
 	 * correspond to. Always recycle enough segments to meet the minimum, and
 	 * remove enough segments to stay below the maximum.
 	 */
-	minSegNo = RedoRecPtr / wal_segment_size +
+	minSegNo = lastRedoRecPtr / wal_segment_size +
 		ConvertToXSegs(min_wal_size_mb, wal_segment_size) - 1;
-	maxSegNo = RedoRecPtr / wal_segment_size +
+	maxSegNo = lastRedoRecPtr / wal_segment_size +
 		ConvertToXSegs(max_wal_size_mb, wal_segment_size) - 1;
 
 	/*
@@ -2328,7 +2328,7 @@ XLOGfileslop(XLogRecPtr RedoRecPtr)
 	/* add 10% for good measure. */
 	distance *= 1.10;
 
-	recycleSegNo = (XLogSegNo) ceil(((double) RedoRecPtr + distance) /
+	recycleSegNo = (XLogSegNo) ceil(((double) lastRedoRecPtr + distance) /
 	wal_segment_size);
 
 	if (recycleSegNo < minSegNo)
@@ -3949,12 +3949,12 @@ RemoveTempXlogFiles(void)
 /*
  * Recycle or remove all log files older or equal to passed segno.
  *
- * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * endptr is current (or recent) end of xlog, and lastRedoRecPtr is the
  * redo pointer of the last checkpoint. These are used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
  */
 static void
-RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
+RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastRedoRecPtr, XLogRecPtr endptr)
 {
 	DIR		   *xldir;
 	struct dirent *xlde;
@@ -3997,7 +3997,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 /* Update the last removed location in shared memory first */
 UpdateLastRemovedPtr(xlde->d_name);
 
-RemoveXlogFile(xlde->d_name, RedoRecPtr, endptr);
+RemoveXlogFile(xlde->d_name, lastRedoRecPtr, endptr);
 			}
 		}
 	}
@@ -4071,14 +4071,14 @@ RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI)
 /*
  * Recycle or remove a log file that's no longer needed.
  *
- * endptr is current (or recent) end of xlog, and RedoRecPtr is the
+ * endptr is current (or recent) end of xlog, and lastRedoRecPtr is the
  * redo pointer of the last checkpoint. These are used to determine
  * whether we want to recycle rather than delete no-longer-wanted log files.
- * If RedoRecPtr is not known, pass invalid, and the function will recycle,
+ * If lastRedoRecPtr is not known, pass invalid, and the function will recycle,
  * somewhat arbitrarily, 10 future segments.
  */
 static void
-RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
+RemoveXlogFile(const char *segname, XLogRecPtr lastRedoRecPtr, XLogRecPtr endptr)
 {
 	char		path[MAXPGPATH];
 #ifdef WIN32
@@ -4094,10 +4094,10 @@ RemoveXlogFile(const char *segname, XLogRecPtr RedoRecPtr, XLogRecPtr endptr)
 		 * Initialize info about where to try to recycle to.
 		 */
 		XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
-		if (RedoRecPtr == InvalidXLogRecPtr)
+		if (lastRedoRecPtr == InvalidXLogRecPtr)
 			recycleSegNo = endlogSegNo + 10;
 		else
-			recycleSegNo = XLOGfileslop(RedoRecPtr);
+			recycleSegNo = XLOGfileslop(lastRedoRecPtr);
 	}
 	else
 		recycleSegNo = 0;		/* keep compiler quiet */
@@ -7158,11 +7158,11 @@ StartupXLOG(void)
 
 	if (info == XLOG_CHECKPOINT_SHUTDOWN)
 	{
-		CheckPoint	checkPoint;
+		CheckPoint	xcheckPoint;
 
-		memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
-		newTLI = ch

Re: proposal: minscale, rtrim, btrim functions for numeric

2019-12-10 Thread Pavel Stehule
út 10. 12. 2019 v 13:56 odesílatel Karl O. Pinc  napsal:

> On Tue, 10 Dec 2019 07:11:59 +0100
> Pavel Stehule  wrote:
>
> > út 10. 12. 2019 v 0:03 odesílatel Karl O. Pinc  napsal:
> > > I also wonder whether all the trim_scale() tests
> > > are now necessary, but not enough to make any suggestions.
>
> > I don't think so - tests should not be minimalistic; there can be some
> > redundancy to cover some less probable side effects of future
> > changes. Moreover, there is a small symmetry with the min_scale tests - and
> > as a third argument, I sometimes use the tests (the result part) as
> > "documentation".
>
> Fine with me.
>
> Tests pass against HEAD.  Docs build and look good.
> Patch looks good to me.
>
> I'm marking it ready for a committer.
>
> Thanks for the work.
>

Thank you for the review

Pavel


> Regards,
>
> Karl 
> Free Software:  "You don't pay back, you pay forward."
>  -- Robert A. Heinlein
>


Re: Index corruption / planner issue with one table in my pg 11.6 instance

2019-12-10 Thread Jeremy Finzel
On Tue, Dec 10, 2019 at 12:09 AM Tom Lane  wrote:

> Yeah.  The reported behavior can mostly be explained if we assume
> that there's some HOT chain in the table that involves an update
> of this particular column, so that if we build an index on that
> column we see a broken HOT chain, but building an index on some
> other column doesn't have a problem.
>

The problem exists, as far as I can tell, when indexing *any column* of *this
particular table*.  I tried the same experiment on another table in the same
replication stream, and I cannot reproduce it there.

I am building the index **non-concurrently** every time.


> The thing this doesn't easily explain is that the behavior persists
> across repeated index rebuilds.  A broken HOT chain is only broken
> as long as the older entry is still visible-to-somebody, so that
> such situations ought to be self-healing as time passes.  If it
> fails repeatedly, this theory requires assuming that either
>
> 1. You've got some extremely old open transactions (maybe forgotten
> prepared transactions?), or
>

No prepared_xacts and no transactions older than a few hours.  Several-hour
transactions are common in this reporting system.  I have not yet checked
whether the index starts showing up in plans after several hours.


> 2. Your workload is constantly generating new broken HOT chains of
> the same sort, so that there's usually a live one when you try
> to build an index.
>
> The fact that you even notice the indcheckxmin restriction indicates
> that you do tend to have long-running transactions in the system,
> else the index would come free for use fairly quickly.  So #1 isn't
> as implausible as I might otherwise think.  But #2 seems probably
> more likely on the whole.  OTOH, neither point is exactly within
> the offered evidence.
>

Is there a way for me to test this theory?  I tried the following with no
change in behavior:

   1. Disable write load to table
   2. Vacuum analyze table (not vac full)
   3. Create index
   4. Explain

Still did not pick up the index.

Thanks,
Jeremy


Re: WIP/PoC for parallel backup

2019-12-10 Thread Asif Rehman
On Wed, Nov 27, 2019 at 1:38 PM Jeevan Chalke <
jeevan.cha...@enterprisedb.com> wrote:

>
>
> On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman 
> wrote:
>
>>
>> Sorry, I sent the wrong patches. Please see the correct version of the
>> patches (_v6).
>>
>
> Review comments on these patches:
>
> 1.
> +XLogRecPtrwal_location;
>
> Looking at the other field names in basebackup_options structure, let's use
> wallocation instead. Or better startwallocation to be precise.
>
> 2.
> +int32size;
>
> Should we use size_t here?
>
> 3.
> I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
> Can't we return the file list with START_BACKUP itself?
>
> 4.
> +else if (
> +#ifndef WIN32
> + S_ISLNK(statbuf.st_mode)
> +#else
> + pgwin32_is_junction(pathbuf)
> +#endif
> +)
> +{
> +/*
> + * If symlink, write it as a directory. file symlinks only
> allowed
> + * in pg_tblspc
> + */
> +statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
> +_tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf,
> false);
> +}
>
> In normal backup mode, we skip the special file which is not a regular
> file or
> a directory or a symlink inside pg_tblspc. But in your patch, above code,
> treats it as a directory. Should parallel backup too skip such special
> files?
>

Yeah, going through the code again, I found it a little bit inconsistent.  In
fact, the SendBackupFiles function is supposed to send the files that were
requested of it.  However, it currently performs these tasks:

1) If the requested file is a directory, it will return a TAR directory
entry.
2) If the requested file is a symlink inside pg_tblspc, it will return the
link path.
3) And, as you pointed out above, if the requested file is a symlink outside
pg_tblspc but inside PGDATA, it will return a TAR directory entry.

I think this function should not take care of any of the above.  Instead,
the client (i.e. pg_basebackup) should manage it.  SendBackupFiles should
only send regular files and ignore requests for anything else, be it a
directory or a symlink.

Any thoughts?


> 5.
> Please keep header file inclusions in alphabetical order in basebackup.c
> and
> pg_basebackup.c
>
> 6.
> +/*
> + * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
> + * 'base/1/1245/32683', ...) [options]
> + */
>
> Please update these comments as we fetch one file at a time.
>
> 7.
> +backup_file:
> +SCONST{ $$ = (Node *)
> makeString($1); }
> +;
> +
>
> Instead of having this rule with only one constant terminal, we can use
> SCONST directly in backup_files_list. However, I don't see any issue with
> this approach either, just trying to reduce the rules.
>
> 8.
> Please indent code within 80 char limit at all applicable places.
>
> 9.
> Please fix following typos:
>
> identifing => identifying
> optionaly => optionally
> structre => structure
> progrsss => progress
> Retrive => Retrieve
> direcotries => directories
>
>
> =
>
> The other mail thread related to backup manifest [1], is creating a
> backup_manifest file and sends that to the client which has optional
> checksum and other details including filename, file size, mtime, etc.
> There is a patch on the same thread which is then validating the backup
> too.
>
> Since this patch too gets a file list from the server and has similar
> details (except checksum), can somehow parallel backup use the
> backup-manifest
> infrastructure from that patch?
>

This was discussed earlier in the thread, and as Robert suggested, it would
complicate the
code to no real benefit.


> When the parallel backup is in use, will there be a backup_manifest file
> created too? I am just visualizing what will be the scenario when both
> these
> features are checked-in.
>

Yes, I think it should. Since the full backup will have a manifest file,
there is no
reason for parallel backup to not support it.

I'll share the updated patch in the next couple of days.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


Re: WIP/PoC for parallel backup

2019-12-10 Thread Asif Rehman
On Thu, Nov 28, 2019 at 12:57 AM Robert Haas  wrote:

> On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
>  wrote:
> > I am still not sure why we need SEND_BACKUP_FILELIST as a separate
> command.
> > Can't we return the file list with START_BACKUP itself?
>
> I had the same thought, but I think it's better to keep them separate.
> Somebody might want to use the SEND_BACKUP_FILELIST command for
> something other than a backup (I actually think it should be called
> just SEND_FILE_LIST)


Sure. Thanks for the recommendation. To keep the function names in sync, I
intend to do the following renamings:
- SEND_BACKUP_FILES --> SEND_FILES
- SEND_BACKUP_FILELIST -->  SEND_FILE_LIST

> . Somebody might want to start a backup without
> getting a file list because they're going to copy the files at the FS
> level. Somebody might want to get a list of files to process after
> somebody else has started the backup on another connection. Or maybe
> nobody wants to do any of those things, but it doesn't seem to cost us
> much of anything to split the commands, so I think we should.
>

+1

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca


Re: log bind parameter values on error

2019-12-10 Thread Alvaro Herrera
On 2019-Dec-09, Tom Lane wrote:

> Some quick review of v19:

Here's v20 with all these comments hopefully addressed.


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 62277fcd3f63ae68495300c98f77c7e4b4713614 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 3 Dec 2019 10:08:35 -0300
Subject: [PATCH v20] Add backend-only appendStringInfoStringQuoted
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Simplifies some coding that prints parameters, and optimizes it to work on
chunks between quotes instead of going byte by byte.

This lives separately from common/stringinfo.c so that frontend users of
that file need not pull in unnecessary multibyte-encoding support code.

Author: Álvaro Herrera and Alexey Bashtanov, after a suggestion from Andres Freund
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20190920203905.xkv5udsd5dxfs...@alap3.anarazel.de
---
 src/backend/tcop/postgres.c   | 10 +--
 src/backend/utils/mb/Makefile |  1 +
 src/backend/utils/mb/README   |  1 +
 src/backend/utils/mb/stringinfo_mb.c  | 92 +++
 src/pl/plpgsql/src/pl_exec.c  | 36 +++
 src/test/regress/expected/plpgsql.out | 14 
 src/test/regress/sql/plpgsql.sql  | 13 
 7 files changed, 132 insertions(+), 35 deletions(-)
 create mode 100644 src/backend/utils/mb/stringinfo_mb.c

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3b85e48333..3d3172e83d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2348,7 +2348,6 @@ errdetail_params(ParamListInfo params)
 			Oid			typoutput;
 			bool		typisvarlena;
 			char	   *pstring;
-			char	   *p;
 
 			appendStringInfo(&param_str, "%s$%d = ",
 			 paramno > 0 ? ", " : "",
@@ -2364,14 +2363,7 @@ errdetail_params(ParamListInfo params)
 
 			pstring = OidOutputFunctionCall(typoutput, prm->value);
 
-			appendStringInfoCharMacro(&param_str, '\'');
-			for (p = pstring; *p; p++)
-			{
-				if (*p == '\'') /* double single quotes */
-					appendStringInfoCharMacro(&param_str, *p);
-				appendStringInfoCharMacro(&param_str, *p);
-			}
-			appendStringInfoCharMacro(&param_str, '\'');
+			appendStringInfoStringQuoted(&param_str, pstring, 0);
 
 			pfree(pstring);
 		}
diff --git a/src/backend/utils/mb/Makefile b/src/backend/utils/mb/Makefile
index 18dd758cfe..cd4a016449 100644
--- a/src/backend/utils/mb/Makefile
+++ b/src/backend/utils/mb/Makefile
@@ -16,6 +16,7 @@ OBJS = \
 	conv.o \
 	encnames.o \
 	mbutils.o \
+	stringinfo_mb.o \
 	wchar.o \
 	wstrcmp.o \
 	wstrncmp.o
diff --git a/src/backend/utils/mb/README b/src/backend/utils/mb/README
index c9bc6e6f8d..7495ca5db2 100644
--- a/src/backend/utils/mb/README
+++ b/src/backend/utils/mb/README
@@ -9,6 +9,7 @@ wchar.c:	mostly static functions and a public table for mb string and
 		multibyte conversion
 mbutils.c:	public functions for the backend only.
 		requires conv.c and wchar.c
+stringinfo_mb.c: public backend-only multibyte-aware stringinfo functions
 wstrcmp.c:	strcmp for mb
 wstrncmp.c:	strncmp for mb
 win866.c:	a tool to generate KOI8 <--> CP866 conversion table
diff --git a/src/backend/utils/mb/stringinfo_mb.c b/src/backend/utils/mb/stringinfo_mb.c
new file mode 100644
index 00..baef1aa39a
--- /dev/null
+++ b/src/backend/utils/mb/stringinfo_mb.c
@@ -0,0 +1,92 @@
+/*-
+ *
+ * stringinfo_mb.c
+ *		Multibyte encoding-aware additional StringInfo facilities
+ *
+ * This is separate from common/stringinfo.c so that frontend users
+ * of that file need not pull in unnecessary multibyte-encoding support
+ * code.
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/mb/stringinfo_mb.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "lib/stringinfo.h"
+#include "mb/pg_wchar.h"
+
+
+/*
+ * appendStringInfoStringQuoted
+ *
+ * Append up to maxlen characters from s to str, or the whole input string if
+ * maxlen <= 0, adding single quotes around it and doubling all single quotes.
+ * Add an ellipsis if the copy is incomplete.
+ */
+void
+appendStringInfoStringQuoted(StringInfo str, const char *s, int maxlen)
+{
+	char	   *copy = NULL;
+	const char *chunk_search_start,
+			   *chunk_copy_start,
+			   *chunk_end;
+	bool		ellipsis;
+	int			slen;
+
+	Assert(str != NULL);
+
+	slen = strlen(s);
+	if (maxlen > 0 && maxlen < slen)
+	{
+		int		finallen = pg_mbcliplen(s, slen, maxlen);
+
+		copy = pnstrdup(s, finallen);
+		chunk_search_start = copy;
+		chunk_copy_start = copy;
+
+		ellipsis = true;
+	}
+	else
+	{
+		chunk_search_start = s;
+		chunk_copy_start = s;
+
+		ellipsis = false;
+	}
+
+	appendStringInfoCharMacro(str, '\'');

Re: Collation versioning

2019-12-10 Thread Julien Rouhaud
On Fri, Nov 8, 2019 at 2:24 AM Thomas Munro  wrote:
>
> On Thu, Nov 7, 2019 at 10:20 PM Julien Rouhaud  wrote:
> >
> > I didn't do anything for the spurious warning when running a reindex,
> > and kept original approach of pg_depend catalog.
>
> I see three options:
>
> 1.  Change all or some of index_open(), relation_open(),
> RelationIdGetRelation(), RelationBuildDesc() and
> RelationInitIndexAccessInfo() to take some kind of flag so we can say
> NO_COLLATION_VERSION_CHECK_PLEASE, and then have ReindexIndex() pass
> that flag down when opening it for the purpose of rebuilding it.
> 2.  Use a global state to suppress these warnings while opening that
> index.  Perhaps ReindexIndex() would call RelCacheWarnings(false)
> before index_open(), and use PG_FINALLY to make sure it restores
> warnings with RelCacheWarnings(true).  (This is a less-code-churn
> bad-programming-style version of #1.)
> 3.  Move the place that we run the collation version check.  Instead
> of doing it in RelationInitIndexAccessInfo() so that it runs when the
> relation cache first loads the index, put it into a new routine
> RelationCheckValidity() and call that from ... I don't know, some
> other place that runs whenever we access indexes but not when we
> rebuild them.
>
> 3's probably a better idea, if you can find a reasonable place to call
> it from.  I'm thinking some time around the time the executor acquires
> locks, but using a flag in the relcache entry to make sure it doesn't
> run the check again if there were no warnings last time (so one
> successful version check turns the extra work off for the rest of this
> relcache entry's lifetime).  I think it'd be a feature, not a bug, if
> it gave you the warning every single time you executed a query using
> an index that has a mismatch...  but a practical alternative would be
> to check only once per index and that probably makes more sense.

I tried to dig into approach #3.  I think that trying to perform this
check when the executor acquires locks won't work well, at least with
partitioned tables.  I instead tried to handle it in
get_relation_info(), adding a flag in the relcache to emit each
warning only once per backend lifetime, and this seems to work quite
well.
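
Roughly, the idea is the following (a hypothetical sketch only — the field
name rd_version_checked and the helper index_check_collation_versions() are
illustrative, not the patch's actual API):

    /* once-per-backend warning gate, e.g. in get_relation_info() */
    if (!indexRelation->rd_version_checked)
    {
        /*
         * Compare recorded collation versions against the current provider
         * versions and emit a WARNING on any mismatch.
         */
        index_check_collation_versions(indexRelation);
        indexRelation->rd_version_checked = true;
    }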

I think that on top of that, we should make sure that non-full vacuum
and analyze also emit such warnings, so that autovacuum can spam the
logs too.




Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-10 Thread Tom Lane
Amit Kapila  writes:
> On Sun, Dec 8, 2019 at 10:27 PM Tom Lane  wrote:
>> Doing it like this seems attractive to me because it gets rid of two
>> different failure modes: inability to create a new thread and inability
>> to create a new pipe handle.  Now on the other hand, it means that
>> inability to complete the read/write transaction with a client right
>> away will delay processing of other signals.  But we know that the
>> client is engaged in a CallNamedPipe operation, so how realistic is
>> that concern?

> Right, the client is engaged in a CallNamedPipe operation, but the
> current mechanism can allow multiple such clients and that might lead
> to faster processing of signals.

It would only matter if multiple processes signal the same backend at the
same time, which seems to me to be probably a very minority use-case.
For the normal case of one signal arriving at a time, what I'm suggesting
ought to be noticeably faster because of fewer kernel calls.  Surely
creating a new pipe instance and a new thread are not free.
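
For illustration, a minimal sketch of what servicing every request directly
in a single long-lived signal thread could look like is below (hypothetical
code, not the actual pg_signal_thread from the backend's Windows port; the
pipe name and handle_signal_payload() helper are placeholders):

    #include <windows.h>

    /* placeholder: in the real backend this would set the signal-pending flag */
    static void
    handle_signal_payload(BYTE sig)
    {
        (void) sig;
    }

    static DWORD WINAPI
    signal_thread_main(LPVOID arg)
    {
        /* one pipe instance, created once; no per-request CreateNamedPipe() */
        HANDLE pipe = CreateNamedPipeA("\\\\.\\pipe\\pgsignal_example",
                                       PIPE_ACCESS_DUPLEX,
                                       PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
                                       PIPE_UNLIMITED_INSTANCES, 16, 16, 1000, NULL);

        (void) arg;
        if (pipe == INVALID_HANDLE_VALUE)
            return 1;

        for (;;)
        {
            BYTE    sigNum;
            DWORD   bytes;

            /* wait for a client's CallNamedPipe() to connect */
            if (ConnectNamedPipe(pipe, NULL) ||
                GetLastError() == ERROR_PIPE_CONNECTED)
            {
                if (ReadFile(pipe, &sigNum, 1, &bytes, NULL) && bytes == 1)
                {
                    handle_signal_payload(sigNum);
                    WriteFile(pipe, &sigNum, 1, &bytes, NULL); /* echo back */
                    FlushFileBuffers(pipe);
                }
            }
            /* reuse the same instance for the next client; no new thread */
            DisconnectNamedPipe(pipe);
        }
        return 0;
    }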

In any case, the main thing I'm on about here is getting rid of the
failure modes.  The existing code does have a rather lame/buggy
workaround for the cant-create-new-pipe case.  A possible answer for
cant-create-new-thread might be to go ahead and service the current
request locally in the long-lived signal thread.  But that seems like
it's piling useless (and hard to test) complexity on top of useless
complexity.

> Ideally, we could run a couple of tests on different Windows machines to see
> whether servicing the signals with this mechanism helps compared to the
> proposed change, but is it really worth the effort?

The failure modes I'm worried about are obviously pretty low-probability;
if they were not, we'd be getting field reports about it.  So I'm not
sure how you can test your way to a conclusion about whether this is an
improvement.  But we're not in the business of ignoring failure modes
just because they're low-probability.  I'd argue that a kernel call
that's not there is a kernel call that cannot fail, and therefore ipso
facto an improvement.

regards, tom lane




Re: [Proposal] Level4 Warnings show many shadow vars

2019-12-10 Thread John W Higgins
On Tue, Dec 10, 2019 at 5:21 AM Ranier Vilela 
wrote:

> New version the global patch, with the considerations.
> Unfortunately WalReceiverConn cannot be used because it is currently the
> typedef name for the structure.
> I switched to WalReceiverConnection, it was long but it looks good.
> RedoRecPtr proper name has no consensus yet, so it was still
> lastRedoRecPtr.
>
>
For someone that expounds consistency - this patch is the furthest thing
from it.

In some places names are randomly changed to have an underscore
(authmethodlocal to authmethod_local with the obvious inconsistency as
well) - in some places names are changed to remove underscores (stop_t to
stopt). Some places random letters are added (checkPoint to xcheckPoint)
some places perfectly good names are truncated (conf_file to file).

Random places remove perfectly good prefixes and replace with single
letters (numTables to nTables)

Random places switch from lower case names to upper case names (sentPtr to
WalSentPtr) most places leave lower case names (days to ndays).

Please at least be consistent within the patch itself.

John W Higgins


Re: Unicode normalization test broken output

2019-12-10 Thread Tom Lane
Peter Eisentraut  writes:
> Good point.  Fixed in attached patch.

This one LGTM.

regards, tom lane




Re: about allow_system_table_mods and SET STATISTICS

2019-12-10 Thread Tom Lane
Peter Eisentraut  writes:
> Good point.  Done in the attached patch.
> (If someone wanted to revive the original functionality, it would 
> nowadays probably be easier to add a flag ATT_SYSTEM_TABLE to 
> ATSimplePermissions(), so there is really no reason to keep the old 
> function separate.)

Yeah --- that way, the behavior would also be conveniently available
to other ALTER TABLE subcommands.

This patch looks good, with one trivial nitpick: it looks a bit odd
to insert the relkind check into ATExecSetStatistics between the
assignment of "newtarget" and the validity check for same.  I'd
put it either before or after that whole stanza.  Just a cosmetic
thing though.

regards, tom lane




Re: Contention on LWLock buffer_content, due to SHARED lock(?)

2019-12-10 Thread Jeff Janes
On Mon, Dec 9, 2019 at 5:10 PM Jens-Wolfhard Schicke-Uffmann <
drahf...@gmx.de> wrote:

> Hi,
>
> today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> cores) lock contention on a buffer content lock due to taking of a
> SHARED lock (I think):
>

What version of PostgreSQL are you using?

Cheers,

Jeff


Re: Contention on LWLock buffer_content, due to SHARED lock(?)

2019-12-10 Thread Andres Freund
Hi,

On 2019-12-09 23:10:36 +0100, Jens-Wolfhard Schicke-Uffmann wrote:
> today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> cores) lock contention on a buffer content lock due to taking of a
> SHARED lock (I think):

> Three tables were involved, simplified case:
>
> CREATE TABLE global_config (id BIGINT PRIMARY KEY);
>
> CREATE TABLE b (
>   id BIGINT PRIMARY KEY,
>   config_id BIGINT REFERENCES global_config (id)
> );
>
> CREATE TABLE c (
>   id BIGINT PRIMARY KEY,
>   config_id BIGINT REFERENCES global_config (id)
> );
>
> (I suppose having both b + c doesn't make a difference, but
> maybe it was relevant, so I'm including it.)
>
> Heavy INSERT + UPDATE traffic on b + c (all trivial via id),
> SELECTs on global_config (again by id).
> As the name suggests, there were only very few rows in
> global_config, specifically only one was referenced by all
> INSERT + UPDATEs on b + c.
>
> On lighter load, all three types of queries were taking <1ms (as
> expected), as load grew, all three went to ~50ms avg. execution time
> together. AWS RDS console showed wait on LWLock:buffer_content as the
> main contribution to that time.
>
> Checking the code, I concluded that I observed lock contention
> on the lock taken at the beginning of heap_lock_tuple, where
> an exclusive buffer content lock is held while recording the
> SHARE lock into the tuple and the WAL and the multiXact. I don't know
> the actual number, but potentially up to 7000 active
> transactions were holding a SHARE lock on that row, which could have
> performance implications while scanning for multiXact memberships.

When you say "7000 active transactions" - do you mean to say that you
have set max_connections to something higher than that, and you actually
have that many concurrent transactions?


> Semantically, all that lock traffic was superfluous, as the
> global_config row's key was in no danger of being changed.

Well, postgres can't know that.


> As this situation (some global, essentially static, entity is referenced
> by a much written table) seems not uncommon, I wonder:
>
> 1. Does the above analysis sound about right?

Hard to know without additional data.


> 2. If so, would it be worthwhile to develop a solution?

Possible, but I'm not sure it's worth the complexity.

I'd definitely like to see a proper reproducer and profile for this,
before investigating further.


>I was thinking along the lines of introducing an multiXact
>representation of "everyone": Instead of meticulously recording every
>locking + completing transaction in a multiXact, after a certain
>number of transactions has accumulated in a single multiXact, it is
>approximated as "everyone". If later a transaction finds that a SHARE
>lock is held by "everyone", the tuple would need no further modification

I think the big problem with a strategy like this is that it's prone to
generate deadlocks that aren't present in the "original" scheduling.


>(not sure if this could even be checked without taking an exclusive
>buffer lock).

It should only require a share lock.


>The hard part would probably be to ensure that an
>attempt to obtain an EXCLUSIVE lock would finally succeed against a
>SHARE lock held by "everyone".

Note that this is a seriously complicated area of the code. It's very
easy to create new bugs that aren't easily testable. I think we'd need a
very convincing use-case for improvements around the problem you outline,
and a relatively simple solution, to counter stability concerns.


Greetings,

Andres Freund




RE: [Proposal] Level4 Warnings show many shadow vars

2019-12-10 Thread Ranier Vilela
From: John W Higgins 
Sent: Tuesday, December 10, 2019 15:58

>For someone that expounds consistency - this patch is the furthest thing
>from it.
>In some places names are randomly changed to have an underscore
>(authmethodlocal to authmethod_local with the obvious inconsistency as well)
>- in some places names are changed to remove underscores (stop_t to stopt).
>Some places random letters are added (checkPoint to xcheckPoint) some places
>perfectly good names are truncated (conf_file to file).
The first purpose of the patch was to remove collisions with shadowed global
variable names.
The second was not to change the semantics of the variable names, hence the
use of an "x" prefix or the adding or removing of an underscore.
But I agree with you that the choice of names can be improved.
xcheckpoint sounds ugly.
stopt sounds ugly too.

>Random places remove perfectly good prefixes and replace with single
>letters (numTables to nTables)
numTables is already a global variable name.
nTables seems very reasonable to me for holding the number of tables.

>Random places switch from lower case names to upper case names (sentPtr to
>WalSentPtr), most places leave lower case names (days to ndays).
Again, sentPtr is already a global variable name.
Well, I tried to follow the local source style a little, since the project
does not have a standard for global names.
There we have WalSntCtl, for example.

ndays sounds very good to me for the number of days.

>Please at least be consistent within the patch itself.
I'm trying.

regards,
Ranier Vilela



Re: [Proposal] Level4 Warnings show many shadow vars

2019-12-10 Thread Stephen Frost
Greetings,

* Ranier Vilela (ranier_...@hotmail.com) wrote:
> >For someone that expounds consistency - this patch is the furthest thing
> >from it.
> >In some places names are randomly changed to have an underscore
> >(authmethodlocal to authmethod_local with the obvious inconsistency as
> >well) - in some places names are changed to remove underscores (stop_t to
> >stopt). Some places random letters are added (checkPoint to xcheckPoint)
> >some places perfectly good names are truncated (conf_file to file).
> The first purpose of the patch was to remove collisions from shadow global 
> variable names.

There are multiple ways to get there though, and I think what you're seeing
is that the "just change it to something else" answer isn't necessarily
going to be viewed as an improvement (or, at least, not enough of an
improvement to accept the cost of the change).

> The second was not to change the semantics of variable names, hence the use 
> of x or putting or remove underscore.

Why not change the variables?  Changes that also improve the code itself,
along with eliminating the shadowing of the global variable, are going to
be much easier to accept.

> >Random places remove perfectly good prefixes and replace with single
> >letters (numTables to nTables)
> numTables already a global variable name.

Sure, but have you looked at how it's used?  Instead of just renaming
the numTables variables in the functions that accept it - could those
variables just be removed instead of changing their name to make it look
like they're something different when they aren't actually different?

I've only spent a bit of time looking at it, but it sure looks like the
variables could just be removed, and doing so doesn't break the
regression tests, which supports the idea that maybe there's a better
way to deal with those particular variables rather than renaming them.

Another approach to consider might be to move some global variables into
structures that are then global with better names to indicate that's
what they are.
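
For instance (a sketch only -- the struct and field names here are invented
for illustration, not taken from pg_dump):

typedef struct TableEntry
{
	char	   *name;			/* placeholder for the real per-table info */
} TableEntry;

typedef struct DumpableTables
{
	TableEntry *tblinfo;		/* all dumpable tables */
	int			numTables;		/* length of tblinfo */
} DumpableTables;

static DumpableTables dumpTables;

Callers would then spell it dumpTables.numTables, which both reads as "the
global" and can't be shadowed accidentally by an unrelated local numTables.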

In short, a hack-and-slash patch that doesn't really spend much time
considering the changes beyond "let's just change these to be different
to avoid shadowing globals" isn't really a good way to go about
addressing these cases and has a good chance of making things more
confusing, not less.

Thanks,

Stephen


signature.asc
Description: PGP signature


allowing broader use of simplehash

2019-12-10 Thread Robert Haas
I recently became annoyed while working on patch A that I could not
use simplehash in shared memory, and then I became annoyed again while
working on patch B that I could not use simplehash in frontend code.
So here are a few patches for discussion.

A significant problem in either case is that a simplehash wants to
live in a memory context; no such thing exists either for data in
shared memory or in frontend code. However, it seems to be quite easy
to provide a way for simplehash to be defined so that it doesn't care
about memory contexts. See 0001.
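
For reference, a typical backend instantiation today looks roughly like the
sketch below (loosely modeled on existing in-tree users; the names are
illustrative).  The MemoryContext argument to the generated _create()
function is exactly the dependency in question, and 0001 is about letting a
definition opt out of it -- the precise spelling of that opt-out is up to
the patch, not shown here:

#include "postgres.h"
#include "utils/hashutils.h"		/* for murmurhash32() */

typedef struct DemoEntry
{
	uint32		key;
	char		status;				/* maintained by simplehash itself */
	int			count;
} DemoEntry;

#define SH_PREFIX demo
#define SH_ELEMENT_TYPE DemoEntry
#define SH_KEY_TYPE uint32
#define SH_KEY key
#define SH_HASH_KEY(tb, key) murmurhash32(key)
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

static void
demo_usage(void)
{
	demo_hash  *ht = demo_create(CurrentMemoryContext, 256, NULL);
	bool		found;
	DemoEntry  *entry = demo_insert(ht, 42, &found);

	if (!found)
		entry->count = 0;
	entry->count++;
}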

As far as frontend code goes, the only other problem I found is that
it makes use of elog() to signal some internal-ish messages. It seemed
to me that the easiest thing to do was, if FRONTEND is defined, use
pg_log_error(...) instead of elog(ERROR, ...). For the one instance of
elog(LOG, ...) in simplehash.h, I chose to use pg_log_info(). It's not
really equivalent, but it's probably the closest thing that currently
exists, and I think it's good enough for what's basically a debugging
message. See 0002.

I think those changes would also be enough to allow simplehash to be
used in a dynamic shared area (DSA). Using it in the main shared
memory segment seems more problematic, because simplehash relies on
being able to resize the hash table. Shared hash tables must have a
fixed maximum size, but with dynahash, we can count on being able to
use all of the entries without significant performance degradation.
simplehash, on the other hand, uses linear probing and relies on being
able to grow the hash table as a way of escaping collisions. By
default, the load factor is not permitted to drop below 0.1, so to
mimic the collision-avoidance behavior that we get in backend-private
uses of simplehash, we'd have to overallocate by 10x, which doesn't
seem desirable.

I'd really like to have an alternative to dynahash, which is awkward
to use and probably not particularly fast, but I'm not sure simplehash
is it.  Maybe what we really need is a third (or nth) hash table
implementation.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0001-simplehash-Allow-use-of-simplehash-without-MemoryCon.patch
Description: Binary data


0002-simplehash-Allow-use-in-frontend-code.patch
Description: Binary data


Re: Contention on LWLock buffer_content, due to SHARED lock(?)

2019-12-10 Thread Alvaro Herrera
On 2019-Dec-10, Andres Freund wrote:

> >The hard part would probably be to ensure that an
> >attempt to obtain an EXCLUSIVE lock would finally succeed against a
> >SHARE lock held by "everyone".
> 
> Note that this is a seriously complicated area of the code. It's very
> easy to create new bugs that aren't easily testable. I think we'd need a
> very convincing use-case for improvements around the problem you outline
> and relatively simple solution, to counter stability concerns.

I'd rather have the ability to mark a table READ ONLY (or similar).
Then any FK references can skip the row locks altogether.  For the rare
cases where you need to modify the referenced table, have it marked READ
WRITE, and any row locks are registered normally from that point on,
until you set it back to READ ONLY again.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: xact_start for walsender & logical decoding not updated

2019-12-10 Thread Alvaro Herrera
On 2019-Dec-10, Tomas Vondra wrote:

> On Tue, Dec 10, 2019 at 09:42:17AM +0900, Kyotaro Horiguchi wrote:
> > At Tue, 10 Dec 2019 00:44:09 +0100, Tomas Vondra 
> >  wrote in

> > I'm not sure how useful xact_start is for walsender, and we are really
> > not running a statement there.  Also, the autovacuum launcher starts a
> > transaction without a valid statement timestamp, perhaps for the same
> > reason.
> 
> Maybe, but then maybe we should change it so that we don't report any
> timestamps for such processes.

Yeah, I think we should do that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: disable only nonparallel seq scan.

2019-12-10 Thread Robert Haas
On Sun, Dec 8, 2019 at 1:24 PM Jeff Janes  wrote:
> Is there a way to force a meaningful parallel seq scan, or at least the 
> planning of one, when the planner wants a non-parallel one?
>
> Usually I can do things like this with enable_* settings, but if I `set
> enable_seqscan to off`, it penalizes the parallel seq scan 8 times harder
> than it penalizes the non-parallel one, so the plan does not switch.
>
> If I set `force_parallel_mode TO on` then I do get a parallel plan, but it is 
> a degenerate one which tells me nothing I want to know.
>
> If I `set parallel_tuple_cost = 0` (or in some cases to a negative number), I 
> can force it switch, but that destroys the purpose, which is to see what the 
> "would have been" plan estimates are for the parallel seq scan under the 
> default setting of the cost parameters.
>
> I can creep parallel_tuple_cost downward until it switches, and then try to
> extrapolate back up, but this is tedious and not very reliable.

I don't think there's a way to force this, but setting both
parallel_setup_cost and parallel_tuple_cost to 0 seems to often be
enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Statistics improvements for time series data

2019-12-10 Thread Robert Haas
On Sun, Dec 8, 2019 at 7:12 PM Mark Dilger  wrote:
> I think the words "IF AND ONLY IF AUTOVACUUM WOULD" should be
> replaced with a single word and added to the grammar where
> vacuum_option_elem lists VERBOSE, FREEZE and FULL.  Perhaps
> "OPTIONALLY", or "AUTOVACUUMESQUE", though I'm really hoping
> somebody has a better suggestion.

vacuum_option_elem doesn't exist any more, since commit
6776142a07afb4c28961f27059d800196902f5f1.

I think OPTIONALLY would be a fine keyword:

VACUUM (OPTIONALLY) my_table;
ANALYZE (OPTIONALLY) my_table;

It wouldn't even need to be a parser keyword; see
disable_page_skipping for a comparable.

> In the given example, above, the user would likely set the vacuum
> and analyze scale factors to zero and the thresholds to something
> they've empirically determined to work well for their purposes.
> That might be a problem in practice, given that it also impacts
> autovacuum's choices.  Would people prefer that those thresholds
> be passed as parameters to the command directly?
>
>VACUUM sometable OPTIONALLY (vacuum_threshold = 10, vacuum_scale = 0)
>
> and only default to autovacuum's settings when not specified?

I think that syntax is a non-starter. We should try to fit into the
existing mold for vacuum options.  But I don't see a reason why we
couldn't allow:

VACUUM (OPTIONALLY, THRESHOLD 10, SCALE 0) my_table;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Start Walreceiver completely before shut down it on standby server.

2019-12-10 Thread Ashwin Agrawal
On Tue, Dec 10, 2019 at 3:06 AM jiankang liu  wrote:

> Start Walreceiver completely before shut down it on standby server.
>
> The walreceiver will be shut down when it reads an invalid record in the
> WAL streamed from the master.  And then, we retry from archive/pg_wal again.
>
> After that, we start the walreceiver in RequestXLogStreaming(), and read
> records from the WAL stream.  But before the walreceiver starts, we read
> data from a file which was streamed over and left in pg_wal last time,
> because walrcv->receivedUpto > RecPtr and the WAL is actually flushed on
> disk.  Now, we read the invalid record again; what do we do next?  Shut
> down the walreceiver and do it all again.
>

I am missing something here: if walrcv->receivedUpto > RecPtr, why are we
reading an invalid record?


Re: log bind parameter values on error

2019-12-10 Thread Alvaro Herrera
On 2019-Dec-10, Alvaro Herrera wrote:

> On 2019-Dec-09, Tom Lane wrote:
> 
> > Some quick review of v19:
> 
> Here's v20 with all these comments hopefully addressed.

Grr, I had forgotten to git add the stringinfo.h -> pg_wchar.h changes,
so the prototype isn't anywhere in v20.  However, after looking at it
again, I'm not very happy with how that turned out, because pg_wchar.h
is a frontend-exposed header.  So I instead propose to put it in a
separate file src/include/mb/stringinfo_mb.h.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From bd8df7f46556eb8a5de5e64f04b29f4d7a174d0e Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 10 Dec 2019 14:40:56 -0300
Subject: [PATCH v21] Add backend-only appendStringInfoStringQuoted
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This provides a mechanism to emit literal values in informative
messages, such as query parameters.  The new code is more complex than
what it replaces, primarily because it wants to be more efficient.
It also has the (currently unused) additional optional capability of
specifying a maximum size to print.

This lives separately from common/stringinfo.c so that frontend users of
that file need not pull in unnecessary multibyte-encoding support code.

Author: Álvaro Herrera and Alexey Bashtanov, after a suggestion from Andres Freund
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/20190920203905.xkv5udsd5dxfs...@alap3.anarazel.de
---
 src/backend/tcop/postgres.c   | 11 +---
 src/backend/utils/mb/Makefile |  1 +
 src/backend/utils/mb/README   |  1 +
 src/backend/utils/mb/stringinfo_mb.c  | 92 +++
 src/include/mb/stringinfo_mb.h| 24 +++
 src/pl/plpgsql/src/pl_exec.c  | 37 ---
 src/test/regress/expected/plpgsql.out | 14 
 src/test/regress/sql/plpgsql.sql  | 13 
 8 files changed, 158 insertions(+), 35 deletions(-)
 create mode 100644 src/backend/utils/mb/stringinfo_mb.c
 create mode 100644 src/include/mb/stringinfo_mb.h

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3b85e48333..512209a38c 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -48,6 +48,7 @@
 #include "libpq/pqformat.h"
 #include "libpq/pqsignal.h"
 #include "mb/pg_wchar.h"
+#include "mb/stringinfo_mb.h"
 #include "miscadmin.h"
 #include "nodes/print.h"
 #include "optimizer/optimizer.h"
@@ -2348,7 +2349,6 @@ errdetail_params(ParamListInfo params)
 			Oid			typoutput;
 			bool		typisvarlena;
 			char	   *pstring;
-			char	   *p;
 
 			appendStringInfo(&param_str, "%s$%d = ",
 			 paramno > 0 ? ", " : "",
@@ -2364,14 +2364,7 @@ errdetail_params(ParamListInfo params)
 
 			pstring = OidOutputFunctionCall(typoutput, prm->value);
 
-			appendStringInfoCharMacro(&param_str, '\'');
-			for (p = pstring; *p; p++)
-			{
-				if (*p == '\'') /* double single quotes */
-					appendStringInfoCharMacro(&param_str, *p);
-				appendStringInfoCharMacro(&param_str, *p);
-			}
-			appendStringInfoCharMacro(&param_str, '\'');
+			appendStringInfoStringQuoted(&param_str, pstring, 0);
 
 			pfree(pstring);
 		}
diff --git a/src/backend/utils/mb/Makefile b/src/backend/utils/mb/Makefile
index 18dd758cfe..cd4a016449 100644
--- a/src/backend/utils/mb/Makefile
+++ b/src/backend/utils/mb/Makefile
@@ -16,6 +16,7 @@ OBJS = \
 	conv.o \
 	encnames.o \
 	mbutils.o \
+	stringinfo_mb.o \
 	wchar.o \
 	wstrcmp.o \
 	wstrncmp.o
diff --git a/src/backend/utils/mb/README b/src/backend/utils/mb/README
index c9bc6e6f8d..7495ca5db2 100644
--- a/src/backend/utils/mb/README
+++ b/src/backend/utils/mb/README
@@ -9,6 +9,7 @@ wchar.c:	mostly static functions and a public table for mb string and
 		multibyte conversion
 mbutils.c:	public functions for the backend only.
 		requires conv.c and wchar.c
+stringinfo_mb.c: public backend-only multibyte-aware stringinfo functions
 wstrcmp.c:	strcmp for mb
 wstrncmp.c:	strncmp for mb
 win866.c:	a tool to generate KOI8 <--> CP866 conversion table
diff --git a/src/backend/utils/mb/stringinfo_mb.c b/src/backend/utils/mb/stringinfo_mb.c
new file mode 100644
index 00..e84b9ad36f
--- /dev/null
+++ b/src/backend/utils/mb/stringinfo_mb.c
@@ -0,0 +1,92 @@
+/*-
+ *
+ * stringinfo_mb.c
+ *		Multibyte encoding-aware additional StringInfo facilities
+ *
+ * This is separate from common/stringinfo.c so that frontend users
+ * of that file need not pull in unnecessary multibyte-encoding support
+ * code.
+ *
+ *
+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/mb/stringinfo_mb.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include "mb/stringinfo_mb.h

Re: log bind parameter values on error

2019-12-10 Thread Alvaro Herrera
On 2019-Dec-07, Tom Lane wrote:

> 0002:

Here's a version of this part with fixes for these comments.  It applies
on top of the stringinfo_mb.c patch sent elsewhere in the thread.

(If we were to add a "log_parameters_on_error_max_length" GUC to decide
the length to log, we would get rid of the remaining magical numbers in
this code).


> Seems like BuildParamLogString's "valueLen" parameter ought to be called
> "maxlen", for consistency with 0001 and because "valueLen" is at best
> misleading about what the parameter means.
> 
> I'd toss the enlargeStringInfo call here too, as it seems overly
> complicated and underly correct or useful.
> 
> Probably doing MemoryContextReset after each parameter (even the last one!)
> is a net loss compared to just letting it go till the end.  Although
> I'd be inclined to use ALLOCSET_DEFAULT_SIZES not SMALL_SIZES if you
> do it like that.
> 
> Please do not throw away the existing comment "/* Free result of encoding
> conversion, if any */" in exec_bind_message.
> 
> As above, this risks generating partial multibyte characters.  You might
> be able to get away with letting appendStringInfoStringQuoted do the
> multibyte-aware truncation, but you probably have to copy more than just
> one more extra byte first.
> 
> I have zero faith in this:
> 
> +params_errcxt.arg = (void *) &((ParamsErrorCbData)
> +   { portal->name, params });
> 
> How do you know where the compiler is putting that value, ie what
> its lifespan is going to be?  Better to use an explicit variable.
> 
> I concur with dropping testlibpq5.
> 
>   regards, tom lane


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 45eca917877ea72122478fded75d5722a18d92aa Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 2 Dec 2019 16:02:48 -0300
Subject: [PATCH v21 2/2] Emit parameter values during query bind/execute
 errors

---
 doc/src/sgml/config.sgml  |  23 
 src/backend/nodes/params.c| 114 +
 src/backend/tcop/postgres.c   | 118 +++---
 src/backend/utils/misc/guc.c  |  10 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pgbench/t/001_pgbench_with_server.pl  |  35 ++
 src/include/nodes/params.h|  10 ++
 src/include/utils/guc.h   |   1 +
 8 files changed, 265 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 53ac14490a..5d1c90282f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6595,6 +6595,29 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+ 
+  log_parameters_on_error (boolean)
+  
+   log_parameters_on_error configuration parameter
+  
+  
+  
+   
+Controls whether bind parameters are logged when a statement is logged
+as a result of .
+It adds some overhead, as postgres will compute and store textual
+representations of parameter values in memory for all statements,
+even if they eventually do not get logged.
+This setting has no effect on statements logged due to
+ or
+ settings, as they are always logged
+with parameters.
+The default is off.
+Only superusers can change this setting.
+   
+  
+ 
+
  
   log_statement (enum)
   
diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c
index cf4387e40f..c3320ee1c4 100644
--- a/src/backend/nodes/params.c
+++ b/src/backend/nodes/params.c
@@ -15,11 +15,16 @@
 
 #include "postgres.h"
 
+#include "access/xact.h"
+#include "lib/stringinfo.h"
+#include "mb/stringinfo_mb.h"
 #include "nodes/bitmapset.h"
 #include "nodes/params.h"
 #include "storage/shmem.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
+#include "utils/portal.h"
 
 
 /*
@@ -44,6 +49,7 @@ makeParamList(int numParams)
 	retval->paramCompileArg = NULL;
 	retval->parserSetup = NULL;
 	retval->parserSetupArg = NULL;
+	retval->paramValuesStr = NULL;
 	retval->numParams = numParams;
 
 	return retval;
@@ -58,6 +64,8 @@ makeParamList(int numParams)
  * set of parameter values.  If dynamic parameter hooks are present, we
  * intentionally do not copy them into the result.  Rather, we forcibly
  * instantiate all available parameter values and copy the datum values.
+ *
+ * paramValuesStr is not copied, either.
  */
 ParamListInfo
 copyParamList(ParamListInfo from)
@@ -158,6 +166,8 @@ EstimateParamListSpace(ParamListInfo paramLI)
  * RestoreParamList can be used to recreate a ParamListInfo based on the
  * serialized representation; this will be a static, self-contained copy
  * just as copyParamList would create.
+ *
+ * paramValuesStr is not included.
  */
 void
 SerializeParamList(ParamListInfo paramLI, char **start_

Re: backup manifests

2019-12-10 Thread Robert Haas
On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage
 wrote:
> Please find attached patch for backup validator implementation (0004 patch). 
> This patch is based
> on Rushabh's latest patch for backup manifest.
>
> There are some functions required at the client side as well, so I have
> moved those functions and some data structures to a common place so that
> they are accessible to both (0003 patch).
>
> My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for TAP
> test cases, which is also attached. As of now, test cases related to the
> tablespace and tar backup format are missing; I will continue working on
> those and submit the complete patch.
>
> With this mail, I have attached the complete patch stack for backup manifest 
> and backup
> validate implementation.
>
> Please let me know your thoughts on the same.

Well, for the second time on this thread, please don't take a bunch of
somebody else's code and post it in a patch that doesn't attribute
that person as one of the authors. For the second time on this thread,
the person is me, but don't borrow *anyone's* code without proper
attribution. It's really important!

On a related note, it's a very good idea to use git format-patch and
git rebase -i to maintain patch stacks like this. Rushabh seems to
have done that, but the files you're posting look like raw 'git diff'
output. Notice that this gives him a way to include authorship
information and a tentative commit message in each patch, but you
don't have any of that.

Also on a related note, part of the process of adapting existing code
to a new purpose is adapting the comments. You haven't done that:

+ * Search a result-set hash table for a row matching a given filename.
...
+ * Insert a row into a result-set hash table, provided no such row is already
...
+ * Most of the values
+ * that we're hashing are short integers formatted as text, so there
+ * shouldn't be much room for pathological input.

I think that what we should actually do here is try to use simplehash.
Right now, it won't work for frontend code, but I posted some patches
to try to address that issue:

https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com

That would have a few advantages. One, we wouldn't need to know the
number of elements in advance, because simplehash can grow
dynamically. Two, we could use the iteration interfaces to walk the
hash table.  Your solution to that is pgrhash_seq_search, but that's
actually not well-designed, because it's not a generic iterator
function but something that knows specifically about the 'touch' flag.
I incidentally suggest renaming 'touch' to 'matched'; 'touch' is not
bad, but I think 'matched' will be a little more recognizable.
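
To sketch what that could look like with simplehash (everything here -- the
entry layout, the prefix, the hash function, the reporting -- is illustrative
only, not code from the patch), once the iteration interface is available
the 'matched' handling stays in the caller and the table code stays generic:

#include "postgres.h"

typedef struct manifest_file
{
	const char *filename;		/* key */
	char		status;			/* maintained by simplehash */
	bool		matched;		/* set when a file on disk matches */
} manifest_file;

static inline uint32
manifest_hash_string(const char *s)
{
	uint32		h = 5381;

	while (*s)
		h = (h << 5) + h + (unsigned char) *s++;
	return h;
}

#define SH_PREFIX manifest
#define SH_ELEMENT_TYPE manifest_file
#define SH_KEY_TYPE const char *
#define SH_KEY filename
#define SH_HASH_KEY(tb, key) manifest_hash_string(key)
#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0)
#define SH_SCOPE static inline
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"

static void
report_unmatched(manifest_hash *ht)
{
	manifest_iterator it;
	manifest_file *entry;

	manifest_start_iterate(ht, &it);
	while ((entry = manifest_iterate(ht, &it)) != NULL)
	{
		if (!entry->matched)
			printf("file \"%s\" is in the manifest but was not found\n",
				   entry->filename);
	}
}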

Please run pgindent. If needed, first add locally defined types to
typedefs.list, so that things indent properly.

It's not a crazy idea to try to share some data structures and code
between the frontend and the backend here, but I think
src/common/backup.c and src/include/common/backup.h is a far too
generic name given what the code is actually doing. It's mostly about
checksums, not backup, and I think it should be named accordingly. I
suggest removing "manifestinfo" and renaming the rest to just talk
about checksums rather than manifests. That would make it logical to
reuse this for any other future code that needs a configurable
checksum type. Also, how about adding a function like:

extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo);

...which would return true and set *algo if name is recognized, and
return false otherwise. That code could be used on both the client and
server sides of this patch, and by any future patches that want to
reuse this scaffolding.
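
Something like the following sketch, say (the enum members are placeholders
for whatever the patch actually supports, and the file placement is just a
suggestion):

#include "postgres.h"			/* or postgres_fe.h on the client side */

typedef enum ChecksumAlgorithm
{
	CHECKSUM_NONE,
	CHECKSUM_CRC32C,
	CHECKSUM_SHA256
} ChecksumAlgorithm;

bool
parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo)
{
	if (pg_strcasecmp(name, "none") == 0)
		*algo = CHECKSUM_NONE;
	else if (pg_strcasecmp(name, "crc32c") == 0)
		*algo = CHECKSUM_CRC32C;
	else if (pg_strcasecmp(name, "sha256") == 0)
		*algo = CHECKSUM_SHA256;
	else
		return false;
	return true;
}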

The file header for backup.h has the wrong filename (string.h). The
header format looks somewhat atypical compared to what we normally do,
too.

It's arguable, but I tend to think that it would be better to
hex-encode the CRC rather than printing it as an integer.  Maybe
hex_encode() is another thing that could be moved into the new
src/common file.

As I said before about Rushabh's patch set, it's very confusing that
we have so many patches here stacked up. Like, you have 0002 moving
stuff, and then 0003 moving it again. That's super-confusing. Please
try to structure the patch set so as to make it as easy to review as
possible.

Regarding the test case patch, error checks are important! Don't do
things like this:

+open my $modify_file_sha256, '>>', "$tempdir/backup_verify/postgresql.conf";
+print $modify_file_sha256 "port = \n";
+close $modify_file_sha256;

If the open fails, then it and the print and the close are going to
silently do nothing. That's bad. I don't know exactly what the
customary error-checking is for things like this in TAP tests, but I
hope it's not like this, because this has a pretty fair chance of
looking like it's testing something that it isn't. Let's figure out
what t

Re: log bind parameter values on error

2019-12-10 Thread Alvaro Herrera
On 2019-Dec-10, Alvaro Herrera wrote:

> On 2019-Dec-10, Alvaro Herrera wrote:
> 
> > On 2019-Dec-09, Tom Lane wrote:
> > 
> > > Some quick review of v19:
> > 
> > Here's v20 with all these comments hopefully addressed.
> 
> Grr, I had forgotten to git add the stringinfo.h -> pg_wchar.h changes,
> so the prototype isn't anywhere in v20.  However, after looking at it
> again, I'm not very happy with how that turned out, because pg_wchar.h
> is a frontend-exposed header.  So I instead propose to put it in a
> separate file src/include/mb/stringinfo_mb.h.

Pushed 0001.

Thanks for all the reviews

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: log bind parameter values on error

2019-12-10 Thread Alvaro Herrera
Here's a curious thing that happens with this patch.  If you have
log_duration set so that parameters are logged during the bind phase,
and then an error occurs during the execution phase but you don't have
log_parameters_on_error set true, the second error will log the
parameters nonetheless ... because they were saved in the ParamListInfo
struct by the errdetail_params() call in the check_log_durations block
during bind.

I'm not sure this is a terrible problem, but does anybody think we should
*not* save params->paramValuesStr across a log_duration when
log_parameters_on_error is not set?

(I think this is somewhat equivalent to the "was_logged" case in
check_log_duration itself.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: log bind parameter values on error

2019-12-10 Thread Alvaro Herrera
Alexey: I would appreciate it if you give this patch a spin.  Let me
know if it does what you wanted it to do.

On 2019-Dec-10, Alvaro Herrera wrote:

> Here's a curious thing that happens with this patch.  If you have
> log_duration set so that parameters are logged during the bind phase,
> and then an error occurs during the execution phase but you don't have
> log_parameters_on_error set true, the second error will log the
> parameters nonetheless ... because they were saved in the ParamListInfo
> struct by the errdetail_params() call in the check_log_durations block
> during bind.

AFAICS this has a simple solution, which is to stop saving the parameter
string in BuildParamLogString; instead, just *return* the string.
Caller can then assign it into params->paramValuesStr if appropriate
(which, in the case of errdetail_params(), it is not.)  v22 does it like
that.

There is still a much smaller issue that if you have both log_duration
set to log the params during bind, and log_parameters_on_error set to true,
the parameters will appear twice.  But I think that's correct by
definition.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 57546105624ac5184a356dc227f0820240e61b62 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Mon, 2 Dec 2019 16:02:48 -0300
Subject: [PATCH v22] Emit parameter values during query bind/execute errors
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This makes such log entries more useful, since the cause of the error
can be dependent on the parameter values.

Author: Alexey Bashtanov, Álvaro Herrera
Discussion: https://postgr.es/m/0146a67b-a22a-0519-9082-bc29756b9...@imap.cc
Reviewed-by: Peter Eisentraut, Andres Freund, Tom Lane
---
 doc/src/sgml/config.sgml  |  23 
 src/backend/nodes/params.c| 105 
 src/backend/tcop/postgres.c   | 119 +++---
 src/backend/utils/misc/guc.c  |  10 ++
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/bin/pgbench/t/001_pgbench_with_server.pl  |  35 ++
 src/include/nodes/params.h|  10 ++
 src/include/utils/guc.h   |   1 +
 8 files changed, 257 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 53ac14490a..5d1c90282f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -6595,6 +6595,29 @@ log_line_prefix = '%m [%p] %q%u@%d/%a '
   
  
 
+ 
+  log_parameters_on_error (boolean)
+  
+   log_parameters_on_error configuration parameter
+  
+  
+  
+   
+Controls whether bind parameters are logged when a statement is logged
+as a result of .
+It adds some overhead, as postgres will compute and store textual
+representations of parameter values in memory for all statements,
+even if they eventually do not get logged.
+This setting has no effect on statements logged due to
+ or
+ settings, as they are always logged
+with parameters.
+The default is off.
+Only superusers can change this setting.
+   
+  
+ 
+
  
   log_statement (enum)
   
diff --git a/src/backend/nodes/params.c b/src/backend/nodes/params.c
index cf4387e40f..907da0e7cf 100644
--- a/src/backend/nodes/params.c
+++ b/src/backend/nodes/params.c
@@ -15,11 +15,14 @@
 
 #include "postgres.h"
 
+#include "access/xact.h"
+#include "mb/stringinfo_mb.h"
 #include "nodes/bitmapset.h"
 #include "nodes/params.h"
 #include "storage/shmem.h"
 #include "utils/datum.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 
 
 /*
@@ -44,6 +47,7 @@ makeParamList(int numParams)
 	retval->paramCompileArg = NULL;
 	retval->parserSetup = NULL;
 	retval->parserSetupArg = NULL;
+	retval->paramValuesStr = NULL;
 	retval->numParams = numParams;
 
 	return retval;
@@ -58,6 +62,8 @@ makeParamList(int numParams)
  * set of parameter values.  If dynamic parameter hooks are present, we
  * intentionally do not copy them into the result.  Rather, we forcibly
  * instantiate all available parameter values and copy the datum values.
+ *
+ * paramValuesStr is not copied, either.
  */
 ParamListInfo
 copyParamList(ParamListInfo from)
@@ -158,6 +164,8 @@ EstimateParamListSpace(ParamListInfo paramLI)
  * RestoreParamList can be used to recreate a ParamListInfo based on the
  * serialized representation; this will be a static, self-contained copy
  * just as copyParamList would create.
+ *
+ * paramValuesStr is not included.
  */
 void
 SerializeParamList(ParamListInfo paramLI, char **start_address)
@@ -251,3 +259,100 @@ RestoreParamList(char **start_address)
 
 	return paramLI;
 }
+
+/*
+ * BuildParamLogString
+ *		Return a string that represent the parameter list, for logging.
+ *
+ * If caller already knows textual rep

Re: Memory-Bounded Hash Aggregation

2019-12-10 Thread Adam Lee
On Wed, Dec 04, 2019 at 10:57:51PM -0800, Jeff Davis wrote:
> > About the `TODO: project needed attributes only` in your patch, when
> > would the input tuple contain columns not needed? It seems like
> > anything
> > you can project has to be in the group or aggregates.
> 
> If you have a table like:
> 
>CREATE TABLE foo(i int, j int, x int, y int, z int);
> 
> And do:
> 
>SELECT i, SUM(j) FROM foo GROUP BY i;
> 
> At least from a logical standpoint, you might expect that we project
> only the attributes we need from foo before feeding them into the
> HashAgg. But that's not quite how postgres works. Instead, it leaves
> the tuples intact (which, in this case, means they have 5 attributes)
> until after aggregation and lazily fetches whatever attributes are
> referenced. Tuples are spilled from the input, at which time they still
> have 5 attributes; so naively copying them is wasteful.
> 
> I'm not sure how often this laziness is really a win in practice,
> especially after the expression evaluation has changed so much in
> recent releases. So it might be better to just project all the
> attributes eagerly, and then none of this would be a problem. If we
> still wanted to be lazy about attribute fetching, that should still be
> possible even if we did a kind of "logical" projection of the tuple so
> that the useless attributes would not be relevant. Regardless, that's
> outside the scope of the patch I'm currently working on.
> 
> What I'd like to do is copy just the attributes needed into a new
> virtual slot, leave the unneeded ones NULL, and then write it out to
> the tuplestore as a MinimalTuple. I just need to be sure to get the
> right attributes.
> 
> Regards,
>   Jeff Davis

Melanie and I tried this and have a patch that passes installcheck. The way
we verified it is by composing a wide table with long, unnecessary text
columns, then checking the size it writes on every iteration.

Please check out the attachment; it's based on your 1204 version.

-- 
Adam Lee
diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index f509c8e8f5..fe4a520305 100644
--- a/src/backend/executor/nodeAgg.c
+++ b/src/backend/executor/nodeAgg.c
@@ -1291,6 +1291,68 @@ project_aggregates(AggState *aggstate)
return NULL;
 }
 
+static bool
+find_aggregated_cols_walker(Node *node, Bitmapset **colnos)
+{
+   if (node == NULL)
+   return false;
+
+   if (IsA(node, Var))
+   {
+   Var*var = (Var *) node;
+
+   *colnos = bms_add_member(*colnos, var->varattno);
+
+   return false;
+   }
+   return expression_tree_walker(node, find_aggregated_cols_walker,
+                                 (void *) colnos);
+}
+
+/*
+ * find_aggregated_cols
+ *   Construct a bitmapset of the column numbers of aggregated Vars
+ *   appearing in our targetlist and qual (HAVING clause)
+ */
+static Bitmapset *
+find_aggregated_cols(AggState *aggstate)
+{
+   Agg*node = (Agg *) aggstate->ss.ps.plan;
+   Bitmapset  *colnos = NULL;
+   ListCell   *temp;
+
+   /*
+* We only want the columns used by aggregations in the targetlist or qual
+*/
+   if (node->plan.targetlist != NULL)
+   {
+   foreach(temp, (List *) node->plan.targetlist)
+   {
+   if (IsA(lfirst(temp), TargetEntry))
+   {
+       Node *node = (Node *)((TargetEntry *)lfirst(temp))->expr;
+       if (IsA(node, Aggref) || IsA(node, GroupingFunc))
+           find_aggregated_cols_walker(node, &colnos);
+   }
+   }
+   }
+
+   if (node->plan.qual != NULL)
+   {
+   foreach(temp, (List *) node->plan.qual)
+   {
+   if (IsA(lfirst(temp), TargetEntry))
+   {
+       Node *node = (Node *)((TargetEntry *)lfirst(temp))->expr;
+       if (IsA(node, Aggref) || IsA(node, GroupingFunc))
+           find_aggregated_cols_walker(node, &colnos);
+   }
+   }
+   }
+
+   return colnos;
+}
+
 /*
  * find_unaggregated_cols
  *   Construct a bitmapset of the column numbers of un-aggregated Vars
@@ -1520,6 +1582,23 @@ find_hash_columns(AggState *aggstate)
for (i = 0; i < perhash->numCols; i++)
colnos = bms_add_member(colnos, grpColIdx[i]);
 
+   /*
+* Find the columns used by aggregations
+*
+* This is shared by the entire aggregation.
+*/
+   if (aggstate->aggregated_columns == NULL)
+       aggstate->aggregated_columns = find_aggregated_cols(aggstate);
+
+   /*
+* The nec

Re: Contention on LWLock buffer_content, due to SHARED lock(?)

2019-12-10 Thread Jens-Wolfhard Schicke-Uffmann
Hi,

On Tue, Dec 10, 2019 at 08:44:17AM -0800, Andres Freund wrote:
> > today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> > cores) lock contention on a buffer content lock due to taking of a
> > SHARED lock (I think):
> When you say "7000 active transactions" - do you mean to say that you
> have set max_connections to something higher than that, and you actually
> have that many concurrent transactions?
Yes, max connections was 2, active connections around 7000 at that
time. Unfortunately, I don't have actual numbers of connections in
transactions for that point in time. (We were trying to establish
maximum performance of a larger system.)

> > Semantically, all that lock traffic was superfluous, as the
> > global_config row's key was in no danger of being changed.
> Well, postgres can't know that.
I am aware; it's just an argument for why it might be possible to
shove some optimization there.

> > 1. Does the above analysis sound about right?
> Hard to know without additional data.
What data would be worth recording next time? (Except number of
active transactions, obviously.)


> > 2. If so, would it be worthwhile to develop a solution?
> Possible, but I'm not sure it's worth the complexity.
>
> I'd definitely like to see a proper reproducer and profile for this,
> before investigating further.
I'll see if and when I can include this into my client's project
schedule. Might be a while, but I'll get back to you when I have
a reproducer + profile data (of an up-to-date vanilla Postgres,
not 10.7+AWS aurora patches).


> I think we'd need a very convincing use-case for improvements around the 
> problem
> you outline.
Understood. I'll try to get an iron-clad profile of the problematic case
first.


Regards,
  Drahflow


signature.asc
Description: PGP signature


Re: allowing broader use of simplehash

2019-12-10 Thread Andres Freund
Hi,

Neat!

On 2019-12-10 13:07:02 -0500, Robert Haas wrote:
> I recently became annoyed while working on patch A that I could not
> use simplehash in shared memory, and then I became annoyed again while
> working on patch B that I could not use simplehash in frontend code.
> So here are a few patches for discussion.

I wanted to use it in frontend code a couple times as well.


> A significant problem in either case is that a simplehash wants to
> live in a memory context; no such thing exists either for data in
> shared memory nor in frontend code. However, it seems to be quite easy
> to provide a way for simplehash to be defined so that it doesn't care
> about memory contexts. See 0001.

I wonder if we shouldn't instead just go for an "implicit" memory
context. It's a bit ugly to have a growing set of different
signatures.


> As far as frontend code goes, the only other problem I found is that
> it makes use of elog() to signal some internal-ish messages. It seemed
> to me that the easiest thing to do was, if FRONTEND is defined, use
> pg_log_error(...) instead of elog(ERROR, ...). For the one instance of
> elog(LOG, ...) in simplehash.h, I chose to use pg_log_info(). It's not
> really equivalent, but it's probably the closest thing that currently
> exists, and I think it's good enough for what's basically a debugging
> message. See 0002.

Yea, I think that's fine.


> I think those changes would also be enough to allow simplehash to be
> used in a dynamic shared area (DSA). Using it in the main shared
> memory segment seems more problematic, because simplehash relies on
> being able to resize the hash table. Shared hash tables must have a
> fixed maximum size, but with dynahash, we can count on being able to
> use all of the entries without significant performance degradation.
> simplehash, on the other hand, uses linear probing and relies on being
> able to grow the hash table as a way of escaping collisions. By
> default, the load factor is not permitted to drop below 0.1, so to
> mimic the collision-avoidance behavior that we get in backend-private
> uses of simplehash, we'd have to overallocate by 10x, which doesn't
> seem desirable.

It'd be fine to set SH_GROW_MIN_FILLFACTOR to something higher, for many
uses. I've only added that after the fact, because somebody demonstrated
a workload with SQL level data that had a *lot* of conflicts with our
hash functions. But that shouldn't be a concern for most other uses.
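
E.g. something along these lines, assuming the fill-factor knob can be
overridden per instantiation (all names here are made up for illustration):

#include "postgres.h"
#include "utils/hashutils.h"

typedef struct DsaEntry
{
	uint32		key;
	char		status;
	uint32		value;
} DsaEntry;

#define SH_PREFIX dsaent
#define SH_ELEMENT_TYPE DsaEntry
#define SH_KEY_TYPE uint32
#define SH_KEY key
#define SH_HASH_KEY(tb, key) murmurhash32(key)
#define SH_EQUAL(tb, a, b) ((a) == (b))
#define SH_SCOPE static inline
#define SH_GROW_MIN_FILLFACTOR 0.4	/* accept a denser table before growing */
#define SH_DECLARE
#define SH_DEFINE
#include "lib/simplehash.h"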


> I'd really like to have an alternative to dynahash, which is awkward
> to use and probably not particularly fast, but I'm not sure simplehash
> is it.  Maybe what we really need is a third (or nth) hash table
> implementation.

I think it depends a bit on what use-cases you'd like to cover? I think
there's unfortunately a lot of tradeoffs here that are hard to square:

1) For performance, we want the hashtable code to be specialized for the
   specific key/value combination. I'm not aware of a way to do that
   without some code generation thing like simplehash. Being able to use
   simpler pointer math by having fixed sizes, and avoiding indirect
   function calls, is important for performance.

   It's fairly annoying to have to do the song-and-dance for simplehash
   when it's just a local lookup table or something.

2) For performance, using a chained hashtable turns out to be
   problematic, if the hashtable will often get big (for small tables
   the CPU caches makes it ok).  It's hard to avoid reallocations
   (and/or smoother growth) for non-chaining tables however.

3) For lots of one-off uses of hashtables that aren't performance
   critical, we want a *simple* API. That IMO would mean that key/value
   end up being separately allocated pointers, and that just a
   comparator is provided when creating the hashtable.

4) For some hashtables it's important to be very concurrent - but it's
   considerably harder to do that with an open addressing one.

While I don't think it's possible to avoid compromise on all these
aspects, I think it'd be a lot more realistic to have one implementation
fulfilling most needs (except perhaps the concurrency part) if we didn't
have the limitations of C.  This kind of thing really is one where
e.g. C++ style templates are just extremely hard to beat in C.

Greetings,

Andres Freund




Re: Contention on LWLock buffer_content, due to SHARED lock(?)

2019-12-10 Thread Jens-Wolfhard Schicke-Uffmann
Hi,

On Tue, Dec 10, 2019 at 03:07:05PM -0300, Alvaro Herrera wrote:
> I'd rather have the ability to mark a table READ ONLY (or similar).
> Then any FK references can skip the row locks altogether.  For the rare
> cases where you need to modify the referenced table, have it marked READ
> WRITE, and any row locks are registered normally from that point on,
> until you set it back to READ ONLY again.
However, that would require changes to applications writing to the table
and a good understanding of performance characteristics by everyone
trying to get to that scale. (OTOH, there is certainly an argument to be
made that whoever hits this kind of problem had better have an idea of
postgres performance tuning anyway.)

More troubling (to me) is that I already know of another table in the
system which should be next-in-line for the same problem, but only on
some rows: It represents accounting entities, of which a very (nearly
static) few are payment processors and all others are customers. From
the application's perspective there's not too much difference between
those, but any customer row will typically only be share locked once,
whereas share locks on payment processor rows will be held by most of
the transactions currently active.

That use-case is not very uncommon I think, so it might be worthwhile
to implement a solution which does not require all rows of a table to
share similar lock contention characteristics, or writability.


Regards,
  Drahflow


signature.asc
Description: PGP signature


Re: Contention on LWLock buffer_content, due to SHARED lock(?)

2019-12-10 Thread Alvaro Herrera
On 2019-Dec-10, Jens-Wolfhard Schicke-Uffmann wrote:

> More troubling (to me) is that I already know of another table in the
> system which should be next-in-line for the same problem, but only on
> some rows: It represents accounting entities, of which a very (nearly
> static) few are payment processors and all others are customers. From
> the application's perspective there's not too much difference between
> those, but any customer row will typically only be share locked once,
> whereas share locks on payment processor rows will be held by most of
> the transactions currently active.

Well, you could partition that table.  This probably means you'll need
to improve Postgres's implementation of PKs on partitioned tables, though.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Contention on LWLock buffer_content, due to SHARED lock(?)

2019-12-10 Thread Andres Freund
Hi,

On 2019-12-10 22:44:17 +0100, Jens-Wolfhard Schicke-Uffmann wrote:
> On Tue, Dec 10, 2019 at 08:44:17AM -0800, Andres Freund wrote:
> > > today I observed (on a r5.24xlarge AWS RDS instance, i.e. 96 logical
> > > cores) lock contention on a buffer content lock due to taking of a
> > > SHARED lock (I think):
> > When you say "7000 active transactions" - do you mean to say that you
> > have set max_connections to something higher than that, and you actually
> > have that many concurrent transactions?
> Yes, max connections was 2, active connections around 7000 at that
> time. Unfortunately, I don't have actual numbers of connections in
> transactions for that point in time. (We were trying to establish
> maximum performance of a larger system.)

I'd strongly recommend changing your architecture. There's *severe*
overhead in that many concurrent active connections (and some in such a
high max_connections setting). It's likely that you'd be much, much
better off by putting a query pooler in front that limits active
transactions to a significantly smaller number. There are only so many CPU
cores, so at some point adding more concurrency just increases the
overall amount of work that needs to be done (due to the overhead of
managing concurrency and context switches).


> > > 1. Does the above analysis sound about right?
> > Hard to know without additional data.
> What data would be worth recording next time? (Except number of
> active transactions, obviously.)

I think we'd need a CPU profile for starters. But that unfortunately
won't be possible on RDS...

Greetings,

Andres Freund




Re: Wrong assert in TransactionGroupUpdateXidStatus

2019-12-10 Thread Andres Freund
Hi,

Amit, Robert, IIRC that's mostly your feature?

On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote:
> While testing, my colleague Vignesh has hit an assert in
> TransactionGroupUpdateXidStatus.  But that is not reproducible.  After
> some analysis and code review, I have found the reason for the same.
> 
> As shown in the below code, there is an assert in
> TransactionGroupUpdateXidStatus, which assumes that an overflowed
> transaction can never get registered for the group update.  But,
> actually, that is not true because while registering the transaction
> for group update, we only check how many committed children this
> transaction has because all aborted sub-transaction would have already
> updated their status.  So if the transaction once overflowed but later
> all its children are aborted (i.e remaining committed children are <=
> THRESHOLD_SUBTRANS_CLOG_OPT) then it will be registered for the group
> update.

> /*
> * Overflowed transactions should not use group XID status update
> * mechanism.
> */
> Assert(!pgxact->overflowed);
> 
> A solution could be either we remove this assert or change this assert
> to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT);

Maybe I'm missing something, but isn't this a bug then? IIRC we can't
rely on MyProc->subxids once we have overflowed, even if since then the
remaining number of children has become low enough. It seems to me that
the actual fix here is to correct the condition in
TransactionIdSetPageStatus() checking whether group updates are possible
- it seems it'd need to verify that the transaction isn't
overflowed.


Also, it's somewhat odd that TransactionIdSetPageStatus() first has

/* Can't use group update when PGPROC overflows. */
StaticAssertStmt(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
                 "group clog threshold less than PGPROC cached subxids");

and then, within an if():

/*
 * We don't try to do group update optimization if a process has
 * overflowed the subxids array in its PGPROC, since in that case we
 * don't have a complete list of XIDs for it.
 */
Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS);

Even if these weren't redundant, it can't make sense to test such a
static condition only within an if? Is it possible this was actually
intended to test something different?

Greetings,

Andres Freund




Re: xact_start for walsender & logical decoding not updated

2019-12-10 Thread Andres Freund
Hi,

On 2019-12-10 12:56:56 +0100, Tomas Vondra wrote:
> On Mon, Dec 09, 2019 at 04:04:40PM -0800, Andres Freund wrote:
> > On 2019-12-10 00:44:09 +0100, Tomas Vondra wrote:
> > > I think there's a minor bug in pg_stat_activity tracking of walsender
> > > processes. The issue is that xact_start is only updated at the very
> > > beginning when the walsender starts (so it's almost exactly equal to
> > > backend_start) and then just flips between NULL and that value.
> > > 
> > > Reproducing this is trivial - just create a publication/subscription
> > > with the built-in logical replication, and run arbitrary workload.
> > > You'll see that the xact_start value never changes.
> > > 
> > > I think the right fix is calling SetCurrentStatementStartTimestamp()
> > > right before StartTransactionCommand() in ReorderBufferCommit, per the
> > > attached patch.

> > 
> > > diff --git a/src/backend/replication/logical/reorderbuffer.c 
> > > b/src/backend/replication/logical/reorderbuffer.c
> > > index 53affeb877..5235fb31b8 100644
> > > --- a/src/backend/replication/logical/reorderbuffer.c
> > > +++ b/src/backend/replication/logical/reorderbuffer.c
> > > @@ -1554,7 +1554,10 @@ ReorderBufferCommit(ReorderBuffer *rb, 
> > > TransactionId xid,
> > >   if (using_subtxn)
> > >   BeginInternalSubTransaction("replay");
> > >   else
> > > + {
> > > + SetCurrentStatementStartTimestamp();
> > >   StartTransactionCommand();
> > > + }
> > 
> > I'm quite doubtful this is useful. To me this seems to do nothing but
> > add the overhead of timestamp computation - which isn't always that
> > cheap. I don't think you really can draw meaning from this?
> > 
> 
> I don't want to use this timestamp directly, but it does interfere with
> monitoring of long-running transactions that looks at pg_stat_activity.
> With the current behavior, the walsender entries have ancient timestamps
> and produce random blips in monitoring. Of course, it's possible to edit
> the queries to skip entries with backend_type = walsender, but that's a
> bit inconvenient.

Oh, I'm not suggesting that we shouldn't fix this somehow, just that I'm
doubtful that adding a lot of additional
SetCurrentStatementStartTimestamp() calls is the right thing. Besides
the overhead, it'd also just not be a meaningful value here - neither is
it an actual transaction, nor is it the right thing to be monitoring
when concerned about bloat or such.

It seems like it might be better to instead cause NULL to be returned
for the respective column in pg_stat_activity etc?
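
I.e. roughly something like this in pg_stat_get_activity() (the column
index and the exact field names are from memory, so this is only a sketch):

/* xact_start: walsenders don't have a meaningful transaction start */
if (beentry->st_backendType != B_WAL_SENDER &&
    beentry->st_xact_start_timestamp != 0)
    values[i] = TimestampTzGetDatum(beentry->st_xact_start_timestamp);
else
    nulls[i] = true;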

Greetings,

Andres Freund




Re: pg_control_init() bug

2019-12-10 Thread Tom Lane
"Bossart, Nathan"  writes:
> I noticed that pg_control_init() is failing an assertion on 13devel:

Hmm, yeah.  In a non-assert build I get

regression=# select * from pg_control_init();
ERROR:  function return row and query-specified return row do not match
DETAIL:  Returned row contains 12 attributes, but query expects 11.

> The attached patch seems to clear this up.  I think this was missed in
> 2e4db241.

Evidently.  Thanks for the report!

regards, tom lane




Re: Windows UTF-8, non-ICU collation trouble

2019-12-10 Thread Thomas Munro
On Tue, Dec 10, 2019 at 10:29 PM Noah Misch  wrote:
> On Tue, Dec 10, 2019 at 03:41:15PM +1300, Thomas Munro wrote:
> > I ran a variation of your program on Appveyor's Studio/Server 2019
> > image, and the result was the same: it thinks that cmp(s1, s2) == 0,
> > cmp(s2, s3) == 0, but cmp(s1, s3) == 1, so the operator fails to be
> > transitive.
>
> If that test is captured in self-contained artifacts (a few config files, a
> public git repository, etc.), could you share them?  If not, no need to
> assemble such artifacts.  I probably won't use them, but I'd be curious to
> browse them if you've already assembled them.

https://ci.appveyor.com/project/macdice/locale-sort
https://github.com/macdice/locale-sort

To understand which operating systems the images mentioned in
appveyor.yml correspond to:

https://www.appveyor.com/docs/windows-images-software/

> This does suggest some set of CompareString* parameters is free from the
> problem.  If that's right, we could offer collations based on that.  (I'm not
> sure it would be worth offering; ICU may be enough.)

It would be nice to get to the bottom of that (for example, what is
the relationship between names like "Korean_XXX" and names like
"ko-KR"?), but I'm unlikely to investigate further (I have enough
trouble getting N kinds of Unix to do what I want).  Generally I like
the idea of continuing to support and recommend both operating system
and ICU locales for different use cases.  It should be easy to get all
the software on your system to agree on ordering, which seems like a
thing you should want as an application designer.  The lack of
versioning is not a problem on Windows (see
https://commitfest.postgresql.org/26/2351/).




Re: stress test for parallel workers

2019-12-10 Thread Thomas Munro
On Tue, Oct 15, 2019 at 4:50 AM Tom Lane  wrote:
> > Filed at
> > https://bugzilla.kernel.org/show_bug.cgi?id=205183

For the curious-and-not-subscribed, there's now a kernel patch
proposed for this.  We guessed pretty close, but the problem wasn't
those dodgy looking magic numbers, it was that the bad stack expansion
check only allows for user space to expand the stack
(FAULT_FLAG_USER), and here the kernel itself wants to build a stack
frame.




Re: 'Invalid lp' during heap_xlog_delete

2019-12-10 Thread Daniel Wood
> On December 6, 2019 at 3:06 PM Andres Freund  wrote:
...
> > crash
> > smgrtruncate - Not reached
> 
> This seems like a somewhat confusing description to me, because
> smgrtruncate() is what calls DropRelFileNodeBuffers(). I assume what you
> mean by "smgrtruncate" is not the function, but the smgr_truncate()
> callback?

My mistake.  Yes, smgr_truncate()


> > Even if we reach the truncate we don't fsync it till the next
> > checkpoint.  So on filesystems which delay metadata updates a crash
> > can lose the truncate.
> 
> I think that's probably fine though. Leaving the issue of checkpoint
> completing in between the DropRelFileNodeBuffers() and the actual
> truncation aside, we'd have the WAL logged truncation truncating the
> file. I don't think it's unreasonable to expect a filesystem that claims
> to support running without full_page_writes (I've seen several such
> claims turning out not to be true under load), to preserve either the
> original page contents or the new file size after a crash.  If your
> filesystem doesn't, you really ought not to use it with FPW = off.

If the physical truncate doesn't occur in the seconds after the cache
invalidation, nor the fsync within the minutes until the next checkpoint,
you are NOT left with a torn page on disk.  You are left with an
'incorrect' page on disk - in other words, an older page, because the
invalidation prevented the write of the most recent dirty page.
Redo doesn't like old incorrect pages.
But, yes, full-page writes cover up this anomaly (to be generous).

> > Once we do the fsync(), for the truncate, the REDO read will return
> BLK_NOTFOUND and the DELETE REDO attempt will be skipped.  Without the
> > fsync() or crashing before the truncate, the delete redo depends on
> > the most recent version of the page having been written by the
> > checkpoint.
> 
> We also need to correctly replay this on a standby, so I don't think
> just adding an smgrimmedsync() is the answer. We'd not replay the
> truncation on standbys / during PITR, unless I miss something. So we'd just
> end up with the same problem in slightly different situations.

I haven't mentioned to you that we have seen what appears to be the same
problem during PITRs, depending on which base backup we start with.  I didn't
mention it because I haven't created a repro to prove it.  I simply suspect it.

> To me it sounds the fix here would be to rejigger the RelationTruncate()
> sequence for truncation of the main fork as follows:
> 
> 1) MyPgXact->delayChkpt = true
> 2) XLogInsert(XLOG_SMGR_TRUNCATE)
> 3) smgrtruncate() (which, as now, first does a DropRelFileNodeBuffers(),
>and then calls the smgr_truncate callback)
> 4) MyPgXact->delayChkpt = false
> 
> I'm not worried about the increased delayChkpt = true time. Compared
> with the frequency of RecordTransactionCommit() this seems harmless.

Seems reasonable.

> I'm inclined to think that we should make the XLogFlush() in the
> RelationNeedsWAL() branch of RelationTruncate()
> unconditional. Performing the truncation on the filesystem level without
> actually having persisted the corresponding WAL record is dangerous.

Yes, I also was curious about why it was conditional.

- Dan Wood




Re: Wrong assert in TransactionGroupUpdateXidStatus

2019-12-10 Thread Amit Kapila
On Wed, Dec 11, 2019 at 4:02 AM Andres Freund  wrote:
>
> Hi,
>
> Amit, Robert, IIRC that's mostly your feature?
>

I will look into this today.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Session WAL activity

2019-12-10 Thread Kyotaro Horiguchi
At Fri, 6 Dec 2019 11:22:14 +0300, Konstantin Knizhnik 
 wrote in 
> 
> 
> On 06.12.2019 4:57, Michael Paquier wrote:
> > On Thu, Dec 05, 2019 at 12:23:40PM +0300, Konstantin Knizhnik wrote:
> > Please see pgstat.h, close to pgstat_report_wait_start().
> 
> Sorry, I do not understand what should I look for?
> Do you mean this comment:
>     /*
>      * Since this is a four-byte field which is always read and
> written as
>      * four-bytes, updates are atomic.
>  */
> 
> Yes, I have already noticed that since walWritten is 64-bit, its
> update is not atomic on 32-bit platforms, so it is possible to
> sometimes see incorrect values.
> So monotonic observation of walWritten can be violated. From my point
> of view it is not so critical to enforce updates of this field under a
> lock, or to accumulate the result in a local variable and only write it
> to the backend status at commit time as Kyotaro proposed. Monitoring of
> WAL activity is especially interesting for long-living transactions,
> and from my point of view it is much more important to be able to see
> up-to-date but possibly not always correct information than to see no
> information at all before commit.
> Please also take into account the percentage of 32-bit Postgres
> installations and the probability of observing a non-atomic update of
> the 64-bit walWritten field (I think you will have no chance to see it
> even if you run Postgres for years).

Still I'm not sure a non-atomic write is acceptable, but I agree on the
necessity of updating it during a transaction. Couldn't we update the
shared stats every n bytes (XLOG_BLCKSZ or such) or at every command end?

I think we should refrain from inserting an instruction within the
WALInsertLock section, but I'm not sure which is better between "var
+= val" within the section and "if (inserted) var += val;" outside. If
we can ignore the possibility of the case where the xlog switch is
omitted, the "if (inserted)" is not needed.
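
For example (hypothetical sketch only; I am assuming the counter
proposed in this thread ends up somewhere per-backend, say
MyProc->walWritten, and the names below are illustrative):

static uint64 pendingWalWritten = 0;

/* called from the WAL-insert path, outside the insertion lock */
static void
ReportWalWritten(uint64 nbytes)
{
	pendingWalWritten += nbytes;

	/* publish only every so often, to keep the hot path cheap */
	if (pendingWalWritten >= XLOG_BLCKSZ)
	{
		MyProc->walWritten += pendingWalWritten;
		pendingWalWritten = 0;
	}
}

Since only the owning backend writes the counter, the delayed "+=" is
safe; readers merely see a slightly stale value.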

> But what I mean by "wrongly report backend wait event status" is that
> pg_stat_activity may report wait event status for the wrong backend.
> I.e. if a backend has already terminated and its PGPROC entry is reused
> by some other backend, then you can see incorrect wait event
> information:
> a backend with such a PID actually never slept on this event.

I saw a case where an entry with very old xact_start_timestamp
suddenly popped up in pg_stat_activity, but I haven't found the cause.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Fetching timeline during recovery

2019-12-10 Thread Michael Paquier
On Thu, Sep 26, 2019 at 07:20:46PM +0200, Jehan-Guillaume de Rorthais wrote:
> If this solution is accepted, some other function of the same family might be
> good candidates as well, for the sake of homogeneity:
> 
> * pg_current_wal_insert_lsn
> * pg_current_wal_flush_lsn
> * pg_last_wal_replay_lsn
> 
> However, I'm not sure how useful this would be.
> 
> Thanks again for your time, suggestions and review!

+{ oid => '3435', descr => 'current wal flush location',
+  proname => 'pg_last_wal_receive_lsn', provolatile => 'v', proisstrict => 'f',
This description is incorrect.

And please use OIDs in the range of 8000~ for patches in
development.  You could just use src/include/catalog/unused_oids which
would point out a random range.

+   if (recptr == 0) {
+   nulls[0] = 1;
+   nulls[1] = 1;
+   }
The indentation of the code is incorrect, these should use actual
booleans and recptr should be InvalidXLogRecPtr (note also the
existence of the macro XLogRecPtrIsInvalid).  Just for the style.
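
For instance, something like this (style only):

if (XLogRecPtrIsInvalid(recptr))
{
    nulls[0] = true;
    nulls[1] = true;
}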

As said in the last emails exchanged on this thread, I don't see why
you cannot use multiple functions which have different meanings
depending on whether the cluster is a primary or a standby, knowing that we
have two different concepts of WAL when at recovery: the received
LSN and the replayed LSN, and three concepts for primaries (insert,
current, flush).  I agree as well with the point of Fujii-san about
not returning the TLI and the LSN across different functions as this
opens the door for a risk of inconsistency for the data received by
the client.

+ * When the first parameter (variable 'with_tli') is true, returns the current
+ * timeline as second field. If false, second field is null.
I don't see much point in having this input parameter which
determines the NULL-ness of one of the result columns, and I think
that you had better use a completely different function name for each
one of them instead of forcing this onto the existing functions.  Let's remember that a
lot of tools use the existing functions directly in the SELECT clause
for LSN calculations, which is just a 64-bit integer *without* a
timeline assigned to it.  However your patch mixes both concepts by
using pg_current_wal_lsn.

So we could do more with the introduction of five new functions that
allow grabbing the LSN and the TLI in use for the replay, received, insert,
write and flush positions:
- pg_current_wal_flush_info
- pg_current_wal_insert_info
- pg_current_wal_info
- pg_last_wal_receive_info
- pg_last_wal_replay_info

I would be actually tempted to do the following: one single SRF
function, say pg_wal_info, which takes a text argument as input with
the following values: flush, write, insert, receive, replay.  Thinking
more about it that would be rather neat, and more extensible than the
rest discussed until now.  See for example PostgresNode::lsn.
--
Michael


signature.asc
Description: PGP signature


Re: Wrong assert in TransactionGroupUpdateXidStatus

2019-12-10 Thread Amit Kapila
On Wed, Dec 11, 2019 at 4:02 AM Andres Freund  wrote:
> On 2019-12-10 13:55:40 +0530, Dilip Kumar wrote:
>
> > /*
> > * Overflowed transactions should not use group XID status update
> > * mechanism.
> > */
> > Assert(!pgxact->overflowed);
> >
> > A solution could be either we remove this assert or change this assert
> > to Assert(pgxact->nxids <= THRESHOLD_SUBTRANS_CLOG_OPT);
>
> Maybe I'm missing something, but isn't this a bug then? IIRC we can't
> rely on MyProc->subxids once we overflowed, even if since then the
> remaining number of children has become low enough.
>

AFAICS, MyProc->subxids is maintained properly if the number of
subtransactions is less than PGPROC_MAX_CACHED_SUBXIDS (64).  Can
you explain the case where that won't be true?  Also, even if what you
are saying is true, I think the memcmp in TransactionIdSetPageStatus
should keep us from making a wrong decision.


>
> Also, it's somewhat odd that TransactionIdSetPageStatus() first has
>
> /* Can't use group update when PGPROC overflows. */
> StaticAssertStmt(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS,
>                  "group clog threshold less than PGPROC cached subxids");
>
> and then, within an if():
>
> /*
>  * We don't try to do group update optimization if a process has
>  * overflowed the subxids array in its PGPROC, since in that case we
>  * don't have a complete list of XIDs for it.
>  */
> Assert(THRESHOLD_SUBTRANS_CLOG_OPT <= PGPROC_MAX_CACHED_SUBXIDS);
>
> Even if these weren't redundant, it can't make sense to test such a
> static condition only within an if?
>

I don't remember exactly the reason for this, but now I don't find the
Assert within if () meaningful.  I think we should remove the Assert
inside the if() unless Robert or someone sees a use for it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: Start Walreceiver completely before shut down it on standby server.

2019-12-10 Thread Kyotaro Horiguchi
At Tue, 10 Dec 2019 10:40:53 -0800, Ashwin Agrawal  wrote 
in 
> On Tue, Dec 10, 2019 at 3:06 AM jiankang liu  wrote:
> 
> > Start the walreceiver completely before shutting it down on the standby server.
> >
> > The walreceiver will be shut down when an invalid record is read in the
> > WAL streamed from the master. And then, we retry from archive/pg_wal again.
> >
> > After that, we start the walreceiver in RequestXLogStreaming(), and read
> > records from the WAL stream. But before the walreceiver starts, we read
> > data from the file which was streamed over and is present in pg_wal from
> > last time, because walrcv->receivedUpto > RecPtr and the WAL is actually
> > flushed on disk. Now, we read the invalid record again; what to do next?
> > Shut down the walreceiver and do it all again.
> >
> 
> I am missing something here: if walrcv->receivedUpto > RecPtr, why are we
> getting / reading an invalid record?

My bet is that the standby is connecting to the wrong master. For
example, something like that happens when the master has been
reinitialized from a backup and has gone through a different history,
and the standby was then initialized from the reborn master but the
stale archive files on the standby were left alone.

Anyway, that cannot happen on a correctly running replication setup,
and the remedy in that case is to start from a new base backup of the
master, making sure to erase any stale archive files.

About the proposed fix, it doesn't seem to make the startup process
rewind WAL to that LSN. And even if that happened, it would lead to
nothing better than a broken database.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Reorderbuffer crash during recovery

2019-12-10 Thread vignesh C
> I found a couple of crashes in reorderbuffer while reviewing/testing
> logical_work_mem and logical streaming of large in-progress
> transactions. Stack trace of the same are given below:
> Issue 1:
> #0  0x7f985c7d8337 in raise () from /lib64/libc.so.6
> #1  0x7f985c7d9a28 in abort () from /lib64/libc.so.6
> #2  0x00ec514d in ExceptionalCondition
> (conditionName=0x10eab34 "!dlist_is_empty(head)", errorType=0x10eab24
> "FailedAssertion",
> fileName=0x10eab00 "../../../../src/include/lib/ilist.h",
> lineNumber=458) at assert.c:54
> #3  0x00b4fd13 in dlist_tail_element_off (head=0x338fe60,
> off=48) at ../../../../src/include/lib/ilist.h:458
> #4  0x00b547b7 in ReorderBufferAbortOld (rb=0x32ae7a0,
> oldestRunningXid=895) at reorderbuffer.c:1910
> #5  0x00b3cb5e in DecodeStandbyOp (ctx=0x33424b0,
> buf=0x7fff7e7b1e40) at decode.c:332
> #6  0x00b3c363 in LogicalDecodingProcessRecord (ctx=0x33424b0,
> record=0x3342770) at decode.c:121
> #7  0x00b704b2 in XLogSendLogical () at walsender.c:2845
> #8  0x00b6e9f8 in WalSndLoop (send_data=0xb7038b
> ) at walsender.c:2199
> #9  0x00b6bbf5 in StartLogicalReplication (cmd=0x33167a8) at
> walsender.c:1128
> #10 0x00b6ce83 in exec_replication_command
> (cmd_string=0x328a0a0 "START_REPLICATION SLOT \"sub1\" LOGICAL 0/0
> (proto_version '1', publication_names '\"pub1\"')")
> at walsender.c:1545
> #11 0x00c39f85 in PostgresMain (argc=1, argv=0x32b51c0,
> dbname=0x32b50e0 "testdb", username=0x32b50c0 "user1") at
> postgres.c:4256
> #12 0x00b10dc7 in BackendRun (port=0x32ad890) at postmaster.c:4498
> #13 0x00b0ff3e in BackendStartup (port=0x32ad890) at postmaster.c:4189
> #14 0x00b08505 in ServerLoop () at postmaster.c:1727
> #15 0x00b0781a in PostmasterMain (argc=3, argv=0x3284cb0) at
> postmaster.c:1400
> #16 0x0097492d in main (argc=3, argv=0x3284cb0) at main.c:210
>
> Issue 2:
> #0  0x7f1d7ddc4337 in raise () from /lib64/libc.so.6
> #1  0x7f1d7ddc5a28 in abort () from /lib64/libc.so.6
> #2  0x00ec4e1d in ExceptionalCondition
> (conditionName=0x10ead30 "txn->final_lsn != InvalidXLogRecPtr",
> errorType=0x10ea284 "FailedAssertion",
> fileName=0x10ea2d0 "reorderbuffer.c", lineNumber=3052) at assert.c:54
> #3  0x00b577e0 in ReorderBufferRestoreCleanup (rb=0x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:3052
> #4  0x00b52b1c in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> txn=0x2bafb08) at reorderbuffer.c:1318
> #5  0x00b5279d in ReorderBufferCleanupTXN (rb=0x2ae36b0,
> txn=0x2b9d778) at reorderbuffer.c:1257
> #6  0x00b5475c in ReorderBufferAbortOld (rb=0x2ae36b0,
> oldestRunningXid=3835) at reorderbuffer.c:1973
> #7  0x00b3ca03 in DecodeStandbyOp (ctx=0x2b676d0,
> buf=0x7ffcbc74cc00) at decode.c:332
> #8  0x00b3c208 in LogicalDecodingProcessRecord (ctx=0x2b676d0,
> record=0x2b67990) at decode.c:121
> #9  0x00b70b2b in XLogSendLogical () at walsender.c:2845
>
> From initial analysis it looks like:
> Issue 1 seems to occur if all of the reorderbuffer contents have been
> flushed to disk and then the server restarts.
> Issue 2 seems to occur if there are many subtransactions present and
> then the server restarts. The subtransaction's
> final_lsn is not being set, and when ReorderBufferRestoreCleanup is
> called the assert fails. Maybe for this we have to set the
> subtransaction's final_lsn before cleanup (not sure).
>
> I could not reproduce this issue consistently with a test case, but I
> felt this looks like a problem from code review.
>
> For issue 1, I could reproduce it with the following steps:
> 1) Change ReorderBufferCheckSerializeTXN so that it always flushes.
> 2) Have many open transactions with subtransactions open.
> 3) Attach to one of the transactions with gdb and call abort().
>
> I'm not sure of the fix for this. If I get time I will try to spend
> more time to find out the fix.

I have analyzed the issue further and found the following:
After abort is called and the process restarts, it will clean up the
reorder buffer information for the aborted transactions in the
ReorderBufferAbortOld function. It crashes in the code below because no
changes are currently present in memory, as all the changes have been serialized:
...
if (txn->serialized && txn->final_lsn == 0)
{
    ReorderBufferChange *last =
        dlist_tail_element(ReorderBufferChange, node, &txn->changes);

    txn->final_lsn = last->lsn;
}
...

It sets the final_lsn here so that it can iterate from the start_lsn
to the final_lsn and clean up the serialized files in the
ReorderBufferRestoreCleanup function. One solution we were thinking of
was to store the LSN of the last serialized change while serializing,
and to set the final_lsn in the crashing case above, like the code
below:
..
if (txn->serialized && txn->final_lsn == 0 &&
    !dlist_is_empty(&txn->changes))
{
    ReorderBufferChange *last =
        dlist_tail_element(ReorderBufferChange, node, &txn->changes);

    txn->final_lsn = last->lsn;
}
..
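
The serialization-side half of that idea might look roughly like the
following (last_serialized_lsn is a purely illustrative new field, not
something that exists today):

/* in ReorderBufferSerializeTXN(), while spilling each change */
ReorderBufferSerializeChange(rb, txn, fd, change);
if (change->lsn > txn->last_serialized_lsn)
    txn->last_serialized_lsn = change->lsn;

/* and in ReorderBufferAbortOld(), when txn->changes is empty */
if (txn->serialized && txn->final_lsn == 0)
    txn->final_lsn = txn->last_serialized_lsn;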

Re: Windows buildfarm members vs. new async-notify isolation test

2019-12-10 Thread Amit Kapila
On Tue, Dec 10, 2019 at 9:27 PM Tom Lane  wrote:
>
> Amit Kapila  writes:
> > On Sun, Dec 8, 2019 at 10:27 PM Tom Lane  wrote:
> >> Doing it like this seems attractive to me because it gets rid of two
> >> different failure modes: inability to create a new thread and inability
> >> to create a new pipe handle.  Now on the other hand, it means that
> >> inability to complete the read/write transaction with a client right
> >> away will delay processing of other signals.  But we know that the
> >> client is engaged in a CallNamedPipe operation, so how realistic is
> >> that concern?
>
> > Right, the client is engaged in a CallNamedPipe operation, but the
> > current mechanism can allow multiple such clients and that might lead
> > to faster processing of signals.
>
> It would only matter if multiple processes signal the same backend at the
> same time, which seems to me to be probably a very minority use-case.
> For the normal case of one signal arriving at a time, what I'm suggesting
> ought to be noticeably faster because of fewer kernel calls.  Surely
> creating a new pipe instance and a new thread are not free.
>
> In any case, the main thing I'm on about here is getting rid of the
> failure modes.  The existing code does have a rather lame/buggy
> workaround for the cant-create-new-pipe case.  A possible answer for
> cant-create-new-thread might be to go ahead and service the current
> request locally in the long-lived signal thread.  But that seems like
> it's piling useless (and hard to test) complexity on top of useless
> complexity.
>

I am convinced by your points.  So +1 for your proposed patch.  I have
already reviewed it yesterday and it appears fine to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: On disable_cost

2019-12-10 Thread Laurenz Albe
On Tue, 2019-12-10 at 15:50 -0700, Jim Finnerty wrote:
> As a proof of concept, I hacked around a bit today to re-purpose one of the
> bits of the Cost structure to mean "is_disabled" so that we can distinguish
> 'disabled' from 'non-disabled' paths without making the Cost structure any
> bigger.  In fact, it's still a valid double.  The obvious choice would have
> been to re-purpose the sign bit, but I've had occasion to exploit negative
> costs before so for this POC I used the high-order bit of the fractional
> bits of the double.  (see Wikipedia for double precision floating point for
> the layout).
> 
> The idea is to set a special bit when disable_cost is added to a cost. 
> Dedicating multiple bits instead of just 1 would be easily done, but as it
> is we can accumulate many disable_costs without overflowing, so just
> comparing the cost suffices.
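
If I understand the idea correctly, it amounts to something like the
following (my own untested paraphrase of the described POC, not the
actual code; it assumes the usual PostgreSQL Cost/uint64 typedefs):

#define COST_DISABLED_BIT	UINT64CONST(0x0008000000000000)	/* top fraction bit */

static inline Cost
cost_mark_disabled(Cost cost)
{
	uint64		bits;

	memcpy(&bits, &cost, sizeof(bits));
	bits |= COST_DISABLED_BIT;
	memcpy(&cost, &bits, sizeof(cost));
	return cost;
}

static inline bool
cost_is_disabled(Cost cost)
{
	uint64		bits;

	memcpy(&bits, &cost, sizeof(bits));
	return (bits & COST_DISABLED_BIT) != 0;
}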

Doesn't that rely on a specific implementation of double precision (IEEE)?
I thought that we don't want to limit ourselves to platforms with IEEE floats.

Yours,
Laurenz Albe





get_database_name() from background worker

2019-12-10 Thread Koichi Suzuki
Hello PG hackers;

I'm writing an extension running in background workers and found that
get_database_name() causes a SEGV; internally, the resource owner was
set to NULL.   Could anybody let me know why this happens and how I can
use this function?   The argument to get_database_name() looks correct.

Regards;
---
Koichi Suzuki


RE: get_database_name() from background worker

2019-12-10 Thread tsunakawa.ta...@fujitsu.com
From: Koichi Suzuki 
> I'm writing an extension running in background workers and found that
> get_database_name() causes a SEGV; internally, the resource owner was
> set to NULL.   Could anybody let me know why this happens and how I can use
> this function.   The argument to get_database_name() looks correct.

Did you specify BGWORKER_BACKEND_DATABASE_CONNECTION when registering the 
background worker?
Did you start a transaction by calling StartTransactionCommand()?
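
For reference, the usual pattern looks roughly like this (untested
sketch; the database name is just a placeholder):

/* worker registered with BGWORKER_SHMEM_ACCESS |
 * BGWORKER_BACKEND_DATABASE_CONNECTION */
BackgroundWorkerInitializeConnection("postgres", NULL, 0);

StartTransactionCommand();
{
	char	   *dbname = get_database_name(MyDatabaseId);

	elog(LOG, "worker connected to database \"%s\"",
		 dbname ? dbname : "(unknown)");
}
CommitTransactionCommand();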


Regards
Takayuki Tsunakawa




RE: get_database_name() from background worker

2019-12-10 Thread ROS Didier
Hi
I would like to know: are you using the pg_background extension to work
with background workers?

Thanks in advance

Best Regards

Didier ROS
Expertise SGBD
EDF - DTEO - DSIT - IT DMA









Re: Unicode normalization test broken output

2019-12-10 Thread Peter Eisentraut

On 2019-12-10 17:16, Tom Lane wrote:
> Peter Eisentraut  writes:
>> Good point.  Fixed in attached patch.
>
> This one LGTM.

done, thanks

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services