Re: feature request ctid cast / sql exception

2021-04-18 Thread Vladimír Houba ml .
This is a specific use case: I have a big table without a PK. Updates with
ctid are blazing fast even without an index. I don't need it.

The argument behind this is that users expect this functionality; it's not
just me. Search Stack Overflow. They end up using various suboptimal
solutions, as I described earlier. This is very, very simple functionality,
so please consider it. I'm also writing an open-source lib that would make
use of this. My users will be thankful to you.

On Sat, Apr 17, 2021, 23:05 David G. Johnston 
wrote:

> On Sat, Apr 17, 2021 at 12:58 PM Vladimír Houba ml. 
> wrote:
>
>> I use ctid as a row identifier within a transaction in a Java application.
>>
>
> This doesn't present a very compelling argument, since an actual
> user-declared primary key is what is expected to be used as a row identifier.
> And as those are typically bigint, if you follow this norm you get exactly
> what you say you need.
>
> David J.
>
>


Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));

2021-04-18 Thread Yulin PEI
After several tests, I found that this patch does not fix the bug well.

I think we should use the same logic to treat a parent CollateExpr and a child
CollateExpr. In your patch, if the parent node is a CollateExpr and the target
type is non-collatable, we coerce CollateExpr->arg. If the child node is a
CollateExpr and the target type is non-collatable, we just skip.
Some types can be cast to another type even if type_is_collatable
returns false, like bytea to int (it depends on the content of the string). If
we simply skip, bytea will never be cast to int even if the content is all
digits.
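
For illustration (using the cast from the thread's subject, not from the
patch): the COLLATE clause wraps a collatable argument while the cast target
(int) is non-collatable, and whether the cast succeeds depends only on the
content:

-- the target type is not collatable, yet the cast itself is valid,
-- so simply skipping the coercion would be wrong
SELECT ('4' COLLATE "C")::int;    -- succeeds, the content is all digits
SELECT ('4x' COLLATE "C")::int;   -- fails at runtime with an input-syntax error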

The attached is my patch, and it works well as far as I have tested.




From: Tom Lane
Sent: April 13, 2021 0:59
To: Yulin PEI
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4'
COLLATE "C")::INT FROM generate_series(1, 10));

Yulin PEI  writes:
> I found it could cause a crash when executing sql statement: `CREATE VIEW 
> v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10)); ` in 
> postgres 13.2 release.

Nice catch.  I don't think the code in DefineVirtualRelation is wrong:
exprCollation shouldn't report any collation for an expression of a
non-collatable type.  Rather the problem is with an old kluge in
coerce_type(), which will push a type coercion underneath a CollateExpr
... without any mind for the possibility that the coercion result isn't
collatable.  So the right fix is more or less the attached.

regards, tom lane



fix-collation-coercion.patch
Description: fix-collation-coercion.patch


Re: feature request ctid cast / sql exception

2021-04-18 Thread Vladimír Houba ml .
I use ctid as a row identifier within a transaction in a Java application.
To obtain the row ctid I either have to

   - cast it to text and store it as a String, or
   - cast it to a string and then convert it to a bigint using a UDF (as
   sketched below), which is inefficient

I wish I could just cast ctid to bigint and store it as a primitive long
type.
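
For reference, a minimal sketch of the UDF workaround mentioned above (not
taken from this thread; the function name and the block/offset packing scheme
are illustrative assumptions):

-- pack a ctid's (block number, offset) pair into one bigint so a Java client
-- can keep it in a primitive long; the offset fits in 16 bits, hence 65536
CREATE FUNCTION ctid_to_bigint(t tid) RETURNS bigint
LANGUAGE sql IMMUTABLE AS
$$
    SELECT ((t::text::point)[0])::bigint * 65536 + ((t::text::point)[1])::bigint
$$;

-- usage (my_table is a placeholder):
-- SELECT ctid, ctid_to_bigint(ctid) FROM my_table;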

Regarding the exception-throwing function, it makes good sense, for example,
in CASE blocks when you encounter an unexpected value.
IMHO a "one size fits all" solution may be to provide a raise function with
the same syntax as the RAISE statement in plpgsql:

RAISE([ level ] 'format' [, expression [, ... ]] [ USING option =
expression [, ... ] ])
RAISE([ level ] condition_name [ USING option = expression [, ... ] ])
RAISE([ level ] SQLSTATE 'sqlstate' [ USING option = expression [, ... ] ])
RAISE([ level ] USING option = expression [, ... ])
RAISE()
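
For context, a minimal sketch of the kind of custom plpgsql helper people
write for this today (the function name and message format are illustrative
assumptions, not part of any proposal in this thread):

CREATE FUNCTION raise_error(msg text) RETURNS int
LANGUAGE plpgsql AS
$$
BEGIN
    RAISE EXCEPTION '%', msg;
    RETURN 0;   -- never reached
END;
$$;

-- callable from a plain SQL statement, e.g. in a CASE block
SELECT CASE status
           WHEN 'active'   THEN 1
           WHEN 'inactive' THEN 0
           ELSE raise_error('unexpected status: ' || status)
       END
FROM (VALUES ('active'), ('bogus')) AS t(status);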


On Sat, Apr 17, 2021 at 9:46 PM Tom Lane  wrote:

> "David G. Johnston"  writes:
> > On Sat, Apr 17, 2021 at 10:58 AM Vladimír Houba ml. 
> > wrote:
> >> Another nice feature would be a function that can be called from a sql
> >> statement and would throw an exception when executed.
>
> > An assertion-related extension in core would be welcomed.
>
> This has been suggested before, but as soon as you start looking
> at the details you find that it's really hard to get a one-size-fits-all
> definition that's any simpler than the existing plpgsql RAISE
> functionality.  Different people have different ideas about how
> much decoration they want around the message.  So, if 10% of the
> world agrees with your choices and the other 90% keeps on using
> a custom plpgsql function to do it their way, you haven't really
> improved matters much.  OTOH a 90% solution might be interesting to
> incorporate in core, but nobody's demonstrated that one exists.
>
> regards, tom lane
>


-- 
With regards,
Vladimír Houba ml.


Re: [CLOBBER_CACHE]Server crashed with segfault 11 while executing clusterdb

2021-04-18 Thread Michael Paquier
On Fri, Apr 09, 2021 at 06:45:45PM -0400, Alvaro Herrera wrote:
> We forgot this patch earlier in the commitfest.  Do people think we
> should still get it in on this cycle?  I'm +1 on that, since it's a
> safety feature poised to prevent more bugs than it's likely to
> introduce.

No objections from here to do that now even after feature freeze.  I
also wonder, while looking at that, why you don't just remove the last
call within src/backend/catalog/heap.c.  This way, nobody is tempted
to use RelationOpenSmgr() anymore, and it could just be removed from
rel.h.
--
Michael


signature.asc
Description: PGP signature


Re: Remove redundant variable from transformCreateStmt

2021-04-18 Thread Amul Sul
On Mon, Apr 19, 2021 at 11:05 AM Bharath Rupireddy
 wrote:
>
> On Mon, Apr 19, 2021 at 9:32 AM Amul Sul  wrote:
> > Kindly ignore the previous version -- has unnecessary change.
> > See the attached.
>
> Thanks for the patch!
>
> How about a slight rewording of the added comment to "Constraints
> validation can be skipped for a newly created table as it contains no
> data. However, this is not necessarily true for a foreign table."?
>

Well, wording is quite subjective; let's leave this to the committer
for the final decision. I don't see anything wrong with it.

> You may want to add it to the commitfest if not done already. And I
> don't think we need to backpatch this as it's not critical.

This is not fixing anything, so it is not a relevant candidate for backporting.

Regards,
Amul




Re: Table refer leak in logical replication

2021-04-18 Thread Michael Paquier
On Mon, Apr 19, 2021 at 03:08:41PM +0900, Michael Paquier wrote:
> FWIW, I
> would be tempted to send back f1ac27b to the blackboard, then refactor
> the code of the apply worker to use ExecInitResultRelation() so as we
> get more consistency with resource releases, simplifying the business
> with indexes.  Once the code is in a cleaner state, we could come back
> into making an integration with partitioned tables into this code.

But you cannot do that either, as f1ac27bf got into 13.
--
Michael


signature.asc
Description: PGP signature


Re: Table refer leak in logical replication

2021-04-18 Thread Michael Paquier
On Sat, Apr 17, 2021 at 07:02:00PM +0530, Amit Kapila wrote:
> Hmm, I am not sure if it is a good idea to open indexes needlessly
> especially when it is not done in the previous code.

Studying the history of this code, I think that f1ac27b is to blame
here for making the code of the apply worker much messier than it was
before.  Before that, we were at a point where we had to rely on one
single ResultRelInfo with all its indexes opened and closed before
doing the DML.  After f1ac27b, the code is shaped such that the
original ResultRelInfo may or may not be used, depending on whether this
code is working on a partitioned table or not.  With an UPDATE, not
one but *two* ResultRelInfo may be used if a tuple is moved to a
different partition.  I think that in the long term, and if we want to
make use of ExecInitResultRelation() in this area, we are going to
need to split the apply code in two parts, roughly (easier to say in
words than actually doing it, still):
- Find out first which relations it is necessary to work on, and
create a set of ResultRelInfo assigned to an executor state by
ExecInitResultRelation(), doing all the relation openings that are
necessary.  This gets funky when attempting to do an update across
partitions.
- Do the actual DML, with all the relations already opened and ready
for use.

On top of that, I am really getting scared by the following, done in
not one but now two places:
/*
 * The tuple to be updated could not be found.
 *
 * TODO what to do here, change the log level to LOG perhaps?
 */
elog(DEBUG1,
 "logical replication did not find row for update "
 "in replication target relation \"%s\"",
 RelationGetRelationName(localrel));
This already existed in one place before f1ac27b, but it got
duplicated in a second area when applying the first update to a
partitioned table.

The refactoring change done in 1375422c in worker.c without the tuple
routing pieces would be a piece of cake in terms of the relations that
need to be opened and closed, including the timing of each call,
because they could be unified in single code paths, and I also guess
that we would avoid leak issues really easily.  If the tuple routing
code actually does not consider the case of moving a tuple across
partitions, the level of difficulty to do an integration with
ExecInitResultRelation() is much more reduced, though it makes the
feature much less appealing as it becomes much more difficult to do
some data redistribution across a different set of partitions with
logical changes.

> I am not sure if it is a good idea to do the refactoring related to
> indexes or other things to fix a minor bug in commit 1375422c. It
> might be better to add a simple fix like what Hou-San has originally
> proposed [1] because even using ExecInitResultRelation might not be
> the best thing as it is again trying to open a range table which we
> have already opened in logicalrep_rel_open. OTOH, using
> ExecInitResultRelation might encapsulate the assignment we are doing
> outside.

Yeah, that would be nice to just rely on that.  copyfrom.c does
basically what I guess we should try to copy a maximum here.  With a
proper cleanup of the executor state using ExecCloseResultRelations()
once we are done with the tuple apply.

> In general, it seems doing bigger refactoring or changes
> might lead to some bugs or unintended consequences, so if possible, we
> can try such refactoring as a separate patch. One argument against the
> proposed refactoring could be that with the previous code, we were
> trying to open the indexes just before it is required but with the new
> patch in some cases, they will be opened during the initial phase and
> for other cases, they are opened just before they are required. It
> might not necessarily be a bad idea to rearrange code like that but
> maybe there is a better way to do that.
> 
> [1] - 
> https://www.postgresql.org/message-id/OS0PR01MB571686F75FBDC219FF3DFF0D94769%40OS0PR01MB5716.jpnprd01.prod.outlook.com

This feels like piling one extra hack on top of what looks like an
abuse of the executor calls to me, and the apply code is already full
of it.  True that we do that in ExecuteTruncateGuts() to allow
triggers to be fired, but I think that it would be better to avoid
spreading that, to consolidate the trigger and execution code.  FWIW, I
would be tempted to send back f1ac27b to the blackboard, then refactor
the code of the apply worker to use ExecInitResultRelation() so as we
get more consistency with resource releases, simplifying the business
with indexes.  Once the code is in a cleaner state, we could come back
into making an integration with partitioned tables into this code.
--
Michael


signature.asc
Description: PGP signature


Windows default locale vs initdb

2021-04-18 Thread Thomas Munro
Hi,

Moving this topic into its own thread from the one about collation
versions, because it concerns pre-existing problems, and that thread
is long.

Currently initdb sets up template databases with old-style Windows
locale names reported by the OS, and they seem to have caused us quite
a few problems over the years:

db29620d "Work around Windows locale name with non-ASCII character."
aa1d2fc5 "Another attempt at fixing Windows Norwegian locale."
db477b69 "Deal with yet another issue related to "Norwegian (Bokmål)"..."
9f12a3b9 "Tolerate version lookup failure for old style Windows locale..."

... and probably more, and also various threads about, for example,
"German_German.1252" vs "German_Switzerland.1252", which seem to get
confused, badly canonicalised, or rejected somewhere in the mix.

I hadn't focused on any of that before, being a non-Windows-user, but
the entire contents of win32setlocale.c supports the theory that
Windows' manual meant what it said when it said[1]:

"We do not recommend this form for locale strings embedded in
code or serialized to storage, because these strings are more likely
to be changed by an operating system update than the locale name
form."

I suppose that was the only form available at the time the code was
written, so there was no choice.  The question we asked ourselves
multiple times in the other thread was how we're supposed to get to
the modern BCP 47 form when creating the template databases.  It looks
like one possibility, since Vista, is to call
GetUserDefaultLocaleName()[2], which doesn't appear to have been
discussed before on this list.  That doesn't allow you to ask for the
default for each individual category, but I don't know if that is even
a concept for Windows user settings.  It may be that some of the other
nearby functions give a better answer for some reason.  But one thing
is clear from a test that someone kindly ran for me: it reports
standardised strings like "en-NZ", not strings like "English_New
Zealand.1252".

No patch, but I wondered if any Windows hackers have any feedback on
the relative sanity of trying to fix all these problems this way.

[1] 
https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160
[2] 
https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-getuserdefaultlocalename




Re: Remove redundant variable from transformCreateStmt

2021-04-18 Thread Bharath Rupireddy
On Mon, Apr 19, 2021 at 9:32 AM Amul Sul  wrote:
> Kindly ignore the previous version -- has unnecessary change.
> See the attached.

Thanks for the patch!

How about a slight rewording of the added comment to "Constraints
validation can be skipped for a newly created table as it contains no
data. However, this is not necessarily true for a foreign table."?

You may want to add it to the commitfest if not done already. And I
don't think we need to backpatch this as it's not critical.


With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: PG Docs - CREATE SUBSCRIPTION option list order

2021-04-18 Thread Amit Kapila
On Mon, Apr 19, 2021 at 10:32 AM Peter Smith  wrote:
>
> On Mon, Apr 19, 2021 at 2:09 PM Amit Kapila  wrote:
> >
> > On Mon, Apr 19, 2021 at 6:32 AM Euler Taveira  wrote:
> > >
> > > On Sun, Apr 18, 2021, at 8:59 PM, Peter Smith wrote:
> > >
> > > The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH"
> > > options, which are currently in some kind of quasi alphabetical /
> > > random order which I found unnecessarily confusing.
> > >
> > > I can't think of any good reason for the current ordering, so PSA my
> > > patch which has identical content but just re-orders that option list
> > > to be alphabetical.
> > >
> > > AFAICS there is no reason to use a random order here. I think this
> > > parameter list is ordered by frequency of use. Your patch looks good to me.
> > >
> >
> > I also agree that the current order is quite random. One idea is to
> > keep them in alphabetical order as suggested by Peter, and the other
> > could be to arrange parameters based on properties; for example, there
> > are a few parameters like binary, streaming, and copy_data which are in
> > some way related to the data being replicated, and others are more like
> > slot properties (create_slot, slot_name). I see that a few parameters
> > among these have some dependencies on other parameters as well. I noticed
> > that other DDL commands like Create Table and Create Index don't have
> > their WITH clause parameters in any alphabetical order, so it might be
> > better if the related parameters could be kept together here, but if we
> > think it is not a good idea in this case for some reason, then keeping
> > them in alphabetical order probably makes sense.
> >
>
> Yes, if there were dozens of list items then I would agree that they
> should be grouped somehow. But there aren't.
>

I think this list is going to grow in the future as we enhance this
subsystem. For example, the pending 2PC patch will add another
parameter to this list.

-- 
With Regards,
Amit Kapila.




Re: Replication slot stats misgivings

2021-04-18 Thread Amit Kapila
On Mon, Apr 19, 2021 at 9:00 AM Masahiko Sawada  wrote:
>
> On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila  wrote:
> >
> >
> > 4.
> > +CREATE VIEW pg_stat_replication_slots AS
> > +SELECT
> > +s.slot_name,
> > +s.spill_txns,
> > +s.spill_count,
> > +s.spill_bytes,
> > +s.stream_txns,
> > +s.stream_count,
> > +s.stream_bytes,
> > +s.total_txns,
> > +s.total_bytes,
> > +s.stats_reset
> > +FROM pg_replication_slots as r,
> > +LATERAL pg_stat_get_replication_slot(slot_name) as s
> > +WHERE r.datoid IS NOT NULL; -- excluding physical slots
> > ..
> > ..
> >
> > -/* Get the statistics for the replication slots */
> > +/* Get the statistics for the replication slot */
> >  Datum
> > -pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
> > +pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
> >  {
> >  #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
> > - ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> > + text *slotname_text = PG_GETARG_TEXT_P(0);
> > + NameData slotname;
> >
> > I think with the above changes getting all the slot stats has become
> > much costlier. Is there any reason why can't we get all the stats from
> > the new hash_table in one shot and return them to the user?
>
> I think the advantage of this approach would be that it can avoid
> showing the stats for already-dropped slots. Like other statistics
> views such as pg_stat_all_tables and pg_stat_all_functions, searching
> the stats by the name obtained from pg_replication_slots can show only
> available slot stats even if the hash table has garbage slot stats.
>

Sounds reasonable. However, if the create_slot message is missed, it
will show an empty row for it. See below:

postgres=# select slot_name, total_txns from pg_stat_replication_slots;
 slot_name | total_txns
-----------+------------
 s1        |          0
 s2        |          0
           |
(3 rows)

Here, I have manually (via the debugger) skipped sending the create_slot
message for the third slot, and we are showing an empty row for it. This
won't happen for pg_stat_all_tables, as it will set 0 or other initial
values in such a case. I think we need to address this case.

> Given that pg_stat_replication_slots doesn’t show garbage slot stats
> even if it has them, I thought we could avoid checking those garbage stats
> frequently. It should not essentially be a problem for the hash table
> to have entries up to max_replication_slots regardless of live or
> already-dropped.
>

Yeah, but I guess we still might not save much by not doing it,
especially because for the other cases like tables/functions, we are
doing it without any threshold limit.

-- 
With Regards,
Amit Kapila.




Re: PG Docs - CREATE SUBSCRIPTION option list order

2021-04-18 Thread Peter Smith
On Mon, Apr 19, 2021 at 2:09 PM Amit Kapila  wrote:
>
> On Mon, Apr 19, 2021 at 6:32 AM Euler Taveira  wrote:
> >
> > On Sun, Apr 18, 2021, at 8:59 PM, Peter Smith wrote:
> >
> > The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH"
> > options, which are currently in some kind of quasi alphabetical /
> > random order which I found unnecessarily confusing.
> >
> > I can't think of any good reason for the current ordering, so PSA my
> > patch which has identical content but just re-orders that option list
> > to be alphabetical.
> >
> > AFAICS there is no reason to use a random order here. I think this
> > parameter list is ordered by frequency of use. Your patch looks good to me.
> >
>
> I also agree that the current order is quite random. One idea is to
> keep them in alphabetical order as suggested by Peter, and the other
> could be to arrange parameters based on properties; for example, there
> are a few parameters like binary, streaming, and copy_data which are in
> some way related to the data being replicated, and others are more like
> slot properties (create_slot, slot_name). I see that a few parameters
> among these have some dependencies on other parameters as well. I noticed
> that other DDL commands like Create Table and Create Index don't have
> their WITH clause parameters in any alphabetical order, so it might be
> better if the related parameters could be kept together here, but if we
> think it is not a good idea in this case for some reason, then keeping
> them in alphabetical order probably makes sense.
>

Yes, if there were dozens of list items then I would agree that they
should be grouped somehow. But there aren't.

I think what may seem like a clever grouping to one reader may look
more like an over-complicated muddle to somebody else.

So I prefer just to apply the KISS Principle.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: New Table Access Methods for Multi and Single Inserts

2021-04-18 Thread Bharath Rupireddy
On Mon, Apr 5, 2021 at 9:49 AM Bharath Rupireddy
 wrote:
>
> On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy
>  wrote:
> > Attaching the v4 patch set. Please review it further.
>
> Attaching v5 patch set after rebasing onto the latest master.

Another rebase due to conflicts in 0003. Attaching v6 for review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From 6518212583e24b017375512701d9fefa6de20e42 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Wed, 10 Mar 2021 09:53:48 +0530
Subject: [PATCH v6 1/3] New Table AMs for Multi and Single Inserts

This patch introduces new table access methods for multi and
single inserts. Also implements/rearranges the outside code for
heap am into these new APIs.

Main design goal of these new APIs is to give flexibility to
tableam developers in implementing multi insert logic dependent on
the underlying storage engine. Currently, for all the underlying
storage engines, we follow the same multi insert logic (when and how
to flush the buffered tuples, tuple size calculation), and this logic
doesn't take into account the underlying storage engine's
capabilities.

We can also avoid duplicating multi insert code (for existing COPY,
and upcoming CTAS, CREATE/REFRESH MAT VIEW and INSERT SELECTs). We
can also move bulk insert state allocation and deallocation inside
these APIs.
---
 src/backend/access/heap/heapam.c | 212 +++
 src/backend/access/heap/heapam_handler.c |   5 +
 src/backend/access/table/tableamapi.c|   7 +
 src/backend/executor/execTuples.c|  83 -
 src/include/access/heapam.h  |  49 +-
 src/include/access/tableam.h |  87 ++
 src/include/executor/tuptable.h  |   1 +
 7 files changed, 438 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 3b435c107d..d8bfe17f22 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -67,6 +67,7 @@
 #include "utils/datum.h"
 #include "utils/inval.h"
 #include "utils/lsyscache.h"
+#include "utils/memutils.h"
 #include "utils/relcache.h"
 #include "utils/snapmgr.h"
 #include "utils/spccache.h"
@@ -2669,6 +2670,217 @@ heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
 	pgstat_count_heap_insert(relation, ntuples);
 }
 
+/*
+ * heap_insert_begin - allocate and initialize TableInsertState
+ *
+ * For single inserts:
+ *  1) Specify is_multi as false, then multi insert state will be NULL.
+ *
+ * For multi inserts:
+ *  1) Specify is_multi as true, then multi insert state will be allocated and
+ * 	   initialized.
+ *
+ *  Other input parameters i.e. relation, command id, options are common for
+ *  both single and multi inserts.
+ */
+TableInsertState*
+heap_insert_begin(Relation rel, CommandId cid, int options, bool is_multi)
+{
+	TableInsertState *state;
+
+	state = palloc(sizeof(TableInsertState));
+	state->rel = rel;
+	state->cid = cid;
+	state->options = options;
+	/* Below parameters are not used for single inserts. */
+	state->mi_slots = NULL;
+	state->mistate = NULL;
+	state->mi_cur_slots = 0;
+	state->flushed = false;
+
+	if (is_multi)
+	{
+		HeapMultiInsertState *mistate;
+
+		mistate = palloc(sizeof(HeapMultiInsertState));
+		state->mi_slots =
+			palloc0(sizeof(TupleTableSlot *) * MAX_BUFFERED_TUPLES);
+		mistate->max_slots = MAX_BUFFERED_TUPLES;
+		mistate->max_size = MAX_BUFFERED_BYTES;
+		mistate->cur_size = 0;
+		/*
+		 * Create a temporary memory context so that we can reset once per
+		 * multi insert batch.
+		 */
+		mistate->context = AllocSetContextCreate(CurrentMemoryContext,
+ "heap_multi_insert",
+ ALLOCSET_DEFAULT_SIZES);
+		state->mistate = mistate;
+	}
+
+	return state;
+}
+
+/*
+ * heap_insert_v2 - insert single tuple into a heap
+ *
+ * Insert tuple from slot into table. This is like heap_insert(), the only
+ * difference is that the parameters for insertion are inside table insert
+ * state structure.
+ */
+void
+heap_insert_v2(TableInsertState *state, TupleTableSlot *slot)
+{
+	bool		shouldFree = true;
+	HeapTuple	tuple = ExecFetchSlotHeapTuple(slot, true, &shouldFree);
+
+	Assert(state);
+
+	/* Update tuple with table oid */
+	slot->tts_tableOid = RelationGetRelid(state->rel);
+	tuple->t_tableOid = slot->tts_tableOid;
+
+	/* Perform insertion, and copy the resulting ItemPointer */
+	heap_insert(state->rel, tuple, state->cid, state->options, state->bistate);
+	ItemPointerCopy(&tuple->t_self, &slot->tts_tid);
+
+	if (shouldFree)
+		pfree(tuple);
+}
+
+/*
+ * heap_multi_insert_v2 - insert multiple tuples into a heap
+ *
+ * Compute size of tuple. See if the buffered slots can hold the tuple. If yes,
+ * store it in the buffers, otherwise flush i.e. insert the so far buffered
+ * tuples into heap.
+ *
+ * Flush can happen:
+ *  1) either if all the buffered slots are filled up
+ *  2) or if total tuple size of the currently buffered slot

Re: Performance degradation of REFRESH MATERIALIZED VIEW

2021-04-18 Thread Masahiko Sawada
On Fri, Apr 16, 2021 at 12:16 PM Kyotaro Horiguchi
 wrote:
>
> At Mon, 12 Apr 2021 15:20:41 +0900, Masahiko Sawada  
> wrote in
> > .
> >
> > On Thu, Mar 11, 2021 at 5:44 PM Masahiko Sawada  
> > wrote:
> > >
> > > Hi,
> > >
> > > While discussing freezing tuples during CTAS[1], we found that
> > > heap_insert() with HEAP_INSERT_FROZEN brings performance degradation.
> > > For instance, with Paul's patch that sets HEAP_INSERT_FROZEN to CTAS,
> > > it took 12 sec whereas the code without the patch took 10 sec with the
> > > following query:
> > >
> > > create table t1 (a, b, c, d) as select i,i,i,i from
> > > generate_series(1,2000) i;
> > >
> > > I've done a simple benchmark of REFRESH MATERIALIZED VIEW with the
> > > following queries:
> > >
> > > create table source as select generate_series(1, 5000);
> > > create materialized view mv as select * from source;
> > > refresh materialized view mv;
> > >
> > > The execution time of REFRESH MATERIALIZED VIEW are:
> > >
> > > w/ HEAP_INSERT_FROZEN flag : 42 sec
> > > w/o HEAP_INSERT_FROZEN flag : 33 sec
> > >
> > > After investigation, I found that such performance degradation happens
> > > on only HEAD code. It seems to me that commit 39b66a91b (and
> > > 7db0cd2145) is relevant that has heap_insert() set VM bits and
> > > PD_ALL_VISIBLE if HEAP_INSERT_FROZEN is specified (so CCing Tomas
> > > Vondra and authors). Since heap_insert() sets PD_ALL_VISIBLE to the
> > > page when inserting a tuple for the first time on the page (around
> > > L2133 in heapam.c), every subsequent heap_insert() on the page reads
> > > and pins a VM buffer (see RelationGetBufferForTuple()).
> >
> > IIUC RelationGetBufferForTuple() pins vm buffer if the page is
> > all-visible since the caller might clear vm bit during operation. But
> > it's not necessarily true in HEAP_FROZEN_INSERT case. When inserting
> > HEAP_FROZEN_INSERT, we might set PD_ALL_VISIBLE flag and all-visible
> > bit but never clear those flag and bit during insertion. Therefore to
> > fix this issue, I think we can have RelationGetBufferForTuple() not to
> > pin vm buffer if we're inserting a frozen tuple (i.g.,
> > HEAP_FROZEN_INSERT case) and the target page is already all-visible.
>
> It seems right to me.
>
> > In HEAP_FROZEN_INSERT, the cases where we need to pin vm buffer would
> > be the table is empty. That way, we will pin vm buffer only for the
> > first time of inserting frozen tuple into the empty page, then set
> > PD_ALL_VISIBLE to the page and all-frozen bit on vm. Also set
> > XLH_INSERT_ALL_FROZEN_SET to WAL. At further insertions, we would not
> > pin vm buffer as long as we’re inserting a frozen tuple into the same
> > page.
> >
> > If the target page is neither empty nor all-visible we will not pin vm
> > buffer, which is fine because if the page has non-frozen tuple we
> > cannot set bit on vm during heap_insert(). If all tuples on the page
> > are already frozen but PD_ALL_VISIBLE is not set for some reason, we
> > would be able to set all-frozen bit on vm but it seems not a good idea
> > since it requires checking during insertion if all existing tuples are
> > frozen or not.
> >
> > The attached patch does the above idea. With this patch, the same
> > performance tests took 33 sec.

Thank you for the comments.

>
> Great! The direction of the patch looks fine to me.
>
> +		 * If we're inserting frozen entry into empty page, we will set
> +		 * all-visible to page and all-frozen on visibility map.
> +		 */
> +		if (PageGetMaxOffsetNumber(page) == 0)
> +			all_frozen_set = true;
>
> AFAICS the page is always empty when RelationGetBufferForTuple
> returned a valid vmbuffer.  So the "if" should be an "assert" instead.

There is a chance that RelationGetBufferForTuple() returns a valid
vmbuffer but the page is not empty, since RelationGetBufferForTuple()
checks without a lock if the page is empty. But when it comes to
HEAP_INSERT_FROZEN cases it actually doesn’t happen at least for now
since only one process inserts tuples into the relation. Will fix.

>
> And, the patch changes the value of all_frozen_set to false when the
> page was already all-frozen (thus not empty). It would be fine since
> we don't need to change the visibility of the page in that case but
> the variable name is no longer correct.  set_all_visible or such?

It seems to me that the variable name all_frozen_set corresponds to
XLH_INSERT_ALL_FROZEN_SET but I see your point. How about
set_all_frozen instead since we set all-frozen bits (also implying
setting all-visible)?

BTW I found the following description of XLH_INSERT_ALL_FROZEN_SET but
there is no all_visible_set anywhere:

/* all_frozen_set always implies all_visible_set */
#define XLH_INSERT_ALL_FROZEN_SET   (1<<5)

I'll update those comments as well.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Replication slot stats misgivings

2021-04-18 Thread Amit Kapila
On Fri, Apr 16, 2021 at 8:50 AM Justin Pryzby  wrote:
>
> On Fri, Apr 16, 2021 at 08:48:29AM +0530, Amit Kapila wrote:
> > I am fine with your proposed changes. There are one or two more
> > patches in this area. I can include your suggestions along with those
> > if you don't mind?
>
> However's convenient is fine
>

Thanks for your suggestions. I have pushed your changes as part of the
commit c64dcc7fee.

-- 
With Regards,
Amit Kapila.




Re: Replication slot stats misgivings

2021-04-18 Thread Amit Kapila
On Sun, Apr 18, 2021 at 6:51 PM Masahiko Sawada  wrote:
>
> Yes, also the following expectation in expected/stats.out is wrong:
>
> SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
> spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
> total_bytes FROM pg_stat_replication_slots;
>     slot_name    | spill_txns | spill_count | total_txns | total_bytes
> -----------------+------------+-------------+------------+-------------
>  regression_slot | f          | f           | t          | t
> (1 row)
>
> We should expect all values are 0. Please find attached the patch.
>

Right. Both your and Vignesh's patch will fix the problem but I mildly
prefer Vignesh's one as that seems a bit simpler. So, I went ahead and
pushed his patch with minor other changes. Thanks to both of you.

-- 
With Regards,
Amit Kapila.




Re: PG Docs - CREATE SUBSCRIPTION option list order

2021-04-18 Thread Amit Kapila
On Mon, Apr 19, 2021 at 6:32 AM Euler Taveira  wrote:
>
> On Sun, Apr 18, 2021, at 8:59 PM, Peter Smith wrote:
>
> The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH"
> options, which are currently in some kind of quasi alphabetical /
> random order which I found unnecessarily confusing.
>
> I can't think of any good reason for the current ordering, so PSA my
> patch which has identical content but just re-orders that option list
> to be alphabetical.
>
> AFAICS there is no reason to use a random order here. I think this parameter
> list is ordered by frequency of use. Your patch looks good to me.
>

I also agree that the current order is quite random. One idea is to
keep them in alphabetical order as suggested by Peter, and the other
could be to arrange parameters based on properties; for example, there
are a few parameters like binary, streaming, and copy_data which are in
some way related to the data being replicated, and others are more like
slot properties (create_slot, slot_name). I see that a few parameters
among these have some dependencies on other parameters as well. I noticed
that other DDL commands like Create Table and Create Index don't have
their WITH clause parameters in any alphabetical order, so it might be
better if the related parameters could be kept together here, but if we
think it is not a good idea in this case for some reason, then keeping
them in alphabetical order probably makes sense.
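
For reference, here is roughly what such a statement looks like with the
parameters being discussed spelled out (option names as in the PostgreSQL 14
documentation; the connection string and object names are placeholders):

CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (binary = false,
          copy_data = true,
          create_slot = true,
          enabled = true,
          slot_name = 'mysub',
          streaming = false,
          synchronous_commit = 'off');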

-- 
With Regards,
Amit Kapila.




Re: Remove redundant variable from transformCreateStmt

2021-04-18 Thread Amul Sul
On Mon, Apr 19, 2021 at 9:28 AM Amul Sul  wrote:
>
> On Fri, Apr 16, 2021 at 6:26 AM Bharath Rupireddy
>  wrote:
> >
> > On Thu, Apr 15, 2021 at 8:40 PM Jeevan Ladhe
> >  wrote:
> > > IMHO, I think the idea here was to just get rid of an unnecessary variable
> > > rather than refactoring.
> > >
> > > On Thu, Apr 15, 2021 at 5:48 PM Bharath Rupireddy 
> > >  wrote:
> > >>
> > >> On Thu, Apr 15, 2021 at 5:04 PM Amul Sul  wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > Attached patch removes "is_foreign_table" from transformCreateStmt()
> > >> > since it already has cxt.isforeign that can serve the same purpose.
> > >>
> > >> Yeah having that variable as "is_foreign_table" doesn't make sense
> > >> when we have the info in ctx. I'm wondering whether we can do the
> > >> following (like transformFKConstraints). It will be more readable and
> > >> we could also add more comments on why we don't skip validation for
> > >> check constraints i.e. constraint->skip_validation = false in case for
> > >> foreign tables.
> > >
> > > To address your concern here, I think it can be addressed by adding a 
> > > comment
> > > just before we make a call to transformCheckConstraints().
> >
> > +1.
>
> Ok, added the comment in the attached version.

Kindly ignore the previous version -- has unnecessary change.
See the attached.

Regards,
Amul
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 9dd30370dae..9aaa4bde278 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -176,7 +176,6 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	Oid			namespaceid;
 	Oid			existing_relid;
 	ParseCallbackState pcbstate;
-	bool		is_foreign_table = IsA(stmt, CreateForeignTableStmt);
 
 	/*
 	 * We must not scribble on the passed-in CreateStmt, so copy it.  (This is
@@ -333,8 +332,12 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 
 	/*
 	 * Postprocess check constraints.
+	 *
+	 * For a table, the constraint can be considered validated immediately,
+	 * because the table must be empty.  But for a foreign table this is not
+	 * necessarily the case.
 	 */
-	transformCheckConstraints(&cxt, !is_foreign_table ? true : false);
+	transformCheckConstraints(&cxt, !cxt.isforeign);
 
 	/*
 	 * Postprocess extended statistics.


Re: Remove redundant variable from transformCreateStmt

2021-04-18 Thread Amul Sul
On Fri, Apr 16, 2021 at 6:26 AM Bharath Rupireddy
 wrote:
>
> On Thu, Apr 15, 2021 at 8:40 PM Jeevan Ladhe
>  wrote:
> > IMHO, I think the idea here was to just get rid of an unnecessary variable
> > rather than refactoring.
> >
> > On Thu, Apr 15, 2021 at 5:48 PM Bharath Rupireddy 
> >  wrote:
> >>
> >> On Thu, Apr 15, 2021 at 5:04 PM Amul Sul  wrote:
> >> >
> >> > Hi,
> >> >
> >> > Attached patch removes "is_foreign_table" from transformCreateStmt()
> >> > since it already has cxt.isforeign that can serve the same purpose.
> >>
> >> Yeah having that variable as "is_foreign_table" doesn't make sense
> >> when we have the info in ctx. I'm wondering whether we can do the
> >> following (like transformFKConstraints). It will be more readable and
> >> we could also add more comments on why we don't skip validation for
> >> check constraints i.e. constraint->skip_validation = false in case for
> >> foreign tables.
> >
> > To address your concern here, I think it can be addressed by adding a 
> > comment
> > just before we make a call to transformCheckConstraints().
>
> +1.

Ok, added the comment in the attached version.

Thanks Jeevan & Bharat for the review.

Regards,
Amul
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index 9dd30370dae..17182500c61 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -176,7 +176,6 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	Oid			namespaceid;
 	Oid			existing_relid;
 	ParseCallbackState pcbstate;
-	bool		is_foreign_table = IsA(stmt, CreateForeignTableStmt);
 
 	/*
 	 * We must not scribble on the passed-in CreateStmt, so copy it.  (This is
@@ -327,14 +326,18 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
 	cxt.alist = list_concat(cxt.alist, cxt.likeclauses);
 
 	/*
-	 * Postprocess foreign-key constraints.
+	 * Postprocess foreign-key and check constraints.
 	 */
 	transformFKConstraints(&cxt, true, false);
 
 	/*
 	 * Postprocess check constraints.
+	 *
+	 * For a table, the constraint can be considered validated immediately,
+	 * because the table must be empty.  But for a foreign table this is not
+	 * necessarily the case.
 	 */
-	transformCheckConstraints(&cxt, !is_foreign_table ? true : false);
+	transformCheckConstraints(&cxt, !cxt.isforeign);
 
 	/*
 	 * Postprocess extended statistics.


Re: Replication slot stats misgivings

2021-04-18 Thread Masahiko Sawada
On Fri, Apr 16, 2021 at 2:58 PM Amit Kapila  wrote:
>
> On Thu, Apr 15, 2021 at 4:35 PM Masahiko Sawada  wrote:
> >
> > Thank you for the update! The patch looks good to me.
> >
>
> I have pushed the first patch. Comments on the next patch
> v13-0001-Use-HTAB-for-replication-slot-statistics:
> 1.
> + /*
> + * Check for all replication slots in stats hash table. We do this check
> + * when replSlotStats has more than max_replication_slots entries, i.e,
> + * when there are stats for the already-dropped slot, to avoid frequent
> + * call SearchNamedReplicationSlot() which acquires LWLock.
> + */
> + if (replSlotStats && hash_get_num_entries(replSlotStats) >
> max_replication_slots)
> + {
> + PgStat_ReplSlotEntry *slotentry;
> +
> + hash_seq_init(&hstat, replSlotStats);
> + while ((slotentry = (PgStat_ReplSlotEntry *) hash_seq_search(&hstat)) != 
> NULL)
> + {
> + if (SearchNamedReplicationSlot(NameStr(slotentry->slotname), true) == NULL)
> + pgstat_report_replslot_drop(NameStr(slotentry->slotname));
> + }
> + }
>
> Is SearchNamedReplicationSlot() so frequently used that we need to do
> this only when the hash table has entries more than
> max_replication_slots? I think it would be better if we can do it
> without such a condition to reduce the chances of missing the slot
> stats. We don't have any such restrictions for any other cases in this
> function.

Please see below comment on #4.

>
> I think it is better to add CHECK_FOR_INTERRUPTS in the above while loop?

Agreed.

>
> 2.
> /*
>   * Replication slot statistics kept in the stats collector
>   */
> -typedef struct PgStat_ReplSlotStats
> +typedef struct PgStat_ReplSlotEntry
>
> I think the comment above this structure can be changed to "The
> collector's data per slot" or something like that. Also, if we have to
> follow table/function/db style, then probably this structure should be
> named as PgStat_StatReplSlotEntry.

Agreed.

>
> 3.
> - * create the statistics for the replication slot.
> + * create the statistics for the replication slot. In case where the
> + * message for dropping the old slot gets lost and a slot with the same is
>
> /the same is/the same name is/.
>
> Can we mention something similar to what you have added here in docs as well?

Agreed.

>
> 4.
> +CREATE VIEW pg_stat_replication_slots AS
> +SELECT
> +s.slot_name,
> +s.spill_txns,
> +s.spill_count,
> +s.spill_bytes,
> +s.stream_txns,
> +s.stream_count,
> +s.stream_bytes,
> +s.total_txns,
> +s.total_bytes,
> +s.stats_reset
> +FROM pg_replication_slots as r,
> +LATERAL pg_stat_get_replication_slot(slot_name) as s
> +WHERE r.datoid IS NOT NULL; -- excluding physical slots
> ..
> ..
>
> -/* Get the statistics for the replication slots */
> +/* Get the statistics for the replication slot */
>  Datum
> -pg_stat_get_replication_slots(PG_FUNCTION_ARGS)
> +pg_stat_get_replication_slot(PG_FUNCTION_ARGS)
>  {
>  #define PG_STAT_GET_REPLICATION_SLOT_COLS 10
> - ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> + text *slotname_text = PG_GETARG_TEXT_P(0);
> + NameData slotname;
>
> I think with the above changes getting all the slot stats has become
> much costlier. Is there any reason why can't we get all the stats from
> the new hash_table in one shot and return them to the user?

I think the advantage of this approach would be that it can avoid
showing the stats for already-dropped slots. Like other statistics
views such as pg_stat_all_tables and pg_stat_all_functions, searching
the stats by the name obtained from pg_replication_slots can show only
available slot stats even if the hash table has garbage slot stats.
Given that pg_stat_replication_slots doesn’t show garbage slot stats
even if it has them, I thought we could avoid checking those garbage stats
frequently. It should not essentially be a problem for the hash table
to have entries up to max_replication_slots regardless of live or
already-dropped.

As another design, we can get all stats from the hash table in one
shot as you suggested. If we do that, it's better to check garbage
slot stats every time pgstat_vacuum_stat() is called so that the view
doesn't show those stats, but we cannot avoid that completely.

I'll change the code pointed out by #1 and #4 according to this design
discussion.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: doc review for v14

2021-04-18 Thread Michael Paquier
On Fri, Apr 16, 2021 at 02:03:10AM -0500, Justin Pryzby wrote:
> A bunch more found with things like this.

Thanks, applied most of it!
--
Michael


signature.asc
Description: PGP signature


Re: File truncation within PostgresNode::issues_sql_like() wrong on Windows

2021-04-18 Thread Michael Paquier
On Sat, Apr 17, 2021 at 09:55:47AM -0400, Andrew Dunstan wrote:
> Yes please, much better to use a symbolic name rather than a magic
> number.  I wouldn't bother backpatching it though.

Okay, done this way then.
--
Michael


signature.asc
Description: PGP signature


Re: PG Docs - CREATE SUBSCRIPTION option list order

2021-04-18 Thread Euler Taveira
On Sun, Apr 18, 2021, at 8:59 PM, Peter Smith wrote:
> The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH"
> options, which are currently in some kind of quasi alphabetical /
> random order which I found unnecessarily confusing.
> 
> I can't think of any good reason for the current ordering, so PSA my
> patch which has identical content but just re-orders that option list
> to be alphabetical.
AFAICS there is no reason to use a random order here. I think this parameter
list is ordered by frequency of use. Your patch looks good to me.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


PG Docs - CREATE SUBSCRIPTION option list order

2021-04-18 Thread Peter Smith
Hi,

The CREATE SUBSCRIPTION documentation [1] includes a list of "WITH"
options, which are currently in some kind of quasi alphabetical /
random order which I found unnecessarily confusing.

I can't think of any good reason for the current ordering, so PSA my
patch which has identical content but just re-orders that option list
to be alphabetical.

--
[1] = https://www.postgresql.org/docs/devel/sql-createsubscription.html

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-create-subscription-options-list-order.patch
Description: Binary data


Re: track_planning causing performance regression

2021-04-18 Thread Justin Pryzby
Reviewing this change which was committed last year as
321fa6a4a26c9b649a0fbec9fc8b019f19e62289

On Fri, Jul 03, 2020 at 03:57:38PM +0900, Fujii Masao wrote:
> On 2020/07/03 13:05, Pavel Stehule wrote:
> > pá 3. 7. 2020 v 4:39 odesílatel Fujii Masao  
> > napsal:
> > 
> > Maybe there can be documented so enabling this option can have a negative 
> > impact on performance.
> 
> Yes. What about adding either of the followings into the doc?
> 
> Enabling this parameter may incur a noticeable performance penalty.
> 
> or
> 
> Enabling this parameter may incur a noticeable performance penalty,
> especially when a fewer kinds of queries are executed on many
> concurrent connections.

Something seems wrong with this sentence, and I'm not sure what it's trying
to say.  Is this right?

> Enabling this parameter may incur a noticeable performance penalty,
> especially when a small number of queries are executed on many
> concurrent connections.

-- 
Justin




Re: pg_amcheck option to install extension

2021-04-18 Thread Alvaro Herrera
On 2021-Apr-18, Andrew Dunstan wrote:

> On 4/17/21 3:43 PM, Mark Dilger wrote:

> > I'd also like your impressions on whether we're likely to move
> > contrib/amcheck into core anytime soon.  If so, is it worth adding
> > an option that we'll soon need to deprecate?
> 
> I think if it stays as an extension it will stay in contrib. But it sure
> feels very odd to have a core bin program that relies on a contrib
> extension. It seems one or the other is misplaced.

I've proposed in the past that we should have a way to provide
extensions other than contrib -- specifically src/extensions/ -- and
then have those extensions installed together with the rest of core.
Then it would be perfectly legitimate to have src/bin/pg_amcheck
depend on that extension.  I agree that the current situation is not
great.

-- 
Álvaro Herrera39°49'30"S 73°17'W
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)




Re: partial heap only tuples

2021-04-18 Thread Peter Geoghegan
On Tue, Feb 9, 2021 at 10:48 AM Bossart, Nathan  wrote:
> I'm hoping to gather some early feedback on a heap optimization I've
> been working on.  In short, I'm hoping to add "partial heap only
> tuple" (PHOT) support, which would allow you to skip updating indexes
> for unchanged columns even when other indexes require updates.  Today,
> HOT works wonders when no indexed columns are updated.  However, as
> soon as you touch one indexed column, you lose that optimization
> entirely, as you must update every index on the table.  The resulting
> performance impact is a pain point for many of our (AWS's) enterprise
> customers, so we'd like to lend a hand for some improvements in this
> area.  For workloads involving a lot of columns and a lot of indexes,
> an optimization like PHOT can make a huge difference.  I'm aware that
> there was a previous attempt a few years ago to add a similar
> optimization called WARM [0] [1].  However, I only noticed this
> previous effort after coming up with the design for PHOT, so I ended
> up taking a slightly different approach.  I am also aware of a couple
> of recent nbtree improvements that may mitigate some of the impact of
> non-HOT updates [2] [3], but I am hoping that PHOT serves as a nice
> complement to those.  I've attached a very early proof-of-concept
> patch with the design described below.

I would like to share some thoughts that I have about how I think
about optimizations like PHOT, and how they might fit together with my
own work -- particularly the nbtree bottom-up index deletion feature
you referenced. My remarks could equally well apply to WARM.
Ordinarily this is the kind of thing that would be too hand-wavey for
the mailing list, but we don't have the luxury of in-person
communication right now.

Everybody tends to talk about HOT as if it works perfectly once you
make some modest assumptions, such as "there are no long-running
transactions", and "no UPDATEs will logically modify indexed columns".
But I tend to doubt that that's truly the case -- I think that there
are still pathological cases where HOT cannot keep the total table
size stable in the long run due to subtle effects that eventually
aggregate into significant issues, like heap fragmentation. Ask Jan
Wieck about the stability of some of the TPC-C/BenchmarkSQL tables to
get one example of this. There is no reason to believe that PHOT will
help with that. Maybe that's okay, but I would think carefully about
what that means if I were undertaking this work. Ensuring stability in
the on-disk size of tables in cases where the size of the logical
database is stable should be an important goal of a project like PHOT
or HOT.

If you want to get a better sense of how these inefficiencies might
happen, I suggest looking into using recently added autovacuum logging
to analyze how well HOT works today, using the technique that I go
into here:

https://postgr.es/m/cah2-wzkju+nibskzunbdpz6trse+aqvupae+xgm8zvob4wq...@mail.gmail.com

Small inefficiencies in the on-disk structure have a tendency to
aggregate over time, at least when there is no possible way to reverse
them. The bottom-up index deletion stuff is very effective as a
backstop against index bloat, because things are generally very
non-linear. The cost of an unnecessary page split is very high, and
permanent. But we can make it cheap to *try* to avoid that using
fairly simple heuristics. We can be reasonably confident that we're
about to split the page unnecessarily, and use cues that ramp up the
number of heap page accesses as needed. We ramp up during a bottom-up
index deletion, as we manage to free some index tuples as a result of
previous heap page accesses.

This works very well because we can intervene very selectively. We
aren't interested in deleting index tuples unless and until we really
have to, and in general there tends to be quite a bit of free space to
temporarily store extra version duplicates -- that's what most index
pages look like, even on the busiest of databases. It's possible for
the bottom-up index deletion mechanism to be invoked very
infrequently, and yet make a huge difference. And when it fails to
free anything, it fails permanently for that particular leaf page
(because it splits) -- so now we have lots of space for future index
tuple insertions that cover the original page's key space. We won't
thrash.

My intuition is that similar principles can be applied inside heapam.
Failing to fit related versions on a heap page (having managed to do
so for hours or days before that point) is more or less the heap page
equivalent of a leaf page split from version churn (this is the
pathology that bottom-up index deletion targets). For example, we
could have a fall back mode that compresses old versions that is used
if and only if heap pruning was attempted but then failed. We should
always try to avoid migrating to a new heap page, because that amounts
to a permanent solution to a temporary problem. We should perhaps make
the 

Re: pg_amcheck option to install extension

2021-04-18 Thread Mark Dilger


> On Apr 18, 2021, at 6:19 AM, Andrew Dunstan  wrote:
> 
> how about specifying pg_catalog as the schema instead of public?

Done.
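
For reference, with that change the option presumably boils down to issuing
something like the following in each target database (an assumption about the
patch's behavior, not taken from it):

CREATE EXTENSION IF NOT EXISTS amcheck WITH SCHEMA pg_catalog;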



v2-0001-Adding-install-missing-option-to-pg_amcheck.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: SQL-standard function body

2021-04-18 Thread Tom Lane
... BTW, a dependency loop is also possible without using this feature,
by abusing default-value expressions:

create function f1(x int, y int) returns int language sql
as 'select $1 + $2';
create function f2(x int, y int default f1(1,2)) returns int language sql
as 'select $1 + $2';
create or replace function f1(x int, y int default f2(11,12)) returns int 
language sql
as 'select $1 + $2';

The actual use-case for that seems pretty thin, so we never bothered
to worry about it before.  But if we're going to build loop-breaking
logic to handle function body dependencies, it should deal with this
too.  I think that all that's required is for the initial dummy
function declaration to omit defaults as well as providing a dummy
body.

regards, tom lane




Re: SQL-standard function body

2021-04-18 Thread Justin Pryzby
On Sun, Apr 18, 2021 at 03:08:44PM -0400, Tom Lane wrote:
> Noah Misch  writes:
> > Should we be okay releasing v14 without support for breaking function
> > dependency loops, or does that call for an open item?
> 
> Oh!  That should definitely be an open item.  It doesn't seem
> that hard to do something similar to what we do for views,
> i.e. create a dummy function and replace it later.

I added
https://wiki.postgresql.org/index.php?title=PostgreSQL_14_Open_Items&type=revision&diff=35926&oldid=35925




Re: Planning time grows exponentially with levels of nested views

2021-04-18 Thread Joel Jacobson
On Sun, Apr 18, 2021, at 22:14, Tom Lane wrote:
> "Joel Jacobson" mailto:joel%40compiler.org>> writes:
> > I assumed the cost for each nested VIEW layer would grow linear,
> > but my testing shows it appears to grow exponentially:
> 
> I think it's impossible to avoid less-than-O(N^2) growth on this sort
> of case.  For example, the v2 subquery initially has RTEs for v2 itself
> plus v1.  When we flatten v1 into v2, v2 acquires the RTEs from v1,
> namely v1 itself plus foo.  Similarly, once vK-1 is pulled up into vK,
> there are going to be order-of-K entries in vK's rtable, and that stacking
> makes for O(N^2) work overall just in manipulating the rtable.
> 
> We can't get rid of these rtable entries altogether, since all of them
> represent table privilege checks that the executor will need to do.
> It occurs to me though that we don't need the rte->subquery trees anymore
> once those are flattened, so maybe we could do something like the
> attached.  For me, this reduces the slowdown in your example from
> O(N^3) to O(N^2).

Many thanks for explaining and the patch.

I've tested the patch successfully.
Planning time grows much slower now:

EXPLAIN ANALYZE SELECT * FROM v64;
- Planning Time: 14.981 ms
+ Planning Time: 2.802 ms

EXPLAIN ANALYZE SELECT * FROM v128;
- Planning Time: 109.407 ms
+ Planning Time: 11.595 ms

EXPLAIN ANALYZE SELECT * FROM v256;
- Planning Time: 1594.809 ms
+ Planning Time: 46.709 ms

Very nice.

/Joel

Re: Planning time grows exponentially with levels of nested views

2021-04-18 Thread Tom Lane
I wrote:
> If multiple references are actually possible then this'd break it.  There
> seem to be no such cases in the regression tests though, and I'm having a
> hard time wrapping my brain around what would cause it.  "git blame"
> traces this text to my own commit f44639e1b, which has the log entry
> Don't crash if subquery appears multiple times in jointree.  This should
> not happen anyway, but let's try not to get completely confused if it does
> (due to rewriter bugs or whatever).
> so I'm thinking that this was only hypothetical.

Ah, found it.  That was evidently a reaction to the immediately preceding
commit (352871ac9), which fixed a rewriter bug that could lead to exactly
the case of multiple jointree references to one RTE.

I think this patch doesn't make things any worse for such a case though.
If we re-introduced such a bug, the result would be an immediate null
pointer crash while trying to process the second reference to a
flattenable subquery.  That's probably better for debuggability than
what happens now, where we just silently process the duplicate reference.

Anyway, I've stuck this into the next CF for future consideration.

regards, tom lane

PS: to save time for anyone else who wants to investigate this,
it looks like the report mentioned in 352871ac9 was

https://www.postgresql.org/message-id/007401c0860d%24bed809a0%241001a8c0%40archonet.com




Re: Old Postgresql version on i7-1165g7

2021-04-18 Thread Yura Sokolov

Tom Lane wrote 2021-04-13 17:45:
> Justin Pryzby  writes:
>> On Fri, Apr 09, 2021 at 04:28:25PM +0300, Yura Sokolov wrote:
>>> Occasionally I found I'm not able to `make check` old Postgresql versions.
>>> I've bisected between REL_11_0 and "Rename pg_rewind's copy_file_range()"
>>> and found 372728b0d49552641f0ea83d9d2e08817de038fa
>>> Replace our traditional initial-catalog-data format with a better design.
>>> This is first commit where `make check` doesn't fail during initdb on my
>>> machine.
>>
>> This doesn't make much sense or help much, since 372728b doesn't actually
>> change the catalogs, or any .c file.
>
> It could make sense if some part of the toolchain that was previously
> used to generate postgres.bki doesn't work right on that machine.
> Overall though I'd have thought that 372728b would increase not
> decrease our toolchain footprint.  It also seems unlikely that a
> recent Ubuntu release would contain toolchain bugs that we hadn't
> already heard about.
>
>> You used make clean too, right ?
>
> Really, when bisecting, you need to use "make distclean" or even
> "git clean -dfx" between steps, or you may get bogus results,
> because our makefiles aren't that great about tracking dependencies,
> especially when you move backwards in the history.
>
> So perhaps a more plausible theory is that this bisection result
> is wrong because you weren't careful enough.
>
> regards, tom lane


Sorry for missing the mail for a week.

I believe I cleaned before each step, since I'm building in an external
directory and the cleanup is just `rm * -r`.

But I'll repeat the bisecting tomorrow to be sure.

I don't think it is really a PostgreSQL or toolchain bug. I believe it is
some corner case that changed with the new Intel CPU.

With regards,
Yura Sokolov.




Re: Planning time grows exponentially with levels of nested views

2021-04-18 Thread Tom Lane
[ redirecting to -hackers so the cfbot can see it ]

"Joel Jacobson"  writes:
> I assumed the cost for each nested VIEW layer would grow linear,
> but my testing shows it appears to grow exponentially:

I think it's impossible to avoid O(N^2) growth on this sort
of case.  For example, the v2 subquery initially has RTEs for v2 itself
plus v1.  When we flatten v1 into v2, v2 acquires the RTEs from v1,
namely v1 itself plus foo.  Similarly, once vK-1 is pulled up into vK,
there are going to be order-of-K entries in vK's rtable, and that stacking
makes for O(N^2) work overall just in manipulating the rtable.

We can't get rid of these rtable entries altogether, since all of them
represent table privilege checks that the executor will need to do.
It occurs to me though that we don't need the rte->subquery trees anymore
once those are flattened, so maybe we could do something like the
attached.  For me, this reduces the slowdown in your example from
O(N^3) to O(N^2).

I'm slightly worried though by this comment earlier in
pull_up_simple_subquery:

/*
 * Need a modifiable copy of the subquery to hack on.  Even if we didn't
 * sometimes choose not to pull up below, we must do this to avoid
 * problems if the same subquery is referenced from multiple jointree
 * items (which can't happen normally, but might after rule rewriting).
 */

If multiple references are actually possible then this'd break it.  There
seem to be no such cases in the regression tests though, and I'm having a
hard time wrapping my brain around what would cause it.  "git blame"
traces this text to my own commit f44639e1b, which has the log entry

Don't crash if subquery appears multiple times in jointree.  This should
not happen anyway, but let's try not to get completely confused if it does
(due to rewriter bugs or whatever).

so I'm thinking that this was only hypothetical.

It's possible that we could do something similar in the sibling functions
pull_up_simple_union_all etc, but I'm not sure it's worth troubling over.
TBH, for the size of effect you're showing here, I wouldn't be worried
at all; it's only because it seems to be a one-liner to improve it that
I'm interested in doing something.

regards, tom lane

diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index 62a1668796..f93c037778 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -1174,6 +1174,14 @@ pull_up_simple_subquery(PlannerInfo *root, Node *jtnode, RangeTblEntry *rte,
 	Assert(root->placeholder_list == NIL);
 	Assert(subroot->placeholder_list == NIL);
 
+	/*
+	 * We no longer need the RTE's copy of the subquery's query tree.  Getting
+	 * rid of it saves nothing in particular so far as this level of query is
+	 * concerned; but if this query level is in turn pulled up into a parent,
+	 * we'd waste cycles copying the now-unused query tree.
+	 */
+	rte->subquery = NULL;
+
 	/*
 	 * Miscellaneous housekeeping.
 	 *


Re: More info on pg_stat_activity Wait Event Name when is DataFileRead

2021-04-18 Thread Robert Haas
On Sat, Apr 17, 2021 at 1:58 PM PegoraroF10  wrote:
> This long explaining was only to show, at least for me, that would be
> desirable to have an additional information when Postgres is waiting for a
> file. What if DataFileRead showing relfilenode it's waiting for ?

I agree that this would be nice, but it's pretty much impossible to do
it without adding quite a bit more overhead than the current system
has. And it already has enough overhead to make Andres at least
slightly grumpy, though admittedly a lot of things have enough
overhead to make Andres grumpy, because he REALLY likes it when things
go fast. :-)

I suspect it's best to investigate problems like the one you're having
using a tool like strace, which can provide way more detail than we
could ever cram into a wait event, like the data actually read or
written, timestamps for every operation, etc. But I also kind of
wonder whether it really matters. If your system is getting stuck in a
DataFileRead event for a long period of time, and assuming that's for
real and not just some kind of reporting bug, it sounds a lot like you
have a bad disk or a severely overloaded I/O subsystem. Because what
else would lead to the system getting stuck inside read() for a long
time?
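
As an aside, once strace has identified the file being read, the filenode in
its path can be mapped back to a relation from SQL with pg_filenode_relation();
a quick sketch (16723 is just an example filenode taken from a path such as
base/16384/16723, and 0 means the database's default tablespace):

SELECT pg_filenode_relation(0, 16723);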

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SQL-standard function body

2021-04-18 Thread Tom Lane
Noah Misch  writes:
> Should we be okay releasing v14 without support for breaking function
> dependency loops, or does that call for an open item?

Oh!  That should definitely be an open item.  It doesn't seem
that hard to do something similar to what we do for views,
i.e. create a dummy function and replace it later.
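
A hand-written sketch of that idea for a two-function cycle (illustration
only, not actual pg_dump output):

CREATE FUNCTION g() RETURNS int LANGUAGE sql RETURN 0;                -- dummy body
CREATE FUNCTION f() RETURNS int LANGUAGE sql RETURN coalesce(2, g());
CREATE OR REPLACE FUNCTION g() RETURNS int LANGUAGE sql RETURN f();   -- real body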

regards, tom lane




Re: SQL-standard function body

2021-04-18 Thread Noah Misch
On Tue, Jun 30, 2020 at 02:51:38PM -0400, Tom Lane wrote:
> The point remains that exposing the function body's dependencies will
> constrain restore order far more than we are accustomed to see.  It
> might be possible to build examples that flat out can't be restored,
> even granting that we teach pg_dump how to break dependency loops
> by first creating the function with empty body and later redefining
> it with the real body.  (Admittedly, if that's possible then you
> likely could make it happen with views too.  But somehow it seems
> more likely that people would create spaghetti dependencies for
> functions than views.)

Should we be okay releasing v14 without support for breaking function
dependency loops, or does that call for an open item?

-- example
create function f() returns int language sql return 1;
create function g() returns int language sql return f();
create or replace function f() returns int language sql return coalesce(2, g());

-- but when a view can break the cycle, pg_dump still does so
create view v as select 1 as c;
create function f() returns int language sql return coalesce(0, (select count(*) from v));
create or replace view v as select f() as c;




Re: 回复: Core dump happens when execute sql CREATE VIEW v1(c1) AS (SELECT ('4' COLLATE "C")::INT FROM generate_series(1, 10));

2021-04-18 Thread Tom Lane
Yulin PEI  writes:
> After several tests, I found that this patch do not fix the bug well.

What do you think is wrong with it?

> So the attachment is my patch and it works well as far as I tested.

This seems equivalent to the already-committed patch [1] except that
it wastes a makeNode call in the coerce-to-uncollatable-type case.

regards, tom lane

[1] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c402b02b9fb53aee2a26876de90a8f95f9a9be92




Re: "could not find pathkey item to sort" for TPC-DS queries 94-96

2021-04-18 Thread Tom Lane
I wrote:
> I think it's time for some refactoring of this code so that we can
> actually share the logic.  Accordingly, I propose the attached.

After sleeping on it, here's an improved version that gets rid of
an unnecessary assumption about ECs usually not containing both
parallel-safe and parallel-unsafe members.  I'd tried to do this
yesterday but didn't like the amount of side-effects on createplan.c
(caused by the direct call sites not being passed the "root" pointer).
However, we can avoid refactoring createplan.c APIs by saying that it's
okay to pass root = NULL to find_computable_ec_member if you're not
asking it to check parallel safety.  And there's not really a need to
put a parallel-safety check into find_ec_member_matching_expr at all;
that task can be left with the one caller that cares.

regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 0188c1e9a1..6627491519 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -35,6 +35,7 @@
 static EquivalenceMember *add_eq_member(EquivalenceClass *ec,
 		Expr *expr, Relids relids, Relids nullable_relids,
 		bool is_child, Oid datatype);
+static bool exprlist_member_ignore_relabel(Expr *node, List *exprs);
 static void generate_base_implied_equalities_const(PlannerInfo *root,
    EquivalenceClass *ec);
 static void generate_base_implied_equalities_no_const(PlannerInfo *root,
@@ -769,6 +770,169 @@ get_eclass_for_sort_expr(PlannerInfo *root,
 	return newec;
 }
 
+/*
+ * find_ec_member_matching_expr
+ *		Locate an EquivalenceClass member matching the given expr, if any;
+ *		return NULL if no match.
+ *
+ * "Matching" is defined as "equal after stripping RelabelTypes".
+ * This is used for identifying sort expressions, and we need to allow
+ * binary-compatible relabeling for some cases involving binary-compatible
+ * sort operators.
+ *
+ * Child EC members are ignored unless they belong to given 'relids'.
+ */
+EquivalenceMember *
+find_ec_member_matching_expr(EquivalenceClass *ec,
+			 Expr *expr,
+			 Relids relids)
+{
+	ListCell   *lc;
+
+	/* We ignore binary-compatible relabeling on both ends */
+	while (expr && IsA(expr, RelabelType))
+		expr = ((RelabelType *) expr)->arg;
+
+	foreach(lc, ec->ec_members)
+	{
+		EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
+		Expr	   *emexpr;
+
+		/*
+		 * We shouldn't be trying to sort by an equivalence class that
+		 * contains a constant, so no need to consider such cases any further.
+		 */
+		if (em->em_is_const)
+			continue;
+
+		/*
+		 * Ignore child members unless they belong to the requested rel.
+		 */
+		if (em->em_is_child &&
+			!bms_is_subset(em->em_relids, relids))
+			continue;
+
+		/*
+		 * Match if same expression (after stripping relabel).
+		 */
+		emexpr = em->em_expr;
+		while (emexpr && IsA(emexpr, RelabelType))
+			emexpr = ((RelabelType *) emexpr)->arg;
+
+		if (equal(emexpr, expr))
+			return em;
+	}
+
+	return NULL;
+}
+
+/*
+ * find_computable_ec_member
+ *		Locate an EquivalenceClass member that can be computed from the
+ *		expressions appearing in "exprs"; return NULL if no match.
+ *
+ * "exprs" can be either a list of bare expression trees, or a list of
+ * TargetEntry nodes.  Either way, it should contain Vars and possibly
+ * Aggrefs and WindowFuncs, which are matched to the corresponding elements
+ * of the EquivalenceClass's expressions.  As in find_ec_member_matching_expr,
+ * we ignore binary-compatible relabeling.
+ *
+ * Child EC members are ignored unless they belong to given 'relids'.
+ * Also, non-parallel-safe expressions are ignored if 'require_parallel_safe'.
+ *
+ * Note: some callers pass root == NULL for notational reasons.  This is OK
+ * when require_parallel_safe is false.
+ */
+EquivalenceMember *
+find_computable_ec_member(PlannerInfo *root,
+		  EquivalenceClass *ec,
+		  List *exprs,
+		  Relids relids,
+		  bool require_parallel_safe)
+{
+	ListCell   *lc;
+
+	foreach(lc, ec->ec_members)
+	{
+		EquivalenceMember *em = (EquivalenceMember *) lfirst(lc);
+		List	   *exprvars;
+		ListCell   *lc2;
+
+		/*
+		 * We shouldn't be trying to sort by an equivalence class that
+		 * contains a constant, so no need to consider such cases any further.
+		 */
+		if (em->em_is_const)
+			continue;
+
+		/*
+		 * Ignore child members unless they belong to the requested rel.
+		 */
+		if (em->em_is_child &&
+			!bms_is_subset(em->em_relids, relids))
+			continue;
+
+		/*
+		 * Match if all Vars and quasi-Vars are available in "exprs".
+		 */
+		exprvars = pull_var_clause((Node *) em->em_expr,
+   PVC_INCLUDE_AGGREGATES |
+   PVC_INCLUDE_WINDOWFUNCS |
+   PVC_INCLUDE_PLACEHOLDERS);
+		foreach(lc2, exprvars)
+		{
+			if (!exprlist_member_ignore_relabel(lfirst(lc2), exprs))
+break;
+		}
+		list_free(exprvars);
+		if (lc2)
+			continue;			/* we hit a no

Re: 2 questions about volatile attribute of pg_proc.

2021-04-18 Thread David G. Johnston
On Sun, Apr 18, 2021 at 9:08 AM Tom Lane  wrote:

> Isaac Morland  writes:
> > On Sun, 18 Apr 2021 at 11:36, Tom Lane  wrote:
> >> Are you familiar with the halting problem?  I don't see any meaningful
> >> difference here.
>
> > I think what is being suggested is akin to type checking, not solving the
> > halting problem.
>
> Yeah, on further thought we'd be satisfied with a conservative
> approximation, so that removes the theoretical-impossibility objection.
> Still, there are a lot of remaining problems, as you note.
>
>
Yeah, the type checking approach seems blocked by the "black box" nature of
functions.  A possibly more promising approach is for the top-level call to
declare its expectations (which are set by the user) and during execution
if that expectation is violated directly, or is reported as violated deeper
in the call stack, the execution of the function fails with some kind of
invalid state error.  However, as with other suggestions of this nature,
the fundamental blocker here is that to be particularly useful this kind of
validation needs to happen by default (as opposed to opt-in) which risks
breaking existing code.  And so I foresee this request falling into the
same category as those others - an interesting idea that could probably be
made to work but by itself isn't worthwhile enough to go and introduce a
fundamental change to the amount of "parental oversight" PostgreSQL tries
to provide.

David J.


Re: 2 questions about volatile attribute of pg_proc.

2021-04-18 Thread Tom Lane
Isaac Morland  writes:
> On Sun, 18 Apr 2021 at 11:36, Tom Lane  wrote:
>> Are you familiar with the halting problem?  I don't see any meaningful
>> difference here.

> I think what is being suggested is akin to type checking, not solving the
> halting problem.

Yeah, on further thought we'd be satisfied with a conservative
approximation, so that removes the theoretical-impossibility objection.
Still, there are a lot of remaining problems, as you note.

regards, tom lane




Re: 2 questions about volatile attribute of pg_proc.

2021-04-18 Thread Isaac Morland
On Sun, 18 Apr 2021 at 11:36, Tom Lane  wrote:

> Andy Fan  writes:
> > We know volatile is very harmful for optimizers and it is the default
> > value (and safest value) if the user doesn't provide that.  Asking user
> > to set the value is not a good experience,  is it possible to
> auto-generate
> > the value for it rather than use the volatile directly for user defined
> > function. I
> > think it should be possible, we just need to scan the PlpgSQL_stmt to see
> > if there
> > is a volatile function?
>
> Are you familiar with the halting problem?  I don't see any meaningful
> difference here.
>

I think what is being suggested is akin to type checking, not solving the
halting problem. Parse the function text, identify all functions it might
call (without solving the equivalent of the halting problem to see if it
actually does or could), and apply the most volatile value of called
functions to the calling function.

That being said, there are significant difficulties, including but almost
certainly not limited to:

- what happens if one modifies a called function after creating the calling
function?
- EXECUTE
- a PL/PGSQL function's meaning depends on the search path in effect when
it is called, unless it has a SET search_path clause or it fully qualifies
all object references, so it isn't actually possible in general to
determine what a function calls at definition time

If the Haskell compiler is possible then what is being requested here is
conceptually possible even if there are major issues with actually doing it
in the Postgres context. The halting problem is not the problem here.


Re: 2 questions about volatile attribute of pg_proc.

2021-04-18 Thread Tom Lane
Andy Fan  writes:
> We know volatile is very harmful for optimizers and it is the default
> value (and safest value) if the user doesn't provide that.  Asking user
> to set the value is not a good experience,  is it possible to auto-generate
> the value for it rather than use the volatile directly for user defined
> function. I
> think it should be possible, we just need to scan the PlpgSQL_stmt to see
> if there
> is a volatile function?

Are you familiar with the halting problem?  I don't see any meaningful
difference here.

regards, tom lane




Re: Bogus collation version recording in recordMultipleDependencies

2021-04-18 Thread Tom Lane
Julien Rouhaud  writes:
> On Sat, Apr 17, 2021 at 10:01:53AM +1200, Thomas Munro wrote:
>> It seems to me that there are two things that would be needed to
>> salvage this for PG14: (1) deciding that we're unlikely to come up
>> with a better idea than using pg_depend for this (following the
>> argument that it'd only create duplication to have a parallel
>> dedicated catalog), (2) fixing any remaining flaws in the dependency
>> analyser code.  I'll look into the details some more on Monday.

> So IIUC the issue here is that the code could previously record useless
> collation version dependencies in somes cases, ...
> The new situation is now that the code can record too few version dependencies
> leading to false negative detection, which is way more problematic.

I'm not sure that an error in this direction is all that much more
problematic than the other direction.  If it's okay to claim that
indexes need to be rebuilt when they don't really, then we could just
drop this entire overcomplicated infrastructure and report that all
indexes need to be rebuilt after any collation version change.

But in any case you're oversimplifying tremendously.  The previous code is
just as capable of errors of omission, because it was inquiring into the
wrong composite types, ie those of leaf expression nodes.  The ones we'd
need to look at are the immediate inputs of record_eq and siblings.  Here
are a couple of examples where the leaf types are unhelpful:

... where row(a,b,c)::composite_type < row(d,e,f)::composite_type;
... where function_returning_composite(...) < function_returning_composite(...);

And even if we do this, we're not entirely in the clear in an abstract
sense, because this only covers cases in which an immediate input is
of a known named composite type.  Cases dealing in anonymous RECORD
types simply can't be resolved statically.  It might be that that
can't occur in the specific situation of CREATE INDEX expressions,
but I'm not 100% sure of it.  The apparent counterexample of

... where row(a,b) < row(a,c)

isn't one because we parse that as RowCompareExpr not an application
of record_lt.

> We agreed that having possible false positive dependencies was acceptable for
> the initial implementation and that we will improve it in later versions, as
> otherwise the alternative is to reindex everything without getting any 
> warning,
> which clearly isn't better anyway.

[ shrug... ] You have both false positives and false negatives in the
thing as it stood before f24b15699.  I'm not convinced that it's possible
to completely avoid either issue via static analysis.  I'm inclined to
think that false negatives around record_eq-like functions are not such a
problem for real index definitions, and we'd be better off with fewer
false positives.  But it's all judgment calls.

regards, tom lane




Re: 2 questions about volatile attribute of pg_proc.

2021-04-18 Thread Pavel Stehule
ne 18. 4. 2021 v 17:06 odesílatel Andy Fan 
napsal:

> Hi:
>
> We know volatile is very harmful for optimizers and it is the default
> value (and safest value) if the user doesn't provide that.  Asking user
> to set the value is not a good experience,  is it possible to auto-generate
> the value for it rather than use the volatile directly for user defined
> function. I
> think it should be possible, we just need to scan the PlpgSQL_stmt to see
> if there
> is a volatile function?
>

plpgsql_check does this check - the performance check reports whether a
function can be marked as stable

https://github.com/okbob/plpgsql_check
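
A minimal invocation, assuming the extension is installed and checking a
hypothetical function myfunc() (performance_warnings is the parameter that
enables this kind of volatility suggestion, per the plpgsql_check docs):

CREATE EXTENSION IF NOT EXISTS plpgsql_check;
SELECT * FROM plpgsql_check_function('myfunc()', performance_warnings => true);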

I don't think this can be done automatically - plpgsql does not check the
objects inside a function at registration time. You can use objects and
functions that don't exist at CREATE FUNCTION time. And you need to know
this info before optimization time. So if we implemented this check
automatically, planning time could increase a lot.

Regards

Pavel


> The second question "It is v for “volatile” functions, whose results might
> change at any time.
> (Use v also for functions with side-effects, so that calls to them cannot
> get optimized away.)"
> I think they are different semantics.  One of the results is volatile
> functions can't be removed
> by remove_unused_subquery_output even if it doesn't have side effects. for
> example:
> select b from (select an_expensive_random(), b from t);   Is it by design
> on purpose?
>
>
> --
> Best Regards
> Andy Fan (https://www.aliyun.com/)
>


2 questions about volatile attribute of pg_proc.

2021-04-18 Thread Andy Fan
Hi:

We know volatile is very harmful for optimizers, and it is the default
value (and safest value) if the user doesn't provide one.  Asking the user
to set the value is not a good experience, so is it possible to auto-generate
the value rather than use volatile directly for a user-defined function?
I think it should be possible; we just need to scan the PlpgSQL_stmt to see
whether it calls a volatile function.

The second question is about the documentation wording: "It is v for
“volatile” functions, whose results might change at any time.  (Use v also
for functions with side-effects, so that calls to them cannot get optimized
away.)"  I think these are two different semantics.  One consequence is that
volatile functions can't be removed by remove_unused_subquery_output even if
they don't have side effects, for example:
select b from (select an_expensive_random(), b from t);   Is this by design?
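
A small runnable illustration of the second point, with an_expensive_random
standing in for any expensive volatile function (the names here are only for
the example):

CREATE FUNCTION an_expensive_random() RETURNS float8
    LANGUAGE sql VOLATILE AS 'SELECT random()';
CREATE TABLE t (b int);
-- the unused output column is still evaluated, because the call is volatile:
EXPLAIN (VERBOSE) SELECT b FROM (SELECT an_expensive_random(), b FROM t) s;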


-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


Consider parent's stats for set_append_rel_size.

2021-04-18 Thread Andy Fan
Hi:

I would like to talk about the impact of initial partition pruning on
set_append_rel_size and create_append_path. In this thread I just want to
focus on set_append_rel_size.

Given the below example:

CREATE TABLE P (part_key int, v int) PARTITION BY RANGE (part_key);
CREATE TABLE p_1 PARTITION OF p FOR VALUES FROM (0) TO (10);
CREATE TABLE p_2 PARTITION OF p FOR VALUES FROM (10) TO (20);
CREATE TABLE p_3 PARTITION OF p FOR VALUES FROM (20) TO (30);
INSERT INTO p SELECT i % 30, i  FROM generate_series(1, 300)i;

set plan_cache_mode to force_generic_plan ;
prepare s as select * from p where part_key = $1;
explain analyze execute s(2);

Then we will get estimated RelOptInfo.rows = 30, but actually it is 10 rows.

explain analyze execute s(2);
 QUERY PLAN
-
 Append  (cost=0.00..6.90 rows=30 width=8) (actual time=0.019..0.042
rows=10 loops=1)
   Subplans Removed: 2
   ->  Seq Scan on p_1  (cost=0.00..2.25 rows=10 width=8) (actual
time=0.017..0.038 rows=10 loops=1)
 Filter: (part_key = $1)
 Rows Removed by Filter: 90
 Planning Time: 0.885 ms
 Execution Time: 0.156 ms
(7 rows)

Actually there are 2 issues here. One is RelOptInfo->rows, which is set by
set_append_rel_size; the other is appendPath->path.rows, which is set in
create_append_path. They are two independent pieces of data. (When we estimate
the rows of a joinrel, we only consider the RelOptInfo.rows rather than
Path.rows.)

In set_append_rel_size, we push the quals down to each child relation and sum
up each child's rows.  The child's stats work better than the parent's stats
if we know exactly which partitions we would access. But this strategy fails
when initial pruning comes into play, as above.

So I think considering parent's stats for init prune case might be a good
solution (Ashutosh has mentioned global stats for this a long time
ago[1]).  So I want
to refactor the code like this:

a). should_use_parent_stats(..);  Decides which stats we should use for an
AppendRel.
b). set_append_rel_size_locally:  Just do what we currently do.
c). set_append_rel_size_globally: We calculate the quals selectivity on
AppendRel level, and set the rows with AppendRel->tuples * sel.

More about should_use_parent_stats function:
1. If there are no quals for initial partition prune, we use child's stats.
2. If we have quals for initial partition prune, and the left op is not
used in
   planning time prune, we use parent's stats. For example:  (part_key = 2
and
   part_key > $1);

However, when I was coding it, I found out that identifying the "quals for
initial partition prune" is not so easy.  So I doubt whether it's worth the
trouble to decide which method to use.  Attached is just a PoC version which
uses the parent's stats all the time.

Author: 一挃 
Date:   Sun Apr 18 22:02:54 2021 +0800

Currently set_append_rel_size doesn't consider initial partition
pruning, so the estimated size may sometimes be wrong at a big scale.
In this patch I set rows = parentrel->tuples * clause selectivity.
With this we can lose some accuracy when initial partition pruning
doesn't happen at all, but generally I think it would be OK.

Another strategy is to check whether initial partition pruning can
happen: if we are sure about that, we use the above way; otherwise we
can still use the local stats strategy.

[1]
https://www.postgresql.org/message-id/CAExHW5t5Q7JuUW28QMRO7szuHcbsfx4M9%3DWL%2Bup40h3PCd7dXw%40mail.gmail.com


-- 
Best Regards
Andy Fan (https://www.aliyun.com/)


v1-0001-Currently-the-set_append_rel_size-doesn-t-conside.patch
Description: Binary data


Re: proposal - log_full_scan

2021-04-18 Thread Pavel Stehule
ne 18. 4. 2021 v 14:28 odesílatel Julien Rouhaud 
napsal:

> On Sun, Apr 18, 2021 at 06:21:56AM +0200, Pavel Stehule wrote:
> >
> > The extension like pg_qualstat is good, but it does different work.
>
> Yes definitely.  It was just an idea if you needed something right now that
> could more or less do what you needed, not saying that we shouldn't
> improve the
> core :)
>
> > In
> > complex applications I need to detect buggy (forgotten) queries - last
> week
> > I found two queries over bigger tables without predicates. So the
> qualstat
> > doesn't help me.
>
> Also not totally helpful but powa was created to detect problematic
> queries in
> such cases.  It wouldn't say if it's because of a seq scan or not (so yes
> again
> we need to improve that), but it would give you the slowest (or top
> consumer
> for any resource) for a given time interval.
>
> > This is an application for a government with few (but for
> > government typical) specific: 1) the life cycle is short (one month), 2)
> > there is not slow start - from first moment the application will be used
> by
> > more hundred thousands people, 3) the application is very public - so any
> > issues are very interesting for press and very unpleasant for politics,
> and
> > in next step for all suppliers (there are high penalty for failures), and
> > an admins are not happy from external extensions, 4) the budget is not
> too
> > big - there is not any performance testing environment
> >
> > First stages are covered well today. We can log and process very slow
> > queries, and fix it immediately - with CREATE INDEX CONCURRENTLY I can do
> > it well on production servers too without high risk.
> >
> > But the detection of some bad not too slow queries is hard. And as an
> > external consultant I am not able to install any external extensions to
> the
> > production environment for fixing some hot issues, The risk is not
> > acceptable for project managers and I understand. So I have to use only
> > tools available in Postgres.
>
> Yes I agree that having additional and more specialized tool in core
> postgres
> would definitely help in similar scenario.
>
> I think that having some kind of threshold for seq scan (like the mentioned
> auto_explain.log_seqscan = XXX) in auto_explain would be the best
> approach, as
> you really need the plan to know why a seq scan was chosen and if it was a
> reasonable choice or not.
>

I would like to write this for core and for auto_explain too. I have been in
situations where I could not use auto_explain either, although this extension
is widely used and so the risk is low.

When I detect the query, I can then run the explain manually. But sure,
I think it can work well inside auto_explain.

Regards

Pavel


Re: Replication slot stats misgivings

2021-04-18 Thread vignesh C
On Sun, Apr 18, 2021 at 9:02 AM vignesh C  wrote:
>
> On Sun, Apr 18, 2021 at 8:43 AM Amit Kapila  wrote:
> >
> > On Sun, Apr 18, 2021 at 7:36 AM vignesh C  wrote:
> > >
> > > On Sun, Apr 18, 2021 at 3:51 AM Tom Lane  wrote:
> > > >
> > > > I wrote:
> > > > > The buildfarm suggests that this isn't entirely stable:
> > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-04-17%2011%3A14%3A49
> > > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&dt=2021-04-17%2016%3A30%3A15
> > > >
> > > > Oh, I missed that hyrax is showing the identical symptom:
> > > >
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-16%2007%3A05%3A44
> > > >
> > > > So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it
> > > > that way.
> > > >
> > >
> > > I will try to check and identify why it is failing.
> > >
> >
> > I think the failure is due to the reason that in the new tests after
> > reset, we are not waiting for the stats message to be delivered as we
> > were doing in other cases. Also, for the new test (non-spilled case),
> > we need to decode changes as we are doing for other tests, otherwise,
> > it will show the old stats.
>
> I also felt that is the reason for the failure, I will fix and post a
> patch for this.

Attached is a patch which includes the changes for the fix. I have moved
the non-spilled transaction test earlier to reduce the steps, which reduces
the calls to pg_logical_slot_get_changes before this test.

Regards,
Vignesh
diff --git a/contrib/test_decoding/expected/stats.out b/contrib/test_decoding/expected/stats.out
index bc8e601eab..c537e38bfe 100644
--- a/contrib/test_decoding/expected/stats.out
+++ b/contrib/test_decoding/expected/stats.out
@@ -51,20 +51,15 @@ BEGIN
 extract(epoch from clock_timestamp() - start_time);
 END
 $$ LANGUAGE plpgsql;
--- spilling the xact
-BEGIN;
-INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);
-COMMIT;
-SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
+-- non-spilled xact
+INSERT INTO stats_test values(1);
+SELECT count(*) FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
  count 
 ---
-  5002
+ 3
 (1 row)
 
--- Check stats, wait for the stats collector to update. We can't test the
--- exact stats count as that can vary if any background transaction (say by
--- autovacuum) happens in parallel to the main transaction.
-SELECT wait_for_decode_stats(false, true);
+SELECT wait_for_decode_stats(false, false);
  wait_for_decode_stats 
 ---
  
@@ -73,17 +68,17 @@ SELECT wait_for_decode_stats(false, true);
 SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes FROM pg_stat_replication_slots;
 slot_name| spill_txns | spill_count | total_txns | total_bytes 
 -++-++-
- regression_slot | t  | t   | t  | t
+ regression_slot | f  | f   | t  | t
 (1 row)
 
--- reset the slot stats, and wait for stats collector to reset
+-- reset the slot stats, and wait for stats collector's total txn to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
  pg_stat_reset_replication_slot 
 
  
 (1 row)
 
-SELECT wait_for_decode_stats(true, true);
+SELECT wait_for_decode_stats(true, false);
  wait_for_decode_stats 
 ---
  
@@ -95,13 +90,19 @@ SELECT slot_name, spill_txns, spill_count, total_txns, total_bytes FROM pg_stat_
  regression_slot |  0 |   0 |  0 |   0
 (1 row)
 
--- decode and check stats again.
+-- spilling the xact
+BEGIN;
+INSERT INTO stats_test SELECT 'serialize-topbig--1:'||g.i FROM generate_series(1, 5000) g(i);
+COMMIT;
 SELECT count(*) FROM pg_logical_slot_peek_changes('regression_slot', NULL, NULL, 'skip-empty-xacts', '1');
  count 
 ---
   5002
 (1 row)
 
+-- Check stats, wait for the stats collector to update. We can't test the
+-- exact stats count as that can vary if any background transaction (say by
+-- autovacuum) happens in parallel to the main transaction.
 SELECT wait_for_decode_stats(false, true);
  wait_for_decode_stats 
 ---
@@ -114,24 +115,42 @@ SELECT slot_name, spill_txns > 0 AS spill_txns, spill_count > 0 AS spill_count,
  regression_slot | t  | t   | t  | t
 (1 row)
 
+-- reset the slot stats, and wait for stats collector to reset
 SELECT pg_stat_reset_replication_slot('regression_slot');
  pg_stat_reset_replication_slot 
 
  
 (1 row)
 
--- non-spilled xact
-INSERT INTO stats_test values(generate_series(1, 10));
-SELECT wait_for_decode_stats(false, false);
+SELECT wait_for_decode_stats(true, true);
  wait_for_decode_stats 
 ---
  
 (

Re: Replication slot stats misgivings

2021-04-18 Thread Masahiko Sawada
On Sun, Apr 18, 2021 at 12:13 PM Amit Kapila  wrote:
>
> On Sun, Apr 18, 2021 at 7:36 AM vignesh C  wrote:
> >
> > On Sun, Apr 18, 2021 at 3:51 AM Tom Lane  wrote:
> > >
> > > I wrote:
> > > > The buildfarm suggests that this isn't entirely stable:
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=anole&dt=2021-04-17%2011%3A14%3A49
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bichir&dt=2021-04-17%2016%3A30%3A15
> > >
> > > Oh, I missed that hyrax is showing the identical symptom:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hyrax&dt=2021-04-16%2007%3A05%3A44
> > >
> > > So you might try CLOBBER_CACHE_ALWAYS to see if you can reproduce it
> > > that way.
> > >
> >
> > I will try to check and identify why it is failing.
> >
>
> I think the failure is due to the reason that in the new tests after
> reset, we are not waiting for the stats message to be delivered as we
> were doing in other cases. Also, for the new test (non-spilled case),
> we need to decode changes as we are doing for other tests, otherwise,
> it will show the old stats.
>

Yes, also the following expectation in expected/stats.out is wrong:

SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
total_bytes FROM pg_stat_replication_slots;
slot_name| spill_txns | spill_count | total_txns | total_bytes
-++-++-
 regression_slot | f  | f   | t  | t
(1 row)

We should expect all values are 0. Please find attached the patch.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


fix_test_decoding_test.patch
Description: Binary data


Re: pg_amcheck option to install extension

2021-04-18 Thread Andrew Dunstan


On 4/17/21 3:43 PM, Mark Dilger wrote:
>
>> On Apr 16, 2021, at 11:06 AM, Andrew Dunstan  wrote:
>>
>>
>> Hi,
>>
>> Peter Geoghegan suggested that I have the cross version upgrade checker
>> run pg_amcheck on the upgraded module. This seemed to me like a good
>> idea, so I tried it, only to find that it refuses to run unless the
>> amcheck extension is installed. That's fair enough, but it also seems to
>> me like we should have an option to have pg_amcheck install the
extension if it's not present, by running something like 'create
>> extension if not exists amcheck'. Maybe in such a case there could also
>> be an option to drop the extension when pg_amcheck's work is done - I
>> haven't thought through all the implications.
>>
>> Given pg_amcheck is a new piece of work I'm not sure if we can sneak
>> this in under the wire for release 14. I will certainly undertake to
>> review anything expeditiously. I can work around this issue in the
>> buildfarm, but it seems like something other users are likely to want.
> We cannot quite use a "create extension if not exists amcheck" command, as we 
> clear the search path and so must specify the schema to use.  Should we 
> instead avoid clearing the search path for this?  What are the security 
> implications of using the first schema of the search path?
>
> When called as `pg_amcheck --install-missing`, the implementation in the 
> attached patch runs per database being checked a "create extension if not 
> exists amcheck with schema public".  If called as `pg_amcheck 
> --install-missing=foo`, it instead runs "create extension if not exists 
> amcheck with schema foo` having escaped "foo" appropriately for the given 
> database.  There is no option to use different schemas for different 
> databases.  Nor is there any option to use the search path.  If somebody 
> needs that, I think they can manage installing amcheck themselves.



how about specifying pg_catalog as the schema instead of public?


>
> Does this meet your needs for v14?  I'd like to get this nailed down quickly, 
> as it is unclear to me that we should even be doing this so late in the 
> development cycle.


I'm ok with or without - I'll just have the buildfarm client pull a list
of databases and install the extension in all of them.


>
> I'd also like your impressions on whether we're likely to move 
> contrib/amcheck into core anytime soon.  If so, is it worth adding an option 
> that we'll soon need to deprecate?


I think if it stays as an extension it will stay in contrib. But it sure
feels very odd to have a core bin program that relies on a contrib
extension. It seems one or the other is misplaced.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: proposal - log_full_scan

2021-04-18 Thread Julien Rouhaud
On Sun, Apr 18, 2021 at 06:21:56AM +0200, Pavel Stehule wrote:
> 
> The extension like pg_qualstat is good, but it does different work.

Yes definitely.  It was just an idea if you needed something right now that
could more or less do what you needed, not saying that we shouldn't improve the
core :)

> In
> complex applications I need to detect buggy (forgotten) queries - last week
> I found two queries over bigger tables without predicates. So the qualstat
> doesn't help me. 

Also not totally helpful but powa was created to detect problematic queries in
such cases.  It wouldn't say if it's because of a seq scan or not (so yes again
we need to improve that), but it would give you the slowest (or top consumer
for any resource) for a given time interval.

> This is an application for a government with few (but for
> government typical) specific: 1) the life cycle is short (one month), 2)
> there is not slow start - from first moment the application will be used by
> more hundred thousands people, 3) the application is very public - so any
> issues are very interesting for press and very unpleasant for politics, and
> in next step for all suppliers (there are high penalty for failures), and
> an admins are not happy from external extensions, 4) the budget is not too
> big - there is not any performance testing environment
> 
> First stages are covered well today. We can log and process very slow
> queries, and fix it immediately - with CREATE INDEX CONCURRENTLY I can do
> it well on production servers too without high risk.
> 
> But the detection of some bad not too slow queries is hard. And as an
> external consultant I am not able to install any external extensions to the
> production environment for fixing some hot issues, The risk is not
> acceptable for project managers and I understand. So I have to use only
> tools available in Postgres.

Yes I agree that having additional and more specialized tools in core postgres
would definitely help in similar scenarios.

I think that having some kind of threshold for seq scan (like the mentioned
auto_explain.log_seqscan = XXX) in auto_explain would be the best approach, as
you really need the plan to know why a seq scan was chosen and if it was a
reasonable choice or not.
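
For illustration, usage could then mirror the existing auto_explain knobs; a
sketch, where auto_explain.log_seqscan is only the parameter proposed in this
thread and does not exist yet:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';   -- existing parameter
-- SET auto_explain.log_seqscan = 10000;       -- proposed threshold, exact semantics to be defined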




Fix dropped object handling in pg_event_trigger_ddl_commands

2021-04-18 Thread Sven Klemm
Hello,

when creating an event trigger for ddl_command_end that calls
pg_event_trigger_ddl_commands, certain statements will cause the
trigger to fail with a cache lookup error. The error happens on
master, 13 and 12; I didn't test any previous versions.

trg=# ALTER TABLE t ALTER COLUMN f1 SET DATA TYPE bigint, ALTER COLUMN
f1 DROP IDENTITY;
ERROR: XX000: cache lookup failed for relation 13476892
CONTEXT: PL/pgSQL function ddl_end() line 5 at FOR over SELECT rows
LOCATION: getRelationTypeDescription, objectaddress.c:4178

For the ALTER DATA TYPE we create a command to adjust the sequence
which gets recorded in the event trigger commandlist, which leads to
the described failure when the sequence is dropped as part of another
ALTER TABLE subcommand and information about the sequence can no
longer be looked up.

To reproduce:
CREATE OR REPLACE FUNCTION ddl_end()
RETURNS event_trigger AS $$
DECLARE
r RECORD;
BEGIN
FOR r IN SELECT * FROM pg_event_trigger_ddl_commands()
LOOP
RAISE NOTICE 'ddl_end: % %', r.command_tag, r.object_type;
END LOOP;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER ddl_end ON ddl_command_end EXECUTE PROCEDURE ddl_end();

CREATE TABLE t(f1 int NOT NULL GENERATED ALWAYS AS IDENTITY);
ALTER TABLE t ALTER COLUMN f1 DROP IDENTITY, ALTER COLUMN f1 SET DATA
TYPE bigint;

I tried really hard to look for a different way to detect this error
earlier, but since the subcommands are processed independently I
couldn't come up with a non-invasive version. Someone more familiar
with this code might have an idea for a better solution.

Any thoughts?

https://www.postgresql.org/message-id/CAMCrgp39V7JQA_Gc+JaEZV3ALOU1ZG=pwyk3odptq7f6z0j...@mail.gmail.com
--
Regards, Sven Klemm
From 4faf761528c735bd808e7ea4d9f356710e3a8ed0 Mon Sep 17 00:00:00 2001
From: Sven Klemm 
Date: Sat, 17 Apr 2021 21:54:03 +0200
Subject: [PATCH v1] Fix pg_event_trigger_ddl_commands

If an object is modified that is dropped in the same
command we might end up in a position where we
generated a message but can no longer look up the
object information because the object got dropped.
E.g. ALTER TABLE ALTER COLUMN, DROP IDENTITY
---
 src/backend/commands/event_trigger.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/event_trigger.c b/src/backend/commands/event_trigger.c
index 5bde507c75..f7b2c30f82 100644
--- a/src/backend/commands/event_trigger.c
+++ b/src/backend/commands/event_trigger.c
@@ -1936,8 +1936,17 @@ pg_event_trigger_ddl_commands(PG_FUNCTION_ARGS)
 	else if (cmd->type == SCT_AlterTSConfig)
 		addr = cmd->d.atscfg.address;
 
-	type = getObjectTypeDescription(&addr, false);
-	identity = getObjectIdentity(&addr, false);
+	/*
+	 * If an object is modified that is dropped in the same
+	 * command we might end up in a position where we
+	 * generated a message but can no longer look up the
+	 * object information because the object got dropped.
+	 * E.g. ALTER TABLE ALTER COLUMN, DROP IDENTITY
+	 */
+	type = getObjectTypeDescription(&addr, true);
+	identity = getObjectIdentity(&addr, true);
+	if (!identity)
+		continue;
 
 	/*
 	 * Obtain schema name, if any ("pg_temp" if a temp
-- 
2.30.0



Re: Bogus collation version recording in recordMultipleDependencies

2021-04-18 Thread Julien Rouhaud
On Sat, Apr 17, 2021 at 10:01:53AM +1200, Thomas Munro wrote:
> On Sat, Apr 17, 2021 at 8:39 AM Tom Lane  wrote:
> > Per the changes in collate.icu.utf8.out, this gets rid of
> > a lot of imaginary collation dependencies, but it also gets
> > rid of some arguably-real ones.  In particular, calls of
> > record_eq and its siblings will be considered not to have
> > any collation dependencies, although we know that internally
> > those will look up per-column collations of their input types.
> > We could imagine special-casing record_eq etc here, but that
> > sure seems like a hack.
> 
> [...]
> 
> > So I wonder if, rather than continuing to pursue this right now,
> > we shouldn't revert 257836a75 and try again later with a new design
> > that doesn't try to commandeer the existing dependency infrastructure.
> > We might have a better idea about what to do on Windows by the time
> > that's done, too.
> 
> It seems to me that there are two things that would be needed to
> salvage this for PG14: (1) deciding that we're unlikely to come up
> with a better idea than using pg_depend for this (following the
> argument that it'd only create duplication to have a parallel
> dedicated catalog), (2) fixing any remaining flaws in the dependency
> analyser code.  I'll look into the details some more on Monday.

So IIUC the issue here is that the code could previously record useless
collation version dependencies in some cases, which could lead to false
positive possible-corruption messages (and of course additional bloat on
pg_depend).  False positive messages can't be avoided anyway, as a collation
version update may not corrupt the actually indexed set of data, especially for
glibc.  But with the infrastructure as-is, advanced users can look into the new
version's changes and choose to ignore changes for a specific set of collations,
which is way easier to do with the recorded dependencies.

The new situation is now that the code can record too few version dependencies
leading to false negative detection, which is way more problematic.

This was previously discussed around [1].  Quoting Thomas:

> To state more explicitly what's happening here, we're searching the
> expression trees for subexpresions that have a collation as part of
> their static type.  We don't know which functions or operators are
> actually affected by the collation, though.  For example, if an
> expression says "x IS NOT NULL" and x happens to be a subexpression of
> a type with a particular collation, we don't now that this
> expression's value can't possibly be affected by the collation version
> changing.  So, the system will nag you to rebuild an index just
> because you mentioned it, even though the index can't be corrupted.
> To do better than that, I suppose we'd need declarations in the
> catalog to say which functions/operators are collation sensitive.

We agreed that having possible false positive dependencies was acceptable for
the initial implementation and that we will improve it in later versions, as
otherwise the alternative is to reindex everything without getting any warning,
which clearly isn't better anyway.

FTR we had the same agreement to not handle specific AMs that don't care about
collations (like hash or bloom) in [2], even though I provided a patch to handle
that case ([3]) which was dropped later on ([4]).

Properly and correctly handling collation version dependency in expressions is
a hard problem and will definitely require additional fields in pg_proc, so we
clearly can't add that in pg14.  So yes we have to decide whether we want to
keep the feature in pg14 with the known limitations (and in that case probably
revert f24b15699, possibly improving documentation on the possibility of false
positive) or revert it entirely.

Unsurprisingly, I think that the feature as-is is already a significant
improvement and can easily be improved further, so my vote is to keep it in pg14.
And just to be clear, I'm volunteering to work on the expression problem and all
other related improvements for the next version, whether the current feature is
reverted or not.

[1]: 
https://www.postgresql.org/message-id/CA%2BhUKGK8CwBcTcXWL2kUjpHT%2B6t2hEFCzkcZ-Z7xXbz%3DC4NLCQ%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/13b0c950-80f9-4c10-7e0f-f59feac56a98%402ndquadrant.com
[3]: https://www.postgresql.org/message-id/20200908144507.GA57691%40nol
[4]: 
https://www.postgresql.org/message-id/CA%2BhUKGKHj4aYmmwKZdZjkD%3DCWRmn%3De6UsS7S%2Bu6oLrrp0orgsw%40mail.gmail.com