Re: pg_partition_tree crashes for a non-defined relation

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 11:50:16PM -0500, Tom Lane wrote:
> FWIW, I don't agree with Michael's suggestion above.  A plain table is
> significantly different from a partitioned table with no children:
> you can store rows in the former but not the latter, and you can add
> partitions to the latter but not the former.  So conflating the two
> doesn't seem likely to lead to any good outcome.

Okay, so opinion is moving in this direction.  What would be needed is
then just something like the attached patch, based on Amit's
suggestion?  The routine names and comments looked fine to me so I
have not touched the surroundings; comments are welcome.

> But, having said that, we've learned that it's generally bad for
> catalog-query functions to fail outright just because they're pointed
> at the wrong kind of catalog object.  So I think that what we want here
> is for pg_partition_tree to return NULL or an empty set or some such
> for a plain table, while its output for a childless partitioned table
> should be visibly different from that.  I'm not wedded to details
> beyond that idea.

Yep, that's been the intention since cc53123.  I won't go back to
returning an ERROR in any case.  Here is what the patch gives for a
childless partitioned table, FWIW:
=# CREATE TABLE ptif_test (a int, b int) PARTITION BY range (a);
CREATE TABLE
=# SELECT * FROM pg_partition_tree('ptif_test');
   relid   | parentrelid | isleaf | level 
-----------+-------------+--------+-------
 ptif_test | null        | f      |     0
(1 row)
--
Michael
diff --git a/src/backend/utils/adt/partitionfuncs.c b/src/backend/utils/adt/partitionfuncs.c
index 36d9f69cbc..a2fe4f34b6 100644
--- a/src/backend/utils/adt/partitionfuncs.c
+++ b/src/backend/utils/adt/partitionfuncs.c
@@ -35,17 +35,17 @@ static bool
 check_rel_can_be_partition(Oid relid)
 {
 	char		relkind;
+	bool		relispartition;
 
 	/* Check if relation exists */
 	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
 		return false;
 
 	relkind = get_rel_relkind(relid);
+	relispartition = get_rel_relispartition(relid);
 
 	/* Only allow relation types that can appear in partition trees. */
-	if (relkind != RELKIND_RELATION &&
-		relkind != RELKIND_FOREIGN_TABLE &&
-		relkind != RELKIND_INDEX &&
+	if (!relispartition &&
 		relkind != RELKIND_PARTITIONED_TABLE &&
 		relkind != RELKIND_PARTITIONED_INDEX)
 		return false;
@@ -189,13 +189,6 @@ pg_partition_root(PG_FUNCTION_ARGS)
 	if (!check_rel_can_be_partition(relid))
 		PG_RETURN_NULL();
 
-	/*
-	 * If the relation is not a partition (it may be the partition parent),
-	 * return itself as a result.
-	 */
-	if (!get_rel_relispartition(relid))
-		PG_RETURN_OID(relid);
-
 	/* Fetch the top-most parent */
 	ancestors = get_partition_ancestors(relid);
 	rootrelid = llast_oid(ancestors);
diff --git a/src/test/regress/expected/partition_info.out b/src/test/regress/expected/partition_info.out
index 73269ffd09..24e3fe60f3 100644
--- a/src/test/regress/expected/partition_info.out
+++ b/src/test/regress/expected/partition_info.out
@@ -138,19 +138,18 @@ SELECT relid, parentrelid, level, isleaf
 (6 rows)
 
 DROP TABLE ptif_test;
--- Table that is not part of any partition tree is the only member listed.
+-- Table that is not part of any partition tree is not listed.
 CREATE TABLE ptif_normal_table(a int);
 SELECT relid, parentrelid, level, isleaf
   FROM pg_partition_tree('ptif_normal_table');
-       relid       | parentrelid | level | isleaf 
--------------------+-------------+-------+--------
- ptif_normal_table |             |     0 | t
-(1 row)
+ relid | parentrelid | level | isleaf 
+-------+-------------+-------+--------
+(0 rows)
 
 SELECT pg_partition_root('ptif_normal_table');
  pg_partition_root 
 -------------------
- ptif_normal_table
+ 
 (1 row)
 
 DROP TABLE ptif_normal_table;
diff --git a/src/test/regress/sql/partition_info.sql b/src/test/regress/sql/partition_info.sql
index 119b90afe4..d9dfa5d5d7 100644
--- a/src/test/regress/sql/partition_info.sql
+++ b/src/test/regress/sql/partition_info.sql
@@ -64,7 +64,7 @@ SELECT relid, parentrelid, level, isleaf
 
 DROP TABLE ptif_test;
 
--- Table that is not part of any partition tree is the only member listed.
+-- Table that is not part of any partition tree is not listed.
 CREATE TABLE ptif_normal_table(a int);
 SELECT relid, parentrelid, level, isleaf
   FROM pg_partition_tree('ptif_normal_table');




Re: readdir is incorrectly implemented at Windows

2019-02-28 Thread Michael Paquier
On Fri, Mar 01, 2019 at 10:23:02AM +0300, Konstantin Knizhnik wrote:
> Yes, Yuri Kurenkov and Grigory Smalking did a lot of the investigation
> of this problem.
> (The irony is that the problem detected by Yuri was caused by
> another bug in pg_probackup, but we thought that it was related to
> permissions and so came to this issue.)

Thanks for confirming, Konstantin.  Let's wait a couple of days to see
if anybody has objections or comments, and I'll try to commit this
patch.
--
Michael




Re: readdir is incorrectly implemented at Windows

2019-02-28 Thread Konstantin Knizhnik




On 01.03.2019 9:13, Michael Paquier wrote:

On Thu, Feb 28, 2019 at 11:15:53AM +0900, Michael Paquier wrote:

Could you add it to the next commit fest as a bug fix please?  I think
that I will be able to look at that in details soon, but if not it
would be better to not lose track of your fix.

Okay, I have looked at your patch.  And double-checked on Windows.  To
put it short, I agree with the approach you are taking.  I have been
curious about the mention to MinGW in the existing code as well as in
the patch you are proposing, and I have checked if what your patch and
what the current state are correct, and I think that HEAD is wrong on
one thing.

First mingw64 code can be found here:
https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/
https://github.com/Alexpux/mingw-w64/

Then, the implementation of readdir/opendir/closedir can be found in
mingw-w64-crt/misc/dirent.c.  Looking at their implementation of
readdir, I can see two things:
1) When beginning a search in a directory, _tfindfirst gets used,
which returns ENOENT as error if no matches are found:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/findfirst-functions?view=vs-2017
So from this point of view your patch is right: you make readdir()
return errno=0 which matches the normal *nix behaviors.  And MinGW
does not do that.
2) On follow-up lookups, MinGW code uses _tfindnext, and actually
*enforces* errno=0 when seeing ERROR_NO_MORE_FILES.  So from this
point of view the code's comment in HEAD is incorrect as of today.
The current implementation exists since 399a36a so perhaps MinGW was
not like that when dirent.c has been added in src/port/, but that's
not true today. So let's fix the comment at the same time.

Attached is an updated patch with my suggestions.  Does it look fine
to you?

Yes, certainly.



Also, I think that we should credit Yuri Kurenkov for the discovery of
the issue, with yourself, Konstantin, as the author of the patch.
Are there other people involved which should be credited?  Like
Grigory?


Yes, Yuri Kurenkov and Grigory Smalking did a lot of the investigation
of this problem.
(The irony is that the problem detected by Yuri was caused by
another bug in pg_probackup, but we thought that it was related to
permissions and so came to this issue.)


--
Michael


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Fabien COELHO



Hello Andres,


I think putting this into the control file is a seriously bad
idea. Postmaster interlocks against other postmasters running via
postmaster.pid.


Having a second interlock mechanism, in a different file, doesn't make
any sort of sense. Nor does it seem sane to have external tools write
over data as INTENSELY critical as the control file, when they then have
to understand CRCs etc.


On this point, there are functions for that, get/update_controlfile, so 
this should be factored out.


The initial incentive for raising the issue, probably in the wrong thread
and without a clear understanding of the matter, is that I've been
reviewing a patch to enable/disable checksums on a stopped cluster.


The patch updates all the checksums in all the files, and changes the
control file to record that checksums are enabled. Currently it checks for
and creates a fake "postmaster.pid" file to try to prevent another tool
from running concurrently with this operation, which ISTM is a procedure
prone to race conditions and thus does not guarantee that it is the only
tool running on the cluster. This looked to me like a bad hack. Given that
other commands that operate on a cluster seemed to use the control file to
signal that they are doing something, I thought that would be the way to
go, but then I noticed that the control file read/write procedure looks as
bad as the postmaster.pid hack for ensuring that only one command is
active.


Nevertheless, I'm ranting in the wrong thread and it seems that there is
no real problem to solve, so I'll say fine to the to-me-unconvincing
"postmaster.pid" hack proposed in the patch.


--
Fabien.



Re: 2019-03 Starts Tomorrow

2019-02-28 Thread David Steele

On 3/1/19 8:19 AM, Michael Paquier wrote:

On Thu, Feb 28, 2019 at 10:47:06PM -0500, Tom Lane wrote:

Michael Paquier  writes:

So do we have anybody willing to take the glorious position of CFM for
this commit fest?


IIRC, Steele already said he'd do it.


Okay, fine for me of course if that's the case!  For what it's worth,
I did not understand that he wanted to be CFM, I just understood that
this email is a reminder that the CF will begin...  These are quite
separate things.

Sorry for being confused.
Not at all.  I volunteered on the thread closing out the last CF so it 
wasn't that obvious.


So, yes, I am the CFM.

--
-David
da...@pgmasters.net



Re: 2019-03 Starts Tomorrow

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 10:47:06PM -0500, Tom Lane wrote:
> Michael Paquier  writes:
>> So do we have anybody willing to take the glorious position of CFM for
>> this commit fest?
> 
> IIRC, Steele already said he'd do it.

Okay, fine for me of course if that's the case!  For what it's worth,
I did not understand that he wanted to be CFM, I just understood that
this email is a reminder that the CF will begin...  These are quite
separate things.

Sorry for being confused.
--
Michael




Re: Prevent extension creation in temporary schemas

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 10:52:52PM -0500, Tom Lane wrote:
> If you're suggesting that we disable that security restriction
> during extension creation, I really can't see how that'd be a
> good thing ...

No, I don't mean that.  I was just wondering if someone can set
search_path within the SQL script which includes the extension
contents to bypass the restriction and the error.  They can always
prefix such objects with pg_temp anyway if need be...
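For illustration, a minimal sketch of what that explicit qualification
looks like (hypothetical function, not part of any existing extension):

    CREATE FUNCTION pg_temp.add_one(i int) RETURNS int
        LANGUAGE sql AS 'SELECT $1 + 1';
    -- An unqualified call like SELECT add_one(1) would not find it,
    -- because pg_temp is never searched implicitly for functions.
    SELECT pg_temp.add_one(1);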
--
Michael




Re: readdir is incorrectly implemented at Windows

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 11:15:53AM +0900, Michael Paquier wrote:
> Could you add it to the next commit fest as a bug fix please?  I think
> that I will be able to look at that in details soon, but if not it
> would be better to not lose track of your fix.

Okay, I have looked at your patch.  And double-checked on Windows.  To
put it short, I agree with the approach you are taking.  I have been
curious about the mention to MinGW in the existing code as well as in
the patch you are proposing, and I have checked if what your patch and
what the current state are correct, and I think that HEAD is wrong on
one thing.

First mingw64 code can be found here:
https://sourceforge.net/p/mingw-w64/mingw-w64/ci/master/tree/
https://github.com/Alexpux/mingw-w64/

Then, the implementation of readdir/opendir/closedir can be found in
mingw-w64-crt/misc/dirent.c.  Looking at their implementation of
readdir, I can see two things:
1) When beginning a search in a directory, _tfindfirst gets used,
which returns ENOENT as error if no matches are found:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/findfirst-functions?view=vs-2017
So from this point of view your patch is right: you make readdir()
return errno=0 which matches the normal *nix behaviors.  And MinGW
does not do that.
2) On follow-up lookups, MinGW code uses _tfindnext, and actually
*enforces* errno=0 when seeing ERROR_NO_MORE_FILES.  So from this
point of view the code's comment in HEAD is incorrect as of today.
The current implementation exists since 399a36a so perhaps MinGW was
not like that when dirent.c has been added in src/port/, but that's
not true today. So let's fix the comment at the same time.

Attached is an updated patch with my suggestions.  Does it look fine
to you?

Also, I think that we should credit Yuri Kurenkov for the discovery of
the issue, with yourself, Konstantin, as the author of the patch.
Are there other people involved which should be credited?  Like
Grigory?
--
Michael
diff --git a/src/port/dirent.c b/src/port/dirent.c
index 7a91450695..191fd062a5 100644
--- a/src/port/dirent.c
+++ b/src/port/dirent.c
@@ -83,7 +83,13 @@ readdir(DIR *d)
 		d->handle = FindFirstFile(d->dirname, &fd);
 		if (d->handle == INVALID_HANDLE_VALUE)
 		{
-			errno = ENOENT;
+			if (GetLastError() == ERROR_FILE_NOT_FOUND)
+			{
+				/* No files at all, force errno=0 (unlike mingw) */
+				errno = 0;
+				return NULL;
+			}
+			_dosmaperr(GetLastError());
 			return NULL;
 		}
 	}
@@ -93,7 +99,7 @@ readdir(DIR *d)
 		{
 			if (GetLastError() == ERROR_NO_MORE_FILES)
 			{
-				/* No more files, force errno=0 (unlike mingw) */
+				/* No more files, force errno=0 (like mingw) */
 				errno = 0;
 				return NULL;
 			}




Re: [HACKERS] CLUSTER command progress monitor

2019-02-28 Thread Etsuro Fujita

(2019/03/01 14:17), Tatsuro Yamada wrote:

Attached patch is wip patch.


Is it possible to remove the following patch?
Because I registered the patch twice on CF Mar.

https://commitfest.postgresql.org/22/2049/


Please remove the above and keep this:

https://commitfest.postgresql.org/22/1779/

which I moved from the January CF to the March one on behalf of him.

Best regards,
Etsuro Fujita




Re: jsonpath

2019-02-28 Thread Alexander Korotkov
On Fri, Mar 1, 2019 at 3:36 AM Nikita Glukhov  wrote:
> I can also offer to explicitly pass timezone info into the jsonpath function
> using a special user datatype encapsulating struct pg_tz.

A more interesting question is what the source of the timezone would be.
Even if you encapsulate the timezone in a separate datatype, the expression
will still be just stable, assuming the timezone is generated by a stable
subexpression.  What we actually need is an immutable timezone.  Say, once
a timezone is updated, you create a new timezone version, while the old
version stays immutable.  Then if jsonpath is given a particular *timezone
version*, it might remain immutable.  But that requires significant
rework of our timezone infrastructure.

> But a simple integer timezone offset can be passed now using jsonpath
> variables (the standard speaks only about integer timezone offsets; it also
> requires the presence of a timezone offset in the input string if the
> format string contains timezone components):
>
> =# SELECT jsonb_path_query(
>  '"28-02-2019 12:34"',
>  '$.datetime("DD-MM-YYYY HH24:MI TZH", $tz)',
>  jsonb_build_object('tz', EXTRACT(TIMEZONE FROM now()))
>);
>
>       jsonb_path_query       
> -----------------------------
>  "2019-02-28T12:34:00+03:00"
> (1 row)

The standard specifies a fixed offset to be given for a *particular
datetime*.  For instance, if the json contains the offset in a separate
attribute or whatever, then it's OK to use such a two-argument .datetime()
method.  But that seems a quite narrow use case.

The standard doesn't mean you get a fixed offset extracted from "now()" and
apply it to random datetimes in your json collection.  That would only work
correctly for real timezones that are fixed offsets, but there are almost
none of them!  So, that's just plain wrong; we should never encourage users
to do something like this.
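To make the pitfall concrete, here is a hedged illustration (session
timezone and dates picked arbitrarily): the offset extracted from now()
depends on when the query is run, not on the datetime being parsed.

    SET timezone = 'Europe/Paris';  -- any zone with DST, for illustration
    SELECT EXTRACT(TIMEZONE FROM TIMESTAMPTZ '2019-01-15 12:00') AS winter,
           EXTRACT(TIMEZONE FROM TIMESTAMPTZ '2019-07-15 12:00') AS summer;
    -- winter is 3600 (+01), summer is 7200 (+02), so a single offset
    -- taken from now() would stamp the wrong timezone onto half of the
    -- datetimes in a collection.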

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Remove Deprecated Exclusive Backup Mode

2019-02-28 Thread David G. Johnston
On Thu, Feb 28, 2019 at 7:51 PM Michael Paquier  wrote:
> On Thu, Feb 28, 2019 at 11:07:47PM -0300, Martín Marqués wrote:
> > El 28/2/19 a las 15:13, David Steele escribió:
> > +   
> > + The exclusive backup method is deprecated and should be avoided in
> > favor
> > + of the non-exclusive backup method or
> > + pg_basebackup.
> > +   
> >
> > Isn't pg_basebackup a non-exclusive backup method?
>
> It seems to me that it is exactly what this sentence means.  Perhaps
> it should be split into two sentences for better clarity?

Comments on the documentation patch should be placed on the thread
linked to the documentation patch as opposed to here.

David J.



Re: [HACKERS] Block level parallel vacuum

2019-02-28 Thread Masahiko Sawada
On Thu, Feb 28, 2019 at 2:44 AM Robert Haas  wrote:
>
> On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada  wrote:
> > Thank you. Attached the rebased patch.
>
> Here are some review comments.

Thank you for reviewing the patches!

>
> + started by a single utility command.  Currently, the parallel
> + utility commands that support the use of parallel workers are
> + CREATE INDEX and VACUUM
> + without FULL option, and only when building
> + a B-tree index.  Parallel workers are taken from the pool of
>
> That sentence is garbled.  The end part about b-tree indexes applies
> only to CREATE INDEX, not to VACUUM, since VACUUM does not build indexes.

Fixed.

>
> +  Vacuum index and cleanup index in parallel
> +  N background
> workers (for the detail
> +  of each vacuum phases, please refer to <xref linkend="vacuum-phases"/>. If the
>
> I have two problems with this.  One is that I can't understand the
> English very well. I think you mean something like: "Perform the
> 'vacuum index' and 'cleanup index' phases of VACUUM in parallel using
> N background workers," but I'm not entirely sure.  The other is that
> if that is what you mean, I don't think it's a sufficient description.
> Users need to understand whether, for example, only one worker can be
> used per index, or whether the work for a single index can be split
> across workers.
>
> +  parallel degree N
> is omitted,
> +  then VACUUM decides the number of workers based on
> +  number of indexes on the relation which further limited by
> +  . Also if
> this option
>
> Now this makes it sound like it's one worker per index, but you could
> be more explicit about it.

Fixed.

>
> +  is specified multile times, the last parallel degree
> +  N is considered
> into the account.
>
> Typo, but I'd just delete this sentence altogether; the behavior if
> the option is multiply specified seems like a triviality that need not
> be documented.

Understood, removed.

>
> +Setting a value for parallel_workers via
> + also controls how many parallel
> +worker processes will be requested by a VACUUM
> +against the table. This setting is overwritten by setting
> +N of
> PARALLEL
> +option.
>
> I wonder if we really want this behavior.  Should a setting that
> controls the degree of parallelism when scanning the table also affect
> VACUUM?  I tend to think that we probably don't ever want VACUUM of a
> table to be parallel by default, but rather something that the user
> must explicitly request.  Happy to hear other opinions.  If we do want
> this behavior, I think this should be written differently, something
> like this: The PARALLEL N option to VACUUM takes precedence over this
> option.

For example, I can imagine a use case where a batch job does parallel
vacuum on some tables in a maintenance window. The batch operation
would need to compute and specify the degree of parallelism every time,
according to, for instance, the number of indexes, which would be
troublesome. But if we can set the degree of parallelism for each
table, it can just do 'VACUUM (PARALLEL)'.
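Roughly like this (syntax from the patch under discussion, not committed
behavior; the table name is made up):

    -- the reloption provides the default parallel degree for this table
    ALTER TABLE big_tbl SET (parallel_workers = 4);
    -- the batch job can then simply run
    VACUUM (PARALLEL) big_tbl;
    -- instead of recomputing and passing an explicit degree every time:
    VACUUM (PARALLEL 4) big_tbl;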

>
> + * parallel mode nor destories the parallel context. For updating the index
>
> Spelling.

Fixed.

>
> +/* DSM keys for parallel lazy vacuum */
> +#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFF1)
> +#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFF2)
> +#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFF3)
>
> Any special reason not to use just 1, 2, 3 here?  The general
> infrastructure stuff uses high numbers to avoid conflicting with
> plan_node_id values, but end clients of the parallel infrastructure
> can generally just use small integers.

It seems that I was worrying unnecessarily, changed to 1, 2, 3.

>
> + bool updated; /* is the stats updated? */
>
> is -> are
>
> + * LVDeadTuples controls the dead tuple TIDs collected during heap scan.
>
> what do you mean by "controls", exactly? stores?

Fixed.

>
> + * This is allocated in a dynamic shared memory segment when parallel
> + * lazy vacuum mode, or allocated in a local memory.
>
> If this is in DSM, then max_tuples is a wart, I think.  We can't grow
> the segment at that point.  I'm suspicious that we need a better
> design here.  It looks like you gather all of the dead tuples in
> backend-local memory and then allocate an equal amount of DSM to copy
> them.  But that means that we are using twice as much memory, which
> seems pretty bad.  You'd have to do that at least momentarily no
> matter what, but it's not obvious that the backend-local copy is ever
> freed.

Hmm, the current design is simpler; only the leader process scans the
heap and saves dead tuple TIDs to DSM. The DSM is allocated at once
when starting lazy vacuum and we never need to enlarge it. Also we
can use the same code around heap vacuuming and collecting dead tuples
for both single-process vacuum and parallel vacuum. Once index 

Re: Protect syscache from bloating with negative cache entries

2019-02-28 Thread Vladimir Sitnikov
Robert> This email thread is really short on clear demonstrations that X or Y
Robert> is useful.

It is useful when the whole database does **not** crash, isn't it?

Case A (== current PostgreSQL mode): the syscache grows, then the OOM
killer chimes in, kills the database process, and that leads to a complete
cluster failure (all other PG processes terminate themselves).

Case B (== limit the syscache to 10MiB or whatever, as Tsunakawa, Takayuki
asks): a single ill-behaved process works a bit slower and/or
consumes more CPU than the other ones. The whole DB is still alive.

I'm quite sure "case B" is much better for the end users and for the
database administrators.

So, +1 to Tsunakawa, Takayuki, it would be so great if there was a way
to limit the memory consumption of a single process (e.g. syscache,
workmem, etc, etc).

Robert> However, memory usage is quite unpredictable.  It depends on how many
Robert> backends are active

The number of backends can be limited by ensuring proper limits at the
application connection pool level and/or pgbouncer and/or things like
that.

Robert>how many copies of work_mem and/or
Robert> maintenance_work_mem are in use

There might be other patches to cap the total use of
work_mem/maintenance_work_mem.

Robert>I don't think we
Robert> can say that just imposing a limit on the size of the system caches is
Robert> going to be enough to reliably prevent an out of memory condition

The fewer possibilities there are for OOM, the better. Quite often it is
much better to fail a single SQL statement than to kill all the DB
processes.

Vladimir



Re: [HACKERS] CLUSTER command progress monitor

2019-02-28 Thread Tatsuro Yamada

Attached patch is wip patch.


Is it possible to remove the following patch?
Because I registered the patch twice on CF Mar.

   https://commitfest.postgresql.org/22/2049/

Thanks,
Tatsuro Yamada






Re: [HACKERS] CLUSTER command progress monitor

2019-02-28 Thread Tatsuro Yamada

On 2019/02/23 6:02, Robert Haas wrote:

On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada
 wrote:

This patch is rebased on HEAD.
I'll tackle revising the patch based on feedbacks next month.


+   Running VACUUM FULL is listed in
pg_stat_progress_cluster
+   view because it uses CLUSTER command
internally.  See .

It's not really true to say that VACUUM FULL uses the CLUSTER command
internally.  It's not really true.  It uses a good chunk of the same
infrastructure, but it certainly doesn't use the actual command, and
it's not really the exact same thing either, because internally it's
doing a sequential scan but no sort, which never happens with CLUSTER.
I'm not sure exactly how to rephrase this, but I think we need to make
it more precise.

One idea is that maybe we should try to think of a design that could
also handle the rewriting variants of ALTER TABLE, and call it
pg_stat_progress_rewrite.  Maybe that's moving the goalposts too far,
but I'm not saying we'd necessarily have to do all the work now, just
have a view that we think could also handle that.  Then again, maybe
the needs are too different.



Hmm..., I see.
If possible, I'd like to leave VACUUM FULL aside to avoid complicating
the implementation.
For now, I don't have enough time to design pg_stat_progress_rewrite. I
suppose it would be tough work.



+   Whenever CLUSTER is running, the
+   pg_stat_progress_cluster view will contain
+   one row for each backend that is currently clustering or vacuuming
(VACUUM FULL).

That sentence contradicts itself.  Just say that it contains a row for
each backend that is currently running CLUSTER or VACUUM FULL.



Fixed.



@@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple,
  void
  cluster(ClusterStmt *stmt, bool isTopLevel)
  {
+
   if (stmt->relation != NULL)
   {
   /* This is the single-relation case. */

Useless hunk.



Fixed.



@@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
   heap_close(rel, NoLock);

   /* Do the job. */
+ pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
   cluster_rel(tableOid, indexOid, stmt->options);
+ pgstat_progress_end_command();
   }
   else
   {

It seems like that stuff should be inside cluster_rel().



Fixed.



+ /* Set reltuples to total_tuples */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES,
OldHeap->rd_rel->reltuples);

I object.  If the user wants that, they can get it from pg_class
themselves via an SQL query.  It's also an estimate, not something we
know to be accurate; I want us to only report facts here, not theories


I understand that progress monitor should only report facts, so I
removed that code.




+ pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP);
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD,
PROGRESS_CLUSTER_METHOD_INDEX_SCAN);

I think you should use pgstat_progress_update_multi_param() if
updating multiple parameters at the same time.

Also, some lines in this patch, such as this one, are very long.
Consider techniques to reduce the line length to 80 characters or
less, such as inserting a line break between the two arguments.


Fixed.



+ /* Set scan_method to NULL */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1);

NULL and -1 are not the same thing.


Oops, fixed.



I think that we shouldn't have both
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and
PROGRESS_CLUSTER_PHASE_SCAN_HEAP.  They're the same thing.  Let's just
use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both.  Actually, better yet,
let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and
PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP.  That seems noticeably
simpler.


Fixed.



I agree that it's acceptable to report
PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and
PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I
understand why it's valuable to do so in the context of a progress
indicator.


Actually, I'm not sure why I added it since so much time has passed. :(
So, I'll remove PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD at least.


Attached patch is wip patch.


Thanks!
Tatsuro Yamada

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0e73cdcdda..8cf829e72c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -344,6 +344,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER and VACUUM FULL, showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3376,9 +3384,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in 

Re: pg_partition_tree crashes for a non-defined relation

2019-02-28 Thread Tom Lane
Amit Langote  writes:
> On 2019/03/01 9:22, Michael Paquier wrote:
>> What I am writing next sounds perhaps a bit fancy, but in my opinion a
>> normal table is itself a partition tree, made of one single member:
>> itself.

> That's what we discussed, but it seems that we ended up allowing regular
> standalone tables (possibly in inheritance trees) only because it
> *appeared* to work.  Alvaro already pointed out what appears to be a bug
> in how we compute the value of level.  Instead of trying to fix it, I
> agree we should just disallow tables that are not a partitioned
> table/index or a partition (relispartition).

FWIW, I don't agree with Michael's suggestion above.  A plain table is
significantly different from a partitioned table with no children:
you can store rows in the former but not the latter, and you can add
partitions to the latter but not the former.  So conflating the two
doesn't seem likely to lead to any good outcome.

But, having said that, we've learned that it's generally bad for
catalog-query functions to fail outright just because they're pointed
at the wrong kind of catalog object.  So I think that what we want here
is for pg_partition_tree to return NULL or an empty set or some such
for a plain table, while its output for a childless partitioned table
should be visibly different from that.  I'm not wedded to details
beyond that idea.

regards, tom lane



Re: pg_partition_tree crashes for a non-defined relation

2019-02-28 Thread Amit Langote
Hi,

On 2019/03/01 9:22, Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 04:32:03PM -0300, Alvaro Herrera wrote:
>> Yeah, looks good, please push.
> 
> Done for this part.
> 
>> I would opt for returning the empty set for legacy inheritance too.
>>
>> More generally, I think we should return empty for anything that's
>> either not RELKIND_PARTITIONED_TABLE or has relispartition set.
> 
> I think that one option is to make the function return only the table
> itself if it is not a partitioned table, which would be more
> consistent with what pg_partition_root() does.
> 
> What I am writing next sounds perhaps a bit fancy, but in my opinion a
> normal table is itself a partition tree, made of one single member:
> itself.

That's what we discussed, but it seems that we ended up allowing regular
standalone tables (possibly in inheritance trees) only because it
*appeared* to work.  Alvaro already pointed out what appears to be a bug
in how we compute the value of level.  Instead of trying to fix it, I
agree we should just disallow tables that are not a partitioned
table/index or a partition (relispartition).  Maybe there won't be any use
cases, so we should change that while we still can.

So, maybe change the check in check_rel_can_be_partition() as follows:

relkind = get_rel_relkind(relid);
relispartition = get_rel_relispartition(relid);

/* Only allow relation types that can appear in partition trees. */
if (!relispartition &&
relkind != RELKIND_PARTITIONED_TABLE &&
relkind != RELKIND_PARTITIONED_INDEX)
return false;

Thanks,
Amit




Re: Prevent extension creation in temporary schemas

2019-02-28 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Feb 28, 2019 at 10:13:17AM -0500, Tom Lane wrote:
>> Yeah, I think it's just because we won't search the pg_temp schema
>> for function or operator names, unless the calling SQL command
>> explicitly writes "pg_temp.foo(...)" or equivalent.  That's an
>> ancient security decision, which we're unlikely to undo.  It
>> certainly puts a crimp in the usefulness of putting extensions into
>> pg_temp, but I don't think it totally destroys the usefulness.
>> You could still use an extension to package, say, the definitions
>> of a bunch of temp tables and views that you need to create often.

> Even with that, it should still be possible to enforce search_path
> within the extension script to allow such objects to be created
> correctly, no?  That would be a bit hacky, but for the purpose of
> temp object handling it looks acceptable enough to live with when
> creating an extension.

If you're suggesting that we disable that security restriction
during extension creation, I really can't see how that'd be a
good thing ...

regards, tom lane



Re: 2019-03 Starts Tomorrow

2019-02-28 Thread Tom Lane
Michael Paquier  writes:
> So do we have anybody willing to take the glorious position of CFM for
> this commit fest?

IIRC, Steele already said he'd do it.

regards, tom lane



RE: extension patch of CREATE OR REPLACE TRIGGER

2019-02-28 Thread Osumi, Takamichi
> I've made a patch to add CREATE OR REPLACE TRIGGER with some basic tests in 
> triggers.sql.

>> I see there are two patch entries in the commitfest for this.  Is that a
>> mistake? If so can you "Withdraw" one of them?

Oh, my bad. Sorry, this was my first time registering a patch!
Please withdraw the old one, "extension patch to add OR REPLACE clause to
CREATE TRIGGER".
My latest version is "extension patch of CREATE OR REPLACE TRIGGER".

Thanks ! 
-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: propagating replica identity to partitions

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 07:41:11PM -0300, Alvaro Herrera wrote:
> We all seem to agree that REPLICA IDENTITY should recurse.

(entering the ring)
FWIW, I agree that having REPLICA IDENTITY recurse to partitions feels
more natural, as does being able to use ALTER TABLE ONLY to update the
definition of only a given member of the partition tree.
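In other words, roughly this (hypothetical table name, behavior as
proposed in this thread rather than what is committed today):

    -- would recurse to every partition of parent_tbl
    ALTER TABLE parent_tbl REPLICA IDENTITY FULL;
    -- would restrict the change to the named relation only
    ALTER TABLE ONLY parent_tbl REPLICA IDENTITY FULL;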
--
Michael




Re: Remove Deprecated Exclusive Backup Mode

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 11:07:47PM -0300, Martín Marqués wrote:
> El 28/2/19 a las 15:13, David Steele escribió:
> +   
> + The exclusive backup method is deprecated and should be avoided in
> favor
> + of the non-exclusive backup method or
> + pg_basebackup.
> +   
> 
> Isn't pg_basebackup a non-exclusive backup method?

It seems to me that it is exactly what this sentence means.  Perhaps
it should be split into two sentences for better clarity?
--
Michael




Re: 2019-03 Starts Tomorrow

2019-02-28 Thread Michael Paquier
Hi David,

On Thu, Feb 28, 2019 at 11:05:33AM +0200, David Steele wrote:
> The 2019-03 CF is almost upon us.  The CF will officially start at 00:00 AoE
> (12:00 UTC) on Friday, March 1st.

Thanks for the reminder.

> If you have a patch that has been Waiting on Author without any discussion
> since before the last CF ended then you should submit a new patch for the
> start of the 2019-03 CF.

So do we have anybody willing to take the glorious position of CFM for
this commit fest?
--
Michael




Re: Prevent extension creation in temporary schemas

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 10:13:17AM -0500, Tom Lane wrote:
> Yeah, I think it's just because we won't search the pg_temp schema
> for function or operator names, unless the calling SQL command
> explicitly writes "pg_temp.foo(...)" or equivalent.  That's an
> ancient security decision, which we're unlikely to undo.  It
> certainly puts a crimp in the usefulness of putting extensions into
> pg_temp, but I don't think it totally destroys the usefulness.
> You could still use an extension to package, say, the definitions
> of a bunch of temp tables and views that you need to create often.

Even with that, it should still be possible to enforce search_path
within the extension script to allow such objects to be created
correctly, no?  That would be a bit hacky, but for the purpose of
temp object handling it looks acceptable enough to live with when
creating an extension.
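As a rough sketch of how that could look in an extension script
(hypothetical extension and object names, not an existing one):

    -- my_temp_ext--1.0.sql (hypothetical)
    SET search_path = pg_temp;
    CREATE VIEW scratch_view AS SELECT 1 AS x;
    CREATE FUNCTION scratch_fn() RETURNS int
        LANGUAGE sql AS 'SELECT 1';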
--
Michael




Tighten error control for OpenTransientFile/CloseTransientFile

2019-02-28 Thread Michael Paquier
Hi all,

Joe's message here has reminded me that we have lacked a lot of error
handling around CloseTransientFile():
https://www.postgresql.org/message-id/c49b69ec-e2f7-ff33-4f17-0eaa4f2ce...@joeconway.com

This was mentioned by Alvaro a couple of months ago (I cannot find
the thread about it at a quick glance), and I just forgot about it at
the time.  Anyway, attached is a patch to do some cleanup for all
that:
- Switch OpenTransientFile to read-only where sufficient.
- Add more error handling for CloseTransientFile.
A major point of this patch is to make sure that the new error messages
generated have an elevel consistent with their neighbors.

Just on time for this last CF.  Thoughts?
--
Michael
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9905593661..7b39283c89 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -1991,7 +1991,10 @@ qtext_load_file(Size *buffer_size)
 		return NULL;
 	}
 
-	CloseTransientFile(fd);
+	if (CloseTransientFile(fd))
+		ereport(LOG,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", PGSS_TEXT_FILE)));
 
 	*buffer_size = stat.st_size;
 	return buf;
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index f5cf9ffc9c..bce4274362 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -1202,7 +1202,10 @@ heap_xlog_logical_rewrite(XLogReaderState *r)
  errmsg("could not fsync file \"%s\": %m", path)));
 	pgstat_report_wait_end();
 
-	CloseTransientFile(fd);
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
 }
 
 /* ---
@@ -1300,7 +1303,11 @@ CheckPointLogicalRewriteHeap(void)
 		(errcode_for_file_access(),
 		 errmsg("could not fsync file \"%s\": %m", path)));
 			pgstat_report_wait_end();
-			CloseTransientFile(fd);
+
+			if (CloseTransientFile(fd))
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("could not close file \"%s\": %m", path)));
 		}
 	}
 	FreeDir(mappings_dir);
diff --git a/src/backend/access/transam/slru.c b/src/backend/access/transam/slru.c
index 3623352b9c..974d42fc86 100644
--- a/src/backend/access/transam/slru.c
+++ b/src/backend/access/transam/slru.c
@@ -599,7 +599,7 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno)
 
 	SlruFileName(ctl, path, segno);
 
-	fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 	if (fd < 0)
 	{
 		/* expected: file doesn't exist */
@@ -621,7 +621,13 @@ SimpleLruDoesPhysicalPageExist(SlruCtl ctl, int pageno)
 
 	result = endpos >= (off_t) (offset + BLCKSZ);
 
-	CloseTransientFile(fd);
+	if (CloseTransientFile(fd))
+	{
+		slru_errcause = SLRU_CLOSE_FAILED;
+		slru_errno = errno;
+		return false;
+	}
+
 	return result;
 }
 
@@ -654,7 +660,7 @@ SlruPhysicalReadPage(SlruCtl ctl, int pageno, int slotno)
 	 * SlruPhysicalWritePage).  Hence, if we are InRecovery, allow the case
 	 * where the file doesn't exist, and return zeroes instead.
 	 */
-	fd = OpenTransientFile(path, O_RDWR | PG_BINARY);
+	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
 	if (fd < 0)
 	{
 		if (errno != ENOENT || !InRecovery)
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index c96c8b60ba..cbd9b2cee1 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -370,7 +370,11 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 			}
 			pgstat_report_wait_end();
 		}
-		CloseTransientFile(srcfd);
+
+		if (CloseTransientFile(srcfd))
+			ereport(ERROR,
+					(errcode_for_file_access(),
+					 errmsg("could not close file \"%s\": %m", path)));
 	}
 
 	/*
@@ -416,7 +420,6 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 (errcode_for_file_access(),
  errmsg("could not close file \"%s\": %m", tmppath)));
 
-
 	/*
 	 * Now move the completed history file into place with its final name.
 	 */
@@ -495,7 +498,6 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
 (errcode_for_file_access(),
  errmsg("could not close file \"%s\": %m", tmppath)));
 
-
 	/*
 	 * Now move the completed history file into place with its final name.
 	 */
diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 64679dd2de..21986e48fe 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -1297,7 +1297,11 @@ ReadTwoPhaseFile(TransactionId xid, bool missing_ok)
 	}
 
 	pgstat_report_wait_end();
-	CloseTransientFile(fd);
+
+	if (CloseTransientFile(fd))
+		ereport(ERROR,
+				(errcode_for_file_access(),
+				 errmsg("could not close file \"%s\": %m", path)));
 
 	hdr = (TwoPhaseFileHeader *) buf;
 	if (hdr->magic != TWOPHASE_MAGIC)
diff --git 

Re: NOT IN subquery optimization

2019-02-28 Thread Richard Guo
On Tue, Feb 26, 2019 at 6:51 AM Li, Zheng  wrote:

> Resend the patch with a whitespace removed so that "git apply patch" works
> directly.
>
>

Hi Zheng,

I have reviewed your patch. Good job except two issues I can find:

1. The patch would give wrong results when the inner side is empty. In this
case, the whole data from outer side should be in the outputs. But with the
patch, we will lose the NULLs from outer side.

2. Because of the newly added predicate 'OR (var is NULL)', we cannot use a
hash join or merge join to do the ANTI JOIN.  A nested loop becomes the only
choice, which is inefficient.
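Roughly, the rewrite under discussion looks like this (table and column
names invented for illustration):

    -- original form
    SELECT * FROM t1 WHERE t1.x NOT IN (SELECT t2.y FROM t2);
    -- is turned into an anti join whose join condition is, roughly,
    --   t1.x = t2.y OR t1.x IS NULL OR t2.y IS NULL
    -- and a join clause containing OR cannot serve as a hash or merge
    -- join key, leaving only a nested loop anti join.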

Thanks
Richard


Fix memleaks and error handling in jsonb_plpython

2019-02-28 Thread Nikita Glukhov

Unfortunately, contrib/jsonb_plpython still contains a lot of problems in
error handling that can lead to memory leaks:
 - not all Python function calls are checked for success
 - PG exceptions are not caught in all places where Python references need
   to be released
But it seems that these errors can happen only in the OOM case.

Attached is a patch with the fix. A back-patch to PG11 is needed.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From e40a9012c38c3b66888791d3dd3943adf9f310c8 Mon Sep 17 00:00:00 2001
From: Nikita Glukhov 
Date: Tue, 15 Jan 2019 02:14:06 +0300
Subject: [PATCH] Fix memleaks and error handling in jsonb_plpython

---
 contrib/jsonb_plpython/jsonb_plpython.c | 146 ++--
 1 file changed, 100 insertions(+), 46 deletions(-)

diff --git a/contrib/jsonb_plpython/jsonb_plpython.c b/contrib/jsonb_plpython/jsonb_plpython.c
index f44d364..3143b73 100644
--- a/contrib/jsonb_plpython/jsonb_plpython.c
+++ b/contrib/jsonb_plpython/jsonb_plpython.c
@@ -169,53 +169,80 @@ PLyObject_FromJsonbContainer(JsonbContainer *jsonb)
 if (!result)
 	return NULL;
 
-while ((r = JsonbIteratorNext(, , true)) != WJB_DONE)
+PG_TRY();
 {
-	if (r == WJB_ELEM)
+	while ((r = JsonbIteratorNext(, , true)) != WJB_DONE)
 	{
-		PyObject   *elem = PLyObject_FromJsonbValue();
+		if (r == WJB_ELEM)
+		{
+			PyObject   *elem = PLyObject_FromJsonbValue();
+
+			if (!elem || PyList_Append(result, elem))
+			{
+Py_XDECREF(elem);
+Py_DECREF(result);
+return NULL;
+			}
 
-		PyList_Append(result, elem);
-		Py_XDECREF(elem);
+			Py_DECREF(elem);
+		}
 	}
 }
+PG_CATCH();
+{
+	Py_DECREF(result);
+	PG_RE_THROW();
+}
+PG_END_TRY();
 			}
 			break;
 
 		case WJB_BEGIN_OBJECT:
-			result = PyDict_New();
-			if (!result)
-return NULL;
-
-			while ((r = JsonbIteratorNext(, , true)) != WJB_DONE)
 			{
-if (r == WJB_KEY)
-{
-	PyObject   *key = PLyString_FromJsonbValue();
-
-	if (!key)
-		return NULL;
+PyObject   *volatile key = NULL;
 
-	r = JsonbIteratorNext(, , true);
+result = PyDict_New();
+if (!result)
+	return NULL;
 
-	if (r == WJB_VALUE)
+PG_TRY();
+{
+	while ((r = JsonbIteratorNext(, , true)) != WJB_DONE)
 	{
-		PyObject   *value = PLyObject_FromJsonbValue();
+		JsonbValue	v2;
+		PyObject   *val = NULL;
+
+		if (r != WJB_KEY)
+			continue;
+
+		if ((r = JsonbIteratorNext(, , true)) != WJB_VALUE)
+			elog(ERROR, "unexpected jsonb token: %d", r);
 
-		if (!value)
+		if (!(key = PLyString_FromJsonbValue()) ||
+			!(val = PLyObject_FromJsonbValue()) ||
+			PyDict_SetItem(result, key, val))
 		{
 			Py_XDECREF(key);
+			Py_XDECREF(val);
+			Py_DECREF(result);
 			return NULL;
 		}
 
-		PyDict_SetItem(result, key, value);
-		Py_XDECREF(value);
+		Py_DECREF(val);
+		Py_DECREF(key);
+		key = NULL;
 	}
-
+}
+PG_CATCH();
+{
 	Py_XDECREF(key);
+	Py_DECREF(result);
+	PG_RE_THROW();
 }
+PG_END_TRY();
+
+break;
 			}
-			break;
 
 		default:
 			elog(ERROR, "unexpected jsonb token: %d", r);
@@ -233,30 +260,40 @@ PLyObject_FromJsonbContainer(JsonbContainer *jsonb)
 static JsonbValue *
 PLyMapping_ToJsonbValue(PyObject *obj, JsonbParseState **jsonb_state)
 {
-	Py_ssize_t	pcount;
-	JsonbValue *out = NULL;
+	JsonbValue *out;
+	PyObject   *items;
+	Py_ssize_t	pcount = PyMapping_Size(obj);
+
+	if (pcount < 0)
+		PLy_elog(ERROR, "PyMapping_Size() failed, while converting mapping into jsonb");
 
-	/* We need it volatile, since we use it after longjmp */
-	volatile PyObject *items_v = NULL;
+	items = PyMapping_Items(obj);
 
-	pcount = PyMapping_Size(obj);
-	items_v = PyMapping_Items(obj);
+	if (!items)
+		PLy_elog(ERROR, "PyMapping_Items() failed, while converting mapping into jsonb");
 
 	PG_TRY();
 	{
 		Py_ssize_t	i;
-		PyObject   *items;
-
-		items = (PyObject *) items_v;
 
 		pushJsonbValue(jsonb_state, WJB_BEGIN_OBJECT, NULL);
 
 		for (i = 0; i < pcount; i++)
 		{
 			JsonbValue	jbvKey;
-			PyObject   *item = PyList_GetItem(items, i);
-			PyObject   *key = PyTuple_GetItem(item, 0);
-			PyObject   *value = PyTuple_GetItem(item, 1);
+			PyObject   *item;
+			PyObject   *key;
+			PyObject   *value;
+
+			item =  PyList_GetItem(items, i);	/* borrowed reference */
+			if (!item)
+PLy_elog(ERROR, "PyList_GetItem() failed, while converting mapping into jsonb");
+
+			key = PyTuple_GetItem(item, 0);	/* borrowed references */
+			value = PyTuple_GetItem(item, 1);
+
+			if (!key || !value)
+PLy_elog(ERROR, "PyTuple_GetItem() failed, while converting mapping into jsonb");
 
 			/* Python dictionary can have None as key */
 			if (key == Py_None)
@@ -279,11 +316,13 @@ PLyMapping_ToJsonbValue(PyObject *obj, JsonbParseState **jsonb_state)
 	}
 	PG_CATCH();
 	{
-		

Re: pgsql: Avoid creation of the free space map for small heap relations, t

2019-02-28 Thread Amit Kapila
On Thu, Feb 28, 2019 at 9:59 AM John Naylor  wrote:
>
> On Thu, Feb 28, 2019 at 10:25 AM Amit Kapila  wrote:
> >
> > Here's an updated patch based on comments by you.  I will proceed with
> > this unless you have any more comments.
>
> Looks good to me. I would just adjust the grammar in the comment, from
> "This prevents us to use the map", to "This prevents us from using the
> map". Perhaps also a comma after "first".
>

Okay, pushed after making changes suggested by you.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Remove Deprecated Exclusive Backup Mode

2019-02-28 Thread Martín Marqués
El 28/2/19 a las 15:13, David Steele escribió:
> 
> It seems to me that the best way to discuss this is via a patch to the
> main documentation.  I've written something to get us started:
> 
> https://commitfest.postgresql.org/22/2042/

+   
+ The exclusive backup method is deprecated and should be avoided in
favor
+ of the non-exclusive backup method or
+ pg_basebackup.
+   


Isn't pg_basebackup a non-exclusive backup method?

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: proposal: variadic argument support for least, greatest function

2019-02-28 Thread Chapman Flack
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

For completeness, I'll mark this reviewed again. It passes installcheck-world, 
the latest patch addresses the suggestions from me, and is improved on the 
code-structure matters that Tom pointed out. I don't know if it will meet Tom's 
threshold for desirability overall, but that sounds like a committer call at 
this point, so I'll change it to RfC.

-Chap

The new status of this patch is: Ready for Committer


RE: Protect syscache from bloating with negative cache entries

2019-02-28 Thread Tsunakawa, Takayuki
From: Ideriha, Takeshi/出利葉 健
> [Size=800, iter=1,000,000]
> Master |15.763
> Patched|16.262 (+3%)
> 
> [Size=32768, iter=1,000,000]
> Master |61.3076
> Patched|62.9566 (+2%)

What's the unit, seconds or milliseconds?
Why is the number of digits to the right of the decimal point different?

Is the measurement correct?  I'm wondering because the difference is larger in 
the latter case.  Isn't the accounting processing almost the same in both cases?
* former: 16.262 - 15.763 = 0.499
* latter: 62.9566 - 61.3076 = 1.649


> At least compared to the previous HashAg version, the overhead is smaller.
> It has some overhead, but the increase is just by 2 or 3%?

I think the overhead is sufficiently small.  It may get even smaller with a 
trivial tweak.

You added the new member usedspace at the end of MemoryContextData.  The 
original size of MemoryContextData is 72 bytes, and Intel Xeon's cache line is 
64 bytes.  So, the new member will be on a separate cache line.  Try putting 
usedspace before the name member.


Regards
Takayuki Tsunakawa



Re: FETCH FIRST clause PERCENT option

2019-02-28 Thread Kyotaro HORIGUCHI
Hello.

At Thu, 28 Feb 2019 21:16:25 +0100, Tomas Vondra  
wrote in 
> > One biggest issue seems to be we don't know the total number of

# One *of* the biggest *issues*?

> > outer tuples before actually reading a null tuple. I doubt of
> > general shortcut for that. It also seems preventing limit node
> > from just using materialized outer.
> > 
> 
> Sure, if you actually want all tuples, you'll have to execute the outer
> plan till completion. But that's not what I'm talking about - what if we
> only ever need to read one row from the limit?

We have no choice but to read all the tuples once, just to find that we
are to return only one row, since the estimator is not guaranteed to
be as exact as required for this purpose.

> To give you a (admittedly, somewhat contrived and artificial example):
> 
> SELECT * FROM t1 WHERE id IN (
>   SELECT id FROM t2 ORDER BY x FETCH FIRST 10 PERCENT ROWS ONLY
> );
> 
> Maybe this example is bogus and/or does not really matter in practice. I
> don't know, but I've been unable to convince myself that's the case.

I see that kind of idiom commonly. Even in the quite simple example
above, *we* cannot tell how many tuples the inner side should return
unless we actually fetch all the tuples in t2. This is the same
problem as with count(*).

The query is equivalent to the following one.

 SELECT * FROM t1 WHERE id IN (
   SELECT id FROM t2 ORDER BY x
 FETCH FIRST (SELECT ceil(count(*) * 0.1) FROM t2) ROWS ONLY
 );

This scans t2 twice, but this patch does only one full scan, moving the
other partial scan to a tuplestore. We would win if the outer side is
complex enough.

Anyway, even excluding the count(*) issue, it seems that we are
not alone in that point. (That is, at least Oracle shows
different E-Rows and A-Rows for PERCENT).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

2019-02-28 Thread Chapman Flack
Hi, thanks for checking the patches!

On 02/28/19 19:36, Ramanarayana wrote:
> The below statement needs to be executed before running the query to
> replicate the issue
> 
> update xmldata set data = regexp_replace(data::text, '791',
> '791')::xml;

If you are applying that update (and there is a SIZE element originally
791), and then receiving a "more than one value returned by column XPath
expression" error, I believe you are seeing documented, correct behavior.

Your update changes the content of that SIZE element to have three
comment nodes and three text nodes.

The query then contains this column spec:

size_text float PATH 'SIZE/text()'

where the target SQL column type is 'float' and the path expression will
return an XML result consisting of the three text nodes.

As documented, "An XML result assigned to a column of any other type may
not have more than one node, or an error is raised."

So I think this behavior is correct.

If you do any more testing (thank you for taking the interest, by the way!),
could you please add your comments, not to this email thread, but to [1]?

[1]
https://www.postgresql.org/message-id/3e8eab9e-7289-6c23-5e2c-153cccea2257%40anastigmatix.net

That's the one that is registered to the commitfest entry, so comments made
on this thread might be overlooked.

Thanks!
-Chap



Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Andres Freund
Hi,

On 2019-03-01 10:11:53 +0900, Michael Paquier wrote:
> One thing is that we don't protect a data folder from being started while
> it is in the process of being worked on by an external tool, like
> pg_rewind or pg_checksums.  So having an extra flag in the control
> file, which can be used by external tools to tell that the data folder
> is being worked on, does not sound that crazy to me.
> Having a tool write a fake postmaster.pid for this kind of task does
> not sound right from a correctness point of view, because the instance
> is not started.

I think putting this into the control file is a seriously bad
idea. Postmaster interlocks against other postmasters running via
postmaster.pid.  Having a second interlock mechanism, in a different
file, doesn't make any sort of sense. Nor does it seem sane to have
external tools write over data as INTENSELY critical as the control file,
when they then have to understand CRCs etc.


> Let's keep the discussions where they are by the way.  Joe has just
> closed the report of this thread with 4598a99, so I am moving on to
> the correct places.

I don't know what that means, given you replied to the above in this
thread?

Greetings,

Andres Freund



Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 01:07:23PM -0800, Andres Freund wrote:
> Huh? Postmaster.pid is written by the backend, pg_ctl just checks it to
> see if the backend has finished starting up etc. It's precisely what the
> backend uses to prevent two postmasters to start etc. It's also what say
> pg_resetwal checks to protect against a concurrently running cluster
> (albeit in a racy way). If we want to make things more bulletproof,
> that's the place.  The control file is constantly written to, sometimes
> by different processes, it'd just not be a good file for such lockout
> mechanisms.

Hijacking more my own thread...  I can do that, right?  Or not.

One thing is that we don't protect a data folder from being started while
it is in the process of being worked on by an external tool, like
pg_rewind or pg_checksums.  So having an extra flag in the control
file, which can be used by external tools to tell that the data folder
is being worked on, does not sound that crazy to me.
Having a tool write a fake postmaster.pid for this kind of task does
not sound right from a correctness point of view, because the instance
is not started.

And a lock API won't protect much if the host is unplugged, even if
its state is made durable...

Let's keep the discussions where they are by the way.  Joe has just
closed the report of this thread with 4598a99, so I am moving on to
the correct places.

My apologies for the digressions.
--
Michael




Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 04:09:32PM -0500, Joe Conway wrote:
> Committed and push that way.

Thanks for committing a fix.

> By the way, while looking at this, I noted at least a couple of places
> where OpenTransientFile() is being passed O_RDWR when the usage is
> pretty clearly intended to be read-only. For example at least two
> instances in slru.c -- SlruPhysicalReadPage() and
> SimpleLruDoesPhysicalPageExist(). Is it worth while searching for and
> fixing those instances?

There are roughly 40~42 callers of OpenTransientFile().  Looking at
them I can see that RestoreSlotFromDisk() could also switch to RDONLY
instead of RDWR.  I am also a bit tired of the lack of error handling
around CloseTransientFile().  While in some code paths the file
descriptors are closed for an error, in some others we should report
something.  I am going to send a patch after a lookup.  Let's see all
that on a separate thread.
--
Michael




Re: jsonpath

2019-02-28 Thread Andres Freund
On 2019-03-01 03:36:49 +0300, Nikita Glukhov wrote:
> Patch 1 is what we are going to commit in PG12.

I think it's too early to make that determination. I think there's a
good chance, but that this needs more independent review.



Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

2019-02-28 Thread Ramanarayana
Hi,
The below statement needs to be executed before running the query to
replicate the issue

update xmldata set data = regexp_replace(data::text, '791',
'791')::xml;

On Thu, 28 Feb 2019 at 17:55, Pavel Stehule  wrote:

>
>
> On Thu, 28 Feb 2019 at 10:31, Pavel Stehule  wrote:
>
>>
>>
>> On Thu, 28 Feb 2019 at 9:58, Ramanarayana  wrote:
>>
>>> Hi,
>>>
>>> I have tested the three issues fixed in patch 001. Array Indexes
>>> issue is still there.Running the following query returns ERROR: more
>>> than one value returned by column XPath expression
>>>
>>> SELECT xmltable.*
>>> FROM (SELECT data FROM xmldata) x,
>>> LATERAL XMLTABLE('/ROWS/ROW'
>>> PASSING data
>>> COLUMNS
>>> country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
>>> size_text float PATH 'SIZE/text()',
>>> size_text_1 float PATH 'SIZE/text()[1]',
>>> size_text_2 float PATH 'SIZE/text()[2]',
>>> "SIZE" float, size_xml xml PATH 'SIZE')
>>>
>>> The other two issues are resolved by this patch.
>>>
>>
> I tested xmltable-xpath-result-processing-bugfix-6.patch
>
> and it is working
>
> postgres=# SELECT  xmltable.*
> postgres-#FROM (SELECT data FROM xmldata) x,
> postgres-# LATERAL XMLTABLE('/ROWS/ROW'
> postgres(#  PASSING data
> postgres(#  COLUMNS
> postgres(#   country_name text PATH
> 'COUNTRY_NAME/text()' NOT NULL,
> postgres(#   size_text float PATH
> 'SIZE/text()',
> postgres(#   size_text_1 float PATH
> 'SIZE/text()[1]',
> postgres(#   size_text_2 float PATH
> 'SIZE/text()[2]',
> postgres(#   "SIZE" float, size_xml xml
> PATH 'SIZE')  ;
> ┌──────────────┬───────────┬─────────────┬─────────────┬──────┬────────────────────────────┐
> │ country_name │ size_text │ size_text_1 │ size_text_2 │ SIZE │          size_xml          │
> ╞══════════════╪═══════════╪═════════════╪═════════════╪══════╪════════════════════════════╡
> │ Australia    │         ∅ │           ∅ │           ∅ │    ∅ │ ∅                          │
> │ China        │         ∅ │           ∅ │           ∅ │    ∅ │ ∅                          │
> │ HongKong     │         ∅ │           ∅ │           ∅ │    ∅ │ ∅                          │
> │ India        │         ∅ │           ∅ │           ∅ │    ∅ │ ∅                          │
> │ Japan        │         ∅ │           ∅ │           ∅ │    ∅ │ ∅                          │
> │ Singapore    │       791 │         791 │           ∅ │  791 │ <SIZE unit="km">791</SIZE> │
> └──────────────┴───────────┴─────────────┴─────────────┴──────┴────────────────────────────┘
> (6 rows)
>
> Regards
>
> Pavel
>
>
>>
>> what patches you are used?
>>
>> Regards
>>
>> Pavel
>>
>>
>>> --
>>> Cheers
>>> Ram 4.0
>>>
>>

-- 
Cheers
Ram 4.0


Re: pg_partition_tree crashes for a non-defined relation

2019-02-28 Thread Michael Paquier
On Thu, Feb 28, 2019 at 04:32:03PM -0300, Alvaro Herrera wrote:
> Yeah, looks good, please push.

Done for this part.

> I would opt for returning the empty set for legacy inheritance too.
> 
> More generally, I think we should return empty for anything that's
> either not RELKIND_PARTITIONED_TABLE or has relispartition set.

I think that one option is to make the function return only the table
itself if it is not a partitioned table, which would be more
consistent with what pg_partition_root() does.

What I am writing next sounds perhaps a bit fancy, but in my opinion a
normal table is itself a partition tree, made of one single member:
itself.
--
Michael




Re: Online verification of checksums

2019-02-28 Thread Michael Banck
Hi,

On Thursday, 2019-02-28 at 14:29 +0100, Fabien COELHO wrote:
> > So I have now changed behaviour so that short writes count as skipped
> > files and pg_verify_checksums no longer bails out on them. When this
> > occurs, a warning is written to stderr and their overall count is also
> > reported at the end. However, unless there are other blocks with bad
> > checksums, the exit status is kept at zero.
> 
> This seems fair when online, however I'm wondering whether it is when 
> offline. I'd say that the whole retry logic should be skipped in this 
> case? i.e. "if (block_retry || !online) { error message and continue }"
> on both short read & checksum failure retries.

Ok, the stand-alone pg_checksums program also got a PR about the LSN
skip logic not being helpful when the instance is offline and somebody
just writes /dev/urandom over the heap files: 

https://github.com/credativ/pg_checksums/pull/6

So I now tried to change the patch so that it only retries blocks when
online.

> Patch applies cleanly, compiles, global & local make check ok.
> 
> I'm wondering whether it should exit(1) on "lseek" failures. Would it make 
> sense to skip the file and report it as such? Should it be counted as a 
> skippedfile?

Ok, I think it makes sense to march on and I changed it that way.

> WRT the final status, ISTM that skippedblocks & files could warrant an 
> error when offline, although they might be ok when online?

Ok, also changed it that way.

New patch attached.


Michael

-- 
Michael Banck
Project Lead / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing Directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to
the following provisions: https://www.credativ.de/datenschutz

diff --git a/doc/src/sgml/ref/pg_verify_checksums.sgml b/doc/src/sgml/ref/pg_verify_checksums.sgml
index 905b8f1222..4ad6edcde6 100644
--- a/doc/src/sgml/ref/pg_verify_checksums.sgml
+++ b/doc/src/sgml/ref/pg_verify_checksums.sgml
@@ -37,9 +37,8 @@ PostgreSQL documentation
   Description
   
pg_verify_checksums verifies data checksums in a
-   PostgreSQL cluster.  The server must be shut
-   down cleanly before running pg_verify_checksums.
-   The exit status is zero if there are no checksum errors, otherwise nonzero.
+   PostgreSQL cluster.  The exit status is zero if
+   there are no checksum errors, otherwise nonzero.  
   
  
 
diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 511262ab5f..7f358988ed 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -1,7 +1,7 @@
 /*
  * pg_verify_checksums
  *
- * Verifies page level checksums in an offline cluster
+ * Verifies page level checksums in a cluster
  *
  *	Copyright (c) 2010-2019, PostgreSQL Global Development Group
  *
@@ -24,12 +24,16 @@
 
 
 static int64 files = 0;
+static int64 skippedfiles = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
+static int64 skippedblocks = 0;
 static ControlFileData *ControlFile;
+static XLogRecPtr checkpointLSN;
 
 static char *only_relfilenode = NULL;
 static bool verbose = false;
+static bool online = false;
 
 static const char *progname;
 
@@ -86,10 +90,17 @@ scan_file(const char *fn, BlockNumber segmentno)
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
 	BlockNumber blockno;
+	bool		block_retry = false;
 
 	f = open(fn, O_RDONLY | PG_BINARY, 0);
 	if (f < 0)
 	{
+		if (online && errno == ENOENT)
+		{
+			/* File was removed in the meantime */
+			return;
+		}
+
 		fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
 progname, fn, strerror(errno));
 		exit(1);
@@ -104,26 +115,123 @@ scan_file(const char *fn, BlockNumber segmentno)
 
 		if (r == 0)
 			break;
+		if (r < 0)
+		{
+			fprintf(stderr, _("%s: could not read block %u in file \"%s\": %s\n"),
+	progname, blockno, fn, strerror(errno));
+			return;
+		}
 		if (r != BLCKSZ)
 		{
-			fprintf(stderr, _("%s: could not read block %u in file \"%s\": read %d of %d\n"),
-	progname, blockno, fn, r, BLCKSZ);
-			exit(1);
+			if (online)
+			{
+if (block_retry)
+{
+	/* We already tried once to reread the block, skip it */
+	skippedfiles++;
+	fprintf(stderr, _("%s: warning: could not read block %u in file \"%s\": read %d of %d\n"),
+			progname, blockno, fn, r, BLCKSZ);
+	continue;
+}
+
+/*
+ * Retry the block. It's possible that we read the block while it
+ * was extended or shrunk, so it ends up looking torn to us.
+ */
+
+/*
+ * Seek back by the amount of bytes we read to the beginning of
+ * the failed block.
+ */
+if (lseek(f, -r, SEEK_CUR) == -1)
+{
+	skippedfiles++;
+	fprintf(stderr, _("%s: could not lseek in file 

Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Peter Geoghegan
On Thu, Feb 28, 2019 at 3:40 PM Tom Lane  wrote:
> Robert Haas  writes:
> >>> just as a thought, what if we stopped assigning manual OIDs for new
> >>> catalog entries altogether, except for once at the end of each release
> >>> cycle?
>
> Actually ... that leads to an idea that wouldn't add any per-commit
> overhead, or really much change at all to existing processes.  Given
> the existence of a reliable OID-renumbering tool, we could:

> In this scheme, OID collisions are a problem for in-progress patches
> only if two patches are unlucky enough to choose the same random
> high OIDs during the same devel cycle.  That's unlikely, or at least
> a good bit less likely than collisions are today.

That sounds like a reasonable compromise. Perhaps the unused_oids
script could give specific guidance on using a randomly determined
small range of contiguous OIDs that fall within the current range for
that devel cycle. That would prevent collisions caused by the natural
human tendency to prefer a round number. Having contiguous OIDs for
the same patch seems worth preserving.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Tomas Vondra



On 3/1/19 12:41 AM, Peter Geoghegan wrote:
> On Thu, Feb 28, 2019 at 3:09 PM Tom Lane  wrote:
>> The only thing that's really clear is that some senior committers don't
>> want to be bothered because they don't think there's a problem here that
>> justifies any additional expenditure of their time.  Perhaps they are
>> right, because I'd expected some comments from non-committer developers
>> confirming that they see a problem, and the silence is deafening.
> 
> I don't think that you can take that as too strong a signal. The
> incentives are different for non-committers.
> 
>> I'm inclined to commit some form of Naylor's tool improvement anyway,
>> because I have use for it; I remember times when I've renumbered OIDs
>> manually in patches, and it wasn't much fun.  But I can't force a
>> process change if there's not consensus for it among the committers.
> 
> I think that that's a reasonable thing to do, provided there is
> obvious feedback that makes it highly unlikely that the committer will
> make an error at the last moment. I have a hard time coming up with a
> suggestion that won't be considered annoying by at least one person,
> though.
> 

FWIW I personally would not mind if such a tool / process was added. But I
have a related question - do we have some sort of list of such processes
that I could check? That is, a list of stuff that is expected to be done
by a committer before a commit?

I do recall we have [1], but perhaps we have something else.

[1] https://wiki.postgresql.org/wiki/Committing_checklist


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Peter Geoghegan
On Thu, Feb 28, 2019 at 3:09 PM Tom Lane  wrote:
> The only thing that's really clear is that some senior committers don't
> want to be bothered because they don't think there's a problem here that
> justifies any additional expenditure of their time.  Perhaps they are
> right, because I'd expected some comments from non-committer developers
> confirming that they see a problem, and the silence is deafening.

I don't think that you can take that as too strong a signal. The
incentives are different for non-committers.

> I'm inclined to commit some form of Naylor's tool improvement anyway,
> because I have use for it; I remember times when I've renumbered OIDs
> manually in patches, and it wasn't much fun.  But I can't force a
> process change if there's not consensus for it among the committers.

I think that that's a reasonable thing to do, provided there is
obvious feedback that makes it highly unlikely that the committer will
make an error at the last moment. I have a hard time coming up with a
suggestion that won't be considered annoying by at least one person,
though.

Would it be awful if there was a #warning directive that kicked in
when the temporary OID range is in use? It should be possible to do
that without breaking -Werror builds, which I believe Robert uses (I
am reminded of the Flex bug that we used to have to work around). It's
not like there are that many patches that need to assign OIDs to new
catalog entries. I would suggest that we put the warning in the
regression tests if I didn't know that that could be missed by the use
of parallel variants, where the output flies by. There is no precedent
for using #warning for something like that, but offhand it seems like
the only thing that would work consistently.

I don't really mind having to do slightly more work when the issue
crops up, especially if that means less work for everyone involved in
aggregate, which is the cost that I'm concerned about the most.
However, an undocumented or under-documented process that requires a
fixed amount of extra mental effort when committing *anything* is
another matter.

-- 
Peter Geoghegan



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Tom Lane
Robert Haas  writes:
>>> just as a thought, what if we stopped assigning manual OIDs for new
>>> catalog entries altogether, except for once at the end of each release
>>> cycle?

Actually ... that leads to an idea that wouldn't add any per-commit
overhead, or really much change at all to existing processes.  Given
the existence of a reliable OID-renumbering tool, we could:

1. Encourage people to develop new patches using chosen-at-random
high OIDs, in the 7K-9K range.  They do this already, it'd just
be encouraged instead of discouraged.

2. Commit patches as received.

3. Once each devel cycle, after feature freeze, somebody uses the
renumbering tool to shove all the new OIDs down to lower numbers,
freeing the high-OID range for the next devel cycle.  We'd have
to remember to do that, but it could be added to the RELEASE_CHANGES
checklist.

In this scheme, OID collisions are a problem for in-progress patches
only if two patches are unlucky enough to choose the same random
high OIDs during the same devel cycle.  That's unlikely, or at least
a good bit less likely than collisions are today.  If/when it does
happen we'd have a couple of alternatives for ameliorating the problem
--- either the not-yet-committed patch could use the renumbering tool
on their own OIDs, or we could do an off-schedule run of step 3 to get
the already-committed OIDs out of their way.

regards, tom lane



Re: propagating replica identity to partitions

2019-02-28 Thread Alvaro Herrera
Added Peter E to CC; question at the very end.

On 2019-Feb-20, Robert Haas wrote:

> Yeah, we could.  I wonder, though, whether we should just make
> everything recurse.  I think that's what people are commonly going to
> want, at least for partitioned tables, and it doesn't seem to me that
> it would hurt anything to make the inheritance case work that way,
> too.  Right now it looks like we have this list of exceptions:
> 
> - actions for identity columns (ADD GENERATED, SET etc., DROP IDENTITY)
> - TRIGGER
> - CLUSTER
> - OWNER
> - TABLESPACE never recurse to descendant tables
> - Adding a constraint recurses only for CHECK constraints that are not
> marked NO INHERIT.
> 
> I have a feeling we eventually want this list to be empty, right?  We
> want a partitioned table to work as much like a non-partitioned table
> as possible, unless the user asks for some other behavior.  Going from
> six exceptions to four and maybe having some of them depend on whether
> it's partitioning or inheritance doesn't seem like it's as good and
> clear as trying to adopt a really uniform policy.
> 
> I don't buy Simon's argument that we should treat TABLESPACE
> differently because the tables might be really big and take a long
> time to move.  I agree that this could well be true, but nobody is
> proposing to remove the ability to move tables individually or to use
> ONLY here.  Allowing TABLESPACE to recurse just gives people one
> additional choice that they don't have today: to move everything at
> once. We don't lose any functionality by enabling that.

Tablespaces already behave a little bit specially in another sense:
it doesn't recurse to indexes.  I think it's not possible to implement a
three-way flag, where you tell it to move only the table, or to recurse
to child tables but leave local indexes alone, or just to move
everything.  If we don't move the child indexes, are we really recursing
in that sense?

I think changing the behavior of tablespaces is likely to cause pain for
people with existing scripts that expect the command not to recurse.  We
would have to add a switch of some kind to provide the old behavior in
order not to break those, and that doesn't seem so good either.

I agree we definitely want to treat a partitioned table as similarly as
possible to a non-partitioned table, but in the particular case of
tablespace it seems to me better not to mess with the behavior.


CLUSTER: I'm not sure about this one.  For legacy inheritance there's
no principled way to figure out the right index to recurse to.  For
partitioning that seems a very simple problem, but I do wonder if
there's any use case at all for ALTER TABLE CLUSTER there.  My
inclination would be to reject ALTER TABLE CLUSTER on a partitioned
table altogether, if it isn't already.

OWNER: let's just change this one to recurse always.  I think we have a
consensus about this one.

TRIGGER: I think it would be good to recurse, based on the trigger name.
I'm not sure what to do about legacy inheritance; the trigger isn't
likely to be there.  An option is to silently ignore each inheritance
child (not partition) if the trigger isn't present on it.

We all seem to agree that REPLICA IDENTITY should recurse.

Maybe ADD GENERATED AS IDENTITY / DROP IDENTITY should recurse; not
really sure about this one.  Peter?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Index Skip Scan

2019-02-28 Thread Tomas Vondra
On 3/1/19 12:03 AM, Thomas Munro wrote:
> On Fri, Mar 1, 2019 at 11:23 AM Jeff Janes  wrote:
>> On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI 
>>  wrote:
>>> At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthali...@gmail.com> 
>>> wrote in 
>>> 
 A bit of adjustment after nodes/relation -> nodes/pathnodes.
>>>
>>> I had a look on this.
>>>
>>> The name "index skip scan" is a different feature from the
>>> feature with the same name on other products, which means "index scan
>>> with postfix key (mainly of a multi-column key) that scans
>>> ignoring the prefixing part". As Thomas suggested I'd suggest that
>>> we call it "index hop scan". (I can accept Hopscotch, either:p)
>>
>>
>> I think that what we have proposed here is just an incomplete implementation 
>> of what other products call a skip scan, not a fundamentally different 
>> thing.  They don't ignore the prefix part, they use that part in a way to 
>> cancel itself out to give the same answer, but faster.  I think they would 
>> also use this skip method to get distinct values if that is what is 
>> requested.  But they would go beyond that to also use it to do something 
>> similar to the plan we get with this:
> 
> Hi Jeff,
> 
> "Hop scan" was just a stupid joke that occurred to me when I saw that
> DB2 had gone for "jump scan".  I think "skip scan" is a perfectly good
> name and it's pretty widely used by now (for example, by our friends
> over at SQLite to blow us away at these kinds of queries).
> 

+1 to "hop scan"


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Feb 28, 2019 at 7:59 AM Robert Haas  wrote:
>> OK.  Well, I think that doing nothing is superior to this proposal,
>> for reasons similar to what Peter Eisentraut has already articulated.
>> And I think rather than blasting forward with your own preferred
>> alternative in the face of disagreement, you should be willing to
>> discuss other possible options.  But if you're not willing to do that,
>> I can't make you.

> Peter seemed to not want to do this on the grounds that it isn't
> necessary at all, whereas you think that it doesn't go far enough. If
> there is a consensus against what Tom has said, it's a cacophonous one
> that cannot really be said to be in favor of anything.

The only thing that's really clear is that some senior committers don't
want to be bothered because they don't think there's a problem here that
justifies any additional expenditure of their time.  Perhaps they are
right, because I'd expected some comments from non-committer developers
confirming that they see a problem, and the silence is deafening.

I'm inclined to commit some form of Naylor's tool improvement anyway,
because I have use for it; I remember times when I've renumbered OIDs
manually in patches, and it wasn't much fun.  But I can't force a
process change if there's not consensus for it among the committers.

regards, tom lane



Re: Index Skip Scan

2019-02-28 Thread Thomas Munro
On Fri, Mar 1, 2019 at 11:23 AM Jeff Janes  wrote:
> On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI 
>  wrote:
>> At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthali...@gmail.com> 
>> wrote in 
>> > A bit of adjustment after nodes/relation -> nodes/pathnodes.
>>
>> I had a look on this.
>>
>> The name "index skip scan" is a different feature from the
>> feature with the same name on other products, which means "index scan
>> with postfix key (mainly of a multi-column key) that scans
>> ignoring the prefixing part". As Thomas suggested I'd suggest that
>> we call it "index hop scan". (I can accept Hopscotch, either:p)
>
>
> I think that what we have proposed here is just an incomplete implementation 
> of what other products call a skip scan, not a fundamentally different thing. 
>  They don't ignore the prefix part, they use that part in a way to cancel 
> itself out to give the same answer, but faster.  I think they would also use 
> this skip method to get distinct values if that is what is requested.  But 
> they would go beyond that to also use it to do something similar to the plan 
> we get with this:

Hi Jeff,

"Hop scan" was just a stupid joke that occurred to me when I saw that
DB2 had gone for "jump scan".  I think "skip scan" is a perfectly good
name and it's pretty widely used by now (for example, by our friends
over at SQLite to blow us away at these kinds of queries).

Yes, simple distinct value scans are indeed just the easiest kind of
thing to do with this kind of scan-with-fast-forward.  As discussed
already in this thread and the earlier one there is a whole family of
tricks you can do, and the thing that most people call an "index skip
scan" is indeed the try-every-prefix case where you can scan an index
on (a, b) given a WHERE clause b = x.  Perhaps the simple distinct
scan could appear as "Distinct Index Skip Scan"?  And perhaps the
try-every-prefix-scan could appear as just "Index Skip Scan"?  Whether
these are the same executor node is a good question; at one point I
proposed a separate nest-loop like node for the try-every-prefix-scan,
but Robert shot that down pretty fast.  I now suspect (as he said)
that all of this belongs in the index scan node, as different modes.
The behaviour is overlapping; for "Distinct Index Skip Scan" you skip
to each distinct prefix and emit one tuple, whereas for "Index Skip
Scan" you skip to each distinct prefix and then perform a regular scan
for the prefix + the suffix emitting matches.

> (which currently takes 200ms, rather than the 0.9ms taken for the one 
> benefiting from skip scan)

Nice.

> I don't think we should give it a different name, just because our initial 
> implementation is incomplete.

+1

> Or do you think our implementation of one feature does not really get us 
> closer to implementing the other?

My question when lobbing the earlier sketch patches into the mailing
list a few years back was: is this simple index AM interface and
implementation (once debugged) powerful enough for various kinds of
interesting skip-based plans?  So far I have the impression that it
does indeed work for Distinct Index Skip Scan (demonstrated), Index
Skip Scan (no one has yet tried that), and special cases of extrema
aggregate queries (foo, MIN(bar) can be performed by skip scan of
index on (foo, bar)), but may not work for the semi-related merge join
trickery mentioned in a paper posted some time back (though I don't
recall exactly why).  Another question is whether it should all be
done by the index scan node, and I think the answer is yes.

-- 
Thomas Munro
https://enterprisedb.com



Re: Drop type "smgr"?

2019-02-28 Thread Thomas Munro
On Fri, Mar 1, 2019 at 11:31 AM Shawn Debnath  wrote:
> On Thu, Feb 28, 2019 at 02:08:49PM -0800, Andres Freund wrote:
> > On 2019-03-01 09:48:33 +1300, Thomas Munro wrote:
> > > On Fri, Mar 1, 2019 at 7:24 AM Andres Freund  wrote:
> > > > On 2019-02-28 13:16:02 -0500, Tom Lane wrote:
> > > > > Shawn Debnath  writes:
> > > > > > Another thought: my colleague Anton Shyrabokau suggested potentially
> > > > > > re-using forknumber to differentiate smgrs. We are using 32 bits to
> > > > > > map 5 entries today.
> > > > >
> > > > > Yeah, that seems like it might be a workable answer.
> > > >
> > > > Yea, if we just split that into two 16bit entries, there'd not be much
> > > > lost. Some mild mild performance regression due to more granular
> > > > memory->register reads/writes, but I can't even remotely see that
> > > > matter.
> > >
> Ok, that's an interesting way to include it in BufferTag without making
> > > it wider.  But then how about the SMGR interface?  I think that value
> > > would need to be added to the smgropen() interface, and all existing
> > callers would pass in (say) SMGR_RELATION (meaning "use md.c"), and
> > > new code would use SMGR_UNDO, SMGR_SLRU etc.  That seems OK to me.
> >
> > Right, seems like we should do that independent of whether we end up
> > reusing the dboid or not.
>
> Food for thought: if we are going to muck with the smgr APIs, would it
> make sense to move away from SMgrRelation to something a bit more
> generic like, say,  SMgrHandle to better organize the internal contents
> of this structure? Internally, we could move things into a union and,
> based on the type of handle (relation, undo, slru/generic), translate the
> contents correctly. As you can guess, SMgrRelationData today is very
> specific to relations and holds md-specific information whose memory
> would be better re-used for the other storage managers.
>
> Thoughts?

Right, it does contain some md-specific stuff without even
apologising.  Also smgropen() was rendered non-virtual at some point
(I mean that implementations don't even get a chance to initialise
anything, which works out only because md-specific code has leaked
into smgr.c).  In one of my undo patches (which I'll post an updated
version of on the appropriate thread soon) I added a void * called
private_data so that undo_file.c can keep track of some stuff, but
yeah I agree that more tidying up could be done.

-- 
Thomas Munro
https://enterprisedb.com



Re: Why don't we have a small reserved OID range for patch revisions?

2019-02-28 Thread Peter Geoghegan
On Thu, Feb 28, 2019 at 7:59 AM Robert Haas  wrote:
> I don't think this is the worst proposal ever.  However, I also think
> that it's not unreasonable to raise the issue that writing OR
> reviewing OR committing a patch already involves adhering to a thicket
> of undocumented rules.  When somebody fails to adhere to one of those
> rules, they get ignored or publicly shamed.  Now you want to add yet
> another step to the process - really two.

There does seem to be a real problem with undocumented processes. For
example, I must confess that it came as news to me that we already had
a reserved OID range. However, I don't think that there is that much
of an issue with adding new mechanisms like this, provided it makes it
easy to do the right thing and hard to do the wrong thing. What Tom
has proposed so far is not something that self-evidently meets that
standard, but it's also not something that self-evidently fails to
meet that standard.

I have attempted to institute some general guidelines for what the
thicket of rules are by creating the "committing checklist" page. This
is necessarily imperfect, because the rules are in many cases open to
interpretation, often for good practical reasons. I don't have any
sympathy for committers that find it hard to remember to do a
catversion bump with any kind of regularity. That complexity seems
inherent, not incidental, since it's often convenient to ignore
catalog incompatibilities during development.

> So, I suspect that for every
> unit of work it saves somebody, it's probably going to generate about
> one unit of extra work for somebody else.

Maybe so. I think that you're jumping to conclusions, though.

> A lot of projects have a much less painful process for getting patches
> integrated than we do.  I don't know how those projects maintain
> adequate code quality, but I do know that making it easy to get a
> patch accepted makes people more likely to contribute patches, and
> increases overall development velocity.  It is not even vaguely
> unreasonable to worry about whether making this more complicated is
> going to hurt more than it helps, and I don't know why you think
> otherwise.

But you seem to want to make the mechanism itself even more
complicated, not less complicated (based on your remarks about making
OID assignment happen during the build), in order to make the use of
the mechanism easier. That seems worth considering, but ISTM that this
is talking at cross purposes. There are far simpler ways of making it
unlikely that a committer is going to miss this step. There is also a
simple way of noticing that they do quickly (e.g. a simple buildfarm
test).

> Right now every entry in pg_proc.dat includes an OID assignment.  What
> I'm proposing is that we would also allow entries that did not have
> one, and the build process would assign one while processing the .dat
> files.  Then later, somebody could use a script that went through and
> rewrote the .dat file to add OID assignments to any entries that
> lacked them.  Since the topic of having tools for automated rewrite of
> those files has been discussed at length, and since we already have a
> script called reformat_dat_file.pl in the tree which  contains
> comments indicating that it could be modified for bulk editing, said
> script having been committed BY YOU, I don't understand why you think
> that bulk editing is infeasible.

I'm also curious to hear what Tom thinks about this.

> OK.  Well, I think that doing nothing is superior to this proposal,
> for reasons similar to what Peter Eisentraut has already articulated.
> And I think rather than blasting forward with your own preferred
> alternative in the face of disagreement, you should be willing to
> discuss other possible options.  But if you're not willing to do that,
> I can't make you.

Peter seemed to not want to do this on the grounds that it isn't
necessary at all, whereas you think that it doesn't go far enough. If
there is a consensus against what Tom has said, it's a cacophonous one
that cannot really be said to be in favor of anything.

-- 
Peter Geoghegan



Re: Drop type "smgr"?

2019-02-28 Thread Thomas Munro
On Fri, Mar 1, 2019 at 10:41 AM Shawn Debnath  wrote:
> On Fri, Mar 01, 2019 at 10:33:06AM +1300, Thomas Munro wrote:
> > It doesn't make any sense to put things like clog or any other SLRU in
> > a non-default tablespace though.  It's perfectly OK if not all smgr
> > implementations know how to deal with tablespaces, and the SLRU
> > support should just not support that.
>
> If the generic storage manager, or whatever we end up calling it, ends
> up being generic enough - it's possible that the tablespace value would have
> to be respected.

Right, you and I have discussed this a bit off-list, but for the
benefit of others, I think what you're getting at with "generic
storage manager" here is something like this: on the one hand, our
proposed revival of SMGR as a configuration point is about supporting
supporting alternative file layouts for bufmgr data, but at the same
time there is some background noise about direct IO, block encryption,
... and who knows what alternative block storage someone might come up
with ... at the block level.  So although it sounds a bit
contradictory to be saying "let's make all these different SMGRs!" at
the same time as saying "but we'll eventually need a single generic
SMGR that is smart enough to be parameterised for all of these
layouts!", I see why you say it.  In fact, the prime motivation for
putting SLRUs into shared buffers is to get better buffering, because
(anecdotally) slru.c's mini-buffer scheme performs abysmally without
the benefit of an OS page cache.  If we add optional direct IO support
(something I really want), we need it to apply to SLRUs, undo and
relations, ideally without duplicating code, so we'd probably want to
chop things up differently.  At some point I think we'll need to
separate the questions "how to map blocks to filenames and offsets"
and "how to actually perform IO".  I think the first question would be
controlled by the SMGR IDs as discussed, but the second question
probably needs to be controlled by GUCs that control all IO, and/or
special per relation settings (supposing you can encrypt just one
table, as a random example I know nothing about); but that seems way
out of scope for the present projects.  IMHO the best path from here
is to leave md.c totally untouched for now as the SMGR for plain old
relations, while we work on getting these new kinds of bufmgr data
into the tree as a first step, and a later hypothetical direct IO or
whatever project can pay for the refactoring to separate IO from
layout.
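
As a purely illustrative sketch of that separation (hypothetical names,
not a proposal from this thread), the two questions could live behind two
independent sets of callbacks:

/*
 * Hypothetical sketch only: one set of callbacks answers "how to map blocks
 * to filenames and offsets", a separate set answers "how to actually perform
 * IO" (buffered, direct, encrypted, ...).  BufferTag is the existing struct.
 */
#include "postgres.h"
#include "storage/buf_internals.h"	/* for BufferTag */

typedef struct SmgrLayoutOps
{
	/* which file, and which offset within it, holds this buffer tag? */
	void		(*get_block_location) (const BufferTag *tag,
									   char *path, size_t pathlen,
									   off_t *offset);
} SmgrLayoutOps;

typedef struct SmgrIoOps
{
	/* how the bytes are actually read and written */
	void		(*read_block) (int fd, off_t offset, void *buffer);
	void		(*write_block) (int fd, off_t offset, const void *buffer);
} SmgrIoOps;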

-- 
Thomas Munro
https://enterprisedb.com



Re: propagating replica identity to partitions

2019-02-28 Thread Alvaro Herrera
On 2019-Feb-20, Robert Haas wrote:

> On Wed, Feb 20, 2019 at 12:38 PM Alvaro Herrera
>  wrote:

> > Maybe the ALL IN TABLESPACE and OWNED BY sub-forms should be split to a
> > separate para.  I suggest:
> >
> > :   This form changes the table's tablespace to the specified tablespace
> > :   and moves the data file(s) associated with the table to the new
> > :   tablespace. Indexes on the table, if any, are not moved; but they
> > :   can be moved separately with additional SET TABLESPACE commands.
> > :   When applied to a partitioned table, nothing is moved, but any
> > :   partitions created afterwards with CREATE TABLE PARTITION OF
> > :   will use that tablespace.
> > :
> > :   All
> > :   tables in the current database in a tablespace can be moved by using
> > :   the ALL IN TABLESPACE form, which will lock all tables to be moved
> > :   first and then move each one. This form also supports OWNED BY,
> > :   which will only move tables owned by the roles specified. If the
> > :   NOWAIT option is specified then the command will fail if it is
> > :   unable to acquire all of the locks required immediately. Note that
> > :   system catalogs are not moved by this command, use ALTER DATABASE or
> > :   explicit ALTER TABLE invocations instead if desired. The
> > :   information_schema relations are not considered part of the system
> > :   catalogs and will be moved. See also CREATE TABLESPACE.
> 
> Seems reasonable.

Pushed this bit, thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Index Skip Scan

2019-02-28 Thread Jeff Janes
On Thu, Jan 31, 2019 at 1:32 AM Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello.
>
> At Wed, 30 Jan 2019 18:19:05 +0100, Dmitry Dolgov <9erthali...@gmail.com>
> wrote in  aa+fz3guncutf52q1sufb7ise37tjpsd...@mail.gmail.com>
> > A bit of adjustment after nodes/relation -> nodes/pathnodes.
>
> I had a look on this.
>
> The name "index skip scan" is a different feature from the
> feature with the same name on other products, which means "index scan
> with postfix key (mainly of a multi-column key) that scans
> ignoring the prefixing part". As Thomas suggested I'd suggest that
> we call it "index hop scan". (I can accept Hopscotch, either:p)
>

I think that what we have proposed here is just an incomplete
implementation of what other products call a skip scan, not a fundamentally
different thing.  They don't ignore the prefix part, they use that part in
a way to cancel itself out to give the same answer, but faster.  I think
they would also use this skip method to get distinct values if that is what
is requested.  But they would go beyond that to also use it to do something
similar to the plan we get with this:

Set up:
pgbench -i -s50
create index on pgbench_accounts (bid, aid);
alter table pgbench_accounts drop constraint pgbench_accounts_pkey ;

Query:
explain analyze with t as (select distinct bid from pgbench_accounts )
  select pgbench_accounts.* from pgbench_accounts join t using (bid) where
aid=5;

If we accept this patch, I hope it would be expanded in the future to give
similar performance as the above query does even when the query is written
in its more natural way of:

explain analyze select * from pgbench_accounts where aid=5;

(which currently takes 200ms, rather than the 0.9ms taken for the one
benefiting from skip scan)

I don't think we should give it a different name, just because our initial
implementation is incomplete.

Or do you think our implementation of one feature does not really get us
closer to implementing the other?

Cheers,

Jeff


Re: Drop type "smgr"?

2019-02-28 Thread Tom Lane
Andres Freund  writes:
> FWIW, I think while distasteful, I could see us actually using oids,
> just ones that are small enough to fit into 16bit...

If we suppose that all smgrs must be built-in, that's not even much
of a restriction...

regards, tom lane



Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-02-28 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Feb-28, Tom Lane wrote:
>> I wasn't really working on that for v12 --- I figured it was way
>> too late in the cycle to be starting on such a significant change.

> Oh, well, it certainly seems far too late *now*.  However, what about
> the idea in 
> https://postgr.es/m/1255.1544562...@sss.pgh.pa.us
> namely that we write out the buffers involved?  That sounds like it
> might be backpatchable, and thus it's not too late for it.

I think that what we had in mind at that point was that allowing forced
writes of empty-but-dirty pages would provide a back-patchable solution
to the problem of ftruncate() failure leaving corrupt state on-disk.
That would not, by itself, remove the need for AccessExclusiveLock, so it
doesn't seem like it would eliminate people's desire for the kind of knob
being discussed here.

Thinking about it, the need for AEL is mostly independent of the data
corruption problem; rather, it's a hack to avoid needing to think about
concurrent-truncation scenarios in table readers.  We could fairly
easily reduce the lock level to something less than AEL if we just
taught seqscans, indexscans, etc that trying to read a page beyond
EOF is not an error.  (Reducing the lock level to the point where
we could allow concurrent *writers* is a much harder problem, I think.
But to ameliorate the issues for standbys, we just need to allow
concurrent readers.)  And we'd have to do something about readers
possibly loading doomed pages back into shmem before the truncation
happens; maybe that can be fixed just by truncating first and flushing
buffers second?

I think the $64 question is whether we're giving up any meaningful degree
of error detection if we allow read-beyond-EOF to not be an error.  If we
conclude that we're not, maybe it wouldn't be a very big patch?
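
A minimal sketch of the reader-side behavior being floated (a hypothetical
helper, not from any posted patch): treat a block at or beyond the current
physical EOF as "no data" rather than an error.  The remaining race between
the size check and the read is exactly the doomed-pages issue above.

#include "postgres.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

static bool
read_block_if_exists(Relation rel, BlockNumber blkno, Buffer *buf)
{
	/* RelationGetNumberOfBlocks() reflects the current physical size */
	if (blkno >= RelationGetNumberOfBlocks(rel))
		return false;			/* beyond EOF: report "no rows", not ERROR */

	*buf = ReadBuffer(rel, blkno);
	return true;
}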

regards, tom lane



Re: Drop type "smgr"?

2019-02-28 Thread Andres Freund
On 2019-03-01 09:48:33 +1300, Thomas Munro wrote:
> On Fri, Mar 1, 2019 at 7:24 AM Andres Freund  wrote:
> > On 2019-02-28 13:16:02 -0500, Tom Lane wrote:
> > > Shawn Debnath  writes:
> > > > Another thought: my colleague Anton Shyrabokau suggested potentially
> > > > re-using forknumber to differentiate smgrs. We are using 32 bits to
> > > > map 5 entries today.
> > >
> > > Yeah, that seems like it might be a workable answer.
> >
> > Yea, if we just split that into two 16bit entries, there'd not be much
> > lost. Some mild mild performance regression due to more granular
> > memory->register reads/writes, but I can't even remotely see that
> > matter.
> 
> Ok, that's an interesting way to include it in BufferTag without making
> it wider.  But then how about the SMGR interface?  I think that value
> would need to be added to the smgropen() interface, and all existing
> callers would pass in (say) SMGR_RELATION (meaning "use md.c"), and
> new code would use SMGR_UNDO, SMGR_SLRU etc.  That seems OK to me.

Right, seems like we should do that independent of whether we end up
reusing the dboid or not.


> > > Since, per this discussion, the smgr IDs need not really be OIDs at
> > > all, we just need a few bits for them.
> >
> > Personally I find having them as oids more elegant than the distaste
> > from misusing the database oid for a while, but I think it's fair to
> > disagree here.
> 
> It sounds like your buffer mapping redesign would completely change
> the economics and we could reconsider much of this later without too
> much drama; that was one of the things that made me feel that the
> magic database OID approach was acceptable at least in the short term.

Right.

FWIW, I think while distasteful, I could see us actually using oids,
just ones that are small enough to fit into 16bit...

Greetings,

Andres Freund



Re: POC: converting Lists into arrays

2019-02-28 Thread Tom Lane
Here's a rebased version of the main patch.

David Rowley  writes:
> The only thing that I did to manage to speed the patch up was to ditch
> the additional NULL test in lnext().  I don't see why that's required
> since lnext(NULL) would have crashed with the old implementation.

I adopted this idea.  I think at one point where I was fooling with
different implementations for foreach(), it was necessary that lnext()
be cool with a NULL input; but as things stand now, it's not.

I haven't done anything else in the performance direction, but am
planning to play with that next.

I did run through all the list_delete_foo callers and fix the ones
that were still busted.  I also changed things so that with
DEBUG_LIST_MEMORY_USAGE enabled, list deletions would move the data
arrays around, in hopes of catching more stale-pointer problems.
Depressingly, check-world still passed with that added, even before
I'd fixed the bugs I found by inspection.  This does not speak well
for the coverage of our regression tests.
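
The idea, roughly (an illustrative sketch with a made-up structure, not the
actual List code): a deletion relocates the whole element array, so any stale
pointer a caller kept into the old storage now points at freed memory and is
far more likely to be caught.

#include "postgres.h"

typedef struct ArrayList
{
	int			length;			/* elements currently in use */
	int			capacity;		/* allocated slots */
	void	  **elements;		/* palloc'd element array */
} ArrayList;

static void
arraylist_delete_nth(ArrayList *list, int n)
{
	void	  **newelems = palloc(list->capacity * sizeof(void *));
	int			i;
	int			j = 0;

	for (i = 0; i < list->length; i++)
	{
		if (i != n)
			newelems[j++] = list->elements[i];
	}

	pfree(list->elements);		/* stale pointers now reference freed memory */
	list->elements = newelems;
	list->length--;
}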

regards, tom lane



reimplement-List-as-array-2.patch.gz
Description: reimplement-List-as-array-2.patch.gz


Re: Index Skip Scan

2019-02-28 Thread Jeff Janes
On Wed, Feb 20, 2019 at 11:33 AM Dmitry Dolgov <9erthali...@gmail.com>
wrote:

> > On Fri, Feb 1, 2019 at 8:24 PM Jesper Pedersen <
> jesper.peder...@redhat.com> wrote:
> >
> > Dmitry and I will look at this and take it into account for the next
> > version.
>
> In the meantime, just to not forget, I'm going to post another version
> with a
> fix for cursor fetch backwards, which was crashing before.


This version of the patch can return the wrong answer.

create index on pgbench_accounts (bid, aid);
begin; declare c  cursor for select distinct on (bid) bid, aid from
pgbench_accounts order by bid, aid;
fetch 2 from c;
 bid |   aid
-----+---------
   1 |       1
   2 | 100,001

fetch backward 1 from c;
 bid |   aid
-----+---------
   1 | 100,000

Without the patch, instead of getting a wrong answer, I get an error:

ERROR:  cursor can only scan forward
HINT:  Declare it with SCROLL option to enable backward scan.

If I add "SCROLL", then I do get the right answer with the patch.

Cheers,

Jeff


Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2019-02-28 Thread Peter Moser
Dear all,
we rebased our temporal normalization patch on top of 
554ebf687852d045f0418d3242b978b49f160f44 from 2019-02-28.


On 9/7/18 1:02 PM, Peter Moser wrote:
> The syntax is
> SELECT * FROM (r NORMALIZE s USING() WITH(period_r, period_s)) c;

Please find all information about our decisions and current state within the 
previous email.

> What we like to discuss now is:
> - Is sort_inner_and_outer the correct place to perform this split?
> - How could we support OID_RANGE_ELEM_CONTAINED_OP for a NORMALIZE
>   mergejoin executor? If we use RANGE_ELEM_CONTAINED as operator, it is
>   not an equality operator, but if we use RANGE_EQ it assumes that the
>   right-hand-side of the operator must be a range as well.
> - Should we better change our mergeclause to a RANGE_ELEM_CONTAINED
>   comparison, or keep RANGE_EQ and fix pathkeys later?
> - How do we update equivalence classes, and pathkeys
>   when changing the inner relation's data type from "int4range" to "int"
>   in the query tree inside "sort_inner_and_outer" to get the correct
>   ordering and data types

I will also add this prototype (WIP) patch to the March commitfest, as 
suggested by two developers we met at FOSDEM some weeks ago.


Best regards,
Anton, Johann, Michael, Peter

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 2a1d000b03..a309596fa1 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -99,6 +99,106 @@
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 
+// XXX TEMPORAL NORMALIZE PEMOSER 
+// !!! This is just for prototyping, delete asap...
+
+#include "catalog/pg_operator.h"
+#include "nodes/nodeFuncs.h"
+#include "utils/fmgroids.h"
+#include "utils/rangetypes.h"
+#include "utils/typcache.h"
+#include "access/htup_details.h"/* for heap_getattr */
+#include "nodes/print.h"/* for print_slot */
+#include "utils/datum.h"/* for datumCopy */
+
+
+
+#define TEMPORAL_DEBUG
+/*
+ * #define TEMPORAL_DEBUG
+ * XXX PEMOSER Maybe we should use execdebug.h stuff here?
+ */
+#ifdef TEMPORAL_DEBUG
+static char*
+datumToString(Oid typeinfo, Datum attr)
+{
+	Oid			typoutput;
+	bool		typisvarlena;
+	getTypeOutputInfo(typeinfo, &typoutput, &typisvarlena);
+	return OidOutputFunctionCall(typoutput, attr);
+}
+
+#define TPGdebug(...)   { printf(__VA_ARGS__); printf("\n"); fflush(stdout); }
+#define TPGdebugDatum(attr, typeinfo)   TPGdebug("%s = %s %ld\n", #attr, datumToString(typeinfo, attr), attr)
+#define TPGdebugSlot(slot)  { printf("Printing Slot '%s'\n", #slot); print_slot(slot); fflush(stdout); }
+
+#else
+#define datumToString(typeinfo, attr)
+#define TPGdebug(...)
+#define TPGdebugDatum(attr, typeinfo)
+#define TPGdebugSlot(slot)
+#endif
+
+TypeCacheEntry *testmytypcache;
+#define setSweepline(datum) \
+	node->sweepline = datumCopy(datum, node->datumFormat->attbyval, node->datumFormat->attlen)
+
+#define freeSweepline() \
+	if (! node->datumFormat->attbyval) pfree(DatumGetPointer(node->sweepline))
+
+ /*
+  * slotGetAttrNotNull
+  *  Same as slot_getattr, but throws an error if NULL is returned.
+  */
+static Datum
+slotGetAttrNotNull(TupleTableSlot *slot, int attnum)
+{
+	bool isNull;
+	Datum result;
+
+	result = slot_getattr(slot, attnum, &isNull);
+
+	if(isNull)
+		ereport(ERROR,
+(errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("Attribute \"%s\" at position %d is null. Temporal " \
+		"adjustment not possible.",
+ NameStr(TupleDescAttr(slot->tts_tupleDescriptor, attnum - 1)->attname),
+ attnum)));
+
+	return result;
+}
+
+/*
+ * heapGetAttrNotNull
+ *  Same as heap_getattr, but throws an error if NULL is returned.
+ */
+static Datum
+heapGetAttrNotNull(TupleTableSlot *slot, int attnum)
+{
+	bool isNull;
+	Datum result;
+	HeapTuple tuple;
+
+	tuple = ExecFetchSlotHeapTuple(slot, true, NULL);
+	result = heap_getattr(tuple,
+		  attnum,
+		  slot->tts_tupleDescriptor,
+		  &isNull);
+	if(isNull)
+		ereport(ERROR,
+(errcode(ERRCODE_NOT_NULL_VIOLATION),
+ errmsg("Attribute \"%s\" at position %d is null. Temporal " \
+		"adjustment not possible.",
+		NameStr(TupleDescAttr(slot->tts_tupleDescriptor,
+attnum - 1)->attname),
+		attnum)));
+
+	return result;
+}
+
+// XXX TEMPORAL NORMALIZE PEMOSER END 
+
 
 /*
  * States of the ExecMergeJoin state machine
@@ -138,6 +238,10 @@ typedef struct MergeJoinClauseData
 	 * stored here.
 	 */
 	SortSupportData ssup;
+
+	/* needed for Temporal Normalization */
+	bool isnormalize;
+	TypeCacheEntry  *range_typcache;
 }			MergeJoinClauseData;
 
 /* Result type for MJEvalOuterValues and MJEvalInnerValues */
@@ -152,6 +256,59 @@ typedef enum
 #define MarkInnerTuple(innerTupleSlot, mergestate) \
 	ExecCopySlot((mergestate)->mj_MarkedTupleSlot, (innerTupleSlot))
 
+/*
+ * temporalAdjustmentStoreTuple
+ *  While we 

Re: Drop type "smgr"?

2019-02-28 Thread Thomas Munro
On Fri, Mar 1, 2019 at 4:09 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Thu, Feb 28, 2019 at 7:37 PM Tom Lane  wrote:
> >> Thomas Munro  writes:
> >>> Our current thinking is that smgropen() should know how to map a small
> >>> number of special database OIDs to different smgr implementations
>
> >> Hmm.  Maybe mapping based on tablespaces would be a better idea?
>
> > In the undo log proposal (about which more soon) we are using
> > tablespaces for their real purpose, so we need that OID.  If you SET
> > undo_tablespaces = foo then future undo data created by your session
> > will be written there, which might be useful for putting that IO on
> > different storage.
>
> Meh.  That's a point, but it doesn't exactly seem like a killer argument.
> Just in the abstract, it seems much more likely to me that people would
> want per-database special rels than per-tablespace special rels.  And
> I think your notion of a GUC that can control this is probably pie in
> the sky anyway: if we can't afford to look into the catalogs to resolve
> names at this code level, how are we going to handle a GUC?

I have this working like so:

* undo logs have a small amount of meta-data in shared memory, stored
in a file at checkpoint time, with all changes WAL logged, visible to
users in pg_stat_undo_logs view
* one of the properties of an undo log is its tablespace (the point
here being that it's not in a catalog)
* you don't need access to any catalogs to find the backing files for
a RelFileNode (the path via tablespace symlinks is derivable from
spcNode)
* therefore you can find your way from an UndoLogRecPtr in (say) a
zheap page to the relevant blocks on disk without any catalog access;
this should work even in the apparently (but not actually) circular
case of a pg_tablespace catalog that is stored in zheap (not something
we can do right now, but hypothetically speaking), and has undo data
that is stored in some non-default tablespace that must be consulted
while scanning the catalog (not that I'm suggesting that would
necessarily be a good idea to put catalogs in non-default
tablespaces; I'm just addressing your theoretical point)
* the GUC is used to resolve tablespace names to OIDs only by sessions
that are writing, when selecting (or creating) an undo log to attach
to and begin writing into; those sessions have no trouble reading the
catalog to do so without problematic circularities, as above

Seems to work; the main complications so far were coming up with
reasonable behaviour and interlocking when you drop tablespaces that
contain undo logs (short version: if they're not needed for snapshots
or rollback, they are dropped, wasting the rest of their undo address
space; otherwise they prevent the tablespace from being dropped with
a clear message to that effect).

It doesn't make any sense to put things like clog or any other SLRU in
a non-default tablespace though.  It's perfectly OK if not all smgr
implementations know how to deal with tablespaces, and the SLRU
support should just not support that.

> The real reason I'm concerned about this, though, is that for either
> a database or a tablespace, you can *not* get away with having a magic
> OID just hanging in space with no actual catalog row matching it.
> If nothing else, you need an entry there to prevent someone from
> reusing the OID for another purpose.  And a pg_database row that
> doesn't correspond to a real database is going to break all kinds of
> code, starting with pg_upgrade and the autovacuum launcher.  Special
> rows in pg_tablespace are much less likely to cause issues, because
> of the precedent of pg_global and pg_default.

GetNewObjectId() never returns values < FirstNormalObjectId.

I don't think it's impossible for someone to want to put SMGRs in a
catalog of some kind some day.  Even though the ones for clog, undo
etc would still probably need special hard-coded treatment as
discussed, I suppose it's remotely possible that someone might some
day figure out a useful way to allow extensions that provide different
block storage (nvram?  zfs zvols?  encryption? (see Haribabu's reply))
but I don't have any specific ideas about that or feel inclined to
design something for unknown future use.

-- 
Thomas Munro
https://enterprisedb.com



Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Joe Conway
On 2/28/19 7:20 AM, Michael Paquier wrote:
> On Thu, Feb 28, 2019 at 07:11:04AM -0500, Joe Conway wrote:
>> Sure, will do. What are your thoughts on backpatching? This seems
>> unlikely to be a practical concern in the field, so my inclination is a
>> master only fix.
> 
> I agree that this would unlikely become an issue as an error on the
> control file would most likely be a PANIC when updating it, so fixing
> only HEAD sounds fine to me.  Thanks for asking!


Committed and pushed that way.

By the way, while looking at this, I noted at least a couple of places
where OpenTransientFile() is being passed O_RDWR when the usage is
pretty clearly intended to be read-only. For example at least two
instances in slru.c -- SlruPhysicalReadPage() and
SimpleLruDoesPhysicalPageExist(). Is it worth while searching for and
fixing those instances?

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: get_controlfile() can leak fds in the backend

2019-02-28 Thread Andres Freund
Hi,

On 2019-02-28 09:54:48 +0100, Fabien COELHO wrote:
> > If we were to want to do more here, ISTM the right approach would use
> > the postmaster pid file, not the control file.
> 
> ISTM that this just means re-inventing a manual poor-featured
> race-condition-prone lock API around another file, which seems to be created
> more or less only by "pg_ctl", while some other commands use the control
> file (eg pg_rewind, AFAICS).

Huh? Postmaster.pid is written by the backend, pg_ctl just checks it to
see if the backend has finished starting up etc. It's precisely what the
backend uses to prevent two postmasters from starting etc. It's also what,
say, pg_resetwal checks to protect against a concurrently running cluster
(albeit in a racy way). If we want to make things more bulletproof,
that's the place.  The control file is constantly written to, sometimes
by different processes, it'd just not be a good file for such lockout
mechanisms.

Greetings,

Andres Freund



Re: Drop type "smgr"?

2019-02-28 Thread Thomas Munro
On Fri, Mar 1, 2019 at 7:24 AM Andres Freund  wrote:
> On 2019-02-28 13:16:02 -0500, Tom Lane wrote:
> > Shawn Debnath  writes:
> > > Another thought: my colleague Anton Shyrabokau suggested potentially
> > > re-using forknumber to differentiate smgrs. We are using 32 bits to
> > > map 5 entries today.
> >
> > Yeah, that seems like it might be a workable answer.
>
> Yea, if we just split that into two 16bit entries, there'd not be much
> lost. Some mild mild performance regression due to more granular
> memory->register reads/writes, but I can't even remotely see that
> matter.

Ok, that's an interesting way to include it in BufferTag without making
it wider.  But then how about the SMGR interface?  I think that value
would need to be added to the smgropen() interface, and all existing
callers would pass in (say) SMGR_RELATION (meaning "use md.c"), and
new code would use SMGR_UNDO, SMGR_SLRU etc.  That seems OK to me.
Ancient POSTGRES had an extra argument like that and would say eg
smgropen(rd->rd_rel->relsmgr, rd), but in this new idea I think it'd
always be a constant or a value from a BufferTag, and the BufferTag
would have been set with a constant, since the code reading these
buffers would always be code that knows which it wants.  We'd be using
this new argument not as a modifier to control storage location as
they did back then, but rather as a whole new namespace for
RelFileNode values, that also happens to be stored differently; that
might be a hint that it could also go into RelFileNode (but I'm not
suggesting that).
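
To sketch it (purely hypothetical, borrowing the constants named above;
none of this exists yet):

  /* hypothetical identifiers for the storage managers discussed above */
  typedef enum SmgrId
  {
      SMGR_RELATION,              /* ordinary relations, i.e. md.c */
      SMGR_UNDO,                  /* undo logs */
      SMGR_SLRU                   /* clog and other SLRUs */
  } SmgrId;

  /* smgropen() would grow one extra argument selecting the smgr */
  extern SMgrRelation smgropen(RelFileNode rnode, BackendId backend,
                               SmgrId smgr);

  /* existing call sites would simply pass the md.c constant, e.g. */
  reln = smgropen(rnode, InvalidBackendId, SMGR_RELATION);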

> > Since, per this discussion, the smgr IDs need not really be OIDs at
> > all, we just need a few bits for them.
>
> Personally I find having them as oids more elegant than the distaste
> from misusing the database oid for a wihle, but I think it's fair to
> disagree here.

It sounds like your buffer mapping redesign would completely change
the economics and we could reconsider much of this later without too
much drama; that was one of the things that made me feel that the
magic database OID approach was acceptable at least in the short term.

-- 
Thomas Munro
https://enterprisedb.com



Re: Segfault when restoring -Fd dump on current HEAD

2019-02-28 Thread Dmitry Dolgov
> On Thu, Feb 28, 2019 at 9:24 PM Alvaro Herrera  
> wrote:
>
> Pushed, thanks.  I added the reminder comment I mentioned.

Thank you, and sorry for the trouble.



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-02-28 Thread Robert Haas
On Tue, Feb 26, 2019 at 5:10 PM Robert Haas  wrote:
> Aside from these problems, I think I have spotted a subtle problem in
> 0001. I'll think about that some more and post another update.

0001 turned out to be guarding against the wrong problem.  It supposed
that if we didn't get a coherent view of the system catalogs due to
concurrent DDL, we could just AcceptInvalidationMessages() and retry.
But that turns out to be wrong, because there's a (very) narrow window
after a process removes itself from the ProcArray and before it sends
invalidation messages.  It wasn't difficult to engineer an alternative
solution that works, but unfortunately it's only good enough to handle
the ATTACH case, so this is another thing that will need more thought
for concurrent DETACH.  Anyway, the updated 0001 contains that code
and some explanatory comments.  The rest of the series is
substantially unchanged.

I'm not currently aware of any remaining correctness issues with this
code, although certainly there may be some.  There has been a certain
dearth of volunteers to review any of this.  I do plan to poke at it a
bit to see whether it has any significant performance impact, but not
today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v4-0004-Reduce-the-lock-level-required-to-attach-a-partit.patch
Description: Binary data


v4-0002-Ensure-that-repeated-PartitionDesc-lookups-return.patch
Description: Binary data


v4-0001-Teach-RelationBuildPartitionDesc-to-cope-with-con.patch
Description: Binary data


v4-0003-Teach-runtime-partition-pruning-to-cope-with-conc.patch
Description: Binary data


Re: Segfault when restoring -Fd dump on current HEAD

2019-02-28 Thread Alvaro Herrera
On 2019-Feb-27, Dmitry Dolgov wrote:

> > On Wed, Feb 27, 2019 at 1:32 PM Alvaro Herrera  
> > wrote:
> >
> > > > I think it would be better to just put back the .defn = "" (etc) to the
> > > > ArchiveEntry calls.
> > >
> > > Then we should do this not only for defn, but for owner and dropStmt too.
> >
> > Yeah, absolutely.
> 
> Done, please find the attached patch.

Pushed, thanks.  I added the reminder comment I mentioned.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-02-28 Thread Alvaro Herrera
On 2019-Feb-28, Tom Lane wrote:

> Alvaro Herrera  writes:
> > Hopefully we'll get Tom's patch that addresses the failure-to-truncate
> > issues in pg12.
> 
> Hm, are you speaking of the handwaving I did in
> https://www.postgresql.org/message-id/2348.1544474...@sss.pgh.pa.us
> ?
> 
> I wasn't really working on that for v12 --- I figured it was way
> too late in the cycle to be starting on such a significant change.

Oh, well, it certainly seems far too late *now*.  However, what about
the idea in 
https://postgr.es/m/1255.1544562...@sss.pgh.pa.us
namely that we write out the buffers involved?  That sounds like it
might be backpatchable, and thus it's not too late for it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Protect syscache from bloating with negative cache entries

2019-02-28 Thread Robert Haas
On Wed, Feb 27, 2019 at 3:16 AM Ideriha, Takeshi
 wrote:
> I'm afraid I may be quibbling about it.
> What about users who understand performance drops but don't want to
> add memory or decrease concurrency?
> I think that PostgreSQL has parameters which most users don't mind
> and leave at the default, but which a few users want to change.
> In this case, as you said, introducing a hard-limit parameter decreases
> performance significantly, so how about adding a detailed caution
> to the documentation, like for the planner cost parameters?

There's nothing wrong with a parameter that is useful to some people
and harmless to everyone else, but the people who are proposing that
parameter still have to demonstrate that it has those properties.
This email thread is really short on clear demonstrations that X or Y
is useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: FETCH FIRST clause PERCENT option

2019-02-28 Thread Tomas Vondra
On 2/28/19 12:26 PM, Kyotaro HORIGUCHI wrote:
> Hello.
> 
> At Sat, 23 Feb 2019 22:27:44 +0100, Tomas Vondra 
>  wrote in 
> <81a5c0e9-c17d-28f3-4647-8a4659cdf...@2ndquadrant.com>
>>
>>
>> On 2/23/19 8:53 AM, Surafel Temesgen wrote:
>>>
>>>
>>> On Sun, Feb 10, 2019 at 2:22 AM Tomas Vondra
>>> mailto:tomas.von...@2ndquadrant.com>> wrote:
>>>  
>>>
>>>
>>> I'm not sure I understand - are you saying every time the user does a
>>> FETCH, we have to run the outer plan from scratch? I don't see why would
>>> that be necessary? And if it is, how come there's no noticeable
>>> performance difference?
>>>
>>> Can you share a patch implementing the incremental approach, and a query
>>> demonstrating the issue?
>>>
>>>
>>> I didn't implement it but its obvious that it doesn't work similarly
>>> with previous approach.
>>>
>>
>> Sure, but that's hardly a sufficient argument for the current approach.
>>
>>> We need different implementation and my plan was to use tuplestore per
>>> call and clear
>>>
>>> it after returning tuple but I see that the plan will not go far because
>>> mainly the last returned
>>>
>>> slot is not the last slot we get from outerPlan execution
>>>
>>
>> I'm sorry, I still don't understand what the supposed problem is. I
>> don't think it's all that different from what nodeMaterial.c does, for
>> example.
>>
>> As I explained before, having to execute the outer plan till completion
>> before returning any tuples is an issue. So either it needs fixing or an
>> explanation why it's not an issue.
> 
> The biggest issue seems to be that we don't know the total number of
> outer tuples before actually reading a null tuple. I doubt there is a
> general shortcut for that. It also seems to prevent the limit node
> from just using the materialized outer plan.
> 

Sure, if you actually want all tuples, you'll have to execute the outer
plan till completion. But that's not what I'm talking about - what if we
only ever need to read one row from the limit?

To give you an (admittedly somewhat contrived and artificial) example:

SELECT * FROM t1 WHERE id IN (
  SELECT id FROM t2 ORDER BY x FETCH FIRST 10 PERCENT ROWS ONLY
);

Maybe this example is bogus and/or does not really matter in practice. I
don't know, but I've been unable to convince myself that's the case.


regards


-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-02-28 Thread Tomas Vondra
Hi,

Attached is an updated version of this patch series. I've decided to
rebase and send both parts (MCV and histograms), although we've agreed
to focus on the MCV part for now. I don't want to let the histogram part
lag behind, because (a) then it'd be much more work to update it, and
(b) I think it's useful feedback about likely future changes.

This should address most of the issues pointed out by David in his
recent reviews. Briefly:

1) It fixes/updates a number of comments and docs on various places,
removes redundant comments etc. In most cases I've simply adopted the
wording proposed by David, with minor tweaks in a couple of cases.

2) Reverts changes that exposed analyze_mcv_list - this was a leftover
from the attempt to reuse the single-column algorithm, but we've since
agreed it's not the right approach. So this change is unnecessary.

3) I've tweaked the code to accept RelabelType nodes as supported,
similarly to what examine_variable() does. Previously I concluded we
can't support RelabelType, but it seems that reasoning was bogus. I've
slightly tweaked the regression tests by changing one of the columns to
varchar, so that the queries actually trigger this.

4) I've tweaked a couple of places (UpdateStatisticsForTypeChange,
statext_clauselist_selectivity and estimate_num_groups_simple) per
David's suggestions. Those were fairly straightforward simplifications.

5) I've removed mcv_count from statext_mcv_build(). As David pointed
out, this was not actually needed - it was another remnant of the
attempt to re-use analyze_mcv_list() which needs such array. But without
it we can access the groups directly.

6) One of the review questions was about the purpose of this code:

  for (i = 0; i < nitems; i++)
  {
      if (groups[i].count < mincount)
      {
          nitems = i;
          break;
      }
  }

It's quite simple - we want to include groups with more occurrences than
mincount, and the groups are sorted by the count (in descending order).
So we simply find the first group with count below mincount, and the
index is the number of groups to keep. I've tried to explain that in a
comment.

7) I've fixed a bunch of format patterns in statext_mcv_deserialize(),
particularly those that confused %d and %u. We can't, however, use %d for
VARSIZE_ANY_EXHDR, because that macro expands into offsetof() etc., so
that would trigger compiler warnings.

8) Yeah, pg_stats_ext_mcvlist_items was broken. The issue was that one
of the output parameters is defined as boolean[], but the function was
building just a string. Originally it used BuildTupleFromCStrings(), but
then it switched to heap_form_tuple() without building a valid array
(see the sketch after this list).

I've decided to simply revert back to BuildTupleFromCStrings(). It's not
going to be used very frequently, so the small performance difference is
not important.

I've also fixed the formatting issues, pointed out by David.
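
On point 8, for the record, a minimal sketch of what building a real
boolean[] Datum for heap_form_tuple() would have looked like (assuming a
C array "leaf_flags" of length nitems and an output slot values[attno],
both hypothetical names):

  Datum      *elems = (Datum *) palloc(nitems * sizeof(Datum));
  ArrayType  *arr;
  int         i;

  for (i = 0; i < nitems; i++)
      elems[i] = BoolGetDatum(leaf_flags[i]);

  /* boolean is a 1-byte, pass-by-value, char-aligned type */
  arr = construct_array(elems, nitems, BOOLOID, 1, true, 'c');

  values[attno] = PointerGetDatum(arr);

Reverting to BuildTupleFromCStrings() sidesteps all of that.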


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-multivariate-MCV-lists-20190228.patch.gz
Description: application/gzip


0002-multivariate-histograms-20190228.patch.gz
Description: application/gzip


Re: some ri_triggers.c cleanup

2019-02-28 Thread Peter Eisentraut
On 2019-02-25 17:17, Corey Huinker wrote:
> Right, this makes a lot of sense, similar to how ri_restrict() combines
> RESTRICT and NO ACTION.
> 
> 
> I'm pretty sure that's where I got the idea, yes. 

Committed, including your patch.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_partition_tree crashes for a non-defined relation

2019-02-28 Thread Alvaro Herrera
On 2019-Feb-28, Michael Paquier wrote:

> On Wed, Feb 27, 2019 at 03:48:08PM -0300, Alvaro Herrera wrote:
> > I just happened to come across the result of this rationale in
> > pg_partition_tree() (an SRF) while developing a new related function,
> > pg_partition_ancestors(), and find the resulting behavior rather absurd
> > -- it returns one row with all NULL columns, rather than no rows.  I
> > think the sensible behavior would be to do SRF_RETURN_DONE() before
> > stashing any rows to the output, so that we get an empty result set
> > instead.
> 
> Hmm.  Going through the thread again, NULL was chosen to make the
> whole experience consistent; now, by returning nothing we would get
> behavior as consistent as when NULL is used as input, so point taken
> to tune the behavior for unsupported relkinds and undefined objects.

Right, thanks.

> Does the attached look fine to you?

Yeah, looks good, please push.

What about legacy inheritance?  I see that pg_partition_tree handles
that case perfectly well -- it seems to return the complete hierarchy
rooted at the given relation.  However, it seems odd that it works at
all, don't you think?  Consider this:

create table t1 (a int);
create table t11 () inherits (t1);
create table t2 (b int);
create table t111() inherits (t1, t2);

alvherre=# select * from pg_partition_tree('t1');
 relid | parentrelid | isleaf | level 
---+-++---
 t1| t   | t  | 0
 t11   | t1  | t  | 1
 t111  | t1  | t  | 1
(3 filas)

OK so far... but look at t2's tree:

alvherre=# select * from pg_partition_tree('t2');
 relid | parentrelid | isleaf | level 
---+-++---
 t2| t   | t  | 0
 t111  | t1  | t  | 2

I think this one is just weird -- t1 is not listed as "relid" anywhere,
and why does t111 have level=2?

I would opt for returning the empty set for legacy inheritance too.

More generally, I think we should return empty for anything that's
either not RELKIND_PARTITIONED_TABLE or has relispartition set.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: POC: converting Lists into arrays

2019-02-28 Thread Tom Lane
David Rowley  writes:
> On Thu, 28 Feb 2019 at 09:26, Tom Lane  wrote:
>> 0001 below does this.  I found a couple of places that could use
>> forfive(), as well.  I think this is a clear legibility and
>> error-proofing win, and we should just push it.

> I've looked over this and I agree that it's a good idea.  Reducing the
> number of lnext() usages seems like a good idea in order to reduce the
> footprint of the main patch.

I've pushed that; thanks for reviewing!

>> 0002 below does this.  I'm having a hard time deciding whether this
>> part is a good idea or just code churn.  It might be more readable
>> (especially to newbies) but I can't evaluate that very objectively.

> I'm less decided on this.

Yeah, I think I'm just going to drop that idea.  People didn't seem
very sold on list_cell_is_last() being a readability improvement,
and it certainly does nothing to reduce the footprint of the main
patch.

I now need to rebase the main patch over what I pushed; off to do
that next.

regards, tom lane



Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Perumal Raj
Here is the data:

postgres=# \c template1
You are now connected to database "template1" as user "postgres".
template1=# \dx
 List of installed extensions
  Name   | Version |   Schema   | Description
-+-++--
 plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(1 row)

template1=# \c postgres
You are now connected to database "postgres" as user "postgres".
postgres=# \dx
 List of installed extensions
  Name   | Version |   Schema   | Description
-+-++--
 plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(1 row)

postgres=# \c nagdb
You are now connected to database "nagdb" as user "postgres".
nagdb=# \dx
 List of installed extensions
  Name   | Version |   Schema   | Description
-+-++--
 plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(1 row)

nagdb=# \c archive_old
You are now connected to database "books_old" as user "postgres".
books_old=# \dx
                                   List of installed extensions
        Name        | Version |   Schema   |                        Description
--------------------+---------+------------+------------------------------------------------------------
 pg_stat_statements | 1.1     | public     | track execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

archive_old=# \c production
You are now connected to database "blurb_production" as user "postgres".
production=# \dx
                                   List of installed extensions
        Name        | Version |   Schema   |                        Description
--------------------+---------+------------+------------------------------------------------------------
 hstore             | 1.1     | public     | data type for storing sets of (key, value) pairs
 pg_stat_statements | 1.1     | public     | track execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 uuid-ossp          | 1.0     | public     | generate universally unique identifiers (UUIDs)
(4 rows)


Thanks,



On Thu, Feb 28, 2019 at 11:04 AM Sergei Kornilov  wrote:

> Hi
>
> > Yes, i want to get rid of old extension, Could you please share the
> query to find extension which is using pg_reorg.
>
> pg_reorg is name for both tool and extension.
> Check every database in cluster with, for example, psql command "\dx" or
> read pg_dumpall -s output for some CREATE EXTENSION statements to find all
> installed extensions.
>
> regards, Sergei
>


Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Sergei Kornilov
Hi

> Yes, i want to get rid of old extension, Could you please share the query to 
> find extension which is using pg_reorg.

pg_reorg is the name of both a tool and an extension.
Check every database in the cluster with, for example, the psql command "\dx",
or read the pg_dumpall -s output for CREATE EXTENSION statements to find all
installed extensions.

regards, Sergei



Re: [RFC] [PATCH] Flexible "partition pruning" hook

2019-02-28 Thread Mike Palmiotto
On Wed, Feb 27, 2019 at 12:36 PM Peter Eisentraut
 wrote:
> 
> To rephrase this: You have a partitioned table, and you have a RLS
> policy that hides certain rows, and you know based on your business
> logic that under certain circumstances entire partitions will be hidden,
> so they don't need to be scanned.  So you want a planner hook that would
> allow you to prune those partitions manually.
>
> That sounds pretty hackish to me.  We should give the planner and
> executor the appropriate information to make these decisions, like an
> additional partition constraint.

Are you thinking of a partition pruning step for FuncExpr's or
something else? I was considering an implementation where FuncExpr's
were marked for execution-time pruning, but wanted to see if this
patch got any traction first.
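
For reference, the sort of hook being proposed could look roughly like
this (an illustrative sketch, not the actual patch or an existing API):
the planner would consult it for each partition of a parent rel and skip
any partition for which it returns false.

  typedef bool (*partition_filter_hook_type) (PlannerInfo *root,
                                              RelOptInfo *parent,
                                              Oid partition_relid);

  extern PGDLLIMPORT partition_filter_hook_type partition_filter_hook;

An RLS-aware extension would return false for partitions it knows the
current user can never see.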

> If this information is hidden in
> user-defined functions in a way that cannot be reasoned about, who is
> enforcing these constraints and how do we know they are actually correct?

The author of the extension which utilizes the hook would have to be
sure they use the hook correctly. This is not a new or different
concept from any other existing hook. This hook in particular would be
used by security extensions that have some understanding of the
underlying security model being implemented by RLS.

Thanks,

--
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com



Re: Bloom index cost model seems to be wrong

2019-02-28 Thread Tom Lane
Jeff Janes  writes:
> Should we be trying to estimate the false positive rate and charging
> cpu_tuple_cost, cpu_operator_cost, and the I/O costs for visiting the table
> to recheck and reject those?  I don't think other index types do that, and
> I'm inclined to think the burden should be on the user not to create silly
> indexes that produce an overwhelming number of false positives.

Heap-access costs are added on in costsize.c, not in the index
cost estimator.  I don't remember at the moment whether there's
any explicit accounting for lossy indexes (i.e. false positives).
Up to now, there haven't been cases where we could estimate the
false-positive rate with any accuracy, so we may just be ignoring
the effect.  But if we decide to account for it, I'd rather have
costsize.c continue to add on the actual cost, perhaps based on
a false-positive-rate fraction returned by the index estimator.

regards, tom lane



Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Perumal Raj
Thank you very much Sergei,

Yes, I want to get rid of the old extension. Could you please share the query
to find where the pg_reorg extension is installed?

Regards,




On Thu, Feb 28, 2019 at 10:27 AM Sergei Kornilov  wrote:

> Hello
>
> pgsql-hackers seems wrong list for such question.
>
> > could not load library "$libdir/hstore": ERROR:  could not access file
> "$libdir/hstore": No such file or directory
> > could not load library "$libdir/adminpack": ERROR:  could not access
> file "$libdir/adminpack": No such file or directory
> > could not load library "$libdir/uuid-ossp": ERROR:  could not access
> file "$libdir/uuid-ossp": No such file or directory
> >
> > Observation : the above Libraries are present in 9.2 whereas its mising
> in 10.7. So i decided to go with lower version.
>
> This is contrib modules. They can be shipped in separate package,
> postgresql10-contrib.x86_64 for example (in centos repo)
>
> > Second  i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but
> its failed with following error during Check Mode.
> >
> > could not load library "$libdir/pg_reorg":
> > ERROR:  could not access file "$libdir/pg_reorg": No such file or
> directory
> >
> > Observation : In this case , pg_reorg is not present on both Source and
> Target . But strange its failing.
>
> This is 3rd-party extension. Best way would be drop this extension on old
> cluster and perform upgrade. pg_reorg is abandoned for years, pg_repack is
> live fork if you need such tool.
>
> regards, Sergei
>


Re: partitioned tables referenced by FKs

2019-02-28 Thread Alvaro Herrera
On 2019-Feb-28, Amit Langote wrote:

> Hi Alvaro,
> 
> I looked at the latest patch and most of the issues/bugs that I was going
> to report based on the late January version of the patch seem to have been
> taken care of; sorry that I couldn't post sooner which would've saved you
> some time.   The patch needs to be rebased on top of ff11e7f4b9 which
> touched ri_triggers.c.

Rebased to current master.  I'll reply later to handle the other issues
you point out.

Given the comments on 0002 in this thread and elsewhere, I'm inclined to
push that one today with minor tweaks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 1a0b9c9a699b2c80ebc295513e07db362b39fe0d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 28 Nov 2018 11:52:00 -0300
Subject: [PATCH v5 1/5] Rework deleteObjectsInList to allow objtype-specific
 checks

This doesn't change any functionality yet.
---
 src/backend/catalog/dependency.c | 41 +++-
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 2048d71535b..0b4c47b808c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -230,29 +230,38 @@ deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
 	int			i;
 
 	/*
-	 * Keep track of objects for event triggers, if necessary.
+	 * Invoke pre-deletion callbacks and keep track of objects for event
+	 * triggers, if necessary.
 	 */
-	if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
+	for (i = 0; i < targetObjects->numrefs; i++)
 	{
-		for (i = 0; i < targetObjects->numrefs; i++)
+		const ObjectAddress *thisobj = &targetObjects->refs[i];
+		Oid			objectClass = getObjectClass(thisobj);
+
+		if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
 		{
-			const ObjectAddress *thisobj = &targetObjects->refs[i];
-			const ObjectAddressExtra *extra = &targetObjects->extras[i];
-			bool		original = false;
-			bool		normal = false;
-
-			if (extra->flags & DEPFLAG_ORIGINAL)
-original = true;
-			if (extra->flags & DEPFLAG_NORMAL)
-normal = true;
-			if (extra->flags & DEPFLAG_REVERSE)
-normal = true;
-
-			if (EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
+			if (EventTriggerSupportsObjectClass(objectClass))
 			{
+bool		original = false;
+bool		normal = false;
+const ObjectAddressExtra *extra = &targetObjects->extras[i];
+
+if (extra->flags & DEPFLAG_ORIGINAL)
+	original = true;
+if (extra->flags & DEPFLAG_NORMAL ||
+	extra->flags & DEPFLAG_REVERSE)
+	normal = true;
+
 EventTriggerSQLDropAddObject(thisobj, original, normal);
 			}
 		}
+
+		/* Invoke class-specific pre-deletion checks */
+		switch (objectClass)
+		{
+			default:
+break;
+		}
 	}
 
 	/*
-- 
2.17.1

>From f353acaffaf275df623e70c296d5833bb9f318e7 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 22 Jan 2019 18:00:31 -0300
Subject: [PATCH v5 2/5] index_get_partition

---
 src/backend/catalog/partition.c  | 35 
 src/backend/commands/tablecmds.c | 40 +---
 src/include/catalog/partition.h  |  1 +
 3 files changed, 47 insertions(+), 29 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 3ccdaff8c45..f9282587f8b 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -145,6 +145,41 @@ get_partition_ancestors_worker(Relation inhRel, Oid relid, List **ancestors)
 	get_partition_ancestors_worker(inhRel, parentOid, ancestors);
 }
 
+/*
+ * Return the OID of the index, in the given partition, that is a child of the
+ * given index or InvalidOid if there isn't one.
+ */
+Oid
+index_get_partition(Relation partition, Oid indexId)
+{
+	List	   *idxlist = RelationGetIndexList(partition);
+	ListCell   *l;
+
+	foreach(l, idxlist)
+	{
+		Oid			partIdx = lfirst_oid(l);
+		HeapTuple	tup;
+		Form_pg_class classForm;
+		bool		ispartition;
+
+		tup = SearchSysCache1(RELOID, ObjectIdGetDatum(partIdx));
+		if (!tup)
+			elog(ERROR, "cache lookup failed for relation %u", partIdx);
+		classForm = (Form_pg_class) GETSTRUCT(tup);
+		ispartition = classForm->relispartition;
+		ReleaseSysCache(tup);
+		if (!ispartition)
+			continue;
+		if (get_partition_parent(lfirst_oid(l)) == indexId)
+		{
+			list_free(idxlist);
+			return partIdx;
+		}
+	}
+
+	return InvalidOid;
+}
+
 /*
  * map_partition_varattnos - maps varattno of any Vars in expr from the
  * attno's of 'from_rel' to the attno's of 'to_rel' partition, each of which
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a93b13c2fe4..251c5cd3fa1 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15454,36 +15454,18 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 static void
 refuseDupeIndexAttach(Relation parentIdx, 

Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Sergei Kornilov
Hello

pgsql-hackers seems wrong list for such question.

> could not load library "$libdir/hstore": ERROR:  could not access file 
> "$libdir/hstore": No such file or directory
> could not load library "$libdir/adminpack": ERROR:  could not access file 
> "$libdir/adminpack": No such file or directory
> could not load library "$libdir/uuid-ossp": ERROR:  could not access file 
> "$libdir/uuid-ossp": No such file or directory
>
> Observation : the above Libraries are present in 9.2 whereas its mising in 
> 10.7. So i decided to go with lower version.

These are contrib modules. They can be shipped in a separate package,
postgresql10-contrib.x86_64 for example (in the CentOS repo).

> Second  i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but its 
> failed with following error during Check Mode.
>
> could not load library "$libdir/pg_reorg":
> ERROR:  could not access file "$libdir/pg_reorg": No such file or directory
>
> Observation : In this case , pg_reorg is not present on both Source and 
> Target . But strange its failing.

This is a 3rd-party extension. The best way would be to drop this extension
on the old cluster and perform the upgrade. pg_reorg has been abandoned for
years; pg_repack is a live fork if you need such a tool.

regards, Sergei



Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Perumal Raj
Thanks Mahendra for quick response.

I have followed same way, only difference i didn't bringup Source ( 9.2),
But not sure how that will resolve libraries issue.
All i tried with --check mode only

Thanks,


On Thu, Feb 28, 2019 at 10:23 AM Mahendra Singh  wrote:

> Hi
> Please try with below commands.
>
> Let we want to upgrade v6 to v11.
> Note: I installed my binary inside result folder.
>
> export OLDCLUSTER=./6_EDBAS/EDBAS/result
> export NEWCLUSTER=./11_EDBAS/EDBAS/result
> ./11_EDBAS/EDBAS/result/bin/pg_upgrade --old-bindir=$OLDCLUSTER/bin
> --new-bindir=$NEWCLUSTER/bin --old-datadir=$OLDCLUSTER/bin/data
> --new-datadir=$NEWCLUSTER/bin/data
>
> Note: old server should be in running state and new server should not be
> in running state.
>
> Thanks and Regards
> Mahendra
>
> On Thu, 28 Feb 2019 at 23:44, Perumal Raj  wrote:
>
>> Dear SMEs
>>
>> I have finally decided to move forward after great hospitality in Version
>> 9.2.24 :-)
>>
>> First i attempted to upgrade from 9.2.24 to 10.7, but its failed with
>> following error during Check Mode.
>>
>> could not load library "$libdir/hstore": ERROR:  could not access file
>> "$libdir/hstore": No such file or directory
>> could not load library "$libdir/adminpack": ERROR:  could not access file
>> "$libdir/adminpack": No such file or directory
>> could not load library "$libdir/uuid-ossp": ERROR:  could not access file
>> "$libdir/uuid-ossp": No such file or directory
>>
>> Observation : the above Libraries are present in 9.2 whereas its mising
>> in 10.7. So i decided to go with lower version.
>>
>> Second  i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but
>> its failed with following error during Check Mode.
>>
>> could not load library "$libdir/pg_reorg":
>> ERROR:  could not access file "$libdir/pg_reorg": No such file or
>> directory
>>
>> Observation : In this case , pg_reorg is not present on both Source and
>> Target . But strange its failing.
>>
>>
>> Method Used : pg_upgrade
>>
>> Could you please share some light here to get rid of  library issue .
>>
>> Thanks, in advance ,
>> Raju
>>
>>


Re: plpgsql variable named as SQL keyword

2019-02-28 Thread Pavel Stehule
On Thu, Feb 28, 2019 at 19:20, Tom Lane  wrote:

> Pavel Stehule  writes:
> > Maybe we should to disallow variables named as sql reserved keyword.
>
> That would just break existing code.  There are lots of other
> examples where you can get away with such things.
>
> We've expended quite a lot of sweat to avoid reserving more names than
> we had to in plpgsql.  I'm disinclined to throw that away just because
> somebody found an error message confusing.  It's not like reserving
> "offset" would cause this case to work.
>

Partially I solved it with a new warning in plpgsql_check:

https://github.com/okbob/plpgsql_check/commit/5b9ef57d570c1d11fb92b9cff76655a03767f662

postgres=# select * from plpgsql_check_function('omega.foo(int, int, int)');
+----------------------------------------------------------------------------+
|                           plpgsql_check_function                           |
+----------------------------------------------------------------------------+
| warning:0:3:statement block:name of variable "offset" is reserved keyword  |
| Detail: The reserved keyword was used as variable name.                    |
| error:42601:4:RETURN:query "SELECT offset + 1" returned 0 columns          |
+----------------------------------------------------------------------------+
(3 rows)

I understand that it has no simple solution (or perhaps no solution at all).
I reported it more or less for the record.

Thank you for reply

Pavel


> regards, tom lane
>


Re: Drop type "smgr"?

2019-02-28 Thread Andres Freund
On 2019-02-28 13:16:02 -0500, Tom Lane wrote:
> Shawn Debnath  writes:
> > On Thu, Feb 28, 2019 at 10:35:50AM -0500, Robert Haas wrote:
> >> Also, I don't see why we'd need a fake pg_database row in the first
> >> place.  IIUC, the OID counter wraps around to FirstNormalObjectId, so
> >> nobody should have a database with an OID less than that value.
> 
> > We have scripts under catalog directory that can check to ensure OIDs 
> > aren't re-used accidentally. However, we still have to define an entry 
> > in a catalog somewhere and I was proposing creating a new one, 
> > pg_storage_managers?, to track these entries.
> 
> That would fail to capture the property that the smgr OIDs mustn't
> conflict with database OIDs, so the whole thing still seems like an
> ugly kluge from here.

It's definitely a kludge, but it doesn't seem that bad for now. Delaying
a nice answer till we have a more efficient bufmgr representation
doesn't seem crazy to me.

I don't think there's a real conflict risk given that we don't allow for
duplicated oids across catalogs at bootstrap time, and this is
definitely a bootstrap time issue.


> > Another thought: my colleague Anton Shyrabokau suggested potentially
> > re-using forknumber to differentiate smgrs. We are using 32 bits to
> > map 5 entries today.
> 
> Yeah, that seems like it might be a workable answer.

Yea, if we just split that into two 16bit entries, there'd not be much
lost. Some mild mild performance regression due to more granular
memory->register reads/writes, but I can't even remotely see that
matter.
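
Concretely, the split would look something like this (sketch only, not the
current buf_internals.h definition):

  typedef struct buftag
  {
      RelFileNode rnode;          /* physical relation identifier */
      int16       forkNum;        /* fork number, narrowed to 16 bits */
      int16       smgrId;         /* which storage manager owns the block */
      BlockNumber blockNum;       /* block number within the fork */
  } BufferTag;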


> Since, per this discussion, the smgr IDs need not really be OIDs at
> all, we just need a few bits for them.

Personally I find having them as oids more elegant than the distaste
from misusing the database oid for a while, but I think it's fair to
disagree here.

Greetings,

Andres Freund



Re: Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Mahendra Singh
Hi
Please try with below commands.

Say we want to upgrade from v6 to v11.
Note: I installed my binaries inside the result folder.

export OLDCLUSTER=./6_EDBAS/EDBAS/result
export NEWCLUSTER=./11_EDBAS/EDBAS/result
./11_EDBAS/EDBAS/result/bin/pg_upgrade --old-bindir=$OLDCLUSTER/bin
--new-bindir=$NEWCLUSTER/bin --old-datadir=$OLDCLUSTER/bin/data
--new-datadir=$NEWCLUSTER/bin/data

Note: old server should be in running state and new server should not be in
running state.

Thanks and Regards
Mahendra

On Thu, 28 Feb 2019 at 23:44, Perumal Raj  wrote:

> Dear SMEs
>
> I have finally decided to move forward after great hospitality in Version
> 9.2.24 :-)
>
> First i attempted to upgrade from 9.2.24 to 10.7, but its failed with
> following error during Check Mode.
>
> could not load library "$libdir/hstore": ERROR:  could not access file
> "$libdir/hstore": No such file or directory
> could not load library "$libdir/adminpack": ERROR:  could not access file
> "$libdir/adminpack": No such file or directory
> could not load library "$libdir/uuid-ossp": ERROR:  could not access file
> "$libdir/uuid-ossp": No such file or directory
>
> Observation : the above Libraries are present in 9.2 whereas its mising in
> 10.7. So i decided to go with lower version.
>
> Second  i tried to attempt to upgrade from 9.2.24 to 9.6.12,9.4,9.3 but
> its failed with following error during Check Mode.
>
> could not load library "$libdir/pg_reorg":
> ERROR:  could not access file "$libdir/pg_reorg": No such file or directory
>
> Observation : In this case , pg_reorg is not present on both Source and
> Target . But strange its failing.
>
>
> Method Used : pg_upgrade
>
> Could you please share some light here to get rid of  library issue .
>
> Thanks, in advance ,
> Raju
>
>


Re: plpgsql variable named as SQL keyword

2019-02-28 Thread Tom Lane
Pavel Stehule  writes:
> Maybe we should to disallow variables named as sql reserved keyword.

That would just break existing code.  There are lots of other
examples where you can get away with such things.

We've expended quite a lot of sweat to avoid reserving more names than
we had to in plpgsql.  I'm disinclined to throw that away just because
somebody found an error message confusing.  It's not like reserving
"offset" would cause this case to work.

regards, tom lane



Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2019-02-28 Thread Andres Freund


Hi,

Thanks for the quick response.

On 2019-02-28 18:28:37 +0900, Etsuro Fujita wrote:
> > I'm currently
> > converting the EPQ machinery to slots, and in course of that I (with
> > Horiguchi-san's help), converted RefetchForeignRow to return a slot. But
> > there's currently no in-core user of this facility...  I guess I can
> > rebase the preliminary postgres_fdw patch here, but it bitrotted
> > significantly.
> 
> I'll rebase that patch and help the testing, if you want me to.

That'd be awesome.


> > I also feel like there should be some test coverage for
> > an API in a nontrivial part of the code...
> 
> Yeah, but as mentioned above, the row-locking API is provided for FDWs
> operating against local storage, which we don't have in core, unfortunately.

Yea. file_fdw exists, but doesn't support modifications...

Greetings,

Andres Freund



Re: Drop type "smgr"?

2019-02-28 Thread Tom Lane
Shawn Debnath  writes:
> On Thu, Feb 28, 2019 at 10:35:50AM -0500, Robert Haas wrote:
>> Also, I don't see why we'd need a fake pg_database row in the first
>> place.  IIUC, the OID counter wraps around to FirstNormalObjectId, so
>> nobody should have a database with an OID less than that value.

> We have scripts under catalog directory that can check to ensure OIDs 
> aren't re-used accidentally. However, we still have to define an entry 
> in a catalog somewhere and I was proposing creating a new one, 
> pg_storage_managers?, to track these entries.

That would fail to capture the property that the smgr OIDs mustn't
conflict with database OIDs, so the whole thing still seems like an
ugly kluge from here.

> Another thought: my colleague Anton Shyrabokau suggested potentially
> re-using forknumber to differentiate smgrs. We are using 32 bits to
> map 5 entries today.

Yeah, that seems like it might be a workable answer.

I really do think that relying on magic database OIDs is a bad idea;
if you think you aren't going to need a real database ID in there in
the future, you're being short-sighted.  And I guess the same argument
establishes that relying on magic tablespace OIDs would also be a bad
idea, even if there weren't a near-term proposal on the table for
needing real tablespace IDs in an alternate smgr.  So we need to find
some bits somewhere else, and the fork number field is the obvious
candidate.  Since, per this discussion, the smgr IDs need not really
be OIDs at all, we just need a few bits for them.

regards, tom lane



Re: Drop type "smgr"?

2019-02-28 Thread Andres Freund
Hi,

On 2019-02-28 10:02:46 -0800, Shawn Debnath wrote:
> We have scripts under catalog directory that can check to ensure OIDs 
> aren't re-used accidentally. However, we still have to define an entry 
> in a catalog somewhere and I was proposing creating a new one, 
> pg_storage_managers?, to track these entries. See [1] for previous
> discussion on this topic. We wouldn't need to do catalog lookups for
> being able to use the smgrs as the OIDs will be hardcoded in C, but the
> data will be available for posterity and OID reservation.

I'm inclined to just put them in pg_am, with a new type 's' (we already
have amtype = i for indexes, I'm planning to add 't' for tables
soon). While not a perfect fit for storage managers, it seems to fit
well enough.


> Another thought: my colleague Anton Shyrabokau suggested potentially
> re-using forknumber to differentiate smgrs. We are using 32 bits to
> map 5 entries today. This approach would be similar to how we split up
> the segment numbers and use the higher bits to identify forget or
> unlink requests for checkpointer.

That could probably be done, without incurring too much overhead
here. I'm not sure that the added complexity around the tree is worth it
however.

I personally would just go with the DB oid for the near future, where we
don't need per-database storage for those. And then plan for buffer
manager improvements.

Greetings,

Andres Freund



Question about pg_upgrade from 9.2 to X.X

2019-02-28 Thread Perumal Raj
Dear SMEs

I have finally decided to move forward after great hospitality in Version
9.2.24 :-)

First I attempted to upgrade from 9.2.24 to 10.7, but it failed with the
following error during check mode.

could not load library "$libdir/hstore": ERROR:  could not access file
"$libdir/hstore": No such file or directory
could not load library "$libdir/adminpack": ERROR:  could not access file
"$libdir/adminpack": No such file or directory
could not load library "$libdir/uuid-ossp": ERROR:  could not access file
"$libdir/uuid-ossp": No such file or directory

Observation: the above libraries are present in 9.2 whereas they are missing
in 10.7. So I decided to go with a lower version.

Second, I tried to upgrade from 9.2.24 to 9.6.12, 9.4, and 9.3, but it
failed with the following error during check mode.

could not load library "$libdir/pg_reorg":
ERROR:  could not access file "$libdir/pg_reorg": No such file or directory

Observation: In this case, pg_reorg is not present on either the source or
the target, but strangely it's still failing.


Method Used : pg_upgrade

Could you please shed some light here on how to get rid of the library issue?

Thanks in advance,
Raju


Re: Remove Deprecated Exclusive Backup Mode

2019-02-28 Thread David Steele

On 2/27/19 8:22 PM, Christophe Pettus wrote:
>
>> On Feb 26, 2019, at 11:38, Magnus Hagander  wrote:
>> That should not be a wiki page, really, that should be part of the main
>> documentation.
>
> I was just suggesting using a wiki page to draft it before we drop it into
> the main documentation.  I'm open to other suggestions, of course!

It seems to me that the best way to discuss this is via a patch to the
main documentation.  I've written something to get us started:

https://commitfest.postgresql.org/22/2042/

--
-David
da...@pgmasters.net



Re: Bloom index cost model seems to be wrong

2019-02-28 Thread Jeff Janes
On Sun, Feb 24, 2019 at 11:09 AM Jeff Janes  wrote:

> I've moved this to the hackers list, and added Teodor and Alexander of the
> bloom extension, as I would like to hear their opinions on the costing.
>

My previous patch had accidentally included a couple lines of a different
thing I was working on (memory leak, now-committed), so this patch removes
that diff.

I'm adding it to the commitfest targeting v13.  I'm more interested in
feedback on the conceptual issues rather than stylistic ones, as I would
probably merge the two functions together before proposing something to
actually be committed.

Should we be trying to estimate the false positive rate and charging
cpu_tuple_cost, cpu_operator_cost, and the I/O costs for visiting the table
to recheck and reject those?  I don't think other index types do that, and
I'm inclined to think the burden should be on the user not to create silly
indexes that produce an overwhelming number of false positives.

Cheers,

Jeff


bloom_cost_v3.patch
Description: Binary data


plpgsql variable named as SQL keyword

2019-02-28 Thread Pavel Stehule
Hi

one user of plpgsql_check reported interesting error message

create or replace function omega.foo(a int)
returns int as $$
declare offset integer := 0;
begin
  return offset + 1;
end;
$$ language plpgsql;

postgres=# select omega.foo(10);
ERROR:  query "SELECT offset + 1" returned 0 columns
CONTEXT:  PL/pgSQL function omega.foo(integer) line 4 at RETURN

Maybe we should disallow variables named after SQL reserved keywords.

Regards

Pavel


Re: Drop type "smgr"?

2019-02-28 Thread Tom Lane
Robert Haas  writes:
> On Thu, Feb 28, 2019 at 1:03 AM Thomas Munro  wrote:
>> Nothing seems to break if you remove it (except for some tests using
>> it in an incidental way).  See attached.

> FWIW, +1 from me.

To be clear, I'm not objecting to the proposed patch either.  I was
just wondering where we plan to go from here, given that smgr.c wasn't
getting removed.

BTW, there is stuff in src/backend/storage/smgr/README that is
already obsoleted by this patch, and more that might be obsoleted
if development proceeds as discussed here.  So that needs a look.

regards, tom lane



Re: Drop type "smgr"?

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 11:06 AM Tom Lane  wrote:
> It's certainly possible/likely that we're going to end up needing to
> widen buffer tags to represent the smgr explicitly, because some use
> cases are going to need a real database spec, some are going to need
> a real tablespace spec, and some might need both.  Maybe we should
> just bite that bullet.

Well, Andres will probably complain about that.  He thinks, IIUC, that
the buffer tags are too wide already and that it's significantly
hurting performance on very very common operations - like buffer
lookups.  I haven't verified that myself, but I tend to think he knows
what he's talking about.

Anyway, given that your argument started off from the premise that we
had to have a pg_database row, I think we'd better look a little
harder at whether that premise is correct before getting too excited
here.  As I said in my earlier reply, I think that we probably don't
need to have a pg_database row given that we wrap around to
FirstNormalObjectId; any value we hard-code would be less than that.
If there are other reasons why we'd need that, it might be useful to
hear about them.

However, all we really need to decide on this thread is whether we
need 'smgr' exposed as an SQL type.  And I can't see why we need that
no matter how all of the rest of this turns out.  Nobody is currently
proposing to give users a choice of smgrs, just to use them for
internal stuff.  Even if that changed later, it doesn't necessarily
mean we'd add back an SQL type, or that if we did it would look like
the one we have today.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 12:28 PM, Robert Haas wrote:
> Mmmph.  If your customers always have a non-production instance where
> problems from production can be easily reproduced, your customers are
> not much like our customers.

Well I certainly did not mean to imply that this is always the case ;-)

But I think it is fair to tell customers that have these tradeoffs in
front of them that it would be even wiser in the case that they decide
to use this capability.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 12:05 PM Joe Conway  wrote:
> I think that would affect the server logs too, no? Worth thinking about
> though...

Yeah, I suppose so, although there might be a way to work around that.

> Also manually marking all functions leakproof is far less convenient
> than turning off the check as this patch effectively allows. You would
> want to keep track of the initial condition and be able to restore it if
> needed. Doable but much uglier. Perhaps we could tolerate a hook that
> would allow an extension to do this though?

Yeah, possibly.  I guess we'd have to see how ugly that looks, but

> Again, remember that the actual messages are available in the server
> logs. The presumption is that the server logs are kept secure, and it is
> ok to leak information into them. How the customer does or does not
> decide to pass some of that information on to a support group becomes a
> problem to deal with on a case by case basis.
>
> Also, as mentioned up-thread, in many cases there is or should be a
> non-production instance available to use for reproducing problems to
> debug them. Presumably the data on such a system is faked or has already
> been cleaned up for a wider audience.

Mmmph.  If your customers always have a non-production instance where
problems from production can be easily reproduced, your customers are
not much like our customers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 11:50 AM, Robert Haas wrote:
> On Thu, Feb 28, 2019 at 11:44 AM Joe Conway  wrote:
>> No, and Tom stated as much too, but life is all about tradeoffs. Some
>> people will find this an acceptable compromise. For those that don't
>> they don't have to use it. IMHO we tend toward too much nannyism too often.
> 
> Well, I agree with that, too.
> 
> Hmm.  I don't think there's anything preventing you from implementing
> this in "userspace," is there?  A logging hook could suppress all
> error message text, and you could just mark all functions leakproof
> after that, and you'd have this exact behavior in an existing release
> with no core code changes, I think.

I think that would affect the server logs too, no? Worth thinking about
though...

Also manually marking all functions leakproof is far less convenient
than turning off the check as this patch effectively allows. You would
want to keep track of the initial condition and be able to restore it if
needed. Doable but much uglier. Perhaps we could tolerate a hook that
would allow an extension to do this though?

> If you do that, or just stick this patch into your own distro, I would
> be interested to hear some experiences from customers (and those who
> support them) after some time had gone by.  I find it hard to imagine
> delivering customer support in an environment configured this way, but
> sometimes my imagination is limited.

Again, remember that the actual messages are available in the server
logs. The presumption is that the server logs are kept secure, and it is
ok to leak information into them. How the customer does or does not
decide to pass some of that information on to a support group becomes a
problem to deal with on a case by case basis.

Also, as mentioned up-thread, in many cases there is or should be a
non-production instance available to use for reproducing problems to
debug them. Presumably the data on such a system is faked or has already
been cleaned up for a wider audience.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 11:44 AM Joe Conway  wrote:
> No, and Tom stated as much too, but life is all about tradeoffs. Some
> people will find this an acceptable compromise. For those that don't
> they don't have to use it. IMHO we tend toward too much nannyism too often.

Well, I agree with that, too.

Hmm.  I don't think there's anything preventing you from implementing
this in "userspace," is there?  A logging hook could suppress all
error message text, and you could just mark all functions leakproof
after that, and you'd have this exact behavior in an existing release
with no core code changes, I think.
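
A minimal sketch of that (untested; one caveat is that the hook is invoked
before the entry is written to the server log, so the log text would be
redacted as well):

  #include "postgres.h"
  #include "fmgr.h"

  PG_MODULE_MAGIC;

  static emit_log_hook_type prev_emit_log_hook = NULL;

  /* blank out the message text of anything at ERROR or above */
  static void
  redacting_log_hook(ErrorData *edata)
  {
      if (edata->elevel >= ERROR && edata->message != NULL)
          edata->message = pstrdup("redacted");

      if (prev_emit_log_hook)
          prev_emit_log_hook(edata);
  }

  void
  _PG_init(void)
  {
      prev_emit_log_hook = emit_log_hook;
      emit_log_hook = redacting_log_hook;
  }

Plus marking the functions of interest with ALTER FUNCTION ... LEAKPROOF.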

If you do that, or just stick this patch into your own distro, I would
be interested to hear some experiences from customers (and those who
support them) after some time had gone by.  I find it hard to imagine
delivering customer support in an environment configured this way, but
sometimes my imagination is limited.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Joe Conway
On 2/28/19 11:37 AM, Robert Haas wrote:
> On Thu, Feb 28, 2019 at 11:14 AM Joe Conway  wrote:
>> > Although, and Joe may hate me for saying this, I think only the
>> > non-constants should be redacted to keep some level of usability for
>> > regular SQL errors. Maybe system errors like the above should be
>> > removed from client messages in general.
>>
>> I started down this path and it looked fragile. I guess if there is
>> generally enough support to think this might be viable I could open up
>> that door again, but I don't want to waste time if the approach is
>> really a non-starter as stated upthread :-/.
> 
> Hmm.  It seems to me that if there's a function that sometimes throws
> an error and other times does not, and if that behavior is dependent
> on the input, then even redacting the error message down to 'ERROR:
> error' does not remove the leak.  So it seems to me that regardless of
> what one thinks about the proposal from a usability perspective, it's
> probably not correct from a security standpoint.  Information that
> couldn't be leaked until present rules would leak with this change,
> when the new GUCs were turned on.
> 
> Am I wrong?


No, and Tom stated as much too, but life is all about tradeoffs. Some
people will find this an acceptable compromise. For those that don't
they don't have to use it. IMHO we tend toward too much nannyism too often.

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: Row Level Security − leakproof-ness and performance implications

2019-02-28 Thread Robert Haas
On Thu, Feb 28, 2019 at 11:14 AM Joe Conway  wrote:
> > Although, and Joe may hate me for saying this, I think only the
> > non-constants should be redacted to keep some level of usability for
> > regular SQL errors. Maybe system errors like the above should be
> > removed from client messages in general.
>
> I started down this path and it looked fragile. I guess if there is
> generally enough support to think this might be viable I could open up
> that door again, but I don't want to waste time if the approach is
> really a non-starter as stated upthread :-/.

Hmm.  It seems to me that if there's a function that sometimes throws
an error and other times does not, and if that behavior is dependent
on the input, then even redacting the error message down to 'ERROR:
error' does not remove the leak.  So it seems to me that regardless of
what one thinks about the proposal from a usability perspective, it's
probably not correct from a security standpoint.  Information that
couldn't be leaked under present rules would leak with this change,
when the new GUCs were turned on.

Am I wrong?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


