Re: [HACKERS] CLUSTER command progress monitor

2018-08-23 Thread Tatsuro Yamada

On 2017/11/22 6:07, Peter Geoghegan wrote:

On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas  wrote:

Progress reporting on sorts seems like a tricky problem to me, as I
said before.  In most cases, a sort is going to involve an initial
stage where it reads all the input tuples and writes out quicksorted
runs, and then a merge phase where it merges all the output tapes into
a sorted result.  There are some complexities; for example, if the
number of tapes is really large, then we might need multiple merge
phases, only the last of which will produce tuples.


This would ordinarily be the point at which I'd say "but you're very
unlikely to require multiple passes for an external sort these days".
But I won't say that on this thread, because CLUSTER generally has
unusually wide tuples, and so is much more likely to be I/O bound, to
require multiple passes, etc. (I bet the v10 enhancements
disproportionately improved CLUSTER performance.)


Hi,

I came back to develop this feature for the community.
The v4 patch corrects the following points:

  - Rebase on master (143290efd)
  - Fix document
  - Replace the column name scan_index_relid with cluster_index_relid.
  Thanks to Jeff Janes!

I'm now working on improving the patch based on Robert's comment related to
"Seqscan and Sort case" and also considering how to handle the "Index scan 
case".

Please find the attached file.

Regards,
Tatsuro Yamada

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0484cfa77a..5a4bd203ea 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER and VACUUM FULL, showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3338,9 +3346,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in the future.
+   certain commands during command execution.  Currently, the commands
+   which support progress reporting are VACUUM and CLUSTER.
+   This may be expanded in the future.
   
 
  
@@ -3352,9 +3360,8 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
one row for each backend (including autovacuum worker processes) that is
currently vacuuming.  The tables below describe the information
that will be reported and provide information about how to interpret it.
-   Progress reporting is not currently supported for VACUUM FULL
-   and backends running VACUUM FULL will not be listed in this
-   view.
+   Backends running VACUUM FULL are listed in the pg_stat_progress_cluster
+   view instead, because VACUUM FULL uses CLUSTER internally.  See .
   
 
   
@@ -3531,6 +3538,228 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,

   
 
+ 
+
+ 
+  CLUSTER Progress Reporting
+
+  
+   Whenever CLUSTER is running, the
+   pg_stat_progress_cluster view will contain
+   one row for each backend that is currently clustering or vacuuming (VACUUM FULL). 
+   The tables below describe the information that will be reported and
+   provide information about how to interpret it.
+  
+
+  
+   pg_stat_progress_cluster View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+
+ datid
+ oid
+ OID of the database to which this backend is connected.
+
+
+ datname
+ name
+ Name of the database to which this backend is connected.
+
+
+ relid
+ oid
+ OID of the table being clustered.
+
+
+ command
+ text
+ 
+   Current processing command: CLUSTER/VACUUM FULL.
+ 
+
+
+ phase
+ text
+ 
+   Current processing phase of cluster/vacuum full.  See  or .
+ 
+
+
+ scan_method
+ text
+ 
+   Scan method of table: index scan/seq scan.
+ 
+
+
+ cluster_index_relid
+ bigint
+ 
+   OID of the index.
+ 
+
+
+ heap_tuples_total
+ bigint
+ 
+   Total number of heap tuples in the table.  This number is reported
+   as of the beginning of the scan; tuples added later will not be (and
+   need not be) visited by this CLUSTER or VACUUM FULL.
+ 
+
+
+ heap_tuples_scanned
+ bigint
+ 
+   Number of heap tuples scanned.
+   This counter only advances when the phase is scanning heap,
+   writing new heap, or scan heap and write new heap.
+ 
+
+
+ heap_tuples_vacuumed
+ bigint
+ 
+   Number of heap tuples vacuumed. This counter only advances when the
+   command is VACUUM FULL and the phase is scanning heap.
+ 
+
+
+   
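For illustration, a monitoring query against the view proposed above might look like this (a sketch based only on the column names shown in the patch; not part of the patch itself):

    SELECT pid, datname, relid::regclass AS rel, command, phase,
           heap_tuples_scanned, heap_tuples_total
    FROM pg_stat_progress_cluster;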

libpq debug log

2018-08-23 Thread Iwata, Aya
Hi,

I would like to propose a libpq debug log for analyzing queries on the application
side.
I think it would be useful for determining whether the cause of a slow query is on
the application side or the server side.

The logged information is the date and time at which processing started, the query,
the application-side processing (picked up from PQtrace()), and a connection id
that uniquely identifies the connection.

To collect the log, set the following connection options or environment variables:
- logdir or PGLOGDIR : directory where the log file is written
- logsize or PGLOGSIZE : maximum log file size

What do you think about this?  Do you think such a feature is necessary?

Regards, 
Aya Iwata




document that MergeAppend can now prune

2018-08-23 Thread Amit Langote
Hi.

It seems that the following commit missed updating ddl.sgml:

commit 5220bb7533f9891b1e071da6461d5c387e8f7b09
Author: Heikki Linnakangas 
Date:   Thu Jul 19 13:49:43 2018 +0300

Expand run-time partition pruning to work with MergeAppend

Attached patch fixes that.

Thanks,
Amit
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 6aa035188f..70f5fbe917 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -3944,8 +3944,8 @@ EXPLAIN SELECT count(*) FROM measurement WHERE logdate 
= DATE '2008-01-01';
 
 
 
- Also, execution-time partition pruning currently only occurs for the
- Append node type, not MergeAppend.
+ Execution-time partition pruning currently occurs for the
+ Append and MergeAppend node types.
 
 
 


Re: Fix help option of contrib/oid2name

2018-08-23 Thread Tatsuro Yamada

On 2018/08/21 12:57, Tatsuro Yamada wrote:

On 2018/08/21 12:40, Michael Paquier wrote:

On Tue, Aug 21, 2018 at 12:26:15PM +0900, Tatsuro Yamada wrote:

BTW, can I add the patch to the Commitfest September?


You should.


Thanks, I'll do that.
I'll send 2 patches in this week, probably.

I revised the patch and created new tap tests.

fix_oid2name_wip3.patch:
  bug fix
- Remove "-P password" option from the document
  improvements
- Divide "Options section" into "Options" and "Connection Options" sections 
in
  the help message (--help)
- Add long options
- Add "-H host" and "-h host" options to the help message and the document
- Add environment variable (PGHOST, PGPORT and PGUSER) to the document

fix_vacuumlo_wip3.patch:
  improvements
- Add long options
- Add environment variable (PGHOST, PGPORT and PGUSER) to the document

Regarding long options,
I read both of the code sections below and understood the meanings of the short
options, so I kept long options in the patches for all options, not only the
connection options.

oid2name.c
/* these are the opts structures for command line params */
struct options
{
eary   *tables;
eary   *oids;
eary   *filenodes;

bool        quiet;
bool        systables;
bool        indexes;
bool        nodb;
bool        extended;
bool        tablespaces;

char   *dbname;
char   *hostname;
char   *port;
char   *username;
const char *progname;
};

vacuumlo.c
struct _param
{
char   *pg_user;
enum trivalue pg_prompt;
char   *pg_port;
char   *pg_host;
const char *progname;
int verbose;
int dry_run;
long        transaction_limit;
};


The following TAP tests check the command-line options, but I couldn't add some
test cases, such as required arguments and mixed varieties of options.
I'm not sure about the naming rule for TAP tests, so I temporarily added 300 and
301 to the names. Please rename these files to more suitable names.

  300_oid2name.pl
  301_vacuumlo.pl

Please find the attached files.

Thanks,
Tatsuro Yamada

diff --git a/contrib/oid2name/oid2name.c b/contrib/oid2name/oid2name.c
index 63e360c4c5..2779af78e8 100644
--- a/contrib/oid2name/oid2name.c
+++ b/contrib/oid2name/oid2name.c
@@ -60,8 +60,26 @@ void		sql_exec_dumpalltbspc(PGconn *, struct options *);
 void
 get_opts(int argc, char **argv, struct options *my_opts)
 {
+	static struct option long_options[] = {
+		{"host", required_argument, NULL, 'h'},
+		{"host", required_argument, NULL, 'H'}, /* deprecated */
+		{"port", required_argument, NULL, 'p'},
+		{"username", required_argument, NULL, 'U'},
+		{"dbname", required_argument, NULL, 'd'},
+		{"table", required_argument, NULL, 't'},
+		{"oid", required_argument, NULL, 'o'},
+		{"filenode", required_argument, NULL, 'f'},
+		{"quiet", no_argument, NULL, 'q'},
+		{"systable", no_argument, NULL, 'S'},
+		{"extended", no_argument, NULL, 'x'},
+		{"index", no_argument, NULL, 'i'},
+		{"tablespace", no_argument, NULL, 's'},
+		{NULL, 0, NULL, 0}
+	};
+
 	int			c;
 	const char *progname;
+	int			optindex;
 
 	progname = get_progname(argv[0]);
 
@@ -93,10 +111,30 @@ get_opts(int argc, char **argv, struct options *my_opts)
 	}
 
 	/* get opts */
-	while ((c = getopt(argc, argv, "H:p:U:d:t:o:f:qSxish")) != -1)
+	while ((c = getopt_long(argc, argv, "h:H:p:U:d:t:o:f:qSxis", long_options, &optindex)) != -1)
 	{
 		switch (c)
 		{
+/* host to connect to */
+			case 'h':
+my_opts->hostname = pg_strdup(optarg);
+break;
+
+/* (deprecated) host to connect to */
+			case 'H':
+my_opts->hostname = pg_strdup(optarg);
+break;
+
+/* port to connect to on remote host */
+			case 'p':
+my_opts->port = pg_strdup(optarg);
+break;
+
+/* username */
+			case 'U':
+my_opts->username = pg_strdup(optarg);
+break;
+
 /* specify the database */
 			case 'd':
 my_opts->dbname = pg_strdup(optarg);
@@ -122,46 +160,26 @@ get_opts(int argc, char **argv, struct options *my_opts)
 my_opts->quiet = true;
 break;
 
-/* host to connect to */
-			case 'H':
-my_opts->hostname = pg_strdup(optarg);
-break;
-
-/* port to connect to on remote host */
-			case 'p':
-my_opts->port = pg_strdup(optarg);
-break;
-
-/* username */
-			case 'U':
-my_opts->username = pg_strdup(optarg);
-break;
-
 /* display system tables */
 			case 'S':
 my_opts->systables = true;
 break;
 
-/* also display indexes */
-			case 'i':
-my_opts->indexes = true;
-break;
-
 /* display extra columns */
 			case 'x':
 my_opts->extended = true;
 break;
 
+/* also display indexes */
+			case 'i':
+my_opts->indexes = true;
+break;
+
 

Re: Removing useless DISTINCT clauses

2018-08-23 Thread David Rowley
On 24 August 2018 at 14:12, Stephen Frost  wrote:
> This is happening at the same time as some optimizations around GROUP
> BY, so either there's something different about what's happening there
> and I didn't appreciate it, or does that optimization suffer from a
> similar issue?

There are two separate optimisations in Jim's version of the patch:

1) Attempting to remove useless DISTINCT clause items in cases where a
true subset of the clauses make up the entire table's primary key.
This can be applied no matter how many joins are in the query.  If a
table's PK is a,b then it's not going to be any more distinct if we
distinctify on a,b,c. Distinctification on a,b is the most we'll ever
need.  We already apply this optimisation to the GROUP BY clause.

2) Attempting to remove the entire DISTINCT clause if the query is on a
single base table and the DISTINCT clause contains the entire table's
PK. Here there's no need to uniquify the results (see the sketch below).
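As an illustration of the difference between the two (a sketch; assumes a hypothetical table t with primary key (a, b)):

    CREATE TABLE t (a int, b int, c int, PRIMARY KEY (a, b));

    -- #1: with PK (a, b), distinct-ifying on (a, b, c) is no more distinct
    --     than distinct-ifying on (a, b), so c can be dropped from the clause:
    SELECT DISTINCT a, b, c FROM t;

    -- #2: the DISTINCT clause covers the whole PK of the single base table,
    --     so the DISTINCT can be removed entirely:
    SELECT DISTINCT a, b FROM t;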

I started this thread for #1 and Jim would like to add #2 as an
additional optimisation.  I feel that #2 is different enough from #1
that they should be considered independently of each other. The area
where #2 should be implemented is far away at the other end of
planning from where #1 is being done.  I believe that two separate
patches are the best way forward here as it does not really make sense
to reject #1 because there are concerns with #2, or vice versa.  I think
if Jim would like to propose #2, then I think a separate thread is a
better option than tagging along on this one. This might also reduce
confusion.

I'd also rather not turn this thread into people discussing/reviewing
Jim's work that he plans to implement in some fork taken from Postgres
10.5. I don't really think even this mailing list is the right place
for that, let alone this thread.

> > I've not read the patch, but David's reaction makes it sound like its
> > processing is done too early.  There are right places and wrong places
> > to do most everything in the planner, and I do not wish to accept a
> > patch that does something in the wrong place.
>
> Right, I definitely agree with you there.  This seemed like a reasonable
> place given the similar optimization (at least in appearance to me)
> being done there for the GROUP BY case.  I'm happy to admit that I
> haven't looked at it in very much depth (hence my question to David) and
> I'm not an expert in this area, but I did want to bring up that the
> general idea and the relative trade-offs at least sounded reasonable.

You might be confusing #1 and #2. My concern is with #2.  The existing
GROUP BY clause optimisation is almost identical to #1. I just wanted
to also apply it to the DISTINCT clause.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Pluggable Storage - Andres's take

2018-08-23 Thread Andres Freund
Hi,

On 2018-08-24 11:55:41 +1000, Haribabu Kommi wrote:
> On Tue, Aug 21, 2018 at 6:59 PM Andres Freund  wrote:
> 
> > On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
> > > On Sun, Aug 5, 2018 at 7:48 PM Andres Freund  wrote:
> > > > I'm currently in the process of rebasing zheap onto the pluggable
> > > > storage work. The goal, which seems to work surprisingly well, is to
> > > > find issues that the current pluggable storage patch doesn't yet deal
> > > > with.  I plan to push a tree including a lot of fixes and improvements
> > > > soon.
> > > >
> > > That's good. Did you find any problems in porting zheap into pluggable
> > > storage? Does it needs any API changes or new API requirement?
> >
> > A lot, yes. The big changes are:
> > - removal of HeapPageScanDesc
> > - introduction of explicit support functions for tablesample & bitmap scans
> > - introduction of callbacks for vacuum_rel, cluster
> >
> > And quite a bit more along those lines.
> >
> 
> OK. Those are quite a bit of changes.

I've pushed a current version of that to my git tree to the
pluggable-storage branch. It's not really a version that I think makes
sense to review or such, but it's probably more useful if you work based
on that.  There's also the pluggable-zheap branch, which I found
extremely useful to develop against.

There's a few further changes since last time:
- Pluggable handlers are now stored in static global variables, and thus
  do not need to be copied anymore
- VACUUM FULL / CLUSTER is moved into one callback that does the actual
  copying. The various previous rewrite callbacks imo integrated at the
  wrong level.
- there's a GUC that allows to change the default table AM
- moving COPY to use slots (roughly based on your / Haribabu's patch)
- removed the AM specific shmem initialization callbacks - various AMs
  are going to need the syncscan APIs, so moving that into AM callbacks
  doesn't make sense.

Missing:
- callback for the second scan of CREATE INDEX CONCURRENTLY
- commands/analyze.c integration (Working on it)
- fixing your (Haribabu's) slotification of copy patch to compute memory
  usage somehow
- table creation callback, currently the pluggable-zheap patch has a few
  conditionals outside of access/zheap for that purpose (see RelationTruncate
- header structure cleanup

And then:
- lotsa cleanups
- rebasing onto a newer version of the abstract slot patchset
- splitting out smaller patches


You'd moved the bulk insert into tableam callbacks - I don't quite get
why? There's not really anything AM specific in that code?


> > > Does the new TupleTableSlot abstraction patches has fixed any of these
> > > issues in the recent thread [1]? so that I can look into the change of
> > > FDW API to return slot instead of tuple.
> >
> > Yea, that'd be a good thing to start with.
> >
> 
> I found out only the RefetchForeignRow API needs the change and done the
> same.
> Along with that, I fixed all the issues of  running make check-world.
> Attached patches
> for the same.

Thanks, that's really helpful!  I'll try to merge these soon.


I'm starting to think that we're getting closer to something that
looks right from a high level, even though there's a lot of details to
clean up.


Greetings,

Andres Freund



Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-23 Thread Michael Paquier
On Thu, Aug 23, 2018 at 10:00:49PM +0900, Etsuro Fujita wrote:
> I tried this today, but doing git behind the corporate firewall doesn't
> work.  I don't know the clear cause of that, so I'll investigate that
> tomorrow.

You may be able to tweak that by using https as origin point or proper
git proxy settings?
--
Michael




Re: C99 compliance for src/port/snprintf.c

2018-08-23 Thread Thomas Munro
On Thu, Aug 16, 2018 at 12:24 PM, Thomas Munro
 wrote:
> FWIW cfbot is using Visual Studio 2010 right now.  Appveyor provides
> 2008, 2010, 2012, 2013 (AKA 12.0), 2015, 2017, and to test with a
> different toolchain you can take the example patch from
> https://wiki.postgresql.org/wiki/Continuous_Integration and add a line
> like this to the end of the install section (worked for me; for 2015+
> you probably also need to request a different image):
>
>   - 'call "C:\Program Files (x86)\Microsoft Visual Studio
> 12.0\VC\vcvarsall.bat" x86_amd64'
>
> I'll make that change to cfbot if we decree that it is the new
> baseline for PostgreSQL on Windows.

Done.

-- 
Thomas Munro
http://www.enterprisedb.com



RE: [HACKERS] Proposal to add work_mem option to postgres_fdw module

2018-08-23 Thread Shinoda, Noriyoshi (PN Japan GCS Delivery)
Hi, Hackers.

I updated the patch that I attached the other day.
It adds release of dynamically allocated memory and modifies the source according
to the coding rules.

Regards,
Noriyoshi Shinoda
--
From: Shinoda, Noriyoshi (PN Japan GCS Delivery) 
Sent: Friday, August 17, 2018 3:07 PM
To: pgsql-hack...@postgresql.org
Subject: [HACKERS] Proposal to add work_mem option to postgres_fdw module

Hello hackers,

The attached patch adds a new option, work_mem, to the postgres_fdw contrib module.
Previously, it was impossible to change the work_mem setting for remote sessions
opened through postgres_fdw connections.
By adding this option to the CREATE SERVER statement, you can now change the
work_mem setting for remote sessions as well.

Usage is the same as for other options: specify work_mem as an option of the
CREATE SERVER statement. The string you specify uses the same format as the
work_mem GUC.

Example
--
postgres=# CREATE SERVER remsvr1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS ( 
host 'remhost1', port '5433', dbname 'demodb', work_mem '16MB'); 
CREATE SERVER

Not only the CREATE SERVER statement but also the ALTER SERVER statement can be used.

postgres=# ALTER SERVER remsvr1 OPTIONS ( SET work_mem '32MB'); 
ALTER SERVER 
postgres=# SELECT * FROM pg_foreign_server; 
-[ RECORD 1 ]--
srvname    | remsvr1
srvowner   | 10
srvfdw | 16389
srvtype    |
srvversion |
srvacl |
srvoptions | {host=remhost1,port=5433,dbname=demodb,work_mem=32MB}

Within this patch, I implemented the change of the work_mem GUC by specifying
"options -c work_mem=value" for the PQconnectdbParams API function.

Best Regards,
Noriyoshi Shinoda



postgres_fdw_work_mem_v2.patch
Description: postgres_fdw_work_mem_v2.patch


Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-23 Thread Thomas Munro
On Fri, Aug 24, 2018 at 1:44 PM, Andres Freund  wrote:
> On 2018-08-23 22:02:26 +0200, Peter Eisentraut wrote:
>> On 23/08/2018 03:31, Andres Freund wrote:
>> > On 2018-08-22 17:09:05 -0700, Andres Freund wrote:
>> >> Attached is a version doing so.
>> >
>> > Mildly updated version attached. This adds an explanatory commit
>> > message, removes one stray docs C89 reference, and fixes a mis-squashing
>> > of a patch.
>>
>> Let's do it!
>
> Pushed the first two.  I'll send the presumably affected buildfarm
> owners an email, asking them whether they want to update.

Note that designated initialisers are not in C++ yet (though IIUC they
have been accepted in P0329R4[1] for the draft that will become C++20
whatever it turns out to be).

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4691.html

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Hint to set owner for tablespace directory

2018-08-23 Thread Michael Paquier
On Thu, Aug 23, 2018 at 02:24:25PM +0300, Maksim Milyutin wrote:
> I want to add a patch that prints a hint to set the required owner for the
> tablespace directory if this is the cause of the problem (*errno == EPERM*
> after calling *chmod*).
> after calling *chmod*).

Please do not forget to add this patch to the next commit fest.
--
Michael




Re: Improve behavior of concurrent ANALYZE/VACUUM

2018-08-23 Thread Michael Paquier
On Thu, Aug 23, 2018 at 09:53:57PM +, Bossart, Nathan wrote:
> This seems reasonable to me.  I think establishing the expected
> behavior here is a good idea.

Thanks, I have pushed the new test series, and reused it to check the
new version of the main patch, which is attached.  I have added a commit
message and I have indented the thing.

After pondering it, I have also reworked the portion for
partitioned tables so that the list of partitions processed is unchanged
on HEAD, and we keep a consistent behavior compared to past versions.
If VACUUM processing for partitioned tables were something new in 11, I
think that we could have considered it, but silently changing something
that people may have relied on for more than one year now is not very
appealing.

I can personally imagine data models with multiple layers of partitions
where the top-most parent has the most restricted access, and then
things get more permissive the further down you go.  For example, let's
imagine a table listing a population, which is split by cities.  The
top-most partitioned table references the whole country, which, say, only
the president has access to.  Then there are partitions which can be
accessed only by the mayors of each city.  In this case, even if a mayor
does a VACUUM FULL of their leaf partition, a full read would be
blocked even for the president.
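A rough sketch of that layout (hypothetical role and table names, only to illustrate the scenario):

    CREATE ROLE president;
    CREATE ROLE paris_mayor;
    CREATE TABLE population (city text, name text) PARTITION BY LIST (city);
    ALTER TABLE population OWNER TO president;
    CREATE TABLE population_paris PARTITION OF population FOR VALUES IN ('Paris');
    ALTER TABLE population_paris OWNER TO paris_mayor;

    -- The mayor's VACUUM FULL takes an exclusive lock on the leaf, so a full
    -- read through the parent blocks, even for the president.
    VACUUM FULL population_paris;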

The reverse is technically possible, i.e. the top-most parent is not
really restrictive and the leaves get more and more restricted, but
logically that does not make much sense, as the top-most parent would be
just useless for any kind of operation such as a full table scan.

Still, in the first case, say that each city mayor uses the same
application layer which vacuums the top-most parent; then we'd break
something that worked in 10 and 11.
--
Michael
From 4f0b6da8236286ab1487d2a537ae777aa3a6490b Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 24 Aug 2018 11:02:19 +0900
Subject: [PATCH] Improve VACUUM and ANALYZE by avoiding early lock queue

A caller of VACUUM can perform early lookup obtention which can cause
other sessions to block on the request done, causing potentially DOS
attacks as even a non-privileged user can attempt a truncation of a
critical catalog table to block even all incoming connection attempts.

Contrary to TRUNCATE, a client could attempt a system-wide VACUUM after
building the list of relations to VACUUM, which can cause vacuum_rel()
to try to lock the relation, but the operation would just block.  When the
client specifies a list of relations and the relation needs to be
skipped, fail hard so as there is no conflict with any relation a user
has no rights to work on.

vacuum_rel() already had the sanity checks needed, except that those
were applied too late.  This commit refactors the code so as relation
skips are checked beforehand, making it safer to avoid too early lock,
for both manual VACUUM with and without a list of relations specified.

An isolation test is added emulating the fact that early locks do not
happen anymore.

When a partitioned table is listed in a manual VACUUM or ANALYZE
command, its full list of partitions is fetched, all partitions get
added to the list to work on, and then each one of them is processed one
by one, with ownership checks happening at the later phase of
vacuum_rel() or analyze_rel().  Trying to do early ownership checks for
each partition is proving to be tedious as this would result in deadlock
risks with lock upgrades, and skipping all partitions if the listed
partitioned table is not owned would result in a behavior change
compared to how Postgres 10 has implemented vacuum for partitioned
tables.  The original problem reported related to early lock queue for
critical relations is fixed anyway, so priority is given to avoiding a
backward-incompatible behavior.

Reported-by: Lloyd Albin, Jeremy Schneider
Author: Michael Paquier
Reviewed by: Nathan Bossart, Kyotaro Horiguchi
Discussion: https://postgr.es/m/152512087100.19803.12733865831237526...@wrigleys.postgresql.org
---
 src/backend/commands/analyze.c|  28 +---
 src/backend/commands/vacuum.c | 156 +-
 src/include/commands/vacuum.h |   3 +
 .../isolation/expected/vacuum-conflict.out| 149 +
 src/test/isolation/isolation_schedule |   1 +
 src/test/isolation/specs/vacuum-conflict.spec |  51 ++
 6 files changed, 328 insertions(+), 60 deletions(-)
 create mode 100644 src/test/isolation/expected/vacuum-conflict.out
 create mode 100644 src/test/isolation/specs/vacuum-conflict.spec

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 3e148f03d0..edbdce81f2 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -196,27 +196,17 @@ analyze_rel(Oid relid, RangeVar *relation, int options,
 	}
 
 	/*
-	 * Check permissions --- this should match vacuum's check!
+	 * Check if relation needs to be skipped based on 

Re: Removing useless DISTINCT clauses

2018-08-23 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * David Rowley (david.row...@2ndquadrant.com) wrote:
> >> On 24 August 2018 at 11:34, Stephen Frost  wrote:
> >>> * David Rowley (david.row...@2ndquadrant.com) wrote:
>  My personal opinion of only being able to completely remove the
>  DISTINCT when there's a single item in the rtable (or a single base
>  table) is that it's just too poor to bother with.
> 
> > Hm, so you're suggesting that this isn't the right place for this
> > optimization to be implemented, even now, with the single-relation
> > caveat?
> 
> There is no case where planner optimizations should depend on the length
> of the rtable.  Full stop.
> 
> It could make sense to optimize if there is just one baserel in the join
> tree --- although even that is best checked only after join removal.

Hm, that's certainly a fair point.

> As an example of the difference, such an optimization should be able to
> optimize "select * from view" if the view contains just one base table.
> The rtable will list both the view and the base table, but the view
> is only hanging around for permissions-checking purposes; it should not
> affect the planner's behavior.

This is happening at the same time as some optimizations around GROUP
BY, so either there's something different about what's happening there
and I didn't appreciate it, or does that optimization suffer from a
similar issue?

> I've not read the patch, but David's reaction makes it sound like its
> processing is done too early.  There are right places and wrong places
> to do most everything in the planner, and I do not wish to accept a
> patch that does something in the wrong place.

Right, I definitely agree with you there.  This seemed like a reasonable
place given the similar optimization (at least in appearance to me)
being done there for the GROUP BY case.  I'm happy to admit that I
haven't looked at it in very much depth (hence my question to David) and
I'm not an expert in this area, but I did want to bring up that the
general idea and the relative trade-offs at least sounded reasonable.

I'll also note that I didn't see these concerns raised earlier on the
thread when I re-read your remarks on it, so I'm a bit concerned that
perhaps either this isn't an actual concern to be realized, or perhaps it
was missed previously.

Thanks!

Stephen




Re: Removing useless DISTINCT clauses

2018-08-23 Thread Tom Lane
Stephen Frost  writes:
> * David Rowley (david.row...@2ndquadrant.com) wrote:
>> On 24 August 2018 at 11:34, Stephen Frost  wrote:
>>> * David Rowley (david.row...@2ndquadrant.com) wrote:
 My personal opinion of only being able to completely remove the
 DISTINCT when there's a single item in the rtable (or a single base
 table) is that it's just too poor to bother with.

> Hm, so you're suggesting that this isn't the right place for this
> optimization to be implemented, even now, with the single-relation
> caveat?

There is no case where planner optimizations should depend on the length
of the rtable.  Full stop.

It could make sense to optimize if there is just one baserel in the join
tree --- although even that is best checked only after join removal.
As an example of the difference, such an optimization should be able to
optimize "select * from view" if the view contains just one base table.
The rtable will list both the view and the base table, but the view
is only hanging around for permissions-checking purposes; it should not
affect the planner's behavior.
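A minimal sketch of that distinction (hypothetical names):

    CREATE TABLE t (a int PRIMARY KEY, b int);
    CREATE VIEW v AS SELECT * FROM t;

    -- The rtable for this query lists both v and t, but the join tree has a
    -- single baserel, so the optimization should still be able to apply.
    SELECT DISTINCT a, b FROM v;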

I've not read the patch, but David's reaction makes it sound like its
processing is done too early.  There are right places and wrong places
to do most everything in the planner, and I do not wish to accept a
patch that does something in the wrong place.

regards, tom lane



Re: Pluggable Storage - Andres's take

2018-08-23 Thread Haribabu Kommi
On Tue, Aug 21, 2018 at 6:59 PM Andres Freund  wrote:

> On 2018-08-21 16:55:47 +1000, Haribabu Kommi wrote:
> > On Sun, Aug 5, 2018 at 7:48 PM Andres Freund  wrote:
> > > I'm currently in the process of rebasing zheap onto the pluggable
> > > storage work. The goal, which seems to work surprisingly well, is to
> > > find issues that the current pluggable storage patch doesn't yet deal
> > > with.  I plan to push a tree including a lot of fixes and improvements
> > > soon.
> > >
> > That's good. Did you find any problems in porting zheap into pluggable
> > storage? Does it needs any API changes or new API requirement?
>
> A lot, yes. The big changes are:
> - removal of HeapPageScanDesc
> - introduction of explicit support functions for tablesample & bitmap scans
> - introduction of callbacks for vacuum_rel, cluster
>
> And quite a bit more along those lines.
>

OK. Those are quite a bit of changes.


> > Does the new TupleTableSlot abstraction patches has fixed any of these
> > issues in the recent thread [1]? so that I can look into the change of
> > FDW API to return slot instead of tuple.
>
> Yea, that'd be a good thing to start with.
>

I found out only the RefetchForeignRow API needs the change and done the
same.
Along with that, I fixed all the issues of  running make check-world.
Attached patches
for the same.

Now I will look into the remaining FIXME's that don't conflict with your
further changes.

Regards,
Haribabu Kommi
Fujitsu Australia


0002-check-world-fixes.patch
Description: Binary data


0001-FDW-RefetchForeignRow-API-prototype-change.patch
Description: Binary data


Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-23 Thread Andres Freund
On 2018-08-23 22:02:26 +0200, Peter Eisentraut wrote:
> On 23/08/2018 03:31, Andres Freund wrote:
> > On 2018-08-22 17:09:05 -0700, Andres Freund wrote:
> >> Attached is a version doing so.
> > 
> > Mildly updated version attached. This adds an explanatory commit
> > message, removes one stray docs C89 reference, and fixes a mis-squashing
> > of a patch.
> 
> Let's do it!

Pushed the first two.  I'll send the presumably affected buildfarm
owners an email, asking them whether they want to update.


Greetings,

Andres Freund



Re: [HACKERS] Optional message to user when terminating/cancelling backend

2018-08-23 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Aug 24, 2018 at 6:53 AM, Tom Lane  wrote:
>> I think this is just a timing problem: the signal gets sent,
>> but it might or might not get received before the current statement ends.

> How about we just wait forever if the function manages to return?
> select case when pg_cancel_backend(pg_backend_pid(), '...') then
> pg_sleep('infinity') end;

Hm, that might work.  I'd pick a long but not infinite timeout --- maybe
60 sec would be good.
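For example, something like this (a sketch of the adjusted test query, using the optional-message form of pg_cancel_backend proposed in this patch):

    select case when pg_cancel_backend(pg_backend_pid(), 'message')
                then pg_sleep(60) end;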

regards, tom lane



Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

2018-08-23 Thread Peter Geoghegan
On Thu, Aug 23, 2018 at 5:20 PM, Peter Geoghegan  wrote:
> This patch adds an enhancement that is an example of a broader class
> of optimizer enhancement primarily aimed at making star-schema queries
> have more efficient plans, by arranging to use several independent
> nested loop joins based on a common pattern. Each nestloop join has
> one particular dimension table on the outer side, and the fact table
> on the inner side.

Correction: I meant that for each join, the outer side is a scan of
some particular fact table index, while the inner side probes some
particular dimension table's primary key, and evaluates the
dimension's qual.

-- 
Peter Geoghegan



Re: TupleTableSlot abstraction

2018-08-23 Thread Andres Freund
Hi,

On 2018-08-20 19:51:33 +0530, Ashutosh Bapat wrote:
> Sorry, forgot about that. Here's the patch set with that addressed.

Btw, you attach files as tar.zip, but they're actually gzip
compressed...


> From 838a463646a048b3dccff95079a514fdc86effb3 Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat 
> Date: Mon, 13 Aug 2018 11:27:57 +0530
> Subject: [PATCH 01/11] Split ExecStoreTuple into ExecStoreHeapTuple and
>  ExecStoreBufferHeapTuple
> 
> ExecStoreTuple() accepts a heap tuple from a buffer or constructed
> on-the-fly. In the first case the caller passed a valid buffer and in
> the later case it passes InvalidBuffer. In the first case,
> ExecStoreTuple() pins the given buffer and in the later case it
> records shouldFree flag. The function has some extra checks to
> differentiate between the two cases.  The usecases never overlap thus
> spending extra cycles in checks is useless. Hence separate these
> usecases into separate functions ExecStoreHeapTuple() to store
> on-the-fly tuple and ExecStoreBufferHeapTuple() to store an on-disk
> tuple from a buffer. This allows to shave some extra cycles while
> storing a tuple in the slot.

It doesn't *yet* allow shaving extra cycles, no?

> *  SLOT ACCESSORS
>   *   ExecSetSlotDescriptor   - set a slot's tuple descriptor
> - *   ExecStoreTuple  - store a physical tuple in the 
> slot
> + *   ExecStoreHeapTuple  - store an on-the-fly heap 
> tuple in the slot
> + *   ExecStoreBufferHeapTuple - store an on-disk heap tuple in the 
> slot
>   *   ExecStoreMinimalTuple   - store a minimal physical tuple in the 
> slot
>   *   ExecClearTuple  - clear contents of a slot
>   *   ExecStoreVirtualTuple   - mark slot as containing a virtual
>   *   tuple

I'd advocate for a separate patch ripping these out, they're almost
always out of date.


>  /* 
> - *   ExecStoreTuple
> + *   ExecStoreHeapTuple
>   *
> - *   This function is used to store a physical tuple into a specified
> + *   This function is used to store an on-the-fly physical tuple 
> into a specified
>   *   slot in the tuple table.
>   *
>   *   tuple:  tuple to store
>   *   slot:   slot to store it in
> - *   buffer: disk buffer if tuple is in a disk page, else 
> InvalidBuffer
>   *   shouldFree: true if ExecClearTuple should pfree() the tuple
>   *   when done with it
>   *
> - * If 'buffer' is not InvalidBuffer, the tuple table code acquires a pin
> - * on the buffer which is held until the slot is cleared, so that the tuple
> - * won't go away on us.
> + * shouldFree is normally set 'true' for tuples constructed on-the-fly.  But 
> it
> + * can be 'false' when the referenced tuple is held in a tuple table slot
> + * belonging to a lower-level executor Proc node.  In this case the 
> lower-level
> + * slot retains ownership and responsibility for eventually releasing the
> + * tuple.  When this method is used, we must be certain that the upper-level
> + * Proc node will lose interest in the tuple sooner than the lower-level one
> + * does!  If you're not certain, copy the lower-level tuple with 
> heap_copytuple
> + * and let the upper-level table slot assume ownership of the copy!
> + *
> + * Return value is just the passed-in slot pointer.
> + * 
> + */
> +TupleTableSlot *
> +ExecStoreHeapTuple(HeapTuple tuple,
> +TupleTableSlot *slot,
> +bool shouldFree)
> +{
> + /*
> +  * sanity checks
> +  */
> + Assert(tuple != NULL);
> + Assert(slot != NULL);
> + Assert(slot->tts_tupleDescriptor != NULL);
> +
> + /*
> +  * Free any old physical tuple belonging to the slot.
> +  */
> + if (slot->tts_shouldFree)
> + heap_freetuple(slot->tts_tuple);
> + if (slot->tts_shouldFreeMin)
> + heap_free_minimal_tuple(slot->tts_mintuple);
> +
> + /*
> +  * Store the new tuple into the specified slot.
> +  */
> + slot->tts_isempty = false;
> + slot->tts_shouldFree = shouldFree;
> + slot->tts_shouldFreeMin = false;
> + slot->tts_tuple = tuple;
> + slot->tts_mintuple = NULL;
> +
> + /* Mark extracted state invalid */
> + slot->tts_nvalid = 0;
> +
> + return slot;
> +}

Uh, there could very well be a buffer previously stored in the slot, no?
This can't currently be applied independently afaict.


> From c4c55bc0d501400ffca2e7393039b2d38660cd2d Mon Sep 17 00:00:00 2001
> From: Ashutosh Bapat 
> Date: Mon, 13 Aug 2018 11:27:57 +0530
> Subject: [PATCH 02/11] tts_nvalid in TupleTableSlot is AttrNumber

> @@ -125,7 +125,7 @@ typedef struct TupleTableSlot
>   MemoryContext tts_mcxt; /* slot itself is in this context */
>   Buffer  tts_buffer; /* 

Re: [HACKERS] Optional message to user when terminating/cancelling backend

2018-08-23 Thread Thomas Munro
On Fri, Aug 24, 2018 at 6:53 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> On Wed, Jul 25, 2018 at 7:27 PM, Daniel Gustafsson  wrote:
>>> Seems the build of the updated patch built and tested Ok.  Still have no 
>>> idea
>>> why the previous one didn’t.
>
>> That problem apparently didn't go away.  cfbot tested it 7 times in
>> the past week, and it passed only once on Windows:
>> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.9691
>> The other times all failed like this:
>> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.9833
>
> I think this is just a timing problem: the signal gets sent,
> but it might or might not get received before the current statement ends.
> It's possible that a signal-sent-to-self can be expected to be received
> synchronously on all Unix platforms, but I wouldn't entirely bet on that
> (in particular, the POSIX text for kill(2) does NOT guarantee it); and
> our Windows signal implementation certainly doesn't guarantee anything
> of the sort.

How about we just wait forever if the function manages to return?

select case when pg_cancel_backend(pg_backend_pid(), '...') then
pg_sleep('infinity') end;

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Removing useless DISTINCT clauses

2018-08-23 Thread Stephen Frost
Greetings,

* David Rowley (david.row...@2ndquadrant.com) wrote:
> On 24 August 2018 at 11:34, Stephen Frost  wrote:
> > * David Rowley (david.row...@2ndquadrant.com) wrote:
> >> My personal opinion of only being able to completely remove the
> >> DISTINCT when there's a single item in the rtable (or a single base
> >> table) is that it's just too poor to bother with.  I think such a
> >> solution is best left to another patch and it should be much more
> >> complete and be able to track the unique properties through joins.
> >
> > Doesn't that move the goalposts for this pretty far off?  Is there
> > some reason that doing this for the single-item case will make it
> > difficult to implement the additional improvements you're suggesting
> > later?
> 
> The DISTINCT clause removal would likely need to wait until around
> where the distinct paths are created and all the join processing has
> been done.
> 
> If the optimisation is deemed worthwhile, then maybe that would be the
> place to put this code along with a comment that explains how it can
> be improved to support the removal with multiple relations.

Hm, so you're suggesting that this isn't the right place for this
optimization to be implemented, even now, with the single-relation
caveat?

I do agree that including comments about how it could/should be improved
would be good for future developers.

> > Given the cost of a DISTINCT and that we know pretty easily if one
> > exists or not, trying to eliminate it, even in the single-item case,
> > seems pretty worthwhile to me.  If it makes it difficult to improve on
> > this later then I could see pushing back on it, but this patch doesn't
> > seem like its doing that.
> 
> If people truly think it's worthwhile, then maybe it is, but if it's
> being done because it can be done then perhaps it's best not to
> bother.  I was just thinking of the bug reports we'd get when people
> see it does not work with multiple relations. There's already no
> shortage of bug reports that are not bugs. I also wanted to keep this
> patch just doing the simple case that we already apply to GROUP BY,
> but control over the patch may well be out of my hands now.

I'm not really sure I'm following this part of the discussion.
As long as it isn't making the code particularly ugly, violating the
separation of various components, or making later hacking in this area
particularly worse, then the question primarily comes down to making
sure that the code is of quality, that the performance optimization
is worth the cycles spent checking for the opportunity, and that it's not
too expensive compared to the cost of the operation it eliminates, all of
which can be pretty objectively assessed and provide the answer as to
whether it's worthwhile or not.  Given the cost of a Unique node, and that
we will only check for this optimization when we are going to have to
introduce one, it certainly seems like it's worthwhile (again, with the
proviso on code quality, et al., above, but I don't think that's what you
were raising a concern about here).

In short, I don't buy the concern about having to deal with "bug reports
that aren't bug reports" or that they should be a barrier to feature
work.  We already see complaints pretty regularly about optimizations we
don't have and this would at least be a step in the right direction for
dealing with these kinds of queries and that does seem better than doing
nothing, imv.

Thanks!

Stephen




Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

2018-08-23 Thread Peter Geoghegan
On Thu, Aug 23, 2018 at 11:10 AM, Tom Lane  wrote:
> Rebased up to HEAD, per cfbot nagging.  Still no substantive change from
> v2.

I happened to have the opportunity to talk to Tom about this patch in
person. I expressed some very general concerns that are worth
repeating publicly.

This patch adds an enhancement that is an example of a broader class
of optimizer enhancement primarily aimed at making star-schema queries
have more efficient plans, by arranging to use several independent
nested loop joins based on a common pattern. Each nestloop join has
one particular dimension table on the outer side, and the fact table
on the inner side. The query plan is not so much a tree as it is a DAG
(directed acyclic graph), because the fact table is visited multiple
times. (There are already cases in Postgres in which the query plan is
technically a DAG, actually, but it could be taken much further.)

Aside from being inherently more efficient, DAG-like star schema plans
are also *ideal* targets for parallel query. The executor can execute
each nested loop join in a parallel worker with minimal contention --
the inner side of each nestloop join all probe a different fact table
index to the others. It's almost like executing several different
simple queries concurrently, with some serial processing at the end.
Even that serial processing can sometimes be minimized by having some
of the parallel workers use a Bloom filter in shared memory.
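For concreteness, the general shape of query this targets is something like the following (a sketch with hypothetical star-schema tables):

    CREATE TABLE dim1 (id int PRIMARY KEY, attr text);
    CREATE TABLE dim2 (id int PRIMARY KEY, attr text);
    CREATE TABLE fact (dim1_id int REFERENCES dim1, dim2_id int REFERENCES dim2);
    CREATE INDEX ON fact (dim1_id);
    CREATE INDEX ON fact (dim2_id);

    -- The OR spanning different dimension tables is what makes a plan with
    -- one nestloop join per dimension attractive.
    SELECT count(*)
    FROM fact f
         JOIN dim1 d1 ON f.dim1_id = d1.id
         JOIN dim2 d2 ON f.dim2_id = d2.id
    WHERE d1.attr = 'x' OR d2.attr = 'y';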

Tom is already concerned that the optimization added by this patch may
be too much of a special case, which is understandable. It may be that
we're failing to identify some greater opportunity to add DAG-like
plans for star schema queries.

-- 
Peter Geoghegan



Re: Removing useless DISTINCT clauses

2018-08-23 Thread Stephen Frost
Greetings,

* Jim Finnerty (jfinn...@amazon.com) wrote:
> I was thinking about this last night and I realized that putting the new
> hasModifiedDistinct flag on the PlannerInfo struct eliminates the need to
> deal with the serialization issues, and makes it simpler.
> 
> Here's a new patch (v7) that puts the bool on the PlannerInfo struct, and
> adds a couple of tests.
> 
> re: why did you apply the patch on v10?
> 
> I'm developing on a v10.5 codebase at the moment, though this will change
> soon.  If the v7 patch doesn't apply cleanly on later versions, please let
> me know and I'll fix it.

Just going back through this thread to check that I didn't miss
anything, I saw this and felt it deserved a comment- please don't post
feature patches like this against existing released versions and then
ask everyone else to see if it works against current head or not.  We're
all for having new contributors of features, but those contributions
should be against the current HEAD.

> re: if you're proposing the patch for v12, why do you care about catversion?
> 
> Only because it would be a problem to test the patch on 10.5 with a
> catversion change that wouldn't come until v12.  With the v7 patch this
> issue is moot because it no longer requires a catversion change.

This would be an example of concerns that really shouldn't be getting
raised on this list for new features.  The project has a very clear
policy regarding features vs. bug-fixes and I really don't think it's
necessary or desirable to have patches being floated with subsequent
threads here which go directly against that policy.

Thanks!

Stephen




Re: Removing useless DISTINCT clauses

2018-08-23 Thread David Rowley
On 24 August 2018 at 11:34, Stephen Frost  wrote:
> Greetings,
>
> * David Rowley (david.row...@2ndquadrant.com) wrote:
>> My personal opinion of only being able to completely remove the
>> DISTINCT when there's a single item in the rtable (or a single base
>> table) is that it's just too poor to bother with.  I think such a
>> solution is best left to another patch and it should be much more
>> complete and be able to track the unique properties through joins.
>
> Doesn't that move the goalposts for this pretty far off?  Is there
> some reason that doing this for the single-item case will make it
> difficult to implement the additional improvements you're suggesting
> later?

The DISTINCT clause removal would likely need to wait until around
where the distinct paths are created and all the join processing has
been done.

If the optimisation is deemed worthwhile, then maybe that would be the
place to put this code along with a comment that explains how it can
be improved to support the removal with multiple relations.

> Given the cost of a DISTINCT and that we know pretty easily if one
> exists or not, trying to eliminate it, even in the single-item case,
> seems pretty worthwhile to me.  If it makes it difficult to improve on
> this later then I could see pushing back on it, but this patch doesn't
> seem like its doing that.

If people truly think it's worthwhile, then maybe it is, but if it's
being done because it can be done then perhaps it's best not to
bother.  I was just thinking of the bug reports we'd get when people
see it does not work with multiple relations. There's already no
shortage of bug reports that are not bugs. I also wanted to keep this
patch just doing the simple case that we already apply to GROUP BY,
but control over the patch may well be out of my hands now.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Removing useless DISTINCT clauses

2018-08-23 Thread Stephen Frost
Greetings,

* David Rowley (david.row...@2ndquadrant.com) wrote:
> My personal opinion of only being able to completely remove the
> DISTINCT when there's a single item in the rtable (or a single base
> table) is that it's just too poor to bother with.  I think such a
> solution is best left to another patch and it should be much more
> complete and be able to track the unique properties through joins.

Doesn't that move the goalposts for this pretty far off?  Is there
some reason that doing this for the single-item case will make it
difficult to implement the additional improvements you're suggesting
later?

Given the cost of a DISTINCT and that we know pretty easily if one
exists or not, trying to eliminate it, even in the single-item case,
seems pretty worthwhile to me.  If it makes it difficult to improve on
this later then I could see pushing back on it, but this patch doesn't
seem like its doing that.

Thanks!

Stephen




Re: Removing useless DISTINCT clauses

2018-08-23 Thread David Rowley
On 24 August 2018 at 03:29, Finnerty, Jim  wrote:
> Thank you Álvaro.  Here's the distinct_opt_v7 patch.

Determining if there's just 1 base relation by checking the length of the
rtable is not a good way to do it. If you want to see why, try creating a
primary key on a partitioned table (see the sketch below).
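A sketch of why that test breaks down (hypothetical names; assumes v11, where partitioned tables can have primary keys):

    CREATE TABLE pt (a int PRIMARY KEY) PARTITION BY RANGE (a);
    CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (100);

    -- One logical table with a PK, yet the planner's range table ends up
    -- with more than one entry once the partition hierarchy is expanded.
    SELECT DISTINCT a FROM pt;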

My personal opinion of only being able to completely remove the
DISTINCT when there's a single item in the rtable (or a single base
table) is that it's just too poor to bother with.  I think such a
solution is best left to another patch and it should be much more
complete and be able to track the unique properties through joins.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Make executor's Range Table an array instead of a List

2018-08-23 Thread David Rowley
On 24 August 2018 at 02:26, Amit Langote  wrote:
> One of the patches I sent last week does the same thing, among a
> couple of other things with regard to handling relations in the
> executor.  On a cursory look at the patch, your take of it looks
> better than mine.  Will test tomorrow.  Here is a link to my email:
>
> https://www.postgresql.org/message-id/468c85d9-540e-66a2-1dde-fec2b741e...@lab.ntt.co.jp
>
> 4th of my patches implements the array'fication of executor's range table.

Sorry, didn't realise. I'll withdraw this and review yours during the
upcoming 'fest.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Flexible configuration for full-text search

2018-08-23 Thread Tom Lane
Alexander Korotkov  writes:
> On Fri, Apr 6, 2018 at 10:52 AM Aleksandr Parfenov
>  wrote:
>> The key point of the patch is to process stopwords the same way as
>> others at the level of the PostgreSQL internals and give users an
>> instrument to process them in a special way via configurations.

> If we're going to do it that way by providing separate dictionaries
> for stop words, then I think we should also make it for builtin
> dictionaries and configurations.  So, I think this patch should also
> split builtin dictionaries into stemmers and stop word dictionaries,
> and provide corresponding configuration over them.  It would be also
> needed to perform some benchmarking to show that new way of defining
> configurations is not worse than previous way in the performance.

I'm hesitant about the backwards-compatibility aspects of this.
Yes, we could set up the standard text search configurations to still
work the same as before, but how will you do it without breaking existing
custom configurations that use those dictionaries?

regards, tom lane



Re: Flexible configuration for full-text search

2018-08-23 Thread Alexander Korotkov
On Fri, Apr 6, 2018 at 10:52 AM Aleksandr Parfenov
 wrote:
> On Thu, 5 Apr 2018 17:26:10 +0300
> Teodor Sigaev  wrote:
> > 4) Initial approach suggested to distinguish three state of
> > dictionary result: null (unknown word), stopword and usual word. Now
> > only two, we lost possibility to catch stopwords. One of way to use
> > stopwrods is: let we have to identical fts configurations, except one
> > skips stopwords and another doesn't. Second configuration is used for
> > indexing, and first one for search by default. But if we can't  find
> > anything ('to be or to be' - phrase contains stopwords only) then we
> > can use second configuration. For now, we need to keep two variant of
> > each dictionary - with and without stopwords. But if it's possible to
> > distinguish stop and nonstop words in configuration then we don't
> > need to have duplicated dictionaries.
>
> With the proposed way to configure it is possible to create a special
> dictionary only for stopword checking and use it at decision-making
> time.
>
> For example, we can create dictionary english_stopword which will
> return word itself in case of stopword and NULL otherwise. With such
> dictionary we create a configuration:
>
> ALTER TEXT SEARCH CONFIGURATION test_cfg ALTER MAPPING FOR asciiword,
>word WITH
> CASE english_stopword WHEN NO MATCH THEN english_hunspell END;
>
> In described example, english_hunspell can be implemented without
> processing of stopwords at all and we can divide stopword processing
> and processing of other words into separate dictionaries.
>
> The key point of the patch is to process stopwords the same way as
> others at the level of the PostgreSQL internals and give users an
> instrument to process them in a special way via configurations.

If we're going to do it that way by providing separate dictionaries
for stop words, then I think we should also make it for builtin
dictionaries and configurations.  So, I think this patch should also
split builtin dictionaries into stemmers and stop word dictionaries,
and provide corresponding configuration over them.  It would be also
needed to perform some benchmarking to show that new way of defining
configurations is not worse than previous way in the performance.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Improve behavior of concurrent ANALYZE/VACUUM

2018-08-23 Thread Bossart, Nathan
On 8/23/18, 12:08 AM, "Michael Paquier"  wrote:
> Normal regression tests are less costly than isolation tests, so let's
> use them as possible.  What you attached is covering only a portion of
> all the scenarios though, as it is as well interesting to see what
> happens if another user owns only the partitioned table, only one
> partition, and the partitioned as well as at least one partition.  I
> have extended your patch as attached.  It applies on top of HEAD.  Once
> applied with the other patch one can easily spot the difference in
> behavior, and this stresses the ownership checks in vacuum_rel() and
> analyze_rel() as well.  Perhaps we could begin by that?

This seems reasonable to me.  I think establishing the expected
behavior here is a good idea.

Nathan



Re: Windows vs C99 (was Re: C99 compliance for src/port/snprintf.c)

2018-08-23 Thread Peter Eisentraut
On 23/08/2018 03:31, Andres Freund wrote:
> On 2018-08-22 17:09:05 -0700, Andres Freund wrote:
>> Attached is a version doing so.
> 
> Mildly updated version attached. This adds an explanatory commit
> message, removes one stray docs C89 reference, and fixes a mis-squashing
> of a patch.

Let's do it!

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: JIT explain output

2018-08-23 Thread Andres Freund
Hi,

On 2018-08-23 21:10:19 +0200, Peter Eisentraut wrote:
> Some minor comments on the JIT explain output.  For example:

>  Seq Scan on tenk1  (cost=0.00..483.00 rows= width=244) (actual
> time=5.564..8.755 rows=4960 loops=1)
>Filter: (unique1 > unique2)
>Rows Removed by Filter: 5040
>  Planning Time: 0.857 ms
>  JIT:
>Functions: 2
>Generation Time: 0.439 ms
>Inlining: false
>Inlining Time: 0.000 ms
>Optimization: false
>Optimization Time: 0.375 ms
>Emission Time: 5.020 ms
>  Execution Time: 25.230 ms
> 
> Why is the optimization time nonzero when optimization is claimed to be off?

Because some absolutely minimal and cheap optimization is done
regardless, it's just a *lot* cheaper.


> Could we perhaps save some vertical space by showing alternatively:
> 
>Optimization Time: 0.375 ms
> 
> and
> 
>Optimization Time: off

> and omit the Optimization: line?

So afaict that'd not fly.


> Similarly for Inlining.

Hm, I'm not sure I like it there either. While there's no such issue
that we spend time doing inlining even if below the cost limit, we can
(and do) emit the "Inlining:" bit even when just EXPLAINing, purely
based on the cost limit, while we cannot show the "Inlining Time"
because we've not actually performed inlining.


> What are the Generation Time and Emission Time?  Could we document
> this?

Yea, that'd probably be a good idea. "Generation" is creating the code to
be emitted on the postgres side (e.g. iterating over the expression,
emitting LLVM IR) and "Emission" is actually converting the IR into
native code.


>  Is there a benefit to the user to see them listed separately?

Yes, they are about different things. And importantly we generate code
in some cases even if we don't end up emitting the code.  And especially
if we add caching we'll probably end up generating code (to use as the
cache key) but not emitting it (because we've already done so).


> It might be useful to see a JIT total time somewhere, which would be the
> sum of all the phases.  Then it's easier to compare that against the
> total run time.

No objection, but it'd make the output a bit longer again...

Greetings,

Andres Freund



JIT explain output

2018-08-23 Thread Peter Eisentraut
Some minor comments on the JIT explain output.  For example:

 Seq Scan on tenk1  (cost=0.00..483.00 rows= width=244) (actual
time=5.564..8.755 rows=4960 loops=1)
   Filter: (unique1 > unique2)
   Rows Removed by Filter: 5040
 Planning Time: 0.857 ms
 JIT:
   Functions: 2
   Generation Time: 0.439 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 0.375 ms
   Emission Time: 5.020 ms
 Execution Time: 25.230 ms

Why is the optimization time nonzero when optimization is claimed to be off?

Could we perhaps save some vertical space by showing alternatively:

   Optimization Time: 0.375 ms

and

   Optimization Time: off

and omit the Optimization: line?  Similarly for Inlining.

What are the Generation Time and Emission Time?  Could we document this?
 Is there a benefit to the user to see them listed separately?

It might be useful to see a JIT total time somewhere, which would be the
sum of all the phases.  Then it's easier to compare that against the
total run time.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: A really subtle lexer bug

2018-08-23 Thread Tom Lane
Andrew Gierth  writes:
> Here's what I will push unless there's something important I missed.

Stylistic nitpick: I don't especially like "continue" as the body of
a for-loop.  How about instead of this:

for (nchars--;
 nchars > 1 &&
 (yytext[nchars - 1] == '+' ||
  yytext[nchars - 1] == '-');
 nchars--)
continue;

do this:

do {
nchars--;
} while (nchars > 1 &&
 (yytext[nchars - 1] == '+' ||
  yytext[nchars - 1] == '-'));

That's a clearer separation between loop action and loop test, and
it makes it more obvious that you're relying on the loop condition
to be true at the start.

Also, I'm not entirely convinced that replacing the strchr() with
a handmade equivalent is a good idea.  Some versions of strchr()
are pretty quick.

No objections beyond that.

regards, tom lane



Re: [HACKERS] Optional message to user when terminating/cancelling backend

2018-08-23 Thread Tom Lane
Thomas Munro  writes:
> On Wed, Jul 25, 2018 at 7:27 PM, Daniel Gustafsson  wrote:
>> Seems the updated patch built and tested OK.  Still have no idea
>> why the previous one didn't.

> That problem apparently didn't go away.  cfbot tested it 7 times in
> the past week, and it passed only once on Windows:
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.9691
> The other times all failed like this:
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.9833

I think this is just a timing problem: the signal gets sent,
but it might or might not get received before the current statement ends.
It's possible that a signal-sent-to-self can be expected to be received
synchronously on all Unix platforms, but I wouldn't entirely bet on that
(in particular, the POSIX text for kill(2) does NOT guarantee it); and
our Windows signal implementation certainly doesn't guarantee anything
of the sort.

I don't think there's necessarily anything wrong with the code, but
you can't test it with such a simplistic test as this and expect
stable results.  If you feel an urgent need to have a test case,
try building an isolationtester script.
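For illustration only (the two-argument form below is the signature
proposed by this patch, not an existing function), a plain SQL test is
inherently racy:

SELECT pg_cancel_backend(pg_backend_pid(), 'some message');  -- proposed API
SELECT 1;   -- may run before or after the cancel signal is processed

An isolationtester script can sequence the sender and the victim
explicitly, which is why it would give stable results where this cannot.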

regards, tom lane



Re: A really subtle lexer bug

2018-08-23 Thread Andrew Gierth
Here's what I will push unless there's something important I missed.

I split the regression tests between create_operator.sql and
polymorphism.sql, because => is really "named argument syntax" rather
than an operator as such, and polymorphism.sql is where the existing
tests were for that.

I tried to keep the three lexers as closely matched as possible, even
though psqlscan.l doesn't actually need some of the changes.

-- 
Andrew (irc:RhodiumToad)

>From 276f9c626f187aafd70185c13caf48cdf5afcf98 Mon Sep 17 00:00:00 2001
From: Andrew Gierth 
Date: Thu, 23 Aug 2018 16:35:33 +0100
Subject: [PATCH 1/2] Reduce an unnecessary O(N^3) loop in lexer.

The lexer's handling of operators contained an O(N^3) hazard when
dealing with long strings of + or - characters; it seems hard to
prevent this case from being O(N^2), but the additional N multiplier
was not needed.

Backpatch all the way since this has been there since 7.x, and it
presents at least a mild hazard in that trying to do Bind, PREPARE or
EXPLAIN on a hostile query could take excessive time (without
honouring cancels or timeouts) even if the query was never executed.
---
 src/backend/parser/scan.l | 28 +---
 src/fe_utils/psqlscan.l   | 28 +---
 src/interfaces/ecpg/preproc/pgc.l | 30 ++
 3 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 0cd782827a..8ee50d85ec 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -885,20 +885,34 @@ other			.
 	 * to forbid operator names like '?-' that could not be
 	 * sequences of SQL operators.
 	 */
-	while (nchars > 1 &&
-		   (yytext[nchars - 1] == '+' ||
-			yytext[nchars - 1] == '-'))
+	if (nchars > 1 &&
+		(yytext[nchars - 1] == '+' ||
+		 yytext[nchars - 1] == '-'))
 	{
 		int			ic;
 
 		for (ic = nchars - 2; ic >= 0; ic--)
 		{
-			if (strchr("~!@#^&|`?%", yytext[ic]))
+			char c = yytext[ic];
+			if (c == '~' || c == '!' || c == '@' ||
+c == '#' || c == '^' || c == '&' ||
+c == '|' || c == '`' || c == '?' ||
+c == '%')
 break;
 		}
-		if (ic >= 0)
-			break; /* found a char that makes it OK */
-		nchars--; /* else remove the +/-, and check again */
+		if (ic < 0)
+		{
+			/*
+			 * didn't find a qualifying character, so remove
+			 * all trailing [+-]
+			 */
+			for (nchars--;
+ nchars > 1 &&
+ (yytext[nchars - 1] == '+' ||
+  yytext[nchars - 1] == '-');
+ nchars--)
+continue;
+		}
 	}
 
 	SET_YYLLOC();
diff --git a/src/fe_utils/psqlscan.l b/src/fe_utils/psqlscan.l
index 1cc587be34..ab0d56bcd2 100644
--- a/src/fe_utils/psqlscan.l
+++ b/src/fe_utils/psqlscan.l
@@ -817,20 +817,34 @@ other			.
 	 * to forbid operator names like '?-' that could not be
 	 * sequences of SQL operators.
 	 */
-	while (nchars > 1 &&
-		   (yytext[nchars - 1] == '+' ||
-			yytext[nchars - 1] == '-'))
+	if (nchars > 1 &&
+		(yytext[nchars - 1] == '+' ||
+		 yytext[nchars - 1] == '-'))
 	{
 		int			ic;
 
 		for (ic = nchars - 2; ic >= 0; ic--)
 		{
-			if (strchr("~!@#^&|`?%", yytext[ic]))
+			char c = yytext[ic];
+			if (c == '~' || c == '!' || c == '@' ||
+c == '#' || c == '^' || c == '&' ||
+c == '|' || c == '`' || c == '?' ||
+c == '%')
 break;
 		}
-		if (ic >= 0)
-			break; /* found a char that makes it OK */
-		nchars--; /* else remove the +/-, and check again */
+		if (ic < 0)
+		{
+			/*
+			 * didn't find a qualifying character, so remove
+			 * all trailing [+-]
+			 */
+			for (nchars--;
+ nchars > 1 &&
+ (yytext[nchars - 1] == '+' ||
+  yytext[nchars - 1] == '-');
+ nchars--)
+continue;
+		}
 	}
 
 	if (nchars < yyleng)
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 405dee73b0..5d351481f8 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -690,20 +690,34 @@ cppline			{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
 		 * to forbid operator names like '?-' that could not be
 		 * sequences of SQL operators.
 		 */
-		while (nchars > 1 &&
-			   (yytext[nchars-1] == '+' ||
-yytext[nchars-1] == '-'))
+		if (nchars > 1 &&
+			(yytext[nchars - 1] == '+' ||
+			 yytext[nchars - 1] == '-'))
 		{
 			int		ic;
 
-			for (ic = nchars-2; ic >= 0; ic--)
+			for (ic = nchars - 2; ic >= 0; ic--)
 			{
-if (strchr("~!@#^&|`?%", yytext[ic]))
+char c = yytext[ic];
+if (c == '~' || c == '!' || c == '@' ||
+	c == '#' || c == '^' || c == '&' ||
+	c == '|' || c == '`' || c == '?' ||
+

Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

2018-08-23 Thread Tom Lane
I wrote:
> [ join-or-to-union-4.patch ]

Rebased up to HEAD, per cfbot nagging.  Still no substantive change from
v2.

regards, tom lane

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6269f47..8935503 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*** _outPlannerInfo(StringInfo str, const Pl
*** 2310,2315 
--- 2310,2316 
  	WRITE_FLOAT_FIELD(limit_tuples, "%.0f");
  	WRITE_UINT_FIELD(qual_security_level);
  	WRITE_ENUM_FIELD(inhTargetKind, InheritanceKind);
+ 	WRITE_BOOL_FIELD(needBaseTids);
  	WRITE_BOOL_FIELD(hasJoinRTEs);
  	WRITE_BOOL_FIELD(hasLateralRTEs);
  	WRITE_BOOL_FIELD(hasDeletedRTEs);
diff --git a/src/backend/optimizer/plan/Makefile b/src/backend/optimizer/plan/Makefile
index 88a9f7f..1db6bd5 100644
*** a/src/backend/optimizer/plan/Makefile
--- b/src/backend/optimizer/plan/Makefile
*** top_builddir = ../../../..
*** 13,18 
  include $(top_builddir)/src/Makefile.global
  
  OBJS = analyzejoins.o createplan.o initsplan.o planagg.o planmain.o planner.o \
! 	setrefs.o subselect.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 13,18 
  include $(top_builddir)/src/Makefile.global
  
  OBJS = analyzejoins.o createplan.o initsplan.o planagg.o planmain.o planner.o \
! 	planunionor.o setrefs.o subselect.o
  
  include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 0e73f9c..fdffb6b 100644
*** a/src/backend/optimizer/plan/analyzejoins.c
--- b/src/backend/optimizer/plan/analyzejoins.c
*** clause_sides_match_join(RestrictInfo *ri
*** 155,160 
--- 155,163 
   * cases, but we don't have the infrastructure to prove them.)  We also
   * have to check that the inner side doesn't generate any variables needed
   * above the join.
+  *
+  * Note: making this significantly smarter might break planunionor.c.
+  * Study that file before doing so.
   */
  static bool
  join_is_removable(PlannerInfo *root, SpecialJoinInfo *sjinfo)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index b05adc7..4486885 100644
*** a/src/backend/optimizer/plan/planmain.c
--- b/src/backend/optimizer/plan/planmain.c
*** query_planner(PlannerInfo *root, List *t
*** 212,217 
--- 212,225 
  	add_placeholders_to_base_rels(root);
  
  	/*
+ 	 * Also, if we have a request to emit baserel CTIDs, add those to the
+ 	 * baserel targetlists now.  This likewise has to be done after join
+ 	 * removal, because we only want CTIDs for rels that survive join removal.
+ 	 */
+ 	if (root->needBaseTids)
+ 		add_base_rel_ctids(root);
+ 
+ 	/*
  	 * Construct the lateral reference sets now that we have finalized
  	 * PlaceHolderVar eval levels.
  	 */
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 96bf060..36907a4 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** subquery_planner(PlannerGlobal *glob, Qu
*** 623,628 
--- 623,629 
  	root->minmax_aggs = NIL;
  	root->qual_security_level = 0;
  	root->inhTargetKind = INHKIND_NONE;
+ 	root->needBaseTids = false;
  	root->hasRecursion = hasRecursion;
  	if (hasRecursion)
  		root->wt_param_id = SS_assign_special_param(root);
*** grouping_planner(PlannerInfo *root, bool
*** 1797,1802 
--- 1798,1804 
  		WindowFuncLists *wflists = NULL;
  		List	   *activeWindows = NIL;
  		grouping_sets_data *gset_data = NULL;
+ 		List	   *union_or_subpaths;
  		standard_qp_extra qp_extra;
  
  		/* A recursive query should always have setOperations */
*** grouping_planner(PlannerInfo *root, bool
*** 1874,1879 
--- 1876,1889 
  			preprocess_minmax_aggregates(root, tlist);
  
  		/*
+ 		 * Preprocess join OR clauses that might be better handled as UNIONs.
+ 		 * This likewise needs to be close to the query_planner() call.  But
+ 		 * it doesn't matter which of preprocess_minmax_aggregates() and this
+ 		 * function we call first, because they treat disjoint sets of cases.
+ 		 */
+ 		union_or_subpaths = split_join_or_clauses(root);
+ 
+ 		/*
  		 * Figure out whether there's a hard limit on the number of rows that
  		 * query_planner's result subplan needs to return.  Even if we know a
  		 * hard limit overall, it doesn't apply if the query has any
*** grouping_planner(PlannerInfo *root, bool
*** 1908,1913 
--- 1918,1931 
  	standard_qp_callback, &qp_extra);
  
  		/*
+ 		 * If we found any way to convert a join OR clause to a union, finish
+ 		 * up generating the path(s) for that, and add them into the topmost
+ 		 * scan/join relation.
+ 		 */
+ 		if (union_or_subpaths)
+ 			finish_union_or_paths(root, current_rel, union_or_subpaths);
+ 
+ 		/*
  		 * Convert the query's result tlist into PathTarget format.
  		 *
  		 * Note: it's desirable to 

Re: A really subtle lexer bug

2018-08-23 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> * Some regression tests exercising these code paths might be a
>  Tom> good thing.

> Agreed. Any preferences where they should go?

There's not really a "lexer/grammar" test script.  Maybe you could
drop it into create_operator.sql, especially if you need to make any
funny operators to test with?
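For instance (just a sketch; the exact queries and expected outputs would
be up to the patch), the point is that the two-character operator must be
kept whole and the trailing '-' treated as a separate unary minus:

SELECT 2 <=- 1;   -- "<=" then unary minus, expected: false
SELECT 2 >=- 1;   -- expected: true
SELECT 1 <>- 2;   -- expected: true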

regards, tom lane



Re: Getting NOT NULL constraint from pg_attribute

2018-08-23 Thread Tom Lane
Andres Freund  writes:
> And arguably there's a fair bit of redundancy in pg_attribute, just
> because it's convenient for tupledescs.  Given that pg_attribute very
> commonly is the largest catalog table by far, that very well could use
> some attention.  Without tupdescs in mind, there's really not much point
> for pg_attribute to repeat a good portion of pg_type again, for example,
> nor is attcacheoff really meaningful.

Agreed about attcacheoff, but I'm less sure that we can drop the
"redundant" info copied from pg_type.  The sticking point there is
that somebody could drop a column, then drop the type the column had,
but you still need to be able to skip over values in that column.
So at least attlen and attalign are not removable.
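
To illustrate the scenario (type and table names are made up): after the
column and then its type are dropped, the stored tuples still physically
contain values of that type, so their length and alignment must remain
known in order to skip over them.

CREATE TYPE widget AS (a int, b text);
CREATE TABLE gadgets (id int, w widget, note text);
INSERT INTO gadgets VALUES (1, ROW(1, 'x')::widget, 'keep me');
ALTER TABLE gadgets DROP COLUMN w;
DROP TYPE widget;           -- allowed: the dependent column is already gone
SELECT note FROM gadgets;   -- must still skip over the stored widget value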

regards, tom lane



Re: proposal: schema private functions

2018-08-23 Thread Pavel Stehule
2018-08-23 17:18 GMT+02:00 Nico Williams :

> Couldn't this be handled by having a new permission on FUNCTIONs
> ("CALL"?) to distinguish EXECUTE?
>

I don't understand this question. Private functions will still be functions
- there is no necessity to call these functions differently.


> This would have to be made backwards-compatible, possibly by
> automatically granting CALL ON ALL FUNCTIONS to public at schema create
> time and/or PG upgrade time, which the schema owner could then REVOKE,
> or by requiring CALL permission only functions marked PRIVATE.
>
> Because initially a schema has no functions, the owner can revoke this
> grant to public before creating any functions, thus there would be no
> race contition.  A race condition could be made less likely still by
> having CALL not imply EXECUTE (users would have to have both to
> successfully call a given function).
>
> I would agree that a PRIVATE keyword would be a syntactically convenient
> way to say that in its absence then public gets granted CALL on the
> given function.  But IMO it shouldn't be necessary, and either way
> permissions machinery should be involved.
>
> What do other SQL databases do?  Does any have a PRIVATE keyword for
> FUNCTIONs?
>

The private flag that I propose has no relation to access rights - the
limiting factor is scope. The system of access rights is not touched by
this feature.

The other databases have no PRIVATE keyword, as far as I know - but they
have some different mechanisms for hiding an internal API.

PL/SQL package functions are by default "private" if they are not mentioned
in the package header.


> Using permissions has the net effect of making visibility more
> fine-grained.
>
> Regarding \df, I'm not sure that hiding function names one cannot call
> is worthwhile, but if it were, then there are several options depending
> on whether confidentiality of function names is to be a security
> feature: RLS on the pg_catalog.pg_proc table (provides confidentiality),
> or having a system view on it or filtering in psql (no real
> confidentiality).
>

Probably private objects (functions) should not be visible in \df ... maybe
\dfP or something similar can be used.


>
> All that said, being able to have PRIVATE schemas, tables, views,
> functions, FDWs, variables, maybe even roles, is definitly appealing,
> mainly for code organization reasons.
>
> I didn't understand how PRIVATE would help reduce the need for SECURITY
> DEFINER.  Can you explain?
>

sure. Any object marked as private is not accessible directly. You
want to protect these objects against access in the wrong order, but
sometimes you need to use the data (or features) of these objects. Today you
have to use security definer functions; any user can call these
functions, and then the access to the protected objects is controlled. But you
are forced to use security definer functions for that. With the private flag,
such an object can be accessed only from a function (procedure, ...) in the
same schema. And if a user doesn't have the CREATE right on the schema, then
he has no free access to the private objects. But this user can use any
non-private objects of the schema, if he has the necessary rights, and then
the access to the private objects is controlled and security definer
functions are not necessary.
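
To illustrate the intent, a purely hypothetical sketch (the PRIVATE keyword
below does not exist; it only mimics the proposal, and the schema and
function names are made up):

CREATE SCHEMA billing;

CREATE PRIVATE FUNCTION billing.apply_discount(o int) RETURNS int
LANGUAGE sql AS 'SELECT $1';          -- hypothetical: usable only from within "billing"

CREATE FUNCTION billing.close_order(o int) RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
  PERFORM billing.apply_discount(o);  -- OK: the caller lives in the same schema
END $$;

SELECT billing.apply_discount(1);     -- would be rejected: the function is private
SELECT billing.close_order(1);        -- OK, subject to the usual EXECUTE rights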



> Nico
> --
>


Re: Memory leak with CALL to Procedure with COMMIT.

2018-08-23 Thread Jonathan S. Katz

> On Aug 23, 2018, at 9:34 AM, Peter Eisentraut 
>  wrote:
> 
> I think I've found a reasonable fix for this.
> 
> The problem arises with the combination of CALL with output parameters
> and doing a COMMIT inside the procedure.  When a CALL has output
> parameters, the portal uses the strategy PORTAL_UTIL_SELECT instead of
> PORTAL_MULTI_QUERY.  Using PORTAL_UTIL_SELECT causes the portal's
> snapshot to be registered with the current resource
> owner (portal->holdSnapshot); see
> 9ee1cf04ab6bcefe03a11837b53f29ca9dc24c7a for the reason.
> 
> Normally, PortalDrop() unregisters the snapshot.  If not, then
> ResourceOwnerRelease() will print a warning about a snapshot leak on
> transaction commit.  A transaction commit normally drops all
> portals (PreCommit_Portals()), except the active portal.  So in case of
> the active portal, we need to manually release the snapshot to avoid the
> warning.
> 
> PreCommit_Portals() already contains some code that deals specially with
> the active portal versus resource owners, so it seems reasonable to add
> there.
> 
> I think this problem could theoretically apply to any
> multiple-transaction utility command.  For example, if we had a variant
> of VACUUM that returned tuples, it would produce the same warning.
> (This patch would fix that as well.)

Applied the patch against head. make check passed, all the tests I ran
earlier in this thread passed as well.

Reviewed the code - looks to match above reasoning, and comments
are clear.

From my perspective it looks good, but happy to hear from someone
more familiar than me with this area of the code.

Thanks,

Jonathan




Re: A really subtle lexer bug

2018-08-23 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> Patch attached.
 >> This fixes two bugs: first the mis-lexing of two-char ops as mentioned
 >> originally; second, the O(N^3) lexing time of strings of - or +
 >> characters is reduced to O(N^2) (in practice it's better than O(N^2)
 >> once N gets large because the bison stack gets blown out, ending the
 >> loop early).

 Tom> Looks reasonable offhand (didn't test).  A couple of thoughts:

 Tom> * Some regression tests exercising these code paths might be a
 Tom> good thing.

Agreed. Any preferences where they should go?

 Tom> * There should likely be a comment near where EQUALS_GREATER and
 Tom> friends are defined, pointing out that if we add any more
 Tom> multi-character operators with special precedences, this code has
 Tom> to be taught about them.

Agreed; will do this.

-- 
Andrew (irc:RhodiumToad)



Re: proposal: schema private functions

2018-08-23 Thread Nico Williams
Couldn't this be handled by having a new permission on FUNCTIONs
("CALL"?) to distinguish EXECUTE?

This would have to be made backwards-compatible, possibly by
automatically granting CALL ON ALL FUNCTIONS to public at schema create
time and/or PG upgrade time, which the schema owner could then REVOKE,
or by requiring CALL permission only functions marked PRIVATE.

Because initially a schema has no functions, the owner can revoke this
grant to public before creating any functions, thus there would be no
race contition.  A race condition could be made less likely still by
having CALL not imply EXECUTE (users would have to have both to
successfully call a given function).

I would agree that a PRIVATE keyword would be a syntactically convenient
way to say that in its absence then public gets granted CALL on the
given function.  But IMO it shouldn't be necessary, and either way
permissions machinery should be involved.

What do other SQL databases do?  Does any have a PRIVATE keyword for
FUNCTIONs?

Using permissions has the net effect of making visibility more
fine-grained.

Regarding \df, I'm not sure that hiding function names one cannot call
is worthwhile, but if it were, then there are several options depending
on whether confidentiality of function names is to be a security
feature: RLS on the pg_catalog.pg_proc table (provides confidentiality),
or having a system view on it or filtering in psql (no real
confidentiality).

All that said, being able to have PRIVATE schemas, tables, views,
functions, FDWs, variables, maybe even roles, is definitly appealing,
mainly for code organization reasons.

I didn't understand how PRIVATE would help reduce the need for SECURITY
DEFINER.  Can you explain?

Nico
-- 



Re: Getting NOT NULL constraint from pg_attribute

2018-08-23 Thread Andres Freund
On 2018-08-23 11:04:30 -0400, Tom Lane wrote:
> "David G. Johnston"  writes:
> > On Monday, August 20, 2018, Wu Ivy  wrote:
> >> Why are SELECT query never marked nullable?
> 
> > Basically the nullability property is used by the planner for optimization
> > during the joining of physical tables.  As soon as you try outputting
> > columns the ability to enforce not null goes away because of, in
> > particular, outer joins.  While some changes could maybe be made the
> > cost-benefit to do so doesn't seem favorable.
> 
> A further thought on this is that really it's a historical accident that
> the elements of tuple descriptors are exactly pg_attribute rows.  There
> are a *whole lot* of fields in pg_attribute that aren't especially
> relevant to tuple sets generated on-the-fly within a query, and typically
> won't get filled with anything except default values.  The only fields
> that really mean a lot for a dynamic tuple set are the data type and
> values derived from that, and in some usages the column name.

And arguably there's a fair bit of redundancy in pg_attribute, just
because it's convenient for tupledescs.  Given that pg_attribute very
commonly is the largest catalog table by far, that very well could use
some attention.  Without tupdescs in mind, there's really not much point
for pg_attribute to repeat a good portion of pg_type again, for example,
nor is attcacheoff really meaningful.
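
For reference, a quick way to check that claim on a given installation
(plain catalog SQL, nothing specific to this discussion):

SELECT relname, pg_total_relation_size(oid) AS bytes
FROM pg_class
WHERE relnamespace = 'pg_catalog'::regnamespace
  AND relkind = 'r'
ORDER BY bytes DESC
LIMIT 5;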


> [ wanders away wondering if it'd be worth our time to design a new,
> more compact TupleDesc struct without the meaningless fields ... ]

Yes, I think it'd would be.

Greetings,

Andres Freund



Re: Getting NOT NULL constraint from pg_attribute

2018-08-23 Thread Tom Lane
"David G. Johnston"  writes:
> On Monday, August 20, 2018, Wu Ivy  wrote:
>> Why are SELECT query never marked nullable?

> Basically the nullability property is used by the planner for optimization
> during the joining of physical tables.  As soon as you try outputting
> columns the ability to enforce not null goes away because of, in
> particular, outer joins.  While some changes could maybe be made the
> cost-benefit to do so doesn't seem favorable.

A further thought on this is that really it's a historical accident that
the elements of tuple descriptors are exactly pg_attribute rows.  There
are a *whole lot* of fields in pg_attribute that aren't especially
relevant to tuple sets generated on-the-fly within a query, and typically
won't get filled with anything except default values.  The only fields
that really mean a lot for a dynamic tuple set are the data type and
values derived from that, and in some usages the column name.

[ wanders away wondering if it'd be worth our time to design a new,
more compact TupleDesc struct without the meaningless fields ... ]

regards, tom lane



Re: A really subtle lexer bug

2018-08-23 Thread Tom Lane
Andrew Gierth  writes:
> "Andrew" == Andrew Gierth  writes:
>  Andrew> I guess the fix is to extend the existing special case code
>  Andrew> that checks for one character left after removing trailing [+-]
>  Andrew> and also check for the two-character ops "<>" ">=" "<=" "=>"
>  Andrew> "!=".

> Patch attached.
> This fixes two bugs: first the mis-lexing of two-char ops as mentioned
> originally; second, the O(N^3) lexing time of strings of - or +
> characters is reduced to O(N^2) (in practice it's better than O(N^2)
> once N gets large because the bison stack gets blown out, ending the
> loop early).

Looks reasonable offhand (didn't test).  A couple of thoughts:

* Some regression tests exercising these code paths might be a good thing.

* There should likely be a comment near where EQUALS_GREATER and
friends are defined, pointing out that if we add any more multi-character
operators with special precedences, this code has to be taught about them.

regards, tom lane



Re: Removing useless DISTINCT clauses

2018-08-23 Thread Alvaro Herrera
On 2018-Aug-23, Jim Finnerty wrote:

> distinct_opt_v7.patch
>   

Are you posting this via postgresql-archive.org?  Please don't do that.
These posts of yours are mangled by postgresql-archive and do not
contain the file you're attaching.  Use direct posting by email to
pgsql-hackers@lists.postgresql.org instead.

Thanks

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Make executor's Range Table an array instead of a List

2018-08-23 Thread Amit Langote
(sorry David for the same email twice, I didn't hit Reply All at first)

On Thu, Aug 23, 2018 at 7:58 PM, David Rowley
 wrote:
> Continuing along on my adventures of making performance improvements
> for partitioned tables, I discovered that getrelid is quite slow in
> InitPlan() during UPDATEs and DELETEs when there are many
> resultRelations to process.  A List is a pretty poor choice for this
> data structure since we already know how big it is and we pretty much
> only ever perform lookups by the List's Nth element.
>
> Converting this to an array obviously improves that O(n) List lookup
> into an O(1) operation.
>
> Benchmark against today's master with 300 partitions using prepared 
> statements:
>
> CREATE TABLE partbench (id BIGINT NOT NULL, i1 INT NOT NULL, i2 INT
> NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL) PARTITION
> BY RANGE (id);
> \o /dev/null
> select 'CREATE TABLE partbench' || x::text || ' PARTITION OF partbench
> FOR VALUES FROM (' || (x*10)::text || ') TO (' ||
> ((x+1)*10)::text || ');' from generate_Series(0,299) x;
> \gexec
> \o
>
> ALTER SYSTEM SET plan_cache_mode = 'force_generic_plan'; -- minimise
> planning overhead.
>
> update.sql:
>
> \set p_id 2999
> update partbench set i1 = i1+1 where id = :p_id;
>
> pgbench -n -M prepared -T 60 -f update.sql postgres
>
> Unpatched:
>
> tps = 558.708604 (excluding connections establishing)
> tps = 556.128341 (excluding connections establishing)
> tps = 569.096393 (excluding connections establishing)
>
> Patched:
>
> tps = 683.271356 (excluding connections establishing)
> tps = 678.693578 (excluding connections establishing)
> tps = 694.446564 (excluding connections establishing)
>
> (about 22% improvement)
>
> Of course, it would be much nicer if PlannedStmt->rtable was already
> an array and we didn't have to convert it to one during executor
> startup. We'd need to build an array that's a Node type for that to
> work.  I did suggest something ([1]) last year in this regard, but I
> felt like it might be a bit of an uphill struggle to gain enough
> community support to make that work. So I feel like this patch is
> quite worthwhile in the meantime.
>
> The (fairly small) patch is attached.

One of the patches I sent last week does the same thing, among a
couple of other things with regard to handling relations in the
executor.  On a cursory look at the patch, your take of it looks
better than mine.  Will test tomorrow.  Here is a link to my email:

https://www.postgresql.org/message-id/468c85d9-540e-66a2-1dde-fec2b741e...@lab.ntt.co.jp

The 4th of my patches implements the array-ification of the executor's range table.

Thanks,
Amit



Re: Conflict handling for COPY FROM

2018-08-23 Thread Surafel Temesgen
Hello,

The attached patch adds error handling for:

  extra data
  missing data
  invalid oid
  null oid
  row count mismatch

Records that fail in the above cases are written to the file with the
error message appended; in case of a unique violation or exclusion
constraint violation error, the failed record is written as it is, because
the cause of the error cannot be identified specifically.

The new syntax becomes:

COPY ... WITH ON CONFLICT LOG maximum_error, LOG FILE NAME '…';
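
For example (table and file names are made up), a command using this
proposed syntax would look like:

COPY orders FROM '/tmp/orders.dat'
    WITH ON CONFLICT LOG 100, LOG FILE NAME '/tmp/orders_failed.dat';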


Regards

Surafel
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68d95..bf21abd8e0 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -364,6 +364,17 @@ COPY { table_name [ ( 

 
+   
+IGNORE_CONFLICTS
+
+ 
+  specifies to ignore errors up to the specified amount.
+  Instead, write the failed record to the failed record file and
+  proceed to the next record
+ 
+
+   
+
   
  
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9bc67ce60f..ffa6aecbd5 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/fd.h"
+#include "storage/lmgr.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
@@ -118,6 +119,7 @@ typedef struct CopyStateData
 	int			file_encoding;	/* file or remote side's character encoding */
 	bool		need_transcoding;	/* file encoding diff from server? */
 	bool		encoding_embeds_ascii;	/* ASCII can be non-first byte? */
+	FILE	   *failed_rec_file;
 
 	/* parameters from the COPY command */
 	Relation	rel;			/* relation to copy to or from */
@@ -147,6 +149,9 @@ typedef struct CopyStateData
 	bool		convert_selectively;	/* do selective binary conversion? */
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
+	char	   *failed_rec_filename;
+	bool	   ignore_conflict;
+	int	   error_limit;	/* total # of error to log */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -766,6 +771,21 @@ CopyLoadRawBuf(CopyState cstate)
 	return (inbytes > 0);
 }
 
+/*
+ * LogCopyError log error in to error log file
+ */
+static void
+LogCopyError(CopyState cstate, const char *str)
+{
+	appendBinaryStringInfo(&cstate->line_buf, str, strlen(str));
+#ifndef WIN32
+	appendStringInfoCharMacro(&cstate->line_buf, '\n');
+#else
+	appendBinaryStringInfo(&cstate->line_buf, "\r\n", strlen("\r\n"));
+#endif
+	fwrite(cstate->line_buf.data, 1, cstate->line_buf.len, cstate->failed_rec_file);
+	cstate->error_limit--;
+}
 
 /*
  *	 DoCopy executes the SQL COPY statement
@@ -1226,6 +1246,19 @@ ProcessCopyOptions(ParseState *pstate,
 defel->defname),
 		 parser_errposition(pstate, defel->location)));
 		}
+		else if (strcmp(defel->defname, "ignore_conflicts") == 0)
+		{
+			List   *conflictOption;
+			if (cstate->ignore_conflict)
+ereport(ERROR,
+		(errcode(ERRCODE_SYNTAX_ERROR),
+		 errmsg("conflicting or redundant options"),
+		 parser_errposition(pstate, defel->location)));
+			cstate->ignore_conflict = true;
+			conflictOption = (List *) defel->arg;
+			cstate->error_limit = intVal(list_nth(conflictOption, 0));
+			cstate->failed_rec_filename = strVal(list_nth(conflictOption, 1));
+		}
 		else
 			ereport(ERROR,
 	(errcode(ERRCODE_SYNTAX_ERROR),
@@ -1749,6 +1782,11 @@ EndCopy(CopyState cstate)
 	(errcode_for_file_access(),
 	 errmsg("could not close file \"%s\": %m",
 			cstate->filename)));
+		if (cstate->failed_rec_filename != NULL && FreeFile(cstate->failed_rec_file))
+			ereport(ERROR,
+	(errcode_for_file_access(),
+	 errmsg("could not close file \"%s\": %m",
+			cstate->failed_rec_filename)));
 	}
 
 	MemoryContextDelete(cstate->copycontext);
@@ -2461,6 +2499,8 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_FROZEN;
 	}
 
+	if (!cstate->ignore_conflict)
+		cstate->error_limit = 0;
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -2579,6 +2619,10 @@ CopyFrom(CopyState cstate)
 		 */
 		insertMethod = CIM_SINGLE;
 	}
+	else if (cstate->ignore_conflict)
+	{
+		insertMethod = CIM_SINGLE;
+	}
 	else
 	{
 		/*
@@ -2968,12 +3012,59 @@ CopyFrom(CopyState cstate)
 		 */
 		tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
 	}
+	else if (cstate->ignore_conflict && cstate->error_limit > 0)
+	{
+		bool		specConflict;
+		uint32		specToken;
+		specConflict = false;
+
+		specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());
+		HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
+
+		/* insert the tuple, with the speculative token */
+		heap_insert(resultRelInfo->ri_RelationDesc, tuple,
+	

Re: Memory leak with CALL to Procedure with COMMIT.

2018-08-23 Thread Peter Eisentraut
I think I've found a reasonable fix for this.

The problem arises with the combination of CALL with output parameters
and doing a COMMIT inside the procedure.  When a CALL has output
parameters, the portal uses the strategy PORTAL_UTIL_SELECT instead of
PORTAL_MULTI_QUERY.  Using PORTAL_UTIL_SELECT causes the portal's
snapshot to be registered with the current resource
owner (portal->holdSnapshot); see
9ee1cf04ab6bcefe03a11837b53f29ca9dc24c7a for the reason.

Normally, PortalDrop() unregisters the snapshot.  If not, then
ResourceOwnerRelease() will print a warning about a snapshot leak on
transaction commit.  A transaction commit normally drops all
portals (PreCommit_Portals()), except the active portal.  So in case of
the active portal, we need to manually release the snapshot to avoid the
warning.

PreCommit_Portals() already contains some code that deals specially with
the active portal versus resource owners, so it seems reasonable to add
there.

I think this problem could theoretically apply to any
multiple-transaction utility command.  For example, if we had a variant
of VACUUM that returned tuples, it would produce the same warning.
(This patch would fix that as well.)
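
A minimal way to see the warning, essentially what the regression test
added by the attached patch does:

CREATE PROCEDURE transaction_test10a(INOUT x int)
LANGUAGE plpgsql
AS $$
BEGIN
  x := x + 1;
  COMMIT;
END;
$$;

CALL transaction_test10a(10);  -- the COMMIT inside used to trigger the snapshot leak warning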

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d91f42f024afa7bc2764045b615296a87c4033b8 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 23 Aug 2018 15:13:48 +0200
Subject: [PATCH] Fix snapshot leak warning for some procedures

The problem arises with the combination of CALL with output parameters
and doing a COMMIT inside the procedure.  When a CALL has output
parameters, the portal uses the strategy PORTAL_UTIL_SELECT instead of
PORTAL_MULTI_QUERY.  Using PORTAL_UTIL_SELECT causes the portal's
snapshot to be registered with the current resource
owner (portal->holdSnapshot); see
9ee1cf04ab6bcefe03a11837b53f29ca9dc24c7a for the reason.

Normally, PortalDrop() unregisters the snapshot.  If not, then
ResourceOwnerRelease() will print a warning about a snapshot leak on
transaction commit.  A transaction commit normally drops all
portals (PreCommit_Portals()), except the active portal.  So in case of
the active portal, we need to manually release the snapshot to avoid the
warning.

Reported-by: Prabhat Sahu 
---
 src/backend/utils/mmgr/portalmem.c| 14 +++--
 .../src/expected/plpgsql_transaction.out  | 30 +++
 .../plpgsql/src/sql/plpgsql_transaction.sql   | 25 
 3 files changed, 67 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/mmgr/portalmem.c 
b/src/backend/utils/mmgr/portalmem.c
index 04ea32f49f..d34cab0eb8 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -689,13 +689,23 @@ PreCommit_Portals(bool isPrepare)
 
/*
 * Do not touch active portals --- this can only happen in the 
case of
-* a multi-transaction utility command, such as VACUUM.
+* a multi-transaction utility command, such as VACUUM, or a 
commit in
+* a procedure.
 *
 * Note however that any resource owner attached to such a 
portal is
-* still going to go away, so don't leave a dangling pointer.
+* still going to go away, so don't leave a dangling pointer.  
Also
+* unregister any snapshots held by the portal, mainly to avoid
+* snapshot leak warnings from ResourceOwnerRelease().
 */
if (portal->status == PORTAL_ACTIVE)
{
+   if (portal->holdSnapshot)
+   {
+   if (portal->resowner)
+   
UnregisterSnapshotFromOwner(portal->holdSnapshot,
+   
portal->resowner);
+   portal->holdSnapshot = NULL;
+   }
portal->resowner = NULL;
continue;
}
diff --git a/src/pl/plpgsql/src/expected/plpgsql_transaction.out 
b/src/pl/plpgsql/src/expected/plpgsql_transaction.out
index 0b5a039b89..77a83adab5 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_transaction.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_transaction.out
@@ -463,6 +463,36 @@ SELECT * FROM test2;
  42
 (1 row)
 
+-- Test transaction in procedure with output parameters.  This uses a
+-- different portal strategy and different code paths in pquery.c.
+CREATE PROCEDURE transaction_test10a(INOUT x int)
+LANGUAGE plpgsql
+AS $$
+BEGIN
+  x := x + 1;
+  COMMIT;
+END;
+$$;
+CALL transaction_test10a(10);
+ x  
+
+ 11
+(1 row)
+
+CREATE PROCEDURE transaction_test10b(INOUT x int)
+LANGUAGE plpgsql
+AS $$
+BEGIN
+  x := x - 1;
+  ROLLBACK;
+END;
+$$;
+CALL transaction_test10b(10);
+ x 
+---
+ 9
+(1 row)
+
 

Re: Expression errors with "FOR UPDATE" and postgres_fdw with partition wise join enabled.

2018-08-23 Thread Etsuro Fujita

(2018/08/22 20:08), Etsuro Fujita wrote:

(2018/08/16 12:00), Etsuro Fujita wrote:

(2018/08/15 23:40), Robert Haas wrote:

Given the comments from the RMT, and also on general principle, it
seems like we really need to get on with committing something here.
It's my understanding you plan to do that, since it's your patch.
When?


I plan to do that late next week as I'll go on leave until next Tuesday.


I added the commit message. Please find attached an updated version of
the patch. Does that make sense? If there are no objections, I'll push
this tomorrow.


I tried this today, but git doesn't work from behind the corporate
firewall.  I don't know the exact cause of that yet, so I'll investigate it
tomorrow.


Best regards,
Etsuro Fujita



Re: [HACKERS] proposal: schema variables

2018-08-23 Thread Pavel Luzanov



On 23.08.2018 12:46, Fabien COELHO wrote:
I do not understand your point, as usual. I raise a factual issue 
about security, and you do not answer how this can be solved with your 
proposal, but appeal to argument of authority and declare your "strong 
opinion".


I do not see any intrinsic opposition between having session objects 
and transactions. Nothing prevents a session object to be 
transactional beyond your willingness that it should not be.


Now, I do expect all PostgreSQL features to be security-wise, whatever 
their scope.


I do not think that security should be traded for "cheap & fast", esp 
as the sole use case for a feature is a security pattern that cannot 
be implemented securely with it. This appears to me as a huge 
contradiction, hence my opposition against this feature as proposed.


I can't agree with your position.

Consider this example.
I want to record some inappropriate user actions to an audit table and
roll back the transaction.

But aborting the transaction will also abort the record in the audit table.
So, by that logic, do not use tables, because they have security implications.

This is very similar to your approach.

Schema variables are a very needed and important feature, but for other
purposes.


-
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Hint to set owner for tablespace directory

2018-08-23 Thread Maksim Milyutin

Hi!


I have noticed that novice users get stuck trying to create a tablespace
on a directory whose owner is not the postgres system user. They
observe the message "could not set permissions on directory ...:
permission denied".


I want to add a patch that prints a hint to set the required owner for the
tablespace directory if this is the cause of the problem (*errno ==
EPERM* after calling *chmod*).
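
For example (the path is made up; the hint text is the one added by the
patch below), the situation this targets looks like:

-- the directory exists but is owned by a different OS user than postgres
CREATE TABLESPACE ts1 LOCATION '/srv/pg_tblspc';
-- ERROR:  could not set permissions on directory "/srv/pg_tblspc": Permission denied
-- HINT:  Install the PostgreSQL system user as the owner of this directory.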



--
Regards,
Maksim Milyutin

diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index f7e9160..a8733d4 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -585,10 +585,16 @@ create_tablespace_directories(const char *location, const Oid tablespaceoid)
 	 InRecovery ? errhint("Create this directory for the tablespace before "
 		  "restarting the server.") : 0));
 		else
+		{
+			bool wrong_owner = (errno == EPERM);
+
 			ereport(ERROR,
 	(errcode_for_file_access(),
 	 errmsg("could not set permissions on directory \"%s\": %m",
-			location)));
+			location),
+	 wrong_owner ? errhint("Install the PostgreSQL system user as "
+		   "the owner of this directory.") : 0));
+		}
 	}
 
 	if (InRecovery)


Make executor's Range Table an array instead of a List

2018-08-23 Thread David Rowley
Continuing along on my adventures of making performance improvements
for partitioned tables, I discovered that getrelid is quite slow in
InitPlan() during UPDATEs and DELETEs when there are many
resultRelations to process.  A List is a pretty poor choice for this
data structure since we already know how big it is and we pretty much
only ever perform lookups by the List's Nth element.

Converting this to an array obviously improves that O(n) List lookup
into an O(1) operation.

Benchmark against today's master with 300 partitions using prepared statements:

CREATE TABLE partbench (id BIGINT NOT NULL, i1 INT NOT NULL, i2 INT
NOT NULL, i3 INT NOT NULL, i4 INT NOT NULL, i5 INT NOT NULL) PARTITION
BY RANGE (id);
\o /dev/null
select 'CREATE TABLE partbench' || x::text || ' PARTITION OF partbench
FOR VALUES FROM (' || (x*10)::text || ') TO (' ||
((x+1)*10)::text || ');' from generate_Series(0,299) x;
\gexec
\o

ALTER SYSTEM SET plan_cache_mode = 'force_generic_plan'; -- minimise
planning overhead.

update.sql:

\set p_id 2999
update partbench set i1 = i1+1 where id = :p_id;

pgbench -n -M prepared -T 60 -f update.sql postgres

Unpatched:

tps = 558.708604 (excluding connections establishing)
tps = 556.128341 (excluding connections establishing)
tps = 569.096393 (excluding connections establishing)

Patched:

tps = 683.271356 (excluding connections establishing)
tps = 678.693578 (excluding connections establishing)
tps = 694.446564 (excluding connections establishing)

(about 22% improvement)

Of course, it would be much nicer if PlannedStmt->rtable was already
an array and we didn't have to convert it to one during executor
startup. We'd need to build an array that's a Node type for that to
work.  I did suggest something ([1]) last year in this regard, but I
felt like it might be a bit of an uphill struggle to gain enough
community support to make that work. So I feel like this patch is
quite worthwhile in the meantime.

The (fairly small) patch is attached.

I'll add this to the September 'fest.

[1] 
https://www.postgresql.org/message-id/flat/CAKJS1f_2SnXhPVa6eWjzy2O9A%3Docwgd0Cj-LQeWpGtrWqbUSDA%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v1-0001-Change-the-executor-s-range-table-from-a-List-to-.patch
Description: Binary data


Re: [HACKERS] advanced partition matching algorithm for partition-wise join

2018-08-23 Thread Dmitry Dolgov
> On Fri, 27 Jul 2018 at 20:13, Robert Haas  wrote:
>
> On Fri, Jul 27, 2018 at 3:17 AM, Ashutosh Bapat
>  wrote:
> > Apart from the complexity there's also a possibility that this
> > skipping will reduce the efficiency actually in normal cases. Consider
> > a case where A and B have exactly matching partitions. Current
> > partition matching algorithm compare any given range/bound only once
> > and we will have O(n) algorithm. If we use a binary search, however,
> > for every range comparison, it will be O(n log n) algorithm. There
> > will be unnecessary comparisons during binary search. The current
> > algorithm is always O(n), whereas binary search would be O(n log(n))
> > with a possibility of having sub-O(n) complexity in some cases. I
> > would go for an algorithm which has a consistent time complexity
> > always and which is efficient in normal cases, rather than worrying
> > about some cases which are not practical.
>
> Yeah, I think that's a good point.  The normal case here will be that
> the partition bounds are equal, or that there are a few extra
> partitions on one side that don't exist on the other.  We don't want
> other cases to be crazily inefficient, but I suspect in practice that
> if the partitioning bounds aren't pretty close to matching up exactly,
> we're going to end up failing to be able to do a partition-wise join
> at all.  It's not very likely that somebody happens to have a case
> where they've partitioned two tables in randomly different ways, but
> then they decide to join them anyway, but then it turns out that the
> partition bounds happen to be compatible enough that this algorithm
> works.

> On Mon, 23 Jul 2018 at 10:38, Ashutosh Bapat 
>  wrote:
> 1. Those cases are rare enough that we can ignore them right now. How
> often would we encounter the case you have quoted, for example?
> Usually the ranges will be continuous, differing only in the first or
> last partition, e.g. time-series data.

Ok, but what about list partitioning? I believe the area of application for it
can be more diverse than mostly just for time-series, and in the patch almost
the same approach is used to merge list partitions.
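
For example (a hypothetical pair of list-partitioned tables whose bounds
differ; nothing here is taken from the patch's tests):

CREATE TABLE t1 (c text) PARTITION BY LIST (c);
CREATE TABLE t1_ab PARTITION OF t1 FOR VALUES IN ('a', 'b');
CREATE TABLE t1_c  PARTITION OF t1 FOR VALUES IN ('c');

CREATE TABLE t2 (c text) PARTITION BY LIST (c);
CREATE TABLE t2_a  PARTITION OF t2 FOR VALUES IN ('a');
CREATE TABLE t2_bc PARTITION OF t2 FOR VALUES IN ('b', 'c');

-- Whether SELECT ... FROM t1 JOIN t2 USING (c) can be executed
-- partition-wise depends on how the patch merges these differing lists.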

Other than that, everything seems fine from a functionality point of view, and
so far I couldn't find any situation that produces a wrong plan. Some of the
joins were not that intuitive at first glance, but in the end everything was
according to the documentation.
Taking this into account, I wonder if it's possible to give some hint in the
explain output about why it wasn't possible to apply a partition-wise join. When
something wasn't clear to me, I ended up looking at the value of the "merged"
flag, and to figure out the actual reason one needs to trace the entire algorithm.

Besides that, I checked the performance in some simple cases; no problems on
this side (but I'll also do some checks for more complicated joins).

And I still have some complaints about readability, although I can imagine that
it's just me:

* Many functions carry some unrelated arguments just to pass them through,
  which obscures the purpose of a function.

* I want to suggest introducing a new data structure for the partition mapping
  instead of just keeping it in arrays (was this discussed before?).

* What is the reason that almost everywhere in the patch there is a naming
  convention "outer/inner" partition maps, and sometimes (e.g. in
  generate_matching_part_pairs) it's "map1/map2"?



Re: [HACKERS] proposal: schema variables

2018-08-23 Thread Fabien COELHO




Security vs "good enough in some cases" looks bad to me.


We won't find an agreement, because you are concentrated on the transaction,
me on the session. And we have different expectations.


I do not understand your point, as usual. I raise a factual issue about 
security, and you do not answer how this can be solved with your proposal, 
but appeal to argument of authority and declare your "strong opinion".


I do not see any intrinsic opposition between having session objects and 
transactions. Nothing prevents a session object to be transactional beyond 
your willingness that it should not be.


Now, I do expect all PostgreSQL features to be security-wise, whatever 
their scope.


I do not think that security should be traded for "cheap & fast", esp as 
the sole use case for a feature is a security pattern that cannot be 
implemented securely with it. This appears to me as a huge contradiction, 
hence my opposition against this feature as proposed.


The good news is that I'm a nobody: if a committer is happy with your 
patch, it will get committed, you do not need my approval.


--
Fabien.



Re: [HACKERS] proposal: schema variables

2018-08-23 Thread Pavel Stehule
2018-08-23 10:17 GMT+02:00 Fabien COELHO :

>
> Hello Pavel,
>
> 2. holding some session based informations, that can be used in security
>> definer functions.
>>
>
> Hmmm, I see our disagreement. My point is that this feature is *NOT* fit
> for security-related uses because if the transaction fails the variable
> would keep the value it had if the transaction had not failed...
>
> 3. Because it is not transactional, then it allows write operation on read
>>
>
> It is not transactional safe, but it is secure in sense a possibility to
>> set a access rights.
>>
>
> This is a misleading play on words. It is secure wrt to access right, but
> unsecure wrt security purposes which is the only point for having such a
> feature in the first place.
>
> I understand, so some patterns are not possible, but when you need hold
>> some keys per session, then this simply solution can be good enough.
>>
>
> Security vs "good enough in some cases" looks bad to me.
>

We won't find an agreement, because you are concentrated on the transaction,
me on the session. And we have different expectations.


> I think it is possible for some more complex patterns,
>>
>
> I'm not sure of any pattern which would be correct wrt security if it
> depends on the success of a transaction.
>
> but then developer should be smarter, and should to enocode state result
>> to content of variable.
>>
>
> I do not see how the developer can be smarter if they need a transactional
> for security but they do not have it.
>
> There is strong benefit - read write access to variables is very cheap and
>> fast.
>>
>
> I'd say that PostgreSQL is about "ACID & security" first, not "cheap &
> fast" first.
>
> I invite any patch to doc (or everywhere) with explanation and about
>> possible risks.
>>
>
> Hmmm... You are the one proposing the feature...
>
> Here is something, thanks for adjusting it to the syntax you are proposing
> and inserting it where appropriate. Possibly in the corresponding CREATE
> doc?
>
> """
> 
> 
> Beware that session variables are not transactional.
> This is a concern in a security context where the variable must be set to
> some trusted value depending on the success of the transaction:
> if the transaction fails, the variable keeps its trusted value unduly.
> 
>
> 
> For instance, the following pattern does NOT work:
>
> 
> CREATE USER auditer;
> SET ROLE auditer;
> CREATE SESSION VARIABLE is_audited BOOLEAN DEFAULT FALSE ...;
> -- ensure that only "auditer" can write "is_audited":
> REVOKE ... ON SESSION VARIABLE is_audited FROM ...;
>
> -- create an audit function
> CREATE FUNCTION audit_session(...) SECURITY DEFINER AS $$
>   -- record the session and checks in some place...
>   -- then tell it was done:
>   LET is_audited = TRUE;
> $$;
>
> -- the intention is that other security definier functions can check that
> -- the session is audited by checking on "is_audited", eg:
> CREATE FUNCTION only_for_audited(...) SECURITY DEFINER AS $$
>   IF NOT is_audited THEN RAISE "security error";
>   -- do protected stuff here.
> $$;
> 
>
> The above pattern can be attacked with the following approach:
> 
> BEGIN;
> SELECT audit_session(...);
> -- success, "is_audited" is set...
> ROLLBACK;
> -- the audit login has been reverted, but "is_audited" retains its value.
>
> -- any subsequent operation believes wrongly that the session is audited,
> -- but its logging has really been removed by the ROLLBACK.
>
> -- ok but should not:
> SELECT only_for_audited(...);
> 
> 
> 
> """
>
>
It is a good example of an unsupported pattern. It is not designed for this.
I'll merge this doc.

Note: I am not sure I fully grasp every aspect of the described issue, but if I
understand it well, then a solution could be to reset the variable at transaction
end, or maybe to reset it on rollback. This is solvable; I'll look at how complex it is.


>
> For the record, I'm "-1" on this feature as proposed, for what it's worth,
> because of the misleading security implications. This feature would just
> help people have their security wrong.
>

I respect your opinion - and I hope that integrating your proposed doc is a
good warning for users who would use a non-transactional variable as a
transactional source.

Regards

Pavel


>
> --
> Fabien.
>
>


Re: Adding TCP_USER_TIMEOUT support for libpq/psqlodbc

2018-08-23 Thread AYahorau
Hello,

Did you take a look at the patches from the previous mail?
What do you think about my proposal?
Could you please say if there are any objections to it?

Thank you in advance, 
Andrei Yahorau



From:   Andrei Yahorau/IBA
To: pgsql-hack...@postgresql.org, 
Cc: Mikalai Keida/IBA@IBA
Date:   20/07/2018 11:43
Subject: Adding TCP_USER_TIMEOUT support for libpq/psqlodbc


Hello PostgreSQL Community!

Not long ago I faced a situation where an ODBC/libpq client hangs in
case of some network problems.
I had a discussion regarding this issue within  pgsql-o...@postgresql.org 
and got some suggestions.
Here is this discussion:
https://www.postgresql.org/message-id/OF33DF00A3.D6444835-ON432582C3.003EA7C5-432582C3.0045562B%40iba.by

In a few words, the suggestion was to use pqopt keepalive options in the ODBC
configuration, for example as follows:
pqopt = keepalives=1 keepalives_idle=5 keepalives_count=1 
keepalives_interval=1

But under some circumstances it can be not reliable and the connection can 
loose its keepalives properties and it can remain hanging. 

Here is a quote from our discussion:
Hmm it seems keepalive stops while waiting for ack.
Therefore it's a matter of retransmission control
You can use TCP_USER_TIMEOUT on linux but the option is not used in libpq.

In my opinion it makes sense to add support for the TCP_USER_TIMEOUT socket
option to libpq/psqlodbc connections.
The attachment contains a patch with the source code changes for this issue.
In my implementation the timeout can be configured by a new
keepalives_user_timeout option within the pqopt parameter.
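For illustration only, a full configuration using the proposed option could
look like the pqopt line quoted earlier (keepalives_user_timeout is the name
proposed above, not an existing libpq parameter, and the value is assumed to
be in milliseconds, like the Linux TCP_USER_TIMEOUT socket option):

pqopt = keepalives=1 keepalives_idle=5 keepalives_count=1 keepalives_interval=1 keepalives_user_timeout=5000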

[attachment "0001_TCP_USER_TIMEOUT_libpq-int.patch" deleted by Andrei 
Yahorau/IBA] [attachment "0001_TCP_USER_TIMEOUT_fe-connect.patch" deleted 
by Andrei Yahorau/IBA] 

Best regards, 
Andrei Yahorau


Re: csv format for psql

2018-08-23 Thread Fabien COELHO



Bonjour Daniel,


'\n' gets translated by libc when the output is in text mode.
We discussed this upthread, but maybe it should be a code comment:
added now.


My point was more about the documentation, which should state clearly what
the EOL is. I understand from your point above that the EOL is the
platform-specific one, which is again fine with me, but shouldn't that be
said clearly in the documentation?


There seems to be a test in the code to set an empty string "" by 
default, but it is unclear to me when this is triggered.


Where is that code?


 + else if (strcmp(param, "fieldsep_csv") == 0)
 + return pset_quoted_string(popt->topt.fieldSepCsv
 +   ? popt->topt.fieldSepCsv
 +   : "");

It is unclear to me when this "" is triggered. Never? If so, maybe a 
comment should say so?


A few comments about v6: Patch applies cleanly, make check ok, doc build 
ok.


Why remove "-C"? As we already have "-A" and "-H", I was fine with it.

It seems that you changed the indentation in "psql-ref.sgml":

  
 -
 + 

but maybe this is just the previous entry which was misaligned...

 -  {"unaligned", "aligned", "wrapped", "html", "asciidoc",
 -   "latex", "latex-longtable", "troff-ms", NULL};
 +  {"aligned", "asciidoc", "csv", "html", "latex", "latex-longtable",
 +   "unaligned", "troff-ms", "wrapped", NULL};

If you want alphabetical, 'u' > 't'.

While testing I found a small issue if "fieldsep_csv" is set to a strange 
value:


  \pset fieldsep_csv ',,'
  SELECT ',', ',';
  -- gives the ambiguous:
  

The rule to decide whether to quote should be made safer/smarter. I'd 
suggest that if the string contains any of the characters used in the csv 
format it should be quoted.
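As a sketch of what the safer rule would produce under the usual CSV
convention (a value containing the separator, a double quote, or a newline is
enclosed in double quotes, with embedded double quotes doubled) - shown here
with the default ',' separator, as the expected behaviour rather than the
patch's current output:

  \pset format csv
  SELECT 'a,b' AS x, 'plain' AS y;
  -- expected:
  -- x,y
  -- "a,b",plain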


--
Fabien.



Re: [HACKERS] proposal: schema variables

2018-08-23 Thread Fabien COELHO



Hello Pavel,


2. holding some session based informations, that can be used in security
definer functions.


Hmmm, I see our disagreement. My point is that this feature is *NOT* fit 
for security-related uses, because if the transaction fails the variable 
still keeps the value it was assigned, as if the transaction had succeeded...



3. Because it is not transactional, then it allows write operation on read



It is not transaction safe, but it is secure in the sense that access
rights can be set on it.


This is a misleading play on words. It is secure wrt access rights, but 
insecure wrt security purposes, which is the only point of having such a 
feature in the first place.


I understand, so some patterns are not possible, but when you need to hold 
some keys per session, then this simple solution can be good enough.


Security vs "good enough in some cases" looks bad to me.


I think it is possible for some more complex patterns,


I'm not sure of any pattern which would be correct wrt security if it 
depends on the success of a transaction.


but then the developer should be smarter, and should encode the state result 
into the content of the variable.


I do not see how the developer can be smarter if they need a transactional 
variable for security but do not have one.


There is a strong benefit - read/write access to variables is very cheap 
and fast.


I'd say that PostgreSQL is about "ACID & security" first, not "cheap & 
fast" first.



I invite any patch to the doc (or elsewhere) with an explanation of the
possible risks.


Hmmm... You are the one proposing the feature...

Here is something, thanks for adjusting it to the syntax you are proposing 
and inserting it where appropriate. Possibly in the corresponding CREATE 
doc?


"""


Beware that session variables are not transactional.
This is a concern in a security context where the variable must be set to
some trusted value depending on the success of the transaction:
if the transaction fails, the variable keeps its trusted value unduly.



For instance, the following pattern does NOT work:


CREATE USER auditer;
SET ROLE auditer;
CREATE SESSION VARIABLE is_audited BOOLEAN DEFAULT FALSE ...;
-- ensure that only "auditer" can write "is_audited":
REVOKE ... ON SESSION VARIABLE is_audited FROM ...;

-- create an audit function
CREATE FUNCTION audit_session(...) SECURITY DEFINER AS $$
  -- record the session and checks in some place...
  -- then tell it was done:
  LET is_audited = TRUE;
$$;

-- the intention is that other security definer functions can check that
-- the session is audited by checking on "is_audited", eg:
CREATE FUNCTION only_for_audited(...) SECURITY DEFINER AS $$
  IF NOT is_audited THEN RAISE "security error";
  -- do protected stuff here.
$$;


The above pattern can be attacked with the following approach:

BEGIN;
SELECT audit_session(...);
-- success, "is_audited" is set...
ROLLBACK;
-- the audit logging has been reverted, but "is_audited" retains its value.

-- any subsequent operation believes wrongly that the session is audited,
-- but its logging has really been removed by the ROLLBACK.

-- ok but should not:
SELECT only_for_audited(...);



"""


For the record, I'm "-1" on this feature as proposed, for what it's worth, 
because of the misleading security implications. This feature would just 
help people have their security wrong.



--
Fabien.



Re: [PATCH] Add regress test for pg_read_all_stats role

2018-08-23 Thread Michael Paquier
On Tue, Aug 21, 2018 at 05:48:49PM +0100, Alexandra Ryzhevich wrote:
> Just to check if the changes broke something. I haven't found these checks in
> other regress tests; otherwise we only get positive tests. If this
> is not needed, then should I remove all the checks for the
> regress_role_nopriv role, or only the negative regress_role_nopriv tests?

+-- should not fail because regress_role_haspriv is a member of pg_read_all_stats
+SELECT pg_database_size('regression') > 0 AS canread;
What is really disturbing about the ones testing the size functions is
that they may be costly when running installcheck.  By the way, it would
be nice to avoid the system-wide REVOKE, which could impact any tests
running in parallel.  You could simply check for some other NULL
values.

> Changed to use session_preload_libraries.

+-- session_preload_libraries is of PGC_SUSET category
+SET LOCAL session_preload_libraries TO '/tmp/';
+-- vacuum_cost_delay is of PGC_USERSET category
+SET LOCAL vacuum_cost_delay TO 40;

This gets better, thanks.  I would suggest using a less realistic value
than "/tmp/", which could become a security hole if copy-pasted around...
--
Michael


signature.asc
Description: PGP signature


Re: Query is over 2x slower with jit=on

2018-08-23 Thread Andreas Joseph Krogh
On Thursday, 23 August 2018 at 03:00:42, Jonathan S. Katz <
jk...@postgresql.org > wrote:

 > On Aug 22, 2018, at 7:13 PM, Andres Freund  wrote:
 [snip]
 > For the archives sake: This likely largely is the consequence of
 > building with LLVM's expensive assertions enabled, as confirmed by
 > Jonathan over IM.

 I recompiled with the release version of LLVM. jit=on was still slower,
 but the discrepancy was not as bad as the previously reported result:

 jit = off
 Planning Time: 0.938 ms
 Execution Time: 935.599 ms

 jit = on
 Planning Time: 0.951 ms
 JIT:
   Functions: 184
   Generation Time: 17.605 ms
   Inlining: true
   Inlining Time: 20.522 ms
   Optimization: true
   Optimization Time: 1001.034 ms
   Emission Time: 665.319 ms
 Execution Time: 2491.560 ms

 However, it was still 2x+ slower, so still +1ing for open items.
 
 
I compiled against the LLVM that comes with Ubuntu 18.04, with whatever 
switches it was built with, and without debugging or assertions.
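 
For reference, the jit = on / off comparison above can be repeated in a
single session without rebuilding anything; jit and jit_above_cost are
regular PostgreSQL 11 settings (the SELECT is a placeholder for whatever
statement is being measured):

SET jit = off;                 -- baseline
EXPLAIN (ANALYZE) SELECT ...;  -- no JIT section in the output
SET jit = on;
SET jit_above_cost = 0;        -- optional: force JIT even for cheap plans
EXPLAIN (ANALYZE) SELECT ...;  -- shows JIT generation/inlining/optimization times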
 
--
 Andreas Joseph Krogh



Re: proposal: schema private functions

2018-08-23 Thread Pavel Stehule
2018-08-14 14:00 GMT+02:00 Pavel Stehule :

>
>
> 2018-08-14 13:56 GMT+02:00 Pavel Stehule :
>
>> Hi
>>
>> I would like to introduce a new flag for routines - PRIVATE. Routines with this
>> flag can be called only from other routines in the same schema.
>> Because these routines are not available to top-level queries, we can hide these
>> functions from some outputs like \df ..
>>
>
> It can be used for code organization purposes, and can be used for
> implementing some security patterns.
>
> The private functions cannot be directly called by a user who does not have the
> CREATE right on the schema. But these functions can be evaluated under any user
> who has access to the schema.
>

The same mechanism can be used for SCHEMA VARIABLES - so we can have private
schema variables that can be used only from functions of the same schema.
This could be an interesting security mechanism for running more code without
the security definer flag.
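
For example, a hypothetical sketch only - this combines the PRIVATE flag
proposed in this thread with the CREATE SESSION VARIABLE / LET syntax
discussed in the schema variables thread, neither of which exists yet:

CREATE SCHEMA s1;

-- a variable readable and writable only from routines of schema s1
CREATE SESSION VARIABLE s1.secret_key text PRIVATE;

CREATE OR REPLACE FUNCTION s1.set_key(k text)
RETURNS void AS $$
BEGIN
  LET s1.secret_key = k;   -- allowed: same schema
END;
$$ LANGUAGE plpgsql;

SELECT s1.secret_key;      -- would fail: private variable used in a top-level query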

Regards

Pavel


> Regards
>
> Pavel
>
>
>> Example:
>>
>> CREATE SCHEMA s1;
>> CREATE SCHEMA s2;
>>
>> CREATE OR REPLACE FUNCTION s1.nested()
>> RETURNS int AS $$
>> BEGIN
>>   RETURN random()*1000;
>> END;
>> $$ LANGUAGE plpgsql PRIVATE;
>>
>> SELECT s1.nested(); -- fails - it is a top-level query
>>
>> CREATE OR REPLACE FUNCTION s1.fx()
>> RETURNS int AS $$
>> BEGIN
>>   RETURN s1.nested();
>> END;
>> $$ LANGUAGE plpgsql;
>>
>> SELECT s1.fx(); -- will work
>>
>> CREATE OR REPLACE FUNCTION s2.fx()
>> RETURNS int AS $$
>> BEGIN
>>   RETURN s1.nested();
>> END;
>> $$ LANGUAGE plpgsql;
>>
>> SELECT s2.fx(); -- fails - it calls a private function from another schema.
>>
>> This proposal is simple, and strong enough to separate functions that are
>> directly callable from auxiliary functions that can only be called from other
>> functions.
>>
>> I wrote a PoC implementation; it is not hard, and it should not
>> impact performance. I introduced query_owner_nspid into the query environment.
>> When a function has the PRIVATE flag, then query_owner_nspid must be the same
>> as the function's namespace id.
>>
>> Comments, notes?
>>
>> Regards
>>
>> Pavel
>>
>>
>