Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-02 Thread Antonin Houska
Tomas Vondra  wrote:

> OK, so the reason is that when building child paths, we don't keep the
> pathkeys unless it matches the "interesting" pathkeys.
> 
> So for example we may have an IndexPath, but with pathkeys=NIL if the
> index does not match the ORDER BY we need.

I don't agree that IndexPath will necessarily have pathkeys set to NIL in such
a case. Even if the index ordering does not match the ORDER BY clause of the
query, the path can still be useful, typically for a merge join.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: CF app feature request

2018-11-02 Thread Fabien COELHO




I don't know enough about the app internals to comment. But it probably
shouldn't appear in the stats, or else should have its own category in the
stats.


Or that's closer to "Rejected by the author himself"?  "Withdrawn"
sounds like a good term for that, we surely don't want to simply remove
the thing entirely either.  What's actually the issue with not tracking
such things in the stats?


Because the same patch submission is already counted? It is a rare 
occurrence, so just a "Withdrawn" state could be enough, and slightly false 
CF stats are no big deal.


--
Fabien.



Re: Small run-time pruning doc fix

2018-11-02 Thread Peter Eisentraut
On 08/10/2018 10:22, David Rowley wrote:
> Before 5220bb7533f a note in ddl.sgml used to mention that run-time
> pruning was only implemented for Append. When we got MergeAppend
> support the commit updated this to mention MergeAppend is supported
> too. This is slightly weird as it's not all that obvious what exactly
> isn't supported when we mention:
> 
> 
>  Both of these behaviors are likely to be changed in a future release
>  of PostgreSQL.
> 
> 
> The attached patch updates this to mention that ModifyTable is
> unsupported which I think makes the above fragment make sense again.

 
- Execution-time partition pruning currently occurs for the
+ Execution-time partition pruning currently only occurs for the
  Append and MergeAppend node types.
+ It is not yet implemented for the ModifyTable node
+ type.
 

Isn't this implied by the preceding paragraph

Currently, pruning of partitions during the planning of an UPDATE or
DELETE command is implemented using the constraint exclusion method

?

Also, could there be other node types that could benefit from partition
pruning that are not implemented yet?  Not sure we want to mention all
of them.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgbench doc fix

2018-11-02 Thread Fabien COELHO



Robert,


Yes, you need to send params (thus send bind message) anyway.
Regarding re-parsing, maybe you mixed up parse analysis with
planning? Re-doing parse analysis can only be avoided if you can reuse
named (or unnamed) prepared statements.


So given this, I'm struggling to see anything wrong with the current
wording.


ISTM that the point is not that it is wrong, but it could be more precise.


I mean, if you say that you are reusing prepared statements,


It does not say "reuse" explicitly, it says

 "prepared: use extended query protocol with prepared statements."

but the extended protocol does always "prepare" statements before 
executing them; the difference is that with "-M prepared" (1) it is done 
just once and (2) the statement is named so that it can indeed be reused.
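
For illustration, here is a minimal libpq sketch of the difference (this is 
not pgbench's actual code; the query is the one from the built-in tpcb-like 
script, and "conn", "res" and "values" are assumed to be set up elsewhere):

    const char *values[1] = {"1"};

    /* what "-M extended" does: the statement is re-parsed on every execution */
    res = PQexecParams(conn,
                       "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                       1, NULL, values, NULL, NULL, 0);

    /* what "-M prepared" does: parse/analyse once, under a name ... */
    res = PQprepare(conn, "stmt1",
                    "SELECT abalance FROM pgbench_accounts WHERE aid = $1",
                    1, NULL);
    /* ... then only bind/execute on each later use, reusing the named statement */
    res = PQexecPrepared(conn, "stmt1", 1, values, NULL, NULL, 0);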


Note that "extended" prepares much more statements than "prepared":-)


someone will assume that you are avoiding preparing them repeatedly,
which -M extended will not do ... and by the nature of that approach, 
cannot do.


Sure. At the protocol level "prepare" is slightly imprecise, and the 
documentation is about the protocol used.


So I do not think a more precise wording would harm. Maybe: "prepared: use 
extended query protocol with REUSED named prepared statements" would be 
slightly less ambiguous.


--
Fabien.



Re: wal_dump output on CREATE DATABASE

2018-11-02 Thread Peter Eisentraut
On 26/10/2018 15:53, Jean-Christophe Arnu wrote:
> Example on CREATE DATABASE (without defining a template database):
> rmgr: Database    len (rec/tot): 42/    42, tx:    568, lsn:
> 0/01865790, prev 0/01865720, desc: CREATE copy dir 1/1663 to 16384/1663
> 
> It comes out (to me) that it may be more consistent to use the same template
> as the other operations in pg_waldump.
> I propose to swap the parameter positions for the copy dir operation
> output.
> 
> You'll find a patch file included which does the switching job:
> rmgr: Database    len (rec/tot): 42/    42, tx:    568, lsn:
> 0/01865790, prev 0/01865720, desc: CREATE copy dir 1663/1 to 1663/16384

I see your point.  I suspect this was mainly implemented that way
because that's how the fields occur in the underlying
xl_dbase_create_rec structure.  (But that structure also has the target
location first, so it's not entirely consistent that way either.)  We
could switch the fields around in the output, but I wonder whether we
couldn't make the whole thing a bit more readable like this:

desc: CREATE copy dir ts=1663 db=1 to ts=1663 db=16384

or maybe like this

desc: CREATE copy dir (ts/db) 1663/1 to 1663/16384

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: CF app feature request

2018-11-02 Thread Michael Paquier
On Fri, Nov 02, 2018 at 08:17:51AM +0100, Fabien COELHO wrote:
> Because the same patch submission is already counted? It is a rare
> occurrence, so just a "Withdrawn" state could be enough, and slightly false
> CF stats are no big deal.

Or as we are dealing with duplicated entries, perhaps we could just
delete the entry not wanted, which seems to be 1859 in this case.
Admins can do so.
--
Michael


signature.asc
Description: PGP signature


Re: Getting ERROR: could not open file "base/13164/t3_16388" with partition table with ON COMMIT

2018-11-02 Thread Amit Langote
Hi,

On 2018/11/02 14:27, Michael Paquier wrote:
> On Fri, Nov 02, 2018 at 02:18:04PM +0900, Michael Paquier wrote:
>> This case is funky.

Interesting indeed.

>>  The parent gets dropped at commit time, but it does
>> not know that it should drop the child as well per their dependencies.
>> This actually goes into the internals of performDeletion(), which is
>> scary to touch on back-branches just for such cases..

Well, performDeletion *does* drop the child, because when the parent is
dropped due to its ON COMMIT DROP action, it's done using:

/*
 * Since this is an automatic drop, rather than one
 * directly initiated by the user, we pass the
 * PERFORM_DELETION_INTERNAL flag.
 */
performDeletion(&object,
DROP_CASCADE, PERFORM_DELETION_INTERNAL);

Note the DROP_CASCADE, which means its children will be deleted as part of
this.

> A bit more fun with inheritance:
> =# begin;
> BEGIN
> =# create temp table aa_p (a int) on commit drop;
> CREATE TABLE
> =# create temp table aa_c (a int) inherits (aa_p) on commit delete rows;
> NOTICE:  0: merging column "a" with inherited definition
> LOCATION:  MergeAttributes, tablecmds.c:2339
> CREATE TABLE
> =# insert into aa_p values (1);
> INSERT 0 1
> =# insert into aa_c values (1);
> INSERT 0 1
> =# commit;
> NOTICE:  0: drop cascades to table aa_c
> LOCATION:  reportDependentObjects, dependency.c:995
> ERROR:  XX000: could not open relation with OID 16426
> LOCATION:  relation_open, heapam.c:1138

I think the problem here is that the ON COMMIT DROP action on the parent
is executed before the ON COMMIT DELETE ROWS on the child.  The first
action drops the child along with its parent, causing the 2nd one to fail
upon not finding the table to apply heap_truncate() to.  It's
heap_truncate() that produces "ERROR: could not open relation with OID
xxx".  So, the NOTICE comes from the *successful* dropping of the child
and the ERROR comes from failure of heap_truncate() to find it.

By the way, this error is the same as in the partitioned case.  It's simply
that the dependency type is different for regular inheritance child tables
than it is for partitions.  The former requires a user command to specify
CASCADE when dropping via the parent, whereas the latter doesn't.  So, you see
the NOTICE in the regular inheritance case, whereas you don't in the
partitioning case.  The error itself is the same in both cases.

> Let's treat that as a separate issue, as this happens also with an
> unpatched build.  The only reason why you cannot trigger it with
> partitions is that ON COMMIT is currently broken for them, so we should
> fix the reported case first.  In consequence, I would tend to commit the
> patch proposed and take care of the first, except if of course anybody
> has an objection. 

Agreed that they're two independent issues, although it wouldn't be such a
bad idea to fix them in one go, as they're both issues related to the
handling of ON COMMIT actions on tables in inheritance trees.

I've created a patch for the other issue and attached both.

Thanks,
Amit
>From 5d1cb1beda9a4d3c020933e85715cdd2423e5ba7 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 2 Nov 2018 15:26:37 +0900
Subject: [PATCH v5 2/2] Check relation still exists before applying ON COMMIT
 action

---
 src/backend/commands/tablecmds.c   | 17 +
 src/test/regress/expected/temp.out | 23 +++
 src/test/regress/sql/temp.sql  | 15 +++
 3 files changed, 55 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ba76c3f9b9..7f742dba7b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -13291,6 +13291,23 @@ PreCommit_on_commit_actions(void)
}
if (oids_to_truncate != NIL)
{
+   List   *tmp = oids_to_truncate;
+
+   /*
+    * Check that relations we're about to truncate still exist.  Some of
+    * them might be inheritance children, which would be gone as a result
+    * of their parent being dropped due to ONCOMMIT_DROP action of its
+    * own.
+    */
+   oids_to_truncate = NIL;
+   foreach(l, tmp)
+   {
+       Oid relid = lfirst_oid(l);
+
+       if (SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relid)))
+           oids_to_truncate = lappend_oid(oids_to_truncate, relid);
+   }
+
heap_truncate(oids_to_truncate);
CommandCounterIncrement();  /* XXX needed? */
}
diff --git a/src/test/regress/expected/temp.out b/src/test/regress/expected/temp.out
index ed2c591fc6..2cfc5debf1 100644
--- a/src/test/regress/expected/temp.out
+++ b/src/test/regress/expected/temp.out
@@ -213,3 +213,26 @@ select * fr

Re: bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-02 Thread Peter Eisentraut
On 02/11/2018 06:04, Pavel Stehule wrote:
> On Thu, Nov 1, 2018 at 14:31, Pavel Stehule wrote:
> 
> Cleaned patch with regress tests
> 
> 
> minor cleaning

Could you explain your analysis of the problem and how this patch
proposes to fix it?

About the patch, I suspect printing out

NOTICE:  

in the regression tests might lead to unstable results if there is
concurrent activity.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: CVE-2017-7484-induced bugs, or, btree cmp functions are not leakproof?

2018-11-02 Thread Amit Langote
On 2018/11/01 20:34, Dilip Kumar wrote:
> On Mon, Oct 29, 2018 at 2:53 PM Amit Langote wrote:
>> Anyway, why don't we just use the child table's AppendRelInfo to get the
>> parent's version of varattno instead of creating a new function?  It can
>> be done as shown in the attached revised version of the portion of the
>> patch changing selfuncs.c.  Please take a look.
> 
> +1

Okay, here are two patches:

0001 adds a new RelOptInfo member inh_root_parent that's set for
inheritance child otherrels and contains the RT index of the inheritance
parent table mentioned in the query from which they originated.

0002 is your patch that modifies examine_variable, etc. to use the
permissions granted on parent before reading stats on otherrel inheritance
child tables. I've added your name as the author in the 2nd patch.

Thanks,
Amit
From 04c2c1c2876995322309dc791866c617e2c4a539 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 26 Oct 2018 16:45:59 +0900
Subject: [PATCH 1/2] Store inheritance root parent index in otherrel's
 RelOptInfo

Although it's set by build_simple_rel, it's not being used by any
code yet.
---
 src/backend/nodes/outfuncs.c |  1 +
 src/backend/optimizer/util/relnode.c | 14 ++
 src/include/nodes/relation.h |  4 
 3 files changed, 19 insertions(+)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 69731ccdea..53a657c0ae 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2371,6 +2371,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_BOOL_FIELD(consider_partitionwise_join);
WRITE_BITMAPSET_FIELD(top_parent_relids);
WRITE_NODE_FIELD(partitioned_child_rels);
+   WRITE_UINT_FIELD(inh_root_parent);
 }
 
 static void
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 39f5729b91..29ba19349f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -215,9 +215,23 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->top_parent_relids = parent->top_parent_relids;
else
rel->top_parent_relids = bms_copy(parent->relids);
+
+   /*
+* For inheritance child relations, we also set inh_root_parent.
+* Note that 'parent' might itself be a child (a sub-partitioned
+* partition), in which case we simply use its value of
+* inh_root_parent.
+*/
+   if (parent->rtekind == RTE_RELATION)
+   rel->inh_root_parent = parent->inh_root_parent > 0 ?
+                               parent->inh_root_parent :
+                               parent->relid;
}
else
+   {
rel->top_parent_relids = NULL;
+   rel->inh_root_parent = 0;
+   }
 
/* Check type of rtable entry */
switch (rte->rtekind)
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 88d37236f7..bbc1408d05 100644
--- a/src/include/nodes/relation.h
+++ b/src/include/nodes/relation.h
@@ -703,6 +703,10 @@ typedef struct RelOptInfo
List  **partexprs;  /* Non-nullable partition key expressions. */
List  **nullable_partexprs; /* Nullable partition key expressions. */
List   *partitioned_child_rels; /* List of RT indexes. */
+
+   Index   inh_root_parent;    /* For otherrels, this is the RT index of
+                                * inheritance table mentioned in the query
+                                * from which this relation originated */
 } RelOptInfo;
 
 /*
-- 
2.11.0

From e7ce03017391f7472a9d30b039447d66f1d956c6 Mon Sep 17 00:00:00 2001
From: amit 
Date: Fri, 2 Nov 2018 16:48:25 +0900
Subject: [PATCH 2/2] Use permissions granted on parent to read stats on
 otherrel children

Author: Dilip Kumar
Reviewed by: Amit Langote
---
 src/backend/utils/adt/selfuncs.c | 58 +---
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index e0ece74bb9..698a4eca63 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4924,8 +4924,18 @@ examine_variable(PlannerInfo *root, Node *node, int varRelid,
 {
    /* Get index's table for permission check */
    RangeTblEntry *rte;
+   RelOptInfo *rel = index->rel;
+
+

Re: bugfix: BUG #15477: Procedure call with named inout refcursor parameter - "invalid input syntax for type boolean"

2018-11-02 Thread Pavel Stehule
On Fri, Nov 2, 2018 at 9:02, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 02/11/2018 06:04, Pavel Stehule wrote:
> > On Thu, Nov 1, 2018 at 14:31, Pavel Stehule wrote:
> >
> > Cleaned patch with regress tests
> >
> >
> > minor cleaning
>
> Could you explain your analysis of the problem and how this patch
> proposes to fix it?
>

The original code works with the original list of function arguments and
compares it against the argmodes fields. But when named arguments are used,
this assumption is no longer valid.

So the loop

foreach(lc, args)
{
   if (argmodes[i] == PROARGMODE_INOUT)
   {
   }
}

is just wrong

There is another issue:

row->varnos[nexpr->argnumber] = param->paramid - 1

It doesn't work if not all named arguments are INOUT.

In the case mentioned in the bug report, this line generated the operation

row->varnos[1] = param->paramid - 1; // in this case the argnumber was 1

but varnos[0] was uninitialized. Usually it held 0, which is the number of the
FOUND variable - and that produced the error message about an invalid boolean
value.

So I reorder the input arguments into the correct order (based on the info from
argmodes), and that is all.

Maybe it would be better to work with the translated argument list instead of
the original list (then reordering would not be necessary), but I am not sure
whether this information is available from the plan, and reordering in this
case is an O(N) operation where N is low, so it should not have an impact on
performance.

Regards

Pavel








> About the patch, I suspect printing out
>
> NOTICE:  
>
> in the regression tests might lead to unstable results if there is
> concurrent activity
>

True, this test could be written better; the string can be anything.


>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Vacuum Full does not release the disk size space after delete from table

2018-11-02 Thread Haozhou Wang
Hi hackers,

We have met a corner case related to the behavior of VACUUM FULL.

There are two SQL scripts

SQL1:
-
-- Test vacuum full

create schema s2;

set search_path to s2;

create table a (i int);

create table b (i int);

insert into a select generate_series(1,5);

delete from a where i > 10;

-- Disk space should release after vacuum full

vacuum full a;

drop table a, b;

reset search_path;

drop schema s2;
-

SQL2:
--
create schema s1;

set search_path to s1;

create table c(i int);

insert into c select generate_series(1,5);

select pg_sleep(5);

reset search_path;

drop schema s1;
---

If we run both SQL scripts on the same database in parallel, the "VACUUM FULL
a;" will not release the disk space.

But if we run them on the same database in sequence, "VACUUM FULL a" can
release the disk space.

We use the PG master branch, and autovacuum is set to off.

Could someone please help us to figure out the root cause of this issue?

Thanks a lot!

-- 
Regards,
Haozhou


Re: CF app feature request

2018-11-02 Thread Fabien COELHO


Bonjour Michaël,


Because the same patch submission is already counted? It is a rare
occurrence, so just a "Withdrawn" state could be enough, and slightly false
CF stats are no big deal.


Or as we are dealing with duplicated entries, perhaps we could just
delete the entry not wanted, which seems to be 1859 in this case.
Admins can do so.


Good. Please proceed!

So the solution is to contact an admin (though there is no clear indication 
of who is an admin) to remove the entry.


--
Fabien.

pg_ls_dir_files issue with concurrent file unlink

2018-11-02 Thread Sergei Kornilov
Hello
We have some functions for directory listing, e.g. pg_ls_dir, pg_ls_logdir, 
pg_ls_waldir. All of them call the pg_ls_dir_files function 
(src/backend/utils/adt/genfile.c), and I have a question about the per-file 
error in this function.

>   /* Get the file info */
>   snprintf(path, sizeof(path), "%s/%s", fctx->location, 
> de->d_name);
>   if (stat(path, &attrib) < 0)
>   ereport(ERROR,
>   (errcode_for_file_access(),
>errmsg("could not stat directory 
> \"%s\": %m", dir)));

We try to stat each file in the directory, but the error talks only about the 
directory. For example, it is possible to get a very unexpected (and scary) 
error for pg_ls_dir during a checkpoint:

> postgres=# SELECT sum(size) AS size_bytes FROM pg_ls_waldir();
> ERROR:  could not stat directory "pg_wal": No such file or directory

Is this expected? Maybe it would be better to skip ENOENT files during the 
directory listing? Or change the error to "could not stat file in directory 
\"%s\"" if we do not want to report the filename for some reason. Or report 
the error for the file path instead of the directory.
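
For illustration, a minimal sketch of the "skip ENOENT, report the file path" 
option, based on the snippet quoted above (untested, and assuming the 
surrounding code loops over ReadDir() so that "continue" simply moves on to 
the next entry):

    /* Get the file info */
    snprintf(path, sizeof(path), "%s/%s", fctx->location, de->d_name);
    if (stat(path, &attrib) < 0)
    {
        /* hypothetical: the file was removed concurrently, just skip it */
        if (errno == ENOENT)
            continue;
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not stat file \"%s\": %m", path)));
    }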

regards, Sergei



Re: [HACKERS] Block level parallel vacuum

2018-11-02 Thread Masahiko Sawada
Hi,

On Thu, Nov 1, 2018 at 2:28 PM Yura Sokolov  wrote:
>
> Excuse me for being noisy.
>
> Increasing vacuum's ring buffer improves vacuum upto 6 times.
> https://www.postgresql.org/message-id/flat/20170720190405.GM1769%40tamriel.snowman.net
> This is one-line change.
>
> How much improvement parallel vacuum gives?

It depends on hardware resources you can use.

In the current design, heap scanning and heap vacuuming are processed
by parallel workers at the block level (using parallel sequential scan),
and index vacuuming is also processed by parallel workers at the index
level. So even if a table is not large, the more indexes it has, the
better the performance you can get. The performance test result (attached)
I did before shows that parallel vacuum is up to almost 10 times faster
than single-process vacuum in one case. The test used a not-so-large table
(a 4GB table) with many indexes, but it would be interesting to test with
a large table.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: Resetting PGPROC atomics in ProcessInit()

2018-11-02 Thread Amit Kapila
On Sat, Oct 27, 2018 at 9:42 PM Amit Kapila  wrote:
>
> On Sat, Oct 27, 2018 at 8:52 PM Andres Freund  wrote:
> > On October 27, 2018 3:36:45 PM GMT+01:00, Amit Kapila 
> >  wrote:
> > >On Sat, Oct 27, 2018 at 4:11 PM Andres Freund 
> > >wrote:
> > >>
> > >> Hi,
> > >>
> > >> I just noticed, while working on a patch adding things to PGPROC,
> > >that
> > >> the group clearning patches for the proc array and clog reset atomics
> > >in
> > >> InitProcess().
> > >>
> > >> I'm not a big fan of that, because it means that it's not safe to
> > >look
> > >> at the atomics of backends that aren't currently in use.  Is there
> > >any
> > >> reason to not instead initialize them in InitProcGlobal() and just
> > >> assert in InitProcess() that they're 0?
> > >>
> > >
> > >It seems the code written has followed a natural practice i.e PGPROC
> > >members are initialized in InitProcess and ProcGlobal members (like
> > >procArrayGroupFirst) are initialized in InitProcGlobal.  For your use
> > >case, can't you look at procArrayGroupFirst?  If not, then I think we
> > >can do what you are saying as I don't see a problem in initializing
> > >them in InitProcGlobal.
> >
> > In my opinion that's an argument for resetting the contents with 
> > pg_atomic_write, but not reinitializing the atomic
> >
>
> Okay, makes sense.
>
> > (which could reset the spinlock inside while somebody else holds it).
> >
>
> This part is not clear to me, how can this happen?  I think we only
> access these variable for active procs which means no-one can hold it
> till it's reinitialized.
>
> > It's not really a problem for me, but I think the code is pretty much wrong 
> > like this...
> >
>
> I think I understand why it is better to write the way you are
> suggesting, but not clear how the current code can lead to a problem,
> can you please explain in more detail?
>

You haven't confirmed this part.

Do you want to see this change?  I think if we make this change, we
should backport it as well, and I am not sure we should make such a
change in back-branches without a strong reason.
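
For reference, a sketch of what I understand Andres to be suggesting (field
names taken from the current tree, loop and locking context elided; this is
not a proposed patch):

    /* In InitProcGlobal(), initialize each PGPROC's atomics exactly once: */
    pg_atomic_init_u32(&(procs[i].procArrayGroupNext), INVALID_PGPROCNO);
    pg_atomic_init_u32(&(procs[i].clogGroupNext), INVALID_PGPROCNO);

    /*
     * In InitProcess(), only reset the value (or just Assert that it is
     * already INVALID_PGPROCNO), rather than re-running pg_atomic_init_u32():
     */
    pg_atomic_write_u32(&MyProc->procArrayGroupNext, INVALID_PGPROCNO);
    pg_atomic_write_u32(&MyProc->clogGroupNext, INVALID_PGPROCNO);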

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-11-02 Thread Andrey Lepikhov

I have done a code review.
For now, it covers the first patch - v6-0001... - dedicated to logical 
duplicate ordering.


Documentation is full and clear. All non-trivial logic is commented 
accurately.


The patch applies cleanly on top of current master. Regression tests passed, 
and my "Retail Indextuple deletion" use cases work without mistakes.

But I have two comments on the code.
The new BTScanInsert structure reduces the parameter lists of many functions 
and looks fine. But it contains an optimization part (the 'restorebinsrch' 
field et al.) that is used very locally in the code, in the 
_bt_findinsertloc() -> _bt_binsrch() call path. Maybe you could localize 
this logic into a separate struct that is passed to _bt_binsrch() as a 
pointer; other callers could pass a NULL value. That might simplify the 
usability of the struct.
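
Just to illustrate the idea, a hypothetical sketch (all names here are made 
up, not taken from the patch):

    /* Hypothetical: binary-search cache kept outside BTScanInsert */
    typedef struct BTBinsrchCache
    {
        OffsetNumber low;       /* remembered lower bound */
        OffsetNumber high;      /* remembered upper bound */
        bool         valid;     /* may the saved bounds be reused? */
    } BTBinsrchCache;

    /* _bt_findinsertloc() passes a cache; other callers just pass NULL */
    static OffsetNumber _bt_binsrch(Relation rel, BTScanInsert key,
                                    Buffer buf, BTBinsrchCache *cache);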


Due to this optimization, _bt_binsrch() has roughly doubled in size. Maybe 
you could move this part into a separate service routine?



--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company



Re: INSTALL file

2018-11-02 Thread Stephen Frost
Greetings,

* Michael Paquier (mich...@paquier.xyz) wrote:
> On Thu, Nov 01, 2018 at 08:42:34PM -0400, Stephen Frost wrote:
> > If we go down this route, the master branch should probably link to the
> > regularly built devel documentation, so that if/when we do make such a
> > change, we'll point people at the updated documentation too.
> 
> I don't know how others feel about such things, so I may be
> overengineering a simple problem as well :)
> 
> Also, I have not looked in details at the perl tools used to change the
> version number in the tree, but we'd likely want a solution which
> updates automatically the README also in this case depending on the
> version number, perhaps also making sure that for a git repository with
> the master branch in use we don't want to point to the development
> version of the docs.

Updating version_stamp.pl should be pretty straightforward to have it
also update the README for the back-branches, and it shouldn't be hard to
teach it to ignore the README on the master branch.

As for what's in the README on the master branch, I was saying that it
*should* point to the development documentation, since that should be
current with whatever is actually in the git repo (or only a day behind
or such).

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] Change simple_heap_insert() to a macro

2018-11-02 Thread Peter Eisentraut
On 12/10/2018 12:09, Andrey Klychkov wrote:
> I don't see any reasons against to use the proposed macro instead of
> this function.

Macros are weird and should be avoided if possible.  If we were to do
this, it should be an inline function, I think.  But I think it's not
useful here.
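
(A generic illustration of the difference, unrelated to the actual heapam
code: a macro may evaluate its arguments more than once and is not
type-checked, whereas a static inline function has neither problem and can
still be inlined by the compiler.)

    /* SQUARE(i++) would increment i twice */
    #define SQUARE(x)   ((x) * (x))

    /* the inline function evaluates its argument exactly once */
    static inline int
    square(int x)
    {
        return x * x;
    }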

I think there have been enough votes against this that I'll close this
CF item.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: zheap: a new storage format for PostgreSQL

2018-11-02 Thread Amit Kapila
On Thu, Nov 1, 2018 at 7:26 PM Tomas Vondra
 wrote:
>
> On 11/01/2018 07:43 AM, Amit Kapila wrote:
> >
> > You can find the latest code at https://github.com/EnterpriseDB/zheap
> >
>
> Seems valgrind complains about a couple of places in the code - nothing
> major, might be noise, but probably worth a look.
>

I have looked at the report and one of those seems to be problematic,
so I have pushed a fix for it.  The other one, for the stack below,
seems to be bogus:
==7569==  Uninitialised value was created by a stack allocation
==7569==at 0x59043D: znocachegetattr (zheapam.c:6206)
==7569==
{
   
   Memcheck:Cond
   fun:ZHeapDetermineModifiedColumns
   fun:zheap_update

I have checked that if we initialize the value of ret_datum in the function
znocachegetattr, it fixes the reported error, but actually there is no need
to do so, as the code always assigns a valid value to this variable.  I have
left it as is for now, as I am not sure whether there is any value in doing
such an initialization.

Thanks for the report.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Christian Ohler  writes:
> > On Wed, Oct 31, 2018 at 7:22 AM Tom Lane  wrote:
> >> If we're going to expose the
> >> internal format, let's just change the definition of the type's binary
> >> I/O format, thereby getting a win for purposes like COPY BINARY as well.
> 
> > How would this work from the driver's and application's perspective?  What
> > does the driver do when receiving JSONB data?
> 
> Well, upthread it was posited that applications that read binary JSONB
> data would be willing to track changes in that format (or else have no
> need to, because they don't do anything with it except feed it back to the
> server).  If that isn't the case, then this entire thread is a waste of
> time.  I certainly don't buy that exposing the internal format via some
> other mechanism than binary I/O would be a sufficient excuse for not
> worrying about cross-version compatibility.

Yes, I agree that the applications/libraries would need to be expecting
and able to handle changes to the format, though, of course, we'd only
change the format across major versions- we wouldn't do it in a point
release.  There might be some argument for supporting multiple versions
also, though we don't really have support for anything like that today,
unfortunately.

> > The idea behind the proposal is to improve efficiency by avoiding
> > conversions, and the most straightforward way to do that is for every layer
> > to pass through the raw bytes.
> 
> This argument is, frankly, as bogus as it could possibly be.  In the first
> place, you're essentially saying that ignoring version compatibility
> considerations entirely is the way to avoid future version compatibility
> problems.  I don't buy it.  In the second place, you already admitted
> that format conversion *is* necessary; what PG finds best internally is
> unlikely to be exactly what some other piece of software will want.
> So we'd be better off agreeing on some common interchange format.

There will definitely need to be *some* kind of conversion happening,
the point of doing this would be to give us similar benefits as we have
when passing binary data with the PG protocol today- sure, the data as
the application gets it back probably isn't exactly in the format the
application would like, but at least it's just moving bytes around
typically (network byte order to host byte order, putting things into
some application-side structure, etc).  Today, with jsonb, we have the
data stored as binary, but we have to go through the
binary->text->binary conversion for every binary value and that's far
from free.

> I'm still bemused by the proposition that that common interchange format
> shouldn't be, um, JSON.  We've already seen BSON, BJSON, etc die
> well-deserved deaths.  Why would jsonb internal format, which was never
> for one second intended to be seen anywhere outside PG, be a better
> interchange-format design than those were?

I'd suggest that it's because our jsonb format doesn't have the
limitations that the others had.  That said, I also agree that we
wouldn't stream out the *exact* format that's on disk, but the point, at
least to me, is to avoid bouncing back and forth between binary and text
representation of things like integers, floats, timestamps, etc, where
we already support binary-format results being sent out of PG to an
application.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: WIP: Avoid creation of the free space map for small tables

2018-11-02 Thread Amit Kapila
On Wed, Oct 31, 2018 at 10:29 PM Robert Haas  wrote:
>
> On Tue, Oct 23, 2018 at 9:42 AM John Naylor  wrote:
> > A case could be made for setting the threshold to 4, since not having
> > 3 blocks of FSM in shared buffers exactly makes up for the 3 other
> > blocks of heap that are checked when free space runs out.
>
> That doesn't seem like an unreasonable argument.  I'm not sure whether
> the right threshold is 4 or something a little bigger, but I bet it's
> not very large.  It seems important to me that before anybody thinks
> about committing this, we construct some kind of destruction case
> where repeated scans of the whole table are triggered as frequently as
> possible, and then run that test with varying thresholds.
>

Why do you think repeated scans will be a destruction case when there
is no FSM for a small table?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: COPY FROM WHEN condition

2018-11-02 Thread Surafel Temesgen
hi,

On Wed, Oct 31, 2018 at 10:54 AM Masahiko Sawada 
wrote:

> On Tue, Oct 30, 2018 at 11:47 PM Surafel Temesgen 
> wrote:
>
> I've looked at this patch and tested.
>
> When I use a function returning string in WHEN clause I got the following
> error:
>
> =# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) = 'hello');
> ERROR:  could not determine which collation to use for lower() function
> HINT:  Use the COLLATE clause to set the collation explicitly.
> CONTEXT:  COPY hoge, line 1: "1,hoge,2018-01-01"
>
> And then although I specified COLLATE I got an another error (127 =
> T_CollateExpr):
>
> =# copy test from '/tmp/aaa.csv' (format 'csv') when (lower(t) collate
> "en_US" = 'hello');
> ERROR:  unrecognized node type: 127
>
> This error doesn't happen if I put the similar condition in WHEN
> clause for triggers. I think the patch needs to produce a reasonable
> error message.
>
The attached patch includes a fix for it. Can you confirm it?
regards
Surafel
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68d95..f9350afcdb 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -25,6 +25,7 @@ PostgreSQL documentation
 COPY table_name [ ( column_name [, ...] ) ]
 FROM { 'filename' | PROGRAM 'command' | STDIN }
 [ [ WITH ] ( option [, ...] ) ]
+[ WHEN ( condition ) ]
 
 COPY { table_name [ ( column_name [, ...] ) ] | ( query ) }
 TO { 'filename' | PROGRAM 'command' | STDOUT }
@@ -364,6 +365,30 @@ COPY { table_name [ ( 

 
+   
+WHEN
+
+   
+The optional WHEN clause has the general form
+
+WHEN condition
+
+where condition is
+any expression that evaluates to a result of type
+boolean.  Any row that does not satisfy this
+condition will not be inserted to the table.  A row satisfies the
+condition if it returns true when the actual row values are
+substituted for any variable references.
+   
+
+   
+Currently, WHEN expressions cannot contain
+subqueries.
+   
+
+
+   
+
   
  
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 9bc67ce60f..5e6c1d6133 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -38,7 +38,11 @@
 #include "miscadmin.h"
 #include "optimizer/clauses.h"
 #include "optimizer/planner.h"
+#include "optimizer/prep.h"
 #include "nodes/makefuncs.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_collate.h"
+#include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
@@ -147,6 +151,7 @@ typedef struct CopyStateData
 	bool		convert_selectively;	/* do selective binary conversion? */
 	List	   *convert_select; /* list of column names (can be NIL) */
 	bool	   *convert_select_flags;	/* per-column CSV/TEXT CS flags */
+	Node	   *whenClause;	/* WHEN condition (or NULL) */
 
 	/* these are just for error messages, see CopyFromErrorCallback */
 	const char *cur_relname;	/* table name for error messages */
@@ -178,6 +183,7 @@ typedef struct CopyStateData
 	ExprState **defexprs;		/* array of default att expressions */
 	bool		volatile_defexprs;	/* is any of defexprs volatile? */
 	List	   *range_table;
+	ExprState  *qualexpr;
 
 	TransitionCaptureState *transition_capture;
 
@@ -797,6 +803,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 	Relation	rel;
 	Oid			relid;
 	RawStmt*query = NULL;
+	Node*whenClause= NULL;
 
 	/*
 	 * Disallow COPY to/from file or program except to users with the
@@ -849,6 +856,25 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 		rte = addRangeTableEntryForRelation(pstate, rel, NULL, false, false);
 		rte->requiredPerms = (is_from ? ACL_INSERT : ACL_SELECT);
 
+		if (stmt->whenClause)
+		{
+			/* add rte to column namespace  */
+			addRTEtoQuery(pstate, rte, false, true, true);
+
+			/* Transform the raw expression tree */
+			whenClause = transformExpr(pstate, stmt->whenClause, EXPR_KIND_COPY_FROM_WHEN);
+
+			/*  Make sure it yields a boolean result. */
+			whenClause = coerce_to_boolean(pstate, whenClause, "WHEN");
+			/* we have to fix its collations too */
+			assign_expr_collations(pstate, whenClause);
+
+			whenClause = eval_const_expressions(NULL, whenClause);
+
+			whenClause = (Node *) canonicalize_qual((Expr *) whenClause, false);
+			whenClause = (Node *) make_ands_implicit((Expr *) whenClause);
+		}
+
 		tupDesc = RelationGetDescr(rel);
 		attnums = CopyGetAttnums(tupDesc, rel, stmt->attlist);
 		foreach(cur, attnums)
@@ -997,6 +1023,7 @@ DoCopy(ParseState *pstate, const CopyStmt *stmt,
 
 		cstate = BeginCopyFrom(pstate, rel, stmt->filename, stmt->is_program,
 			   NULL, stmt->attlist, stmt->options);
+		cstate->whenClause=whenClause;
 		*processed = CopyFrom(cstate);	/* copy from file to database */
 		EndCopyFrom(cstate);
 	}
@@ -2534,6 +2561,10 @@ CopyFrom(CopyState cstate)
 			ExecSetupChildParentMapForLeaf(proute);
 	}
 
+	if (cstate->whenClause)
+		cstate->qualexpr = ExecInit

Re: COPY FROM WHEN condition

2018-11-02 Thread Daniel Verite
Pavel Stehule wrote:

> > SELECT x.a, sum(x.b)
> > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> > numeric, c text, d date, e json) )
> > WHERE x.d >= '2018-11-01'
> >
> >
> Without some special feature this example is not extra useful. It is based
> on copy on server that can use only super user with full rights.

And if superuser, one might use the file foreign data wrapper [1] to get
the same results without the need for the explicit COPY in the query.

BTW, this brings into question whether the [WHEN condition] clause
should be included in the options of file_fdw, as it supports pretty
much all COPY options.

Also, psql's \copy should gain the option too?


[1] https://www.postgresql.org/docs/current/static/file-fdw.html


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: csv format for psql

2018-11-02 Thread Daniel Verite
Michael Paquier wrote:

> If you can produce a new  version, please feel free to post it.

Here's a rebased version with a couple regression tests
added per the discussions during the previous CF.

Now at https://commitfest.postgresql.org/20/1861/

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 13a8b68..98147ef 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -672,6 +672,10 @@ COPY count
 
   
CSV Format
+   
+CSV
+in COPY
+   
 

 This format option is used for importing and exporting the Comma
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index eb9d93a..7617c5e 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -152,6 +152,20 @@ EOF
 
 
 
+  --csv
+  
+  
+  
+   CSV
+   in psql
+  
+  Switches to CSV output mode. This is equivalent
+  to \pset format csv.
+  
+  
+
+
+
   -d dbname
   --dbname=dbname
   
@@ -2557,6 +2571,19 @@ lo_import 152801
   
 
   
+  fieldsep_csv
+  
+  
+  Specifies the field separator to be used in the
+  CSV format. When the separator appears in a field
+  value, that field is output inside double quotes according to
+  CSV rules. To set a tab as field separator, type
+  \pset fieldsep_csv '\t'. The default is a comma.
+  
+  
+  
+
+  
   fieldsep_zero
   
   
@@ -2584,12 +2611,11 @@ lo_import 152801
   format
   
   
-  Sets the output format to one of unaligned,
-  aligned, wrapped,
-  html, asciidoc,
+  Sets the output format to one of aligned,
+  asciidoc, csv, html,
   latex (uses tabular),
-  latex-longtable, or
-  troff-ms.
+  latex-longtable, troff-ms,
+  unaligned, or wrapped.
   Unique abbreviations are allowed.  (That would mean one letter
   is enough.)
   
@@ -2597,14 +2623,27 @@ lo_import 152801
  unaligned format writes all columns of a row on one
   line, separated by the currently active field separator. This
   is useful for creating output that might be intended to be read
-  in by other programs (for example, tab-separated or comma-separated
-  format).
+  in by other programs.
   
 
  aligned format is the standard, human-readable,
   nicely formatted text output;  this is the default.
   
 
+ csv format writes columns separated by
+ commas, applying the quoting rules described in RFC 4180.
+ Alternative separators can be selected with
+ \pset fieldsep_csv.
+ The output is compatible with the CSV format of the
+ COPY command. The header with column names
+ is output unless the tuples_only parameter is
+ on. Title and footers are not printed.
+ Each row is terminated by the system-dependent end-of-line character,
+ which is typically a single newline (\n) for
+ Unix-like systems or a carriage return and newline sequence
+ (\r\n) for Microsoft Windows.
+ 
+
  wrapped format is like aligned but wraps
   wide data values across lines to make the output fit in the target
   column width.  The target width is determined as described under
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 5b4d54a..8d0ad71 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -1941,8 +1941,8 @@ exec_command_pset(PsqlScanState scan_state, bool active_branch)
 
int i;
static const char *const my_list[] = {
-   "border", "columns", "expanded", "fieldsep", 
"fieldsep_zero",
-   "footer", "format", "linestyle", "null",
+   "border", "columns", "expanded", "fieldsep", 
"fieldsep_csv",
+   "fieldsep_zero", "footer", "format", 
"linestyle", "null",
"numericlocale", "pager", "pager_min_lines",
"recordsep", "recordsep_zero",
"tableattr", "title", "tuples_only",
@@ -3584,6 +3584,9 @@ _align2string(enum printFormat in)
case PRINT_TROFF_MS:
return "troff-ms";
break;
+   case PRINT_CSV:
+   return "csv";
+   break;
}
return "unknown";
 }
@@ -3639,25 +3642,27 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
{
if (!

Fix various typos around the tree

2018-11-02 Thread Daniel Gustafsson
Attached patch fixes a set of typos found here and there around the tree.  Some
are in the documentation which may be of interest for backporting, but I’ve
included all in the same patch.

cheers ./daniel



typos-various.patch
Description: Binary data


Logical decoding on a standby node - any progress?

2018-11-02 Thread Josef Machytka
Hello!

On stackoverflow I got a hint to ask on this mailing list...

I read this question on stackoverflow: Logical decoding on a standby node

and checked the PostgreSQL documentation and PG mailing lists, but I did not
find any up-to-date info about the possibility of using a hot standby replica
as a master for logical replication to other databases.

Is there any progress in implementation of this feature?

Many thanks.

Josef Machytka

Berlin


Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Daniel Gustafsson
> On 2 Nov 2018, at 04:21, Tom Lane  wrote:

> (In short, I remain unconvinced that we'd not be better off spending
> our effort on making json_out faster...)

Shooting wildly from the hip, isn't this a case where we can potentially
utilize the JIT infrastructure to speed it up?

cheers ./daniel



Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-02 Thread Tomas Vondra
On 11/02/2018 08:16 AM, Antonin Houska wrote:
> Tomas Vondra  wrote:
> 
>> OK, so the reason is that when building child paths, we don't keep
>> the pathkeys unless it matches the "interesting" pathkeys.
>>
>> So for example we may have an IndexPath, but with pathkeys=NIL if
>> the index does not match the ORDER BY we need.
> 
> I don't agree that IndexPath will necessarily have pathkeys set to 
> NIL in such a case. Even if the index ordering does not match ORDER
> BY clause of the query, the path can still be useful, typically for
> merge join.
> 

But wouldn't that mean there's a MergeJoin below the Limit? AFAIK we
don't push limit_tuples to that node.

After looking at this a bit more after a healthy dose of coffee, I think
for this issue to happen the Limit would have to be placed immediately
above the MergeAppend node. But if the ordering does not match, there
will be an explicit Sort node in between, and we'll push the limit only
to that node (and not below). Which is probably what's happening in the
incremental sort query, BTW.

I certainly agree correctness must not depend on costing. But I don't
think that's really the case here - what you mentioned is merely one
part of the optimization, but there are other bits that make it work. At
least that's my understanding - if you could construct a counter-example
demonstrating the failure, that'd be a clear proof of course.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fix various typos around the tree

2018-11-02 Thread Magnus Hagander
On Fri, Nov 2, 2018 at 1:37 PM Daniel Gustafsson  wrote:

> Attached patch fixes a set of typos found here and there around the tree.
> Some
> are in the documentation which may be of interest for backporting, but I’ve
> included all in the same patch.
>

Pushed, with back-patching of the docs changes. Thanks!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: BUG #15449: file_fdw using program cause exit code error when using LIMIT

2018-11-02 Thread Etsuro Fujita

(2018/10/29 15:58), Kyotaro HORIGUCHI wrote:

At Tue, 23 Oct 2018 13:21:31 +0100, Tom Lane  wrote in <18397.1540297...@sss.pgh.pa.us>

After a bit of thought, the problem here is blindingly obvious:
we generally run the backend with SIGPIPE handing set to SIG_IGN,
and evidently popen() allows the called program to inherit that,
at least on these platforms.

So what we need to do is make sure the called program inherits SIG_DFL
handling for SIGPIPE, and then special-case that result as not being
a failure.  The attached POC patch does that and successfully makes
the file_fdw problem go away for me.


Thanks for working on this!


It's just a POC because there are some things that need more thought
than I've given them:

1. Is it OK to revert SIGPIPE to default processing for *all* programs
launched through OpenPipeStream?  If not, what infrastructure do we
need to add to control that?  In particular, is it sane to revert
SIGPIPE for a pipe program that we will write to not read from?
(I thought probably yes, because that is the normal Unix behavior,
but it could be debated.)


Forcing to ignore SIGPIPE lets the program handle EPIPE, which
might not be handled properly since it would be unexpected by the
program.

If we don't read from the pipe (that is, we are writing to it),
anyway we shouldn't change the behavior since SIGPIPE can come
from another pipe.


I'm a bit confused here.  Horiguchi-san, are you saying that the called 
program that we will read from should inherit SIG_DFL for SIGPIPE, as 
proposed in the POC patch, but the called program that we will write to 
should inherit SIG_IGN as it currently does?


ISTM that it would be OK to inherit SIG_DFL in both cases, because I 
think it would be the responsibility of the called program to handle 
SIGPIPE properly (if necessary) in both cases.  Maybe I'm missing 
something, though.



2. Likewise, is it always OK for ClosePipeToProgram to ignore a
SIGPIPE failure?  (For ordinary COPY, maybe it shouldn't, since
we don't intend to terminate that early.)


Since the SIGPIPE may come from another pipe, I think we
shouldn't generally.


Agreed; if ClosePipeToProgram ignores that failure, we would fail to get 
a better error message in CopySendEndOfRow if the called program 
(inheriting SIG_DFL for SIGPIPE) was terminated on SIGPIPE:


if (cstate->is_program)
{
    if (errno == EPIPE)
    {
        /*
         * The pipe will be closed automatically on error at
         * the end of transaction, but we might get a better
         * error message from the subprocess' exit code than
         * just "Broken Pipe"
         */
        ClosePipeToProgram(cstate);

        /*
         * If ClosePipeToProgram() didn't throw an error, the
         * program terminated normally, but closed the pipe
         * first. Restore errno, and throw an error.
         */
        errno = EPIPE;
    }
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not write to COPY program: %m")));
}


But if we explicitly stop reading from
the pipe before detecting an error, it can be ignored since we
aren't interested in further failures.


You mean that we should ignore other failures of the called program when 
we stop reading from the pipe early?



Thus ClosePipeToProgram
might need an extra parameter or CopyState may need an additional
flag named, say, "explicit_close" (or
"ignore_sigpipe_on_pclose"..) in CopyState which will be set in
EndCopy.


3. Maybe this should be implemented at some higher level?


I'm not sure, but it seems to me not needed.


ISTM that it's a good idea to control the ClosePipeToProgram behavior with 
a new member of CopyState.



4. Are there any other signals we ought to be reverting to default
behavior before launching a COPY TO/FROM PROGRAM?


All handlers other than SIG_DFL and SIG_IGN are reset on exec().
Only SIGUSR2 seems to be set to SIG_IGN, but it's not likely to
harm anything. Perhaps it would be safer against future changes
if we explicitly reset all changed actions to SIG_DFL, but it
might be too much..


Not sure, but reverting SIGPIPE to default would be enough as a fix for 
the original issue, if we go with the POC patch.
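
For what it's worth, a minimal standalone sketch of that approach (temporarily
restoring SIG_DFL around popen() so the child inherits default SIGPIPE
handling; this is not the actual POC patch, and the function name is made up):

    #include <signal.h>
    #include <stdio.h>

    static FILE *
    popen_with_default_sigpipe(const char *command, const char *mode)
    {
        FILE       *fp;
        void        (*save_handler) (int);

        /* let the child program start with default SIGPIPE handling */
        save_handler = signal(SIGPIPE, SIG_DFL);
        fp = popen(command, mode);
        /* the backend itself keeps ignoring SIGPIPE */
        signal(SIGPIPE, save_handler);
        return fp;
    }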


Best regards,
Etsuro Fujita



Re: pread() and pwrite()

2018-11-02 Thread Jesper Pedersen

Hi Thomas,

On 10/9/18 4:56 PM, Thomas Munro wrote:

Thanks, much nicer.  Rebased.



This still applies, and passes make check-world.

I wonder what the commit policy is on this if the Windows part isn't 
included. I read Heikki's comment [1] as saying it would be OK to commit 
this, benefiting all platforms that have pread/pwrite.


The functions in [2] could be a follow-up patch as well.

[1] 
https://www.postgresql.org/message-id/6cc7c8dd-29f9-7d75-d18a-99f19c076d10%40iki.fi
[2] 
https://www.postgresql.org/message-id/c2f56d0a-cadd-3df1-ae48-b84dc8128c37%40redhat.com


Best regards,
 Jesper




[PATCH] Improvements to "Getting started" tutorial for Google Code-in task

2018-11-02 Thread LAM JUN RONG
Hi,



I’m a student taking part in Google Code-in 2018. The 
task I am currently working on, 
https://codein.withgoogle.com/dashboard/task-instances/6406170207059968/, 
requires that I review and improve the “Getting Started” tutorial in the 
PostgreSQL docs, and submit a patch to this mailing list.



After reviewing the documentation, I found some parts to be slightly unclear. 
For example, in section 1.3 on creating databases, I found “no response” a bit 
unclear or ambiguous, so I replaced it with “exit without any error messages”.



After some experimentation, I found that a part of the documentation was 
incorrect. In Section 1.3, it was stated that “Database names must have an 
alphabetic first character”. However, I was able to create databases with names 
like “1234”, “$” or even “😀😀”. Hence, I decided to remove that sentence.



A diff of my changes is attached.



Thank you and I would appreciate any feedback that would make my first Postgres 
patch better!

Jun Rong



diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 4487d0cfd1..2dc2b82d9a 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -75,7 +75,7 @@ su - postgres
  make programs or older GNU make versions will not work.
  (GNU make is sometimes installed under
   the name gmake.)  To test for GNU
-  make enter:
+  make and check its version, enter:
 
 make --version
 
@@ -385,8 +385,8 @@ su - postgres
 This script will run a number of tests to determine values for various
 system dependent variables and detect any quirks of your
 operating system, and finally will create several files in the
-build tree to record what it found.  You can also run
-configure in a directory outside the source
+build tree to record what it found.  If it does not print any error messages, configuration was successful.
+You can also run configure in a directory outside the source
 tree, if you want to keep the build directory separate.  This
 procedure is also called a
 VPATHVPATH
@@ -1610,6 +1610,15 @@ su - postgres
 
 All of PostgreSQL successfully made. Ready to install.
 
+If you see an error message like:
+
+ERROR: `flex' is missing on your system. It is needed to create the
+file `bootscanner.c'. You can either get flex from a GNU mirror site
+or download an official distribution of PostgreSQL, which contains
+pre-packaged flex output.
+
+then one or more of the packages PostgreSQL requires is not installed.
+See  for the required packages.

 
   
diff --git a/doc/src/sgml/start.sgml b/doc/src/sgml/start.sgml
index 5b73557835..4e3c86 100644
--- a/doc/src/sgml/start.sgml
+++ b/doc/src/sgml/start.sgml
@@ -19,8 +19,8 @@
 

 If you are not sure whether PostgreSQL
-is already available or whether you can use it for your
-experimentation then you can install it yourself.  Doing so is not
+is already available for your experimentation,
+you can install it yourself.  Doing so is not
 hard and it can be a good exercise.
 PostgreSQL can be installed by any
 unprivileged user; no superuser (root)
@@ -83,7 +83,7 @@
 
  
   
-   The user's client (frontend) application that wants to perform
+   The user's client (frontend), an application that wants to perform
database operations.  Client applications can be very diverse
in nature:  a client could be a text-oriented tool, a graphical
application, a web server that accesses the database to
@@ -99,7 +99,7 @@
 

 As is typical of client/server applications, the client and the
-server can be on different hosts.  In that case they communicate
+server can be on different machines or networks.  In that case they communicate
 over a TCP/IP network connection.  You should keep this in mind,
 because the files that can be accessed on a client machine might
 not be accessible (or might only be accessible using a different
@@ -153,7 +153,7 @@
 
 $ createdb mydb
 
-If this produces no response then this step was successful and you can skip over the
+If this exits without any error message then this step was successful and you can skip over the
 remainder of this section.

 
@@ -168,7 +168,7 @@ createdb: command not found
 
 $ /usr/local/pgsql/bin/createdb mydb
 
-The path at your site might be different.  Contact your site
+The path at your site's server might be different.  Contact your site
 administrator or check the installation instructions to
 correct the situation.

@@ -240,12 +240,11 @@ createdb: database creation failed: ERROR:  permission denied to create database

 You can also create databases with other names.
 PostgreSQL allows you to create any
-number of databases at a given site.  Database names must have an
-alphabetic first character and are limited to 63 bytes in
-leng

Re: zheap: a new storage format for PostgreSQL

2018-11-02 Thread Tomas Vondra



On 11/02/2018 12:12 PM, Amit Kapila wrote:
> On Thu, Nov 1, 2018 at 7:26 PM Tomas Vondra
>  wrote:
>>
>> On 11/01/2018 07:43 AM, Amit Kapila wrote:
>>>
>>> You can find the latest code at https://github.com/EnterpriseDB/zheap
>>>
>>
>> Seems valgrind complains about a couple of places in the code - nothing
>> major, might be noise, but probably worth a look.
>>
> 
> I have looked at the report and one of those seems to be problematic,
> so I have pushed the fix for the same.  The other one for below stack
> seems to be bogus:
> ==7569==  Uninitialised value was created by a stack allocation
> ==7569==at 0x59043D: znocachegetattr (zheapam.c:6206)
> ==7569==
> {
>
>Memcheck:Cond
>fun:ZHeapDetermineModifiedColumns
>fun:zheap_update
> 
> I have checked in the function znocachegetattr that if we initialize
> the value of ret_datum, it fixes the reported error, but actually
> there is no need for doing it as the code always assign the valid
> value to this variable.  I have left it as is for now as I am not sure
> whether there is any value in doing such an initialization.
> 

Well, the problem is that ret_datum is modified like this:


thisatt = TupleDescAttr(tupleDesc, attnum);
if (thisatt->attbyval)
memcpy(&ret_datum, tp + off, thisatt->attlen);
else
ret_datum = PointerGetDatum((char *) (tp + off));

which means that for cases with attlen < sizeof(Datum), this ends up
leaving part of the value undefined. So it's a valid issue. I'm sure
it's not the only place where we do something like this, and the other
places don't trigger the valgrind warning, so how do those places do
this? heapam seems to call fetch_att in the end, which essentially calls
Int32GetDatum/Int16GetDatum/CharGetDatum, so why not use the same
trick here?
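
For illustration, the same trick applied to the snippet above would look
roughly like this (same variable names as the quoted code; an untested
sketch, not a patch):

thisatt = TupleDescAttr(tupleDesc, attnum);
if (thisatt->attbyval)
{
    /*
     * fetch_att() goes through Int32GetDatum()/Int16GetDatum()/CharGetDatum()
     * for by-value attributes, so every byte of the resulting Datum is
     * defined even when attlen < sizeof(Datum).
     */
    ret_datum = fetch_att(tp + off, true, thisatt->attlen);
}
else
    ret_datum = PointerGetDatum((char *) (tp + off));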

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: COPY FROM WHEN condition

2018-11-02 Thread Tomas Vondra



On 11/02/2018 03:57 AM, Corey Huinker wrote:
> > Are you thinking something like having a COPY command that provides
> > results in such a way that they could be referenced in a FROM clause
> > (perhaps a COPY that defines a cursor…)?
> 
> That would also be nice, but what I was thinking of was that some
> highly restricted subset of cases of SQL in general could lend
> themselves to levels of optimization that would be impractical in
> other contexts.
> 
> 
> If COPY (or a syntactical equivalent) can return a result set, then the
> whole of SQL is available to filter and aggregate the results and we
> don't have to invent new syntax, or endure confusion when COPY-WHEN
> syntax behaves subtly different from a similar FROM-WHERE.
> 
> Also, what would we be saving computationally? The whole file (or
> program output) has to be consumed no matter what, the columns have to
> be parsed no matter what. At least some of the columns have to be
> converted to their assigned datatypes enough to know whether or not to
> filter the row, but we might be able push that logic inside a copy. I'm
> thinking of something like this:
> 
> SELECT x.a, sum(x.b)
> FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer,
> b numeric, c text, d date, e json) )
> WHERE x.d >= '2018-11-01'
> 
> 
> In this case, there is the /opportunity/ to see the following optimizations:
> - columns c and e are never referenced, and need never be turned into a
> datum (though we might do so just to confirm that they conform to the
> data type)
> - if column d is converted first, we can filter on it and avoid
> converting columns a,b
> - whatever optimizations we can infer from knowing that the two
> surviving columns will go directly into an aggregate
> 
> If we go this route, we can train the planner to notice other
> optimizations and add those mechanisms at that time, and then existing
> code gets faster.
> 
> If we go the COPY-WHEN route, then we have to make up new syntax for
> every possible future optimization.

IMHO those two things address vastly different use-cases. The COPY WHEN
case deals with filtering data while importing them into a database,
while what you're describing seems to be more about querying data stored
in a CSV file. But we already do have a solution for that - FDW, and I'd
say it's working pretty well. And AFAIK it does give you tools to
implement most of what you're asking for. I don't see why we should bolt
this on top of COPY, or how it is an alternative to COPY WHEN.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Tomas Vondra
On 11/02/2018 01:42 PM, Daniel Gustafsson wrote:
>> On 2 Nov 2018, at 04:21, Tom Lane  wrote:
> 
>> (In short, I remain unconvinced that we'd not be better off spending
>> our effort on making json_out faster...)
> 
> Shooting wildly from the hip, isn't this a case where we can
> potentially utilize the JIT infrastructure to speed it up?
> 

I don't see how could it help here. The crucial idea of JIT is that if
you postpone the compilation, you may provide additional information
that may allow eliminating parts of the code.

For example, we don't know anything about table structure in general, so
the tuple deforming code had to assume any attribute may be NULL etc.
making the code rather complex with many branches, loops etc. But at
runtime we have the tuple descriptor, so with JIT we may eliminate some
of the branches.

But we don't have anything like that for jsonb - every value may be an
arbitrary json document, etc. And each value (even if stored in the same
column) may be entirely different.

Maybe it's just lack of imagination on my side, but I don't see how
could this benefit from JIT :-(

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2018-11-02 Thread Robert Haas
On Fri, Nov 2, 2018 at 7:23 AM Amit Kapila  wrote:
> > That doesn't seem like an unreasonable argument.  I'm not sure whether
> > the right threshold is 4 or something a little bigger, but I bet it's
> > not very large.  It seems important to me that before anybody thinks
> > about committing this, we construct some kind of destruction case
> > where repeated scans of the whole table are triggered as frequently as
> > possible, and then run that test with varying thresholds.
>
> Why do you think repeated scans will be a destruction case when there
> is no FSM for a small table?

That's not what I'm saying.  If we don't have the FSM, we have to
check every page of the table.  If there's a workload where that
happens a lot on a table that is just under the size threshold for
creating the FSM, then it's likely to be a worst case for this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: WIP: Avoid creation of the free space map for small tables

2018-11-02 Thread Tom Lane
Robert Haas  writes:
> That's not what I'm saying.  If we don't have the FSM, we have to
> check every page of the table.  If there's a workload where that
> happens a lot on a table that is just under the size threshold for
> creating the FSM, then it's likely to be a worst case for this patch.

Hmm, you're assuming something not in evidence: why would that be the
algorithm?  On a FSM-less table, I'd be inclined to just check the
last page and then grow the table if the tuple doesn't fit there.
This would, in many cases, soon result in a FSM being created, but
I think that's just fine.  The point of the change is to optimize
for cases where a table *never* gets more than a few inserts.  Not, IMO,
for cases where a table gets a lot of churn but never has a whole lot of
live tuples.  In the latter scenario we are far better off having a FSM.

regards, tom lane



Re: COPY FROM WHEN condition

2018-11-02 Thread David Fetter
On Thu, Nov 01, 2018 at 10:57:25PM -0400, Corey Huinker wrote:
> >
> > > Are you thinking something like having a COPY command that provides
> > > results in such a way that they could be referenced in a FROM clause
> > > (perhaps a COPY that defines a cursor…)?
> >
> > That would also be nice, but what I was thinking of was that some
> > highly restricted subset of cases of SQL in general could lend
> > themselves to levels of optimization that would be impractical in
> > other contexts.
> 
> If COPY (or a syntactical equivalent) can return a result set, then the
> whole of SQL is available to filter and aggregate the results and we don't
> have to invent new syntax, or endure confusion when COPY-WHEN syntax behaves
> subtly different from a similar FROM-WHERE.

That's an excellent point.

> Also, what would we be saving computationally? The whole file (or program
> output) has to be consumed no matter what, the columns have to be parsed no
> matter what. At least some of the columns have to be converted to their
> assigned datatypes enough to know whether or not to filter the row, but we
> might be able push that logic inside a copy. I'm thinking of something like
> this:
> 
> SELECT x.a, sum(x.b)
> FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b 
> numeric, c text, d date, e json) )

Apologies for bike-shedding, but wouldn't the following be a better
fit with the current COPY?

COPY t(a integer, b numeric, c text, d date, e json) FROM 
'/path/to/foo.txt' WITH (FORMAT CSV, INLINE)

> WHERE x.d >= '2018-11-01'
> 
> 
> In this case, there is the *opportunity* to see the following optimizations:
> - columns c and e are never referenced, and need never be turned into a
> datum (though we might do so just to confirm that they conform to the data
> type)

That sounds like something that could go inside the WITH extension
I'm proposing above.

[STRICT_TYPE boolean DEFAULT true]?

This might not be something that has to be in version 1.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: Hash Joins vs. Bloom Filters / take 2

2018-11-02 Thread Robert Haas
On Thu, Nov 1, 2018 at 5:07 PM Thomas Munro
 wrote:
> Would you compute the hash for the outer tuples in the scan, and then
> again in the Hash Join when probing, or would you want to (somehow)
> attach the hash to emitted tuples for later reuse by the higher node?

I'm interested in what Jim has to say, but to me it seems like we
should try to find a way to add a tlist entry for the hash value to
avoid recomputing it.  That's likely to require some tricky planner
surgery, but it's probably doable.
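
For what it's worth, core already ships a generic Bloom filter
(lib/bloomfilter.h, added in v11) that the executor side could build on.
A rough sketch of build and probe, with all the planner/executor
integration left out (hashvalue and the row estimate are assumed to come
from the hash join itself):

#include "lib/bloomfilter.h"

/* build side: remember the hash of every inner-side join key */
bloom_filter *filter = bloom_create(inner_rows_estimate, work_mem, 0);

bloom_add_element(filter, (unsigned char *) &hashvalue, sizeof(hashvalue));

/* probe side: skip the hash table lookup when the filter proves a miss */
if (bloom_lacks_element(filter, (unsigned char *) &hashvalue, sizeof(hashvalue)))
    continue;           /* this outer tuple cannot possibly match */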

What really seems finicky to me about this whole project is the
costing.  In the best case it's a huge win; in the worst case it's a
significant loss; and whether it's a gain or a loss is not easy to
figure out from the information that we have available.  We generally
do not have an accurate count of the number of distinct values we're
likely to see (which is important).

Worse, when you start to consider pushdown, you realize that the cost
of the scan depends on the bloom filter we push down to it.  So
consider something like A IJ B IJ C.  It seems like it could be the
case that once we decide to do the A-B join as a hash join with a
bloom filter, it makes sense to also do the join to C as a hash join
and push down the bloom filter, because we'll be able to combine the
two filters and the extra probes will be basically free.  But if we
weren't already doing the A-B join with a bloom filter, then maybe the
filter wouldn't make sense for C either.

Maybe I'm worrying over nothing here, or the wrong things, but costing
this well enough to avoid regressions really looks hard.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Vacuum Full does not release the disk size space after delete from table

2018-11-02 Thread Tom Lane
Haozhou Wang  writes:
> We meet a corner case that related to the behavior of Vacuum Full.
> ...
> If we run both sql scripts on same database in parallel, the "VACUUM FULL
> a;" will not release the disk space.

I think what's happening is that the delete in script 1 happens after the
"pg_sleep" in script 2 starts.  Then the pg_sleep has an open snapshot
that could potentially see the deleted rows, so they can't be removed yet.

You could check this theory by changing the vacuum to use VERBOSE, and
seeing what it says about rows that can't be removed yet.

regards, tom lane



Re: WIP: Avoid creation of the free space map for small tables

2018-11-02 Thread Robert Haas
On Fri, Nov 2, 2018 at 10:07 AM Tom Lane  wrote:
> Robert Haas  writes:
> > That's not what I'm saying.  If we don't have the FSM, we have to
> > check every page of the table.  If there's a workload where that
> > happens a lot on a table that is just under the size threshold for
> > creating the FSM, then it's likely to be a worst case for this patch.
>
> Hmm, you're assuming something not in evidence: why would that be the
> algorithm?

I think it's in evidence, in the form of several messages mentioning a
flag called try_every_block.

Just checking the last page of the table doesn't sound like a good
idea to me.  I think that will just lead to a lot of stupid bloat.  It
seems likely that checking every page of the table is fine for npages
<= 3, and that would still be a win in a very significant number of
cases, since lots of instances have many empty or tiny tables.  I was
merely reacting to the suggestion that the approach should be used for
npages <= 32; that threshold sounds way too high.
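
For clarity, the heuristic under discussion is roughly the following; the
constant and the helper name are made up here (the latter echoing the
try_every_block flag mentioned upthread), not taken from the patch:

#define FSM_CREATION_THRESHOLD  4       /* pages; the number under debate */

if (RelationGetNumberOfBlocks(relation) <= FSM_CREATION_THRESHOLD)
    targetBlock = try_every_block(relation, len);       /* linear search, no FSM */
else
    targetBlock = GetPageWithFreeSpace(relation, len);  /* normal FSM lookup */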

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PG vs macOS Mojave

2018-11-02 Thread Tom Lane
I wrote:
> Here's a lightly-tested patch for that approach.

Anybody have an opinion about which approach to use?  We need to choose
one, and PDQ too, if we want full buildfarm coverage on it before Monday's
wrap.

The main argument in favor of #1 (restore use of -isysroot) is fear that
Apple's going to force us into that sometime soon anyhow, so we might as
well just bite the bullet instead of inserting weird workarounds to avoid
it.  But perhaps that isn't going to happen.  If it doesn't, then #2
(hack PLPerl and PLTcl include switches only) should result in slightly
less pain for people in Jakob's situation.  But either patch does offer
a workaround for that case.

regards, tom lane



Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-11-02 Thread Nathan Bossart
One argument for instead checking WAL file existence before calling
archive_command might be to avoid the increased startup time.
Granted, any added delay from this patch is unlikely to be noticeable
unless your archiver is way behind and archive_status has a huge
number of files.  However, I have seen cases where startup is stuck on
other tasks like SyncDataDirectory() and RemovePgTempFiles() for a
very long time, so perhaps it is worth considering.
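
Such a check could be as simple as the following sketch, done right before
invoking archive_command (placement and names are illustrative only):

char        xlogpath[MAXPGPATH];
struct stat stat_buf;

snprintf(xlogpath, MAXPGPATH, XLOGDIR "/%s", xlog);
if (stat(xlogpath, &stat_buf) != 0 && errno == ENOENT)
{
    /* segment already removed: drop the orphaned .ready file and move on */
    XLogArchiveCleanup(xlog);
    return false;
}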

Nathan

Re: replication_slots usability issue

2018-11-02 Thread Petr Jelinek
On 01/11/2018 18:54, Andres Freund wrote:
>> Also, from 691d79a which you just committed:
>> +   ereport(FATAL,
>> +   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> +errmsg("logical replication slots \"%s\" exists, but 
>> wal_level < logical",
>> +   NameStr(cp.slotdata.name)),
>> I can see one grammar mistake here, as you refer to only one slot here.
>> The error messages should read:
>> "logical replication slot \"%s\" exists, but wal_level < logical"
>> and:
>> "physical replication slot \"%s\" exists, but wal_level < replica"
> 
> Darnit. Fixed. Thanks.
> 

Since we are fixing this message, shouldn't the hint for logical slot
say "Change wal_level to be logical or higher" rather than "replica or
higher" :)

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: partitioned indexes and tablespaces

2018-11-02 Thread Alvaro Herrera
On 2018-Nov-02, Michael Paquier wrote:

> On Thu, Nov 01, 2018 at 09:31:38PM -0300, Alvaro Herrera wrote:

> > 1. When a CREATE INDEX specifies a tablespace, existing partitions get
> > the index in the correct tablespace; however, the parent index itself
> > does not record the tablespace.  So when new partitions are created
> > later, they get the index in the default tablespace instead of the
> > specified tablespace.  Fix by saving the tablespace in the pg_class row
> > for the parent index.
> 
> I may be missing something of course...  But partitioned tables don't
> register the tablespace they are on either so as it cannot be used by
> any partitions created on it:

This is not relevant to my case, IMO.  Partitioned tables are explicitly
created each time, with their own parameters; if you want to specify the
tablespace in which it is created, you can do so at that point.  This is
not the case with partitioned indexes, because they are created
automatically at CREATE TABLE PARTITION OF time, without an option to
specify where each index goes.

> It seems to me that the current behavior is wanted in this case, because
> partitioned tables and partitioned indexes have no physical storage.

Well, I designed the partitioned indexes feature and I can say for
certain that this behavior was not explicitly designed in, but was just
an oversight.

> > 2. ALTER TABLE SET TABLESPACE, applied to the partitioned index, would
> > raise an error indicating that it's not the correct relation kind.
> 
> Using the previous example, this does not raise an error:
> alter table aa set tablespace popo;
> However the reference to reltablespace in pg_class is not changed.  So I
> would agree with your point to not raise an error and back-patch that,
> but I don't agree with the point of changing reltablespace for a
> partitioned index if that's what you mean.

Same argument here.  The pg_class record for the partitioned index
serves to guide the storage of indexes on future partitions, so it is
valuable to have it.  Not recording the tablespace (and not allowing it
to be changed afterwards) is a usability fail.
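
Concretely, the idea is that index creation for a new partition falls back
to the tablespace recorded for the parent partitioned index, roughly along
these lines (a sketch only, not the actual patch; variable names are
illustrative):

/* when cascading an index definition to a newly created partition */
Oid     tablespaceId = get_rel_tablespace(parentIndexOid);

if (!OidIsValid(tablespaceId))
    tablespaceId = GetDefaultTablespace(rel->rd_rel->relpersistence);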

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PG vs macOS Mojave

2018-11-02 Thread Daniel Gustafsson


> On 2 Nov 2018, at 15:42, Tom Lane  wrote:
> 
> I wrote:
>> Here's a lightly-tested patch for that approach.
> 
> Anybody have an opinion about which approach to use?  We need to choose
> one, and PDQ too, if we want full buildfarm coverage on it before Monday's
> wrap.

Sorry for being slow to respond, I was hoping to find time for testing but it’s
a scarce resource right now.

> The main argument in favor of #1 (restore use of -isysroot) is fear that
> Apple's going to force us into that sometime soon anyhow, so we might as
> well just bite the bullet instead of inserting weird workarounds to avoid
> it.  But perhaps that isn't going to happen.

#1 is the option that appeals to me the most, mostly because it removes all
possible ambiguity of when/if it’s required compared to #2.

+ Most Postgres developers just turn off SIP, though.

Minor nitpick, shouldn’t this be PostgreSQL?

cheers ./daniel


Re: partitioned indexes and tablespaces

2018-11-02 Thread Alvaro Herrera
On 2018-Nov-02, Amit Langote wrote:

> On 2018/11/02 10:27, Michael Paquier wrote:

> > It seems to me that the current behavior is wanted in this case, because
> > partitioned tables and partitioned indexes have no physical storage.
> 
> Keith Fiske complained about this behavior for partitioned *tables* a few
> months ago, which led to the following discussion:
> 
> https://www.postgresql.org/message-id/flat/CAKJS1f9PXYcT%2Bj%3DoyL-Lquz%3DScNwpRtmD7u9svLASUygBdbN8w%40mail.gmail.com
> 
> It's Michael's message that was the last one on that thread. :)
> 
> https://www.postgresql.org/message-id/20180413224007.GB27295%40paquier.xyz

I agree with Fiske, FWIW.  I think the current behavior results because
people (including me) overlooked things, not because it was designed
explicitly that way.

> By the way, if we decide to do something about this, I think we do the
> same for partitioned tables.

I'm up for changing the behavior of partitioned tables in pg12 (please
send a patch), but I'm up for changing the behavior of partitioned
tables in pg11.

> There are more than one interesting
> behaviors possible that are mentioned in the above thread for when
> parent's reltablespace is set/changed.

I'm *NOT* proposing to move existing partitions to another tablespace,
in any case.

> IOW, I agree with Michael that if something will be back-patched to 11, it
> should be a small patch to make the unsupported relkind error go away.

I don't.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PG vs macOS Mojave

2018-11-02 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 2 Nov 2018, at 15:42, Tom Lane  wrote:
>> Anybody have an opinion about which approach to use?  We need to choose
>> one, and PDQ too, if we want full buildfarm coverage on it before Monday's
>> wrap.

> #1 is the option that appeals to me the most, mostly because it removes all
> possible ambiguity of when/if it’s required compared to #2.

I'm sort of leaning in that direction too, after sleeping on it.
It's also closer to what we already shipped in 11.0 than #2 is.

regards, tom lane



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Merlin Moncure
On Wed, Oct 31, 2018 at 10:23 AM Andres Freund  wrote:
>
> Hi,
>
> On 2018-10-31 11:13:13 -0400, Andrew Dunstan wrote:
> > I agree that just sending a blob of the internal format isn't a great idea.
>
> It's entirely unacceptable afaict. Besides the whole "exposing
> internals" issue, it's also at least not endianess safe, depends on the
> local alignment requirements (which differ both between platforms and
> 32/64 bit), numeric's internal encoding and probably more.

Binary format consuming applications already have to deal with these
kinds of issues. We already expose internal structures in the other
functions -- not sure why jsonb is held to a different standard.  For
other data types where format changes were made, the standard of
'caveat version' was in place to protect the user.  For jsonb we
decided to implement a version flag within the type itself, which I
thought was a mistake at the time -- better to have a version header in the
COPY BINARY if needed.

People using binary format are basically interested in one thing,
performance.  It's really the fastest way to get data in and out of the
database.  For my part, I'd like to see some observed demonstrable
benefit in terms of performance to justify making the change.

merlin



Re: Doubts about pushing LIMIT to MergeAppendPath

2018-11-02 Thread Antonin Houska
Tomas Vondra  wrote:

> On 11/02/2018 08:16 AM, Antonin Houska wrote:
> > Tomas Vondra  wrote:
> > 
> >> OK, so the reason is that when building child paths, we don't keep
> >> the pathkeys unless it matches the "interesting" pathkeys.
> >>
> >> So for example we may have an IndexPath, but with pathkeys=NIL if
> >> the index does not match the ORDER BY we need.
> > 
> > I don't agree that IndexPath will necessarily have pathkeys set to 
> > NIL in such a case. Even if the index ordering does not match ORDER
> > BY clause of the query, the path can still be useful, typically for
> > merge join.
> > 
> 
> But wouldn't that mean there's a MergeJoin below the Limit?

I just wanted to say that pathkeys of MergeAppendPath do depend on those of
child paths, however the MergeAppendPath should expect any pathkeys from the
children. I mentioned MergeJoin just as an example reason for IndexPath to
have pathkeys unrelated to the ORDER BY clause.

> AFAIK we don't push limit_tuples to that node.

I guess you mean that in a plan like this

Limit -> MergeJoin -> MergeAppend

the Limit node would not get propagated to MergeAppend and thus it would not
prevent it from providing sufficient input for the additional sort?

Anyway, the optimization we discuss here is only applied if the MergeAppend
path is at the top of the join tree, so there cannot be any join above it.

> After looking at this a bit more after a healthy dose of coffee, I think
> for this issue to happen the Limit would have to be placed immediately
> above the MergeAppend node. But if the ordering does not match, there
> will be an explicit Sort node in between, and we'll push the limit only
> to that node (and not below). Which is probably what's happening in the
> incremental sort query, BTW.

Yes, the Limit should be applied "from above", when we know that it's
appropriate. (Currently it is applied by create_merge_append_path(), and at
the moment we don't know what will happen to the output.) But the problem of
MergeAppend is that the limit can affect cost of the child nodes and the
MergeAppend node itself.

That's not clearly bad because if the MergeAppendPath won without the limit,
the limit should not make the cost worse. However I think the only correct
strategy is to apply the limit to all paths of given relation before
set_cheapest() is called, because another path can benefit from the limit even
more than the one to which we're trying to apply the limit. However at that
early stage we don't have the complete plan tree so we don't know if the limit
would ever get propagated down, and what its value would be ...

> I certainly agree correctness must not depend on costing. But I don't
> think that's really the case here - what you mentioned is merely one
> part of the optimization, but there are other bits that make it work.

As I mentioned upthread, I think the problem is currently hidden because the
limit_tuples field of MergeAppendPath is only used for cost estimates, but
the executor is not aware of it.

> At least that's my understanding - if you could construct a counter-example
> demonstrating the failure, that'd be a clear proof of course.

I spent some time on it today, but w/o success. However the incremental sort
example I posted upthread should be sufficient reason to do something. I just
don't know what.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com



Re: PG vs macOS Mojave

2018-11-02 Thread Jakob Egger

> On 1 Nov 2018, at 04:17, Tom Lane  wrote:
> 
> You can verify the thing's built-in settings with something like
> 
> $ echo >test.c
> $ clang -v -E test.c
> 
> and on Xcode 10.0 I get
> 
> "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang"
>  -cc1 -triple x86_64-apple-macosx10.14.0 -Wdeprecated-objc-isa-usage 
> -Werror=deprecated-objc-isa-usage -E -disable-free -disable-llvm-verifier 
> -discard-value-names -main-file-name test.c -mrelocation-model pic -pic-level 
> 2 -mthread-model posix -mdisable-fp-elim -fno-strict-return -masm-verbose 
> -munwind-tables -target-cpu penryn -dwarf-column-info -debugger-tuning=lldb 
> -target-linker-version 409.12 -v -resource-dir 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/10.0.0
>  -isysroot 
> /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk
>  -I/usr/local/include -fdebug-compilation-dir /Users/tgl -ferror-limit 19 
> -fmessage-length 80 -stack-protector 1 -fblocks 
> -fencode-extended-block-signature -fobjc-runtime=macosx-10.14.0 
> -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -o - -x c 
> test.c
> 
> while with Xcode 10.1's compiler:
> 
> "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang"
>  -cc1 -triple x86_64-apple-macosx10.14.0 -Wdeprecated-objc-isa-usage 
> -Werror=deprecated-objc-isa-usage -E -disable-free -disable-llvm-verifier 
> -discard-value-names -main-file-name test.c -mrelocation-model pic -pic-level 
> 2 -mthread-model posix -mdisable-fp-elim -fno-strict-return -masm-verbose 
> -munwind-tables -target-cpu penryn -dwarf-column-info -debugger-tuning=lldb 
> -target-linker-version 409.12 -v -resource-dir 
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/10.0.0
>  -fdebug-compilation-dir /Users/tgl -ferror-limit 19 -fmessage-length 80 
> -stack-protector 1 -fblocks -fencode-extended-block-signature 
> -fobjc-runtime=macosx-10.14.0 -fmax-type-align=16 -fdiagnostics-show-option 
> -fcolor-diagnostics -o - -x c test.c
> 
> No default sysroot at all.  So the immediate question is whether Apple
> broke this in Xcode 10.1, or whether it was broken in Xcode 10.0 and
> now they've "fixed" it.

I've tried to replicate your experiment, but I initially got the "-isysroot" 
flag for both Xcode 10 and Xcode 10.1.

Then I upgraded the "Command Line Tools for Xcode 10.1" using Software Update, 
and suddenly the "-isysroot" flag disappeared (for both Xcode 10 and Xcode 
10.1).

In any case, it looks like we can't rely on "-isysroot" being present.


> On 2 Nov 2018, at 16:09, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>>> On 2 Nov 2018, at 15:42, Tom Lane  wrote:
>>> Anybody have an opinion about which approach to use?  We need to choose
>>> one, and PDQ too, if we want full buildfarm coverage on it before Monday's
>>> wrap.
> 
>> #1 is the option that appeals to me the most, mostly because it removes all
>> possible ambiguity of when/if it’s required compared to #2.
> 
> I'm sort of leaning in that direction too, after sleeping on it.
> It's also closer to what we already shipped in 11.0 than #2 is.
> 
>   regards, tom lane


I think I would prefer #2.

I think it would work better for my use case (end users download a binary of 
the server but may build their own extensions).


Regarding #1:
Since -isysroot is only necessary for building pl/tcl and pl/perl, why do you 
add it to CPPFLAGS in the global makefile?
Wouldn't it make more sense to just add it in the TCL and Perl makefiles?


Best regards,
Jakob

Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Tom Lane
Merlin Moncure  writes:
> On Wed, Oct 31, 2018 at 10:23 AM Andres Freund  wrote:
>> It's entirely unacceptable afaict. Besides the whole "exposing
>> internals" issue, it's also at least not endianess safe, depends on the
>> local alignment requirements (which differ both between platforms and
>> 32/64 bit), numeric's internal encoding and probably more.

> Binary format consuming applications already have to deal with these
> kinds of issues. We already expose internal structures in the other
> functions -- not sure why jsonb is held to a different standard.

I don't think it's being held to a different standard at all.  Even for
data as simple as integers/floats, we convert to uniform endianness on the
wire.  Moreover, we do not expose the exact bits for anything more complex
than machine floats.  Numeric, for instance, gets disassembled into fields
rather than showing the exact header format (let alone several different
header formats, as actually found on disk).

Andres' point about alignment is a pretty good one as well, if it applies
here --- I don't recall just what internal alignment requirements jsonb
has.  We have not historically expected clients to have to deal with that.

I agree that in any particular use-case one could argue for a different
set of trade-offs in choosing the exposed binary format.  But this is the
set that the project has chosen, and one would need to make a FAR stronger
case than has been made here to let jsonb be an exception.

regards, tom lane



Re: partitioned indexes and tablespaces

2018-11-02 Thread Robert Haas
On Fri, Nov 2, 2018 at 11:02 AM Alvaro Herrera  wrote:
> > By the way, if we decide to do something about this, I think we do the
> > same for partitioned tables.
>
> I'm up for changing the behavior of partitioned tables in pg12 (please
> send a patch), but I'm up for changing the behavior of partitioned
> tables in pg11.

Uh, what?

I strongly object to inserting behavior changes into released branches
on the grounds that the behavior wasn't considered carefully enough
before feature freeze.  If we accept that as a justification, then
anybody can claim that any behavior change should be back-patched at
any time as long as they were the author of the original patch.  But
that's not a recipe for a stable product.  There's got to be a point
at which we say, yeah, you know, this is maybe not the perfect set of
design decisions, but we're going to support the decisions we made for
the next 5 years.  And maybe we'll make better ones in the next
release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: partitioned indexes and tablespaces

2018-11-02 Thread Alvaro Herrera
On 2018-Nov-02, Robert Haas wrote:

> I strongly object to inserting behavior changes into released branches
> on the grounds that the behavior wasn't considered carefully enough
> before feature freeze.

I'm not proposing to change any stable behavior.  The thing I'm
proposing to change clearly cannot be used by anyone, because it just
throws errors.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Merlin Moncure
On Fri, Nov 2, 2018 at 10:53 AM Tom Lane  wrote:
> Merlin Moncure  writes:
> > On Wed, Oct 31, 2018 at 10:23 AM Andres Freund  wrote:
> >> It's entirely unacceptable afaict. Besides the whole "exposing
> >> internals" issue, it's also at least not endianess safe, depends on the
> >> local alignment requirements (which differ both between platforms and
> >> 32/64 bit), numeric's internal encoding and probably more.
>
> > Binary format consuming applications already have to deal with these
> > kinds of issues. We already expose internal structures in the other
> > functions -- not sure why jsonb is held to a different standard.
>
> I don't think it's being held to a different standard at all.  Even for
> data as simple as integers/floats, we convert to uniform endianness on the
> wire.  Moreover, we do not expose the exact bits for anything more complex
> than machine floats.  Numeric, for instance, gets disassembled into fields
> rather than showing the exact header format (let alone several different
> header formats, as actually found on disk).
>
> Andres' point about alignment is a pretty good one as well, if it applies
> here --- I don't recall just what internal alignment requirements jsonb
> has.  We have not historically expected clients to have to deal with that.

I see your (and Andres') point; the binary wire format ought to lay on
top of the basic contracts established by other types.  It can be
binary; just not a straight memcpy out of the server.  The array and
composite type serializers should give some inspiration there on
serialization.   I'll still stand other point I made though; I'd
really want to see some benchmarks demonstrating benefit over
competing approaches that work over the current formats.  That should
frame the argument as to whether this is a good idea.

merlin



Re: PG vs macOS Mojave

2018-11-02 Thread Tom Lane
Jakob Egger  writes:
> Then I upgraded the "Command Line Tools for Xcode 10.1" using Software Update, 
> and suddenly the "-isysroot" flag disappeared (for both Xcode 10 and Xcode 
> 10.1).

Interesting.  I'd not tested the intermediate state without the CLT
update.

> In any case, it looks like we can't rely on "-isysroot" being present.

Yeah, that's becoming clear, even if it's not clear exactly when it's
pre-supplied or not.

> Regarding #1:
> Since -isysroot is only necessary for building pl/tcl and pl/perl, why do you 
> add it to CPPFLAGS in the global makefile?
> Wouldn't it make more sense to just add it in the TCL and Perl makefiles?

Well, that's what #2 is, but the point is that #2 is a bit of a kluge that
might not work for much longer anyway.  If Apple has moved Perl's include
files to only appear under the sysroot, what's the reason to think they
will stop there?  It's pretty clear from documentation already cited in
this thread that they think specifying a sysroot path is good practice;
so I fear it's likely that they will stop providing much of anything under
/usr/include, except maybe the bare minimum required by POSIX.  An example
that's particularly likely to bite us soon is the headers for Secure
Transport, which isn't POSIX.  While we can get away with having separate
include switches for Perl and Tcl, I don't think we're going to have an
appetite for trying to apply a separate include switch for ST to just
our SSL-related .c files.

The bigger issue though is that on a machine with multiple sysroots
installed, approach #2 creates a very severe risk that portions of
Postgres will be built against a different sysroot than other portions.
That will inevitably lead to ABI-compatibility issues.  Debugging just
one of those would waste far more time than anyone could ever save by
being able to skip specifying the sysroot.

> I think I would prefer #2.
> I think it would work better for my use case (end users download a binary of 
> the server but may build their own extensions).

Well, it's clear that anybody who's doing that and using a different
sysroot version for their extension is taking their chances with ABI
compatibility.  I was willing to avoid breaking that case unnecessarily,
but what we're now seeing is that doing so means exposing everybody else
to those same ABI hazards.  I think it's probably better to have the
explicit sysroot setting, provide a way to override it at build time
(as patch #1 does), and just make it clear that if you override it
you are taking chances.

regards, tom lane



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Stephen Frost
Greetings,

* Merlin Moncure (mmonc...@gmail.com) wrote:
> On Fri, Nov 2, 2018 at 10:53 AM Tom Lane  wrote:
> > Andres' point about alignment is a pretty good one as well, if it applies
> > here --- I don't recall just what internal alignment requirements jsonb
> > has.  We have not historically expected clients to have to deal with that.
> 
> I see your (and Andres') point; the binary wire format ought to lay on
> top of the basic contracts established by other types.  It can be
> binary; just not a straight memcpy out of the server.  The array and
> composite type serializers should give some inspiration there on
> serialization.

Right- I agree w/ Tom and Andres on this part also.

> I'll still stand by the other point I made though; I'd
> really want to see some benchmarks demonstrating benefit over
> competing approaches that work over the current formats.  That should
> frame the argument as to whether this is a good idea.

What are the 'competing approaches' you're alluding to here?  Sending
text-format json across as we do today?  Is there something else you're
thinking would be appropriate in this kind of a performance bake-off...?
I'm having a hard time seeing what else would actually have the
flexibility that JSON does without being clearly worse (xml?).

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: partitioned indexes and tablespaces

2018-11-02 Thread Alvaro Herrera
On 2018-Nov-02, Robert Haas wrote:

> On Fri, Nov 2, 2018 at 11:02 AM Alvaro Herrera  
> wrote:
> > > By the way, if we decide to do something about this, I think we do the
> > > same for partitioned tables.
> >
> > I'm up for changing the behavior of partitioned tables in pg12 (please
> > send a patch), but I'm up for changing the behavior of partitioned
> > tables in pg11.
> 
> Uh, what?

Sorry, I just realized the typo in the above.  The behavior I propose to
change in pg11 is that of partitioned *indexes*, not tables.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2018-11-02 Thread John Naylor
On 11/2/18, Robert Haas  wrote:
> On Fri, Nov 2, 2018 at 10:07 AM Tom Lane  wrote:
>> Robert Haas  writes:
>> > That's not what I'm saying.  If we don't have the FSM, we have to
>> > check every page of the table.  If there's a workload where that
>> > happens a lot on a table that is just under the size threshold for
>> > creating the FSM, then it's likely to be a worst case for this patch.
>>
>> Hmm, you're assuming something not in evidence: why would that be the
>> algorithm?
>
> I think it's in evidence, in the form of several messages mentioning a
> flag called try_every_block.

Correct.

> Just checking the last page of the table doesn't sound like a good
> idea to me.  I think that will just lead to a lot of stupid bloat.  It
> seems likely that checking every page of the table is fine for npages
> <= 3, and that would still be a win in a very significant number of
> cases, since lots of instances have many empty or tiny tables.  I was
> merely reacting to the suggestion that the approach should be used for
> npages <= 32; that threshold sounds way too high.

To be clear, no one suggested that. The patch has always had 8 or 10
as a starting point, and I've mentioned 4 and 8 as good possibilities
based on the COPY tests upthread. It was apparent I didn't need to
recompile a bunch of binaries with different thresholds. All I had to
do was compile with a threshold much larger than required, and then
test inserting into X number of pages, to simulate a threshold of X. I
increased X until I saw a regression. That's where the 32 came from,
sorry if that was misleading, in my head it was obvious. I'd be happy to
test other scenarios. I'm not sure how to test redo -- seems more
difficult to get meaningful results than the normal case.

-John Naylor



Re: COPY FROM WHEN condition

2018-11-02 Thread David Fetter
On Fri, Nov 02, 2018 at 12:58:12PM +0100, Daniel Verite wrote:
>   Pavel Stehule wrote:
> 
> > > SELECT x.a, sum(x.b)
> > > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> > > numeric, c text, d date, e json) )
> > > WHERE x.d >= '2018-11-01'
> > >
> > >
> > Without some special feature this example is not extra useful. It is based
> > on copy on server that can use only super user with full rights.
> 
> And if superuser, one might use the file foreign data wrapper [1] to get
> the same results without the need for the explicit COPY in the query.
> 
> BTW, this brings into question whether the [WHEN condition] clause
> should be included in the options of file_fdw, as it supports pretty
> much all COPY options.
> 
> Also, psql's \copy should gain the option too?

tl;dr: \copy support is a very large can of worms.

psql's \copy is something which should probably be handled separately
from COPY, as it's both a way to access the filesystem without
superuser permission and an interface to the COPY part of the
protocol.  It seems like poor design to add grammar to support a
single client, so we'd need to think about this in terms of what we
want to support on the client side independent of specific clients. It
also seems like a violation of separation of concerns to couple FEBE
to grammar, so there'd need to be some way to do those things
separately, too.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Merlin Moncure
On Fri, Nov 2, 2018 at 11:15 AM Stephen Frost  wrote:
>
> Greetings,
>
> * Merlin Moncure (mmonc...@gmail.com) wrote:
> > On Fri, Nov 2, 2018 at 10:53 AM Tom Lane  wrote:
> > > Andres' point about alignment is a pretty good one as well, if it applies
> > > here --- I don't recall just what internal alignment requirements jsonb
> > > has.  We have not historically expected clients to have to deal with that.
> >
> > I see your (and Andres') point; the binary wire format ought to lay on
> > top of the basic contracts established by other types.  It can be
> > binary; just not a straight memcpy out of the server.  The array and
> > composite type serializers should give some inspiration there on
> > serialization.
>
> Right- I agree w/ Tom and Andres on this part also.
>
> > > I'll still stand by the other point I made though; I'd
> > really want to see some benchmarks demonstrating benefit over
> > competing approaches that work over the current formats.  That should
> > frame the argument as to whether this is a good idea.
>
> What are the 'competing approaches' you're alluding to here?  Sending
> text-format json across as we do today?

Yep -- exactly.  For example, write a C client program that recursed
the structure and dumped it to stdout or assigned to dummy variables
(being mindful of compiler optimizations).  I'd be contrasting this to
a C program that parsed the json text and did essentially the same thing, and rigging a
high scale test on the back of that.  The assumption here is that the
ultimate consumer is not, say, a browser, but some client app that can
actually exploit the performance advantages (else, why bother?).
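
For example, the client side of such a test only needs to flip the
result-format argument of PQexecParams() and then walk the result; a
bare-bones sketch (connection string, table and column names are made up):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=test");
    /* last argument: 1 = binary results, 0 = text results */
    PGresult   *res = PQexecParams(conn, "SELECT payload FROM events",
                                   0, NULL, NULL, NULL, NULL, 1);

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        for (int i = 0; i < PQntuples(res); i++)
        {
            const char *val = PQgetvalue(res, i, 0);
            int         len = PQgetlength(res, i, 0);

            /* "consume" the value; a real benchmark would recurse into it */
            (void) val;
            (void) len;
        }
        printf("fetched %d rows\n", PQntuples(res));
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}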

In my experience with arrays and composites, you can see significant
performance reduction and throughput increase in certain classes of
queries.  However, some of the types that were the worst offenders
(like timestamps) have been subsequently optimized and/or are
irrelevant to json since they'd be passed as text anyways.

merlin



Re: Compressed TOAST Slicing

2018-11-02 Thread Paul Ramsey
On Thu, Nov 1, 2018 at 4:02 PM Tom Lane  wrote:

> Paul Ramsey  writes:
> > On Thu, Nov 1, 2018 at 2:29 PM Stephen Frost  wrote:
> >> and secondly, why we wouldn't consider
> >> handling a non-zero offset.  A non-zero offset would, of course, still
> >> require decompressing from the start and then just throwing away what we
> >> skip over, but we're going to be doing that anyway, aren't we?  Why not
> >> stop when we get to the end, at least, and save ourselves the trouble of
> >> decompressing the rest and then throwing it away.
>
> > I was worried about changing the pg_lz code too much because it scared me,
> > but debugging some stuff made me read it more closely so I fear it less
> > now, and doing interior slices seems not unreasonable, so I will give it a try.
>
> I think Stephen was just envisioning decompressing from offset 0 up to
> the end of what's needed, and then discarding any data before the start
> of what's needed; at least, that's what'd occurred to me.


Understood, that makes lots of sense and is a very small change, it turns
out :)
Allocating just what is needed also makes things faster yet, which is nice,
and no big surprise.
Some light testing seems to show no obvious regression in speed of
decompression for the usual "decompress it all" case.
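
In other words, roughly this shape (the partial-decompression entry point
and the variable names are hypothetical; the attached patch is the real
thing):

/* decompress only the prefix that covers the requested slice */
int32   needed = offset + length;
char   *buf = palloc(needed);

pglz_decompress_partial(compressed_data, compressed_size, buf, needed);

result = cstring_to_text_with_len(buf + offset, length);
pfree(buf);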

P


compressed-datum-slicing-20190102a.patch
Description: Binary data


Re: Compressed TOAST Slicing

2018-11-02 Thread Paul Ramsey
As threatened, I have also added a patch to left() to also use sliced
access.


compressed-datum-slicing-20190102a.patch
Description: Binary data


compressed-datum-slicing-left-20190102a.patch
Description: Binary data


Re: Constraint documentation

2018-11-02 Thread Patrick Francelle

Thanks for your remarks and advice, and of course for your help to
rewrite the text.
So, it is now included in the new version attached.
I hope it will be ok this time.

Patrick Francelle


On 10/30/18 17:14, David G. Johnston wrote:
> The product name, when used in the documentation, is "PostgreSQL" with
> appropriate html elements surrounding it.
>
> Some parts that look or read oddly to me:
> "you may expect troubles"
> Use - if possible - (commas, not hypens, are customary here)
> "does not currently" - drop "currently", it doesn't and we don't need
> to predict the future (same goes for "are currently meant")
> "therefore we recommend to avoid them" - they are unsupported, the
> implied recommendation is to not use them, period, not avoid them if
> possible.  Better to say that it isn't enforced even though it is
> unsupported.
>
> An alternative to consider, as on the whole the reading of the v4
> patch just feels off and different from the rest of that section of
> the documentation.
>
> PostgreSQL does not support writing CHECK constraints that reference
> tables (though it does not reliably prevent one from doing so).  While
> normal operations are likely to succeed even if you violate this rule
> it is probable that a database restoration will fail.  Use FOREIGN KEY
> constraints or custom triggers for cross-table validations.  For rows
> on the same table you should use UNIQUE or EXCLUDE constraints when
> applicable, or a custom trigger otherwise.
>
> David J.
>

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index b5ed1b7939..142918e2b1 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -403,6 +403,20 @@ CREATE TABLE products (
 ensure that a column does not contain null values, the not-null
 constraint described in the next section can be used.

+
+   
+
+ PostgreSQL does not support writing
+ CHECK constraints that reference tables (though it does
+ not reliably prevent one from doing so).  While normal operations are
+ likely to succeed even if you violate this rule, it is probable that a
+ database restoration will fail.  Use FOREIGN KEY
+ constraints or custom triggers for
+ cross-table validations.  For rows on the same table you should use
+ UNIQUE or EXCLUDE constraints when
+ applicable, or a custom trigger otherwise.
+
+   
   
 
   


Re: partitioned indexes and tablespaces

2018-11-02 Thread Robert Haas
On Fri, Nov 2, 2018 at 12:05 PM Alvaro Herrera  wrote:
> On 2018-Nov-02, Robert Haas wrote:
> > I strongly object to inserting behavior changes into released branches
> > on the grounds that the behavior wasn't considered carefully enough
> > before feature freeze.
>
> I'm not proposing to change any stable behavior.  The thing I'm
> proposing to change clearly cannot be used by anyone, because it just
> throws errors.

I don't get it.  If the default tablespace for a partition is set as for
a regular table, that is a usable behavior.  If it is taken from the
parent table, that is a different and also usable behavior.  Isn't
this part of what you are proposing to change?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: partitioned indexes and tablespaces

2018-11-02 Thread Alvaro Herrera
On 2018-Nov-02, Robert Haas wrote:

> On Fri, Nov 2, 2018 at 12:05 PM Alvaro Herrera  
> wrote:
> > On 2018-Nov-02, Robert Haas wrote:
> > > I strongly object to inserting behavior changes into released branches
> > > on the grounds that the behavior wasn't considered carefully enough
> > > before feature freeze.
> >
> > I'm not proposing to change any stable behavior.  The thing I'm
> > proposing to change clearly cannot be used by anyone, because it just
> > throws errors.
> 
> I don't get it.  If the default tablespace for a partition is set as for
> a regular table, that is a usable behavior.  If it is taken from the
> parent table, that is a different and also usable behavior.  Isn't
> this part of what you are proposing to change?

In this thread I'm not proposing to change the behavior for tables, only
for indexes.  If people want to change behavior for tables (and I agree
with doing so), they can start their own threads.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Stephen Frost
Greetings,

* Merlin Moncure (mmonc...@gmail.com) wrote:
> On Fri, Nov 2, 2018 at 11:15 AM Stephen Frost  wrote:
> > * Merlin Moncure (mmonc...@gmail.com) wrote:
> > > I'll still stand other point I made though; I'd
> > > really want to see some benchmarks demonstrating benefit over
> > > competing approaches that work over the current formats.  That should
> > > frame the argument as to whether this is a good idea.
> >
> > What are the 'competing approaches' you're alluding to here?  Sending
> > text-format json across as we do today?
> 
> Yep -- exactly.  For example, write a C client program that recursed
> the structure and dumped it to stdout or assigned to dummy variables
> (being mindful of compiler optimizations).  I'd be contrasting this to
> a C program that parsed the json text and did essentially the same thing, and rigging a
> high scale test on the back of that.  The assumption here is that the
> ultimate consumer is not, say, a browser, but some client app that can
> actually exploit the performance advantages (else, why bother?).

If transferring the data in binary doesn't show a performance
improvement then I could agree that it wouldn't be sensible to do- but I
also find that very unlikely to be the case.

As for what language it's written in- I don't think that matters much.
I'd very much expect it to be more performant to use binary if you're
working in C, of course, but there's no point comparing C-parsed json
into some C structure vs. psycopg2 ingesting binary data and building a
Python json object- what matters is if it'd be faster for psycopg2 to
pull in binary-json data and put it into a Python json object, or if
it'd be faster to parse the text-json data and put the result into the
same Python object.  In my view, there's something clearly quite wrong
if the text-json data format is faster at that.

> In my experience with arrays and composites, you can see significant
> performance reduction and throughput increase in certain classes of
> queries.  However, some of the types that were the worst offenders
> (like timestamps) have been subsequently optimized and/or are
> irrelevant to json since they'd be passed as text anyways.

I've had very good success transferring timestamps as binary, so I'm not
quite sure what you're getting at here.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Merlin Moncure
On Fri, Nov 2, 2018 at 2:34 PM Stephen Frost  wrote:
> * Merlin Moncure (mmonc...@gmail.com) wrote:
> As for what language it's written in- I don't think that matters much.
> I'd very much expect it to be more performant to use binary if you're
> working in C, of course, but there's no point comparing C-parsed json
> into some C structure vs. psycopg2 ingesting binary data and building a
> Python json object- what matters is if it'd be faster for psycopg2 to
> pull in binary-json data and put it into a Python json object, or if
> it'd be faster to parse the text-json data and put the result into the
> same Python object.  In my view, there's something clearly quite wrong
> if the text-json data format is faster at that.

Yeah, I figure the language would be C or another language with
drivers smart enough to speak the binary protocol.

> > In my experience with arrays and composites, you can see significant
> > performance reduction and throughput increase in certain classes of
> > queries.  However, some of the types that were the worst offenders
> > (like timestamps) have been subsequently optimized and/or are
> > irrelevant to json since they'd be passed as text anyways.
>
> I've had very good success transferring timestamps as binary, so I'm not
> quite sure what you're getting at here.

Er, yes, timestamps are much faster in binary -- that is what I'd
observed and was the point I was trying to make.  They are slightly
less faster with recent optimizations but still faster.  In the old
days I saw as much as +50% throughput binary vs text in certain
contrived situations; that may no longer be true today.

merlin



Re: CF app feature request

2018-11-02 Thread Dmitry Dolgov
On Fri, 2 Nov 2018 at 10:24, Fabien COELHO  wrote:
>
>
> Bonjour Michaël,
>
> >> Because the same patch submission is already counted? It is a rare
> >> occurence, so just a "Withdrawn" state could be enough, and slightly false
> >> CF stats are no big deal.
> >
> > Or as we are dealing with duplicated entries, perhaps we could just
> > delete the entry not wanted, which seems to be 1859 in this case.
> > Admins can do so.
>
> Good. Please proceed!
>
> So the solution is to contact an admin (no clear cue about who is an
> admin, though) to remove the entry.

Just to make sure: if a duplicated entry is removed, will the patch itself
stay or not? I'm asking because both entries reference the same patch,
and the admin form says that one of the related items that would be
removed is the patch item.



Re: [PATCH][PROPOSAL] Add enum releation option type

2018-11-02 Thread Nikolay Shaplov
In a message from November 1, 2018, 18:26:20, Nikolay Shaplov wrote:

> In this case the only solution I can see is
>
> DETAIL:  Valid values are: "value1", "value2", "value3".
>
> Where the list '"value1", "value2", "value3"' is built at runtime but has no
> bindings to any specific language. And the rest of the message is
> 'DETAIL:  Valid values are: %s' which can be properly translated.

I've fixed the patch. Now it does not add "and" at all.
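
For reference, with this patch a caller defines an enum option roughly like
this (the "on"/"off" element names are assumed for illustration; only the
_AUTO value is visible in the diff below):

static relopt_enum_element_definition gist_buffering_enum_def[] = {
    {.text_value = "on",   .numeric_value = GIST_OPTION_BUFFERING_ON},
    {.text_value = "off",  .numeric_value = GIST_OPTION_BUFFERING_OFF},
    {.text_value = "auto", .numeric_value = GIST_OPTION_BUFFERING_AUTO},
    {.text_value = NULL}                    /* list terminator */
};

add_enum_reloption(RELOPT_KIND_GIST, "buffering",
                   "Enables buffering build for this GiST index",
                   gist_buffering_enum_def, GIST_OPTION_BUFFERING_AUTO);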

--
Do code for fun.

diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index db84da0..b266c86 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -422,7 +422,11 @@ static relopt_real realRelOpts[] =
 	{{NULL}}
 };

-static relopt_string stringRelOpts[] =
+static relopt_enum_element_definition gist_buffering_enum_def[] =
+		GIST_OPTION_BUFFERING_ENUM_DEF;
+static relopt_enum_element_definition view_check_option_enum_def[] =
+		VIEW_OPTION_CHECK_OPTION_ENUM_DEF;
+static relopt_enum enumRelOpts[] =
 {
 	{
 		{
@@ -431,10 +435,8 @@ static relopt_string stringRelOpts[]  			RELOPT_KIND_GIST,
 			AccessExclusiveLock
 		},
-		4,
-		false,
-		gistValidateBufferingOption,
-		"auto"
+		gist_buffering_enum_def,
+		GIST_OPTION_BUFFERING_AUTO
 	},
 	{
 		{
@@ -443,11 +445,14 @@ static relopt_string stringRelOpts[] =
 			RELOPT_KIND_VIEW,
 			AccessExclusiveLock
 		},
-		0,
-		true,
-		validateWithCheckOption,
-		NULL
+		view_check_option_enum_def,
+		VIEW_OPTION_CHECK_OPTION_NOT_SET
 	},
+	{{NULL}}
+};
+
+static relopt_string stringRelOpts[] =
+{
 	/* list terminator */
 	{{NULL}}
 };
@@ -494,6 +499,12 @@ initialize_reloptions(void)
    realRelOpts[i].gen.lockmode));
 		j++;
 	}
+	for (i = 0; enumRelOpts[i].gen.name; i++)
+	{
+		Assert(DoLockModesConflict(enumRelOpts[i].gen.lockmode,
+   enumRelOpts[i].gen.lockmode));
+		j++;
+	}
 	for (i = 0; stringRelOpts[i].gen.name; i++)
 	{
 		Assert(DoLockModesConflict(stringRelOpts[i].gen.lockmode,
@@ -532,6 +543,14 @@ initialize_reloptions(void)
 		j++;
 	}

+	for (i = 0; enumRelOpts[i].gen.name; i++)
+	{
+		relOpts[j] = &enumRelOpts[i].gen;
+		relOpts[j]->type = RELOPT_TYPE_ENUM;
+		relOpts[j]->namelen = strlen(relOpts[j]->name);
+		j++;
+	}
+
 	for (i = 0; stringRelOpts[i].gen.name; i++)
 	{
 		relOpts[j] = &stringRelOpts[i].gen;
@@ -629,6 +648,9 @@ allocate_reloption(bits32 kinds, int type, const char *name, const char *desc)
 		case RELOPT_TYPE_REAL:
 			size = sizeof(relopt_real);
 			break;
+		case RELOPT_TYPE_ENUM:
+			size = sizeof(relopt_enum);
+			break;
 		case RELOPT_TYPE_STRING:
 			size = sizeof(relopt_string);
 			break;
@@ -708,6 +730,24 @@ add_real_reloption(bits32 kinds, const char *name, const char *desc, double defa
 }

 /*
+ * add_enum_reloption
+ *		Add a new enum reloption
+ */
+void
+add_enum_reloption(bits32 kinds, const char *name, const char *desc,
+	 relopt_enum_element_definition *enum_def, int default_val)
+{
+	relopt_enum *newoption;
+
+	newoption = (relopt_enum *) allocate_reloption(kinds, RELOPT_TYPE_ENUM,
+   name, desc);
+	newoption->enum_def = enum_def;
+	newoption->default_val = default_val;
+
+	add_reloption((relopt_gen *) newoption);
+}
+
+/*
  * add_string_reloption
  *		Add a new string reloption
  *
@@ -1208,6 +1248,51 @@ parse_one_reloption(relopt_value *option, char *text_str, int text_len,
 	   optreal->min, optreal->max)));
 			}
 			break;
+		case RELOPT_TYPE_ENUM:
+			{
+relopt_enum *optenum = (relopt_enum *) option->gen;
+relopt_enum_element_definition *el_def;
+
+parsed = false;
+for(el_def = optenum->enum_def; el_def->text_value; el_def++)
+{
+	if (pg_strcasecmp(value, el_def->text_value) == 0)
+	{
+		option->values.enum_val = el_def->numeric_value;
+		parsed = true;
+		break;
+	}
+}
+if (!parsed)
+{
+	/*
+	 * If value is not among allowed enum text values, but we
+	 * are not up to validating, just use default numeric value,
+	 * otherwise we raise an error
+	 */
+	if (!validate)
+		option->values.enum_val = optenum->default_val;
+	else
+	{
+		StringInfoData buf;
+		initStringInfo(&buf);
+		for(el_def = optenum->enum_def; el_def->text_value;
+	 el_def++)
+		{
+			appendStringInfo(&buf,"\"%s\"",el_def->text_value);
+			if (el_def[1].text_value)
+appendStringInfo(&buf,", ");
+		}
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for \"%s\" option",
+		option->gen->name),
+ errdetail("Valid values are: %s.", buf.data)));
+		pfree(buf.data);
+	}
+}
+			}
+			break;
 		case RELOPT_TYPE_STRING:
 			{
 relopt_string *optstring = (relopt_string *) option->gen;
@@ -1301,6 +1386,11 @@ fillRelOptions(void *rdopts, Size basesize,
 			options[i].values.real_val :
 			((relopt_real *) options[i].gen)->default_val;
 		break;

Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Andrew Dunstan




On 11/02/2018 11:34 AM, Merlin Moncure wrote:


Binary format consuming applications already have to deal with these
kinds of issues. We already expose internal structures in the other
functions -- not sure why jsonb is held to a different standard.  For
other data types where format changes were made, the standard of
'caveat version' was in place to protect the user.  For jsonb we
decided to implement a version flag within the type itself, which I
thought was a mistake at the time -- better to have a version header in the
COPY BINARY if needed.




jsonb_send does output a version header, as I pointed out upthread.
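
For reference, the send function is essentially a one-byte version prefix
followed by the text representation -- roughly this (paraphrased from
src/backend/utils/adt/jsonb.c; check the source for the exact code):

Datum
jsonb_send(PG_FUNCTION_ARGS)
{
	Jsonb	   *jb = PG_GETARG_JSONB_P(0);
	StringInfoData buf;
	StringInfo	jtext = makeStringInfo();
	int			version = 1;

	/* render the jsonb value as text */
	(void) JsonbToCString(jtext, &jb->root, VARSIZE(jb));

	pq_begintypsend(&buf);
	pq_sendint8(&buf, version);			/* binary format version byte */
	pq_sendtext(&buf, jtext->data, jtext->len);
	pfree(jtext->data);
	pfree(jtext);

	PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}

So a format-1 binary jsonb value is just the text form with a single leading
version byte, which is what leaves room for introducing a different binary
representation later.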

cheers

andrew

--
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Should pg 11 use a lot more memory building an spgist index?

2018-11-02 Thread Bruno Wolff III

I see that a fix got committed. Thanks!
I'll double check it after the point release comes out (which looks like it 
will be next week) and let you know if there is still a problem.




Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Andres Freund
On 2018-11-02 10:34:07 -0500, Merlin Moncure wrote:
> On Wed, Oct 31, 2018 at 10:23 AM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2018-10-31 11:13:13 -0400, Andrew Dunstan wrote:
> > > I agree that just sending a blob of the internal format isn't a great 
> > > idea.
> >
> > It's entirely unacceptable afaict. Besides the whole "exposing
> > internals" issue, it's also at least not endianness-safe, depends on the
> > local alignment requirements (which differ both between platforms and
> > 32/64 bit), numeric's internal encoding and probably more.
> 
> Binary format consuming applications already have to deal with these
> kinds of issues.

Uh, which?  We have some that are affected by configure flags, but
otherwise we should be endianness & alignment independent.

And *even if* that were the case, that's not a good reason to continue a
bad practice.


> For jsonb we decided to implement a version flag within the type
> itself, which I thought was a mistake at the time -- better to have a
> version header in the COPY BINARY if needed.

What's the problem with the version? How does a COPY BINARY flag
address normal clients?

Greetings,

Andres Freund



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Andres Freund
Hi,


On 2018-11-02 11:52:59 -0400, Tom Lane wrote:
> Andres' point about alignment is a pretty good one as well, if it applies
> here --- I don't recall just what internal alignment requirements jsonb
> has.  We have not historically expected clients to have to deal with that.

Certainly looks like it takes it into account:

static void
fillJsonbValue(JsonbContainer *container, int index,
			   char *base_addr, uint32 offset,
			   JsonbValue *result)
...
	else if (JBE_ISNUMERIC(entry))
	{
		result->type = jbvNumeric;
		result->val.numeric = (Numeric) (base_addr + INTALIGN(offset));
	}
...
	else
	{
		Assert(JBE_ISCONTAINER(entry));
		result->type = jbvBinary;
		/* Remove alignment padding from data pointer and length */
		result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(offset));
		result->val.binary.len = getJsonbLength(container, index) -
			(INTALIGN(offset) - offset);
	}
...
...

Greetings,

Andres Freund



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Andres Freund
Hi,

On 2018-11-02 17:02:24 -0400, Andrew Dunstan wrote:
> On 11/02/2018 11:34 AM, Merlin Moncure wrote:
> > 
> > Binary format consuming applications already have to deal with these
> > kinds of issues. We already expose internal structures in the other
> > functions -- not sure why jsonb is held to a different standard.  For
> > other data types where format changes were made, the standard of
> > 'caveat version' was in place to protect the user.  For jsonb we
> > decided to implement a version flag within the type itself, which I
> > thought was a mistake at the time -- better to have a version header in the
> > COPY BINARY if needed.
> > 
> 
> 
> jsonb_send does output a version header, as I pointed out upthread.

That's Merlin's point I think. For reasons I don't quite understand he
doesn't like that. Yes, a global solution would have been prettier than a
per-datum version flag, but that obviously wasn't realistic to introduce
around the feature freeze of the version that introduced jsonb.

Greetings,

Andres Freund



Special role for subscriptions

2018-11-02 Thread Evgeniy Efimkin
Hi hackers!
In PostgreSQL 10 and 11 only a superuser can create/alter subscriptions.
If there were a special role (like pg_monitor), it would be easier to grant
control over subscriptions.
I can prepare a patch if there are no objections; a rough sketch of the
check is below.
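
A minimal sketch of what the permission check might look like (the role
constant and message are invented for illustration; today
CreateSubscription simply requires superuser()):

	/*
	 * Hypothetical sketch, not existing code: gate CREATE SUBSCRIPTION on
	 * membership in a new built-in role instead of requiring superuser.
	 * ROLE_PG_SUBSCRIPTION_ADMIN is an invented constant.
	 */
	if (!superuser() &&
		!is_member_of_role(GetUserId(), ROLE_PG_SUBSCRIPTION_ADMIN))
		ereport(ERROR,
				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
				 errmsg("must be superuser or member of the subscription administration role to create subscriptions")));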



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Andrew Dunstan




On 11/02/2018 05:20 PM, Andres Freund wrote:

Hi,

On 2018-11-02 17:02:24 -0400, Andrew Dunstan wrote:

On 11/02/2018 11:34 AM, Merlin Moncure wrote:

Binary format consuming applications already have to deal with these
kinds of issues. We already expose internal structures in the other
functions -- not sure why jsonb is held to a different standard.  For
other data types where format changes were made, the standard of
'caveat version' was in place to protect the user.  For jsonb we
decided to implement a version flag within the type itself, which I
thought was a mistake at the time -- better to have a version header in the
COPY BINARY if needed.



jsonb_send does output a version header, as I pointed out upthread.

That's Merlin's point I think. For reasons I don't quite understand he
doesn't like that. Yes, a global solution would have been prettier than a
per-datum version flag, but that obviously wasn't realistic to introduce
around the feature freeze of the version that introduced jsonb.






Oh, hmm. It would have been a big change of little value, ISTM. One byte 
of overhead per jsonb datum for a version indicator doesn't seem a huge 
price to pay.


cheers

andrew

--
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread David Fetter
On Wed, Oct 31, 2018 at 11:18:46AM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > >> I dunno, I do not think it's a great idea to expose jsonb's internal
> > >> format to the world.  We intentionally did not do that when the type
> > >> was first defined --- that's why its binary I/O format isn't already
> > >> like this --- and I don't see that the tradeoffs have changed since then.
> > 
> > > I disagree- it's awfully expensive to go back and forth between string
> > > and a proper representation.
> > 
> > Has anyone put any effort into making jsonb_out() faster?  I think that
> > that would be way more productive.  Nobody is going to want to write
> > code to convert jsonb's internal form into whatever their application
> > uses; particularly not dealing with numeric fields.
> 
> I'm all for making jsonb_out() faster, but even a faster jsonb_out()
> isn't going to be faster than shoveling the jsonb across.

Would it be completely batty to try to store JSONB on disk in wire format
and optimize accesses, indexing, etc. around that?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2018-11-02 Thread Andres Freund
On 2018-11-02 23:24:35 +0100, David Fetter wrote:
> On Wed, Oct 31, 2018 at 11:18:46AM -0400, Stephen Frost wrote:
> > Greetings,
> > 
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > Stephen Frost  writes:
> > > > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > >> I dunno, I do not think it's a great idea to expose jsonb's internal
> > > >> format to the world.  We intentionally did not do that when the type
> > > >> was first defined --- that's why its binary I/O format isn't already
> > > >> like this --- and I don't see that the tradeoffs have changed since 
> > > >> then.
> > > 
> > > > I disagree- it's awfully expensive to go back and forth between string
> > > > and a proper representation.
> > > 
> > > Has anyone put any effort into making jsonb_out() faster?  I think that
> > > that would be way more productive.  Nobody is going to want to write
> > > code to convert jsonb's internal form into whatever their application
> > > uses; particularly not dealing with numeric fields.
> > 
> > I'm all for making jsonb_out() faster, but even a faster jsonb_out()
> > isn't going to be faster than shoveling the jsonb across.
> 
> Would it be completely batty to try to store JSONB on disk in wire format
> and optimize accesses, indexing, etc. around that?

It doesn't seem to make any sense whatsoever.  For one, the current wire
format is text. If that's your argument, uh, yes, we have jsonb over
json for a reason.  If you're talking about a hypothetical future wire
format, that seems unlikely as well - for one there are obviously on-disk
compatibility issues; for another, the on-disk format is already kind of
optimized to store what it stores. And on-wire and on-disk concerns are
different.

Greetings,

Andres Freund



Re: partitioned indexes and tablespaces

2018-11-02 Thread Michael Paquier
On Fri, Nov 02, 2018 at 03:53:51PM -0300, Alvaro Herrera wrote:
> In this thread I'm not proposing to change the behavior for tables, only
> for indexes.  If people want to change behavior for tables (and I agree
> with doing so), they can start their own threads.

Changing this behavior on back-branches is not a good idea I think
because that changes the representation of the pg_class entries in the
catalogs by adding the reltablespace reference for a partitioned index
which does not exist now.  We are long past the time of doing such
changes on v11, but that can perfectly be done for v12.

Making the tablespace inherited from the parent if the parent has no
information on disk is sensible in my opinion, so please let's consider
that as a feature but not a bug, and also let's do the change for both
partitioned tables *and* partitioned indexes for consistency.  The
current behavior does not prevent the features from working.

So what I would suggest is to fix the commands which are failing by not
ignoring partitioned indexes for v11, then raise a new thread which
implements the new tablespace behavior for all partitioned objects.  Per
my recent lookups at partition-related features, making a maximum
consistency between how partitioned tables and partitioned indexes
behave is actually a good approach.
--
Michael


signature.asc
Description: PGP signature


Re: pg_promote not marked as parallel-restricted in pg_proc.dat

2018-11-02 Thread Michael Paquier
On Fri, Nov 02, 2018 at 09:27:39AM +0900, Michael Paquier wrote:
> You actually do a lot, moving just one person with MP as initials to
> consider moving the function as being parallel-safe.  Thanks for the
> points you raised, what needs to be done looks clear now.

So, does anybody have an objection to marking the function as parallel-safe?
If not, I'd like to do so and close this item.

What has been raised on this thread is more than I hoped first.  Thanks
Amit and Robert for the additional input!
--
Michael


signature.asc
Description: PGP signature


Re: partitioned indexes and tablespaces

2018-11-02 Thread Alvaro Herrera
On 2018-Nov-03, Michael Paquier wrote:

> On Fri, Nov 02, 2018 at 03:53:51PM -0300, Alvaro Herrera wrote:
> > In this thread I'm not proposing to change the behavior for tables, only
> > for indexes.  If people want to change behavior for tables (and I agree
> > with doing so), they can start their own threads.
> 
> Changing this behavior on back-branches is not a good idea I think
> because that changes the representation of the pg_class entries in the
> catalogs by adding the reltablespace reference for a partitioned index
> which does not exist now.  We are long past the time of doing such
> changes on v11, but that can perfectly be done for v12.

With all due respect, this argument makes no sense.  All partitioned
indexes that exist today have a null reltablespace (all pg_class rows
already have a reltablespace column; I'm not changing that).  If users
want to keep the current behavior (i.e. that indexes on future
partitions are created in the default tablespace), all they have to do
is not send any ALTER INDEX changing the index's tablespace.

You're saying that people currently running Postgres 11 that are
doing "CREATE INDEX ... TABLESPACE foo" will be surprised because the
index ends up in tablespace foo for partitions they create afterwards.

Additionally, you're saying that all people currently doing
ALTER INDEX ... SET TABLESPACE foo
expect that
1. the parent partitioned index silently does not change at all
2. the indexes on future partitions end up in the default tablespace.

I cannot see how that's for real.

Furthermore, I think delaying this change to pg12 serves nobody, because
it just means that there will be a behavior difference for no reason.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: INSTALL file

2018-11-02 Thread Andreas 'ads' Scherbaum

On 02.11.18 01:38, Stephen Frost wrote:

Greetings,

* Andreas 'ads' Scherbaum (a...@pgug.de) wrote:

How about the attached one? Picked up your draft, and cleaned it up a bit.

(unsurprisingly) this is looking pretty good to me.

A few additional notes:


Incorporated. See the attached.

If that works for you, I will submit it to the Commitfest.


Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

PostgreSQL Database Management System
=====================================

This directory contains the source code distribution of the PostgreSQL
database management system.

PostgreSQL is an advanced object-relational database management system
that supports an extended subset of the SQL standard, including
transactions, foreign keys, subqueries, triggers, user-defined types
and functions.  This distribution also contains C language bindings.



Download
========

The latest version of this software, both in source code form and as
binary packages for many platforms, may be obtained at

https://www.postgresql.org/download/

PostgreSQL has many language interfaces, many of which are listed here:

https://www.postgresql.org/download/products/2-drivers-and-interfaces/

For more information please visit

https://www.postgresql.org/.


Building on UNIX
================

Detailed instructions for many Unix platforms are available here:
https://www.postgresql.org/docs/current/static/installation.html

To build PostgreSQL on most Unix variants, the following are required:

GNU make, version 3.80 or newer
ISO/ANSI C compiler (at least C99-compliant)
Flex 2.5.31 or later, and Bison 1.875 or later (for building from git)
Perl 5.8.3 (for building from git)

PostgreSQL has many additional capabilities which can be enabled using
configure --enable switches but many of those also depend on additional
libraries.  See the installation instructions for details.

To build PostgreSQL, run the following commands:

./configure
make

PostgreSQL can then be installed using 'make install', which will
require being a superuser to install into the default directory.
The installation location can be changed by passing '--prefix' to
'configure'.  Run './configure --help' for additional options.


Building on Windows
===================

Detailed instructions for building on Windows is available here:

https://www.postgresql.org/docs/current/static/install-windows.html

To build PostgreSQL on Windows, either Visual Studio Express 2017
for Windows Desktop or Microsoft Visual C++ 2005 (or later) should be
installed.  PostgreSQL can also be built using MinGW or Cygwin using
the Unix instructions.

There are different requirements for building on a 32-bit or 64-bit
environment, check the documentation for details.


Initializing your Database
==========================

Once the PostgreSQL software is installed, the first step to having a
running database is to initialize a PostgreSQL database, using the
'initdb' command:

initdb -D /path/to/mydatabase

Where '/path/to/mydatabase' is the directory where the database is
going to be installed. This directory can exist, but must be empty.
If it does not exist, 'initdb' will create it.

After the database system has been initialized, PostgreSQL can be
started by using the pg_ctl command:

pg_ctl -D /path/to/mydatabase -l logfile start

Once PostgreSQL is running, you can connect to it using the psql
command-line client.  A default database called 'postgres' was created
by 'initdb'.


Building the PostgreSQL Documentation
=====================================

Full documentation for PostgreSQL is available online here:

https://www.postgresql.org/docs/current/static/index.html

PostgreSQL uses DocBook to build the documentation. Therefore the
DocBook tools must be installed. In addition, building the PDF
documentation requires a working Java installation (for FOP).

To build PostgreSQL's documentation on Unix, run:

./configure
make docs

The documentation, once built by 'make docs', will be available in
various formats in the 'doc/src/sgml' directory.
diff --git a/README b/README
index 12de3f1d73..09ac40fa1c 100644
--- a/README
+++ b/README
@@ -9,19 +9,107 @@ that supports an extended subset of the SQL standard, including
 transactions, foreign keys, subqueries, triggers, user-defined types
 and functions.  This distribution also contains C language bindings.
 
+
+
+Download
+
+
+The latest version of this software, both in source code form and as
+binary packages for many platforms, may be obtained at
+
+	https://www.postgresql.org/download/
+
 PostgreSQL has many language interfaces, many of which are listed here:
 
-	https://www.postgresql.org/download
+	https://www.postgresql.org/download/products/2-drivers-and-interfaces/
+
+For more information please visit
+
+	https://www.postgresql.org/.
+
+
+Building on UNIX
+==

Re: INSTALL file

2018-11-02 Thread Michael Paquier
On Fri, Nov 02, 2018 at 06:47:19AM -0400, Stephen Frost wrote:
> As for what's in the README on the master branch, I was saying that it
> *should* point to the development documentation, since that should be
> current with whatever is actually in the git repo (or only a day behind
> or such).

Definitely!
--
Michael


signature.asc
Description: PGP signature


partitioned tables referenced by FKs

2018-11-02 Thread Alvaro Herrera
Here's a patch to allow partitioned tables to be referenced by foreign
keys.  Current state is WIP, but everything should work; see below for
the expected exception.

The design is very simple: have one pg_constraint row for each partition
on each side, each row pointing to the topmost table on the other side;
triggers appear on each leaf partition (and naturally they don't appear
on any intermediate partitioned table).

There are tests that should cover all the basic features.  pg_upgrade
tests work (containing relevant tables, as regress/foreign_key.sql
leaves them behind for this reason: partitioned-references-partitioned).

There is one requisite feature still missing from this patch: when a
partition on the referenced side is detached or dropped, we must ensure
no referencing row remains on the other side.  For this, I have an
(unmerged and thus unsubmitted here) new ri_triggers.c routine
RI_Inverted_Initial_Check (need to come up with better name, heh) that
verifies this, invoked at DETACH/DROP time.

Also, some general code cleanup and documentation patching is needed.

I'm posting this now so that I can take my hands off it for a few days;
will return to post an updated version at some point before next
commitfest.  I wanted to have this ready for this commitfest, but RL
dictated otherwise.  This patch took a *very long time* to write ... I
wrote three different recursion models for this.


One thing I realized while writing this, is that my commit
3de241dba86f ("Foreign keys on partitioned tables") put function
CloneForeignKeyConstraints() in catalog/pg_constraint.c that should
really have been in tablecmds.c.  In this patch I produced some siblings
of that function still in pg_constraint.c, but I intend to move the
whole lot to tablecmds.c before commit.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index f4057a9f15..4e277d6bed 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -38,6 +38,10 @@
 #include "utils/tqual.h"
 
 
+static void deconstruct_pg_constraint_row(HeapTuple contuple, TupleDesc tupdesc,
+			  int *numfks, AttrNumber *conkey,
+			  AttrNumber *confkey, Oid *conpfeqop,
+			  Oid *conppeqop, Oid *conffeqop);
 static void clone_fk_constraints(Relation pg_constraint, Relation parentRel,
 	 Relation partRel, List *clone, List **cloned);
 
@@ -395,30 +399,37 @@ CreateConstraintEntry(const char *constraintName,
  * order, though.
  *
  * The *cloned list is appended ClonedConstraint elements describing what was
- * created.
+ * created, for the purposes of validating the constraint in ALTER TABLE's
+ * Phase 3.
  */
+
 void
 CloneForeignKeyConstraints(Oid parentId, Oid relationId, List **cloned)
 {
 	Relation	pg_constraint;
 	Relation	parentRel;
 	Relation	rel;
-	ScanKeyData key;
+	ScanKeyData key[2];
 	SysScanDesc scan;
 	HeapTuple	tuple;
+	AttrNumber *attmap;
 	List	   *clone = NIL;
+	ListCell   *cell;
 
 	parentRel = heap_open(parentId, NoLock);	/* already got lock */
 	/* see ATAddForeignKeyConstraint about lock level */
 	rel = heap_open(relationId, AccessExclusiveLock);
 	pg_constraint = heap_open(ConstraintRelationId, RowShareLock);
 
-	/* Obtain the list of constraints to clone or attach */
-	ScanKeyInit(&key,
+	/*
+	 * Search for constraints where the parent is in the referencing side.
+	 * obtain the list of constraints to clone or attach.
+	 */
+	ScanKeyInit(&key[0],
 Anum_pg_constraint_conrelid, BTEqualStrategyNumber,
 F_OIDEQ, ObjectIdGetDatum(parentId));
 	scan = systable_beginscan(pg_constraint, ConstraintRelidTypidNameIndexId, true,
-			  NULL, 1, &key);
+			  NULL, 1, key);
 	while ((tuple = systable_getnext(scan)) != NULL)
 		clone = lappend_oid(clone, HeapTupleGetOid(tuple));
 	systable_endscan(scan);
@@ -426,9 +437,120 @@ CloneForeignKeyConstraints(Oid parentId, Oid relationId, List **cloned)
 	/* Do the actual work, recursing to partitions as needed */
 	clone_fk_constraints(pg_constraint, parentRel, rel, clone, cloned);
 
-	/* We're done.  Clean up */
+	list_free(clone);
+	clone = NIL;
+
+	/*
+	 * Now search for any constraints where this partition is in the
+	 * referenced side.  However, we must ignore any constraint whose parent
+	 * constraint is also going to be cloned, to avoid duplicates.  So do it
+	 * in two steps: first construct the list of constraints to clone, then go
+	 * over that list cloning those whose parents are not in the list.  (We
+	 * must not rely on the parent being seen first, since catalog order could
+	 * return children first.)
+	 */
+	attmap = convert_tuples_by_name_map(RelationGetDescr(rel),
+		RelationGetDescr(parentRel),
+		gettext_noop("could not convert row type"));
+	ScanKeyInit(&key[0],
+Anum_pg_constraint_confrelid, BTEqualStrategyNumber,
+F_OIDEQ, ObjectIdGe

Re: partitioned tables referenced by FKs

2018-11-02 Thread Alvaro Herrera
Oh, I forgot to mention one thing.  When creating a constraint, an index
OID is normally given.  I'm not sure what this is for.  In the patch
it's a little convoluted to get the correct index OID, so I'm just
passing InvalidOid.  Things work nonetheless.  I wonder if we shouldn't
just do away with the index OID.  There are several /*FIXME*/ notes in
the code due to this.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

2018-11-02 Thread Peter Geoghegan
On Fri, Nov 2, 2018 at 3:06 AM Andrey Lepikhov
 wrote:
> Documentation is full and clear. All non-trivial logic is commented
> accurately.

Glad you think so.

I had the opportunity to discuss this patch at length with Heikki
during pgConf.EU. I don't want to speak on his behalf, but I will say
that he seemed to understand all aspects of the patch series, and
seemed generally well disposed towards the high level design. The
high-level design is the most important aspect -- B-Trees can be
optimized in many ways, all at once, and we must be sure to come up
with something that enables most or all of them. I really care about
the long term perspective.

That conversation with Heikki eventually turned into a conversation
about reimplementing GIN using the nbtree code, which is actually
related to my patch series (sorted on heap TID is the first step to
optional run length encoding for duplicates). Heikki seemed to think
that we can throw out a lot of the optimizations within GIN, and add a
few new ones to nbtree, while still coming out ahead. This made the
general nbtree-as-GIN idea (which we've been talking about casually
for years) seem a lot more realistic to me. Anyway, he requested that
I support this long term goal by getting rid of the DESC TID sort
order thing -- that breaks GIN-style TID compression. It also
increases the WAL volume unnecessarily when a page is split that
contains all duplicates.

The DESC heap TID sort order thing probably needs to go. I'll probably
have to go fix the regression test failures that occur when ASC heap
TID order is used. (Technically those failures are a pre-existing
problem, a problem that I mask by using DESC order...which is weird.
The problem is masked in the master branch by accidental behaviors
around nbtree duplicates, which is something that deserves to die.
DESC order is closer to the accidental current behavior.)

> Patch applies cleanly on top of current master. Regression tests passed
> and my "Retail Indextuple deletion" use cases work without problems.

Cool.

> The new BTScanInsert structure reduces the parameter lists of many
> functions and looks fine. But it contains an optimization part (the
> 'restorebinsrch' field et al.) that is used very locally in the code --
> in the _bt_findinsertloc() -> _bt_binsrch() call path. Maybe you could
> localize this logic in a separate struct that is passed to _bt_binsrch()
> as a pointer; other callers could pass NULL. That might simplify use of
> the struct.

Hmm. I see your point. I did it that way because the knowledge of
having cached an upper and lower bound for a binary search of a leaf
page needs to last for a relatively long time. I'll look into it
again, though.

> Due to the optimization, _bt_binsrch() has roughly doubled in size. Maybe
> you could move this to a helper routine?

Maybe. There are some tricky details that seem to work against it.
I'll see if it's possible to polish that some more, though.

-- 
Peter Geoghegan



Re: pgbench doc fix

2018-11-02 Thread Tatsuo Ishii
> So I do not think a more precise wording harms. Maybe: "prepared: use
> extended query protocol with REUSED named prepared statements" would
> be even less slightly ambiguous.

I like this. But maybe we can remove "named"?

"prepared: use extended query protocol with reused prepared statements"

Because "named" prepared statements can be (unlike unnamed prepared
statements) reused repeatably, it implies "reused". So using both
"named" and "reused" sounds a little bit redundant to me. If we choose
one of them, I prefer "reused" since it more explicitly stats the
difference between "-M extended" and "-M prepared".

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



First-draft release notes for back-branch releases

2018-11-02 Thread Tom Lane
I've made a pass over the commit log up to now, and prepared draft
release note entries for everything that seemed worth documenting;
see

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=65a82a7649860f8010db581a0d1f12aa92f5969b

As I usually do, I dropped all of these into a section for 11.1,
although some won't remain there because they don't apply to the
v11 branch or were committed in time for 11.0.  I'll sort that
out on Sunday.

Comments please ...

regards, tom lane



pgbench -M option can be specified more than once

2018-11-02 Thread Tatsuo Ishii
While playing with pgbench, I found that "-M query_mode" can be
specified more than once. For example,

$ pgbench -p 11002 -M extended -S -M prepared test
starting vacuum...end.
transaction type: 
scaling factor: 1
query mode: prepared
number of clients: 1
number of threads: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
latency average = 0.490 ms
tps = 2040.372858 (including connections establishing)
tps = 3920.541599 (excluding connections establishing)

In this case the last "-M prepared" was chosen as the query mode. This
is sloppy because we cannot actually use different -M modes at the same
time.

Attached is a patch to detect such an error.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 81bc6d8a6e..d69478b8dd 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -464,6 +464,7 @@ typedef enum QueryMode
 
 static QueryMode querymode = QUERY_SIMPLE;
 static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
+bool	query_mode_already_set = false;
 
 typedef struct
 {
@@ -5149,6 +5150,12 @@ main(int argc, char **argv)
 }
 break;
 			case 'M':
+if (query_mode_already_set)
+{
+	fprintf(stderr, "query mode (-M) already set\n");
+	exit(1);
+}
+query_mode_already_set = true;
 benchmarking_option_set = true;
 for (querymode = 0; querymode < NUM_QUERYMODE; querymode++)
 	if (strcmp(optarg, QUERYMODE[querymode]) == 0)


Re: pgbench -M option can be specified more than once

2018-11-02 Thread Tom Lane
Tatsuo Ishii  writes:
> While playing with pgbench, I found multiple "-M query_mode" can be
> set more than once. For example,

I think that's true of just about every option in all of our programs.
Why is this one instance so much worse than others that it deserves
to be handled differently?

regards, tom lane



Re: pgbench -M option can be specified more than once

2018-11-02 Thread Tatsuo Ishii
> Tatsuo Ishii  writes:
>> While playing with pgbench, I found multiple "-M query_mode" can be
>> set more than once. For example,
> 
> I think that's true of just about every option in all of our programs.
> Why is this one instance so much worse than others that it deserves
> to be handled differently?

One of my colleagues actually believed that if both "-M extended" and
"-M prepared" were specified, pgbench runs in mixture of those
modes. So I felt avoiding such misunderstanding is more important.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: pgbench -M option can be specified more than once

2018-11-02 Thread Isaac Morland
In many cases, it is handy to be able to specify an option in an alias, but
still be able to override on the actual command line.

I can't say whether that is useful with this specific option, but it seems
the same debate could be had about almost any option. I'm pretty sure the
existing behaviour is consistent with other options, with the exception of
a few (e.g. pg_dump --table), which are explicitly documented as being
cumulative.

On Fri, 2 Nov 2018 at 21:12, Tatsuo Ishii  wrote:

> > Tatsuo Ishii  writes:
> >> While playing with pgbench, I found multiple "-M query_mode" can be
> >> set more than once. For example,
> >
> > I think that's true of just about every option in all of our programs.
> > Why is this one instance so much worse than others that it deserves
> > to be handled differently?
>
> One of my colleagues actually believed that if both "-M extended" and
> "-M prepared" were specified, pgbench runs in mixture of those
> modes. So I felt avoiding such misunderstanding is more important.
>
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp
>
>


Re: pgbench -M option can be specified more than once

2018-11-02 Thread Andres Freund
Hi,

On 2018-11-03 10:12:14 +0900, Tatsuo Ishii wrote:
> > Tatsuo Ishii  writes:
> >> While playing with pgbench, I found multiple "-M query_mode" can be
> >> set more than once. For example,
> > 
> > I think that's true of just about every option in all of our programs.
> > Why is this one instance so much worse than others that it deserves
> > to be handled differently?
> 
> One of my colleagues actually believed that if both "-M extended" and
> "-M prepared" were specified, pgbench runs in mixture of those
> modes. So I felt avoiding such misunderstanding is more important.

I regularly specify options multiple times, and I'd hate for that to not
work anymore.  It's really useful to set defaults for scripts that then
can be overwritten by the caller by passing on "$@" (i.e. all script
arguments).

Greetings,

Andres Freund



Re: COPY FROM WHEN condition

2018-11-02 Thread Corey Huinker
>
>
> > SELECT x.a, sum(x.b)
> > FROM ( COPY INLINE '/path/to/foo.txt' FORMAT CSV ) as x( a integer, b
> numeric, c text, d date, e json) )
>
> Apologies for bike-shedding, but wouldn't the following be a better
> fit with the current COPY?
>
> COPY t(a integer, b numeric, c text, d date, e json) FROM
> '/path/to/foo.txt' WITH (FORMAT CSV, INLINE)
>

+1 Very much a better fit.

>
>


Re: pgbench -M option can be specified more than once

2018-11-02 Thread Tom Lane
Andres Freund  writes:
> On 2018-11-03 10:12:14 +0900, Tatsuo Ishii wrote:
>> One of my colleagues actually believed that if both "-M extended" and
>> "-M prepared" were specified, pgbench runs in mixture of those
>> modes. So I felt avoiding such misunderstanding is more important.

> I regularly specify options multiple times, and I'd hate for that to not
> work anymore.  It's really useful to set defaults for scripts that then
> can be overwritten by the caller by passing on "$@" (i.e. all script
> arguments).

Yeah, there's an ancient tradition that "last switch wins" when receiving
conflicting command-line options.  I think we should tread very carefully
in breaking that.

regards, tom lane



Re: zheap: a new storage format for PostgreSQL

2018-11-02 Thread Amit Kapila
On Fri, Nov 2, 2018 at 6:41 PM Tomas Vondra
 wrote:
>
> On 11/02/2018 12:12 PM, Amit Kapila wrote:
> > On Thu, Nov 1, 2018 at 7:26 PM Tomas Vondra
> >  wrote:
> >>
> >> On 11/01/2018 07:43 AM, Amit Kapila wrote:
> >>>
> >>> You can find the latest code at https://github.com/EnterpriseDB/zheap
> >>>
> >>
> >> Seems valgrind complains about a couple of places in the code - nothing
> >> major, might be noise, but probably worth a look.
> >>
> >
> > I have looked at the report and one of those seems to be problematic,
> > so I have pushed the fix for the same.  The other one for below stack
> > seems to be bogus:
> > ==7569==  Uninitialised value was created by a stack allocation
> > ==7569==at 0x59043D: znocachegetattr (zheapam.c:6206)
> > ==7569==
> > {
> >
> >Memcheck:Cond
> >fun:ZHeapDetermineModifiedColumns
> >fun:zheap_update
> >
> > I have checked in the function znocachegetattr that if we initialize
> > the value of ret_datum, it fixes the reported error, but actually
> > there is no need to do so, as the code always assigns a valid
> > value to this variable.  I have left it as is for now as I am not sure
> > whether there is any value in doing such an initialization.
> >
>
> Well, the problem is the ret_datum is modified like this:
>
>
> thisatt = TupleDescAttr(tupleDesc, attnum);
> if (thisatt->attbyval)
> 	memcpy(&ret_datum, tp + off, thisatt->attlen);
> else
> 	ret_datum = PointerGetDatum((char *) (tp + off));
>
> which means that for cases with attlen < sizeof(Datum), this ends up
> leaving part of the value undefined. So it's a valid issue.
>

Agreed.

> I'm sure
> it's not the only place where we do something like this, and the other
> places don't trigger the valgrind warning, so how do those places do
> this? heapam seems to call fetch_att in the end, which essentially calls
> Int32GetDatum/Int16GetDatum/CharGetDatum, so why not use the same
> trick here?
>

This is because, in zheap, we have omitted all alignment padding for
pass-by-value types.  See the description in my previous email [1].  I
think here we need to initialize ret_datum at the beginning of the
function unless you have some better idea.
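
To make the options concrete, here is a sketch (not the committed zheap
code; it assumes the variables from the znocachegetattr() fragment quoted
above: thisatt, tp, off, ret_datum).  Either zero-initialize the Datum up
front, or widen the copied value explicitly the way heapam's fetch_att()
effectively does; since zheap omits alignment padding for pass-by-value
types, the widening variant still memcpy()s into a correctly sized local
rather than dereferencing a possibly misaligned pointer:

	Datum		ret_datum = (Datum) 0;	/* option 1: no byte is ever left undefined */

	if (thisatt->attbyval)
	{
		/* option 2: widen explicitly instead of memcpy'ing into part of a Datum */
		switch (thisatt->attlen)
		{
			case sizeof(char):
				ret_datum = CharGetDatum(*(tp + off));
				break;
			case sizeof(int16):
				{
					int16		v16;

					memcpy(&v16, tp + off, sizeof(int16));
					ret_datum = Int16GetDatum(v16);
				}
				break;
			case sizeof(int32):
				{
					int32		v32;

					memcpy(&v32, tp + off, sizeof(int32));
					ret_datum = Int32GetDatum(v32);
				}
				break;
			default:
				/* full-width Datum: memcpy fills every byte */
				memcpy(&ret_datum, tp + off, sizeof(Datum));
				break;
		}
	}
	else
		ret_datum = PointerGetDatum((char *) (tp + off));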

One thing unrelated to the above problem: I forgot to mention in my
previous email that Daniel Westermann, whom I have cc'ed on this email,
has reported a few bugs in this branch which seem to have been
fixed.  He seems to be interested in doing more tests.  Daniel, I
encourage you to share your findings here.

Thanks, Tomas and Daniel for looking into the branch and reporting
problems, it is really helpful.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1Lwb%2BrGeB_z%2BjUbnSndvgnsDUK%2B9tjfng4sy1AZyrHqRg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2018-11-02 Thread Amit Kapila
On Fri, Nov 2, 2018 at 7:29 PM Robert Haas  wrote:
>
> On Fri, Nov 2, 2018 at 7:23 AM Amit Kapila  wrote:
> > > That doesn't seem like an unreasonable argument.  I'm not sure whether
> > > the right threshold is 4 or something a little bigger, but I bet it's
> > > not very large.  It seems important to me that before anybody thinks
> > > about committing this, we construct some kind of destruction case
> > > where repeated scans of the whole table are triggered as frequently as
> > > possible, and then run that test with varying thresholds.
> >
> > Why do you think repeated scans will be a destruction case when there
> > is no FSM for a small table?
>
> That's not what I'm saying.  If we don't have the FSM, we have to
> check every page of the table.  If there's a workload where that
> happens a lot on a table that is just under the size threshold for
> creating the FSM, then it's likely to be a worst case for this patch.
>

That makes sense; this is the first thing I was worried about after
looking at the initial patch, and I suggested a test [1] which can
hit the worst case.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2BhP-jGYWi25-1QMedxeM_0H01s%3D%3D4-t74oEgL2EDVicw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2018-11-02 Thread Amit Kapila
On Fri, Nov 2, 2018 at 7:37 PM Tom Lane  wrote:
>
> Robert Haas  writes:
> > That's not what I'm saying.  If we don't have the FSM, we have to
> > check every page of the table.  If there's a workload where that
> > happens a lot on a table that is just under the size threshold for
> > creating the FSM, then it's likely to be a worst case for this patch.
>
> Hmm, you're assuming something not in evidence: why would that be the
> algorithm?  On a FSM-less table, I'd be inclined to just check the
> last page and then grow the table if the tuple doesn't fit there.
> This would, in many cases, soon result in a FSM being created, but
> I think that's just fine.  The point of the change is to optimize
> for cases where a table *never* gets more than a few inserts.  Not, IMO,
> for cases where a table gets a lot of churn but never has a whole lot of
> live tuples.  In the latter scenario we are far better off having a FSM.
>

In the past, you seem to have suggested an approach of trying each block
[1] for small tables which don't have an FSM.  I think if we do what you
are suggesting now, then we don't need to worry much about any
regression and the code will be somewhat simpler, but OTOH, I don't see
much harm in trying every block if we keep the threshold at no more
than 4.  That would address more cases (see the sketch below).

[1] - https://www.postgresql.org/message-id/11360.1345502641%40sss.pgh.pa.us
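
For illustration, a minimal sketch of the two strategies under discussion
(function and variable names are invented; this is not the submitted
patch).  Checking only the last page keeps the no-FSM path O(1), while
probing every page bounds the extra work by the threshold (at most 4 pages
if the threshold stays there):

/*
 * Sketch only: pick a target block for a small relation that has no FSM.
 * Returns InvalidBlockNumber to tell the caller to extend the relation.
 */
static BlockNumber
fsmless_target_block(Relation rel, Size len, bool try_every_block)
{
	BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
	BlockNumber blkno;

	if (nblocks == 0)
		return InvalidBlockNumber;	/* empty table: just extend */

	if (!try_every_block)
		return nblocks - 1;		/* last page only; caller extends if it's full */

	/* Alternative: probe every existing page before extending. */
	for (blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		buf = ReadBuffer(rel, blkno);
		bool		fits;

		LockBuffer(buf, BUFFER_LOCK_SHARE);
		fits = (len <= PageGetHeapFreeSpace(BufferGetPage(buf)));
		UnlockReleaseBuffer(buf);

		if (fits)
			return blkno;
	}

	return InvalidBlockNumber;	/* nothing fits: extend */
}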

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


