Re: MERGE SQL statement for PG12

2019-01-09 Thread Mi Tar
Hi!

Looking at the commitfest as a novice contributor, I was searching for patches 
to review that had no reviewers set. I also recently spent some time on a patch 
improving how REFRESH MATERIALIZED VIEW CONCURRENTLY works (doing 
INSERTs/UPDATEs/DELETEs instead of just DELETEs/INSERTs), so when I saw this 
patch I said to myself: great, MERGE is exactly what would be needed there, 
because we already have a merge implementation there (requiring unique columns). 
I didn't know that I would discover such a long and beautiful thread.

So I will just add my 2c based on experience from the REFRESH MATERIALIZED VIEW 
CONCURRENTLY work. I think that we would need an additional statement-level 
trigger for MERGE, instead of it being exposed as INSERT, UPDATE, and DELETE 
triggers. It is really tricky to make triggers work when you want to know 
exactly how the table has changed through MERGE if that information is split 
into three separate triggers and transient relations. If we do not have a new 
statement-level trigger for MERGE, then this is really just syntactic sugar 
on top of INSERTs, UPDATEs, and DELETEs.


Mitar

Re: additional foreign key test coverage

2019-01-09 Thread Mi Tar
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

Hi!

I tested this patch; it applied cleanly and all tests passed. I haven't checked 
whether the changes to the tests are reasonable, or extensive enough to cover 
all aspects of what they intend to cover.


Mitar

some minor comment fix in md.c

2019-01-09 Thread Jamison, Kirk
Hi,

Here are a few minor fixes for the comments in md.c:
src/backend/storage/smgr/md.c

1. @L174 - removed the unnecessary word "is".
- […] Note that this is breaks mdnblocks() and related functionality [...]
+ […] Note that this breaks mdnblocks() and related functionality [...]

2. @L885 - grammar fix
- We used to pass O_CREAT here, but that's has the disadvantage that it might 
[...]
+ We used to pass O_CREAT here, but that has the disadvantage that it might 
[...]

Regards,
Kirk J.


0001-minor-comment-fix-in-md.c.patch
Description: 0001-minor-comment-fix-in-md.c.patch


RE: Problem during Windows service start

2019-01-09 Thread Higuchi, Daisuke
Michael Paquier wrote:
> You should register this patch to the next commit fest in the section for bug 
> fixes to not lose sight of it; 

Thank you for picking up my post. I registered it in the next CF. 

> I haven't put much thoughts into what you propose here, but this status 
> message is not really helpful for the user.

Thank you for the review. I attached the updated patch. There are several 
possible reasons for server startup failure, so I changed the message as follows: 

+   case PQPING_NO_RESPONSE:
+   case PQPING_NO_ATTEMPT:
+   write_eventlog(EVENTLOG_ERROR_TYPE, _("Server 
startup failed. Examine the log output.\n"));
+   pgwin32_SetServiceStatus(SERVICE_STOPPED);


Regards, 
Daisuke, Higuchi



windows_service_bug_fix_v2.patch
Description: windows_service_bug_fix_v2.patch


Re: Displaying and dumping of table access methods

2019-01-09 Thread Dmitry Dolgov
> On Tue, Jan 8, 2019 at 6:34 PM Andres Freund  wrote:
>
> On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote:
> > On 08/01/2019 00:56, Andres Freund wrote:
> > > A patch at [2] adds display of a table's access method to \d+ - but that
> > > means that running the tests with a different default table access
> > > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > > there'll be a significant number of test failures, even though the test
> > > results did not meaningfully differ.
> >
> > For psql, a variable that hides the access method if it's the default.
>
> Yea, I think that seems the least contentious solution.  Don't like it
> too much, but it seems better than the alternative. I wonder if we want
> one for multiple regression related issues, or whether one specifically
> about table AMs is more appropriate. I lean towards the latter.

Are there any similar existing regression-related issues? If not, then the
latter probably indeed makes more sense.

> > > Similarly, if pg_dump starts to dump table access methods either
> > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> > > unimportant differences.
> >
> > For pg_dump, track and set the default_table_access_method setting
> > throughout the dump (similar to how default_with_oids was handled, I
> > believe).
>
> Yea, that's similar to that, and I think that makes sense.

Yes, sounds like a reasonable approach, I can proceed with it.



Re: port of INSTALL file generation to XSLT

2019-01-09 Thread Mi Tar
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            tested, failed

Hi!

I tested this. The patch applied cleanly and an INSTALL file was produced. The 
formatting looks different from before, but I think it looks better. We lost 
the centered alignment of some headings, but many code/command snippets are now 
better/correctly indented. So I think this is overall an improvement.

The documentation about documentation generation should be updated, though. 
Please add pandoc to the requirements here [1].

[1] https://www.postgresql.org/docs/devel/docguide-toolsets.html


Mitar

The new status of this patch is: Waiting on Author


problems with foreign keys on partitioned tables

2019-01-09 Thread Amit Langote
Hi,

I noticed a couple of problems with foreign keys on partitioned tables.

1. Foreign keys of partitions stop working correctly after being detached
from the parent table

create table pk (a int primary key);
create table p (a int) partition by list (a);
create table p1 partition of p for values in (1) partition by list (a);
create table p11 partition of p1 for values in (1);
alter table p add foreign key (a) references pk (a);

-- these things work correctly
insert into p values (1);
ERROR:  insert or update on table "p11" violates foreign key constraint
"p_a_fkey"
DETAIL:  Key (a)=(1) is not present in table "pk".
insert into pk values (1);
insert into p values (1);
delete from pk where a = 1;
ERROR:  update or delete on table "pk" violates foreign key constraint
"p_a_fkey" on table "p"
DETAIL:  Key (a)=(1) is still referenced from table "p".

-- detach p1, which preserves the foreign key
alter table p detach partition p1;
create table p12 partition of p1 for values in (2);

-- this part of the foreign key on p1 still works
insert into p1 values (2);
ERROR:  insert or update on table "p12" violates foreign key constraint
"p_a_fkey"
DETAIL:  Key (a)=(2) is not present in table "pk".

-- but this seems wrong
delete from pk where a = 1;
DELETE 1

-- because
select * from p1;
 a
───
 1
(1 row)

This happens because the action triggers defined on the PK relation (pk)
refer to p as the referencing relation.  On detaching p1 from p, p1's
data is no longer accessible to those triggers.  To fix this problem, we
need to create action triggers on the PK relation that refer to p1 when it's
detached (unless such triggers already exist, which might be true in some
cases).  Attached patch 0001 shows this approach.

2. Foreign keys of a partition cannot be dropped in some cases after
detaching it from the parent.

create table p (a int references pk) partition by list (a);
create table p1 partition of p for values in (1) partition by list (a);
create table p11 partition of p1 for values in (1);
alter table p detach partition p1;

-- p1's foreign key is no longer inherited, so should be able to drop it
alter table p1 drop constraint p_a_fkey ;
ERROR:  constraint "p_a_fkey" of relation "p11" does not exist

This happens because, by the time ATExecDropConstraint tries to recursively
drop p11's inherited foreign key constraint (which is what normally
happens for inherited constraints), the latter has already been dropped by
dependency management.  I think the foreign key inheritance-related code
doesn't need to add dependencies for something that inheritance recursion
can take care of, and I can't think of any other reason to have such
dependencies around.  I thought maybe they're needed for pg_dump to work
correctly, but apparently that is not the case.

Interestingly, the above problem doesn't occur if the constraint is added
to partitions by inheritance recursion.

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1) partition by list (a);
create table p11 partition of p1 for values in (1);
alter table p add foreign key (a) references pk (a);
alter table p detach partition p1;
alter table p1 drop constraint p_a_fkey ;
ALTER TABLE

Looking into it, that happens to work *accidentally*.

ATExecDropInherit() doesn't try to recurse, which prevents the error in
this case, because it finds that the constraint on p1 is marked NO INHERIT
(non-inheritable), which is incorrect.  The value of p1's constraint's
connoinherit (in fact, other inheritance-related properties too) is
incorrect, because ATAddForeignKeyConstraint doesn't bother to set them
correctly.  This is what the inheritance properties of the various copies of
'p_a_fkey' look like in the catalog in this case:

-- case 1: foreign key added to partitions recursively
create table p (a int) partition by list (a);
create table p1 partition of p for values in (1) partition by list (a);
create table p11 partition of p1 for values in (1);
alter table p add foreign key (a) references pk (a);
select conname, conrelid::regclass, conislocal, coninhcount, connoinherit
from pg_constraint where conname like 'p%fkey%';
 conname  │ conrelid │ conislocal │ coninhcount │ connoinherit
──────────┼──────────┼────────────┼─────────────┼──────────────
 p_a_fkey │ p        │ t          │           0 │ t
 p_a_fkey │ p1       │ t          │           0 │ t
 p_a_fkey │ p11      │ t          │           0 │ t
(3 rows)

In this case, after detaching p1 from p, p1's foreign key's coninhcount
turns to -1, which is not good.

alter table p detach partition p1;
select conname, conrelid::regclass, conislocal, coninhcount, connoinherit
from pg_constraint where conname like 'p%fkey%';
 conname  │ conrelid │ conislocal │ coninhcount │ connoinherit
──────────┼──────────┼────────────┼─────────────┼──────────────
 p_a_fkey │ p        │ t          │           0 │ t
 p_a_fkey │ p11      │ t          │           0 │ t
 p_a_fkey │ p1       │ t          │          -1 │ t
(3 rows)

-- case 2: for

RE: libpq compression

2019-01-09 Thread Iwata, Aya
Hi, 

> > I agree with the critiques from Robbie Harwood and Michael Paquier
> > that the way in that compression is being hooked into the existing
> > architecture looks like a kludge.  I'm not sure I know exactly how it
> > should be done, but the current approach doesn't look natural; it
> > looks like it was bolted on.
> 
> After some time spent reading this patch and investigating different points,
> mentioned in the discussion, I tend to agree with that. As far as I see it's
> probably the biggest disagreement here, that keeps things from progressing.
> I'm interested in this feature, so if Konstantin doesn't mind, I'll post in
> the near future (after I wrap up the current CF) an updated patch I'm 
> working
> on right now to propose another way of incorporating compression. For now
> I'm moving patch to the next CF.

This thread seems to have stalled. 
In the last e-mail, Dmitry suggested updating the patch to implement the 
functionality in another way, but as far as I can see, he has not posted an 
updated patch yet. (It may be because the author has not responded.)
I understand there is a big disagreement here; however, the status is "Needs 
review" and there has been no review since the author updated the patch to v9, 
so I will do one.

About the patch: please update it to apply to current master. I could not 
test it.

About the documentation: there are some typos, please check them. I am waiting 
for a reviewer of the wording, because I am not so good at English.

When you add a new protocol message, it needs the "Length of message contents 
in bytes, including self." information. 
The patch provides the supported compression algorithm as a Byte1. I think it 
would be better to provide it as a list, as the NegotiateProtocolVersion 
protocol message does.

I had a quick look at the code changes.

+   nread = conn->zstream
+   ? zpq_read(conn->zstream, conn->inBuffer + conn->inEnd,
+  conn->inBufSize - conn->inEnd, &processed)
+   : pqsecure_read(conn, conn->inBuffer + conn->inEnd,
+   conn->inBufSize - conn->inEnd);

How about combining this into a #define macro, since the same logic appears in 
two places?
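
A minimal sketch of what such a macro could look like (the macro name is 
hypothetical, not taken from the patch):

/*
 * Hypothetical helper: read into the input buffer, going through the
 * compression stream when one is configured, and falling back to
 * pqsecure_read() otherwise.  "processed" is only used in the
 * compressed case.
 */
#define PQ_READ_INBUF(conn, processed) \
	((conn)->zstream \
	 ? zpq_read((conn)->zstream, (conn)->inBuffer + (conn)->inEnd, \
				(conn)->inBufSize - (conn)->inEnd, (processed)) \
	 : pqsecure_read((conn), (conn)->inBuffer + (conn)->inEnd, \
					 (conn)->inBufSize - (conn)->inEnd))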

Have you considered anything regarding memory control?
Typically a compression algorithm keeps its dictionary in memory. I think it 
needs a reset or some similar mechanism.


Regards,
Aya Iwata


Re: libpq compression

2019-01-09 Thread Dmitry Dolgov
> On Wed, Jan 9, 2019 at 11:25 AM Iwata, Aya  wrote:
>

> This thread seems to have stalled.
> In the last e-mail, Dmitry suggested updating the patch to implement the
> functionality in another way, but as far as I can see, he has not posted an
> updated patch yet.

Yep, I'm still working on it; hopefully I can submit something rather soon. But
that doesn't in any way invalidate all the work done by Konstantin, so thanks
for the review anyway!



Misleading panic message in backend/access/transam/xlog.c

2019-01-09 Thread Michael Banck
Hi,

there was a report in #postgresql recently about a crash on Google Cloud
SQL with the somewhat misleading message "could not write to log file"
while in fact it was the xlog/wal:

|PANIC:  could not write to log file 000101960054 at offset
| 13279232, length 245760: Cannot allocate memory 
|ERROR:  could not write block 74666 in file "base/18031/48587": Cannot
| allocate memory 
|CONTEXT:  writing block 74666 of relation base/18031/48587 
|LOG:  server process (PID 5160) was terminated by signal 9: Killed 

The slightly longer logfile can be found here: http://dpaste.com/2T61PS9

I suggest rewording that message, e.g. "could not write to transaction
log file" or "could not write to wal file".

Also, the errno (ENOMEM) is curious (and the user wrote that Google
monitoring reported memory at 16/20GB at the time of the crash), but it
could be due to running on a cloud fork? As you have no access to
PGDATA, it sounds difficult to diagnose after the fact.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: ALTER TABLE with multiple SET NOT NULL

2019-01-09 Thread Sergei Kornilov
Hello

I investigated this bug and found the reason:
>  alter table t1 add column b float8 not null default random(), add primary 
> key(a);

Here we call ATController (src/backend/commands/tablecmds.c) with two cmds: 
AT_AddColumn and AT_AddIndex.
Then we go to phase 2 in ATRewriteCatalogs:
- successfully add the new attribute, but without a table rewrite - that 
happens later, in phase 3
- call ATExecAddIndex; we want to add a primary key, so we call 
index_check_primary_key.
index_check_primary_key calls AlterTableInternal and therefore another 
ATController with a single independent AT_SetNotNull command.
That ATController will run phase 2, and then its own phase 3 with validation 
of all constraints. But at this nested level we have no 
AlteredTableInfo->newvals, so we do not properly transform the tuple.

I'm not sure how we can properly rewrite this case.

regards, Sergei



Re: Misleading panic message in backend/access/transam/xlog.c

2019-01-09 Thread Magnus Hagander
On Wed, Jan 9, 2019 at 12:06 PM Michael Banck 
wrote:

> Hi,
>
> there was a report in #postgresql recently about a crash on Google Cloud
> SQL with the somewhat misleading message "could not write to log file"
> while in fact it was the xlog/wal:
>
> |PANIC:  could not write to log file 000101960054 at offset
> | 13279232, length 245760: Cannot allocate memory
> |ERROR:  could not write block 74666 in file "base/18031/48587": Cannot
> | allocate memory
> |CONTEXT:  writing block 74666 of relation base/18031/48587
> |LOG:  server process (PID 5160) was terminated by signal 9: Killed
>
> The slightly longer logfile can be found here: http://dpaste.com/2T61PS9
>
> I suggest rewording that message, e.g. "could not write to transaction
> log file" or "could not write to wal file".
>

Given the change xlog -> wal, I would suggest "could not write to wal file"
as the correct option there.

And +1 for rewording it. I think there are also some other messages like it
that need to be changed, and also things like

(errmsg("restored log file \"%s\" from archive"

could do with an update.


Also, the errno (ENOMEM) is curious (and the user wrote that Google
> monitoring reported memory at 16/20GB at the time of the crash), but it
> could be due to running on a cloud fork? As you have no access to
> PGDATA, it sounds difficult to diagnose after the fact.
>

Yeah, nobody knows what Google has done in their fork *or* how they
actually measure those things, so without a repro I think that's hard..


//Magnus


Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-09 Thread Etsuro Fujita

Amit-san,

(2019/01/09 9:30), Amit Langote wrote:

(sorry about the repeated email, but my previous attempt failed due to
trying to send to the -hackers and -performance lists at the same time, so
trying again after removing -performance)


Thanks!  (Actually, I also failed to send my post to those lists...)


On 2019/01/08 20:07, Etsuro Fujita wrote:

(2018/12/07 20:14), Ashutosh Bapat wrote:

On Fri, Dec 7, 2018 at 11:13 AM Ashutosh Bapat
<ashutosh.bapat@gmail.com> wrote:



 Robert, Ashutosh, any comments on this?  I'm unfamiliar with the
 partitionwise join code.



 As the comment says it has to do with the equivalence classes being
 used during merge append. EC's are used to create pathkeys used for
 sorting. Creating a sort node which has column on the nullable side
 of an OUTER join will fail if it doesn't find corresponding
 equivalence class. You may not notice this if both the partitions
 being joined are pruned for some reason. Amit's idea to make
 partition-wise join code do this may work, but will add a similar
 overhead esp. in N-way partition-wise join once those equivalence
 classes are added.



I looked at the patch. The problem there is that for a given relation,
we will add child ec member multiple times, as many times as the number
of joins it participates in. We need to avoid that to keep ec_member
list length in check.


Amit-san, are you still working on this, perhaps as part of the
speeding-up-planning-with-partitions patch [1]?


I had tried to continue working on it after PGConf.ASIA last month, but
got distracted by something else.

So, while the patch at [1] can take care of this issue as I also mentioned
upthread, I was trying to come up with a solution that can be back-patched
to PG 11.  The patch I posted above is one such solution and as Ashutosh
points out it's perhaps not the best, because it can result in potentially
creating many copies of the same child EC member if we do it in joinrel.c,
as the patch proposes.  I will try to respond to the concerns he raised in
the next week if possible.


Thanks for working on this!

I like your patch in general.  I think one way to address Ashutosh's 
concerns would be to use the consider_partitionwise_join flag: 
originally, that was introduced for partitioned relations to show that 
they can be partitionwise-joined, but I think that flag could also be 
used for non-partitioned relations to show that they have been set up 
properly for partitionwise-joins, and I think by checking that flag we 
could avoid creating those copies for child dummy rels in 
try_partitionwise_join.  Please find attached an updated version of the 
patch.  I modified your version so that building tlists for child dummy 
rels is also postponed until after they actually participate in 
partitionwise joins, to avoid that possibly-useless work as well.  I 
haven't done any performance tests yet though.


Best regards,
Etsuro Fujita
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 256fe16..feccc60 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1017,38 +1017,6 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 		Assert(childrel->reloptkind == RELOPT_OTHER_MEMBER_REL);
 
 		/*
-		 * Copy/Modify targetlist. Even if this child is deemed empty, we need
-		 * its targetlist in case it falls on nullable side in a child-join
-		 * because of partitionwise join.
-		 *
-		 * NB: the resulting childrel->reltarget->exprs may contain arbitrary
-		 * expressions, which otherwise would not occur in a rel's targetlist.
-		 * Code that might be looking at an appendrel child must cope with
-		 * such.  (Normally, a rel's targetlist would only include Vars and
-		 * PlaceHolderVars.)  XXX we do not bother to update the cost or width
-		 * fields of childrel->reltarget; not clear if that would be useful.
-		 */
-		childrel->reltarget->exprs = (List *)
-			adjust_appendrel_attrs(root,
-   (Node *) rel->reltarget->exprs,
-   1, &appinfo);
-
-		/*
-		 * We have to make child entries in the EquivalenceClass data
-		 * structures as well.  This is needed either if the parent
-		 * participates in some eclass joins (because we will want to consider
-		 * inner-indexscan joins on the individual children) or if the parent
-		 * has useful pathkeys (because we should try to build MergeAppend
-		 * paths that produce those sort orderings). Even if this child is
-		 * deemed dummy, it may fall on nullable side in a child-join, which
-		 * in turn may participate in a MergeAppend, where we will need the
-		 * EquivalenceClass data structures.
-		 */
-		if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
-			add_child_rel_equivalences(root, appinfo, rel, childrel);
-		childrel->has_eclass_joins = rel->has_eclass_joins;
-
-		/*
 		 * We have to copy the parent's quals to the child, with appropriate

Re: [PROPOSAL] Shared Ispell dictionaries

2019-01-09 Thread Arthur Zakirov

On 01.10.2018 12:22, Arthur Zakirov wrote:

On Thu, Jun 14, 2018 at 11:40:17AM +0300, Arthur Zakirov wrote:

I attached new version of the patch.


The patch still applies to HEAD. I moved it to the next commitfest.



Here is the rebased patch. I also updated copyright in ts_shared.h and 
ts_shared.c.


--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/src/backend/tsearch/spell.c b/src/backend/tsearch/spell.c
index eb39466b22..eb8416ce7f 100644
--- a/src/backend/tsearch/spell.c
+++ b/src/backend/tsearch/spell.c
@@ -78,6 +78,8 @@
 #define tmpalloc(sz)  MemoryContextAlloc(Conf->buildCxt, (sz))
 #define tmpalloc0(sz)  MemoryContextAllocZero(Conf->buildCxt, (sz))
 
+#define tmpstrdup(str)	MemoryContextStrdup(Conf->buildCxt, (str))
+
 /*
  * Prepare for constructing an ISpell dictionary.
  *
@@ -498,7 +500,7 @@ NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
 	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
 	strcpy(Conf->Spell[Conf->nspell]->word, word);
 	Conf->Spell[Conf->nspell]->p.flag = (*flag != '\0')
-		? cpstrdup(Conf, flag) : VoidString;
+		? tmpstrdup(flag) : VoidString;
 	Conf->nspell++;
 }
 
@@ -1040,7 +1042,7 @@ setCompoundAffixFlagValue(IspellDict *Conf, CompoundAffixFlag *entry,
 		entry->flag.i = i;
 	}
 	else
-		entry->flag.s = cpstrdup(Conf, s);
+		entry->flag.s = tmpstrdup(s);
 
 	entry->flagMode = Conf->flagMode;
 	entry->value = val;
@@ -1541,6 +1543,9 @@ nextline:
 	return;
 
 isnewformat:
+	pfree(recoded);
+	pfree(pstr);
+
 	if (oldformat)
 		ereport(ERROR,
 (errcode(ERRCODE_CONFIG_FILE_ERROR),
diff --git a/contrib/dict_int/dict_int.c b/contrib/dict_int/dict_int.c
index 628b9769c3..ddde55eee4 100644
--- a/contrib/dict_int/dict_int.c
+++ b/contrib/dict_int/dict_int.c
@@ -30,7 +30,7 @@ PG_FUNCTION_INFO_V1(dintdict_lexize);
 Datum
 dintdict_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictInt*d;
 	ListCell   *l;
 
@@ -38,7 +38,7 @@ dintdict_init(PG_FUNCTION_ARGS)
 	d->maxlen = 6;
 	d->rejectlong = false;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dict_options)
 	{
 		DefElem*defel = (DefElem *) lfirst(l);
 
diff --git a/contrib/dict_xsyn/dict_xsyn.c b/contrib/dict_xsyn/dict_xsyn.c
index 509e14aee0..15b1a0033a 100644
--- a/contrib/dict_xsyn/dict_xsyn.c
+++ b/contrib/dict_xsyn/dict_xsyn.c
@@ -140,7 +140,7 @@ read_dictionary(DictSyn *d, const char *filename)
 Datum
 dxsyn_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	DictSyn*d;
 	ListCell   *l;
 	char	   *filename = NULL;
@@ -153,7 +153,7 @@ dxsyn_init(PG_FUNCTION_ARGS)
 	d->matchsynonyms = false;
 	d->keepsynonyms = true;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dict_options)
 	{
 		DefElem*defel = (DefElem *) lfirst(l);
 
diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index fc5176e338..f3663cefd0 100644
--- a/contrib/unaccent/unaccent.c
+++ b/contrib/unaccent/unaccent.c
@@ -270,12 +270,12 @@ PG_FUNCTION_INFO_V1(unaccent_init);
 Datum
 unaccent_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *init_data = (DictInitData *) PG_GETARG_POINTER(0);
 	TrieChar   *rootTrie = NULL;
 	bool		fileloaded = false;
 	ListCell   *l;
 
-	foreach(l, dictoptions)
+	foreach(l, init_data->dict_options)
 	{
 		DefElem*defel = (DefElem *) lfirst(l);
 
diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
index cda21675f0..93a71adc5d 100644
--- a/src/backend/commands/tsearchcmds.c
+++ b/src/backend/commands/tsearchcmds.c
@@ -390,17 +390,25 @@ verify_dictoptions(Oid tmplId, List *dictoptions)
 	}
 	else
 	{
+		DictInitData init_data;
+
 		/*
 		 * Copy the options just in case init method thinks it can scribble on
 		 * them ...
 		 */
 		dictoptions = copyObject(dictoptions);
 
+		init_data.dict_options = dictoptions;
+		init_data.dict.id = InvalidOid;
+		init_data.dict.xmin = InvalidTransactionId;
+		init_data.dict.xmax = InvalidTransactionId;
+		ItemPointerSetInvalid(&init_data.dict.tid);
+
 		/*
 		 * Call the init method and see if it complains.  We don't worry about
 		 * it leaking memory, since our command will soon be over anyway.
 		 */
-		(void) OidFunctionCall1(initmethod, PointerGetDatum(dictoptions));
+		(void) OidFunctionCall1(initmethod, PointerGetDatum(&init_data));
 	}
 
 	ReleaseSysCache(tup);
diff --git a/src/backend/snowball/dict_snowball.c b/src/backend/snowball/dict_snowball.c
index 5166738310..f30f29865c 100644
--- a/src/backend/snowball/dict_snowball.c
+++ b/src/backend/snowball/dict_snowball.c
@@ -201,14 +201,14 @@ locate_stem_module(DictSnowball *d, const char *lang)
 Datum
 dsnowball_init(PG_FUNCTION_ARGS)
 {
-	List	   *dictoptions = (List *) PG_GETARG_POINTER(0);
+	DictInitData *ini

Re: WIP: Avoid creation of the free space map for small tables

2019-01-09 Thread Mithun Cy
Hi John Naylor,
On Tue, Jan 8, 2019 at 2:27 AM John Naylor  wrote:
> I've attached two patches for testing. Each one applies on top of the
> current patch.

Thanks for the patches. I did a quick test of both, running the same
tests as in [1], now for fill factors 20, 70, and 100. (Note: with
HEAP_FSM_CREATION_THRESHOLD = 4, the highest tid inserted was (3,43) for
fill factor 20, (3,157) for fill factor 70, and (3,225) for fill factor
100, so exactly 4 pages are used.)

Machine: cthulhu, same as before [2], with default server settings.
Test: COPY command as in [1], for 500 tables.

Fill factor 20
                       execution time in ms   % increase in execution time
Base                   119.238
v11-all-pages          121.974                  2.2945705228
v11-Every-other-page   114.455                 -4.0113051209
v11-last-page          113.573                 -4.7510021973

Fill factor 70
                       execution time in ms   % increase in execution time
Base                   209.991
v11-all-pages          211.076                  0.5166888105
v11-Every-other-page   206.476                 -1.6738812616
v11-last-page          203.591                 -3.0477496655

Fill factor 100
                       execution time in ms   % increase in execution time
Base                   269.691
v11-all-pages          270.078                  0.1434975583
v11-Every-other-page   262.691                 -2.5955630703
v11-last-page          260.293                 -3.4847288193

Observations
1. Execution times of both base and the v11-all-pages patch have improved
compared to my earlier results [2], but v11-all-pages is still slightly
behind base.
2. The v11-Every-other-page and v11-last-page patches improve performance
over base.
3. IMHO v11-Every-other-page would be the ideal one to consider: it improves
performance and also, to an extent, avoids relation extension when space is
already available.

[1] 
https://www.postgresql.org/message-id/CAJVSVGX%3D2Q52fwijD9cjeq1UdiYGXns2_9WAPFf%3DE8cwbFCDvQ%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAD__Ouj%3Dat4hy2wYidK90v92qSRLjU%2BQe4y-PwfjLLeGkhc6ZA%40mail.gmail.com

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com



Re: Displaying and dumping of table access methods

2019-01-09 Thread Amit Kapila
On Tue, Jan 8, 2019 at 11:04 PM Andres Freund  wrote:
>
> Hi,
>
> On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote:
> > On 08/01/2019 00:56, Andres Freund wrote:
> > > A patch at [2] adds display of a table's access method to \d+ - but that
> > > means that running the tests with a different default table access
> > > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > > there'll be a significant number of test failures, even though the test
> > > results did not meaningfully differ.
> >
> > For psql, a variable that hides the access method if it's the default.
>
> Yea, I think that seems the least contentious solution.
>

+1.

>  Don't like it
> too much, but it seems better than the alternative. I wonder if we want
> one for multiple regression related issues, or whether one specifically
> about table AMs is more appropriate. I lean towards the latter.
>

I didn't understand the earlier part: "I wonder if we want one
for multiple regression related issues".  What do you mean by multiple
regression-related issues?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-09 Thread Alvaro Herrera
On 2019-Jan-09, Fabien COELHO wrote:

> > Attached a v26 with I hope everything ok, but for the documentation that
> > I'm unsure how to improve.
> 
> Attached v27 is the same but with an improved documentation where full
> examples, with and without prefix, are provided for both cset & gset.

I have already made changes on top of v26.  Can you please submit this
as a delta patch on top of v26?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: emacs configuration for new perltidy settings

2019-01-09 Thread Andrew Dunstan


On 1/8/19 8:44 PM, Noah Misch wrote:
> On Tue, Jan 08, 2019 at 08:17:43AM -0500, Andrew Dunstan wrote:
>> On 1/3/19 12:53 AM, Noah Misch wrote:
>>> If I run perltidy on 60d9979, then run perl-mode indent, the diff between 
>>> the
>>> perltidy run and perl-mode indent run is:
>>>  129 files changed, 8468 insertions(+), 8468 deletions(-)
>>> If I add (perl-continued-brace-offset . -2):
>>>  119 files changed, 3515 insertions(+), 3515 deletions(-)
>>> If I add (perl-indent-continued-arguments . 4) as well:
>>>  86 files changed, 2626 insertions(+), 2626 deletions(-)
>>> If I add (perl-indent-parens-as-block . t) as well:
>>>  65 files changed, 2373 insertions(+), 2373 deletions(-)
>> Sounds good. What do the remaining diffs look like?
> I've attached them.  Most involve statement continuation in some form.  For
> example, src/backend/utils/mb/Unicode has numerous instances where perl-mode
> indents hashref-constructor curly braces as though they were code blocks.
> Other diff lines involve labels.  Others are in string literals.



On a very quick glance I noticed some things that looked just wrong, and
some that were at best dubious. It's a pity that we can't get closer to
what perltidy does, but +1 for applying your changes.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-09 Thread Donald Dong
Hi Michael,

Thank you for the information!

> On Dec 11, 2018, at 9:55 PM, Michael Paquier  wrote:
> 
> Well, the conninfo and slot name accessible to the user are the values
> available only once the information of the WAL receiver in shared memory
> is ready to be displayed.  Relying more on the GUC context has the
> advantage to never have in shared memory the original string and only
> store the clobbered one, which actually simplifies what's stored in
> shared memory because you can entirely remove ready_to_display (I forgot
> to remove that in the patch actually).  If those parameters become
> reloadable, you actually rely only on what the param context has, and
> not on what the shared memory context may have or not.

I wonder why we need to have this addition?

src/backend/replication/walreceiver.c
+   memset(walrcv->slotname, 0, NAMEDATALEN);
+   if (PrimarySlotName)
+   strlcpy((char *) walrcv->slotname, PrimarySlotName, 
NAMEDATALEN);


src/backend/replication/walreceiver.c
288: PrimaryConnInfo == NULL || PrimaryConnInfo[0] == '\0'
291: errmsg("cannot connect to the primary server as \"primary_conninfo\" is 
not defined")));
392: PrimarySlotName && PrimarySlotName[0] != '\0'

src/backend/access/transam/xlog.c
11824: * If primary_conninfo is set, launch walreceiver to try
11833: PrimaryConnInfo && strcmp(PrimaryConnInfo, "") != 0

I notice these patterns appear in several places. Maybe it would be better to 
have a common helper such as pg_str_empty()?
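
A minimal sketch of what such a helper could look like (just to illustrate the 
suggestion; the name and location are up to whoever picks this up):

/* Assumes c.h for bool; true if the string GUC is unset or empty. */
static inline bool
pg_str_empty(const char *str)
{
	return str == NULL || str[0] == '\0';
}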

Regards,
Donald Dong


Re: libpq compression

2019-01-09 Thread Konstantin Knizhnik



On 09.01.2019 13:25, Iwata, Aya wrote:

Hi,


I agree with the critiques from Robbie Harwood and Michael Paquier
that the way in that compression is being hooked into the existing
architecture looks like a kludge.  I'm not sure I know exactly how it
should be done, but the current approach doesn't look natural; it
looks like it was bolted on.

After some time spent reading this patch and investigating different points,
mentioned in the discussion, I tend to agree with that. As far as I see it's
probably the biggest disagreement here, that keeps things from progressing.
I'm interested in this feature, so if Konstantin doesn't mind, I'll post in
the near future (after I wrap up the current CF) an updated patch I'm working
on right now to propose another way of incorporating compression. For now
I'm moving patch to the next CF.

This thread seems to have stalled.
In the last e-mail, Dmitry suggested updating the patch to implement the 
functionality in another way, but as far as I can see, he has not posted an 
updated patch yet. (It may be because the author has not responded.)
I understand there is a big disagreement here; however, the status is "Needs 
review" and there has been no review since the author updated the patch to v9, 
so I will do one.

About the patch: please update it to apply to current master. I could not 
test it.

About the documentation: there are some typos, please check them. I am waiting 
for a reviewer of the wording, because I am not so good at English.

When you add a new protocol message, it needs the "Length of message contents 
in bytes, including self." information.
The patch provides the supported compression algorithm as a Byte1. I think it 
would be better to provide it as a list, as the NegotiateProtocolVersion 
protocol message does.

I had a quick look at the code changes.

+   nread = conn->zstream
+   ? zpq_read(conn->zstream, conn->inBuffer + conn->inEnd,
+  conn->inBufSize - conn->inEnd, &processed)
+   : pqsecure_read(conn, conn->inBuffer + conn->inEnd,
+   conn->inBufSize - conn->inEnd);

How about combining this into a #define macro, since the same logic appears in 
two places?

Have you considered anything regarding memory control?
Typically a compression algorithm keeps its dictionary in memory. I think it 
needs a reset or some similar mechanism.


Thank you for the review.
Attached please find a rebased version of the patch.
I fixed all the issues you reported except using a list of supported 
compression algorithms.
That would require an extra round of communication between client and server 
to decide on the compression algorithm to use.
I am still not sure whether it is a good idea to let the user explicitly 
specify the compression algorithm.
Right now the streaming compression algorithm used is hardcoded and depends 
on the --use-zstd or --use-zlib configuration options.
If client and server were built with the same options, then they are 
able to use compression.


Concerning memory control: there is a call to zpq_free() in the 
socket_close() function which should deallocate all the memory used by the 
compressor:


void
zpq_free(ZpqStream *zs)
{
    if (zs != NULL)
    {
        ZSTD_freeCStream(zs->tx_stream);
        ZSTD_freeDStream(zs->rx_stream);
        free(zs);
    }
}

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/configure b/configure
index d5ace62..2de7394 100755
--- a/configure
+++ b/configure
@@ -701,6 +701,7 @@ ELF_SYS
 EGREP
 GREP
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 with_libxml
@@ -863,6 +864,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 enable_float4_byval
@@ -8290,6 +8292,86 @@ fi
 
 
 #
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$a

Re: FETCH FIRST clause PERCENT option

2019-01-09 Thread Tomas Vondra


On 1/9/19 7:09 AM, Surafel Temesgen wrote:
> 
> 
> On Sun, Jan 6, 2019 at 5:51 PM Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
> 
> 
> On 1/6/19 12:40 PM, Surafel Temesgen wrote:
> >
> >
> > On Fri, Jan 4, 2019 at 5:27 PM Tomas Vondra wrote:
> >
> >
> >     What formula? All the math remains exactly the same, you just
> need to
> >     update the number of rows to return and track how many rows
> are already
> >     returned.
> >
> >     I haven't tried doing that, but AFAICS you'd need to tweak
> how/when
> >     node->count is computed - instead of computing it only once it
> needs to
> >     be updated after fetching each row from the subplan.
> >
> >     Furthermore, you'll need to stash the subplan rows somewhere
> (into a
> >     tuplestore probably), and whenever the node->count value
> increments,
> >     you'll need to grab a row from the tuplestore and return that
> (i.e.
> >     tweak node->position and set node->subSlot).
> >
> >     I hope that makes sense. The one thing I'm not quite sure about is
> >     whether tuplestore allows adding and getting rows at the same
> time.
> >
> >     Does that make sense
> >
> >
> >
> > In current implementation in LIMIT_INITIAL state we execute outer
> > plan to the end , store the resulting tuples in tuplestore and
> > calculate limitCount in number .We are in this state only once and
> > did not return any tuple. Once we are at LIMIT_INWINDOW state and
> > inner node execution asking for tuple it return from tuple store
> > immediately.
> >
> 
> Right.
> 
> > Inorder to do fast startup what I was thinking was dividing the work
> > done at LIMIT_INITIAL state in to limitCount. For example we want
> > 0.5 percent of the result which means in LIMIT_INWINDOW state we
> > execute outer node 200 times ,store the result in tuplestore and
> > return the first tuple. if we don’t get that much tuple that means we
> > reach end of the limit.
> Yes, pretty much.
> 
> 
> 
> While working on this method I found an issue: it does not work well
> for a percentage with a fractional part, like 2.6 percent, which means we
> have to emit a row every 38.4615 tuples, so it has to be rounded up to 39 or
> rounded down to 38; either way it leads to an incorrect result.  Even with an
> integer percentage the result sometimes comes out with a fractional
> part and needs rounding
> 

It's hard to say what exactly you are doing, because you haven't shared
any code or examples. But it sounds as if you're computing, at the beginning,
how often to produce an additional row, and then simply multiplying
that. I'm not sure that's quite possible, because there will be issues due
to rounding.

Firstly, I see your last patch (from 30/11) does this:

node->count = DatumGetInt64((DatumGetFloat8(val) *
  tuplestore_tuple_count(node->totalTuple)) / 100);

but the SQL standard I have says this in the <query expression> section:

If <fetch first percentage> is specified, then let FFP be the <simple value
specification> simply contained in <fetch first percentage>, and let LOCT be a
<literal> whose value is OCT.
Let FFRC be the value of

CEILING ( FFP * LOCT / 100.0E0 )

That's not what your patch does, though, because it relies on automatic
float->int conversion, which does floor() and not ceil(). FWIW the final
DatumGetInt64 seems unnecessary, because the expression is float8 and
not Datum.

The patch should do something like this instead:

node->count = ceil((DatumGetFloat8(val) *
  tuplestore_tuple_count(node->totalTuple)) / 100);

Now, back to the row count issue - let's assume the user requests

FETCH FIRST 3 PERCENT ROWS ONLY

This means we should produce 1 row every 33 rows, but only very
roughly. In reality it's a bit less, because 3% is not 1/33 exactly.
Consider this query, which computes the number of rows to return for each 
total row count:

select sample_cnt, l - lag(l) over (order by sample_cnt) AS d from (
select sample_cnt, min(nrows) AS l from (
  select
  nrows,
  (nrows * 3.0 / 100.0) AS raw_sample_cnt,
  ceil(nrows * 3.0 / 100.0) AS sample_cnt
  from generate_series(1,10000) s(nrows)
) foo group by sample_cnt order by sample_cnt
) bar;

The inner-most query simply computes how many rows we're supposed to
return based on a certain row count. Then we compute the points at which the
sample_cnt actually changes. And finally we compute the distance between
those points. You should get something like

 sample_cnt | d
------------+----
          1 |
          2 | 33
          3 | 33
          4 | 34
          5 | 33
          6 | 33
          7 | 34
          8 | 33
          9 | 33
         10 | 34
         11 | 33
 ...

This shows that the distance oscillates be

Re: speeding up planning with partitions

2019-01-09 Thread Alvaro Herrera
> From 3b86331dd5a2368adc39c9fef92f3dd09d817a08 Mon Sep 17 00:00:00 2001
> From: amit 
> Date: Wed, 7 Nov 2018 16:51:31 +0900
> Subject: [PATCH v11 4/6] Move inheritance expansion code into its own file

I wonder if it would make sense to commit 0004 ahead of the rest?  To
avoid carrying this huge one over and over ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: speeding up planning with partitions

2019-01-09 Thread Amit Langote
On Wed, Jan 9, 2019 at 11:41 PM Alvaro Herrera  wrote:
> > From 3b86331dd5a2368adc39c9fef92f3dd09d817a08 Mon Sep 17 00:00:00 2001
> > From: amit 
> > Date: Wed, 7 Nov 2018 16:51:31 +0900
> > Subject: [PATCH v11 4/6] Move inheritance expansion code into its own file
>
> I wonder if it would make sense to commit 0004 ahead of the rest?  To
> avoid carrying this huge one over and over ...

Maybe a good idea.  I will rearrange the patches that  way tomorrow.

Thanks,
Amit



Re: WIP: Avoid creation of the free space map for small tables

2019-01-09 Thread John Naylor
On Wed, Jan 9, 2019 at 7:33 AM Mithun Cy  wrote:
> 2. The v11-Every-other-page and v11-last-page patches improve performance
> over base.
> 3. IMHO v11-Every-other-page would be the ideal one to consider: it improves
> performance and also, to an extent, avoids relation extension when space is
> already available.

Good to hear. I'll clean up the every-other-page patch and include it
in my next version.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-09 Thread Fabien COELHO



Attached v27 is the same but with an improved documentation where full
examples, with and without prefix, are provided for both cset & gset.


I have already made changes on top of v26.  Can you please submit this
as a delta patch on top of v26?


Attached.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 246944ea79..cc369c423e 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -972,14 +972,18 @@ pgbench  options  d
  
 
  
-  The following example sends three queries as one compound SQL command,
+  The following example sends four queries as one compound SQL command,
   inducing one message sent at the protocol level.
-  The result of the second query is stored into variable two,
+  The result of the first query is stored into variable one,
+  the results of the third query are stored into variables z_three
+  and z_four,
   whereas the results of the other queries are discarded.
 
--- compound of 3 queries
-SELECT 1 AS one \; SELECT 2 AS two \cset
-SELECT 2;
+-- compound of four queries
+SELECT 1 AS one \cset
+SELECT 2 AS two \;
+SELECT 3 AS three, 4 AS four \cset z_
+SELECT 5;
 
  
 


Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-09 Thread Tomas Vondra



On 1/8/19 3:18 PM, Dean Rasheed wrote:
> On Mon, 7 Jan 2019 at 00:45, Tomas Vondra  
> wrote:
>>
>> FWIW the main unsolved issue (at least on the MCV part) is how it
>> decides which items to keep in the list.
>>
>> As explained in [1], in the multivariate case we can't simply look at
>> the group frequency and compare it to the average frequency (of the
>> non-MCV items), which is what analyze_mcv_list() does in the
>> single-column case. In the multivariate case we also care about observed
>> vs. base frequency, i.e. we want the MCV list to include groups that are
>> present significantly more/less often than the product of per-column stats.
>>
>> I've repeatedly tried to come up with a criteria that would address
>> that, but it seems rather difficult because we can't abandon the other
>> criteria either. So the MCV list should include groups that match both
>>
>> (a) items that are statistically more common than the non-MCV part (i.e.
>> the rule from per-column analyze_mcv_list)
>>
>> (b) items that are statistically more/less common than estimated from
>> per-column stats (i.e. the new rule)
> 
> Thinking about this some more, I think that it probably isn't
> appropriate to use analyze_mcv_list() directly because that's making
> specific assumptions about how items not in the list will be estimated
> that aren't actually true for groups of values in multivariate stats.
> If a group of values isn't in the MCV list, it gets estimated based on
> the product of the selectivities from the per-column stats (modulo the
> additional logic preventing the selectivity not exceeding the total
> non-MCV selectivity).
> 
> So actually, the estimate for a group of values will be either the MCV
> item's frequency (if the MCV item is kept), or (roughly) the MCV
> item's base_frequency (if the MCV item is not kept). That suggests
> that we should simply keep items that are significantly more or less
> common than the item's base frequency -- i.e., keep rule (b) and ditch
> rule (a).
> 

Hmmm, but won't that interfere with how we extrapolate the MCV estimate
to the non-MCV part? Currently the patch does what you
proposed, i.e.

other_sel = simple_sel - mcv_basesel;

I'm worried that if we only include the items that are significantly
more or less common than the base frequency, it may skew the other_sel
estimate.

>> Enforcing rule (a) seems reasonable because it ensures the MCV list
>> includes all items more frequent than the last one. Without it, it's
>> difficult to know whether the absent item is very common (but
>> close to base frequency) or very uncommon (so less frequent than the
>> last MCV item).
> 
> I'm not sure there's much we can do about that. Keeping the item will
> result in keeping a frequency that we know is close to the base
> frequency, and not keeping the item will result in per-column stats
> being used that we expect to also give an estimate close to the base
> frequency. So either way, the result is about the same, and it's
> probably better to discard it, leaving more room for other items about
> which we may have more information.
> 
> That said, there is a separate benefit to keeping items in the list
> even if their frequency is close to the base frequency -- the more
> items kept, the larger their total selectivity will be, giving a
> better cap on the non-MCV selectivities. So if, after keeping all
> items satisfying rule (b), there are free slots available, perhaps
> they should be used for the most common remaining values satisfying
> rule (a).
> 

Hmm, so essentially we'd use (b) first to bootstrap the MCV list, and
then we could do what analyze_mcv_list() does. That could work, I guess.

The question is how to define "significantly different from base freq"
though. Any ideas?

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: FETCH FIRST clause PERCENT option

2019-01-09 Thread Surafel Temesgen
On Wed, Jan 9, 2019 at 5:38 PM Tomas Vondra 
wrote:

>
> It's hard to say what exactly you are doing, because you haven't shared
> any code or examples.


Okay, I attached the in-progress patch.

regards
Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..8491b7831a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,8 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+with PERCENT count specifies the maximum number of rows to return 
+in percentage.ROW
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..ee04992c9f 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -44,6 +44,9 @@ ExecLimit(PlanState *pstate)
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
+	TupleDesc   tupleDescriptor;
+	slot = node->subSlot;
+	tupleDescriptor = node->ps.ps_ResultTupleDesc;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -131,7 +134,7 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count && node->limitOption != PERCENTAGE)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -144,6 +147,42 @@ ExecLimit(PlanState *pstate)
 
 	return NULL;
 }
+if (node->limitOption == PERCENTAGE)
+{
+
+	for (int i=1; node->count >= i; i++)
+	{
+
+		slot = MakeSingleTupleTableSlot(tupleDescriptor, &TTSOpsMinimalTuple);
+
+		slot = ExecProcNode(outerPlan);
+		if (TupIsNull(slot))
+		{
+			node->lstate = LIMIT_WINDOWEND;
+
+			/*
+			 * If we know we won't need to back up, we can release
+			 * resources at this point.
+			 */
+			if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+(void) ExecShutdownNode(outerPlan);
+
+			return NULL;
+		}
+		tuplestore_puttupleslot(node->totalTuple, slot);
+	}
+	slot = MakeSingleTupleTableSlot(tupleDescriptor, &TTSOpsMinimalTuple);
+	if (tuplestore_gettupleslot(node->totalTuple, true, true, slot))
+	{
+		node->subSlot = slot;
+		node->position++;
+	} else {
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+}
+else if (node->limitOption == EXACT_NUMBER)
+{
 
 /*
  * Get next tuple from subplan, if any.
@@ -156,6 +195,7 @@ ExecLimit(PlanState *pstate)
 }
 node->subSlot = slot;
 node->position++;
+}
 			}
 			else
 			{
@@ -283,12 +323,16 @@ recompute_limits(LimitState *node)
 		}
 		else
 		{
-			node->count = DatumGetInt64(val);
-			if (node->count < 0)
-ereport(ERROR,
-		(errcode(ERRCODE_INVALID_ROW_COUNT_IN_LIMIT_CLAUSE),
-		 errmsg("LIMIT must not be negative")));
-			node->noCount = false;
+			if (node->limitOption == PERCENTAGE)
+node->count = DatumGetInt64(100 / DatumGetFloat8(val));
+			else
+			{
+node->count = DatumGetInt64(val);
+if (node->count < 0)
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_ROW_COUNT_IN_LIMIT_CLAUSE),
+			 errmsg("LIMIT must not be negative")));
+			}
 		}
 	}
 	else
@@ -374,6 +418,9 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 		  (PlanState *) limitstate);
+	limitstate->limitOption = node->limitOption;
+	if (node->limitOption == PERCENTAGE)
+		limitstate->totalTuple= tuplestore_begin_heap(true, false, work_mem);
 
 	/*
 	 * Initialize result type.
@@ -405,6 +452,8 @@ ExecEndLimit(LimitState *node)
 {
 	ExecFreeExprContext(&node->ps);
 	ExecEndNode(outerPlanState(node));
+	if (node->totalTuple!= NULL)
+		tuplestore_end(node->totalTuple);
 }
 
 
@@ -424,4 +473,6 @@ ExecReScanLimit(LimitState *node)
 	 */
 	if (node->ps.lefttree->chgParam == NULL)
 		ExecReScan(node->ps.lefttree);
+	if (node->totalTuple!= NULL)
+		tupl

Re: Misleading panic message in backend/access/transam/xlog.c

2019-01-09 Thread Andres Freund
Hi,

On 2019-01-09 12:06:39 +0100, Michael Banck wrote:
> there was a report in #postgresql recently about a crash on Google Cloud
> SQL with the somewhat misleading message "could not write to log file"
> while in fact it was the xlog/wal:
> 
> |PANIC:  could not write to log file 000101960054 at offset
> | 13279232, length 245760: Cannot allocate memory 
> |ERROR:  could not write block 74666 in file "base/18031/48587": Cannot
> | allocate memory 
> |CONTEXT:  writing block 74666 of relation base/18031/48587 
> |LOG:  server process (PID 5160) was terminated by signal 9: Killed 
> 
> The slightly longer logfile can be found here: http://dpaste.com/2T61PS9
> 
> I suggest rewording that message, e.g. "could not write to transaction
> log file" or "could not write to wal file".

I'm quite unenthused about that. If anything, I'd remove detail and use
the standard error message about not being able to write to a file, and
include the full path.
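
A rough sketch of that direction, just to illustrate (the path variable is 
hypothetical; the real code would build the full path of the WAL segment 
being written):

	/* xlogpath is illustrative only */
	ereport(PANIC,
			(errcode_for_file_access(),
			 errmsg("could not write to file \"%s\": %m", xlogpath)));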


> Also, the errno (ENOMEM) is curious (and the user wrote that Google
> monitoring reported memory at 16/20GB at the time of the crash), but it
> could be due to running on a cloud fork? As you have no access to
> PGDATA, it sounds difficult to diagnose after the fact.

Yes, that sounds quite likely. This pretty much is a write() which isn't
documented to return ENOMEM commonly, so I assume Google's doing
something odd.

Greetings,

Andres Freund



Re: FETCH FIRST clause PERCENT option

2019-01-09 Thread Tomas Vondra

On 1/9/19 4:43 PM, Surafel Temesgen wrote:
> 
> 
> On Wed, Jan 9, 2019 at 5:38 PM Tomas Vondra
> mailto:tomas.von...@2ndquadrant.com>> wrote:
> 
> 
> It's hard to say what exactly are you doing, because you haven't shared
> any code nor examples. 
> 
> 
> okay i attach in progress patch
> 

Yeah, that's what I thought - the patch computes

node->count = DatumGetInt64(100 / DatumGetFloat8(val));

and then always fetches this number of records before emitting the next
row from the tuplestore. That's wrong, as I explained before, because
the distance does change, due to rounding.

See the attached patch, which recomputes the count regularly. I don't
claim the patch is committable or that it has no other bugs, but
hopefully it shows what I meant.

FWIW you don't need to create any slots - the two already created are
enough. You certainly don't need to create the slots within a loop.
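
To make the rounding effect concrete, here is a tiny standalone sketch (plain
C, not executor code) showing how the effective limit moves as rows arrive for
a 50 percent request:

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        double      percent = 50.0;

        /* recompute the effective limit after each row, as the patch does */
        for (int64_t cnt = 1; cnt <= 5; cnt++)
        {
            int64_t     limit = (int64_t) ceil(percent * cnt / 100.0);

            printf("rows seen: %lld  effective limit: %lld\n",
                   (long long) cnt, (long long) limit);
        }
        return 0;
    }

The limit goes 1, 1, 2, 2, 3, so the distance between the current position
and the limit is not a constant that can be computed once up front.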

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..8491b7831a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,8 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+with PERCENT count specifies the maximum number of rows to return 
+in percentage.ROW
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..87d751bdbe 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -21,6 +21,8 @@
 
 #include "postgres.h"
 
+#include 
+
 #include "executor/executor.h"
 #include "executor/nodeLimit.h"
 #include "miscadmin.h"
@@ -44,6 +46,7 @@ ExecLimit(PlanState *pstate)
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
+	slot = node->subSlot;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -81,7 +84,15 @@ ExecLimit(PlanState *pstate)
 			/*
 			 * Check for empty window; if so, treat like empty subplan.
 			 */
-			if (node->count <= 0 && !node->noCount)
+			if (node->limitOption == PERCENTAGE)
+			{
+if (node->percent == 0.0)
+{
+	node->lstate = LIMIT_EMPTY;
+	return NULL;
+}
+			}
+			else if (node->count <= 0 && !node->noCount)
 			{
 node->lstate = LIMIT_EMPTY;
 return NULL;
@@ -122,6 +133,7 @@ ExecLimit(PlanState *pstate)
 			return NULL;
 
 		case LIMIT_INWINDOW:
+
 			if (ScanDirectionIsForward(direction))
 			{
 /*
@@ -130,6 +142,32 @@ ExecLimit(PlanState *pstate)
  * advancing the subplan or the position variable; but change
  * the state machine state to record having done so.
  */
+
+/*
+ * When in percentage mode, we need to see if we can get any
+ * additional rows from the subplan (enough to increase the
+ * node->count value).
+ */
+if (node->limitOption == PERCENTAGE)
+{
+	/* loop until the node->count increments */
+	while (node->position - node->offset >= node->count)
+	{
+		int64	cnt;
+
+		slot = ExecProcNode(outerPlan);
+		if (TupIsNull(slot))
+			break;
+
+		tuplestore_puttupleslot(node->totalTuple, slot);
+
+		/* plus 1, because the first tuple is returned from LIMIT_RESCAN */
+		cnt = 1 + tuplestore_tuple_count(node->totalTuple);
+
+		node->count = ceil(node->percent * cnt / 100.0);
+	}
+}
+
 if (!node->noCount &&
 	node->position - node->offset >= node->count)
 {
@@ -145,6 +183,20 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
+if (node->limitOption == PERCENTAGE)
+{
+	while (node->position - node->offset < node->count)
+	{
+		if (tuplestore_gettupleslot(node->totalTuple, true, true, slot))
+		{
+			node->subSlot = slot;
+			node->position++;
+		}
+	}
+}
+else if (node->limitOption == EXACT_NUMBER)
+{
+
 /*
  * Get next tuple from subplan, if any.
  */
@@ -156,6 +208,7 @@ ExecLimit(PlanState *pstate)
 }
 no

Re: Displaying and dumping of table access methods

2019-01-09 Thread Andres Freund
Hi,

On 2019-01-09 18:26:16 +0530, Amit Kapila wrote:
> On Tue, Jan 8, 2019 at 11:04 PM Andres Freund  wrote:
> +1.
> 
> >  Don't like it
> > too much, but it seems better than the alternative. I wonder if we want
> > one for multiple regression related issues, or whether one specifically
> > about table AMs is more appropriate. I lean towards the latter.
> >
> 
> I didn't understand what is the earlier part "I wonder if we want one
> for multiple regression related issues".  What do you mean by multiple
> regression related issues?

One flag that covers all things that make psql output less useful for
regression test output, or one flag that just controls the table access
method display.

Greetings,

Andres Freund



Re: A few new options for vacuumdb

2019-01-09 Thread Bossart, Nathan
On 1/8/19, 10:34 PM, "Michael Paquier"  wrote:
> On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote:
>> Since pg_(total)_relation_size() returns 0 for parent table the
>> specifying the parent table to vacuumdb with --min-relation-size
>> always does nothing. Maybe we will need to deal with this case when a
>> function returning whole partitoned table size is introduced.
>
> Good point.  I am not sure if we want to go down to having a size
> function dedicated to partitions especially as this would just now be
> a wrapper around pg_partition_tree(), but the size argument with
> partitioned tables is something to think about.  If we cannot sort out
> this part cleanly, perhaps we could just focus on the age-ing
> parameters and the other ones first?  It seems to me that what is
> proposed on this thread has value, so we could shave things and keep
> the essential, and focus on what we are sure about for simplicity.

Sounds good.  I'll leave out --min-relation-size for now.

Nathan



Re: insensitive collations

2019-01-09 Thread Andreas Karlsson

On 12/28/18 9:55 AM, Peter Eisentraut wrote:

Here is an updated patch.

I have updated the naming to "deterministic", as discussed.


Maybe this is orthogonal and best handled elsewhere, but have you given 
any thought to Unicode normalization forms[1] when working with string 
equality? I feel there are three sane ways to do Unicode string equality:


1) Binary equality
2) Binary equality after normalizing the unicode
3) Collation equality

Would there be any point in adding Unicode normalization support into 
the collation system, or is this best handled, for example, with a 
function run on INSERT, or with something else entirely?


Right now PostgreSQL does not have any support for normalization forms 
as far as I know.


1. http://unicode.org/reports/tr15/

Andreas



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
I wrote:
> John Naylor  writes:
>> -There is a bit of a cognitive clash between $case_sensitive in
>> gen_keywordlist.pl and $case_insensitive in PerfectHash.pm. They each
>> make sense in their own file, but might it be worth using one or the
>> other?

> Yeah, dunno.  It seems to make sense for the command-line-level default of
> gen_keywordlist.pl to be "case insensitive", since most users want that.
> But that surely shouldn't be the default in PerfectHash.pm, and I'm not
> very sure how to reconcile the discrepancy.

Working on the fmgr-oid-lookup idea gave me the thought that
PerfectHash.pm ought to support fixed-length keys.  Rather than start
adding random parameters to the function, I borrowed an idea from
PostgresNode.pm and made the options be keyword-style parameters.  Now
the impedance mismatch about case sensitivity is handled with 

my $f = PerfectHash::generate_hash_function(\@keywords, $funcname,
case_insensitive => !$case_sensitive);

which is at least a little clearer than before, though I'm not sure
if it entirely solves the problem.

Also, in view of finding that the original multiplier choices failed
on the fmgr oid problem, I spent a little effort making the code
able to try more combinations of hash multipliers and seeds.  It'd
be nice to have some theory rather than just heuristics about what
will work, though ...

Barring objections or further review, I plan to push this soon.

regards, tom lane

diff --git a/src/common/Makefile b/src/common/Makefile
index 317b071..d0c2b97 100644
*** a/src/common/Makefile
--- b/src/common/Makefile
*** OBJS_FRONTEND = $(OBJS_COMMON) fe_memuti
*** 63,68 
--- 63,73 
  OBJS_SHLIB = $(OBJS_FRONTEND:%.o=%_shlib.o)
  OBJS_SRV = $(OBJS_COMMON:%.o=%_srv.o)
  
+ # where to find gen_keywordlist.pl and subsidiary files
+ TOOLSDIR = $(top_srcdir)/src/tools
+ GEN_KEYWORDLIST = $(PERL) -I $(TOOLSDIR) $(TOOLSDIR)/gen_keywordlist.pl
+ GEN_KEYWORDLIST_DEPS = $(TOOLSDIR)/gen_keywordlist.pl $(TOOLSDIR)/PerfectHash.pm
+ 
  all: libpgcommon.a libpgcommon_shlib.a libpgcommon_srv.a
  
  distprep: kwlist_d.h
*** libpgcommon_srv.a: $(OBJS_SRV)
*** 118,125 
  	$(CC) $(CFLAGS) $(subst -DFRONTEND,, $(CPPFLAGS)) -c $< -o $@
  
  # generate SQL keyword lookup table to be included into keywords*.o.
! kwlist_d.h: $(top_srcdir)/src/include/parser/kwlist.h $(top_srcdir)/src/tools/gen_keywordlist.pl
! 	$(PERL) $(top_srcdir)/src/tools/gen_keywordlist.pl --extern $<
  
  # Dependencies of keywords*.o need to be managed explicitly to make sure
  # that you don't get broken parsing code, even in a non-enable-depend build.
--- 123,130 
  	$(CC) $(CFLAGS) $(subst -DFRONTEND,, $(CPPFLAGS)) -c $< -o $@
  
  # generate SQL keyword lookup table to be included into keywords*.o.
! kwlist_d.h: $(top_srcdir)/src/include/parser/kwlist.h $(GEN_KEYWORDLIST_DEPS)
! 	$(GEN_KEYWORDLIST) --extern $<
  
  # Dependencies of keywords*.o need to be managed explicitly to make sure
  # that you don't get broken parsing code, even in a non-enable-depend build.
diff --git a/src/common/kwlookup.c b/src/common/kwlookup.c
index d72842e..6545480 100644
*** a/src/common/kwlookup.c
--- b/src/common/kwlookup.c
***
*** 35,94 
   * receive a different case-normalization mapping.
   */
  int
! ScanKeywordLookup(const char *text,
    const ScanKeywordList *keywords)
  {
! 	int			len,
! i;
! 	char		word[NAMEDATALEN];
! 	const char *kw_string;
! 	const uint16 *kw_offsets;
! 	const uint16 *low;
! 	const uint16 *high;
! 
! 	len = strlen(text);
  
  	if (len > keywords->max_kw_len)
! 		return -1;/* too long to be any keyword */
! 
! 	/* We assume all keywords are shorter than NAMEDATALEN. */
! 	Assert(len < NAMEDATALEN);
  
  	/*
! 	 * Apply an ASCII-only downcasing.  We must not use tolower() since it may
! 	 * produce the wrong translation in some locales (eg, Turkish).
  	 */
! 	for (i = 0; i < len; i++)
! 	{
! 		char		ch = text[i];
  
! 		if (ch >= 'A' && ch <= 'Z')
! 			ch += 'a' - 'A';
! 		word[i] = ch;
! 	}
! 	word[len] = '\0';
  
  	/*
! 	 * Now do a binary search using plain strcmp() comparison.
  	 */
! 	kw_string = keywords->kw_string;
! 	kw_offsets = keywords->kw_offsets;
! 	low = kw_offsets;
! 	high = kw_offsets + (keywords->num_keywords - 1);
! 	while (low <= high)
  	{
! 		const uint16 *middle;
! 		int			difference;
  
! 		middle = low + (high - low) / 2;
! 		difference = strcmp(kw_string + *middle, word);
! 		if (difference == 0)
! 			return middle - kw_offsets;
! 		else if (difference < 0)
! 			low = middle + 1;
! 		else
! 			high = middle - 1;
  	}
  
! 	return -1;
  }
--- 35,85 
   * receive a different case-normalization mapping.
   */
  int
! ScanKeywordLookup(const char *str,
    const ScanKeywordList *keywords)
  {
! 	size_t		len;
! 	int			h;
! 	const char *kw;
  
+ 	/*
+ 	 * Reject immediately if too long to be any keyword.  This saves useless
+ 	 * hashing and downcasing work on long strings.
+ 	

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
I wrote:
> Also, I fail to understand why fmgr_builtin_oid_index has 10000 entries
> anyway.  We could easily have fmgrtab.c expose the last actually assigned
> builtin function OID (presently 6121) and make the index array only
> that big, which just about eliminates the space advantage completely.

Concretely, like the attached.

We could make the index table still smaller if we wanted to reassign
a couple dozen high-numbered functions down to lower OIDs, but I dunno
if it's worth the trouble.  It certainly isn't from a performance
standpoint, because those unused entry ranges will never be touched
in normal usage; but it'd make the server executable a couple KB smaller.

regards, tom lane

diff --git a/src/backend/utils/Gen_fmgrtab.pl b/src/backend/utils/Gen_fmgrtab.pl
index cafe408..f970940 100644
*** a/src/backend/utils/Gen_fmgrtab.pl
--- b/src/backend/utils/Gen_fmgrtab.pl
*** foreach my $datfile (@input_files)
*** 80,90 
  	$catalog_data{$catname} = Catalog::ParseData($datfile, $schema, 0);
  }
  
- # Fetch some values for later.
- my $FirstGenbkiObjectId =
-   Catalog::FindDefinedSymbol('access/transam.h', $include_path,
- 	'FirstGenbkiObjectId');
- 
  # Collect certain fields from pg_proc.dat.
  my @fmgr = ();
  
--- 80,85 
*** my %bmap;
*** 225,230 
--- 220,226 
  $bmap{'t'} = 'true';
  $bmap{'f'} = 'false';
  my @fmgr_builtin_oid_index;
+ my $last_builtin_oid = 0;
  my $fmgr_count = 0;
  foreach my $s (sort { $a->{oid} <=> $b->{oid} } @fmgr)
  {
*** foreach my $s (sort { $a->{oid} <=> $b->
*** 232,237 
--- 228,234 
  	  "  { $s->{oid}, $s->{nargs}, $bmap{$s->{strict}}, $bmap{$s->{retset}}, \"$s->{prosrc}\", $s->{prosrc} }";
  
  	$fmgr_builtin_oid_index[ $s->{oid} ] = $fmgr_count++;
+ 	$last_builtin_oid = $s->{oid};
  
  	if ($fmgr_count <= $#fmgr)
  	{
*** foreach my $s (sort { $a->{oid} <=> $b->
*** 244,274 
  }
  print $tfh "};\n";
  
! print $tfh qq|
  const int fmgr_nbuiltins = (sizeof(fmgr_builtins) / sizeof(FmgrBuiltin));
! |;
  
  
  # Create fmgr_builtins_oid_index table.
! #
! # Note that the array has to be filled up to FirstGenbkiObjectId,
! # as we can't rely on zero initialization as 0 is a valid mapping.
! print $tfh qq|
! const uint16 fmgr_builtin_oid_index[FirstGenbkiObjectId] = {
! |;
  
! for (my $i = 0; $i < $FirstGenbkiObjectId; $i++)
  {
  	my $oid = $fmgr_builtin_oid_index[$i];
  
! 	# fmgr_builtin_oid_index is sparse, map nonexistant functions to
  	# InvalidOidBuiltinMapping
  	if (not defined $oid)
  	{
  		$oid = 'InvalidOidBuiltinMapping';
  	}
  
! 	if ($i + 1 == $FirstGenbkiObjectId)
  	{
  		print $tfh "  $oid\n";
  	}
--- 241,270 
  }
  print $tfh "};\n";
  
! printf $tfh qq|
  const int fmgr_nbuiltins = (sizeof(fmgr_builtins) / sizeof(FmgrBuiltin));
! 
! const Oid fmgr_last_builtin_oid = %u;
! |, $last_builtin_oid;
  
  
  # Create fmgr_builtins_oid_index table.
! printf $tfh qq|
! const uint16 fmgr_builtin_oid_index[%u] = {
! |, $last_builtin_oid + 1;
  
! for (my $i = 0; $i <= $last_builtin_oid; $i++)
  {
  	my $oid = $fmgr_builtin_oid_index[$i];
  
! 	# fmgr_builtin_oid_index is sparse, map nonexistent functions to
  	# InvalidOidBuiltinMapping
  	if (not defined $oid)
  	{
  		$oid = 'InvalidOidBuiltinMapping';
  	}
  
! 	if ($i == $last_builtin_oid)
  	{
  		print $tfh "  $oid\n";
  	}
diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
index b41649f..506eeef 100644
*** a/src/backend/utils/fmgr/fmgr.c
--- b/src/backend/utils/fmgr/fmgr.c
*** fmgr_isbuiltin(Oid id)
*** 75,86 
  	uint16		index;
  
  	/* fast lookup only possible if original oid still assigned */
! 	if (id >= FirstGenbkiObjectId)
  		return NULL;
  
  	/*
  	 * Lookup function data. If there's a miss in that range it's likely a
! 	 * nonexistant function, returning NULL here will trigger an ERROR later.
  	 */
  	index = fmgr_builtin_oid_index[id];
  	if (index == InvalidOidBuiltinMapping)
--- 75,86 
  	uint16		index;
  
  	/* fast lookup only possible if original oid still assigned */
! 	if (id > fmgr_last_builtin_oid)
  		return NULL;
  
  	/*
  	 * Lookup function data. If there's a miss in that range it's likely a
! 	 * nonexistent function, returning NULL here will trigger an ERROR later.
  	 */
  	index = fmgr_builtin_oid_index[id];
  	if (index == InvalidOidBuiltinMapping)
diff --git a/src/include/utils/fmgrtab.h b/src/include/utils/fmgrtab.h
index a778f88..e981f34 100644
*** a/src/include/utils/fmgrtab.h
--- b/src/include/utils/fmgrtab.h
*** extern const FmgrBuiltin fmgr_builtins[]
*** 36,46 
  
  extern const int fmgr_nbuiltins;	/* number of entries in table */
  
  /*
!  * Mapping from a builtin function's oid to the index in the fmgr_builtins
!  * array.
   */
  #define InvalidOidBuiltinMapping PG_UINT16_MAX
! extern const uint16 fmgr_builtin_oid_index[FirstGenbkiObjectId];
  
  #endif			/* FMGRTAB_H */
--- 36,48 
  

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Alvaro Herrera
On 2019-Jan-09, Tom Lane wrote:

> We could make the index table still smaller if we wanted to reassign
> a couple dozen high-numbered functions down to lower OIDs, but I dunno
> if it's worth the trouble.  It certainly isn't from a performance
> standpoint, because those unused entry ranges will never be touched
> in normal usage; but it'd make the server executable a couple KB smaller.

Or two couples KB smaller, if we abandoned the idea that pg_proc OIDs
must not collide with those in any other catalog, and we renumbered all
functions to start at OID 1 or so.  duplicate_oids would complain about
that, though, I suppose ... and nobody who has ever hardcoded a function
OID would love this idea much.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Andres Freund
Hi,

On 2019-01-09 14:44:24 -0500, Tom Lane wrote:
> I wrote:
> > Also, I fail to understand why fmgr_builtin_oid_index has 10000 entries
> > anyway.  We could easily have fmgrtab.c expose the last actually assigned
> > builtin function OID (presently 6121) and make the index array only
> > that big, which just about eliminates the space advantage completely.
> 
> Concretely, like the attached.

Seems like a good improvement.


> We could make the index table still smaller if we wanted to reassign
> a couple dozen high-numbered functions down to lower OIDs, but I dunno
> if it's worth the trouble.  It certainly isn't from a performance
> standpoint, because those unused entry ranges will never be touched
> in normal usage; but it'd make the server executable a couple KB smaller.

Probably indeed not worth it. I'm not 100% convinced from the performance
POV, but in contrast to the earlier binary search, either approach is
fast enough that it's probably hard to measure any difference.


> diff --git a/src/backend/utils/fmgr/fmgr.c b/src/backend/utils/fmgr/fmgr.c
> index b41649f..506eeef 100644
> --- a/src/backend/utils/fmgr/fmgr.c
> +++ b/src/backend/utils/fmgr/fmgr.c
> @@ -75,12 +75,12 @@ fmgr_isbuiltin(Oid id)
>   uint16  index;
>  
>   /* fast lookup only possible if original oid still assigned */
> - if (id >= FirstGenbkiObjectId)
> + if (id > fmgr_last_builtin_oid)
>   return NULL;

An extern reference here will make the code a bit less efficient, but
it's probably not worth generating a header with a define for it
instead...

Greetings,

Andres Freund



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-09 14:44:24 -0500, Tom Lane wrote:
>> /* fast lookup only possible if original oid still assigned */
>> -if (id >= FirstGenbkiObjectId)
>> +if (id > fmgr_last_builtin_oid)
>>  return NULL;

> An extern reference here will make the code a bit less efficient, but
> it's probably not worth generating a header with a define for it
> instead...

Yeah, also that would be significantly more fragile, in that it'd
be hard to be sure where that OID had propagated to when rebuilding.
We haven't chosen to make fmgr_nbuiltins a #define either, and I think
it's best to treat this the same way.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jan-09, Tom Lane wrote:
>> We could make the index table still smaller if we wanted to reassign
>> a couple dozen high-numbered functions down to lower OIDs, but I dunno
>> if it's worth the trouble.  It certainly isn't from a performance
>> standpoint, because those unused entry ranges will never be touched
>> in normal usage; but it'd make the server executable a couple KB smaller.

> Or two couples KB smaller, if we abandoned the idea that pg_proc OIDs
> must not collide with those in any other catalog, and we renumbered all
> functions to start at OID 1 or so.  duplicate_oids would complain about
> that, though, I suppose ... and nobody who has ever hardcoded a function
> OID would love this idea much.

I think that'd be a nonstarter for commonly-used functions.  I'm guessing
that pg_replication_origin_create() and so on, which are the immediate
problem, haven't been around long enough or get used often enough for
someone to have hard-coded their OIDs.  But I could be wrong.

(Speaking of which, I've been wondering for awhile if libpq ought not
obtain the OIDs of lo_create and friends by #including fmgroids.h
instead of doing a runtime query on every connection.  If we did that,
we'd be forever giving up the option to renumber them ... but do you
really want to bet that nobody else has done this already in some
other client code?)

regards, tom lane



Re: New vacuum option to do only freezing

2019-01-09 Thread Bossart, Nathan
Hi,

On 1/8/19, 7:03 PM, "Masahiko Sawada"  wrote:
> Attached the updated version patch incorporated all comments I got.

Thanks for the new patch!

> * Instead of freezing xmax I've changed the behaviour of the new
> option (DISABLE_INDEX_CLEANUP) so that it sets dead tuples as dead
> instead of as unused and skips both index vacuum and index cleanup.
> That is, we remove the storage of dead tuple but leave dead itemids
> for the index lookups. These are removed by the next vacuum execution
> enabling index cleanup. Currently HOT-pruning doesn't set the root of
> the chain as unused even if the whole chain is dead. Since setting
> tuples as dead removes storage the freezing xmax is no longer
> required.

I tested this option with a variety of cases (HOT, indexes, etc.), and
it seems to work well.  I haven't looked too deeply into the
implications of using LP_DEAD this way, but it seems like a reasonable
approach at first glance.

+
+DISABLE_INDEX_CLEANUP
+
+ 
+  VACUUM removes dead tuples and prunes HOT-updated
+  tuples chain for live tuples on table. If the table has any dead tuple
+  it removes them from both table and indexes for re-use. With this
+  option VACUUM marks tuples as dead (i.e., it doesn't
+  remove tuple storage) and disables removing dead tuples from indexes.
+  This is suitable for avoiding transaction ID wraparound but not
+  sufficient for avoiding index bloat. This option cannot be used in
+  conjunction with FULL option.
+ 
+
+   

I think we should clarify the expected behavior when this option is
used on a table with no indexes.  We probably do not want to fail, as
this could disrupt VACUUM commands that touch several tables.  Also,
we don't need to set tuples as dead instead of unused, which appears
to be what this patch is actually doing:

+   if (nindexes > 0 && disable_index_cleanup)
+   lazy_set_tuples_dead(onerel, blkno, buf, 
vacrelstats);
+   else
+   lazy_vacuum_page(onerel, blkno, buf, 0, 
vacrelstats, &vmbuffer);

In this case, perhaps we should generate a log statement that notes
that DISABLE_INDEX_CLEANUP is being ignored and set
disable_index_cleanup to false during processing.
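
Something along these lines (purely illustrative, reusing the names from
the quoted hunks; it is not part of the posted patch) before the main loop
would do:

    /* hypothetical guard: nothing to skip if the table has no indexes */
    if (nindexes == 0 && disable_index_cleanup)
    {
        ereport(elevel,
                (errmsg("\"%s\": DISABLE_INDEX_CLEANUP ignored because table has no indexes",
                        RelationGetRelationName(onerel))));
        disable_index_cleanup = false;
    }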

+   if (disable_index_cleanup)
+   ereport(elevel,
+   (errmsg("\"%s\": marked %.0f row 
versions in %u pages as dead",
+   
RelationGetRelationName(onerel),
+   tups_vacuumed, 
vacuumed_pages)));
+   else
+   ereport(elevel,
+   (errmsg("\"%s\": removed %.0f row 
versions in %u pages",
+   
RelationGetRelationName(onerel),
+   tups_vacuumed, 
vacuumed_pages)));

Should the first log statement only be generated when there are also
indexes?

+static void
+lazy_set_tuples_dead(Relation onerel, BlockNumber blkno, Buffer buffer,
+   LVRelStats *vacrelstats)

This function looks very similar to lazy_vacuum_page().  Perhaps the
two could be combined?

Nathan



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Andres Freund
Hi,

On 2019-01-09 15:03:35 -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2019-Jan-09, Tom Lane wrote:
> >> We could make the index table still smaller if we wanted to reassign
> >> a couple dozen high-numbered functions down to lower OIDs, but I dunno
> >> if it's worth the trouble.  It certainly isn't from a performance
> >> standpoint, because those unused entry ranges will never be touched
> >> in normal usage; but it'd make the server executable a couple KB smaller.
> 
> > Or two couples KB smaller, if we abandoned the idea that pg_proc OIDs
> > must not collide with those in any other catalog, and we renumbered all
> > functions to start at OID 1 or so.  duplicate_oids would complain about
> > that, though, I suppose ... and nobody who has ever hardcoded a function
> > OID would love this idea much.
> 
> I think that'd be a nonstarter for commonly-used functions.  I'm guessing
> that pg_replication_origin_create() and so on, which are the immediate
> problem, haven't been around long enough or get used often enough for
> someone to have hard-coded their OIDs.  But I could be wrong.

I don't think it's likely that it'd be useful to hardcode them, and
therefore hope that nobody would do so.

I personally feel limited sympathy for people hardcoding oids across
major versions. The benefits of making pg easier to maintain and more
efficient seem higher than allowing for that.


> (Speaking of which, I've been wondering for awhile if libpq ought not
> obtain the OIDs of lo_create and friends by #including fmgroids.h
> instead of doing a runtime query on every connection.  If we did that,
> we'd be forever giving up the option to renumber them ... but do you
> really want to bet that nobody else has done this already in some
> other client code?)

I'm not enthusiastic about that. I kinda hope we're going to evolve that
interface further, which'd make it version dependent anyway (we don't
require all of them right now...). And it's not that expensive to query
their oids once.

Greetings,

Andres Freund



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-09 15:03:35 -0500, Tom Lane wrote:
>> (Speaking of which, I've been wondering for awhile if libpq ought not
>> obtain the OIDs of lo_create and friends by #including fmgroids.h
>> instead of doing a runtime query on every connection.  If we did that,
>> we'd be forever giving up the option to renumber them ... but do you
>> really want to bet that nobody else has done this already in some
>> other client code?)

> I'm not enthusiastic about that. I kinda hope we're going to evolve that
> interface further, which'd make it version dependent anyway (we don't
> require all of them right now...). And it's not that expensive to query
> their oids once.

Version dependency doesn't seem like much of an argument: we'd just teach
libpq to pay attention to the server version, which it knows anyway (and
uses for other purposes already).

But this is a bit off-topic for this thread, perhaps.

regards, tom lane



RE: Timeout parameters

2019-01-09 Thread AYahorau
Hello,

>  To wrap up, the relevant parameters work like this:

> * TCP keepalive and TCP user (retransmission) timeout: for network 
problems
> * statement_timeout: for long-running queries
> * socket_timeout (or send/recv_timeout): for saturated servers

Takayuki Tsunakawa, could you provide a broader explanation of 
socket_timeout? I'm a little unclear about the cases in which this 
parameter is or can be used.


I would like to add some more information about the TCP keepalive and 
TCP_USER_TIMEOUT mechanisms:
1) Both mechanisms are used to terminate a socket connection in case of 
network problems.
2) The termination of the TCP connection is done at the operating system 
(kernel) level, not by the application.
3) TCP keepalive and TCP_USER_TIMEOUT work differently and complement each 
other:
 * The TCP keepalive mechanism works while a socket is idle (no data 
is being exchanged).
 * The TCP_USER_TIMEOUT mechanism works while a socket is active 
(sending/receiving data).

So, TCP keepalive and TCP_USER_TIMEOUT together provide full control over 
the connection's network state from the moment a TCP socket is created and 
these parameters are applied to it. Moreover, this control is delegated to 
the operating system (kernel) when it supports such mechanisms.
If TCP_USER_TIMEOUT is not supported by PostgreSQL, TCP connections are 
only partly controlled by the operating system (kernel). In that case 
pqWaitTimed() has to be used at the application layer to control the 
connection during the data transmission phase.
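
For reference, this is roughly how both mechanisms are enabled at the
socket level on Linux (the values are arbitrary examples; TCP_USER_TIMEOUT
needs kernel 2.6.37+ and a libc that exposes it):

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    static int
    enable_tcp_timeouts(int fd)
    {
        int          on = 1;
        int          idle = 60;             /* seconds idle before keepalive probes */
        unsigned int user_timeout = 30000;  /* ms without ACK before aborting */

        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
            return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
            return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                       &user_timeout, sizeof(user_timeout)) < 0)
            return -1;
        return 0;
    }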

In my opinion, there is no difference between the server and client sides 
of the connection. To avoid confusion in the configuration, it seems 
reasonable to give this option the same name on the client and server 
sides.

To my mind, this description of the tcp_user_timeout option is not correct 
(see my comment about the TCP_USER_TIMEOUT mechanism above).

> + 
> + 
> + Specify in milliseconds the time to disconnect to the client 
> + when there is no ack packet from the client to the server's data 
transmission.
> + This parameter is supported on linux version 2.6.37 or later.
> + 
> + 
> + 
> + This parameter is not supported on Windows.
> + 
> + 
> + 

I think it would be better to specify the description as follows:



Defines a wrapper for the TCP_USER_TIMEOUT socket option of a libpq connection.


Specifies the number of milliseconds after which a TCP connection can be 
aborted by the operating system due to network problems while data is being 
transmitted over this connection (sending/receiving). A value of 0 
uses the system default. This parameter is supported only on systems that 
support TCP_USER_TIMEOUT or an equivalent socket option, and on Windows; 
on other systems, it must be zero. In sessions connected via a Unix-domain 
socket, this parameter is ignored and always reads as zero.



This parameter is not supported on Windows, and must be zero. 


To enable full control over the TCP connection, use this option together with 
keepalive.





Best regards, 
Andrei Yahorau



From:   "Tsunakawa, Takayuki" 
To: 'Fabien COELHO' , "Nagaura, Ryohei" 
, 
Cc: "'pgsql-hack...@postgresql.org'" , 
"ayaho...@ibagroup.eu" 
Date:   27/12/2018 11:26
Subject:RE: Timeout parameters



From: Fabien COELHO [mailto:coe...@cri.ensmp.fr]
> I still do not understand the use-case specifics: for me, aborting the
> connection, or a softer cancelling the statement, will result in the
> server stopping the statement, so the server does NOT "continue the 
job",
> so I still do not see how it really differs from the server-side
> statement_timeout setting.

How about when the server is so saturated that statement_timeout cannot 
work?  See SQLNET.SEND_TIMEOUT and SQLNET.RECV_TIMEOUT here:

https://docs.oracle.com/cd/E11882_01/network.112/e10835/sqlnet.htm#NETRF228


As these parameter names suggest, maybe we could use SEND_TIMEO and 
RECV_TIMEO socket options for setsockopt() instead of using pqWaitTimed().

To wrap up, the relevant parameters work like this:

* TCP keepalive and TCP user (retransmission) timeout: for network 
problems
* statement_timeout: for long-running queries
* socket_timeout (or send/recv_timeout): for saturated servers


FYI, PgJDBC has a parameter named socketTimeout:

https://jdbc.postgresql.org/documentation/head/connect.html#connection-parameters



Regards
Takayuki Tsunakawa





Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Joerg Sonnenberger
On Wed, Jan 09, 2019 at 02:04:15PM -0500, Tom Lane wrote:
> Also, in view of finding that the original multiplier choices failed
> on the fmgr oid problem, I spent a little effort making the code
> able to try more combinations of hash multipliers and seeds.  It'd
> be nice to have some theory rather than just heuristics about what
> will work, though ...

The theory is that the code needs two families of pair-wise independent
hash functions to give the expected O(1) number of tries. For practical
purposes, the Jenkins hash (or any cryptographic hash function) easily
qualifies. The downside is that those hash functions are very expensive,
though. A multiplicative hash, especially one using the high bits of the
result (e.g. the top 32 bits of a 32x32->64 multiplication), satisfies the
requirements, but for string input it would need a pair of constants per
word. So the choice of a hash function family for a performance-sensitive
part is a heuristic in the sense of trying to get away with as simple a
function as possible.
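
For a single 32-bit word that trick looks like this (the multiplier is just
an arbitrary odd constant for illustration):

    #include <stdint.h>

    /* keep the high half of a widening 32x32->64 multiply */
    static inline uint32_t
    mult_hash_word(uint32_t x, uint32_t multiplier)
    {
        return (uint32_t) (((uint64_t) x * multiplier) >> 32);
    }

    /* e.g. mult_hash_word(x, 0x9e3779b9); hashing a string this way would
     * need one such constant per 32-bit word of input, as noted above */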

Joerg



Re: insensitive collations

2019-01-09 Thread Daniel Verite
Peter Eisentraut wrote:

> > =# select n from (values ('été' collate "myfr"), ('ete')) x(n)
> >   group by 1 order by 1 ;
> >   n  
> > -
> >  ete
> > (1 row)
> > 
> > =#  select n from (values ('été' collate "myfr"), ('ete')) x(n)
> >   group by 1 order by 1 desc;
> >   n  
> > -
> >  été
> > (1 row)
> 
> I don't see anything wrong here.  The collation says that both values
> are equal, so which one is returned is implementation-dependent.

It is, but it's impractical if the output of seemingly the same GROUP BY
flip-flops between its different valid results. If it can't be avoided, then
okay. If it can be avoided at little cost, then it would be better to do it.

As a different example, the regression tests are somewhat counting on
this already. Consider this part:

+CREATE TABLE test3ci (x text COLLATE case_insensitive);
+INSERT INTO test1ci VALUES ('abc'), ('def'), ('ghi');
+INSERT INTO test2ci VALUES ('ABC'), ('ghi');
+INSERT INTO test3ci VALUES ('abc'), ('ABC'), ('def'), ('ghi');
...
+SELECT x, count(*) FROM test3ci GROUP BY x ORDER BY x;
+  x  | count 
+-+---
+ abc | 2
+ def | 1
+ ghi | 1
+(3 rows)

If ABC was returned here instead of abc for whatever reason,
that would be correct strictly speaking, yet "make check" would fail.
That's impractical.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread John Naylor
On Wed, Jan 9, 2019 at 2:04 PM Tom Lane  wrote:
>
> I wrote:
> > John Naylor  writes:
> >> -There is a bit of a cognitive clash between $case_sensitive in
> >> gen_keywordlist.pl and $case_insensitive in PerfectHash.pm. They each
> >> make sense in their own file, but might it be worth using one or the
> >> other?

> Working on the fmgr-oid-lookup idea gave me the thought that
> PerfectHash.pm ought to support fixed-length keys.  Rather than start
> adding random parameters to the function, I borrowed an idea from
> PostgresNode.pm and made the options be keyword-style parameters.  Now
> the impedance mismatch about case sensitivity is handled with
>
> my $f = PerfectHash::generate_hash_function(\@keywords, $funcname,
> case_insensitive => !$case_sensitive);
>
> which is at least a little clearer than before, though I'm not sure
> if it entirely solves the problem.

It's a bit clearer, but thinking about this some more, it makes sense
for gen_keywordlist.pl to use $case_insensitive, because right now
every instance of the var is "!$case_sensitive". In the attached (on
top of v4), I change the command line option to --citext, and add the
ability to negate it within the option, as '--no-citext'. It's kind of
a double negative for the C-keywords invocation, but we can have the
option for both cases, so we don't need to worry about what the
default is (which is case_insensitive=1).

--
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/common/Makefile b/src/common/Makefile
index d0c2b970eb..5f9327de59 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -124,7 +124,7 @@ libpgcommon_srv.a: $(OBJS_SRV)
 
 # generate SQL keyword lookup table to be included into keywords*.o.
 kwlist_d.h: $(top_srcdir)/src/include/parser/kwlist.h $(GEN_KEYWORDLIST_DEPS)
-	$(GEN_KEYWORDLIST) --extern $<
+	$(GEN_KEYWORDLIST) --extern --citext $<
 
 # Dependencies of keywords*.o need to be managed explicitly to make sure
 # that you don't get broken parsing code, even in a non-enable-depend build.
diff --git a/src/interfaces/ecpg/preproc/Makefile b/src/interfaces/ecpg/preproc/Makefile
index 6c02f97622..96dd76317f 100644
--- a/src/interfaces/ecpg/preproc/Makefile
+++ b/src/interfaces/ecpg/preproc/Makefile
@@ -60,10 +60,10 @@ preproc.y: ../../../backend/parser/gram.y parse.pl ecpg.addons ecpg.header ecpg.
 
 # generate keyword headers
 c_kwlist_d.h: c_kwlist.h $(GEN_KEYWORDLIST_DEPS)
-	$(GEN_KEYWORDLIST) --varname ScanCKeywords --case $<
+	$(GEN_KEYWORDLIST) --varname ScanCKeywords --no-citext  $<
 
 ecpg_kwlist_d.h: ecpg_kwlist.h $(GEN_KEYWORDLIST_DEPS)
-	$(GEN_KEYWORDLIST) --varname ScanECPGKeywords $<
+	$(GEN_KEYWORDLIST) --varname ScanECPGKeywords --citext $<
 
 # Force these dependencies to be known even without dependency info built:
 ecpg_keywords.o c_keywords.o keywords.o preproc.o pgc.o parser.o: preproc.h
diff --git a/src/pl/plpgsql/src/Makefile b/src/pl/plpgsql/src/Makefile
index 8a0f294323..bc4ee50682 100644
--- a/src/pl/plpgsql/src/Makefile
+++ b/src/pl/plpgsql/src/Makefile
@@ -80,10 +80,10 @@ plerrcodes.h: $(top_srcdir)/src/backend/utils/errcodes.txt generate-plerrcodes.p
 
 # generate keyword headers for the scanner
 pl_reserved_kwlist_d.h: pl_reserved_kwlist.h $(GEN_KEYWORDLIST_DEPS)
-	$(GEN_KEYWORDLIST) --varname ReservedPLKeywords $<
+	$(GEN_KEYWORDLIST) --varname ReservedPLKeywords --citext $<
 
 pl_unreserved_kwlist_d.h: pl_unreserved_kwlist.h $(GEN_KEYWORDLIST_DEPS)
-	$(GEN_KEYWORDLIST) --varname UnreservedPLKeywords $<
+	$(GEN_KEYWORDLIST) --varname UnreservedPLKeywords --citext $<
 
 
 check: submake
diff --git a/src/tools/gen_keywordlist.pl b/src/tools/gen_keywordlist.pl
index 2744e1dc26..927511d7c4 100644
--- a/src/tools/gen_keywordlist.pl
+++ b/src/tools/gen_keywordlist.pl
@@ -35,13 +35,13 @@ use PerfectHash;
 
 my $output_path = '';
 my $extern = 0;
-my $case_sensitive = 0;
+my $case_insensitive = 1;
 my $varname = 'ScanKeywords';
 
 GetOptions(
 	'output:s'   => \$output_path,
 	'extern' => \$extern,
-	'case-sensitive' => \$case_sensitive,
+	'citext!'=> \$case_insensitive,
 	'varname:s'  => \$varname) || usage();
 
 my $kw_input_file = shift @ARGV || die "No input file.\n";
@@ -97,7 +97,7 @@ while (<$kif>)
 }
 
 # When being case-insensitive, insist that the input be all-lower-case.
-if (!$case_sensitive)
+if ($case_insensitive)
 {
 	foreach my $kw (@keywords)
 	{
@@ -157,7 +157,7 @@ printf $kwdef "#define %s_NUM_KEYWORDS %d\n\n", uc $varname, scalar @keywords;
 my $funcname = $varname . "_hash_func";
 
 my $f = PerfectHash::generate_hash_function(\@keywords, $funcname,
-	case_insensitive => !$case_sensitive);
+	case_insensitive => $case_insensitive);
 
 printf $kwdef qq|static %s\n|, $f;
 
@@ -178,11 +178,11 @@ printf $kwdef "#endif\t\t\t\t\t\t\t/* %s_H */\n", uc $base_filename;
 sub usage
 {
 	die <] [--varname/-v ] [--extern/-e] input_file
+Usage: gen_keywordlist.pl [--output

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread John Naylor
On Tue, Jan 8, 2019 at 5:31 PM Tom Lane  wrote:
>
> John Naylor  writes:
> > In the committed keyword patch, I noticed that in common/keywords.c,
> > the array length is defined with
> > ScanKeywordCategories[SCANKEYWORDS_NUM_KEYWORDS]
> > but other keyword arrays just have ...[]. Is there a reason for the 
> > difference?
>
> The length macro was readily available there so I used it.  AFAIR
> that wasn't true elsewhere, though I might've missed something.
> It's pretty much just belt-and-suspenders coding anyway, since all
> those arrays are machine generated ...

I tried using the available num_keywords macro in plpgsql and it
worked fine, but it makes the lines really long. Alternatively, as in
the attached, we could remove the single use of the core macro and
maybe add comments to the generated magic numbers.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/common/keywords.c b/src/common/keywords.c
index 84f779feb9..edec4a5094 100644
--- a/src/common/keywords.c
+++ b/src/common/keywords.c
@@ -26,7 +26,7 @@
 
 #define PG_KEYWORD(kwname, value, category) category,
 
-const uint8 ScanKeywordCategories[SCANKEYWORDS_NUM_KEYWORDS] = {
+const uint8 ScanKeywordCategories[] = {
 #include "parser/kwlist.h"
 };
 
diff --git a/src/tools/gen_keywordlist.pl b/src/tools/gen_keywordlist.pl
index 927511d7c4..8c4c5da828 100644
--- a/src/tools/gen_keywordlist.pl
+++ b/src/tools/gen_keywordlist.pl
@@ -147,11 +147,6 @@ foreach my $name (@keywords)
 
 print $kwdef "};\n\n";
 
-# Emit a macro defining the number of keywords.
-# (In some places it's useful to have access to that as a constant.)
-
-printf $kwdef "#define %s_NUM_KEYWORDS %d\n\n", uc $varname, scalar @keywords;
-
 # Emit the definition of the hash function.
 
 my $funcname = $varname . "_hash_func";
@@ -168,8 +163,8 @@ printf $kwdef "const ScanKeywordList %s = {\n", $varname;
 printf $kwdef qq|\t%s_kw_string,\n|, $varname;
 printf $kwdef qq|\t%s_kw_offsets,\n|, $varname;
 printf $kwdef qq|\t%s,\n|, $funcname;
-printf $kwdef qq|\t%s_NUM_KEYWORDS,\n|, uc $varname;
-printf $kwdef qq|\t%d\n|, $max_len;
+printf $kwdef qq|\t%d,\t/* number of keywords */\n|, scalar @keywords;
+printf $kwdef qq|\t%d\t/* max keyword length */\n|, $max_len;
 printf $kwdef "};\n\n";
 
 printf $kwdef "#endif\t\t\t\t\t\t\t/* %s_H */\n", uc $base_filename;


Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread John Naylor
On Wed, Jan 9, 2019 at 2:44 PM Tom Lane  wrote:
> [patch to shrink oid index]

It would help maintain its newfound svelteness if we warned when a
higher OID was assigned, as in the attached. I used 6200 as a soft
limit, but that could be anything similar.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/genbki.pl b/src/backend/catalog/genbki.pl
index e7ead7dcc0..201343c2f6 100644
--- a/src/backend/catalog/genbki.pl
+++ b/src/backend/catalog/genbki.pl
@@ -74,6 +74,7 @@ my %catalog_data;
 my @toast_decls;
 my @index_decls;
 my %oidcounts;
+my $max_proc_oid = 6200;
 
 foreach my $header (@input_files)
 {
@@ -230,6 +231,13 @@ foreach my $row (@{ $catalog_data{pg_opfamily} })
 my %procoids;
 foreach my $row (@{ $catalog_data{pg_proc} })
 {
+	# Warn if a large OID has been newly assigned.
+	if ($row->{oid} > $max_proc_oid)
+	{
+		warn sprintf "Consider using an OID smaller than %d for %s (see commit 8ff5f824dca75)\n",
+			$max_proc_oid, $row->{proname};
+	}
+
 	# Generate an entry under just the proname (corresponds to regproc lookup)
 	my $prokey = $row->{proname};
 	if (defined $procoids{$prokey})


Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
John Naylor  writes:
> On Wed, Jan 9, 2019 at 2:44 PM Tom Lane  wrote:
>> [patch to shrink oid index]

> It would help maintaining its newfound sveltness if we warned if a
> higher oid was assigned, as in the attached. I used 6200 as a soft
> limit, but that could be anything similiar.

I think the reason we have this issue is that people tend to use
high OIDs during development of a patch, so that their elbows won't
be joggled by unrelated changes.  Then sometimes they forget to
renumber them down before committing.  A warning like this would
lead to lots of noise during the development stage, which nobody
would thank us for.  If we could find a way to notice this only
when we were about to commit, it'd be good .. but I don't have an
idea about a nice way to do that.  (No, I don't want a commit hook
on gitmaster; that's warning too late, which is not better.)

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
John Naylor  writes:
> On Wed, Jan 9, 2019 at 2:04 PM Tom Lane  wrote:
>> Now the impedance mismatch about case sensitivity is handled with
>> my $f = PerfectHash::generate_hash_function(\@keywords, $funcname,
>>  case_insensitive => !$case_sensitive);
>> which is at least a little clearer than before, though I'm not sure
>> if it entirely solves the problem.

> It's a bit clearer, but thinking about this some more, it makes sense
> for gen_keywordlist.pl to use $case_insensitive, because right now
> every instance of the var is "!$case_sensitive". In the attached (on
> top of v4), I change the command line option to --citext, and add the
> ability to negate it within the option, as '--no-citext'. It's kind of
> a double negative for the C-keywords invocation, but we can have the
> option for both cases, so we don't need to worry about what the
> default is (which is case_insensitive=1).

Ah, I didn't realize that Getopt allows having a boolean option
defaulting to "on".  That makes it more practical to do something here.

I'm not in love with "citext" as the option name, though ... that has
little to recommend it except brevity, which is not a property we
really need here.  We could go with "[no-]case-insensitive", perhaps.
Or "[no-]case-fold", which is at least a little shorter and less
double-negative-y.

regards, tom lane



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
John Naylor  writes:
> On Tue, Jan 8, 2019 at 5:31 PM Tom Lane  wrote:
>> The length macro was readily available there so I used it.  AFAIR
>> that wasn't true elsewhere, though I might've missed something.
>> It's pretty much just belt-and-suspenders coding anyway, since all
>> those arrays are machine generated ...

> I tried using the available num_keywords macro in plpgsql and it
> worked fine, but it makes the lines really long. Alternatively, as in
> the attached, we could remove the single use of the core macro and
> maybe add comments to the generated magic numbers.

Meh, I'm not excited about removing the option just because there's
only one use of it now.  There might be more-compelling uses later.

regards, tom lane



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-09 Thread Alvaro Herrera
Here are my proposed final changes.  I noticed that you were allocating
the prefixes for all cases even when there were no cset/gset in the
command; I changed it to delay the allocation until needed.  I also
noticed you were skipping the Meta enum dance for no good reason; adding
it makes the conditionals simpler.  The read_response routine seemed
misplaced and I gave it a name in a style closer to the rest of the
pgbench code.  Also, you were failing to free the ->lines pqexpbuffer
when the command is discarded.  I grant that the free_command() stuff
might be bogus since it's only tested with a command that's barely
initialized, but it seems better to make it bogus in this way (fixable
if we ever extend its use) than to forever leak memory silently.

I didn't test this beyond running "make check".

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4b8b6ba10f..3e111876fb 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -482,6 +482,8 @@ typedef enum MetaCommand
 	META_SETSHELL,/* \setshell */
 	META_SHELL,	/* \shell */
 	META_SLEEP,	/* \sleep */
+	META_CSET,	/* \cset */
+	META_GSET,	/* \gset */
 	META_IF,	/* \if */
 	META_ELIF,	/* \elif */
 	META_ELSE,	/* \else */
@@ -499,19 +501,39 @@ typedef enum QueryMode
 static QueryMode querymode = QUERY_SIMPLE;
 static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
 
+/*
+ * struct Command represents one command in a script.
+ *
+ * lines		The raw, possibly multi-line command text.  Variable substitution
+ *not applied.
+ * first_line	A short, single-line extract of 'lines', for error reporting.
+ * type			SQL_COMMAND or META_COMMAND
+ * meta			The type of meta-command, or META_NONE if command is SQL
+ * argc			Number of arguments of the command, 0 if not yet processed.
+ * argv			Command arguments, the first of which is the command or SQL
+ *string itself.  For SQL commands, after post-processing
+ *argv[0] is the same as 'lines' with variables substituted.
+ * nqueries		In a multi-command SQL line, the number of queries.
+ * varprefix 	SQL commands terminated with \gset or \cset have this set
+ *to a non NULL value.  If nonempty, it's used to prefix the
+ *variable name that receives the value.
+ * varprefix_max Allocated size of the varprefix array.
+ * expr			Parsed expression, if needed.
+ * stats		Time spent in this command.
+ */
 typedef struct Command
 {
-	PQExpBufferData lines;		/* raw command text (possibly multi-line) */
-	char	   *first_line;		/* first line for short display */
-	int			type;			/* command type (SQL_COMMAND or META_COMMAND) */
-	MetaCommand meta;			/* meta command identifier, or META_NONE */
-	int			argc;			/* number of command words */
-	char	   *argv[MAX_ARGS]; /* command word list */
-	int			nqueries;		/* number of compounds within an SQL command */
-	int			prefix_size;	/* allocated size of prefix, >= nqueries */
-	char	  **prefix;			/* per-compound command prefix for [cg]set */
-	PgBenchExpr *expr;			/* parsed expression, if needed */
-	SimpleStats stats;			/* time spent in this command */
+	PQExpBufferData lines;
+	char	   *first_line;
+	int			type;
+	MetaCommand meta;
+	int			argc;
+	char	   *argv[MAX_ARGS];
+	int			nqueries;
+	char	  **varprefix;
+	int			varprefix_max;
+	PgBenchExpr *expr;
+	SimpleStats stats;
 } Command;
 
 typedef struct ParsedScript
@@ -589,6 +611,7 @@ static void doLog(TState *thread, CState *st,
 static void processXactStats(TState *thread, CState *st, instr_time *now,
  bool skipped, StatsData *agg);
 static void pgbench_error(const char *fmt,...) pg_attribute_printf(1, 2);
+static void allocate_command_varprefix(Command *cmd, int totalqueries);
 static void addScript(ParsedScript script);
 static void *threadRun(void *arg);
 static void finishCon(CState *st);
@@ -1734,107 +1757,6 @@ valueTruth(PgBenchValue *pval)
 	}
 }
 
-/* read all responses from backend, storing into variable or discarding */
-static bool
-read_response(CState *st, char **prefix)
-{
-	PGresult   *res;
-	int			compound = 0;
-
-	while ((res = PQgetResult(st->con)) != NULL)
-	{
-		switch (PQresultStatus(res))
-		{
-			case PGRES_COMMAND_OK:	/* non-SELECT commands */
-			case PGRES_EMPTY_QUERY: /* may be used for testing no-op overhead */
-if (prefix[compound] != NULL)
-{
-	fprintf(stderr,
-			"client %d file %d command %d compound %d: "
-			"\\gset/cset expects a row\n",
-			st->id, st->use_file, st->command, compound);
-	st->ecnt++;
-	return false;
-}
-break;			/* OK */
-
-			case PGRES_TUPLES_OK:
-if (prefix[compound] != NULL)
-{
-	/* store result into variables if required */
-	int			ntuples = PQntuples(res),
-nfields = PQnfields(res);
-
-	if (ntuples != 1)
-	{
-		fprintf(stderr,
-"client %d file %d command %d co

Re: BUG #15446: Crash on ALTER TABLE

2019-01-09 Thread Andrew Dunstan

On 1/8/19 7:41 PM, Andrew Dunstan wrote:
> On 1/8/19 4:48 PM, Dean Rasheed wrote:
>> On Tue, 8 Jan 2019 at 19:34, Andrew Dunstan
>>  wrote:
>>> Here's a patch that I think cures the problem.
>>>
>> Hmm, that doesn't quite work because the table might not actually be
>> rewritten as a result of the type change. For example:
>>
>> DROP TABLE IF EXISTS foo;
>> CREATE TABLE foo (a int);
>> INSERT INTO foo VALUES (1);
>> ALTER TABLE foo ADD COLUMN b varchar(10) DEFAULT 'xxx';
>> ALTER TABLE foo ALTER COLUMN b SET DEFAULT 'yyy';
>> INSERT INTO foo VALUES (2);
>> SELECT * FROM foo;
>>  a |  b
>> ---+-
>>  1 | xxx
>>  2 | yyy
>> (2 rows)
>>
>> ALTER TABLE foo ALTER COLUMN b TYPE varchar(20) USING b::varchar(20);
>> SELECT * FROM foo;
>>  a |  b
>> ---+-
>>  1 |
>>  2 | yyy
>> (2 rows)
>>
>
> Ouch, OK, looks like we need something more complex.
>
>

Here's another attempt. For the rewrite case it keeps the logic of the
previous patch and clears all the missing attributes, but if we're not
rewriting we reconstruct the missing value according to the new type
settings.


cheers


andrew



-- 
Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c8c50e8c98..e72d0b43ed 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9285,6 +9285,21 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 	HeapTuple	depTup;
 	ObjectAddress address;
 
+	/*
+	 * Clear all the missing values if we're rewriting the table, since this
+	 * renders them pointless.
+	 */
+	if (tab->rewrite)
+	{
+		Relationnewrel;
+
+		newrel = heap_open(RelationGetRelid(rel), NoLock);
+		RelationClearMissing(newrel);
+		relation_close(newrel, NoLock);
+		/* make sure we don't conflict with later attribute modifications */
+		CommandCounterIncrement();
+	}
+
 	attrelation = heap_open(AttributeRelationId, RowExclusiveLock);
 
 	/* Look up the target column */
@@ -9601,7 +9616,69 @@ ATExecAlterColumnType(AlteredTableInfo *tab, Relation rel,
 	/*
 	 * Here we go --- change the recorded column type and collation.  (Note
 	 * heapTup is a copy of the syscache entry, so okay to scribble on.)
+	 * First fix up the missing value if any.
 	 */
+	if (attTup->atthasmissing)
+	{
+		Datum   missingval;
+		boolmissingNull;
+
+		/* if rewrite is true the missing value should already be cleared */
+		Assert(tab->rewrite == 0);
+
+		/* Get the missing value datum */
+		missingval = heap_getattr(heapTup,
+  Anum_pg_attribute_attmissingval,
+  attrelation->rd_att,
+  &missingNull);
+
+		/* if it's a null array there is nothing to do */
+
+		if (! missingNull)
+		{
+			/*
+			 * Get the datum out of the array and repack it in a new array
+			 * built with the new type data. We assume that since the table
+			 * doesn't need rewriting, the actual Datum doesn't need to be
+			 * changed, only the array metadata.
+			 */
+
+			int one = 1;
+			bool isNull;
+			Datum   valuesAtt[Natts_pg_attribute];
+			boolnullsAtt[Natts_pg_attribute];
+			boolreplacesAtt[Natts_pg_attribute];
+
+			MemSet(valuesAtt, 0, sizeof(valuesAtt));
+			MemSet(nullsAtt, false, sizeof(nullsAtt));
+			MemSet(replacesAtt, false, sizeof(replacesAtt));
+
+			missingval = array_get_element(missingval,
+		   1,
+		   &one,
+		   0,
+		   attTup->attlen,
+		   attTup->attbyval,
+		   attTup->attalign,
+		   &isNull);
+			missingval = PointerGetDatum(
+construct_array(&missingval,
+1,
+targettype,
+tform->typlen,
+tform->typbyval,
+tform->typalign));
+
+			valuesAtt[Anum_pg_attribute_attmissingval - 1] = missingval;
+			replacesAtt[Anum_pg_attribute_attmissingval - 1] = true;
+			nullsAtt[Anum_pg_attribute_attmissingval - 1] = false;
+
+			heapTup = heap_modify_tuple(heapTup, RelationGetDescr(attrelation),
+		valuesAtt, nullsAtt, replacesAtt);
+			attTup = (Form_pg_attribute) GETSTRUCT(heapTup);
+		}
+	}
+
 	attTup->atttypid = targettype;
 	attTup->atttypmod = targettypmod;
 	attTup->attcollation = targetcollid;
diff --git a/src/test/regress/expected/fast_default.out b/src/test/regress/expected/fast_default.out
index 1c1924cd5c..40a15bd2d6 100644
--- a/src/test/regress/expected/fast_default.out
+++ b/src/test/regress/expected/fast_default.out
@@ -735,7 +735,44 @@ INSERT INTO leader VALUES (1, 1), (2, 2);
 ALTER TABLE leader ADD c int;
 ALTER TABLE leader DROP c;
 DELETE FROM leader;
+-- check that ALTER TABLE ... ALTER TYPE does the right thing
+CREATE TABLE vtype( a integer);
+INSERT INTO vtype VALUES (1);
+ALTER TABLE vtype ADD COLUMN b DOUBLE PRECISION DEFAULT 0.2;
+ALTER TABLE vtype ADD COLUMN c BOOLEAN DEFAULT true;
+SELECT * FROM vtype;
+ a |  b  | c 
+---+-+---
+ 1 | 0.2 | t
+(1 row)
+
+ALTER TABLE vtype
+  ALTER b TYPE text USIN

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread John Naylor
On Wed, Jan 9, 2019 at 5:33 PM Tom Lane  wrote:
> really need here.  We could go with "[no-]case-insensitive", perhaps.
> Or "[no-]case-fold", which is at least a little shorter and less
> double-negative-y.

I'd be in favor of --[no-]case-fold.

On Tue, Jan 8, 2019 at 5:53 PM Tom Lane  wrote:
> I improved the comment about how come the hash table entry assignment
> works.

I've gone over the algorithm in more detail and I don't see any nicer
way to write it. This comment in PerfectHash.pm:

(It does change '_', else we could just skip adjusting
# $cn here at all, for typical keyword strings.)

...seems a bit out of place in the module, because of its reference to
keywords, of interest right now to its only caller. Maybe add a bit of
context here. (I also don't quite understand why we could
hypothetically skip the adjustment.)

Lastly, the keyword headers still have a dire warning about ASCII
order and binary search. Those could be softened to match the comment
in gen_keywordlist.pl.

-- 
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-01-09 Thread Tom Lane
John Naylor  writes:
> On Wed, Jan 9, 2019 at 5:33 PM Tom Lane  wrote:
>> really need here.  We could go with "[no-]case-insensitive", perhaps.
>> Or "[no-]case-fold", which is at least a little shorter and less
>> double-negative-y.

> I'd be in favor of --[no-]case-fold.

Yeah, I like that better too; I've been having to stop and think
every time as to which direction is which with the [in]sensitive
terminology.  I'll make it "case-fold" throughout.

> This comment in PerfectHash.pm:

> (It does change '_', else we could just skip adjusting
> # $cn here at all, for typical keyword strings.)

> ...seems a bit out of place in the module, because of its reference to
> keywords, of interest right now to its only caller. Maybe a bit of
> context here. (I also don't quite understand why we could
> hypothetically skip the adjustment.)

Were it not for the underscore case, we could plausibly assume that
the supplied keywords are already all-lower-case and don't need any
further folding.  But I agree that this comment is probably more
confusing than helpful; it's easier just to see that the code is
applying the same transform as the runtime lookup will do.
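
For the archives, here is a rough sketch of the shape of such a generated
lookup function; the multipliers, table size, and table contents are
placeholders invented for illustration (the real ones come out of
PerfectHash.pm), so treat this as the shape of the technique rather than
the actual generated code:

    /* Sketch only: constants and hashtab[] contents are placeholders. */
    #define KWHASH_TABLE_SIZE   883     /* placeholder */
    #define KWHASH_NUM_KEYWORDS 440     /* placeholder */

    /* contents come from the generator; zeroed placeholder here */
    static const uint16 hashtab[KWHASH_TABLE_SIZE];

    static int
    keyword_hash(const char *key, size_t keylen)
    {
        uint32      a = 0,
                    b = 0;

        while (keylen--)
        {
            /* fold 'A'-'Z' to 'a'-'z'; note this also maps '_' (0x5F) to 0x7F */
            unsigned char c = (unsigned char) *key++ | 0x20;

            a = a * 31 + c;
            b = b * 127 + c;
        }
        /* two table entries summed (mod the keyword count) give the candidate index */
        return (hashtab[a % KWHASH_TABLE_SIZE] + hashtab[b % KWHASH_TABLE_SIZE]) % KWHASH_NUM_KEYWORDS;
    }

The caller then does a single strcmp() against that one candidate keyword
to confirm the match, which is why applying the same folding transform on
both sides makes the '_' quirk harmless.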

> Lastly, the keyword headers still have a dire warning about ASCII
> order and binary search. Those could be softened to match the comment
> in gen_keywordlist.pl.

Agreed, will do.

Thanks for reviewing!

regards, tom lane



Re: A few new options for vacuumdb

2019-01-09 Thread Masahiko Sawada
On Wed, Jan 9, 2019 at 1:33 PM Michael Paquier  wrote:
>
> On Wed, Jan 09, 2019 at 10:33:00AM +0900, Masahiko Sawada wrote:
> > Since pg_(total)_relation_size() returns 0 for a parent table,
> > specifying the parent table to vacuumdb with --min-relation-size
> > always does nothing. Maybe we will need to deal with this case when a
> > function returning the whole partitioned table size is introduced.
>
> Good point.  I am not sure if we want to go down the path of having a size
> function dedicated to partitions, especially as this would now just be
> a wrapper around pg_partition_tree(), but the size argument with
> partitioned tables is something to think about.  If we cannot sort out
> this part cleanly, perhaps we could just focus on the aging
> parameters and the other ones first?  It seems to me that what is
> proposed on this thread has value, so we could shave things down and keep
> the essentials, and focus on what we are sure about for simplicity.

Agreed.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: some minor comment fix in md.c

2019-01-09 Thread Michael Paquier
On Wed, Jan 09, 2019 at 08:30:53AM +, Jamison, Kirk wrote:
> Here are few minor fix in md.c comments
> src/backend/storage/smgr/md.c
> 
> 1. @L174 - removed the unnecessary word "is".
> - […] Note that this is breaks mdnblocks() and related functionality [...]
> + […] Note that this breaks mdnblocks() and related functionality [...]
> 
> 2. @L885 - grammar fix
> - We used to pass O_CREAT here, but that's has the disadvantage that it might 
> [...]
> + We used to pass O_CREAT here, but that has the disadvantage that it might 
> [...]

Thanks, that looks good to me so pushed.
--
Michael




Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-09 Thread Michael Paquier
On Wed, Jan 09, 2019 at 06:00:23AM -0800, Donald Dong wrote:
> I wonder why do we need to have this addition?
> 
> src/backend/replication/walreceiver.c
> +   memset(walrcv->slotname, 0, NAMEDATALEN);
> +   if (PrimarySlotName)
> +   strlcpy((char *) walrcv->slotname, PrimarySlotName, 
> NAMEDATALEN);
>

It seems much cleaner to me to clean up the field for safety before
starting the process.  Still, when requesting a WAL receiver to start,
slotname and conninfo get zeroed before doing anything, so you are right
that we could do without it, actually.

> I notice these patterns appear in different places. Maybe it will be
> better to have a common method such as pg_str_empty()? 

That's a pattern used quite a lot for many GUCs, so it is quite
separate from this patch.  Perhaps it would make sense to have a macro
for that purpose for GUCs, though I am not sure whether that's worth it.
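
Just to illustrate, such a helper could look like this (pg_str_empty()
does not exist today; the name and usage here are hypothetical):

    /* Hypothetical helper for the "set and non-empty" GUC pattern. */
    #define pg_str_empty(s)  ((s) == NULL || (s)[0] == '\0')

    /* e.g. the walreceiver snippet quoted upthread would become: */
    if (!pg_str_empty(PrimarySlotName))
        strlcpy((char *) walrcv->slotname, PrimarySlotName, NAMEDATALEN);
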
--
Michael




Re: Misleading panic message in backend/access/transam/xlog.c

2019-01-09 Thread Michael Paquier
On Wed, Jan 09, 2019 at 08:10:43AM -0800, Andres Freund wrote:
> I'm quite unenthused about that. If anything, I'd remove detail and use
> the standard error message about not being able to write to a file, and
> include the full path.

Partially agreed.  Those messages have been left out of 56df07b
because they include some context about the offset and the length, and
I don't think that we simply want to remove that information.  What
about making the offset and the length part of an extra errdetail, and
switching the main error string to a more generic one?
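
For instance (purely illustrative, with made-up variable names), the
message could take the usual file-access shape while keeping the extra
context:

    ereport(PANIC,
            (errcode_for_file_access(),
             errmsg("could not write to file \"%s\": %m", xlogfname),
             errdetail("Write of %u bytes failed at offset %u.",
                       nbytes, startoffset)));
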
--
Michael




Re: Misleading panic message in backend/access/transam/xlog.c

2019-01-09 Thread Andres Freund



On January 9, 2019 5:01:40 PM PST, Michael Paquier  wrote:
>On Wed, Jan 09, 2019 at 08:10:43AM -0800, Andres Freund wrote:
>> I'm quite unenthused about that. If anything, I'd remove detail and
>use
>> the standard error message about not being able to write to a file,
>and
>> include the full path.
>
>Partially agreed.  Those messages have been left out of 56df07b
>because they include some context about the offset and the length, and
>I don't think that we simply want to remove that information.  What
>about making the offset and the length part of an extra errdetail, and
>switch the main error string to a more generic one?

IIRC we have other such errors including offset and length (and if not we'll
grow some). It should be formatted as a generic write error with the file name,
no reference to the log file, etc., even if there's no precedent.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Early WIP/PoC for inlining CTEs

2019-01-09 Thread Andreas Karlsson
Here is a new version of the patch with added tests, improved comments, 
some minor code cleanup and most importantly slightly changed logic for 
when we should inline.


The current strategy is to always inline unless the CTE is recursive or 
it has side effects, i.e. it is a DML query, it contains calls to a 
volatile function, or it contains FOR SHARE/FOR UPDATE. I feel this is a 
conservative approach which prevents behavioral changes (other than 
performance).
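
To make that rule concrete, here is a rough sketch of such a test
(illustrative only; the function name and exact placement are not the
attached patch's actual code):

    /*
     * Sketch of the inlining rule described above: only a plain,
     * non-recursive SELECT with no data-modifying sub-statements, no
     * FOR UPDATE/SHARE, and no volatile function calls is a candidate.
     */
    static bool
    cte_is_inlinable(CommonTableExpr *cte)
    {
        Query      *ctequery = castNode(Query, cte->ctequery);

        return ctequery->commandType == CMD_SELECT &&
            !cte->cterecursive &&
            !ctequery->hasModifyingCTE &&
            !ctequery->hasForUpdate &&
            !contain_volatile_functions((Node *) ctequery);
    }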


Current open issues:

1. Currently the patch will inline CTEs even when there are multiple 
references to them. Whether this is a win or not depends on the specific 
query, so I am not sure what we should do here. One thing worth noting is 
that our current documentation talks a lot about how CTEs are only 
evaluated once.


"A useful property of WITH queries is that they are 
evaluated

only once per execution of the parent query, even if they are referred to
more than once by the parent query or sibling WITH 
queries.

Thus, expensive calculations that are needed in multiple places can be
placed within a WITH query to avoid redundant work. 
Another

possible application is to prevent unwanted multiple evaluations of
functions with side-effects."

What do you think?

2. Feedback on the new syntax. I am personally fine with the current 
syntax, but it was just something I quickly hacked together to move 
the patch forward and which also solved my personal use cases.


3. Are we happy with how I modified query_tree_walker()? I feel the code 
would be clearer if we could change the tree walker to treat the RTE as 
the parent node of the subquery instead of a sibling, but this seems 
like potentially a quite invasive change.


4. I need to update the user documentation.

Andreas
commit 60aab4af243cc93d504b99c99010593c02a5705c
Author: Andreas Karlsson 
Date:   Mon Sep 17 03:42:40 2018 +0200

CTE inlining

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d37a..8c26dd1f26 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2
Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
 
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
 --+--
   101 |  101
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index f438165650..56602a164c 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -493,8 +493,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
 -- ctid with whole-row reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index 88bc189646..92f180c5cf 100644
--- a/doc/src/sgml/queries.sgml
+++ 

Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-09 Thread Donald Dong


On Jan 9, 2019, at 4:47 PM, Michael Paquier  wrote:
> That's much cleaner to me to clean up the field for safety before
> starting the process.  When requesting a WAL receiver to start,
> slotname and conninfo gets zeroed before doing anything, you are right
> that we could do without it actually.

Cool! I agree that cleaning up the field would make the logic cleaner.

> That's a pattern used quite a lot for many GUCs, so that's quite
> separate from this patch.  Perhaps it would make sense to have a macro
> for that purpose for GUCs, I am not sure if that's worth it though.

Yeah. I think having such a macro would make the code a bit cleaner, though the
duplication of logic is not too significant.





Re: Misleading panic message in backend/access/transam/xlog.c

2019-01-09 Thread Michael Paquier
On Wed, Jan 09, 2019 at 05:09:19PM -0800, Andres Freund wrote:
> IIRC we have other such errors including offset and length (and if
> not we'll grow some). It should be formatted as a generic write
> error with the file name, no reference to the log file, etc., even if
> there's no precedent.

Yeah, there are a couple of them:
access/transam/xlog.c:
errmsg("could not read from log segment %s, offset %u: %m",
access/transam/xlog.c:
errmsg("could not read from log segment %s, offset %u: read %d of %zu",
access/transam/xlogutils.c:
errmsg("could not seek in log segment %s to offset %u: %m"
access/transam/xlogutils.c:
errmsg("could not read from log segment %s, offset %u, length %lu: %m",
replication/walreceiver.c:
errmsg("could not seek in log segment %s to offset %u: %m",
replication/walsender.c:
errmsg("could not seek in log segment %s to offset %u: %m",
replication/walsender.c:
errmsg("could not read from log segment %s, offset %u, length %zu: %m",
replication/walsender.c:
errmsg("could not read from log segment %s, offset %u: read %d of %zu",
--
Michael




Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-09 Thread Amit Langote
Fujita-san,

On 2019/01/09 20:20, Etsuro Fujita wrote:
> (2019/01/09 9:30), Amit Langote wrote:
>> So, while the patch at [1] can take care of this issue as I also mentioned
>> upthread, I was trying to come up with a solution that can be back-patched
>> to PG 11.  The patch I posted above is one such solution and as Ashutosh
>> points out it's perhaps not the best, because it can result in potentially
>> creating many copies of the same child EC member if we do it in joinrel.c,
>> as the patch proposes.  I will try to respond to the concerns he raised in
>> the next week if possible.
> 
> Thanks for working on this!
> 
> I like your patch in general.  I think one way to address Ashutosh's
> concerns would be to use the consider_partitionwise_join flag: originally,
> that was introduced for partitioned relations to show that they can be
> partitionwise-joined, but I think that flag could also be used for
> non-partitioned relations to show that they have been set up properly for
> partitionwise-joins, and I think by checking that flag we could avoid
> creating those copies for child dummy rels in try_partitionwise_join.

Ah, that's an interesting idea.

If I understand the original design of it correctly,
consider_partitionwise_join being true for a given relation (simple or
join) means that its RelOptInfo contains the properties needed to consider
joining it with another relation (simple or join) using the partitionwise
join mechanism.  A partitionwise join will occur between the pair if the
other relation also has the relevant properties (hence its
consider_partitionwise_join is also set to true) and the properties on the
two sides match.

That's a loaded meaning and abusing it to mean something else can be
challenged, but we can live with that if properly documented.  Speaking of
which:

    /* used by partitionwise joins: */
    bool        consider_partitionwise_join;    /* consider partitionwise
                                                 * join paths? (if
                                                 * partitioned rel) */

Maybe mention here how it will be abused in back-branches for
non-partitioned relations?
 
> Please find attached an updated version of the patch.  I modified your
> version so that building tlists for child dummy rels are also postponed
> until after they actually participate in partitionwise-joins, to avoid
> that possibly-useless work as well.  I haven't done any performance tests
> yet though.

Thanks for updating the patch.  I tested your patch (test setup described
below) and it has almost the same performance as my previous version:
552ms (vs. 41159ms on HEAD vs. 253ms on PG 10) for the query also
mentioned below.

Thanks,
Amit

[1] Test setup

-- create tables
CREATE TABLE precio(fecha timestamp, pluid int, loccd int, plusalesprice
int) PARTITION BY RANGE (fecha);

SELECT format('CREATE TABLE public.precio_%s PARTITION OF public.precio
(PRIMARY KEY (fecha, pluid, loccd) ) FOR VALUES FROM (''%s'')TO(''%s'')',
i, a, b) FROM (SELECT '1990-01-01'::timestamp +(i||'days')::interval a,
'1990-01-02'::timestamp+(i||'days')::interval b, i FROM
generate_series(1,999) i)x;

\gexec

-- query
SELECT l_variacao.fecha, l_variacao.loccd , l_variacao.pant ,
l_variacao.patual , max_variacao.var_max FROM (SELECT p.fecha, p.loccd,
p.plusalesprice patual, da.plusalesprice pant, abs(p.plusalesprice -
da.plusalesprice) as var from precio p, (SELECT p.fecha, p.plusalesprice,
p.loccd from precio p WHERE p.fecha between '2017-03-01' and '2017-03-02'
and p.pluid = 2) da WHERE p.fecha between '2017-03-01' and '2017-03-02'
and p.pluid = 2 and p.loccd = da.loccd and p.fecha = da.fecha) l_variacao,
(SELECT max(abs(p.plusalesprice - da.plusalesprice)) as var_max from
precio p, (SELECT p.fecha, p.plusalesprice, p.loccd from precio p WHERE
p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2) da WHERE
p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2 and p.loccd
= da.loccd and p.fecha = da.fecha) max_variacao WHERE max_variacao.var_max
= l_variacao.var;




BTW, have we got a commitfest manager for the January CF?

2019-01-09 Thread Tom Lane
I see somebody marked the CF as in-progress, but if anyone volunteered
to be nagger-in-chief for this month, I didn't see that.

regards, tom lane



Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-09 Thread Michael Paquier
On Wed, Jan 09, 2019 at 05:38:53PM -0800, Donald Dong wrote:
> On Jan 9, 2019, at 4:47 PM, Michael Paquier  wrote:
>> That's much cleaner to me to clean up the field for safety before
>> starting the process.  When requesting a WAL receiver to start,
>> slotname and conninfo gets zeroed before doing anything, you are right
>> that we could do without it actually.
> 
> Cool! I agree that cleaning up the field would make the logic cleaner.

I was just double-checking the code, and it is possible to remove the
part from RequestXLogStreaming() which sets slotname and conninfo to
'\0', so I removed that part from my local branch to simplify the
code.
--
Michael




Re: BTW, have we got a commitfest manager for the January CF?

2019-01-09 Thread Michael Paquier
On Wed, Jan 09, 2019 at 08:43:08PM -0500, Tom Lane wrote:
> I see somebody marked the CF as in-progress, but if anyone volunteered
> to be nagger-in-chief for this month, I didn't see that.

No volunteers that I know of...
--
Michael




Re: pgsql: Use perfect hashing, instead of binary search, for keyword looku

2019-01-09 Thread Robert Haas
On Wed, Jan 9, 2019 at 7:48 PM Tom Lane  wrote:
> Use perfect hashing, instead of binary search, for keyword lookup.
>
> We've been speculating for a long time that hash-based keyword lookup
> ought to be faster than binary search, but up to now we hadn't found
> a suitable tool for generating the hash function.  Joerg Sonnenberger
> provided the inspiration, and sample code, to show us that rolling our
> own generator wasn't a ridiculous idea.  Hence, do that.
>
> The method used here requires a lookup table of approximately 4 bytes
> per keyword, but that's less than what we saved in the predecessor commit
> afb0d0712, so it's not a big problem.  The time savings is indeed
> significant: preliminary testing suggests that the total time for raw
> parsing (flex + bison phases) drops by ~20%.
>
> Patch by me, but it owes its existence to Joerg Sonnenberger;
> thanks also to John Naylor for review.

Wow.  That is a VERY significant improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Early WIP/PoC for inlining CTEs

2019-01-09 Thread Thomas Munro
On Thu, Jan 10, 2019 at 2:28 PM Andreas Karlsson  wrote:
> 2. Feedback on the new syntax. I am personally fine with the current
> syntax, but it was just something I just quickly hacked together to move
> the patch forward and which also solved my personal uses cases.

Thanks for working on this.  I very much want to see this feature go
in.  As mentioned by Andres up-thread, TPC-DS makes a lot of use of
CTEs... let me see, 34 queries out of 99 have a WITH clause.  These
will hopefully become candidates for parallel query.

I know this is a thorny topic, but I have to say that I am uneasy
about the MATERIALIZED syntax.  Here's how you write that in some
other RDBMS that loves hints:

WITH foo AS (SELECT /*+ MATERIALIZE */ ...)

I understood that it was a long-standing project policy that we don't
want planner hints, but now we have a proposal to support one with a
top-level non-standard syntax.  If we take this syntax, should we not
also accept MATERIALIZED in front of subselects?

-1

-- 
Thomas Munro
http://www.enterprisedb.com



Re: BTW, have we got a commitfest manager for the January CF?

2019-01-09 Thread David Fetter
On Wed, Jan 09, 2019 at 08:43:08PM -0500, Tom Lane wrote:
> I see somebody marked the CF as in-progress, but if anyone volunteered
> to be nagger-in-chief for this month, I didn't see that.

I'm happy to do it.

Would love to chat with recent prior CFMs, if they're willing.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2019-01-09 Thread Amit Kapila
On Wed, Jan 9, 2019 at 10:26 AM Haribabu Kommi  wrote:
>
>
> On Wed, Jan 9, 2019 at 2:25 PM Amit Kapila  wrote:
>>
>> Another minor point is that in the second statement
>> (pg_stat_statements_reset), you seem to have made it a two-line
>> statement whereas first one looks to be a single-line statement, it
>> would be good from the readability point of view if both looks same.
>> I would prefer the second to look similar to the first one.
>
>
> OK. Corrected.
>
> Updated patch attached.
>

I am happy with this patch now.  Attached is the new version of the
patch, where I have slightly modified the commit message; see if that
looks fine to you.  I will push this patch tomorrow morning (after
going through it again) unless someone has any objections or sees any
other issues.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


0001-Extend-pg_stat_statements_reset-to-reset-statistics_v17.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-09 Thread Amit Kapila
On Wed, Jan 9, 2019 at 9:00 PM John Naylor  wrote:
>
> On Wed, Jan 9, 2019 at 7:33 AM Mithun Cy  wrote:
> > 2. v11-Every-other-page and v11-last-page patches improve the
> > performance from base.
> > 3. IMHO v11-Every-other-page would be ideal to consider it improves
> > the performance and also to an extent avoid expansion if space is
> > already available.
>

Thanks, Mithun, for the performance testing; it really helps us choose
the right strategy here.  Once John provides the next version, it would be
good to see the results of regular pgbench (read-write) runs (say at
50 and 300 scale factors) and the results of a large COPY.  I don't think
there will be any problem, but we should just double-check that.

> Good to hear. I'll clean up the every-other-page patch and include it
> in my next version.
>

+1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Displaying and dumping of table access methods

2019-01-09 Thread Amit Kapila
On Wed, Jan 9, 2019 at 10:53 PM Andres Freund  wrote:
>
> Hi,
>
> On 2019-01-09 18:26:16 +0530, Amit Kapila wrote:
> > On Tue, Jan 8, 2019 at 11:04 PM Andres Freund  wrote:
> > +1.
> >
> > >  Don't like it
> > > too much, but it seems better than the alternative. I wonder if we want
> > > one for multiple regression related issues, or whether one specifically
> > > about table AMs is more appropriate. I lean towards the latter.
> > >
> >
> > I didn't understand what is the earlier part "I wonder if we want one
> > for multiple regression related issues".  What do you mean by multiple
> > regression related issues?
>
> One flag that covers all things that make psql output less useful for
> regression test output, or one flag that just controls the table access
> method display.
>

+1 for the latter (one flag that just controls the table access method
display), as that looks clean.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unified logging system for command-line programs

2019-01-09 Thread Donald Dong
I think this patch is a nice improvement!

On Jan 3, 2019, at 2:08 PM, Andres Freund  wrote:
> Or we could just add an ERROR variant that doesn't exit. Years back
> I'd proposed that we make the log level a bitmask, but it could also
> just be something like CALLSITE_ERROR or something roughly along those
> lines.  There's a few cases in backend code where that'd be beneficial
> too.

I think the logging system can also be applied to pg_regress. Perhaps even
to external frontend applications?

The patch cannot be applied directly to HEAD, so I patched it on top of 
60d99797bf. When I call pg_log_error() in initdb, I see

Program received signal SIGSEGV, Segmentation fault.
__strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
62  ../sysdeps/x86_64/multiarch/strlen-avx2.S: No such file or directory.
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:62
#1  0x55568f96 in dopr.constprop ()
#2  0x55569ddb in pg_vsnprintf ()
#3  0x55564236 in pg_log_generic ()
#4  0xc240 in main ()

I'm not sure what would be causing this behavior. I would appreciate
references or docs for testing and debugging patches more efficiently.
Right now I'm having difficulty loading initdb's symbols in gdb.

Thank you,
Donald Dong



Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-09 Thread Michael Paquier
On Tue, Jan 08, 2019 at 04:42:35PM +0900, Michael Paquier wrote:
> I can see your point, though I would stick with
> ERRCODE_WRONG_OBJECT_TYPE for consistency with the existing code and
> because the code is intended not to work on anything other than plain
> tables, at least for now.

Attached are my suggestions shaped as a patch.  Thoughts?
--
Michael
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index e9c1beb1b7..bb1ab57595 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -609,11 +609,29 @@ CheckSubscriptionRelkind(char relkind, const char *nspname,
 		 const char *relname)
 {
 	/*
-	 * We currently only support writing to regular tables.
+	 * We currently only support writing to regular tables.  However, give
+	 * a more specific error for partitioned and foreign tables.
 	 */
+	if (relkind == RELKIND_PARTITIONED_TABLE)
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot use relation \"%s.%s\" as logical replication target",
+		nspname, relname),
+ errdetail("\"%s.%s\" is a partitioned table.",
+		nspname, relname)));
+	else if (relkind == RELKIND_FOREIGN_TABLE)
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot use relation \"%s.%s\" as logical replication",
+		nspname, relname),
+ errdetail("\"%s.%s\" is a foreign table.",
+		nspname, relname)));
+
 	if (relkind != RELKIND_RELATION)
 		ereport(ERROR,
 (errcode(ERRCODE_WRONG_OBJECT_TYPE),
- errmsg("logical replication target relation \"%s.%s\" is not a table",
+ errmsg("cannot use relation \"%s.%s\" as logical replication target",
+		nspname, relname),
+ errdetail("\"%s.%s\" is not a table.",
 		nspname, relname)));
 }




Re: New vacuum option to do only freezing

2019-01-09 Thread Masahiko Sawada
On Thu, Jan 10, 2019 at 5:23 AM Bossart, Nathan  wrote:
>
> Hi,
>
> On 1/8/19, 7:03 PM, "Masahiko Sawada"  wrote:
> > Attached the updated version patch incorporated all comments I got.
>
> Thanks for the new patch!
>
> > * Instead of freezing xmax I've changed the behaviour of the new
> > option (DISABLE_INDEX_CLEANUP) so that it sets dead tuples as dead
> > instead of as unused and skips both index vacuum and index cleanup.
> > That is, we remove the storage of dead tuple but leave dead itemids
> > for the index lookups. These are removed by the next vacuum execution
> > enabling index cleanup. Currently HOT-pruning doesn't set the root of
> > the chain as unused even if the whole chain is dead. Since setting
> > tuples as dead removes storage the freezing xmax is no longer
> > required.
>
> I tested this option with a variety of cases (HOT, indexes, etc.), and
> it seems to work well.  I haven't looked too deeply into the
> implications of using LP_DEAD this way, but it seems like a reasonable
> approach at first glance.

Thank you for reviewing the patch!

>
> +
> +DISABLE_INDEX_CLEANUP
> +
> + 
> +  VACUUM removes dead tuples and prunes HOT-updated
> +  tuples chain for live tuples on table. If the table has any dead tuple
> +  it removes them from both table and indexes for re-use. With this
> +  option VACUUM marks tuples as dead (i.e., it doesn't
> +  remove tuple storage) and disables removing dead tuples from indexes.
> +  This is suitable for avoiding transaction ID wraparound but not
> +  sufficient for avoiding index bloat. This option cannot be used in
> +  conjunction with FULL option.
> + 
> +
> +   
>
> I think we should clarify the expected behavior when this option is
> used on a table with no indexes.  We probably do not want to fail, as
> this could disrupt VACUUM commands that touch several tables.  Also,
> we don't need to set tuples as dead instead of unused, which appears
> to be what this patch is actually doing:
>
> +   if (nindexes > 0 && disable_index_cleanup)
> +   lazy_set_tuples_dead(onerel, blkno, buf, 
> vacrelstats);
> +   else
> +   lazy_vacuum_page(onerel, blkno, buf, 0, 
> vacrelstats, &vmbuffer);
>
> In this case, perhaps we should generate a log statement that notes
> that DISABLE_INDEX_CLEANUP is being ignored and set
> disable_index_cleanup to false during processing.

Agreed. How about outputting a NOTICE message before calling
lazy_scan_heap() in lazy_vacuum_rel()?
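
Something along these lines, perhaps (sketch only; whether it lives in
lazy_vacuum_rel() or at the top of lazy_scan_heap(), where nindexes is
available, is an open choice, and the variable names are just my reading
of the patch):

    /* Sketch: ignore the option when there is nothing to skip. */
    if (disable_index_cleanup && nindexes == 0)
    {
        ereport(NOTICE,
                (errmsg("DISABLE_INDEX_CLEANUP is ignored for \"%s\" because the table has no indexes",
                        RelationGetRelationName(onerel))));
        disable_index_cleanup = false;
    }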

>
> +   if (disable_index_cleanup)
> +   ereport(elevel,
> +   (errmsg("\"%s\": marked %.0f row 
> versions in %u pages as dead",
> +   
> RelationGetRelationName(onerel),
> +   tups_vacuumed, 
> vacuumed_pages)));
> +   else
> +   ereport(elevel,
> +   (errmsg("\"%s\": removed %.0f row 
> versions in %u pages",
> +   
> RelationGetRelationName(onerel),
> +   tups_vacuumed, 
> vacuumed_pages)));
>
> Should the first log statement only be generated when there are also
> indexes?

You're right. Will fix.

>
> +static void
> +lazy_set_tuples_dead(Relation onerel, BlockNumber blkno, Buffer buffer,
> +   LVRelStats *vacrelstats)
>
> This function looks very similar to lazy_vacuum_page().  Perhaps the
> two could be combined?
>

Yes, I intentionally separated them as I was concerned that these
functions have different assumptions and usage. But combining them
could also work. Will change it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: Undo logs

2019-01-09 Thread Dilip Kumar
On Wed, Jan 9, 2019 at 11:40 AM Dilip Kumar  wrote:

> On Tue, Jan 8, 2019 at 2:11 PM Amit Kapila 
> wrote:
>
>>
>>
>> 3.
>> + work_txn.urec_next = uur->uur_next;
>> + work_txn.urec_xidepoch = uur->uur_xidepoch;
>> + work_txn.urec_progress = uur->uur_progress;
>> + work_txn.urec_prevurp = uur->uur_prevurp;
>> + work_txn.urec_dbid = uur->uur_dbid;
>>
>> It would be better if we initialize these members in the order in
>> which they appear in the actual structure.  All other undo header
>> structures are initialized that way, so this looks out-of-place.
>
>
One more change in ReadUndoByte along the same lines.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


0003-Provide-interfaces-to-store-and-fetch-undo-records_v16.patch
Description: Binary data


Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-09 Thread Etsuro Fujita

Amit-san,

(2019/01/10 10:41), Amit Langote wrote:

On 2019/01/09 20:20, Etsuro Fujita wrote:

I like your patch in general.  I think one way to address Ashutosh's
concerns would be to use the consider_partitionwise_join flag: originally,
that was introduced for partitioned relations to show that they can be
partitionwise-joined, but I think that flag could also be used for
non-partitioned relations to show that they have been set up properly for
partitionwise-joins, and I think by checking that flag we could avoid
creating those copies for child dummy rels in try_partitionwise_join.


Ah, that's an interesting idea.

If I understand the original design of it correctly,
consider_partitionwise_join being true for a given relation (simple or
join) means that its RelOptInfo contains properties to consider it to be
joined with another relation (simple or join) using partitionwise join
mechanism.  Partitionwise join will occur between the pair if the other
relation also has relevant properties (hence its
consider_partitionwise_join set to true) and properties on the two sides
match.


Actually, the flag being true just means that the tlist for a given 
partitioned relation (simple or join) doesn't contain any whole-row 
Vars.  And if two given partitioned relations that have the flag set to 
true have the additional properties needed to be joined using the PWJ 
technique, then we try to do PWJ for those partitioned relations.  (Is 
the name of the flag not good?  If so, that would be my fault because I 
named that flag.)



That's a loaded meaning and abusing it to mean something else can be
challenged, but we can live with that if properly documented.  Speaking of
which:

    /* used by partitionwise joins: */
    bool        consider_partitionwise_join;    /* consider partitionwise
                                                 * join paths? (if
                                                 * partitioned rel) */

Maybe, mention here how it will be abused in back-branches for
non-partitioned relations?


Will do.


Please find attached an updated version of the patch.  I modified your
version so that building tlists for child dummy rels are also postponed
until after they actually participate in partitionwise-joins, to avoid
that possibly-useless work as well.  I haven't done any performance tests
yet though.


Thanks for updating the patch.  I tested your patch (test setup described
below) and it has almost the same performance as my previous version:
552ms (vs. 41159ms on HEAD vs. 253ms on PG 10) for the query also
mentioned below.


Thanks for that testing!

I also tested the patch with your script:

253.559 ms (vs. 85776.515 ms on HEAD vs. 206.066 ms on PG 10)

Best regards,
Etsuro Fujita




Re: Fast path for empty relids in check_outerjoin_delay()

2019-01-09 Thread Richard Guo
On Tue, Jan 8, 2019 at 11:29 PM Tom Lane  wrote:

> Richard Guo  writes:
> > On Fri, Jan 4, 2019 at 10:32 PM Peter Eisentraut <
> > peter.eisentr...@2ndquadrant.com> wrote:
> >> I think code readability and maintainability would be improved by having
> >> fewer special cases and fast paths.  In this particular case, I'd even
> >> remove the existing fast path and just let the subsequent loop run zero
> >> times.  I doubt there is a performance benefit to the existing coding.
> >> And if there is no performance benefit, then it's not really a fast
> >> path, just another path.
>
> > This code was added in commit d7a6a0, by Tom.
> > Hi Tom, what is your opinion about the fast path in
> > check_outerjoin_delay()?
>
> I quite reject Peter's argument that the existing fast path is not
> worthwhile.  It's a cheap test and it saves cycles whenever the query has
> no outer joins, which is common for simple queries.  (The general rule
> I've followed for planner fast-path cases is to make simple queries as
> cheap as possible; you don't want the planner expending a whole lot of
> overhead on trivial cases.)  Moreover, while it's true that the loop
> will fall through quickly if root->join_info_list is nil, there's still
> a bms_copy, bms_int_members, and bms_free that will be expended uselessly
> if we remove the fast-path check.
>
> I'm not, however, convinced that the case of *relids_p being empty is
> common enough to justify adding a test for that.  In fact, it looks
> to me like it's guaranteed to be nonempty for the main call sites in
> distribute_qual_to_rels.
>
> Possibly what would make more sense is to add the consideration to
> check_equivalence_delay, which is the only call site that can pass
> an empty set; that is
>
> +   /* A variable-free expression cannot be outerjoin-delayed */
> +   if (restrictinfo->left_relids)
> +   {
> /* must copy restrictinfo's relids to avoid changing it */
> relids = bms_copy(restrictinfo->left_relids);
> /* check left side does not need delay */
> if (check_outerjoin_delay(root, &relids, &nullable_relids, true))
> return false;
> +   }
>
> and likewise for the right side.
>
> The bigger picture here, of course, is that check_outerjoin_delay's
> API is not at all well matched to what check_equivalence_delay needs:
> it has to make the extra bms_copy shown above, and it has no use
> for either of the relid sets that check_outerjoin_delay wants to
> return.  I believe it's like that because check_outerjoin_delay is
> much older than the EquivalenceClass logic, and I didn't want to
> redesign it when I put in ECs, and I also didn't want to just
> duplicate all that logic.  But maybe it's time to put some more thought
> into that, and try to find a refactoring of check_outerjoin_delay that
> suits both callers better.
>

I am thinking about whether we can quickly tell whether a qual is
outerjoin_delayed or not with some bms_overlap operations.

A qual is regarded as being outerjoin_delayed if:

1) It references any nullable rels of some OJ below this qual, and
2) We haven't included all the rels of that OJ in relids.

For 1), we can collect all the nullable rels of OJs below this qual in
deconstruct_recurse() (like nullable_baserels, initsplan.c:976) and
bms_overlap this qual's relids with the collected nullable_relids.
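
As a sketch of 1) (names invented; nullable_oj_relids would be the set of
rels nullable by some lower outer join, collected during
deconstruct_recurse() much like nullable_baserels is today):

    /* Check 1): does the qual reference any rel nullable by a lower OJ? */
    if (!bms_overlap(qual_relids, nullable_oj_relids))
        return false;   /* cannot be outerjoin-delayed */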

For 2), I haven't figured out a good way.

Is this a viable direction? Any thoughts?


>
> Anyway, I concur with Peter that the patch you've presented should
> probably be rejected: I doubt it's a win performance-wise and I don't
> see that it adds anything to readability either.  But if you want to
> think about a more effective refactoring of this logic, maybe there
> is something good to be had out of that.
>

Thanks Tom. I understand your concern.

Thanks
Richard


RE: speeding up planning with partitions

2019-01-09 Thread Imai, Yoshikazu
Hi Amit,

On Mon, Jan 7, 2019 at 6:30 PM, Amit Langote wrote:
> On 2018/12/27 20:25, Amit Langote wrote:
> > Here's the v10 of the patch.  I didn't get chance to do more changes
> > than those described above and address Imai-san's review comments
> yesterday.
> >
> > Have a wonderful new year!  I'll be back on January 7 to reply on this
> thread.
> 
> Rebased due to changed copyright year in prepunion.c.  Also realized that
> the new files added by patch 0004 still had 2018 in them.

Thank you for the new patches.


I also have some comments on 0001, set_inherit_target_rel_sizes().

In set_inherit_target_rel_sizes():

Some code is not placed in the same order as in set_append_rel_size().

0001: at line 325-326,
+   ListCell   *l;
+   boolhas_live_children;

In set_append_rel_size(), "has_live_children" is above of the "ListCell *l";

0001: at line 582-603
+   if (IS_DUMMY_REL(childrel))
+   continue;
+
...
+   Assert(childrel->rows > 0);
+
+   /* We have at least one live child. */
+   has_live_children = true;

In set_append_rel_size(), 
+   /* We have at least one live child. */
+   has_live_children = true;
is directly under
+   if (IS_DUMMY_REL(childrel))
+   continue;

0001: at line 606-622,
+   if (!has_live_children)
+   {
+   /*
+* All children were excluded by constraints, so mark the relation
+* as dummy.  We must do this in this phase so that the rel's
+* dummy-ness is visible when we generate paths for other rels.
+*/
+   set_dummy_rel_pathlist(rel);
+   }
+   else
+   /*
+* Set a non-zero value here to cope with the caller's requirement
+* that non-dummy relations are actually not empty.  We don't try to
+* be accurate here, because we're not going to create a path that
+* combines the children outputs.
+*/
+   rel->rows = 1;

In set_append_rel_size(), the condition of the if clause is
has_live_children, not !has_live_children.

I also noticed that there is no block (braces) for the else, while there is one for the if.

--
Yoshikazu Imai



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2019-01-09 Thread Haribabu Kommi
On Thu, Jan 10, 2019 at 2:32 PM Amit Kapila  wrote:

> On Wed, Jan 9, 2019 at 10:26 AM Haribabu Kommi 
> wrote:
> >
> >
> > On Wed, Jan 9, 2019 at 2:25 PM Amit Kapila 
> wrote:
> >>
> >> Another minor point is that in the second statement
> >> (pg_stat_statements_reset), you seem to have made it a two-line
> >> statement whereas first one looks to be a single-line statement, it
> >> would be good from the readability point of view if both looks same.
> >> I would prefer the second to look similar to the first one.
> >
> >
> > OK. Corrected.
> >
> > Updated patch attached.
> >
>
> I am happy with this patch now, attached is the new version of the
> patch where I have slightly modified the commit message, see if that
> looks fine to you.  I will push this patch tomorrow morning (after
> again going through it) unless someone has any objections or see any
> other issues.
>

Thanks for the changes. LGTM.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: insensitive collations

2019-01-09 Thread Peter Eisentraut
On 09/01/2019 19:49, Andreas Karlsson wrote:
> On 12/28/18 9:55 AM, Peter Eisentraut wrote:
>> Here is an updated patch.
>>
>> I have updated the naming to "deterministic", as discussed.
> 
> Maybe this is orthogonal and best handled elsewhere, but have you given 
> any thought to Unicode normalization forms[1] when working with string 
> equality?

Nondeterministic collations do address this by allowing canonically
equivalent code point sequences to compare as equal.  You still need a
collation implementation that actually does compare them as equal; ICU
does this, glibc does not AFAICT.

> Would there be any point in adding unicode normalization support into 
> the collation system or is this best handled, for example, with a function 
> run on INSERT or with something else entirely?

I think there might be value in a feature that normalizes strings as
they enter the database, as a component of the encoding conversion
infrastructure.  But that would be a separate feature.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: insensitive collations

2019-01-09 Thread Peter Eisentraut
On 09/01/2019 22:01, Daniel Verite wrote:
>> I don't see anything wrong here.  The collation says that both values
>> are equal, so which one is returned is implementation-dependent.
> It is, but it's impractical if the product of seemingly the same GROUP BY
> flip-flops between its different valid results. If it can't be avoided, then
> okay. If it can be avoided at little cost, then it would be better to do it.

But there is no concept of which one of these is the preferred variant,
so I don't see how the system is supposed to pick one and then stick to
it across separate query invocations.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services