Re: [HACKERS] dubious error message from partition.c

2017-08-08 Thread Amit Langote
On 2017/08/09 13:03, David G. Johnston wrote:
> On Tue, Aug 8, 2017 at 8:34 PM, Tom Lane  wrote:
> 
>> A small suggestion is that it'd be better to write it like "Specified
>> upper bound \"%s\" precedes lower bound \"%s\"."  I think "succeeds" has
>> more alternate meanings than "precedes", so the wording you have seems
>> more confusing than it needs to be.  (Of course, the situation could be
>> the opposite in other languages, but translators have the ability to
>> reverse the ordering if they need to.)
>>
>> Or you could just go with "follows" instead of "succeeds".
>>
> 
> ​"exceeds" seems to be the word the original sentence was looking for.  If
> using a single word I kinda like reversing the direction and using
> "precedes"​ though.  "follows" makes it sound like a puppy :)
> 
> "is greater than" is a bit more verbose but an option as well - one that
> seems more natural in this space - and yes I've reverted back to
> lower->upper with this wording.  I think the hard "x" in exceeds is what
> turned me off to it.

I went with Tom's suggested "Specified upper bound %s precedes lower
bound %s." in the attached patch.

That said, we could also consider a message along the lines suggested by
David: "Specified upper bound is less than (less than or equal to) lower
bound." or "Specified lower bound is greater than (greater than or equal
to) upper bound.", because the "precedes" version looks a bit odd in cases
like the following:

 CREATE TABLE fail_part PARTITION OF range_parted2 FOR VALUES FROM (1) TO (1);
-ERROR:  cannot create range partition with empty range
+ERROR:  invalid range bound specification for partition "fail_part"
+DETAIL:  Specified upper bound (1) precedes lower bound (1).

In any case, for someone to make sense of why that leads to an empty
range, they have to remember the rule that the upper bound is an exclusive
bound.
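
To make that rule concrete, here is a minimal sketch (the table and
partition names are mine, purely for illustration): because the upper
bound is exclusive, FROM (1) TO (1) admits no values at all, while
FROM (1) TO (2) admits exactly the value 1.

create table r (a int) partition by range (a);
create table r1 partition of r for values from (1) to (2);
insert into r values (1);  -- ok: 1 satisfies a >= 1 and a < 2
insert into r values (2);  -- fails: 2 equals the exclusive upper bound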

Thanks,
Amit
From 6f2266376c765207e2cc458bb890c671e0bc364a Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 9 Aug 2017 14:36:57 +0900
Subject: [PATCH] Fix error message when apparently empty range bound is
 specified

---
 src/backend/catalog/partition.c            | 10 +++-
 src/backend/utils/adt/ruleutils.c          | 84 +++---
 src/include/utils/ruleutils.h              |  1 +
 src/test/regress/expected/create_table.out |  6 ++-
 4 files changed, 56 insertions(+), 45 deletions(-)

diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index dcc7f8af27..01ba4a6172 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -722,10 +722,16 @@ check_new_partition_bound(char *relname, Relation parent,
 				 */
 				if (partition_rbound_cmp(key, lower->datums, lower->kind, true,
 										 upper) >= 0)
+				{
 					ereport(ERROR,
 							(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
-							 errmsg("cannot create range partition with empty range"),
-							 parser_errposition(pstate, spec->location)));
+							 errmsg("invalid range bound specification for partition \"%s\"",
+									relname),
+							 errdetail("Specified upper bound %s precedes lower bound %s.",
+									   get_range_partbound_string(spec->upperdatums),
+									   get_range_partbound_string(spec->lowerdatums)),
+							 parser_errposition(pstate, spec->location)));
+				}
 
 				if (partdesc->nparts > 0)
 				{
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index d83377d1d8..0faa0204ce 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -8722,47 +8722,9 @@ get_rule_expr(Node *node, deparse_context *context,
 							   list_length(spec->lowerdatums) ==
 							   list_length(spec->upperdatums));
 
-						appendStringInfoString(buf, "FOR VALUES FROM (");
-						sep = "";
-						foreach(cell, spec->lowerdatums)
-						{
-							PartitionRangeDatum *datum =
-

Re: [HACKERS] Error : undefined symbol : LWLockAssign in 9.6.3

2017-08-08 Thread Konstantin Knizhnik

On 08/09/2017 07:07 AM, 송기훈 wrote:

Hi.
I'm trying to use the imcs module with 9.6 and got this error message. The
LWLockAssign function has been removed in 9.6, so I can't use this module
anymore as of 9.6.

What I want to ask is: has your team decided not to support the imcs module
anymore, or is it simply not a concern? And are there any ways to run
PostgreSQL in memory only?


Hi,
I am the author of the IMCS module and I maintain it.
Please contact me directly.
I have committed a patch to the https://github.com/knizhnik/imcs.git
repository which allows IMCS to be used with Postgres 9.6.3 and later.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] parallelize queries containing initplans

2017-08-08 Thread Haribabu Kommi
On Mon, Jul 17, 2017 at 10:53 PM, Amit Kapila 
wrote:

> On Tue, Mar 28, 2017 at 7:25 AM, Amit Kapila 
> wrote:
> > On Thu, Mar 16, 2017 at 2:34 AM, Kuntal Ghosh
> >  wrote:
> >> On Tue, Mar 14, 2017 at 3:20 PM, Amit Kapila 
> wrote:
> >>> Based on that idea, I have modified the patch such that it will
> >>> compute the set of initplans Params that are required below gather
> >>> node and store them as bitmap of initplan params at gather node.
> >>> During set_plan_references, we can find the intersection of external
> >>> parameters that are required at Gather or nodes below it with the
> >>> initplans that are passed from same or above query level. Once the set
> >>> of initplan params are established, we evaluate those (if they are not
> >>> already evaluated) before execution of gather node and then pass the
> >>> computed value to each of the workers.   To identify whether a
> >>> particular param is parallel safe or not, we check if the paramid of
> >>> the param exists in initplans at same or above query level.  We don't
> >>> allow to generate gather path if there are initplans at some query
> >>> level below the current query level as those plans could be
> >>> parallel-unsafe or undirect correlated plans.
> >>
> >> I would like to mention different test scenarios with InitPlans that
> >> we've considered while developing and testing of the patch.
> >>
> >
> > Thanks a lot Kuntal for sharing different test scenarios.
> > Unfortunately, this patch doesn't received any review till now, so
> > there is no chance of making it in to PostgreSQL-10.  I have moved
> > this to next CF.
> >
>
> Attached is a rebased version of the patch with below changes:
> a. SubplanState now directly stores Subplan rather than ExprState, so
> patch needs some adjustment in that regard.
> b. While rejecting the paths (based on if there are initplans at level
> below the current query level) for parallelism, the rejected paths
> were not marked as parallel unsafe.  Due to this in
> force_parallel_mode=regress, we were able to add gather node above
> parallel unsafe paths.  The modified patch ensures to mark such paths
> as parallel unsafe.
> c. Added regression test.
> d. Improve comments in the code.
>
>
I tested the latest patch, and the parallel plan is getting chosen for most
of the init plans.

For the following query, the parallel plan is not chosen. The query contains
an init plan that refers to the outer node.

postgres=# explain analyze select * from t1 where t1.i in (select t2.i from
t2 where t1.k = (select max(k) from t3 where t3.i=t1.i));
 QUERY PLAN

-
 Seq Scan on t1  (cost=0.00..22426.28 rows=448 width=12) (actual
time=8.335..132.557 rows=2 loops=1)
   Filter: (SubPlan 2)
   Rows Removed by Filter: 894
   SubPlan 2
 ->  Result  (cost=16.27..31.26 rows=999 width=4) (actual
time=0.146..0.146 rows=0 loops=896)
   One-Time Filter: (t1.k = $1)
   InitPlan 1 (returns $1)
 ->  Aggregate  (cost=16.25..16.27 rows=1 width=4) (actual
time=0.145..0.145 rows=1 loops=896)
   ->  Seq Scan on t3  (cost=0.00..16.25 rows=2 width=4)
(actual time=0.131..0.144 rows=0 loops=896)
 Filter: (i = t1.i)
 Rows Removed by Filter: 900
   ->  Seq Scan on t2  (cost=16.27..31.26 rows=999 width=4) (actual
time=0.012..0.013 rows=10 loops=2)
 Planning time: 0.272 ms
 Execution time: 132.623 ms
(14 rows)

If I change the query a little bit, the Result node doesn't appear and the
parallel plan gets chosen.

postgres=# explain analyze select * from t1 where t1.i in (select t2.i from
t2 where t2.k = (select max(k) from t3 where t3.i=t1.i));
   QUERY PLAN

-
 Seq Scan on t1  (cost=0.00..19162.88 rows=448 width=12) (actual
time=3501.483..3501.483 rows=0 loops=1)
   Filter: (SubPlan 2)
   Rows Removed by Filter: 896
   SubPlan 2
 ->  Gather  (cost=16.27..26.47 rows=2 width=4) (actual
time=3.471..3.795 rows=0 loops=896)
   Workers Planned: 2
   Params Evaluated: $1
   Workers Launched: 2
   InitPlan 1 (returns $1)
 ->  Aggregate  (cost=16.25..16.27 rows=1 width=4) (actual
time=0.161..0.161 rows=1 loops=896)
   ->  Seq Scan on t3  (cost=0.00..16.25 rows=2 width=4)
(actual time=0.144..0.156 rows=0 loops=896)
 Filter: (i = t1.i)
 Rows Removed by Filter: 900
   ->  Parallel Seq Scan on t2  (cost=0.00..10.20 rows=1 width=4)
(actual time=0.001..0.001 rows=0 loops=804608)
 Filter: (k = $1)

Re: [HACKERS] Error : undefined symbol : LWLockAssign in 9.6.3

2017-08-08 Thread Andres Freund
Hi,

On 2017-08-09 13:07:53 +0900, 송기훈 wrote:
> Hi.
> I'm trying to use the imcs module with 9.6 and got this error message. The
> LWLockAssign function has been removed in 9.6, so I can't use this module
> anymore as of 9.6.
> 
> What I want to ask is: has your team decided not to support the imcs
> module anymore, or is it simply not a concern? And are there any ways to
> run PostgreSQL in memory only?

imcs was never supported by "the postgres team" - it was made by an
individual contributor "outside" of postgres. I've CCed him, maybe he
can comment on his plans around imcs.

Greetings,

Andres Freund




Re: [HACKERS] "make check" with non-GNU make

2017-08-08 Thread Thomas Munro
On Wed, Aug 9, 2017 at 3:44 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> Does anyone know why "make check" doesn't work on BSD systems if
>> tmp_install doesn't exist yet?  It's no big deal, you just have to run
>> "gmake check", but Makefile is supposed to do that for you and it
>> works fine for every other target.  No big deal, but it'd be nice to
>> unravel this mystery...
>
>> Specifically, if you run "make check" then it invokes
>> "/usr/local/bin/gmake check" for you, but it seem to skip the step
>> that builds tmp_install and so then pg_regress fails.
>
> Hmm, looking into Makefile.global.in, that step seems to be conditional on
> MAKELEVEL:
>
> temp-install:
> ifndef NO_TEMP_INSTALL
> ifneq ($(abs_top_builddir),)
> ifeq ($(MAKELEVEL),0)

Ah, right.  That coding is recommended in the GNU make manual to
distinguish explicit invocation from recursive invocation.
FreeBSD make also seems to set MAKELEVEL.  Doing this fixes the
problem for me, though it feels a bit sneaky:

diff --git a/Makefile b/Makefile
index 72e9c83733..399e69540f 100644
--- a/Makefile
+++ b/Makefile
@@ -29,7 +29,7 @@ all check install installdirs installcheck
installcheck-parallel uninstall clean
\
 if [ x"$${GMAKE+set}" = xset ]; then \
   echo "Using GNU make found at $${GMAKE}"; \
-  $${GMAKE} $@ ; \
+  MAKELEVEL= $${GMAKE} $@ ; \
 else \
   echo "You must use GNU make to build PostgreSQL." ; \
   false; \

-- 
Thomas Munro
http://www.enterprisedb.com




[HACKERS] Error : undefined symbol : LWLockAssign in 9.6.3

2017-08-08 Thread 송기훈
Hi.
I'm trying to use the imcs module with 9.6 and got this error message. The
LWLockAssign function has been removed in 9.6, so I can't use this module
anymore as of 9.6.

What I want to ask is: has your team decided not to support the imcs module
anymore, or is it simply not a concern? And are there any ways to run
PostgreSQL in memory only?


Re: [HACKERS] dubious error message from partition.c

2017-08-08 Thread David G. Johnston
On Tue, Aug 8, 2017 at 8:34 PM, Tom Lane  wrote:

> A small suggestion is that it'd be better to write it like "Specified
> upper bound \"%s\" precedes lower bound \"%s\"."  I think "succeeds" has
> more alternate meanings than "precedes", so the wording you have seems
> more confusing than it needs to be.  (Of course, the situation could be
> the opposite in other languages, but translators have the ability to
> reverse the ordering if they need to.)
>
> Or you could just go with "follows" instead of "succeeds".
>

​"exceeds" seems to be the word the original sentence was looking for.  If
using a single word I kinda like reversing the direction and using
"precedes"​ though.  "follows" makes it sound like a puppy :)

"is greater than" is a bit more verbose but an option as well - one that
seems more natural in this space - and yes I've reverted back to
lower->upper with this wording.  I think the hard "x" in exceeds is what
turned me off to it.

David J.


Re: [HACKERS] "make check" with non-GNU make

2017-08-08 Thread Tom Lane
Thomas Munro  writes:
> Does anyone know why "make check" doesn't work on BSD systems if
> tmp_install doesn't exist yet?  It's no big deal, you just have to run
> "gmake check", but Makefile is supposed to do that for you and it
> works fine for every other target.  No big deal, but it'd be nice to
> unravel this mystery...

> Specifically, if you run "make check" then it invokes
> "/usr/local/bin/gmake check" for you, but it seem to skip the step
> that builds tmp_install and so then pg_regress fails.

Hmm, looking into Makefile.global.in, that step seems to be conditional on
MAKELEVEL:

temp-install:
ifndef NO_TEMP_INSTALL
ifneq ($(abs_top_builddir),)
ifeq ($(MAKELEVEL),0)
	rm -rf '$(abs_top_builddir)'/tmp_install
	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
endif
	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install >>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done)
endif
endif

I'm not real clear on how make invoking gmake would end up affecting
gmake's initial value of MAKELEVEL, but I bet the problem is somewhere
around there.

regards, tom lane




Re: [HACKERS] dubious error message from partition.c

2017-08-08 Thread Tom Lane
Amit Langote  writes:
> On 2017/08/09 3:50, Robert Haas wrote:
>> In retrospect, I'm not thrilled by this error message, for two reasons:
>> 1. It gives no details, as other nearby messages do.  For example,
>> further down in the function, we have a message "partition \"%s\"
>> would overlap partition \"%s\"", which tells you the names of the old
>> and new partitions.  But this message has no %-escapes at all.
>> ...
>> So, I suggest something like:
>> "lower bound %s for partition \"%s\" must precede upper bound %s"

> Or, we could specify extra information in the detail part in a way that is
> perhaps less confusing:

> ERROR: invalid range bound specification for partition \"%s\"
> DETAIL: specified lower bound %s succeeds upper bound %s

+1 for doing it more or less like that.  One of our basic message style
guidelines is that primary error texts shouldn't be very long, and it'd be
easy to break that rule if we embed data values in it.

A small suggestion is that it'd be better to write it like "Specified
upper bound \"%s\" precedes lower bound \"%s\"."  I think "succeeds" has
more alternate meanings than "precedes", so the wording you have seems
more confusing than it needs to be.  (Of course, the situation could be
the opposite in other languages, but translators have the ability to
reverse the ordering if they need to.)

Or you could just go with "follows" instead of "succeeds".

regards, tom lane




Re: [HACKERS] Possible issue with expanded object infrastructure on Postgres 9.6.1

2017-08-08 Thread Tom Lane
Peter Geoghegan  writes:
> On Thu, Jan 19, 2017 at 5:45 PM, Peter Geoghegan  wrote:
>> A customer is on 9.6.1, and complains of a segfault observed at least
>> 3 times.
> ...
> For the sake of the archives: this now looks very much like the issue
> that Tom just fixed with commit
> 9bf4068cc321a4d44ac54089ab651a49d89bb567.

Yeah, particularly seeing that $customer noted that some of the
columns involved were UUIDs:

https://www.postgresql.org/message-id/caoxz3fqk9y0gntl8mdoezjy2ot_6lx1yvhay6bd1vykup-i...@mail.gmail.com

Good to have gotten to the bottom of that one.  Too bad it just
missed the train for 9.6.4.

regards, tom lane




Re: [HACKERS] Default Partition for Range

2017-08-08 Thread Beena Emerson
On Fri, Aug 4, 2017 at 7:48 PM, Robert Haas  wrote:
> On Mon, Jul 31, 2017 at 8:28 AM, Beena Emerson  
> wrote:
>
> Why do we need to introduce PARTITION_RANGE_DATUM_DEFAULT at all?  It
> seems to me that the handling of default range partitions ought to be
> similar to the way a null-accepting list partition is handled -
> namely, it wouldn't show up in the "datums" or "kind" array at all,
> instead just showing up in PartitionBoundInfoData's default_index
> field.
>

I have updated the patch to make it similar to the way default/null is
handled in list partitioning, removing PARTITION_RANGE_DATUM_DEFAULT.
This is to be applied on top of the v24 patches shared by Jeevan [1],
which apply on commit 5ff3d73813ebcc3ff80be77c30b458d728951036.

RelationBuildPartitionDesc has been modified a lot, especially the
way all_bounds, ndatums and rbounds are set.
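
For readers following the thread, here is a sketch of the intended
behavior (syntax as proposed in this patch series, not in any released
version; table names are made up):

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (1) to (10);
create table rp_def partition of rp default;
insert into rp values (100);  -- falls outside every range partition,
                              -- so it is routed to rp_def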

[1] 
https://www.postgresql.org/message-id/CAOgcT0OVwDu%2BbeChWb5R5s6rfKLCiWcZT5617hqu7T3GdA1hAw%40mail.gmail.com

-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


default_range_partition_v9.patch
Description: Binary data



Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-08 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 8, 2017 at 4:34 PM, Tom Lane  wrote:
>> In the meantime, I think my vote would be to remove AtEOXact_CatCache.

> In all supported branches?

Whatever we do about this issue, I don't feel a need to do it further
back than HEAD.  It's a non-problem except in an assert-enabled build,
and we don't recommend running those for production, only development.

regards, tom lane




Re: [HACKERS] Small code improvement for btree

2017-08-08 Thread Masahiko Sawada
On Mon, Aug 7, 2017 at 1:44 PM, Masahiko Sawada  wrote:
> On Sat, Aug 5, 2017 at 3:29 AM, Peter Geoghegan  wrote:
>> On Fri, Aug 4, 2017 at 11:12 AM, Alvaro Herrera
>>  wrote:
>>> Interesting.  We learned elsewhere that it's better to integrate the
>>> "!= 0" test as part of the macro definition; so a
>>> better formulation of this patch would be to change the
>>> P_INCOMPLETE_SPLIT macro and omit the comparison in the Assert.  (See
>>> commit 594e61a1de03 for an example).
>
> Thank you for the information. The macros other than
> P_INCOMPLETE_SPLIT in btree.h, such as P_ISLEAF and P_ISROOT, also don't
> return booleans. Should we deal with them as well?
>
>>>
>>>
 -   LockBuffer(hbuffer, BUFFER_LOCK_SHARE);
 +   LockBuffer(hbuffer, BT_READ);
>>
>> +1.
>>
>> One Linus Torvalds rant that I actually agreed with was a rant against
>> the use of bool as a type in C code. It's fine, as long as you never
>> forget that it's actually just another integer.
>>
>>> I think BT_READ and BT_WRITE are useless, and I'd rather get rid of
>>> them ...
>>
>> Fair enough, but we should either use them consistently or not at all.
>> I'm not especially concerned about which, as long as it's one of those
>> two.
>>
>
> I definitely agree.
>

Attached updated patch. I'll add it to next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


code_improvement_for_btree_v2.patch
Description: Binary data



Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-08 Thread Robert Haas
On Tue, Aug 8, 2017 at 4:34 PM, Tom Lane  wrote:
> In the meantime, I think my vote would be to remove AtEOXact_CatCache.

In all supported branches?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Possible issue with expanded object infrastructure on Postgres 9.6.1

2017-08-08 Thread Peter Geoghegan
On Thu, Jan 19, 2017 at 5:45 PM, Peter Geoghegan  wrote:
> A customer is on 9.6.1, and complains of a segfault observed at least
> 3 times.

> I can use GDB to get details of the instruction pointer that appeared
> in the kernel trap error, which shows a function from the expanded
> value representation infrastructure:
>
> (gdb) info symbol 0x55fcf08b0491
> EOH_get_flat_size + 1 in section .text of /usr/lib/postgresql/9.6/bin/postgres
> (gdb) info symbol 0x55fcf08b0490
> EOH_get_flat_size in section .text of /usr/lib/postgresql/9.6/bin/postgres
> (gdb) disassemble 0x55fcf08b0490
> Dump of assembler code for function EOH_get_flat_size:
>0x55fcf08b0490 <+0>: push   %rbp
>0x55fcf08b0491 <+1>: mov0x8(%rdi),%rax
>0x55fcf08b0495 <+5>: mov%rsp,%rbp
>0x55fcf08b0498 <+8>: pop%rbp
>0x55fcf08b0499 <+9>: mov(%rax),%rax
>0x55fcf08b049c <+12>: jmpq   *%rax
> End of assembler dump.

For the sake of the archives: this now looks very much like the issue
that Tom just fixed with commit
9bf4068cc321a4d44ac54089ab651a49d89bb567.

-- 
Peter Geoghegan




Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b

2017-08-08 Thread Amit Langote
On 2017/08/09 9:03, David G. Johnston wrote:
> On Tue, Aug 8, 2017 at 4:33 PM, Dean Rasheed wrote:
>> Well perhaps verbosity-reduction isn't sufficient justification but I
>> still think this is correct because logically any values in the bound
>> after MINVALUE/MAXVALUE are irrelevant, so it seems overly restrictive
>> to force all later values to be MINVALUE/MAXVALUE when the code will
>> just ignore those values.
>>
>>
> But semantically using unbounded is correct - and referencing the
> principle of "write once, read many", people are probably going to care a
> lot more about the \d display than the original SQL - and \d showing some
> randomly chosen value and mentally doing the gymnastics to think "oh,
> wait, it doesn't actually matter what this value is", compared to it
> showing "unbounded" and it being obvious that "any value" will work,
> seems like a win.
> 
>> I don't think we should allow values after MINVALUE/MAXVALUE to be
>> omitted altogether because we document multi-column ranges as being
>> most easily understood according to the rules of row-wise comparisons,
>> and they require equal numbers of values in each row.
>>
> 
> I wouldn't change the underlying representation, but from a UX perspective
> being able to omit the explicit specification and let the system default
> appropriately would be worth considering - though probably not worth the
> effort.
> 
> The complaint regarding \d could be solved by figuring out on-the-fly
> whether the particular column is presently bounded or not and displaying
> "unbounded" for the later ones regardless of what value is stored in the
> catalog.  "Unbounded (0)" would communicate both aspects.  In the "be
> liberal in what you accept" department that would seem to be a win.

One of the patches posted on the thread where this development occurred
[1] implemented more or less this behavior.  That is, we would let the
users omit inconsequential values in the FROM and TO lists, but internally
store minvalue in the columns for which no value was specified.  \d could
then show those values as stored in the catalog (minvalue).

Consider the following example with the current syntax:

create table rp (a int, b int) partition by range (a, b);
create table rp0 partition of rp for values from (minvalue, minvalue) to
(1, minvalue);

\d+ rp0 shows:

Partition of: rp FOR VALUES FROM (MINVALUE, MINVALUE) TO (1, MINVALUE)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a < 1))

With the aforementioned patch, the same partition could be created as:

create table rp0 partition of rp for values from (minvalue) to (1, minvalue);

or:

create table rp0 partition of rp for values from (minvalue) to (1);

Consider one more partition:

create table rp1 partition of rp for values from (1, minvalue) to (1, 1);

\d+ rp1 shows:

Partition of: rp FOR VALUES FROM (1, MINVALUE) TO (1, 1)
Partition constraint: ((a IS NOT NULL) AND (b IS NOT NULL) AND (a = 1) AND
(b < 1))

The same could be created with the following alternative command:

create table rp1 partition of rp for values from (1) to (1, 1);

Regarding Dean's argument that this syntax is hard to make sense of when
one considers that row-comparison logic is used, which requires an equal
number of columns on both sides: since users are always able to reveal
what the ROW expression looks like internally by describing a partition,
they can see what values are being used in the row comparisons, including
those of the columns for which they didn't specify any value when creating
the table.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/a6d6b752-3d08-ded8-3c8f-5cc9f090ec20%40lab.ntt.co.jp





[HACKERS] "make check" with non-GNU make

2017-08-08 Thread Thomas Munro
Hi hackers,

Does anyone know why "make check" doesn't work on BSD systems if
tmp_install doesn't exist yet?  It's no big deal, you just have to run
"gmake check", but Makefile is supposed to do that for you and it
works fine for every other target.  No big deal, but it'd be nice to
unravel this mystery...

Specifically, if you run "make check" then it invokes
"/usr/local/bin/gmake check" for you, but it seem to skip the step
that builds tmp_install and so then pg_regress fails.  If you run
"/usr/local/bin/gmake check" directly then it works, and then future
invocations of "make check" work after that until you next "make
clean" because tmp_install is already there.  One thing I note is that
gmake knows that it's been invoked recursively by a make (in this case
an alien make), judging by the numbers that it prints in square
brackets, so I assume that some communication via environment
variables is causing this, but I can't explain it.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] dubious error message from partition.c

2017-08-08 Thread Amit Langote
On 2017/08/09 3:50, Robert Haas wrote:
> In the original table partitioning commit
> (f0e44751d7175fa3394da2c8f85e3ceb3cdbfe630), we introduced the
> following code, which hasn't changed materially since then:
> 
> +                if (partition_rbound_cmp(key, lower->datums, lower->content,
> +                                         true, upper) >= 0)
> +                    ereport(ERROR,
> +                            (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> +                             errmsg("cannot create range partition with empty range"),
> +                             parser_errposition(pstate, spec->location)));
> 
> In retrospect, I'm not thrilled by this error message, for two reasons:
> 
> 1. It gives no details, as other nearby messages do.  For example,
> further down in the function, we have a message "partition \"%s\"
> would overlap partition \"%s\", which tells you the names of the old
> and new partitions.  But this message has no %-escapes at all.  It
> tells you neither the name of the partition that you're trying to
> create nor the problematic values.  You might object that you can only
> be creating one partition at a time and should therefore know its name
> but (a) you might be running a script and be uncertain which command
> failed and (b) we've talked repeatedly about introducing convenience
> syntax for creating a bunch of partitions along with the parent table,
> and if we do that at some point, then this will become more of an
> issue.

Yeah, the message can sound confusing.

> 2. The message text is, in my opinion, not as clear as it could be.
> The most logical interpretation of that message, ISTM, would be that
> you've specified a lower bound that is equal to the upper bound - and
> you would indeed get this message in that case.  However, you'd also
> get it if you specified a lower bound that is GREATER than the upper
> bound, and I think that it's possible somebody might get confused.
> For example:
> 
> rhaas=# create table example (a citext) partition by range (a);
> CREATE TABLE
> rhaas=# create table example1 partition of example for values from
> ('Bob') to ('alice');
> ERROR:  cannot create range partition with empty range
> 
> A user who fails to realize what the comparison semantics of citext
> are might scratch their head at this message.  Surely 'Bob' precedes
> 'alice' since 'B' = 66 and 'a' = 97, so what's the matter?  This could
> also come up with plain old text if the partition bounds have a
> different collation than the user is expecting.
> 
> So, I suggest something like:
> 
> "lower bound %s for partition \"%s\" must precede upper bound %s"
> 
> e.g. lower bound (Bob) for partition "example1" must precede upper bound 
> (alice)
> 
> You could still be confused about why Bob comes before alice (sexism?)
> but you'll at least know which partition is the problem and hopefully
> at least have a clearer notion that the problem is that the system's
> idea about how those bounds are ordered differs from your own.

Hmm, maybe no need to put (Bob) and (alice) in the error message text?

"lower bound for partition \"%s\" must precede upper bound"

Or, we could specify extra information in the detail part in a way that is
perhaps less confusing:

ERROR: invalid range bound specification for partition \"%s\"
DETAIL: specified lower bound %s succeeds upper bound %s

Thoughts?

Thanks,
Amit





[HACKERS] Re: [bug fix] PG10: libpq doesn't connect to alternative hosts when some errors occur

2017-08-08 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> Well, I started out believing that the current behavior was for the best,
> and then completely reversed my position and favored the OP's proposal.
> Nothing has really happened since then to change my mind, so I guess I'm
> still in that camp.  But do we have any new data points?  Have any
> beta-testers tested this and what do they think?
> The only non-developer (i.e. person not living in an ivory tower) who has
> weighed in here is Tels, who favored reversing the original decision and
> adopting Tsunakawa-san's position, and that was 2 months ago.
> 
> I am pretty reluctant to tinker with this at this late date and in the face
> of several opposing votes, but I do think that we bet on the wrong horse.

Thank you Robert and Tels.  Yes, Tels's comment sounds plausible, coming from
a representative real user who expects high availability.  I'm sorry to repeat
myself, but this feature is for HA, so libpq should attempt to connect to the
next host when it fails to establish a connection.
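
For context, this is the PG10 multi-host syntax the discussion is about;
the open question is when libpq should move on to the next host in the
list (host names here are illustrative):

  keyword/value form:  host=pghost1,pghost2 port=5432,5432 dbname=postgres
  URI form:            postgresql://pghost1:5432,pghost2:5432/postgres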

Who can settle this?  I don't think the absence of feedback from beta users
means they are satisfied with the current behavior.

Regards
Takayuki Tsunakawa







Re: [HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b

2017-08-08 Thread David G. Johnston
On Tue, Aug 8, 2017 at 4:33 PM, Dean Rasheed 
wrote:

> On 8 August 2017 at 19:22, Robert Haas  wrote:
> > On Fri, Jul 21, 2017 at 4:24 AM, Dean Rasheed 
> wrote:
> >> Also drop the constraint prohibiting finite values after an unbounded
> >> column, and just document the fact that any values after MINVALUE or
> >> MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED
> >> multiple times, which was needlessly verbose.
> >
> > I would like to (belatedly) complain about this part of this commit.
> > Now you can do stuff like this:
> >
> > rhaas=# create table foo (a int, b text) partition by range (a, b);
> > CREATE TABLE
> > rhaas=# create table foo1 partition of foo for values from (minvalue,
> > 0) to (maxvalue, 0);
> > CREATE TABLE
> >
> > The inclusion of 0 has no effect, but it is still stored in the
> > catalog, shows up in \d foo1, and somebody might think it does
> > something.  I think we should go back to requiring bounds after
> > minvalue or maxvalue to be minvalue or maxvalue.
>

The commit message indicates one would just specify "unbounded", not
min/maxvalue, which indeed seems to be a better word for it.

Can we, presently, stick "null" in place of the "0"?

Well perhaps verbosity-reduction isn't sufficient justification but I
> still think this is correct because logically any values in the bound
> after MINVALUE/MAXVALUE are irrelevant, so it seems overly restrictive
> to force all later values to be MINVALUE/MAXVALUE when the code will
> just ignore those values.
>
>
But semantically using unbounded is correct - and referencing the principle
of "write once, read many", people are probably going to care a lot more
about the \d display than the original SQL - and \d showing some randomly
chosen value and mentally doing the gymnastics to think "oh, wait, it
doesn't actually matter what this value is", compared to it showing
"unbounded" and it being obvious that "any value" will work, seems like a
win.


> I don't think we should allow values after MINVALUE/MAXVALUE to be
> omitted altogether because we document multi-column ranges as being
> most easily understood according to the rules of row-wise comparisons,
> and they require equal numbers of values in each row.
>

I wouldn't change the underlying representation, but from a UX perspective
being able to omit the explicit specification and let the system default
appropriately would be worth considering - though probably not worth the
effort.

The complaint regarding \d could be solved by figuring out on-the-fly
whether the particular column is presently bounded or not and displaying
"unbounded" for the later ones regardless of what value is stored in the
catalog.  "Unbounded (0)" would communicate both aspects.  In the "be
liberal in what you accept" department that would seem to be a win.

David J.


[HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b

2017-08-08 Thread Dean Rasheed
On 8 August 2017 at 19:22, Robert Haas  wrote:
> On Fri, Jul 21, 2017 at 4:24 AM, Dean Rasheed  
> wrote:
>> Also drop the constraint prohibiting finite values after an unbounded
>> column, and just document the fact that any values after MINVALUE or
>> MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED
>> multiple times, which was needlessly verbose.
>
> I would like to (belatedly) complain about this part of this commit.
> Now you can do stuff like this:
>
> rhaas=# create table foo (a int, b text) partition by range (a, b);
> CREATE TABLE
> rhaas=# create table foo1 partition of foo for values from (minvalue,
> 0) to (maxvalue, 0);
> CREATE TABLE
>
> The inclusion of 0 has no effect, but it is still stored in the
> catalog, shows up in \d foo1, and somebody might think it does
> something.  I think we should go back to requiring bounds after
> minvalue or maxvalue to be minvalue or maxvalue.  I thought maybe the
> idea here was that you were going to allow something like this to
> work, which actually would have saved some typing:
>
> create table foo2 partition of foo for values from (minvalue) to (maxvalue);
>
> But no:
>
> ERROR:  FROM must specify exactly one value per partitioning column
>
> So the only way that this has made things less verbose is by letting
> you put in a meaningless value of the data type which takes fewer than
> 8 characters to type.  That doesn't seem to me to be a defensible way
> of reducing verbosity.
>

Well perhaps verbosity-reduction isn't sufficient justification but I
still think this is correct because logically any values in the bound
after MINVALUE/MAXVALUE are irrelevant, so it seems overly restrictive
to force all later values to be MINVALUE/MAXVALUE when the code will
just ignore those values.

I don't think we should allow values after MINVALUE/MAXVALUE to be
omitted altogether because we document multi-column ranges as being
most easily understood according to the rules of row-wise comparisons,
and they require equal numbers of values in each row.
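
For anyone not familiar with those rules, a quick sketch: row-wise
comparison is lexicographic, so the first unequal column decides and any
later columns are never consulted.

SELECT (1, 'b') < (2, 'a');  -- true: 1 < 2 decides; 'b' vs 'a' is ignored
SELECT (1, 'b') < (1, 'a');  -- false: first columns tie, and 'b' > 'a'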

It's also worth noting that this choice is consistent with other
databases, so at least some people will be used to being able to do
this.

Regards,
Dean




Re: [HACKERS] [BUGS] Replication to Postgres 10 on Windows is broken

2017-08-08 Thread Peter Eisentraut
On 8/7/17 21:06, Noah Misch wrote:
>> That would fit.  Until v10 (commit 1e8a850), PQconnectStart() had no in-tree
>> callers outside of libpq itself.
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Peter,
> since you committed the patch believed to have created it, you own this open
> item.

I don't think I can usefully contribute to this.  Could someone else
take it?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Draft for 2017-08-10 Release

2017-08-08 Thread Jonathan Katz
Hi,

> On Aug 6, 2017, at 11:14 AM, Jonathan Katz  
> wrote:
> 
> Hi,
> 
> I have put together a draft of the press release notes for the upcoming 
> 20170810 release.  It is available here:
> 
> https://git.postgresql.org/gitweb/?p=press.git;a=blob_plain;f=update_releases/current/20170810securityrelease.md
> 
> 
> Please let me know by Tuesday Aug 8 if you have any corrections, believe 
> there is something missing from the highlight list, or something that should 
> be removed from the highlight list.
> 
> (And of course typos, etc.)

I have added the additional details for the upcoming release:


https://git.postgresql.org/gitweb/?p=press.git;a=blob_plain;f=update_releases/current/20170810securityrelease.md
 


Please let me know if there are any corrections.

Thanks!

Jonathan



[HACKERS] Re: [COMMITTERS] pgsql: Fix inadequacies in recently added wait events

2017-08-08 Thread Alvaro Herrera
Thread moved to -hackers.

Thomas Munro wrote:
> On Wed, Aug 9, 2017 at 7:39 AM, Alvaro Herrera  
> wrote:

> > While at it, fix numerous other problems in the vicinity:

> All of the above seem like good candidates for a checker script in
> src/tools/check_XXX.pl, a bit like the others I've talked about
> recently [1][2].

Yeah, that's one idea, but I think it'd be better to have all these
things be generated content from a single "wait_events.txt" file, like
src/backend/utils/errcodes.txt which gives birth to a whole host of
different files.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests

2017-08-08 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Jun 27, 2017 at 3:44 AM, Tom Lane  wrote:
>> Looks good as far as it goes, but I wonder whether any of the other
>> get_slot_xmins calls need polling too.  Don't feel a need to add such
>> calls until someone exhibits a failure there, but I won't be very
>> surprised if someone does.

And behold, we have here:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2017-08-08%2020%3A54%3A09

#   Failed test 'xmin of cascaded slot non-null with hs feedback'
#   at t/001_stream_rep.pl line 224.
#  got: ''
# expected: anything else

That's one of only four calls lacking a preceding wait_slot_xmins call.
The ones at lines 173 and 177 seem safe because nothing has happened yet.
I think the one at line 300 should be safe because the standby_2 server is
shut down at that point, but is there any way that the standby_1's view
hasn't updated by the time that happens?
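
(For readers without the test open: get_slot_xmins presumably boils down
to a query like the following sketch against the standby; the slot name is
illustrative.)

SELECT xmin, catalog_xmin FROM pg_replication_slots
WHERE slot_name = 'standby_2';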

> I got the same thought, wondering as well if get_slot_xmins should be
> renamed check_slot_xmins with the is() tests moved inside it as well.
> Not sure if that's worth the API ugliness though.

Mmm, doesn't seem like that's worth doing, but I'm half tempted to merge
wait_slot_xmins into get_slot_xmins so you can't skip it ...

regards, tom lane




Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-08 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 8, 2017 at 12:26 PM, Tom Lane  wrote:
>> Yeah, I thought about weakening the assertions too, but I couldn't
>> see a fix of that kind that didn't seem mighty ad-hoc.

> More concretely, the present example seems no different than the
> ResourceOwner stuff emitting warnings on commit and remaining silent
> on abort.  We could make it complain on both commit and abort, but
> then it would fail spuriously because there's no other mechanism to
> release resources in the abort path, so we don't.  Similarly here, we
> have every reason to expect the check to pass during ERROR recovery
> but there is no reason to expect it to pass during FATAL recovery, yet
> as coded we will do the test anyway.  That's wrong.

I think that's arguing from expedience not principle.  We had every
reason to think it would pass during FATAL errors too, until we noticed
this specific misbehavior; and there is at least as much of an argument
that this is a bug in SearchCatCacheList as there is that the check
is too strong.

>> Now, there is some room to argue that AtEOXact_CatCache() is just
>> obsolete and we should remove it altogether; I don't think it's
>> caught a real bug since we invented resowners in 8.1.

> Yeah, the same thought crossed my mind.  IIUC, we'd crash if a
> catcache reference were acquired without CurrentResourceOwner being
> valid, so this is really just a belt-and-suspenders check.

Right.  It was worth keeping it around till we were sure all the bugs
were shaken out of ResourceOwners, but surely we crossed the point of
diminishing returns for that long ago.

> ...  I'm also not deadly opposed to
> redesigning the whole mechanism either, but I think that should be
> approached with a measure of caution: it might end up reducing
> reliability rather than increasing it.  I suggest in any case that we
> start with a surgical fix.

In the absence of clear evidence that there are similar bugs elsewhere,
I agree that redesigning FATAL exits would likely cause more problems
than it solves.  But I feel like it would be a good thing to test for.
I wonder if Andreas would be interested in trying the randomly-timed
SIGTERM thing with sqlsmith.

In the meantime, I think my vote would be to remove AtEOXact_CatCache.

regards, tom lane




[HACKERS] Infrastructure for JIT compiling tuple deforming

2017-08-08 Thread Andres Freund
Hi,

As previously mentioned, tuple deforming is a major bottleneck, and
JITing it can be highly beneficial.  I previously had posted a prototype
that does JITing at the slot_deform_tuple() level, caching the deformed
function in the tupledesc.

Storing things in the tupledesc isn't a great concept however - the
lifetime of the generated function is hard to manage. But more
importantly, even if we moved this into the slot, it precludes
important optimizations.

JITing the deforming is a *lot* more efficient if we can combine it with
the JITing of the expressions that use the deformed tuple. There's a
couple of reasons for that:

1) By knowing the exact attnum the caller is going to request, the code
   can be optimized. No need to generate code for columns not
   deformed. If there's NOT NULL columns at/after the last
   to-be-deformed column, there's no need to generate checks about the
   length of the null-bitmap - getting rid of about half the branches!

2) By generating the deforming code within the generated expression code,
   the code is generated together. That avoids a good chunk of the code
   generation overhead and of the memory mapping overhead, and it
   noticeably reduces function call overhead (because relative near calls
   can be used).

3) LLVM's optimizer can inline parts / all of the tuple deforming code
   into the expression evaluation function, further reducing
   overhead. In simpler cases and with some additional prodding, llvm
   even can interleave deforming of individual columns and their use
   (note that I'm not proposing to do so initially).

4) If we know that the underlying tuple is an actual nonvirtual tuple,
   e.g. on the scan level, the slot deforming of NOT NULL can be
   replaced with direct byte accesses to the relevant column - a good
   chunk faster again.
   (note that I'm not proposing to do so initially)

The problem however is that when generating the expression code we don't
have the necessary information. In my current prototype I'm emitting the
LLVM IR (the input to LLVM) at ExecInitExpr() time for all expressions
in a tree. That allows emitting the code for all functions in the executor
tree in one go.  But unfortunately the current executor initialization
"framework" doesn't provide information about the underlying slot
tupledescs at that time.  Nor does it actually guarantee that the
tupledescs / slots stay the same over the course of execution.

Therefore I'd like to somehow change things so that the executor keeps
track of whether the tupledescs of inner/outer/scan are going to change,
and if not, provide them.

The right approach here seems to be to add a bit of extra data to
ExecAssignScanType etc., and move ExecInitExpr / ExecInitQual /
ExecAssignScanProjectionInfo / ... to after that.  We could then keep
track of the relevant tupledescs somewhere in PlanState - that's a
bit ugly, but I don't quite see how to avoid that unless we want to add
major executor-node awareness into expression evaluation.

Thoughts? Better ideas?

Greetings,

Andres Freund




Re: [HACKERS] reload-through-the-top-parent switch the partition table

2017-08-08 Thread Robert Haas
On Tue, Aug 8, 2017 at 8:48 AM, Rushabh Lathia  wrote:
> It seems like with we set the numParents and parents only for the
> dumpable objects (flagInhTables()). Current patch relies on the numParents
> and parents to get the root partition TableInfo, but when --schema is been
> specified - it doesn't load those information for the object which is not
> dumpable.
>
> Now one options is:
>
> 1) restrict the --load-via-partition-root to be used with
> the --schema or may be some other options - where we restrict the
> objects.
>
> Consider this: the partition root is in schema 'a' and the partition table
> is in schema 'b'. If someone specifies --schema b with
> --load-via-partition-root,
> I think we should not do "INSERT INTO a.tab" to load the data (because the
> user specified --schema b).
>
> 2) fix flagInhTables() to load the numParents and parents information
> for partition tables (can be checked using ispartition), irrespective of
> whether the object is dumpable or not.

(1) seems like a pretty arbitrary restriction, so I don't like that
option.  (2) would hurt performance in some use cases.  Do we have an
option (3)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] dubious error message from partition.c

2017-08-08 Thread Robert Haas
In the original table partitioning commit
(f0e44751d7175fa3394da2c8f85e3ceb3cdbfe630), we introduced the
following code, which hasn't changed materially since then:

+                if (partition_rbound_cmp(key, lower->datums, lower->content,
+                                         true, upper) >= 0)
+                    ereport(ERROR,
+                            (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                             errmsg("cannot create range partition with empty range"),
+                             parser_errposition(pstate, spec->location)));

In retrospect, I'm not thrilled by this error message, for two reasons:

1. It gives no details, as other nearby messages do.  For example,
further down in the function, we have a message "partition \"%s\"
would overlap partition \"%s\"", which tells you the names of the old
and new partitions.  But this message has no %-escapes at all.  It
tells you neither the name of the partition that you're trying to
create nor the problematic values.  You might object that you can only
be creating one partition at a time and should therefore know its name
but (a) you might be running a script and be uncertain which command
failed and (b) we've talked repeatedly about introducing convenience
syntax for creating a bunch of partitions along with the parent table,
and if we do that at some point, then this will become more of an
issue.

2. The message text is, in my opinion, not as clear as it could be.
The most logical interpretation of that message, ISTM, would be that
you've specified a lower bound that is equal to the upper bound - and
you would indeed get this message in that case.  However, you'd also
get it if you specified a lower bound that is GREATER than the upper
bound, and I think that it's possible somebody might get confused.
For example:

rhaas=# create table example (a citext) partition by range (a);
CREATE TABLE
rhaas=# create table example1 partition of example for values from
('Bob') to ('alice');
ERROR:  cannot create range partition with empty range

A user who fails to realize what the comparison semantics of citext
are might scratch their head at this message.  Surely 'Bob' precedes
'alice' since 'B' = 66 and 'a' = 97, so what's the matter?  This could
also come up with plain old text if the partition bounds have a
different collation than the user is expecting.
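
A quick sketch of the comparison in question (requires the citext
extension): citext compares case-insensitively, so 'Bob' behaves like
'bob' and sorts after 'alice', while text under the "C" collation compares
byte-wise.

create extension if not exists citext;
select 'Bob'::citext < 'alice'::citext;  -- false: 'bob' > 'alice'
select 'Bob' < 'alice' collate "C";      -- true: 'B' (66) < 'a' (97)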

So, I suggest something like:

"lower bound %s for partition \"%s\" must precede upper bound %s"

e.g. lower bound (Bob) for partition "example1" must precede upper bound (alice)

You could still be confused about why Bob comes before alice (sexism?)
but you'll at least know which partition is the problem and hopefully
at least have a clearer notion that the problem is that the system's
idea about how those bounds are ordered differs from your own.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Re: [COMMITTERS] pgsql: Use MINVALUE/MAXVALUE instead of UNBOUNDED for range partition b

2017-08-08 Thread Robert Haas
On Fri, Jul 21, 2017 at 4:24 AM, Dean Rasheed  wrote:
> Also drop the constraint prohibiting finite values after an unbounded
> column, and just document the fact that any values after MINVALUE or
> MAXVALUE are ignored. Previously it was necessary to repeat UNBOUNDED
> multiple times, which was needlessly verbose.

I would like to (belatedly) complain about this part of this commit.
Now you can do stuff like this:

rhaas=# create table foo (a int, b text) partition by range (a, b);
CREATE TABLE
rhaas=# create table foo1 partition of foo for values from (minvalue,
0) to (maxvalue, 0);
CREATE TABLE

The inclusion of 0 has no effect, but it is still stored in the
catalog, shows up in \d foo1, and somebody might think it does
something.  I think we should go back to requiring bounds after
minvalue or maxvalue to be minvalue or maxvalue.  I thought maybe the
idea here was that you were going to allow something like this to
work, which actually would have saved some typing:

create table foo2 partition of foo for values from (minvalue) to (maxvalue);

But no:

ERROR:  FROM must specify exactly one value per partitioning column

So the only way that this has made things less verbose is by letting
you put in a meaningless value of the data type which takes fewer than
8 characters to type.  That doesn't seem to me to be a defensible way
of reducing verbosity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] More race conditions in logical replication

2017-08-08 Thread Alvaro Herrera
Robert Haas wrote:
> On Mon, Aug 7, 2017 at 8:14 PM, Alvaro Herrera  
> wrote:
> > BTW, I noticed that the PG_WAIT_LOCK value that we're using for wait
> > event here (and in the replication slot case) is bogus.  We probably
> > need something new here.
> 
> Yeah, if you're adding a new wait point, you should add document a new
> constant in the appropriate section, probably something under
> PG_WAIT_IPC in this case.

Here's a patch.  It turned out to be a bit larger than I initially expected.

Wait events are a maintainability fail IMO.  I think we need to rethink
this stuff; using generated files from a single source containing the C
symbol, text name and doc blurb sounds better.  That can wait for pg11
though.
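
(As a sketch of how the new event surfaces to users: it appears in
pg_stat_activity; e.g., while a backend waits at the new wait point one
would see wait_event_type = 'IPC', wait_event = 'ReplicationSlotDrop' in a
query like this.)

SELECT pid, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;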

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From e050ef66956935e6dba41fc18e11632ff9f0068f Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 8 Aug 2017 13:33:47 -0400
Subject: [PATCH] Fix various inadequacies in wait events

In commit 9915de6c1cb2, we introduced a new wait point for replication
slots and incorrectly labelled it as wait event PG_WAIT_LOCK.  That's
wrong, so invent an appropriate new wait event instead, and document it
properly.

While at it, fix numerous other problems in the vicinity:
- two different walreceiver wait events were being mixed up in a single
  wait event; split it out so that they can be distinguished, and
  document the new symbols properly.

- ParallelBitmapPopulate was documented but didn't exist.

- ParallelBitmapScan was not documented

- Logical replication wait events weren't documented

- various symbols had been added in dartboard order in various places.
  Put them back in alphabetical order, as was originally intended.
---
 doc/src/sgml/monitoring.sgml   | 32 +--
 src/backend/postmaster/pgstat.c| 36 +-
 .../libpqwalreceiver/libpqwalreceiver.c|  4 +--
 src/backend/replication/slot.c |  3 +-
 src/include/pgstat.h   | 16 +-
 5 files changed, 64 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index be3dc672bc..eb20c9c543 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1176,6 +1176,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  Waiting in main loop of checkpointer process.
 
 
+ LogicalLauncherMain
+ Waiting in main loop of logical launcher process.
+
+
+ LogicalApplyMain
+ Waiting in main loop of logical apply process.
+
+
  PgStatMain
  Waiting in main loop of the statistics collector 
process.
 
@@ -1213,6 +1221,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  Waiting to write data from the client.
 
 
+ LibPQWalReceiverConnect
+ Waiting in WAL receiver to establish connection to remote 
server.
+
+
+ LibPQWalReceiverReceive
+ Waiting in WAL receiver to receive data from remote 
server.
+
+
  SSLOpenServer
  Waiting for SSL while attempting connection.
 
@@ -1251,6 +1267,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  Waiting for activity from child process when executing 
Gather node.
 
 
+ LogicalSyncData
+ Waiting for logical replication remote server to send data for 
initial table synchronization.
+
+
+ LogicalSyncStateChange
+ Waiting for logical replication remote server to change 
state.
+
+
  MessageQueueInternal
  Waiting for other process to be attached in shared message 
queue.
 
@@ -1271,14 +1295,18 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  Waiting for parallel workers to finish computing.
 
 
- ParallelBitmapPopulate
- Waiting for the leader to populate the TidBitmap.
+ ParallelBitmapScan
+ Waiting for parallel bitmap scan to become initialized.
 
 
  ProcArrayGroupUpdate
  Waiting for group leader to clear transaction id at 
transaction end.
 
 
+ ReplicationSlotDrop
+ Waiting for a replication slot to become inactive to be 
dropped.
+
+
  SafeSnapshot
  Waiting for a snapshot for a READ ONLY DEFERRABLE 
transaction.
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a0b0eecbd5..3f5fb796a5 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ 

Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-08 Thread Robert Haas
On Tue, Aug 8, 2017 at 12:26 PM, Tom Lane  wrote:
> Yeah, I thought about weakening the assertions too, but I couldn't
> see a fix of that kind that didn't seem mighty ad-hoc.

I don't really see what's ad-hoc about skipping it in the case where a
FATAL is in progress.  I mean, skipping a sanity check only in the
cases where we know it might not pass - and are OK with the fact that it
might not pass - seems to me to be an extremely difficult policy to
argue against on rational grounds.  That's just the definition of
writing correct sanity checks.

More concretely, the present example seems no different than the
ResourceOwner stuff emitting warnings on commit and remaining silent
on abort.  We could make it complain on both commit and abort, but
then it would fail spuriously because there's no other mechanism to
release resources in the abort path, so we don't.  Similarly here, we
have every reason to expect the check to pass during ERROR recovery
but there is no reason to expect it to pass during FATAL recovery, yet
as coded we will do the test anyway.  That's wrong.
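
A minimal sketch of that idea in AtEOXact_CatCache() (assuming
proc_exit_inprogress is the right signal; this is the relaxation under
discussion, not a committed fix):

	/* hypothetical: skip the sanity check during FATAL exit, where the
	 * PG_CATCH cleanup that releases the refcounts never ran */
	if (proc_exit_inprogress)
		return;
	Assert(ct->refcount == 0);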

> Now, there is some room to argue that AtEOXact_CatCache() is just
> obsolete and we should remove it altogether; I don't think it's
> caught a real bug since we invented resowners in 8.1.

Yeah, the same thought crossed my mind.  IIUC, we'd crash if a
catcache reference were acquired without CurrentResourceOwner being
valid, so this is really just a belt-and-suspenders check.

> But, again, the larger question to my mind is where else we may have
> similar issues.  There's certainly never been any methodical attempt
> to see whether elog(FATAL) will work everywhere.

IIRC, you raised a similar objection back when Bruce added
pg_terminate_backend(), which caused that change to be briefly
reverted before ultimately being put back.  Despite the hardening you
did back then, I think it's highly likely that bugs remain in that
path, and I am of course not opposed to trying to improve the
situation.  That having been said, the bugs that remain are probably
mostly quite low-probability, because otherwise we'd have found them
by now.  I think parallel query is likely to flush out a few of the
ones that remain by creating a new class of backends that terminate
after a single query and may get killed by the leader at any time;
that's how we discovered this issue.  Fuzz testing could be done, too,
e.g. run something like sqlsmith simultaneously in many sessions while
killing off backends at random.  I'm also not deadly opposed to
redesigning the whole mechanism either, but I think that should be
approached with a measure of caution: it might end up reducing
reliability rather than increasing it.  I suggest in any case that we
start with a surgical fix.
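
For instance, the backend-killing half of such a fuzz run could be as simple
as repeatedly executing something like this while the query-generator
sessions are busy (a sketch, not a tested harness):

SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE pid <> pg_backend_pid()
 ORDER BY random()
 LIMIT 1;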

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-08 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 8, 2017 at 11:36 AM, Tom Lane  wrote:
>> We could respond to this by using PG_ENSURE_ERROR_CLEANUP there instead
>> of plain PG_TRY.  But I have an itchy feeling that there may be a lot
>> of places with similar issues.  Should we be revisiting the basic way
>> that elog(FATAL) works, to make it less unlike elog(ERROR)?

> ... Arguably, this assertion is merely overzealous; we could skip
> this processing when proc_exit_inprogress, for example.

Yeah, I thought about weakening the assertions too, but I couldn't
see a fix of that kind that didn't seem mighty ad-hoc.

Now, there is some room to argue that AtEOXact_CatCache() is just
obsolete and we should remove it altogether; I don't think it's
caught a real bug since we invented resowners in 8.1.  Short of that,
though, I'm not really happy with just arbitrarily weakening the
checks.

But, again, the larger question to my mind is where else we may have
similar issues.  There's certainly never been any methodical attempt
to see whether elog(FATAL) will work everywhere.

regards, tom lane




Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-08 Thread Robert Haas
On Tue, Aug 8, 2017 at 11:36 AM, Tom Lane  wrote:
>> By looking at the stack trace, and after discussing it with my team members,
>> what we have observed is that in SearchCatCacheList() we increment the
>> refcount and then decrement it at the end. However, for some reason, if
>> we are in the TRY() block (where we increment the refcount) and are hit
>> with an interrupt, we fail to decrement the refcount, due to which we
>> later get an assertion failure.
>
> Hm.  So SearchCatCacheList has a PG_TRY block that is meant to release
> those refcounts, but if you hit the backend with a SIGTERM while it's
> in that function, control goes out through elog(FATAL) which doesn't
> execute the PG_CATCH cleanup.  But it does do AbortTransaction which
> calls AtEOXact_CatCache, and that is expecting that all the cache
> refcounts have reached zero.
>
> We could respond to this by using PG_ENSURE_ERROR_CLEANUP there instead
> of plain PG_TRY.  But I have an itchy feeling that there may be a lot
> of places with similar issues.  Should we be revisiting the basic way
> that elog(FATAL) works, to make it less unlike elog(ERROR)?

I'm not sure what the technically best fix here is.  It strikes me
that the charter of FATAL ought to be to exit the backend cleanly,
safely, and expeditiously.  PG_ENSURE_ERROR_CLEANUP, or some similar
mechanism, is necessary when we need to correct the contents of shared
memory so that other backends can continue to function, but there's no
such problem here.  You're proposing to do more work in the FATAL
pathway so that the debugging cross-checks in the FATAL pathway will
pass, which is not an entirely palatable idea.  It's definitely
frustrating that the ERROR and FATAL pathways are so different - that
generated a surprising amount of the work around DSM - but I'm still
skeptical about a solution that involves doing more cleanup of what
are essentially irrelevant backend-local data structures in the FATAL
path.  Arguably, this assertion is merely overzealous; we could skip
this processing when proc_exit_inprogress, for example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] reload-through-the-top-parent switch the partition table

2017-08-08 Thread amul sul
On Tue, Aug 8, 2017 at 6:18 PM, Rushabh Lathia  wrote:
> Thanks Rajkumar for testing and reporting this.
>
>
> It seems like we set the numParents and parents only for the
> dumpable objects (flagInhTables()). The current patch relies on the numParents
> and parents to get the root partition TableInfo, but when --schema is
> specified it doesn't load that information for objects which are not
> dumpable.
>
> Now one option is:
>
> 1) restrict --load-via-partition-root from being used with
> --schema or maybe some other options - where we restrict the
> objects.
>
> Consider this: the partition root is in schema 'a' and the partition table
> is in schema 'b'. If someone specifies --schema b with
> --load-via-partition-root,
> I think we should not do "INSERT INTO a.tab" to load the data (because
> the user specified --schema b).
>

+1, this looks cleaner to me.

> 2) fix flagInhTables() to load the numParents and the parents information
> for the partition table (can be checked using ispartition), irrespective of
> whether the object is dumpable or not.
>
> Maybe something like:
>
> @@ -322,7 +322,11 @@ flagInhTables(TableInfo *tblinfo, int numTables,
>
> /* Don't bother computing anything for non-target tables, either */
> if (!tblinfo[i].dobj.dump)
> +   {
> +   if (tblinfo[i].ispartition)
> +   findParentsByOid(&tblinfo[i], inhinfo, numInherits);
> continue;
> +   }
>
> I am still looking into this, meanwhile any inputs are welcome.
>

See the note given in the pg_dump documentation[1]:

"When -n is specified, pg_dump makes no attempt to dump any other
database objects that the selected schema(s) might depend upon.
Therefore, there is no guarantee that the results of a specific-schema
dump can be successfully restored by themselves into a clean
database."

If we want to follow the same trend then we could simply dump all
partition(ed) tables belonging to the specified schema. In Rajkumar's
example, we need to dump the partitions belonging to schema 'a' (i.e. t1_p1
and t1_p2_p1), and the target root for the INSERT query will be the same
table or a parent belonging to the same schema.

Regards,
Amul

[1] https://www.postgresql.org/docs/devel/static/app-pgdump.html




Re: [HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-08 Thread Tom Lane
Jeevan Chalke  writes:
> We have observed a random server crash (FailedAssertion) while running a few
> tests at our end. A stack trace is attached.

> By looking at the stack trace, and after discussing it with my team members,
> what we have observed is that in SearchCatCacheList() we increment the
> refcount and then decrement it at the end. However, for some reason, if
> we are in the TRY() block (where we increment the refcount) and are hit
> with an interrupt, we fail to decrement the refcount, due to which we
> later get an assertion failure.

Hm.  So SearchCatCacheList has a PG_TRY block that is meant to release
those refcounts, but if you hit the backend with a SIGTERM while it's
in that function, control goes out through elog(FATAL) which doesn't
execute the PG_CATCH cleanup.  But it does do AbortTransaction which
calls AtEOXact_CatCache, and that is expecting that all the cache
refcounts have reached zero.

We could respond to this by using PG_ENSURE_ERROR_CLEANUP there instead
of plain PG_TRY.  But I have an itchy feeling that there may be a lot
of places with similar issues.  Should we be revisiting the basic way
that elog(FATAL) works, to make it less unlike elog(ERROR)?
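
For reference, the shape of that change would be roughly the following (a
sketch; the cleanup callback is hypothetical, but the macros are the real
ones from storage/ipc.h):

	/* hypothetical callback releasing the refcounts the list scan holds */
	static void ReleaseCatCacheListRefs(int code, Datum arg);

	/* in SearchCatCacheList(), instead of plain PG_TRY: */
	PG_ENSURE_ERROR_CLEANUP(ReleaseCatCacheListRefs, PointerGetDatum(cl));
	{
		/* ... scan the catalog, bump refcounts, build the list ... */
	}
	PG_END_ENSURE_ERROR_CLEANUP(ReleaseCatCacheListRefs, PointerGetDatum(cl));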

regards, tom lane




Re: [HACKERS] pl/perl extension fails on Windows

2017-08-08 Thread Robert Haas
On Tue, Aug 8, 2017 at 8:37 AM, Sandeep Thakkar
 wrote:
> An update from beta3 build: We are no longer seeing this issue (handshake
> failure) on Windows 64bit, but on Windows 32bit it still persists.

Hmm, maybe you should've reported it sooner, so we could've tried to
fix this before beta3 went out.

What was the exact message you saw, including the hex values?

Is the Perl you were building against for plperl the same Perl that
was being used for the build itself?

Do you have the portion of the build log where src/pl/plperl was being built?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-08 Thread Fabien COELHO


Hello Masahiko-san,

My 0.02€ about the patch & feature, which only reflects my point of view:

Please add a number to patches to avoid ambiguities. This patch was
the second sent on the thread.


There is no need to have both custom_init & init functions. I suggest
removing function "init", renaming "custom_init" to "init", and making the
option default to what is appropriate so that it initializes the schema as
expected, and there is only one initialization mechanism.

I would suggest making the initialization sub-options (no-vacuum,
foreign-key...) rely on updating the initialization operations instead of
being maintained separately. Currently "--no-vacuum --custom-init=vacuum"
would skip vacuum, which is quite debatable...


I'm not sure of the "custom-initialize" option name, but why not. I
suggest removing "is_initialize_suite", and making custom-initialize
require -i, as seems logical and upward compatible.


ISTM that there should be short names for the phases. Maybe using only one
letter could simplify the code significantly, dropping commas and lists,
eg: "t" for "create_table", "d" for "load_data", "p" for "create_pkey",
"f" for "create_fkey", "v" for "vacuum".


I do not think that allowing a space in the list is a shell-wise idea.

I'm also wondering whether using a list is a good option, because it 
implies a large parse function, list management and so on. With the one 
letter version, it could be just a string to be scanned char by char for 
operations.
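
A sketch of that char-by-char approach (the step functions are hypothetical
names, not pgbench's actual API):

static void
runInitSteps(const char *steps)
{
	const char *s;

	for (s = steps; *s != '\0'; s++)
	{
		switch (*s)
		{
			case 't': createTables(); break;	/* create_table */
			case 'd': loadData(); break;		/* load_data */
			case 'p': createPKeys(); break;		/* create_pkey */
			case 'f': createFKeys(); break;		/* create_fkey */
			case 'v': vacuumTables(); break;	/* vacuum */
			default:
				fprintf(stderr, "unrecognized initialization step '%c'\n", *s);
				exit(1);
		}
	}
}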


Otherwise, at least allow short two letter codes (eg: ct ld pk fk va), in 
order to avoid writing a very long thing on the command line, which is 
quite a pain:


  sh> pgbench --custom-initialize=create_table,load_data,primary_key,foreign_key,vacuum ...

vs

  sh> pgbench -i -I tdpfv ...

Maybe there could be short-hands for usual setups, eg "default" for "tdpv" 
or maybe "ct,ld,pk,va", "full" for "tdpfv" or maybe "ct,ld,pk,fk,va"...


Typo in doc "initailize" -> "initialize". Option values should be 
presented in their logical execution order, i.e. put vacuum at the end.


Typo in help "initilize" -> "initialize". I would suggest dropping the space
in the option value in the presentation so that quotes are not needed.

Remove the "no-primary-keys" from the long option array as it has 
disappeared. You might consider make "custom-initialize" take the 'I' 
short option.


Ranting unrelated to this patch: the automatic aid type switching based on 
scale is a bad idea (tm), because when trying to benchmark it means that 
changing the scale also changes the schema, and you really do not need 
that. ISTM that it should always use INT8.


--
Fabien.


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-08-08 Thread Alexander Korotkov
On Tue, Aug 8, 2017 at 4:12 PM, Mark Rofail  wrote:

> On Tue, Aug 8, 2017 at 2:25 PM, Alexander Korotkov 
> wrote:
>>
>> Do we already assume that default btree opclass for array element type
>> matches PK opclass when using @>> operator on UPDATE/DELETE of referenced
>> table?
>>
> I believe so, since it's a polymorphic function.
>
>
>> If so, we don't introduce additional restriction here...
>>
> You mean to remove the wrapper query ?
>

I think we should choose the query which would be better planned (and
presumably faster executed).  You can make some experiments and then choose
the query.


> GROUP BY would also use default btree/hash opclass for element type.  It
>> doesn't differ from DISTINCT from that point.
>>
> Then there's no going around this limitation,
>

That seems to be the case.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-08-08 Thread Mark Rofail
On Tue, Aug 8, 2017 at 2:25 PM, Alexander Korotkov 
wrote:
>
> Do we already assume that default btree opclass for array element type
> matches PK opclass when using @>> operator on UPDATE/DELETE of referenced
> table?
>
I believe so, since it's a polymorphic function.


> If so, we don't introduce additional restriction here...
>
You mean to remove the wrapper query ?


> GROUP BY would also use default btree/hash opclass for element type.  It
> doesn't differ from DISTINCT from that point.
>
Then there's no going around this limitation.

Best Regards,
Mark Rofail


Re: [HACKERS] reload-through-the-top-parent switch the partition table

2017-08-08 Thread Rushabh Lathia
Thanks Rajkumar for testing and reporting this.


It seems like we set the numParents and parents only for the
dumpable objects (flagInhTables()). The current patch relies on the numParents
and parents to get the root partition TableInfo, but when --schema is
specified it doesn't load that information for objects which are not
dumpable.

Now one option is:

1) restrict --load-via-partition-root from being used with
--schema or maybe some other options - where we restrict the
objects.

Consider this: the partition root is in schema 'a' and the partition table
is in schema 'b'. If someone specifies --schema b with
--load-via-partition-root,
I think we should not do "INSERT INTO a.tab" to load the data (because
the user specified --schema b).

2) fix flagInhTables() to load the numParents and the parents information
for the partition table (can be checked using ispartition), irrespective of
whether the object is dumpable or not.

Maybe something like:

@@ -322,7 +322,11 @@ flagInhTables(TableInfo *tblinfo, int numTables,

/* Don't bother computing anything for non-target tables, either */
if (!tblinfo[i].dobj.dump)
+   {
+   if (tblinfo[i].ispartition)
+   findParentsByOid(&tblinfo[i], inhinfo, numInherits);
continue;
+   }

I am still looking into this, meanwhile any inputs are welcome.



On Tue, Aug 8, 2017 at 4:10 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi Rushabh,
>
> While testing the latest v2 patch, I got a crash when using
> --load-via-partition-root with the --schema option. Below are steps to
> reproduce.
>
> --create below test data
> create schema a;
> create schema b;
> create schema c;
>
> create table t1 (a int,b text) partition by list(a);
> create table a.t1_p1 partition of t1 FOR VALUES in (1,2,3,4) partition by
> list(a);
> create table b.t1_p1_p1 partition of a.t1_p1 FOR VALUES in (1,2);
> create table c.t1_p1_p2 partition of a.t1_p1 FOR VALUES in (3,4);
> create table b.t1_p2 partition of t1 FOR VALUES in (5,6,7,8) partition by
> list(a);
> create table a.t1_p2_p1 partition of b.t1_p2 FOR VALUES in (5,6);
> create table t1_p2_p2 partition of b.t1_p2 FOR VALUES in (7,8);
>
> insert into t1 values (8,'t1');
> insert into a.t1_p1 values (2,'a.t1_p1');
> insert into b.t1_p1_p1 values (1,'b.t1_p1_p1');
> insert into c.t1_p1_p2 values (3,'c.t1_p1_p2');
> insert into b.t1_p2 values (6,'b.t1_p2');
> insert into a.t1_p2_p1 values (5,'a.t1_p2_p1');
> insert into t1_p2_p2 values (7,'t1_p2_p1');
> insert into t1 values (4 ,'t1');
>
> --trying to take pg_dump
> [edb@localhost bin]$ ./pg_dump -d postgres --schema=a -f d1.dump -Fp
> [edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root -f
> d2.dump -Fp
> [edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root
> --schema=a -f d3.dump -Fp
> pg_dump: pg_dump.c:2063: getRootTableInfo: Assertion `tbinfo->numParents
> == 1' failed.
> Aborted (core dumped)
>
>
>
> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation
>
> On Fri, Aug 4, 2017 at 3:01 PM, Rushabh Lathia 
> wrote:
>
>>
>> Here is an updated patch, now renamed the switch to
>> --load-via-partition-root
>> and also added the documentation for the new switch into pg_dump as well
>> as pg_dumpall.
>>
>>
>> On Fri, Aug 4, 2017 at 7:13 AM, Amit Langote <
>> langote_amit...@lab.ntt.co.jp> wrote:
>>
>>> On 2017/08/04 1:08, David G. Johnston wrote:
>>> > On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane  wrote:
>>> >
>>> >> Robert Haas  writes:
>>> >>> So maybe --load-via-partition-root if nobody likes my previous
>>> >>> suggestion of --partition-data-via-root ?
>>> >>
>>> >> WFM.
>>> >>
>>> >
>>> > ​+1
>>>
>>> +1.
>>>
>>> Thanks,
>>> Amit
>>>
>>>
>>
>>
>> Thanks,
>> Rushabh Lathia
>> www.EnterpriseDB.com
>>
>>
>>
>>
>


-- 
Rushabh Lathia


Re: [HACKERS] pl/perl extension fails on Windows

2017-08-08 Thread Sandeep Thakkar
Hi

An update from beta3 build: We are no longer seeing this issue (handshake
failure) on Windows 64bit, but on Windows 32bit it still persists.

On Mon, Jul 31, 2017 at 10:15 PM, Christoph Berg  wrote:

> Re: Tom Lane 2017-07-31 <30582.1501508...@sss.pgh.pa.us>
> > Christoph Berg  writes:
> > >>> The only interesting line in log/postmaster.log is a
> > >>> log_line_prefix-less
> > >>> Util.c: loadable library and perl binaries are mismatched (got
> > >>> handshake key 0xd500080, needed 0xd600080)
> >
> > Can we see the Perl-related output from configure, particularly the new
> > lines about CFLAGS?
>
> $ ./configure --with-perl
>
> checking whether to build Perl modules... yes
>
> checking for perl... /usr/bin/perl
> configure: using perl 5.26.0
> checking for Perl archlibexp... /usr/lib/x86_64-kfreebsd-gnu/perl/5.26
> checking for Perl privlibexp... /usr/share/perl/5.26
> checking for Perl useshrplib... true
> checking for CFLAGS recommended by Perl... -D_REENTRANT -D_GNU_SOURCE
> -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include
> -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
> checking for CFLAGS to compile embedded Perl... -DDEBIAN
> checking for flags to link embedded Perl...   -fstack-protector-strong
> -L/usr/local/lib  -L/usr/lib/x86_64-kfreebsd-gnu/perl/5.26/CORE -lperl
> -ldl -lm -lpthread -lc -lcrypt
>
> checking for perl.h... yes
> checking for libperl... yes
>
> Christoph
>
>
>



-- 
Sandeep Thakkar
EDB


[HACKERS] Server crash (FailedAssertion) due to catcache refcount mis-handling

2017-08-08 Thread Jeevan Chalke
Hi,

We have observed a random server crash (FailedAssertion) while running a few
tests at our end. A stack trace is attached.

By looking at the stack trace, and after discussing it with my team members,
what we have observed is that in SearchCatCacheList() we increment the
refcount and then decrement it at the end. However, for some reason, if
we are in the TRY() block (where we increment the refcount) and are hit
with an interrupt, we fail to decrement the refcount, due to which we
later get an assertion failure.

To mimic the scenario, I have added a sleep in SearchCatCacheList() as
given below:

diff --git a/src/backend/utils/cache/catcache.c
b/src/backend/utils/cache/catcache.c
index e7e8e3b..eb6d4b5 100644
--- a/src/backend/utils/cache/catcache.c
+++ b/src/backend/utils/cache/catcache.c
@@ -1520,6 +1520,9 @@ SearchCatCacheList(CatCache *cache,
hashValue = CatalogCacheComputeTupleHashValue(cache, ntp);
hashIndex = HASH_INDEX(hashValue, cache->cc_nbuckets);

+   elog(INFO, "Sleeping for 0.1 seconds.");
+   pg_usleep(100000L); /* 0.1 seconds */
+
bucket = &cache->cc_bucket[hashIndex];
dlist_foreach(iter, bucket)
{

And then followed these steps to get a server crash:

-- Terminal 1
DROP TYPE typ;
DROP FUNCTION func(x int);

CREATE TYPE typ AS (X VARCHAR(50), Y INT);

CREATE OR REPLACE FUNCTION func(x int) RETURNS int AS $$
DECLARE
  rec typ;
  var2 numeric;
BEGIN
  RAISE NOTICE 'Function Called.';
  REC.X := 'Hello';
  REC.Y := 0;

  IF (rec.Y + var2) = 0 THEN
RAISE NOTICE 'Check Pass';
  END IF;

  RETURN 1;
END;
$$ LANGUAGE plpgsql;

SELECT pg_backend_pid();

SELECT func(1);

-- Terminal 2, should be run in parallel when SELECT func(1) is in progress
in terminal 1.
SELECT pg_terminate_backend(<pid returned by pg_backend_pid() in terminal 1>);


I thought it worth posting here to get others' attention.

I have observed this on the master branch, but it is also reproducible on
back-branches.

Thanks
-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Program terminated with signal 6, Aborted.
#0  0x7fdff78951d7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install 
audit-libs-2.6.5-3.el7_3.1.x86_64 cyrus-sasl-lib-2.1.26-20.el7_2.x86_64 
glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 
krb5-libs-1.14.1-27.el7_3.x86_64 libcap-ng-0.7.5-4.el7.x86_64 
libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 
libicu-50.1.2-15.el7.x86_64 libselinux-2.5-6.el7.x86_64 
libstdc++-4.8.5-11.el7.x86_64 nspr-4.13.1-1.0.el7_3.x86_64 
nss-3.28.2-1.6.el7_3.x86_64 nss-softokn-freebl-3.16.2.3-14.4.el7.x86_64 
nss-util-3.28.2-1.1.el7_3.x86_64 openldap-2.4.40-13.el7.x86_64 
openssl-libs-1.0.1e-60.el7_3.1.x86_64 pam-1.1.8-18.el7.x86_64 
pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x7fdff78951d7 in raise () from /lib64/libc.so.6
#1  0x7fdff78968c8 in abort () from /lib64/libc.so.6
#2  0x009d43ff in ExceptionalCondition (conditionName=0xc02255 
"!(ct->refcount == 0)", errorType=0xc020b4 "FailedAssertion", 
fileName=0xc02060 "catcache.c", lineNumber=567) at assert.c:54
#3  0x009b59e0 in AtEOXact_CatCache (isCommit=0 '\000') at 
catcache.c:567
#4  0x00538a8d in AbortTransaction () at xact.c:2613
#5  0x0053a95e in AbortOutOfAnyTransaction () at xact.c:4271
#6  0x009e8e99 in ShutdownPostgres (code=1, arg=0) at postinit.c:1146
#7  0x008456e1 in shmem_exit (code=1) at ipc.c:228
#8  0x008455d5 in proc_exit_prepare (code=1) at ipc.c:185
#9  0x00845543 in proc_exit (code=1) at ipc.c:102
#10 0x009d4bae in errfinish (dummy=0) at elog.c:543
#11 0x008761b4 in ProcessInterrupts () at postgres.c:2882
#12 0x009d4bea in errfinish (dummy=0) at elog.c:565
#13 0x009d7097 in elog_finish (elevel=17, fmt=0xc02678 "Sleeping for 
0.1 seconds.") at elog.c:1378
#14 0x009b765b in SearchCatCacheList (cache=0x27636d0, nkeys=1, 
v1=11604184, v2=0, v3=0, v4=0) at catcache.c:1523
#15 0x009cbd20 in SearchSysCacheList (cacheId=37, nkeys=1, 
key1=11604184, key2=0, key3=0, key4=0) at syscache.c:1338
#16 0x0056e22e in OpernameGetCandidates (names=0x27f3b08, oprkind=98 
'b', missing_schema_ok=0 '\000') at namespace.c:1576
#17 0x005f09a6 in oper (pstate=0x27f3ec0, opname=0x27f3b08, ltypeId=23, 
rtypeId=1700, noError=0 '\000', location=14) at parse_oper.c:414
#18 0x005f132d in make_op (pstate=0x27f3ec0, opname=0x27f3b08, 
ltree=0x27f40f0, rtree=0x27f4128, last_srf=0x0, location=14) at parse_oper.c:778
#19 0x005e5e8d in transformAExprOp (pstate=0x27f3ec0, a=0x27f3a60) at 
parse_expr.c:933
#20 0x005e454e in transformExprRecurse (pstate=0x27f3ec0, 
expr=0x27f3a60) at parse_expr.c:217
#21 0x005e5e44 in transformAExprOp (pstate=0x27f3ec0, a=0x27f3b78) at 
parse_expr.c:930
#22 0x005e454e in transformExprRecurse (pstate=0x27f3ec0, 

Re: [HACKERS] GSoC 2017: Foreign Key Arrays

2017-08-08 Thread Alexander Korotkov
On Sat, Aug 5, 2017 at 11:36 PM, Mark Rofail  wrote:

> This is the query fired upon any UPDATE/DELETE for RI checks:
>
> SELECT 1 FROM ONLY <pktable> x WHERE pkatt1 = $1 [AND ...] FOR KEY SHARE
> OF x
>
> in  the case of foreign key arrays, it's wrapped in this query:
>
> SELECT 1 WHERE
> (SELECT count(DISTINCT y) FROM unnest($1) y)
> = (SELECT count(*) FROM (<the above query>) z)
>
> This is where the limitation appears, the DISTINCT keyword. Since in
> reality, count(DISTINCT) will fall back to the default btree opclass for
> the array element type regardless of the opclass indicated in the access
> method. Thus I believe going around DISTINCT is the way to go.
>

Do we already assume that default btree opclass for array element type
matches PK opclass when using @>> operator on UPDATE/DELETE of referenced
table?
If so, we don't introduce additional restriction here...


> This is what I came up with:
>
> SELECT 1 WHERE
> (SELECT COUNT(*)
> FROM
> (
> SELECT y
> FROM unnest($1) y
> GROUP BY y
> )
> )
> = (SELECT count(*) FROM (<the above query>) z)
>
> I understand there might be some syntax errors but this is just a proof of
> concept.
>

GROUP BY would also use default btree/hash opclass for element type.  It
doesn't differ from DISTINCT from that point.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] reload-through-the-top-parent switch the partition table

2017-08-08 Thread Rajkumar Raghuwanshi
Hi Rushabh,

While testing the latest v2 patch, I got a crash when using
--load-via-partition-root with the --schema option. Below are steps to
reproduce.

--create below test data
create schema a;
create schema b;
create schema c;

create table t1 (a int,b text) partition by list(a);
create table a.t1_p1 partition of t1 FOR VALUES in (1,2,3,4) partition by
list(a);
create table b.t1_p1_p1 partition of a.t1_p1 FOR VALUES in (1,2);
create table c.t1_p1_p2 partition of a.t1_p1 FOR VALUES in (3,4);
create table b.t1_p2 partition of t1 FOR VALUES in (5,6,7,8) partition by
list(a);
create table a.t1_p2_p1 partition of b.t1_p2 FOR VALUES in (5,6);
create table t1_p2_p2 partition of b.t1_p2 FOR VALUES in (7,8);

insert into t1 values (8,'t1');
insert into a.t1_p1 values (2,'a.t1_p1');
insert into b.t1_p1_p1 values (1,'b.t1_p1_p1');
insert into c.t1_p1_p2 values (3,'c.t1_p1_p2');
insert into b.t1_p2 values (6,'b.t1_p2');
insert into a.t1_p2_p1 values (5,'a.t1_p2_p1');
insert into t1_p2_p2 values (7,'t1_p2_p1');
insert into t1 values (4 ,'t1');

--trying to take pg_dump
[edb@localhost bin]$ ./pg_dump -d postgres --schema=a -f d1.dump -Fp
[edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root -f
d2.dump -Fp
[edb@localhost bin]$ ./pg_dump -d postgres --load-via-partition-root
--schema=a -f d3.dump -Fp
pg_dump: pg_dump.c:2063: getRootTableInfo: Assertion `tbinfo->numParents ==
1' failed.
Aborted (core dumped)



Thanks & Regards,
Rajkumar Raghuwanshi
QMG, EnterpriseDB Corporation

On Fri, Aug 4, 2017 at 3:01 PM, Rushabh Lathia 
wrote:

>
> Here is an updated patch, now renamed the switch to
> --load-via-partition-root
> and also added the documentation for the new switch into pg_dump as well
> as pg_dumpall.
>
>
> On Fri, Aug 4, 2017 at 7:13 AM, Amit Langote <
> langote_amit...@lab.ntt.co.jp> wrote:
>
>> On 2017/08/04 1:08, David G. Johnston wrote:
>> > On Thu, Aug 3, 2017 at 8:53 AM, Tom Lane  wrote:
>> >
>> >> Robert Haas  writes:
>> >>> So maybe --load-via-partition-root if nobody likes my previous
>> >>> suggestion of --partition-data-via-root ?
>> >>
>> >> WFM.
>> >>
>> >
>> > ​+1
>>
>> +1.
>>
>> Thanks,
>> Amit
>>
>>
>
>
> Thanks,
> Rushabh Lathia
> www.EnterpriseDB.com
>
>
>
>


[HACKERS] The error message "sorry, too many clients already" is imprecise

2017-08-08 Thread Piotr Stefaniak
I recently started getting the "sorry, too many clients already" error.
There are currently four places that can generate it, but fortunately
log_error_verbosity was set to verbose, so I was able to see that in this
case the message was generated by proc.c:InitProcess().

But it's still not much, because there are three different lists you can
run out of and get the same error message from InitProcess(). I was
lucky to be able to rule out two of them. max_connections is set to much
more than the sum of possible connections from all relevant pgBouncer
instances we have, so it's not hitting max_connections. Also, this is on
9.4 and we don't use any funny extensions, so it's probably not running
out of bgworkerFreeProcs either. autovacFreeProcs is what's left but
this is still just a guess and I'd prefer to actually know.

I've made a hack for myself (attached diff against 9.4) which adds a
DETAIL-level message telling me which proc list was saturated. It's not
committable in its current form because of a C99 feature and perhaps for
other reasons.

By the way, I've also noticed that InitProcess() can segfault upon
hitting set_spins_per_delay(procglobal->spins_per_delay). This only
happens when I run REL9_4_STABLE under gdb, set a breakpoint on
InitProcess, see an autovacuum launcher hit the breakpoint and tell gdb
to continue. "p procglobal->spins_per_delay" says "Cannot access memory
at address 0xf01b2f90". Maybe this means nothing.
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index e608198..613b36a 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -279,6 +279,14 @@ InitProcess(void)
 {
 	/* use volatile pointer to prevent code rearrangement */
 	volatile PROC_HDR *procglobal = ProcGlobal;
+#define LIST_TYPE_ENTRY(e) [(e)] = (#e)
+	enum listType { autovac, bgworker, backend } which;
+	char const * const listTypeNames[] = {
+		LIST_TYPE_ENTRY(autovac),
+		LIST_TYPE_ENTRY(bgworker),
+		LIST_TYPE_ENTRY(backend)
+	};
+#undef LIST_TYPE_ENTRY
 
 	/*
 	 * ProcGlobal should be set up already (if we are a backend, we inherit
@@ -309,11 +317,11 @@ InitProcess(void)
 	set_spins_per_delay(procglobal->spins_per_delay);
 
 	if (IsAnyAutoVacuumProcess())
-		MyProc = procglobal->autovacFreeProcs;
+		which = autovac, MyProc = procglobal->autovacFreeProcs;
 	else if (IsBackgroundWorker)
-		MyProc = procglobal->bgworkerFreeProcs;
+		which = bgworker, MyProc = procglobal->bgworkerFreeProcs;
 	else
-		MyProc = procglobal->freeProcs;
+		which = backend, MyProc = procglobal->freeProcs;
 
 	if (MyProc != NULL)
 	{
@@ -330,13 +338,13 @@ InitProcess(void)
 		/*
 		 * If we reach here, all the PGPROCs are in use.  This is one of the
 		 * possible places to detect "too many backends", so give the standard
-		 * error message.  XXX do we need to give a different failure message
-		 * in the autovacuum case?
+		 * error message.
 		 */
 		SpinLockRelease(ProcStructLock);
 		ereport(FATAL,
 (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
- errmsg("sorry, too many clients already")));
+ errmsg("sorry, too many clients already"),
+ errdetail("%s proc list saturated", listTypeNames[which])));
 	}
	MyPgXact = &ProcGlobal->allPgXact[MyProc->pgprocno];
 



Re: [HACKERS] FYI: branch for v11 devel is planned for next week

2017-08-08 Thread Michael Paquier
On Mon, Aug 7, 2017 at 5:33 AM, Tom Lane  wrote:
> The release team discussed this a couple weeks ago, but I don't
> think anybody mentioned it on -hackers: v10 seems to be in good
> enough shape that it's okay to make the REL_10_STABLE branch soon,
> and open HEAD for v11 development.
>
> Last year we branched on Aug 15, and we should be able to keep
> to the same schedule.  Barring objections or bad bugs showing up
> soon, I'll make the branch early next week.

Thanks for letting us know in advance.
-- 
Michael




Re: [HACKERS] Subscription code improvements

2017-08-08 Thread Masahiko Sawada
Petr,

On Thu, Aug 3, 2017 at 5:24 AM, Stephen Frost  wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Alvaro Herrera  writes:
>> > I wish you'd stop splitting error message strings across multiple lines.
>> > I've been trapped by a faulty grep not matching a split error message a
>> > number of times :-(  I know by now to remove words until I get a match,
>> > but it seems an unnecessary trap for the unwary.
>>
>> Yeah, that's my number one reason for not splitting error messages, too.
>> It's particularly nasty if similar strings appear in multiple places and
>> they're not all split alike, as you can get misled into thinking that a
>> reported error must have occurred in a place you found, rather than
>> someplace you didn't.
>
> +1.
>

Are you planning to work on the remaining patches 0005 and 0006 that
improve the subscription code in the PG11 cycle? If not, I will take them
over and work on them in the next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] free space % calculation in pgstathashindex

2017-08-08 Thread Amit Kapila
On Mon, Aug 7, 2017 at 9:38 PM, Ashutosh Sharma  wrote:
> On Mon, Aug 7, 2017 at 7:19 PM, Amit Kapila  wrote:
>> On Mon, Aug 7, 2017 at 6:07 PM, Ashutosh Sharma  
>> wrote:
>>> Hi,
>>>
>> ..
..
>> Why an extra parenthesis in above case whereas not in below case?  I
>> think the code will look consistent if you follow the same coding
>> practice.  I suggest don't use it unless you need it.
>
> That is because in the 1st case, there are multiple operators (*, +)
> whereas in the 2nd case we have just one (*). So, just to ensure that
> '*' is performed before '+', I had used parentheses, though it is not
> required as '*' has higher precedence than '+'. I have removed the
> extra parentheses, and attached is the new version of the patch. Thanks.
>

Your latest patch looks good to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] WIP: Failover Slots

2017-08-08 Thread Craig Ringer
On 3 August 2017 at 04:35, Robert Haas  wrote:

> On Tue, Jul 25, 2017 at 8:44 PM, Craig Ringer 
> wrote:
> > No. The whole approach seems to have been bounced from core. I don't
> > agree and continue to think this functionality is desirable, but I don't
> > get to make that call.
>
> I actually think failover slots are quite desirable, especially now
> that we've got logical replication in core.  In a review of this
> thread I don't see anyone saying otherwise.  The debate has really
> been about the right way of implementing that.  Suppose we did
> something like this:
>
> - When a standby connects to a master, it can optionally supply a list
> of slot names that it cares about.
>

Wouldn't that immediately exclude use for PITR and snapshot recovery? I
have people right now who want the ability to promote a PITR-recovered
snapshot into place of a logical replication master and have downstream
peers replay from it. It's more complex than that, as there's a resync
process required to recover changes the failed node had sent to other peers
but that aren't available in the WAL archive, but that's the gist.

If you have a 5TB database do you want to run an extra replica or two
because PostgreSQL can't preserve slots without a running, live replica?
Your SAN snapshots + WAL archiving have been fine for everything else so
far.

Requiring live replication connections could also be an issue for service
interruptions, surely?  Unless you persist needed knowledge in the physical
replication slot used by the standby to master connection, so the master
can tell the difference between "downstream went away for while but will
come back" and "downstream is gone forever, toss out its resources."

That's exactly what the catalog_xmin hot_standby_feedback patches in Pg10
do, but they can only tell the master about the oldest resources needed by
any existing slot on the replica. Not which slots. And they have the same
issues with needing a live, running replica.

Also, what about cascading? Lots of "pull" model designs I've looked at
tend to fall down in cascaded environments. For that matter so do failover
slots, but only for the narrower restriction of not being able to actually
decode from a failover-enabled slot on a standby, they still work fine in
terms of cascading down to leaf nodes.

> - The master responds by periodically notifying the standby of changes
> to the slot contents using some new replication sub-protocol message.
> - The standby applies those updates to its local copies of the slots.
>

That's pretty much what I expect to have to do for clients to work on
unpatched Pg10, probably using a separate bgworker and normal libpq
connections to the upstream since we don't have hooks to extend the
walsender/walreceiver.

It can work now that the catalog_xmin hot_standby_feedback patches are in,
but it'd require some low-level slot state setting that I know Andres is
not a fan of. So I expect to carry on relying on an out-of-tree failover
slots patch for Pg 10.



> So, you could create a slot on a standby with an "uplink this" flag of
> some kind, and it would then try to keep it up to date using the
> method described above.  It's not quite clear to me how to handle the
> case where the corresponding slot doesn't exist on the master, or
> initially does but then it's later dropped, or it initially doesn't
> but it's later created.
>
> Thoughts?


Right. So the standby must be running and in active communication. It needs
some way to know the master has confirmed slot creation and it can rely on
the slot's resources really being reserved by the master. That turns out to
be quite hard, per the decoding on standby patches. There needs to be some
way to tell the master a standby has gone away forever and to drop its
dependent slots, so you're not stuck wondering "is slot xxyz from standby
abc that we lost in that crash?". Standbys need to cope with having created
a slot, only to find out there's a name collision with master.

For all those reasons, I just extended hot_standby_feedback to report
catalog_xmin separately to upstreams instead, so the existing physical slot
serves all these needs. And it's part of the picture, but there's no way to
get slot position change info from the master back down onto the replicas
so the replicas can advance any of their own slots and, via feedback, free
up master resources. That's where the bgworker hack to query
pg_replication_slots comes in. Seems complex, full of restrictions, and
fragile to me compared to just expecting the master to do it.

The only objection I personally understood and accepted re failover slots
was that it'd be impossible to create a failover slot on a standby and have
that standby "sub-tree" support failover to leaf nodes. Which is true, but
instead we have nothing and no viable-looking roadmap toward anything users
can benefit from. So I don't think that's the worst restriction in the
world.

I do not understand 

Re: [HACKERS] why not parallel seq scan for slow functions

2017-08-08 Thread Amit Kapila
On Wed, Aug 2, 2017 at 11:12 PM, Jeff Janes  wrote:
> On Wed, Jul 12, 2017 at 7:08 PM, Amit Kapila 
> wrote:
>>
>> On Wed, Jul 12, 2017 at 11:20 PM, Jeff Janes  wrote:
>> > On Tue, Jul 11, 2017 at 10:25 PM, Amit Kapila 
>> > wrote:
>> >>
>> >> On Wed, Jul 12, 2017 at 1:50 AM, Jeff Janes 
>> >> wrote:
>> >> > On Mon, Jul 10, 2017 at 9:51 PM, Dilip Kumar 
>> >> > wrote:
>> >> >>
>> >> >> So because of this high projection cost the seqpath and parallel
>> >> >> path
>> >> >> both have fuzzily same cost but seqpath is winning because it's
>> >> >> parallel safe.
>> >> >
>> >> >
>> >> > I think you are correct.  However, unless parallel_tuple_cost is set
>> >> > very
>> >> > low, apply_projection_to_path never gets called with the Gather path
>> >> > as
>> >> > an
>> >> > argument.  It gets ruled out at some earlier stage, presumably
>> >> > because
>> >> > it
>> >> > assumes the projection step cannot make it win if it is already
>> >> > behind
>> >> > by
>> >> > enough.
>> >> >
>> >>
>> >> I think that is genuine because tuple communication cost is very high.
>> >
>> >
>> > Sorry, I don't know which you think is genuine, the early pruning or my
>> > complaint about the early pruning.
>> >
>>
>> Early pruning.  See, currently, we don't have a way to maintain both
>> parallel and non-parallel paths till later stage and then decide which
>> one is better. If we want to maintain both parallel and non-parallel
>> paths, it can increase planning cost substantially in the case of
>> joins.  Now, surely it can have benefit in many cases, so it is a
>> worthwhile direction to pursue.
>
>
> If I understand it correctly, we have a way, it just can lead to exponential
> explosion problem, so we are afraid to use it, correct?  If I just
> lobotomize the path domination code (make pathnode.c line 466 always test
> false)
>
> if (JJ_all_paths==0 && costcmp != COSTS_DIFFERENT)
>
> Then it keeps the parallel plan and later chooses to use it (after applying
> your other patch in this thread) as the overall best plan.  It even doesn't
> slow down "make installcheck-parallel" by very much, which I guess just
> means the regression tests don't have a lot of complex joins.
>
> But what is an acceptable solution?  Is there a heuristic for when retaining
> a parallel path could be helpful, the same way there is for fast-start
> paths?  It seems like the best thing would be to include the evaluation
> costs in the first place at this step.
>
> Why is the path-cost domination code run before the cost of the function
> evaluation is included?

Because the function evaluation is part of the target list, and we create
the path target after the creation of base paths (see the call to
create_pathtarget @ planner.c:1696).

>  Is that because the information needed to compute
> it is not available at that point,

Right.

I see two ways to include the cost of the target list for parallel
paths before rejecting them: (a) don't reject parallel paths
(Gather/GatherMerge) during add_path; this has the danger of path
explosion. (b) In the case of parallel paths, somehow try to identify
that a path has a costly target list (maybe just check if the target
list has anything other than Vars) and use that as a heuristic to decide
whether a parallel path can be retained.

I think the preference will be to do something on the lines of
approach (b), but I am not sure whether we can easily do that.
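
A sketch of the kind of helper heuristic (b) would need (PathTarget, Var,
and the list macros are the real planner types; the function itself is
hypothetical):

static bool
has_expensive_targetlist(PathTarget *target)
{
	ListCell   *lc;

	foreach(lc, target->exprs)
	{
		if (!IsA(lfirst(lc), Var))
			return true;	/* function call, operator expression, etc. */
	}
	return false;
}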


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] Re: [GSOC][weekly report 9] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions

2017-08-08 Thread Mengxing Liu
> From: "Alvaro Herrera" 

> * I wonder why did you settle on a skip list as the best optimization
>   path for this.  Did you consider other data structures?  (We don't
>   seem to have much that would be usable in shared memory, I fear.)
>
 
There are three typical alternative data structures:
1) unordered linked list, with O(N) searching and O(1) inserting/removing.
2) tree-like data structures, with O(log(N)) inserting/searching/removing.
A skip list is just like a tree.
3) hash table, with O(1) inserting/searching, but the constant is much
bigger than for a linked list.
Any data structure can be classified into one of the categories above.

In the PG serializable module, the number of conflicts is much smaller than
we expected, which means N is small.
So we should be very careful about the constants in front of the Big O
notation. Sometimes the O(N) linked list is faster than the O(1) hash table.
For example, when N is smaller than 5, HTAB is slower than SHM_QUEUE when
searching for an element.
And in all cases, the remove operation of the hash table is slower than that
of the linked list, which would result in a more serious problem: more
contention on SerializableFinishedListLock.

A skip list is better than HTAB because its remove operation has O(1)
complexity. I don't think any other tree-like data structure would do better.

Shared memory is also the reason why I chose the hash table and skip list at
the beginning. It's quite difficult to develop a totally new data structure
in shared memory, while well-tested hash table and linked list code exists
already.

> * I wonder if a completely different approach to the finished xact
>   maintenance problem would be helpful.  For instance, in
>   ReleasePredicateLocks we call ClearOldPredicateLocks()
>   inconditionally, and there we grab the SerializableFinishedListLock
>   lock inconditionally for the whole duration of that routine, but
>   perhaps it would be better to skip the whole ClearOld stuff if the
>   SerializableFinishedListLock is not available?  Find out some way to
>   ensure that the cleanup is always called later on, but maybe skipping
>   it once in a while improves overall performance.
> 

Yes, that sounds nice. Actually, for any single backend it's OK to skip
ClearOldPredicateLocks, because other backends will do the cleanup later.
I'll try it.
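
A minimal sketch of that idea (assuming ClearOldPredicateLocks may simply
bail out when the lock is busy):

	/* hypothetical: skip cleanup if someone else holds the lock */
	if (!LWLockConditionalAcquire(SerializableFinishedListLock, LW_EXCLUSIVE))
		return;		/* some later caller will do the cleanup */
	/* ... scan the finished-transaction list and release locks ... */
	LWLockRelease(SerializableFinishedListLock);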

> * the whole predicate.c stuff is written using SHM_QUEUE.  I suppose
>   SHM_QUEUE works just fine, but predicate.c was being written at about
>   the same time (or a bit earlier) than the newer ilist.c interface was
>   being created, which I think had more optimization work thrown in.
>   Maybe it would be good for predicate.c to ditch use of SHM_QUEUE and
>   use ilist.c interfaces instead?  We could even think about being less
>   strict about holding exclusive lock on SerializableFinished for the
>   duration of ClearOldPredicateLocks, i.e. use only a share lock and
>   only exchange for exclusive if a list modification is needed.
> 

I read the code in ilist.c but didn't see much difference from SHM_QUEUE.
Anyway, I'll run a small test to compare the performance of these two data
structures.

> -- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sincerely


Mengxing Liu



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-08 Thread Fabien COELHO



> Attached patch

I'll look into it.

--
Fabien.




Re: [HACKERS] Adding hook in BufferSync for backup purposes

2017-08-08 Thread Tom Lane
Andrey Borodin  writes:
> On Aug 7, 2017, at 18:37, Tom Lane wrote:
>> Yeah.  Keep in mind that if the extension does anything at all that could
>> possibly throw an error, and if that error condition persists across
>> multiple tries, you will have broken the database completely: it will
>> be impossible to complete a checkpoint, and your WAL segment pool will
>> grow until it exhausts disk.  So the idea of doing something that involves
>> unspecified extension behavior, especially possible interaction with
>> an external backup agent, right there is pretty terrifying.

> I think that an API for extensions should tend to protect the developer from
> breaking everything, but may allow it with precautionary warnings in the docs
> and comments. Please let me know if this assumption is incorrect.

My point is not to claim that we mustn't put a hook there.  It's that what
such a hook could safely do is tightly constrained, and you've not offered
clear evidence that there's something useful to be done within those
constraints.  Alvaro seems to think that the result might be better
reached by hooking in at other places, and my gut reaction is similar.

regards, tom lane

