Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2018-12-05 Thread Alvaro Herrera
On 2018-Dec-06, Amit Langote wrote:

> The partitionwise join related
> changes in PG 11 moved the add_child_rel_equivalences call in
> set_append_rel_size such that child EC members would be added even before
> checking if the child rel is dummy, but for a reason named in the comment
> above the call:
> 
>... Even if this child is
>  * deemed dummy, it may fall on nullable side in a child-join, which
>  * in turn may participate in a MergeAppend, where we will need the
>  * EquivalenceClass data structures.
> 
> However, I think we can skip adding the dummy child EC members here  and
> instead make it a responsibility of partitionwise join code in joinrels.c
> to add the needed EC members.  Attached a patch to show what I mean, which
> passes the tests and gives this planning time:

Robert, Ashutosh, any comments on this?  I'm unfamiliar with the
partitionwise join code.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2018-12-05 Thread Alvaro Herrera
On 2018-Dec-06, Amit Langote wrote:

Hi

> [ Parallel SeqScan on precio_126 to precio_998  ]
> 
> >  ->  Parallel Seq Scan on precio_999 p_874  
> > (cost=0.00..27.50 rows=1 width=16)
> >Filter: ((fecha >= '1990-05-06 
> > 00:00:00'::timestamp without time zone) AND (fecha <= '1999-05-07 
> > 00:00:00'::timestamp without time zone) AND (pluid = 2))
> 
> As you can see from the "Filter: " property above, the baserestrictinfo of
> this Append's parent relation is:
> 
> BETWEEN '1990-05-06' AND '1999-05-07'
> 
> which selects partitions for all days from '1990-05-06' (precio_125) up to
> '1992-09-26' (precio_999).

Looking at my .psql_history, you're right -- I typoed 1990 as 1999 in
one of the clauses.  Thanks, mystery solved :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Facility for detecting insecure object naming

2018-12-05 Thread Noah Misch
On Thu, Aug 30, 2018 at 12:06:09AM -0700, Noah Misch wrote:
> On Sat, Aug 11, 2018 at 12:47:05PM -0700, Noah Misch wrote:
> > On Wed, Aug 08, 2018 at 09:58:38AM -0400, Tom Lane wrote:
> > > When the security team was discussing this issue before, we speculated
> > > about ideas like inventing a function trust mechanism, so that attacks
> > > based on search path manipulations would fail even if they managed to
> > > capture an operator reference.  I'd rather go down that path than
> > > encourage people to do more schema qualification.
> > 
> > Interesting.  If we got a function trust mechanism, how much qualification
> > would you then like?  Here are the levels I know about, along with their
> > implications:
> 
> Any preferences among these options and the fifth option I gave in
> https://postgr.es/m/20180815024429.ga3535...@rfd.leadboat.com?  I don't want
> to leave earthdistance broken.  So far, though, it seems no two people accept
> any one fix.

Ping.  I prefer (1) for most functions.  The loss of readability doesn't weigh
on me much, because contrib and pg_proc.dat don't have much SQL code.  I would
use (3) for ts_debug(), because that function is long, not
performance-critical, and uses only pg_catalog objects; there may be a few
other functions like that.  Implementing function trust would raise the
absolute attractiveness of (2), but I think I would continue to prefer (1).

If we can't agree on something here, (4) stands, and earthdistance functions
will continue to fail unless both the cube extension's schema and the
earthdistance extension's schema are in search_path.  That's bad, and I don't
want to prolong it.  I don't think implementing function trust or a lexical
search_path makes (4) cease to be bad.  Implementing both, however, would make
(4) non-bad.

> > -- (1) Use qualified references and exact match for all objects.
> > --
> > -- Always secure, even if schema usage does not conform to 
> > ddl-schemas-patterns
> > -- and function trust is disabled.
> > --
> > -- Subject to denial of service from anyone able to CREATE in cube schema or
> > -- earthdistance schema.
> > CREATE FUNCTION latitude(earth)
> > RETURNS float8
> > LANGUAGE SQL
> > IMMUTABLE STRICT
> > PARALLEL SAFE
> > AS $$SELECT CASE
> >   WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3)
> > OPERATOR(pg_catalog./)
> > @extschema@.earth() OPERATOR(pg_catalog.<) -1 THEN 
> > -90::pg_catalog.float8
> >   WHEN @cube_schema@.cube_ll_coord($1::@cube_schema@.cube, 3)
> > OPERATOR(pg_catalog./)
> > @extschema@.earth() OPERATOR(pg_catalog.>)  1 THEN  
> > 90::pg_catalog.float8
> >   ELSE pg_catalog.degrees(pg_catalog.asin(@cube_schema@.cube_ll_coord(
> > $1::@cube_schema@.cube, 3) OPERATOR(pg_catalog./) @extschema@.earth()))
> > END$$;
> > 
> > 
> > -- (2) Use qualified references for objects outside pg_catalog.
> > --
> > -- With function trust disabled, this would be subject to privilege 
> > escalation
> > -- from anyone able to CREATE in cube schema.
> > --
> > -- Subject to denial of service from anyone able to CREATE in cube schema or
> > -- earthdistance schema.
> > CREATE FUNCTION latitude(earth)
> > RETURNS float8
> > LANGUAGE SQL
> > IMMUTABLE STRICT
> > PARALLEL SAFE
> > AS $$SELECT CASE
> >   WHEN @cube_schema@.cube_ll_coord($1, 3)
> > /
> > @extschema@.earth() < -1 THEN -90::float8
> >   WHEN @cube_schema@.cube_ll_coord($1, 3)
> > /
> > @extschema@.earth() >  1 THEN  90::float8
> >   ELSE degrees(asin(@cube_schema@.cube_ll_coord($1, 3) / 
> > @extschema@.earth()))
> > END$$;
> > 
> > 
> > -- (3) "SET search_path" with today's code.
> > --
> > -- Security and reliability considerations are the same as (2).  Today, this
> > -- reduces performance by suppressing optimizations like inlining.
> > CREATE FUNCTION latitude(earth)
> > RETURNS float8
> > LANGUAGE SQL
> > IMMUTABLE STRICT
> > PARALLEL SAFE
> > SET search_path FROM CURRENT
> > AS $$SELECT CASE
> >   WHEN cube_ll_coord($1, 3)
> > /
> > earth() < -1 THEN -90::float8
> >   WHEN cube_ll_coord($1, 3)
> > /
> > earth() >  1 THEN  90::float8
> >   ELSE degrees(asin(cube_ll_coord($1, 3) / earth()))
> > END$$;
> > 
> > 
> > -- (4) Today's code (reformatted).
> > --
> > -- Always secure if schema usage conforms to ddl-schemas-patterns, even if
> > -- function trust is disabled.  If cube schema or earthdistance schema is 
> > not in
> > -- search_path, function doesn't work.
> > CREATE FUNCTION latitude(earth)
> > RETURNS float8
> > LANGUAGE SQL
> > IMMUTABLE STRICT
> > PARALLEL SAFE
> > AS $$SELECT CASE
> >   WHEN cube_ll_coord($1, 3)
> > /
> > earth() < -1 THEN -90::float8
> >   WHEN cube_ll_coord($1, 3)
> > /
> > earth() >  1 THEN  90::float8
> >   ELSE degrees(asin(cube_ll_coord($1, 3) / earth()))
> > END$$;
> 



Re: zheap: a new storage format for PostgreSQL

2018-12-05 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 8:08 AM, Amit Kapila wrote:

> On Thu, Dec 6, 2018 at 12:30 PM Pavel Stehule 
> wrote:
> >
> > On Thu, Dec 6, 2018 at 7:55 AM, Mithun Cy wrote:
> >>
> >> On Thu, Dec 6, 2018 at 11:13 AM Amit Kapila 
> wrote:
> >> >
> >> > On Thu, Dec 6, 2018 at 10:03 AM Pavel Stehule <
> pavel.steh...@gmail.com> wrote:
> >> > >
> >> > > On Thu, Dec 6, 2018 at 5:02 AM, Mithun Cy <mithun...@enterprisedb.com> wrote:
> >> > >>
> >> > >> COPY command seems to have improved very slightly with zheap in
> both with size of wal and execution time. I also did some tests with insert
> statement where I could see some regression in zheap when compared to heap
> with respect to execution time. With further more investigation I will
> reply here.
> >> > >>
> >> > >
> >> > > 20% of size reduction looks like effect of fill factor.
> >> > >
> >> >
> >> > I think it is because of smaller zheap tuple sizes.  Mithun can tell
> >> > more about setup whether he has used different fillfactor or anything
> >> > else which could lead to such a big difference.
> >>
> >> Yes default fillfactor is unaltered, zheap tuples sizes are less and
> >> aligned each at 2 Bytes
> >>
> >
> > I am sorry, I know zero about zheap - does zheap use fill factor? if
> yes, why?
> >
>
> Good question.  It is required because tuples can expand (Update tuple
> to bigger length).  In such cases, we try to perform in-place update
> if there is a space in the page.  So, having fillfactor can help.
>
>
Thank you for reply :)

Pavel


-- 
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>


Re: zheap: a new storage format for PostgreSQL

2018-12-05 Thread Amit Kapila
On Thu, Dec 6, 2018 at 12:30 PM Pavel Stehule  wrote:
>
> On Thu, Dec 6, 2018 at 7:55 AM, Mithun Cy wrote:
>>
>> On Thu, Dec 6, 2018 at 11:13 AM Amit Kapila  wrote:
>> >
>> > On Thu, Dec 6, 2018 at 10:03 AM Pavel Stehule  
>> > wrote:
>> > >
> >> > > On Thu, Dec 6, 2018 at 5:02 AM, Mithun Cy wrote:
>> > >>
>> > >> COPY command seems to have improved very slightly with zheap in both 
>> > >> with size of wal and execution time. I also did some tests with insert 
>> > >> statement where I could see some regression in zheap when compared to 
>> > >> heap with respect to execution time. With further more investigation I 
>> > >> will reply here.
>> > >>
>> > >
>> > > 20% of size reduction looks like effect of fill factor.
>> > >
>> >
>> > I think it is because of smaller zheap tuple sizes.  Mithun can tell
>> > more about setup whether he has used different fillfactor or anything
>> > else which could lead to such a big difference.
>>
>> Yes default fillfactor is unaltered, zheap tuples sizes are less and
>> aligned each at 2 Bytes
>>
>
> I am sorry, I know zero about zheap - does zheap use fill factor? if yes, why?
>

Good question.  It is required because tuples can expand (an UPDATE can
make a tuple longer).  In such cases, we try to perform an in-place update
if there is space on the page.  So, having a fillfactor can help.
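
As a minimal sketch of the idea (hypothetical table and values; the
storage_engine option is the prototype syntax used elsewhere in this thread):

-- Leave ~20% of every page free so that an UPDATE that makes a tuple
-- longer can still be applied in place on the same page.
CREATE TABLE accounts (
    id   int PRIMARY KEY,
    note text
) WITH (fillfactor = 80);   -- the default fillfactor is 100

-- The zheap prototype exposes the same knob, e.g.:
-- CREATE TABLE accounts_z (LIKE accounts) WITH (storage_engine = 'zheap', fillfactor = 80);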

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: zheap: a new storage format for PostgreSQL

2018-12-05 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 7:55 AM, Mithun Cy wrote:

> On Thu, Dec 6, 2018 at 11:13 AM Amit Kapila 
> wrote:
> >
> > On Thu, Dec 6, 2018 at 10:03 AM Pavel Stehule 
> wrote:
> > >
> > > On Thu, Dec 6, 2018 at 5:02 AM, Mithun Cy wrote:
> > >>
> > >> COPY command seems to have improved very slightly with zheap in both
> with size of wal and execution time. I also did some tests with insert
> statement where I could see some regression in zheap when compared to heap
> with respect to execution time. With further more investigation I will
> reply here.
> > >>
> > >
> > > 20% of size reduction looks like effect of fill factor.
> > >
> >
> > I think it is because of smaller zheap tuple sizes.  Mithun can tell
> > more about setup whether he has used different fillfactor or anything
> > else which could lead to such a big difference.
>
> Yes default fillfactor is unaltered, zheap tuples sizes are less and
> aligned each at 2 Bytes
>
>
I am sorry, I know zero about zheap - does zheap use fill factor? If yes,
why? I thought it made sense only for the current format.

Regards

Pavel


> Length of each item. (all Items are identical)
> ==============================================
> postgres=# SELECT lp_len FROM
> zheap_page_items(get_raw_page('pgbench_zheap', 9)) limit 1;
>  lp_len
> --------
>     102
> (1 row)
>
> postgres=# SELECT lp_len FROM
> heap_page_items(get_raw_page('pgbench_heap', 9)) limit 1;
>  lp_len
> --------
>     121
> (1 row)
>
> Total tuples per page
> =====================
> postgres=# SELECT count(*) FROM
> zheap_page_items(get_raw_page('pgbench_zheap', 9));
>  count
> -------
>     76
> (1 row)
>
> postgres=# SELECT count(*) FROM
> heap_page_items(get_raw_page('pgbench_heap', 9));
>  count
> -------
>     61
> (1 row)
>
> because of this zheap takes less space as reported above.
>

>
> --
> Thanks and Regards
> Mithun Chicklore Yogendra
> EnterpriseDB: http://www.enterprisedb.com
>


Re: minor leaks in pg_dump (PG tarball 10.6)

2018-12-05 Thread Pavel Raiskup
On Wednesday, December 5, 2018 4:59:18 PM CET Stephen Frost wrote:
> This change doesn't seem to make any sense to me..?  If anything, seems
> like we'd end up overallocating memory *after* this change, where we
> don't today (though an analyzer tool might complain because we don't
> free the memory from it and instead copy the pointer from each of these
> items into the tbinfo structure).

Correct, I hadn't thought that one through.  I was confused because some items
related to the dropped columns could be unreferenced.  But those are
anyway allocated as a solid block with the others (not intended to ever be
free()'d).  Feel free to ignore that.

Thanks for looking at this!
Pavel






Re: zheap: a new storage format for PostgreSQL

2018-12-05 Thread Mithun Cy
On Thu, Dec 6, 2018 at 11:13 AM Amit Kapila  wrote:
>
> On Thu, Dec 6, 2018 at 10:03 AM Pavel Stehule  wrote:
> >
> > On Thu, Dec 6, 2018 at 5:02 AM, Mithun Cy wrote:
> >>
> >> COPY command seems to have improved very slightly with zheap in both with 
> >> size of wal and execution time. I also did some tests with insert 
> >> statement where I could see some regression in zheap when compared to heap 
> >> with respect to execution time. With further more investigation I will 
> >> reply here.
> >>
> >
> > 20% of size reduction looks like effect of fill factor.
> >
>
> I think it is because of smaller zheap tuple sizes.  Mithun can tell
> more about setup whether he has used different fillfactor or anything
> else which could lead to such a big difference.

Yes, the default fillfactor is unaltered; zheap tuple sizes are smaller, and
each is aligned at 2 bytes.

Length of each item. (all Items are identical)
==============================================
postgres=# SELECT lp_len FROM
zheap_page_items(get_raw_page('pgbench_zheap', 9)) limit 1;
 lp_len
--------
    102
(1 row)

postgres=# SELECT lp_len FROM
heap_page_items(get_raw_page('pgbench_heap', 9)) limit 1;
 lp_len
--------
    121
(1 row)

Total tuples per page
=====================
postgres=# SELECT count(*) FROM
zheap_page_items(get_raw_page('pgbench_zheap', 9));
 count
-------
    76
(1 row)

postgres=# SELECT count(*) FROM
heap_page_items(get_raw_page('pgbench_heap', 9));
 count
-------
    61
(1 row)

Because of this, zheap takes less space, as reported above.
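
For a quick comparison at the table level, the standard size functions give
the same picture (same table names as above):

SELECT pg_size_pretty(pg_relation_size('pgbench_heap'))  AS heap_size,
       pg_size_pretty(pg_relation_size('pgbench_zheap')) AS zheap_size;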


-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2018-12-05 Thread Peter Geoghegan
On Wed, Dec 5, 2018 at 10:35 PM Andrey Lepikhov
 wrote:
> This solution changes pg_depend relation for solve a problem, which
> exists only in regression tests.  Very rarely it can be in the
> partitioning cases. Or is it not?

I don't think it's a matter of how rarely this will happen. We're
trying to avoid these diagnostic message changes because they are
wrong. I don't think that there is much ambiguity about that.  Still,
it will happen however often the user drops objects belonging to
partition children, which could be quite often.

> I think this decision is some excessive.
> May be you consider another approach:
> 1. Order of dependencies in 'DROP ... CASCADE' case is a problem of test
> tools, not DBMS. And here we can use 'verbose terse'.
> 2. Print all dependencies in findDependentObjects() on a drop error (see
> attachment as a prototype).

You didn't include changes to the regression test output, which seems
like a big oversight, given that this has a lot to do with diagnostic
messages that are represented in the regression tests. Anyway, I don't
think it's acceptable to change the messages like this. It makes them
much less useful.

These stability issues keep coming up, which makes a comprehensive
approach seem attractive to me. At least 95% of the test instability
comes from pg_depend.
-- 
Peter Geoghegan



Re: Fixing findDependentObjects()'s dependency on scan order (regressions in DROP diagnostic messages)

2018-12-05 Thread Andrey Lepikhov

On 14.11.2018 11:28, Peter Geoghegan wrote:

We're already relying on the scan order being in reverse chronological
order, so we might as well formalize the dependency. I don't think
that it's possible to sort the pg_depend entries as a way of fixing
the breakage while avoiding storing this extra information -- what is
there to sort on that's there already? You'd have to infer a whole
bunch of things about the object types associated with pg_depend
entries to do that, and teach dependency.c about its callers. That
seems pretty brittle to me.


This solution changes the pg_depend relation to solve a problem which
exists only in the regression tests.  It can arise only very rarely in the
partitioning cases, or is that not so?

I think this decision is somewhat excessive.
Maybe you could consider another approach:
1. The order of dependencies in the 'DROP ... CASCADE' case is a problem of the
test tools, not the DBMS. And here we can use 'verbose terse'.
2. Print all dependencies in findDependentObjects() on a drop error (see
the attachment as a prototype).


--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From e6056363889a00699fcb6ef6ef8ce9bc3e007d7e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Thu, 6 Dec 2018 11:09:17 +0500
Subject: [PATCH] Print All dependencies on error

---
 src/backend/catalog/dependency.c | 59 
 1 file changed, 52 insertions(+), 7 deletions(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 7dfa3278a5..e84143af2a 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -595,7 +595,9 @@ findDependentObjects(const ObjectAddress *object,
  */
 if (stack == NULL)
 {
-	char	   *otherObjDesc;
+	char		*emsg = NULL;
+	int			emsgbufsize = 1024;
+	char		*depObjDesc;
 
 	if (pendingObjects &&
		object_address_present(&otherObject, pendingObjects))
@@ -605,14 +607,57 @@ findDependentObjects(const ObjectAddress *object,
 		ReleaseDeletionLock(object);
 		return;
 	}
-	otherObjDesc = getObjectDescription(&otherObject);
+
+	do
+	{
+		ObjectAddress depObject;
+		Form_pg_depend foundDep = (Form_pg_depend) GETSTRUCT(tup);
+
+		depObject.classId = foundDep->refclassid;
+		depObject.objectId = foundDep->refobjid;
+		depObject.objectSubId = foundDep->refobjsubid;
+
+		if (object_address_present(&depObject, pendingObjects))
+			continue;
+
+		if ((foundDep->deptype == DEPENDENCY_EXTENSION) &&
+			((flags & PERFORM_DELETION_SKIP_EXTENSIONS) ||
+			 (creating_extension &&
+			  depObject.classId == ExtensionRelationId &&
+			  depObject.objectId == CurrentExtensionObject))
+		   )
+			continue;
+
+		if ((foundDep->deptype != DEPENDENCY_INTERNAL) &&
+			(foundDep->deptype != DEPENDENCY_INTERNAL_AUTO)
+			)
+			continue;
+
+		depObjDesc = getObjectDescription(&depObject);
+		if (emsg == NULL)
+			emsg = palloc0(emsgbufsize);
+		else
+			strcat(emsg, " or ");
+
+		if (strlen(depObjDesc) > emsgbufsize - strlen(emsg))
+		{
+			char	*tmpstr;
+
+			emsgbufsize += strlen(depObjDesc) + 4;
+			tmpstr = palloc0(emsgbufsize);
+			memcpy(tmpstr, emsg, strlen(emsg));
+			pfree(emsg);
+			emsg = tmpstr;
+		}
+		strcat(emsg, depObjDesc);
+	} while (HeapTupleIsValid(tup = systable_getnext(scan)));
+
 	ereport(ERROR,
 			(errcode(ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST),
-			 errmsg("cannot drop %s because %s requires it",
-	getObjectDescription(object),
-	otherObjDesc),
-			 errhint("You can drop %s instead.",
-	 otherObjDesc)));
+			 errmsg("cannot drop %s because another object requires it",
+	getObjectDescription(object)),
+			 errhint("You can try to drop one of the following: %s.",
+	 emsg)));
 }
 
 /*
-- 
2.17.1



Re: Minor typo

2018-12-05 Thread Kyotaro HORIGUCHI
At Tue, 4 Dec 2018 11:37:16 -0500, Stephen Frost  wrote in 
<20181204163716.gr3...@tamriel.snowman.net>
> Thanks to everyone for sharing their thoughts here, the goal is simply
> to try and have the comments as clear as we can for everyone.
> 
> Please find attached a larger rewording of the comment in xlogrecord.h,
> and a minor change to xloginsert.c to clarify that we're removing a hole
> and not doing compression at that point.
> 
> Other thoughts on this..?

Thank you for the complete rewriting. It makes the description
clearer, at least for me, except for the following:

>  * present is BLCKSZ - the length of "hole" bytes.

Maybe it's because I'm not so accustomed to punctuation marks, but
I was confused for a second because the '-' didn't look to me like a
minus sign, but a dash. If it is not specific to me, the word
'minus' seems better, or "(BLCKSZ - <the length of "hole">) bytes"
is clearer for me.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Too many logs are written on Windows (LOG: could not reserve shared memory region (addr=%p) for child %p:)

2018-12-05 Thread Takahashi, Ryohei
Hi,


I found the reason for the message.

My customer uses "F-secure" antivirus software.
There are several pages that indicate F-secure causes this message such as [1].
I told my customer to stop F-secure and report to the vendor.


Anyway, I think this message is not helpful to administrators and should be
at a lower level, such as "DEBUG1".


[1] - 
https://www.postgresql.org/message-id/flat/feb75b80-aa4e-0b45-fbe7-0c2335904fcc%40evolu-s.it


Regards,
Ryohei Takahashi



Re: Limitting full join to one match

2018-12-05 Thread Sergei Agalakov

On 12/5/2018 8:30 PM, John W Higgins wrote:



On Wed, Dec 5, 2018 at 4:34 PM Phil Endecott 
> wrote:


Dear Experts,

I have a couple of tables that I want to reconcile, finding rows
that match and places where rows are missing from one table or the
other:

...

So my question is: how can I modify my query to output only two rows,
like this:?

+------------+--------+------------+--------+
|    date    | amount |    date    | amount |
+------------+--------+------------+--------+
| 2018-01-01 |  10.00 | 2018-01-01 |  10.00 |
| 2018-02-01 |   5.00 |            |        |
|            |        | 2018-03-01 |   8.00 |
| 2018-04-01 |   5.00 | 2018-04-01 |   5.00 |
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  1
| 2018-05-01 |  20.00 | 2018-05-01 |  20.00 |  2
+------------+--------+------------+--------+


Evening Phil,

Window functions are your friend here. I prefer views for this stuff - 
but subqueries would work just fine.


create view a_rows as
  (select *, row_number() OVER (PARTITION BY date, amount) AS pos from a);

create view b_rows as
  (select *, row_number() OVER (PARTITION BY date, amount) AS pos from b);


select
  a_rows.date,
  a_rows.amount,
  a_rows.pos,
  b_rows.date,
  b_rows.amount,
  b_rows.pos
from
  a_rows full join b_rows using (date,amount,pos);

Example here - http://sqlfiddle.com/#!17/305d6/3

John


Any suggestions anyone?


The best I have found so far is something involving EXCEPT ALL:

db=> select * from a except all select * from b;
db=> select * from b except all select * from a;

That's not ideal, though, as what I ultimately want is something
that lists everything with its status:

+------------+--------+--------+
|    date    | amount | status |
+------------+--------+--------+
| 2018-01-01 |  10.00 |   OK   |
| 2018-02-01 |   5.00 | a_only |
| 2018-03-01 |   8.00 | b_only |
| 2018-04-01 |   5.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
| 2018-05-01 |  20.00 |   OK   |
+------------+--------+--------+

That would be easy enough to achieve from the JOIN.


Thanks, Phil.



This question is asked from time to time.
I have found an old article with what is so far the best solution for big tables.
https://asktom.oracle.com/pls/asktom/f?p=100:11:P11_QUESTION_ID:2151582681236#15393095283923

On the same test data
create table a (date date, amount money);
create table b (date date, amount money);

insert into a values ('2018-01-01', 10);
insert into a values ('2018-02-01', 5);
insert into a values ('2018-04-01', 5);
insert into a values ('2018-05-01', 20);
insert into a values ('2018-05-01', 20);
insert into b values ('2018-01-01', 10);
insert into b values ('2018-03-01', 8);
insert into b values ('2018-04-01', 5);
insert into b values ('2018-05-01', 20);
insert into b values ('2018-05-01', 20);

select tt.date,
   tt.amount,
   count(tt.src1) CNT1,
   count(tt.src2) CNT2
 from
(
 select a.date,
    a.amount,
    1 src1,
    null::integer src2
   from a
 union all
 select b.date,
    b.amount,
    null::integer src1,
    2 src2
  from b
  ) tt
group by tt.date, tt.amount;

date          amount    cnt1    cnt2
2018-01-01    $10.00    1       1
2018-02-01    $5.00     1       0
2018-03-01    $8.00     0       1
2018-04-01    $5.00     1       1
2018-05-01    $20.00    2       2

It requires a sort, so you may want to increase work_mem before
execution, and then set it back afterwards, like

SET work_mem = '512MB';
...  run your query
RESET work_mem;
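
To get the status labels Phil asked for, the same grouped query can be
wrapped once more (a sketch using the cnt1/cnt2 columns above; note it
produces one row per date/amount group):

select date,
       amount,
       case when cnt1 > 0 and cnt2 > 0 then 'OK'
            when cnt1 > 0              then 'a_only'
            else                            'b_only'
       end as status
from (
  select tt.date,
         tt.amount,
         count(tt.src1) cnt1,
         count(tt.src2) cnt2
    from (select a.date, a.amount, 1 src1, null::integer src2 from a
          union all
          select b.date, b.amount, null::integer src1, 2 src2 from b) tt
   group by tt.date, tt.amount
) s
order by date;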

Regards,

Sergei Agalakov


Re: zheap: a new storage format for PostgreSQL

2018-12-05 Thread Amit Kapila
On Thu, Dec 6, 2018 at 10:03 AM Pavel Stehule  wrote:
>
> On Thu, Dec 6, 2018 at 5:02 AM, Mithun Cy wrote:
>>
>> COPY command seems to have improved very slightly with zheap in both with 
>> size of wal and execution time. I also did some tests with insert statement 
>> where I could see some regression in zheap when compared to heap with 
>> respect to execution time. With further more investigation I will reply here.
>>
>
> 20% of size reduction looks like effect of fill factor.
>

I think it is because of smaller zheap tuple sizes.  Mithun can tell
more about setup whether he has used different fillfactor or anything
else which could lead to such a big difference.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-12-05 Thread Michael Paquier
On Thu, Dec 06, 2018 at 01:55:46PM +0900, Kyotaro HORIGUCHI wrote:
> durable_unlink has two modes of failure: failure to unlink and failure to
> fsync. Once it fails at the fsync stage, subsequent
> durable_unlink calls for the same file always fail to unlink with
> ENOENT.  If durable_unlink is intended to be called repeatedly on
> failure, perhaps it should return a different code for each
> failure so that the caller can identify what to do next.

Why?  A WARNING would be logged if the first unlink() fails, and
another, different WARNING would be logged if the subsequent fsync
fails.  It looks enough to me to make a distinction between both.  Now,
you may have a point in the fact that we could also live with only using
unlink() for this code path, as even on repetitive crashes this would
take care of removing orphan archive status files consistently.
--
Michael




RE: Timeout parameters

2018-12-05 Thread Nagaura, Ryohei
Hi, Fabien.

Thank you for your review.
And I'm very sorry to have kept you waiting so long.


About "socket_timeout"

> I'm generally fine with giving more access to low-level parameters to users.
> However, I'm not sure I understand the use case you have that needs these
> new extensions.
If you face the following situation, this parameter will be needed.
1. The connection between the server and the client has been established 
normally.
2. A server process has received an SQL statement.
3. The server OS can return an ACK packet, but it takes a long time to execute the SQL
   statement or return the result because the server process is very busy.
4. The client wants to close the connection while leaving the job to the server.
In this case, "statement_timeout" can't satisfy the requirement at step 4.
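
For contrast, the existing "statement_timeout" cancels the statement on the
server side instead of letting the client simply abandon the connection; a
minimal illustration:

SET statement_timeout = '2s';
SELECT pg_sleep(10);   -- the server cancels this statement after 2 seconds
RESET statement_timeout;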

> I think that there is some kind of a misnomer: this is not a socket-level
> timeout, but a client-side query timeout, so it should be named differently?
Yes, I think so.

> I'm not sure how to name it, though.
Me too.

> I think that the way it works is a little extreme, basically the connection
> is aborted from within pqWait, and then restarted from scratch.
>
> There is no clear way to know about the value of the setting (SHOW, \set,
> \pset...). Ok, this is already the case of other connection parameters.
If this parameter is indeed needed, I would like to discuss the design and
optional functions.
What do you think?
I'll correct the "socket_timeout" patch after that.


About "TCP_USER_TIMEOUT"
I fixed it based on the previous feedback.
Would you review, please?

> There are no tests.
Here are the test methods for TCP_USER_TIMEOUT.

Test of client-side TCP_USER_TIMEOUT:
[client operation]
1. Connect DB server.
postgres=# psql 
postgresql://USERNAME:PASSWORD@hostname:port/dbname?tcp_user_timeout=15000
2. Get the port number by the following command:
postgres=# select inet_client_port();
3. Close the client port from the other console of the client machine.
   Replace "56750" with the port number confirmed at step 2.
$ iptables -I INPUT -p tcp --dport 56750 -j DROP
4. Query the following SQL:
postgres=# select pg_sleep(10);
5. TCP USER TIMEOUT works correctly if an error message is output to the 
console.

Test of server-side TCP_USER_TIMEOUT:
[client operation]
1. Connect DB server.
2. Get the port number by the following command:
postgres=# select inet_client_port();
3. Set the TCP_USER_TIMEOUT by the following command:
postgres=# set tcp_user_timeout=15000;
4. Query the following SQL:
postgres=# select pg_sleep(10);
5. Close the client port from the other console.
   Replace "56750" with the port number confirmed at step 2.
$ iptables -I INPUT -p tcp --dport 56750 -j DROP
[server operation]
6. Verify the logfile.

> There is no documentation.
I made a documentation patch for TCP_USER_TIMEOUT.

Best regards,
-
Ryohei Nagaura



document.patch
Description: document.patch


TCP_backend.patch
Description: TCP_backend.patch


TCP_interface.patch
Description: TCP_interface.patch


Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2018-12-05 Thread Amit Langote
Hi,

(Re-sending after adding -hackers, sorry for the noise to those who would
receive this twice)

On 2018/12/05 6:55, Alvaro Herrera wrote:
> I noticed another interesting thing, which is that if I modify the query
> to actually reference some partition that I do have (as opposed to the
> above, which just takes 30s to prune everything) the plan is mighty
> curious ... if only because in one of the Append nodes, partitions have
> not been pruned as they should.
> 
> So, at least two bugs here,
> 1. the equivalence-class related slowness,
> 2. the lack of pruning
> 
>   
>  QUERY PLAN   
>  
> ─
>  Hash Join  (cost=1159.13..25423.65 rows=1 width=24)
>Hash Cond: (abs((p.plusalesprice - p_875.plusalesprice)) = 
> (max(abs((p_877.plusalesprice - p_879.plusalesprice)
>->  Nested Loop  (cost=1000.00..25264.52 rows=1 width=20)
>  Join Filter: ((p.loccd = p_875.loccd) AND (p.fecha = p_875.fecha))
>  ->  Gather  (cost=1000.00..25154.38 rows=875 width=16)
>Workers Planned: 2
>->  Parallel Append  (cost=0.00..24066.88 rows=875 width=16)
>  ->  Parallel Seq Scan on precio_125 p  (cost=0.00..27.50 
> rows=1 width=16)
>Filter: ((fecha >= '1990-05-06 
> 00:00:00'::timestamp without time zone) AND (fecha <= '1999-05-07 
> 00:00:00'::timestamp without time zone) AND (pluid = 2))

[ Parallel SeqScan on precio_126 to precio_998  ]

>  ->  Parallel Seq Scan on precio_999 p_874  
> (cost=0.00..27.50 rows=1 width=16)
>Filter: ((fecha >= '1990-05-06 
> 00:00:00'::timestamp without time zone) AND (fecha <= '1999-05-07 
> 00:00:00'::timestamp without time zone) AND (pluid = 2))

As you can see from the "Filter: " property above, the baserestrictinfo of
this Append's parent relation is:

BETWEEN '1990-05-06' AND '1999-05-07'

which selects partitions for all days from '1990-05-06' (precio_125) up to
'1992-09-26' (precio_999).

>  ->  Materialize  (cost=0.00..79.52 rows=2 width=16)
>->  Append  (cost=0.00..79.51 rows=2 width=16)
>  ->  Seq Scan on precio_125 p_875  (cost=0.00..39.75 
> rows=1 width=16)
>Filter: ((fecha >= '1990-05-06 
> 00:00:00'::timestamp without time zone) AND (fecha <= '1990-05-07 
> 00:00:00'::timestamp without time zone) AND (pluid = 2))
>  ->  Seq Scan on precio_126 p_876  (cost=0.00..39.75 
> rows=1 width=16)
>Filter: ((fecha >= '1990-05-06 
> 00:00:00'::timestamp without time zone) AND (fecha <= '1990-05-07 
> 00:00:00'::timestamp without time zone) AND (pluid = 2))

Whereas for this Append, it is BETWEEN '1990-05-06' AND '1990-05-07'.

>->  Hash  (cost=159.12..159.12 rows=1 width=4)
>  ->  Aggregate  (cost=159.10..159.11 rows=1 width=4)
>->  Nested Loop  (cost=0.00..159.10 rows=1 width=8)
>  Join Filter: ((p_877.loccd = p_879.loccd) AND 
> (p_877.fecha = p_879.fecha))
>  ->  Append  (cost=0.00..79.51 rows=2 width=16)
>->  Seq Scan on precio_125 p_877  
> (cost=0.00..39.75 rows=1 width=16)
>  Filter: ((fecha >= '1990-05-06 
> 00:00:00'::timestamp without time zone) AND (fecha <= '1990-05-07 
> 00:00:00'::timestamp without time zone) AND (pluid = 2))
>->  Seq Scan on precio_126 p_878  
> (cost=0.00..39.75 rows=1 width=16)
>  Filter: ((fecha >= '1990-05-06 
> 00:00:00'::timestamp without time zone) AND (fecha <= '1990-05-07 
> 00:00:00'::timestamp without time zone) AND (pluid = 2))
>  ->  Materialize  (cost=0.00..79.52 rows=2 width=16)
>->  Append  (cost=0.00..79.51 rows=2 width=16)
>  ->  Seq Scan on precio_125 p_879  
> (cost=0.00..39.75 rows=1 width=16)
>Filter: ((fecha >= '1990-05-06 
> 00:00:00'::timestamp without time zone) AND (fecha <= '1990-05-07 
> 00:00:00'::timestamp without time zone) AND (pluid = 2))
>  ->  Seq Scan on precio_126 p_880  
> (cost=0.00..39.75 rows=1 width=16)
>Filter: ((fecha >= '1990-05-06 
> 00:00:00'::timestamp without time zone) AND (fecha <= '1990-05-07 
> 00:00:00'::timestamp without time zone) AND (pluid = 2))

And also for these two Appends.

So, I don't think there's anything funny going on with pruning here, maybe
just a typo in the query (1999 looks very much like 1990).
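
To double-check, with the intended one-day range the planner prunes down to
just the two matching partitions (a quick sketch against the precio table
defined upthread):

EXPLAIN (COSTS OFF)
SELECT * FROM precio
WHERE fecha BETWEEN '1990-05-06' AND '1990-05-07' AND pluid = 2;
-- only precio_125 and precio_126 should remain under the Append node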

Re: Use durable_unlink for .ready and .done files for WAL segment removal

2018-12-05 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 5 Dec 2018 16:11:23 +, "Bossart, Nathan"  
wrote in 
> The v4 patch looks good to me!

durable_unlink has two modes of failure: failure to unlink and failure to
fsync. Once it fails at the fsync stage, subsequent
durable_unlink calls for the same file always fail to unlink with
ENOENT.  If durable_unlink is intended to be called repeatedly on
failure, perhaps it should return a different code for each
failure so that the caller can identify what to do next.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: error message when subscription target is a partitioned table

2018-12-05 Thread Amit Langote
On 2018/12/06 13:19, Michael Paquier wrote:
> On Thu, Dec 06, 2018 at 11:34:19AM +0900, Amit Langote wrote:
>> Adding to January CF.
> 
> Okay, that looks good to me based on your arguments upthread.

Thanks for looking.

> A
> small-ish comment I have is that you could use a set of if/else if
> conditions instead of separate ifs.

Okay, no problem.  Updated patch attached.

Thanks,
Amit
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 5bd3bbc35e..3ff741e684 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -608,6 +608,20 @@ void
 CheckSubscriptionRelkind(char relkind, const char *nspname,
 const char *relname)
 {
+   /* Give more specific error for partitioned and foreign tables. */
+   if (relkind == RELKIND_PARTITIONED_TABLE)
+   ereport(ERROR,
+   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+errmsg("\"%s.%s\" is a partitioned table",
+   nspname, relname),
+errdetail("Partitioned tables are not 
supported as logical replication targets.")));
+   else if (relkind == RELKIND_FOREIGN_TABLE)
+   ereport(ERROR,
+   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+errmsg("\"%s.%s\" is a foreign table",
+   nspname, relname),
+errdetail("Foreign tables are not supported as 
logical replication targets.")));
+
/*
 * We currently only support writing to regular tables.
 */
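
For reference, the user-visible effect on the subscriber side looks roughly
like this (hypothetical object names; the message texts are the ones added by
the patch above):

CREATE TABLE public.t (a int) PARTITION BY LIST (a);
-- when logical replication attempts to write into it, the check reports:
--   ERROR:  "public.t" is a partitioned table
--   DETAIL:  Partitioned tables are not supported as logical replication targets.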


Re: Hint and detail punctuation

2018-12-05 Thread Michael Paquier
On Wed, Dec 05, 2018 at 05:22:25PM +0100, Daniel Gustafsson wrote:
> While looking at error messages downstream, I noticed a few hints and details
> in postgres which aren’t punctuated as per the style guide.  The attached 
> patch
> fixes the ones where it seemed reasonable to end with a period.

Good point.  I am spotting a couple more places:
src/backend/utils/misc/guc.c:
GUC_check_errdetail("effective_io_concurrency must be set to 0 on
platforms that lack posix_fadvise()");
src/backend/utils/misc/guc.c:
GUC_check_errdetail("recovery_target_timeline is not a valid number");
src/backend/utils/misc/guc.c:
GUC_check_errdetail("recovery_target_name is too long (maximum %d
characters)",

Thanks,
--
Michael




Re: zheap: a new storage format for PostgreSQL

2018-12-05 Thread Pavel Stehule
On Thu, Dec 6, 2018 at 5:02 AM, Mithun Cy wrote:

> > On Thu, Mar 1, 2018 at 7:39 PM Amit Kapila 
> wrote:
>
> I did some testing for performance of COPY command for zheap against heap,
> here are my results,
> Machine : cthulhu, (is a 8 node numa machine with 500GB of RAM)
> server non default settings: shared buffers 32GB, max_wal_size = 20GB,
> min_wal_size = 15GB
>
> Test tables and data:
> 
> I have used pgbench_accounts table of pgbench tool as data source with 3
> different scale factors 100, 1000, 2000. Both heap and zheap table is
> lookalike of pgbench_accounts
>
> CREATE TABLE pgbench_zheap (LIKE pgbench_accounts) WITH
> (storage_engine='zheap');
> CREATE TABLE pgbench_heap (LIKE pgbench_accounts) WITH
> (storage_engine='heap');
>
> Test Commands:
> Command to generate datafile: COPY pgbench_accounts TO '/mnt/data-mag/
> mithun.cy/zheapperfbin/bin/pgbench.data';
>
> Command to load from datafile:
> COPY pgbench_heap FROM '/mnt/data-mag/
> mithun.cy/zheapperfbin/bin/pgbench.data'; -- heap table
> COPY pgbench_zheap FROM '/mnt/data-mag/
> mithun.cy/zheapperfbin/bin/pgbench.data'; -- zheap table
>
> Results
> ==
>
> Scale factor : 100
> 
> zheap table size : 1028 MB
> heap table size: 1281 MB
> -- table size reduction: 19% size reduction.
> zheap wal size: 1007 MB
> heap wal size: 1024 MB
> -- wal size difference: 1.6% size reduction.
> zheap COPY  execution time: 24869.451 ms
> heap COPY  execution time: 25858.773 ms
> -- % of improvement -- 3.8% reduction in execution time for zheap
>
> Scale factor : 1000
> -
> zheap table size : 10 GB
> heap table size: 13 GB
> -- table size reduction: 23% size reduction.
> zheap wal size: 10071 MB
> heap wal size: 10243 MB
> -- wal size difference: 1.67% size reduction.
> zheap COPY  execution time: 270790.235 ms
> heap COPY  execution time:  280325.632 ms
> -- % of improvement -- 3.4% reduction in execution time for zheap
>
> Scale factor : 2000
> -
> zheap table size : 20GB
> heap table size: 25GB
> -- table size reduction: 20% size reduction.
> zheap wal size: 20142 MB
> heap wal size: 20499 MB
> -- wal size difference: 1.7% size reduction.
> zheap COPY  execution time: 523702.904 ms
> heap COPY  execution time: 537537.720 ms
> -- % of improvement -- 2.5 % reduction in execution time for zheap
>
>
> COPY command seems to have improved very slightly with zheap in both with
> size of wal and execution time. I also did some tests with insert statement
> where I could see some regression in zheap when compared to heap with
> respect to execution time. With further more investigation I will reply
> here.
>
>
A 20% size reduction looks like an effect of the fill factor.

Regards

Pavel

-- 
> Thanks and Regards
> Mithun Chicklore Yogendra
> EnterpriseDB: http://www.enterprisedb.com
>


Re: error message when subscription target is a partitioned table

2018-12-05 Thread Michael Paquier
On Thu, Dec 06, 2018 at 11:34:19AM +0900, Amit Langote wrote:
> Adding to January CF.

Okay, that looks good to me based on your arguments upthread.  A
small-ish comment I have is that you could use a set of if/else if
conditions instead of separate ifs.
--
Michael




Re: zheap: a new storage format for PostgreSQL

2018-12-05 Thread Mithun Cy
> On Thu, Mar 1, 2018 at 7:39 PM Amit Kapila 
wrote:

I did some testing for performance of COPY command for zheap against heap,
here are my results,
Machine: cthulhu (an 8-node NUMA machine with 500GB of RAM)
server non default settings: shared buffers 32GB, max_wal_size = 20GB,
min_wal_size = 15GB

Test tables and data:

I have used the pgbench_accounts table of the pgbench tool as the data source with 3
different scale factors: 100, 1000, and 2000. Both the heap and zheap tables are
lookalikes of pgbench_accounts.

CREATE TABLE pgbench_zheap (LIKE pgbench_accounts) WITH
(storage_engine='zheap');
CREATE TABLE pgbench_heap (LIKE pgbench_accounts) WITH
(storage_engine='heap');

Test Commands:
Command to generate datafile: COPY pgbench_accounts TO '/mnt/data-mag/
mithun.cy/zheapperfbin/bin/pgbench.data';

Command to load from datafile:
COPY pgbench_heap FROM '/mnt/data-mag/
mithun.cy/zheapperfbin/bin/pgbench.data'; -- heap table
COPY pgbench_zheap FROM '/mnt/data-mag/
mithun.cy/zheapperfbin/bin/pgbench.data'; -- zheap table

Results
==

Scale factor : 100

zheap table size : 1028 MB
heap table size: 1281 MB
-- table size reduction: 19% size reduction.
zheap wal size: 1007 MB
heap wal size: 1024 MB
-- wal size difference: 1.6% size reduction.
zheap COPY  execution time: 24869.451 ms
heap COPY  execution time: 25858.773 ms
-- % of improvement -- 3.8% reduction in execution time for zheap

Scale factor : 1000
-
zheap table size : 10 GB
heap table size: 13 GB
-- table size reduction: 23% size reduction.
zheap wal size: 10071 MB
heap wal size: 10243 MB
-- wal size difference: 1.67% size reduction.
zheap COPY  execution time: 270790.235 ms
heap COPY  execution time:  280325.632 ms
-- % of improvement -- 3.4% reduction in execution time for zheap

Scale factor : 2000
-
zheap table size : 20GB
heap table size: 25GB
-- table size reduction: 20% size reduction.
zheap wal size: 20142 MB
heap wal size: 20499 MB
-- wal size difference: 1.7% size reduction.
zheap COPY  execution time: 523702.904 ms
heap COPY  execution time: 537537.720 ms
-- % of improvement -- 2.5 % reduction in execution time for zheap


The COPY command seems to have improved very slightly with zheap, both in
WAL size and execution time. I also did some tests with the INSERT statement,
where I could see some regression in zheap when compared to heap with
respect to execution time. I will reply here after further investigation.
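
One way to reproduce the WAL numbers from a psql session is to diff the
insert LSN around each COPY, using only standard functions, e.g.:

SELECT pg_current_wal_insert_lsn() AS before_lsn \gset
COPY pgbench_heap FROM '/mnt/data-mag/mithun.cy/zheapperfbin/bin/pgbench.data';
SELECT pg_size_pretty(pg_current_wal_insert_lsn() - :'before_lsn'::pg_lsn) AS wal_written;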

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


Re: on or true

2018-12-05 Thread Tatsuo Ishii
>> I found a few more
>> places where true/false is used other than
>> ssl_passphrase_command_supports_reload in config.sgml.
>> Attached is a patch to fix them in config.sgml.
> 
> I will commit this to the master branch if there's no objection.
> Since this is an enhancement, not a bug fix, I think no back patching
> is necessary.

Done.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: \gexec \watch

2018-12-05 Thread David Fetter
On Wed, Dec 05, 2018 at 07:50:23PM -0300, Alvaro Herrera wrote:
> I just noticed that using \watch after \gexec does not do what I would
> like it to do, namely re-execute the returned queries.  Instead, it
> executes the returned queries once, then it just returns the queries.
> That is:
> 
> =# select 'select now()' \gexec \watch
>  2018-12-05 19:46:04.928995-03
> 
>  select now()
> 
>  select now()
> 
>  select now()
> 
> (This is under \pset tuples_only)
> 
> I think to be really useful, this combination ought to work like this
> instead:
> 
> =# select 'select now()' \gexec \watch
>  2018-12-05 19:47:51.045574-03
> 
>  2018-12-05 19:47:52.152132-03
> 
>  2018-12-05 19:47:53.099486-03
> 
> Is any psql hacker interested in fixing this?

As far as I can tell, what is happening currently is correct.

\g is a way to say "semicolon," in the sense that it looks backward to
the beginning of an SQL statement, sends it off to the backend, and
returns the results. Once those results are returned, its job is done,
and it releases control to the next psql event along with the memory
of the query it executed, so a following \g (or other "semicolon") can
do something new with it.

\gexec is a slightly different flavor of "semicolon." It starts off
doing what \g does, then before yielding control, it stores the
results and executes those results as queries of their own. At this point, psql
has forgotten about the results, even though it remembers the query.

\watch is yet another flavor of "semicolon." As with \g, it notices
the previous (or remembered) SQL had been written and executes it.
Unlike \gexec, it doesn't notice the result set. Instead, it repeats
that query in an infinite loop.
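
To make the contrast concrete, a small psql sketch:

-- \gexec generates and runs the inner query once:
SELECT 'SELECT now()' \gexec

-- \watch re-runs the remembered query buffer every two seconds:
SELECT now() \watch 2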

There's a bit of a philosophical issue here, or a mathematical one,
whichever way you want to put it.  Does it actually make sense to have
the behavior of one "semicolon" spill onto another?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: error message when subscription target is a partitioned table

2018-12-05 Thread Amit Langote
On 2018/12/06 11:28, Amit Langote wrote:
> On 2018/12/05 10:28, Amit Langote wrote:
>> On 2018/12/05 10:20, Michael Paquier wrote:
>>> On Tue, Dec 04, 2018 at 09:25:09AM +0100, Magnus Hagander wrote:
 I think more people would directly understand the "is not a table" for a
 foreign table than a partitioned one (for example, it does not show up in
 \dt or under tables in pgadmin, but partitioned ones do). That said, if
 it's not too complicated, I think including foreign tables as well would
 definitely be useful, because it has table in the name. For the other
 types, I agree they don't need to be special-cased, they are fine the way
 they are.
>>>
>>> relkind is directly available in this code path, so it is not that hard
>>> to add.  As you suggest, foreign tables make sense to add as those are
>>> actually *tables*.  And it seems to me that we should also add toast
>>> tables for clarity for the same reason.
>>
>> Considering toast tables here seems like a stretch to me, because they're
>> not user defined.  Chances of users adding a table to a publication whose
>> name matches that of a toast table's on the subscription side seems thin
>> too.  Partitioned tables and foreign tables are user-defined and something
>> they'd expect to be handled appropriately.
> 
> Attached updated patch that adds the check for foreign tables.

Adding to January CF.

Thanks,
Amit





Re: slight tweaks to documentation about runtime pruning

2018-12-05 Thread Amit Langote
On 2018/12/05 16:23, Amit Langote wrote:
> Hi,
> 
> Documentation of run-time pruning tells readers to inspect the "nloops"
> property of the EXPLAIN ANALYZE output, but I think that's a typo of
> "loops", which is what is actually output (the internal variable to track
> that property is indeed nloops).
> 
> However, for pruned partitions' subplans, what's actually shown is the
> string "(never executed)", not loops.  So, wouldn't it be better to tell
> the readers to look for that instead of "loops"?
> 
> Attached is what I have in mind.

Adding this to January CF.
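
For reference, this is the shape of output being described: a subplan removed
by run-time pruning is reported as "(never executed)" instead of a loops
count.  A sketch with a hypothetical table t partitioned by column a, queried
via a prepared statement so that pruning can only happen at run time
(assuming the generic plan is used):

PREPARE q(int) AS SELECT * FROM t WHERE a = $1;
EXPLAIN (ANALYZE, COSTS OFF) EXECUTE q(1);
--  Append (actual time=... rows=... loops=1)
--    ->  Seq Scan on t_1 ... (actual time=... rows=... loops=1)
--    ->  Seq Scan on t_2 ... (never executed)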

Thanks,
Amit




Re: error message when subscription target is a partitioned table

2018-12-05 Thread Amit Langote
On 2018/12/05 10:28, Amit Langote wrote:
> On 2018/12/05 10:20, Michael Paquier wrote:
>> On Tue, Dec 04, 2018 at 09:25:09AM +0100, Magnus Hagander wrote:
>>> I think more people would directly understand the "is not a table" for a
 foreign table than a partitioned one (for example, it does not show up in
>>> \dt or under tables in pgadmin, but partitioned ones do). That said, if
>>> it's not too complicated, I think including foreign tables as well would
>>> definitely be useful, because it has table in the name. For the other
>>> types, I agree they don't need to be special-cased, they are fine the way
>>> they are.
>>
>> relkind is directly available in this code path, so it is not that hard
>> to add.  As you suggest, foreign tables make sense to add as those are
>> actually *tables*.  And it seems to me that we should also add toast
>> tables for clarity for the same reason.
> 
> Considering toast tables here seems like a stretch to me, because they're
> not user defined.  Chances of users adding a table to a publication whose
> name matches that of a toast table's on the subscription side seems thin
> too.  Partitioned tables and foreign tables are user-defined and something
> they'd expect to be handled appropriately.

Attached updated patch that adds the check for foreign tables.

Thanks,
Amit
diff --git a/src/backend/executor/execReplication.c b/src/backend/executor/execReplication.c
index 5bd3bbc35e..93c74986a9 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -608,6 +608,20 @@ void
 CheckSubscriptionRelkind(char relkind, const char *nspname,
 const char *relname)
 {
+   /* Give more specific error for partitioned and foreign tables. */
+   if (relkind == RELKIND_PARTITIONED_TABLE)
+   ereport(ERROR,
+   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+errmsg("\"%s.%s\" is a partitioned table",
+   nspname, relname),
+errdetail("Partitioned tables are not 
supported as logical replication targets.")));
+   if (relkind == RELKIND_FOREIGN_TABLE)
+   ereport(ERROR,
+   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+errmsg("\"%s.%s\" is a foreign table",
+   nspname, relname),
+errdetail("Foreign tables are not supported as 
logical replication targets.")));
+
/*
 * We currently only support writing to regular tables.
 */


Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2018-12-05 Thread Amit Langote
On 2018/12/06 11:14, Amit Langote wrote:
> I ran the original unmodified query at [1] (the one that produces an empty
> plan due to all children being pruned) against the server built with
> patches I posted on the "speeding up planning with partitions" [2] thread
> and it finished in a jiffy.

Forgot to add the link for [2]: https://commitfest.postgresql.org/21/1778/

Thanks,
Amit




Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2018-12-05 Thread Amit Langote
Hi,

On 2018/12/05 6:55, Alvaro Herrera wrote:
> On 2018-Dec-04, Alvaro Herrera wrote:
> 
>> CREATE TABLE precio(fecha timestamp, pluid int, loccd int, plusalesprice 
>> int) PARTITION BY RANGE (fecha); 
>> SELECT format('CREATE TABLE public.precio_%s PARTITION OF public.precio 
>> (PRIMARY KEY (fecha, pluid, loccd) ) FOR VALUES FROM (''%s'')TO(''%s'')', i, 
>> a, b) FROM (SELECT '1990-01-01'::timestamp+(i||'days')::interval a, 
>> '1990-01-02'::timestamp+(i||'days')::interval b, i FROM 
>> generate_series(1,999) i)x \gexec
> 
> Actually, the primary keys are not needed; it's just as slow without
> them.

I ran the original unmodified query at [1] (the one that produces an empty
plan due to all children being pruned) against the server built with
patches I posted on the "speeding up planning with partitions" [2] thread
and it finished in a jiffy.

explain SELECT l_variacao.fecha, l_variacao.loccd , l_variacao.pant ,
l_variacao.patual , max_variacao.var_max FROM (SELECT p.fecha, p.loccd,
p.plusalesprice patual, da.plusalesprice pant, abs(p.plusalesprice -
da.plusalesprice) as var from precio p, (SELECT p.fecha, p.plusalesprice,
p.loccd from precio p WHERE p.fecha between '2017-03-01' and '2017-03-02'
and p.pluid = 2) da WHERE p.fecha between '2017-03-01' and '2017-03-02'
and p.pluid = 2 and p.loccd = da.loccd and p.fecha = da.fecha) l_variacao,
(SELECT max(abs(p.plusalesprice - da.plusalesprice)) as var_max from
precio p, (SELECT p.fecha, p.plusalesprice, p.loccd from precio p WHERE
p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2) da WHERE
p.fecha between '2017-03-01' and '2017-03-02' and p.pluid = 2 and p.loccd
= da.loccd and p.fecha = da.fecha) max_variacao WHERE max_variacao.var_max
= l_variacao.var;
QUERY PLAN
───
 Result  (cost=0.00..0.00 rows=0 width=24)
   One-Time Filter: false
(2 rows)

Time: 50.792 ms

That's because one of the things changed by one of the patches is that
child EC members are added only for the non-dummy children.  In this case,
since all the children are pruned, there should be zero child EC members,
which is what would happen in PG 10 too.  The partitionwise join related
changes in PG 11 moved the add_child_rel_equivalences call in
set_append_rel_size such that child EC members would be added even before
checking if the child rel is dummy, but for a reason named in the comment
above the call:

   ... Even if this child is
 * deemed dummy, it may fall on nullable side in a child-join, which
 * in turn may participate in a MergeAppend, where we will need the
 * EquivalenceClass data structures.

However, I think we can skip adding the dummy child EC members here  and
instead make it a responsibility of partitionwise join code in joinrels.c
to add the needed EC members.  Attached a patch to show what I mean, which
passes the tests and gives this planning time:

QUERY PLAN
───
 Result  (cost=0.00..0.00 rows=0 width=24) (actual rows=0 loops=1)
   One-Time Filter: false
 Planning Time: 512.788 ms
 Execution Time: 0.162 ms

which is not as low as with the patches at [2] for obvious reasons, but as
low as we can hope to get with PG 11.  Sadly, planning time is less with
PG 10.6:

QUERY PLAN
───
 Result  (cost=0.00..0.00 rows=0 width=24) (actual rows=0 loops=1)
   One-Time Filter: false
 Planning time: 254.533 ms
 Execution time: 0.080 ms
(4 rows)

But I haven't looked closely at what else in PG 11 makes the planning time
twice that of 10.

> I noticed another interesting thing, which is that if I modify the query
> to actually reference some partition that I do have (as opposed to the
> above, which just takes 30s to prune everything) the plan is mighty
> curious ... if only because in one of the Append nodes, partitions have
> not been pruned as they should.
>
> So, at least two bugs here,
> 1. the equivalence-class related slowness,
> 2. the lack of pruning

I haven't reproduced 2 yet.  Can you share the modified query?

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/20181128004402.GC30707%40telsasoft.com
From 67d051526138b00b7429996655a403535ea6d75c Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 6 Dec 2018 10:38:25 +0900
Subject: [PATCH] Add child EC members for only the non-dummy children

In set_append_rel_size, child EC members are added before checking
if the child relation itself is dummy.  That's very inefficient if
there are lots of children as the EC lists grow huge.  Such EC
members would remain unused unless the child participates in
partitionwise join, so add them only if the child "actually"
participates in partitionwise join.
---
 src/backend/optimizer/path/allpaths.c | 27 ---
 src/backend/optimizer/path/joinrels.c | 19 +++
 2 files changed, 31 insertions(+), 15 deletions(-)

diff --git 

Re: [PATCH] Opclass parameters

2018-12-05 Thread Nikita Glukhov

Attached 3rd version of the patches.

On 20.11.2018 14:15, Nikolay Shaplov wrote:

In a message of November 15, 2018 18:26:43, Nikita Glukhov wrote:

Attached 2nd version of the patches. Nothing has changed since March,
this is just a rebased version.

CREATE INDEX syntax and parameters storage method still need discussion.

I've played around a bit with your patch and come to some conclusions I'd like
to share. They are almost the same as those before, but now there are more
details.



Again some issues about storing opclass options in pg_inedx:

1. Having both indoption and indoptions columns in pg_index will make someone's
brain explode for sure. If not, it will bring trouble when people start
confusing them.

2. Now I have found out how you store option values for each opclass: the text[]
of indoptions in pg_index is not the same as the text[] of
reloptions in pg_catalog (and it brings more confusion). In reloptions, each
member of the array is a single option.

reloptions  | {fillfactor=90,autovacuum_enabled=false}

In indoptions, each array item is a whole string of options for one of the indexed
attributes, i.e. it holds all the options for that attribute. And this string
needs further parsing, which differs from reloptions parsing.

indoptions | {"{numranges=150}","{numranges=160}"}


This brings us to the following issues:

2a. pg_index stores properties of the index in general. Properties of each indexed
attribute are stored in the pg_attribute table. If we follow this philosophy,
it is wrong to add any kind of per-attribute array values to pg_index. These
values should be added to pg_attribute, one per pg_attribute entry.

2b. Since you've chosen a storage method that differs from the one used for
reloptions, this will lead to two versions of the code that processes the
attributes. And from now on, if we accept this, we will have to support both of
them and keep them in sync. (I see that you tried to reuse as much code as
possible, but you still added more code that duplicates current reloptions
functionality.)


On 21.11.2018 18:04, Robert Haas wrote:


It seems sensible to have both per-column options and per-index
options.  For example, we've got the fastupdate option for GIN, which
is a property of the index as a whole, not any individual column.  But
you could also want to specify some column-specific options, which
seems to be what this patch is about, since an opclass is associated
with an individual column.  And since an index can have more than one
column, I agree that it seems more appropriate to store this
information in pg_attribute than pg_index.


Ok, I have switched from pg_index.indoptions to pg_attribute.attoptions.


I agree that we should distinguish per-index and per-column options, but they
can also be AM-specific and opclass-specific.

'fastupdate' option for GIN is an example of AM-specific per-index option.

ASC/DESC and NULLS LAST/FIRST are examples of AM-class-specific per-column
options having special SQL syntax.  "AM-class-specific" here means "specific
only for the class of AMs that support ordering".  Now they are stored as flags
in pg_index.indoption[] but later can be moved to pg_attribute.attoptions.

I don't know whether per-column AM-specific and opclass-specific options should
be stored in the single column attoptions or in separate columns (like
attamoptions and attopclassoptions).  If we store them together, we can run
into name collisions, but these collisions can easily be resolved using
autogenerated prefixes in the option names ("am.foo=bar", "opclass.foo=baz").
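
For illustration, with such prefixes a shared attoptions value might end up
looking roughly like this (hypothetical option names, reusing the examples
above):

attoptions | {"am.foo=bar","opclass.numranges=150"}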



Another problem is options with default values.  They may not be explicitly
specified by the user, and in that case the current implementation stores
nothing in the catalog, because default values can be obtained from the code.
But that does not allow changing default values without compatibility issues,
so I think it would be better to store both default and explicitly specified
values of all opclass options; that, however, will require a major refactoring
of the current API.


I also have an idea: define the list of opclass parameters declaratively when
the opclass is created, using syntax like the following:

CREATE OPERATOR CLASS name [ (param_name param_type [= default_value] [,...]) ]
FOR TYPE ... AS (
 {  OPTIONS function_name ( arg_type [,...] )   /* reloptions => binary */
  | OPERATOR ...
 } [,...]
)

But if we remember about min/max values etc., the syntax will definitely become
more complicated, and it will require many more changes in the catalog (up to
the creation of a new table pg_opclassparams).

In any case, I think it would be nice to somehow give the user the ability to
obtain the list of opclass parameters from somewhere other than the
documentation.
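
For example (purely hypothetical syntax, only to illustrate the sketch above),
declaring and then using a parameterized opclass could look like:

CREATE OPERATOR CLASS my_range_ops (numranges int = 100)
FOR TYPE int4range USING gist AS (
    OPTIONS my_range_options(text),   -- hypothetical options-handling function
    OPERATOR 3 &&
);

CREATE INDEX ON t USING gist (r my_range_ops (numranges = 150));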



On 20.11.2018 14:15, Nikolay Shaplov wrote:


I know that the reloptions code is not really suitable for reuse. This was the
reason why I started solving the opclass options task by rewriting the
reloptions code, to make it 100% reusable for any kind of options.

Re: Connection slots reserved for replication

2018-12-05 Thread Petr Jelinek
On 05/12/2018 15:33, Oleksii Kliukin wrote:
> 
>> On 30. Nov 2018, at 13:58, Alexander Kukushkin  wrote:
>>
>> attaching the new version of the patch.
>> Now it simply reserves max_wal_senders slots in the ProcGlobal, what
>> guarantees that only walsender process could use them.
> 
> With this patch It looks like InitProcess will trigger the generic error 
> about 'too many clients' before the more specific error message in 
> InitWalSenderSlot when exceeding the number of max_wal_senders.
> 
> Does excluding WAL senders from the max_connections limit and including 
> max_wal_senders in MaxBackends also imply that we need to add max_wal_senders 
> to the list at xlog.c: CheckRequiredParameterValues, requiring its value on 
> the replica to be not lower than the one on the primary? 
> 

I think it does; we need the proc slots for walsenders on the standby the
same way we do for normal backends.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Wrong parent node for WCO expressions in nodeModifyTable.c?

2018-12-05 Thread Andres Freund
Hi Stephen, All,

While working on the pluggable storage patchset I noticed that it
initializes the WCO expression like:

/*
 * Initialize any WITH CHECK OPTION constraints if needed.
 */
resultRelInfo = mtstate->resultRelInfo;
i = 0;
foreach(l, node->withCheckOptionLists)
{
List   *wcoList = (List *) lfirst(l);
List   *wcoExprs = NIL;
ListCell   *ll;

foreach(ll, wcoList)
{
WithCheckOption *wco = (WithCheckOption *) lfirst(ll);
ExprState  *wcoExpr = ExecInitQual((List *) wco->qual,
   mtstate->mt_plans[i]);

wcoExprs = lappend(wcoExprs, wcoExpr);
}

resultRelInfo->ri_WithCheckOptions = wcoList;
resultRelInfo->ri_WithCheckOptionExprs = wcoExprs;
resultRelInfo++;
i++;
}

note that the parent node for the qual is the plan the tuples originate
from, *not* the target relation. That seems wrong.

It does cause a problem for pluggable storage, but I wonder if it's a
problem beyond that. The fact that the parent is wrong means we'll
anchor subplans within the qual to the wrong parent.  Replacing the
parent with &mtstate->ps itself, there are regression test differences:

 ERROR:  new row violates check option for view "rw_view1"
 DETAIL:  Failing row contains (15).
 EXPLAIN (costs off) INSERT INTO rw_view1 VALUES (5);
-  QUERY PLAN

+   QUERY PLAN
+-
  Insert on base_tbl b
->  Result
- SubPlan 1
-   ->  Index Only Scan using ref_tbl_pkey on ref_tbl r
- Index Cond: (a = b.a)
- SubPlan 2
-   ->  Seq Scan on ref_tbl r_1
+   SubPlan 1
+ ->  Index Only Scan using ref_tbl_pkey on ref_tbl r
+   Index Cond: (a = b.a)
+   SubPlan 2
+ ->  Seq Scan on ref_tbl r_1
 (7 rows)

 EXPLAIN (costs off) UPDATE rw_view1 SET a = a + 5;
-   QUERY PLAN
--
+QUERY PLAN
+---
  Update on base_tbl b
->  Hash Join
  Hash Cond: (b.a = r.a)
  ->  Seq Scan on base_tbl b
  ->  Hash
->  Seq Scan on ref_tbl r
- SubPlan 1
-   ->  Index Only Scan using ref_tbl_pkey on ref_tbl r_1
- Index Cond: (a = b.a)
- SubPlan 2
-   ->  Seq Scan on ref_tbl r_2
+   SubPlan 1
+ ->  Index Only Scan using ref_tbl_pkey on ref_tbl r_1
+   Index Cond: (a = b.a)
+   SubPlan 2
+ ->  Seq Scan on ref_tbl r_2
 (11 rows)

 DROP TABLE base_tbl, ref_tbl CASCADE;

And the new output certainly looks more correct to me.

Stephen, I have not researched this much, but is there a chance this
could cause trouble in the backbranches?

Greetings,

Andres Freund



\gexec \watch

2018-12-05 Thread Alvaro Herrera
I just noticed that using \watch after \gexec does not do what I would
like it to do, namely re-execute the returned queries.  Instead, it
executes the returned queries once, then it just returns the queries.
That is:

=# select 'select now()' \gexec \watch
 2018-12-05 19:46:04.928995-03

 select now()

 select now()

 select now()

(This is under \pset tuples_only)

I think to be really useful, this combination ought to work like this
instead:

=# select 'select now()' \gexec \watch
 2018-12-05 19:47:51.045574-03

 2018-12-05 19:47:52.152132-03

 2018-12-05 19:47:53.099486-03
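
As a rough workaround for the single-statement case, one can stash the
generated query text in a psql variable and \watch that instead (untested
sketch; it only helps when \gexec would return exactly one statement):

select 'select now()' as q \gset
:q \watch 1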

Is any psql hacker interested in fixing this?

-- 
Álvaro Herrera



Re: slow queries over information schema.tables

2018-12-05 Thread Andres Freund
On 2018-12-05 13:41:32 -0500, Tom Lane wrote:
> The bigger picture here is that people seem to like to use domains
> as simple type aliases, which will never have any constraints, but
> we're very dumb about that today.  So the patch as presented seems
> like it has lots of potential applicability, as long as we fix the
> invalidation aspect.

Entirely agreed.



Re: proposal: plpgsql pragma statement

2018-12-05 Thread Pavel Stehule
On Wed, 5 Dec 2018 at 19:12, Dmitry Igrishin wrote:

> On Tue, 4 Dec 2018 at 20:13, Pavel Stehule wrote:
> >
> > Hi
> >
> > I wrote plpgsql_check https://github.com/okbob/plpgsql_check.
> >
> > It is working well, but because it does static analysis only, sometimes
> it can produce false alarms, or it has to stop the analysis because the
> necessary data are not available.
> >
> > https://github.com/okbob/plpgsql_check/issues/36
> >
> > I see one possible solution in introduction of pragma statement with
> syntax:
> >
> >   PRAGMA keyword [content to semicolon];
> >
> > The pragma has a relation to following statement. So the issue 36 can be
> solved by pragma
> >
> > PRAGMA cmdtype CREATE;
> > EXECUTE format('CREATE TABLE xxx ...
> >
> > The PRAGMA statement does nothing at runtime. It works only at compile
> time, and adds a pair of key, value to the next non-pragma statement. This
> information can be used by some plpgsql extensions.
> >
> > What do you think about this proposal?
> I think it's a good idea in common. But how about multiple PRAGMAs?
> Consider
>
>   PRAGMA cmdtype CREATE;
>   PRAGMA objtype TABLE;
>   EXECUTE format('CREATE TABLE');
>

Yes, it is possible. They are assigned to the next non-pragma statement.

something like

PRAGMA cmdtype SELECT
PRAGMA resulttype (a int, b int, c int)
EXECUTE format('SELECT ... FROM %I ...

PRAGMA tmp_table_query
PRAGMA resulttype (a int, b int, c int)
FOR r IN SELECT * FROM some_temporary_table ...
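
So a complete function using the proposed syntax might look like this (purely
hypothetical, for illustration only; plpgsql_check is the intended consumer of
the annotation):

CREATE OR REPLACE FUNCTION create_log_table(tname text) RETURNS void AS $$
BEGIN
  -- the pragma annotates the EXECUTE statement that follows it
  PRAGMA cmdtype CREATE;
  EXECUTE format('CREATE TABLE %I (id bigint, logged_at timestamptz)', tname);
END;
$$ LANGUAGE plpgsql;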


Re: slow queries over information schema.tables

2018-12-05 Thread Andres Freund
Hi,

On 2018-12-05 13:22:23 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-12-05 12:24:54 -0500, Tom Lane wrote:
> >> There are two different issues in that.  One is that the domain might
> >> have constraints (though in reality it does not), so the planner can't
> >> throw away the CoerceToDomain node, and thus can't match the expression
> >> to the index.  Even if we did throw away the CoerceToDomain, it still
> >> would not work because the domain is declared to be over varchar, and
> >> so there's a cast-to-varchar underneath the CoerceToDomain.
> 
> > Couldn't we make expression simplification replace CoerceToDomain with a
> > RelabelType if the constraint is simple enough?  That's not entirely
> > trivial because we'd have to look into the typecache to get the
> > constraints, but that doesn't sound too bad.
> 
> Not following what you have in mind here?  My 0002 throws away the
> CoerceToDomain if there are *no* constraints, but I can't see any
> situation in which we'd likely be able to ignore a constraint,
> simple or not.

Yea, simple probably means nonexistent for now. We could e.g. optimize
some NOT NULL checks away, but it's probably not worth it.


> >> 0003 essentially converts "namecol::text texteq textvalue" into
> >> "namecol nameeqtext textvalue", relying on the new equality
> >> operator introduced by 0001.
> 
> > Ugh, that's indeed a bit kludgy. It'd be nice to have an approach that's
> > usable outside of one odd builtin type. I was wondering for a bit
> > whether we could have logic to move the cast to the other side of an
> > operator, but I don't see how we could make that generally safe.
> 
> Yeah.  It seems like it could be a special case of a more general
> expression transform facility, but we have no such facility now.
> 
> On the other hand, all of match_special_index_operator is an ugly
> single-purpose kluge already, so I'm not feeling that awful about
> throwing another special case into it.  Someday it would be nice
> to replace that code with something more general and extensible,
> but today is not that day as far as I'm concerned.

Fair enough.

Greetings,

Andres Freund



Re: slow queries over information schema.tables

2018-12-05 Thread Tom Lane
Andres Freund  writes:
> On 2018-12-05 12:24:54 -0500, Tom Lane wrote:
>> There are two different issues in that.  One is that the domain might
>> have constraints (though in reality it does not), so the planner can't
>> throw away the CoerceToDomain node, and thus can't match the expression
>> to the index.  Even if we did throw away the CoerceToDomain, it still
>> would not work because the domain is declared to be over varchar, and
>> so there's a cast-to-varchar underneath the CoerceToDomain.

> Couldn't we make expression simplification replace CoerceToDomain with a
> RelabelType if the constraint is simple enough?  That's not entirely
> trivial because we'd have to look into the typecache to get the
> constraints, but that doesn't sound too bad.

Not following what you have in mind here?  My 0002 throws away the
CoerceToDomain if there are *no* constraints, but I can't see any
situation in which we'd likely be able to ignore a constraint,
simple or not.

>> 0003 essentially converts "namecol::text texteq textvalue" into
>> "namecol nameeqtext textvalue", relying on the new equality
>> operator introduced by 0001.

> Ugh, that's indeed a bit kludgy. It'd be nice to have an approach that's
> usable outside of one odd builtin type. I was wondering for a bit
> whether we could have logic to move the cast to the other side of an
> operator, but I don't see how we could make that generally safe.

Yeah.  It seems like it could be a special case of a more general
expression transform facility, but we have no such facility now.

On the other hand, all of match_special_index_operator is an ugly
single-purpose kluge already, so I'm not feeling that awful about
throwing another special case into it.  Someday it would be nice
to replace that code with something more general and extensible,
but today is not that day as far as I'm concerned.

regards, tom lane



Re: proposal: plpgsql pragma statement

2018-12-05 Thread Dmitry Igrishin
On Tue, 4 Dec 2018 at 20:13, Pavel Stehule wrote:
>
> Hi
>
> I wrote plpgsql_check https://github.com/okbob/plpgsql_check.
>
> It is working well, but because it does static analysis only, sometimes it can
> produce false alarms, or it has to stop the analysis because the necessary
> data are not available.
>
> https://github.com/okbob/plpgsql_check/issues/36
>
> I see one possible solution in introduction of pragma statement with syntax:
>
>   PRAGMA keyword [content to semicolon];
>
> The pragma has a relation to following statement. So the issue 36 can be 
> solved by pragma
>
> PRAGMA cmdtype CREATE;
> EXECUTE format('CREATE TABLE xxx ...
>
> The PRAGMA statement does nothing at runtime. It works only at compile time,
> and adds a pair of key, value to the next non-pragma statement. This
> information can be used by some plpgsql extensions.
>
> What do you think about this proposal?
I think it's a good idea in common. But how about multiple PRAGMAs? Consider

  PRAGMA cmdtype CREATE;
  PRAGMA objtype TABLE;
  EXECUTE format('CREATE TABLE');



Re: slow queries over information schema.tables

2018-12-05 Thread Andres Freund
Hi,

On 2018-12-05 12:24:54 -0500, Tom Lane wrote:
> The core of the problem, I think, is that we're unable to convert the
> condition on table_name into an indexscan on pg_class.relname, because
> the view has cast pg_class.relname to the sql_identifier domain.
>
> There are two different issues in that.  One is that the domain might
> have constraints (though in reality it does not), so the planner can't
> throw away the CoerceToDomain node, and thus can't match the expression
> to the index.  Even if we did throw away the CoerceToDomain, it still
> would not work because the domain is declared to be over varchar, and
> so there's a cast-to-varchar underneath the CoerceToDomain.

Couldn't we make expression simplification replace CoerceToDomain with a
RelabelType if the constraint is simple enough?  That's not entirely
trivial because we'd have to look into the typecache to get the
constraints, but that doesn't sound too bad.


> The most straightforward way to fix that would be to add some cross-type
> comparison operators to the name_ops operator family.

That seems reasonable.


> While we only
> directly need 'name = text' to make this case work, the opr_sanity
> regression test will complain if we don't supply a fairly complete set of
> operators, and I think not without reason.  So I experimented with doing
> that, as in the very-much-WIP 0001 patch below.  A name index can only
> cope with strcmp-style comparison semantics, not strcoll-style, so while
> it's okay to call the equality operator = I felt we'd better call the
> inequality operators ~<~ and so on.  While the patch as presented fixes
> the case shown above, it's not all good: the problem with having a new
> 'text = name' operator is that it will also capture cases where you would
> like to have a text index working with a comparison value of type "name".
> If 'text = name' isn't part of the text_ops opclass then that doesn't
> work.  I think that the reason for the join.sql regression test failure
> shown in patch 0001 is likewise that since 'text = name' isn't part of the
> text_ops opclass, the join elimination stuff is unable to prove that it
> can eliminate a join to a unique text column.  This could probably be
> fixed by creating yet another dozen cross-type operators that implement
> text vs name comparisons using strcoll semantics (hence, using the normal
> '<' operator names), and adding that set to the text_ops opfamily.
> I didn't do that here (yet), because it seems pretty tedious.

Is there a way we could make that less laborious by allowing a bit more
casting?


> 0002 is a really preliminary POC for eliminating CoerceToDomain nodes
> at plan time if the domain has no constraints to check.  While this is
> enough to check the effects on plan selection, it's certainly not
> committable as-is, because the resulting plan is broken if someone then
> adds a constraint to the domain.  (There is a regression test failure,
> not shown in 0002, for a case that tests exactly that scenario.)

Hah.


> What we would have to do to make 0002 committable is to add sinval
> signaling to invalidate any cached plan that's dependent on an
> assumption of no constraints on a domain.  This is something that
> we probably want anyway, since it would be necessary if we ever want
> to compile domain constraint expressions as part of the plan tree
> rather than leaving them to run-time.

Yea, that seems good.  I don't like the fact that expression
initialization does the lookups for constraints right now.


> While the sinval additions per se would be fairly straightforward,
> I wonder whether anyone is likely to complain about the race conditions
> that will ensue from not taking any locks associated with the domain
> type; i.e. it's possible that a query would execute with slightly
> out-of-date assumptions about what constraints a domain has.  I suspect
> that we have race conditions like that already, but they might be worse
> if we inspect the domain during planning rather than at executor
> startup.  Is anyone worried enough about that to want to add locking
> overhead to domain usage?  (I'm not.)

I'm not either.


> 0001+0002 still don't get the job done: now we strip off the
> CoerceToDomain successfully, but we still have "relname::varchar"
> being compared to the comparison value, so we still can't match
> that to the index.  0003 is a somewhat klugy solution to that part,
> allowing indxpath.c to construct a name equality test from a texteq
> restriction condition.  (This is semantically OK because equality
> is equality in both name and text domains.  We could not convert
> inequalities, at least not as freely; maybe it could be done in
> C locale, but I've not done anything about that here.)
>
> With all three patches in place, we get
>
> regression=# explain select * from pg_class where 
> relname::information_schema.sql_identifier = 'foo'::text;
>  QUERY PLAN
> 

Re: psql display of foreign keys

2018-12-05 Thread Alvaro Herrera
On 2018-Dec-04, Alvaro Herrera wrote:

> On 2018-Dec-04, Alvaro Herrera wrote:
> 
> > v2 attached.
> 
> Oops.

One more oops: The version I posted was for pg11, and does not apply to
master.  This version applies to master.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 0a181b01d9..9d24fb77bc 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -1479,6 +1479,7 @@ describeOneTableDetails(const char *schemaname,
 		bool		rowsecurity;
 		bool		forcerowsecurity;
 		bool		hasoids;
+		bool		ispartition;
 		Oid			tablespace;
 		char	   *reloptions;
 		char	   *reloftype;
@@ -1501,7 +1502,24 @@ describeOneTableDetails(const char *schemaname,
 		printfPQExpBuffer(&buf,
 		  "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
 		  "c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
-		  "false AS relhasoids, %s, c.reltablespace, "
+		  "false AS relhasoids, c.relispartition, %s, c.reltablespace, "
+		  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
+		  "c.relpersistence, c.relreplident\n"
+		  "FROM pg_catalog.pg_class c\n "
+		  "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
+		  "WHERE c.oid = '%s';",
+		  (verbose ?
+		   "pg_catalog.array_to_string(c.reloptions || "
+		   "array(select 'toast.' || x from pg_catalog.unnest(tc.reloptions) x), ', ')\n"
+		   : "''"),
+		  oid);
+	}
+	else if (pset.sversion >= 11)
+	{
+		printfPQExpBuffer(&buf,
+		  "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
+		  "c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
+		  "c.relhasoids, c.relispartition, %s, c.reltablespace, "
 		  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
 		  "c.relpersistence, c.relreplident\n"
 		  "FROM pg_catalog.pg_class c\n "
@@ -1518,7 +1536,7 @@ describeOneTableDetails(const char *schemaname,
 		printfPQExpBuffer(&buf,
 		  "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
 		  "c.relhastriggers, c.relrowsecurity, c.relforcerowsecurity, "
-		  "c.relhasoids, %s, c.reltablespace, "
+		  "c.relhasoids, false, %s, c.reltablespace, "
 		  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
 		  "c.relpersistence, c.relreplident\n"
 		  "FROM pg_catalog.pg_class c\n "
@@ -1535,7 +1553,7 @@ describeOneTableDetails(const char *schemaname,
 		printfPQExpBuffer(&buf,
 		  "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
 		  "c.relhastriggers, false, false, c.relhasoids, "
-		  "%s, c.reltablespace, "
+		  "false, %s, c.reltablespace, "
 		  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
 		  "c.relpersistence, c.relreplident\n"
 		  "FROM pg_catalog.pg_class c\n "
@@ -1552,7 +1570,7 @@ describeOneTableDetails(const char *schemaname,
 		printfPQExpBuffer(&buf,
 		  "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
 		  "c.relhastriggers, false, false, c.relhasoids, "
-		  "%s, c.reltablespace, "
+		  "false, %s, c.reltablespace, "
 		  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END, "
 		  "c.relpersistence\n"
 		  "FROM pg_catalog.pg_class c\n "
@@ -1569,7 +1587,7 @@ describeOneTableDetails(const char *schemaname,
 		printfPQExpBuffer(&buf,
 		  "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
 		  "c.relhastriggers, false, false, c.relhasoids, "
-		  "%s, c.reltablespace, "
+		  "false, %s, c.reltablespace, "
 		  "CASE WHEN c.reloftype = 0 THEN '' ELSE c.reloftype::pg_catalog.regtype::pg_catalog.text END\n"
 		  "FROM pg_catalog.pg_class c\n "
 		  "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
@@ -1585,7 +1603,7 @@ describeOneTableDetails(const char *schemaname,
 		printfPQExpBuffer(&buf,
 		  "SELECT c.relchecks, c.relkind, c.relhasindex, c.relhasrules, "
 		  "c.relhastriggers, false, false, c.relhasoids, "
-		  "%s, c.reltablespace\n"
+		  "false, %s, c.reltablespace\n"
 		  "FROM pg_catalog.pg_class c\n "
 		  "LEFT JOIN pg_catalog.pg_class tc ON (c.reltoastrelid = tc.oid)\n"
 		  "WHERE c.oid = '%s';",
@@ -1600,7 +1618,7 @@ describeOneTableDetails(const char *schemaname,
 		printfPQExpBuffer(&buf,
 		  "SELECT relchecks, relkind, relhasindex, relhasrules, "
 		  "reltriggers <> 0, false, false, relhasoids, "
-		  "%s, reltablespace\n"
+		  "false, %s, reltablespace\n"
 		  "FROM pg_catalog.pg_class WHERE oid = '%s';",
 		  (verbose ?
 		   "pg_catalog.array_to_string(reloptions, E', ')" : "''"),
@@ -1611,7 +1629,7 @@ 

Re: proposal: plpgsql pragma statement

2018-12-05 Thread Pavel Stehule
Hi

On Wed, 5 Dec 2018 at 18:28, Jonah H. Harris wrote:

> You can alter the lexer and create a comment node, right? That’s how we
> did hints in EnterpriseDB.
>

I don't think that is an adequate solution - sure, it would be if I wanted to
implement Oracle's hints, but that is not my target. I am afraid of the
performance impact a) for SQL queries, b) for plpgsql.

I have had the opportunity to see functions and procedures (ported from Oracle)
with thousands of lines of comments. It is not a problem for Oracle, due to
compilation to persistent byte code. But I don't think this cost is acceptable
for a pragma implementation.

The native plpgsql statement PRAGMA has zero performance impact on existing
code. Moreover, the implementation is local to plpgsql, which is important
for me.

Regards

Pavel




>
> On Wed, Dec 5, 2018 at 11:41 AM Pavel Stehule 
> wrote:
>
>>
>>
>> On Wed, 5 Dec 2018 at 15:03, Pavel Luzanov wrote:
>>
>>>
>>> But maybe your extension could read the PERFORM statement preceding it
>>> and treat it as an annotation hint for the following statement.
>>>
>>>
>>> In this case, comment line in some format will be better than real
>>> PERFORM statement. Like this:
>>>
>>> /*+PRAGMA cmdtype CREATE; */
>>> EXECUTE format('CREATE TABLE xxx ...
>>>
>>
>> I looked there and It is not possible to implement it - plpgsql uses SQL
>> lexer, and the content of comments are just ignored. So I cannot to read
>> comments. There is not any possibility to read it simply from plpgsql.
>> Unfortunately, but it is expected, there is nothing like query string for
>> plpgsql statement.
>>
>> Regards
>>
>> Pavel
>>
>>> -
>>> Pavel Luzanov
>>> Postgres Professional: http://www.postgrespro.com
>>> The Russian Postgres Company
>>>
>>>
>>> --
> Jonah H. Harris
>
>


Re: minor leaks in pg_dump (PG tarball 10.6)

2018-12-05 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Pavel Raiskup (prais...@redhat.com) wrote:
> >> -  attrdefs = (AttrDefInfo *) pg_malloc(numDefaults * 
> >> sizeof(AttrDefInfo));
> >> ...
> >> +  attrdefs = (AttrDefInfo *) 
> >> pg_malloc(numDefaults * sizeof(AttrDefInfo));
> 
> > This change doesn't seem to make any sense to me..?  If anything, seems
> > like we'd end up overallocating memory *after* this change, where we
> > don't today (though an analyzer tool might complain because we don't
> > free the memory from it and instead copy the pointer from each of these
> > items into the tbinfo structure).
> 
> Yeah, Coverity is exceedingly not smart about the method pg_dump uses
> (in lots of places, not just here) of allocating an array and then
> entering pointers to individual array elements into its long-lived
> data structures.  I concur that the proposed change is giving up a
> lot of malloc overhead to silence an invalid complaint, and we
> shouldn't do it.

Agreed, patch attached.  If there aren't any more comments, I'll plan to
push this later today or tomorrow.  I wasn't thinking we would backpatch
this, so if others feel differently, please let me know.

> The other two points seem probably valid, so I wonder why our own
> Coverity runs haven't noticed them.

Not sure..  Looks like Coverity (incorrectly) worries about srvname
being NULL and then dereferenced inside fmtId(), except that relkind
doesn't change between the switch() and the conditional where srvname is
used.  Maybe that is masking the resource leak concern?  I don't see it
complaining about ftoptions though, so really not sure what's going on
there.  I can try to ask the Coverity admin folks, but they've yet to
respond to multiple requests I've made about linking in the v11 branch
with the others..

Thanks!

Stephen
From b3551d31f00ac802392aa1982ed0045d4459bbad Mon Sep 17 00:00:00 2001
From: Stephen Frost 
Date: Wed, 5 Dec 2018 11:53:44 -0500
Subject: [PATCH] Cleanup minor pg_dump memory leaks

In dumputils, we may have successfully parsed the acls when we discover
that we can't parse the reverse ACLs and then return; check and free
aclitems if that happens.

In dumpTableSchema, move ftoptions and srvname under the relkind !=
RELKIND_VIEW branch (since they're only used there) and then check if
they've been allocated and, if so, free them at the end of that block.

Pointed out by Pavel Raiskup, though I didn't use those patches.

Discussion: https://postgr.es/m/2183976.vkcjmhd...@nb.usersys.redhat.com
---
 src/bin/pg_dump/dumputils.c |  2 ++
 src/bin/pg_dump/pg_dump.c   | 14 --
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index 8a93ace9fa..475d6dbd73 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -95,6 +95,8 @@ buildACLCommands(const char *name, const char *subname, const char *nspname,
 	{
 		if (!parsePGArray(racls, &raclitems, &nraclitems))
 		{
+			if (aclitems)
+free(aclitems);
 			if (raclitems)
 free(raclitems);
 			return false;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d583154fba..637c79af48 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -15303,8 +15303,6 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 	int			actual_atts;	/* number of attrs in this CREATE statement */
 	const char *reltypename;
 	char	   *storage;
-	char	   *srvname;
-	char	   *ftoptions;
 	int			j,
 k;
 
@@ -15361,6 +15359,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 	}
 	else
 	{
+		char	   *ftoptions = NULL;
+		char	   *srvname = NULL;
+
 		switch (tbinfo->relkind)
 		{
 			case RELKIND_FOREIGN_TABLE:
@@ -15397,13 +15398,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
 }
 			case RELKIND_MATVIEW:
 reltypename = "MATERIALIZED VIEW";
-srvname = NULL;
-ftoptions = NULL;
 break;
 			default:
 reltypename = "TABLE";
-srvname = NULL;
-ftoptions = NULL;
 		}
 
 		numParents = tbinfo->numParents;
@@ -15951,6 +15948,11 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
   tbinfo->attfdwoptions[j]);
 			}
 		}
+
+		if (ftoptions)
+			free(ftoptions);
+		if (srvname)
+			free(srvname);
 	}
 
 	/*
-- 
2.17.1



signature.asc
Description: PGP signature


Re: proposal: plpgsql pragma statement

2018-12-05 Thread Jonah H. Harris
You can alter the lexer and create a comment node, right? That’s how we did
hints in EnterpriseDB.

On Wed, Dec 5, 2018 at 11:41 AM Pavel Stehule 
wrote:

>
>
> On Wed, 5 Dec 2018 at 15:03, Pavel Luzanov wrote:
>
>>
>> But maybe your extension could read the PERFORM statement preceding it
>> and treat it as an annotation hint for the following statement.
>>
>>
>> In this case, comment line in some format will be better than real
>> PERFORM statement. Like this:
>>
>> /*+PRAGMA cmdtype CREATE; */
>> EXECUTE format('CREATE TABLE xxx ...
>>
>
> I looked there and It is not possible to implement it - plpgsql uses SQL
> lexer, and the content of comments are just ignored. So I cannot to read
> comments. There is not any possibility to read it simply from plpgsql.
> Unfortunately, but it is expected, there is nothing like query string for
> plpgsql statement.
>
> Regards
>
> Pavel
>
>> -
>> Pavel Luzanov
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
>>
>>
>> --
Jonah H. Harris


Re: slow queries over information schema.tables

2018-12-05 Thread Tom Lane
Pavel Stehule  writes:
> Slow query
> select * from information_schema.tables where table_name = 'pg_class';

Yeah.  This has been complained of many times before.

The core of the problem, I think, is that we're unable to convert the
condition on table_name into an indexscan on pg_class.relname, because
the view has cast pg_class.relname to the sql_identifier domain.

There are two different issues in that.  One is that the domain might
have constraints (though in reality it does not), so the planner can't
throw away the CoerceToDomain node, and thus can't match the expression
to the index.  Even if we did throw away the CoerceToDomain, it still
would not work because the domain is declared to be over varchar, and
so there's a cast-to-varchar underneath the CoerceToDomain.

The latter half of the problem can be exhibited in simplified form as

regression=# explain select * from pg_class where relname = 'pg_class'::text;
 QUERY PLAN 

 Seq Scan on pg_class  (cost=0.00..175.41 rows=7 width=791)
   Filter: ((relname)::text = 'pg_class'::text)
(2 rows)

which is much stupider than what you get with a name comparison:

regression=# explain select * from pg_class where relname = 'pg_class';
 QUERY PLAN 
 
-
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.29 
rows=1 width=791)
   Index Cond: (relname = 'pg_class'::name)
(2 rows)

(We've seen complaints about this form of problem, too.)  Since there's
no name-vs-text equality operator, we end up applying a cast to text so
that the texteq operator can be used, and now we're out of luck for
matching that to the index.  To add insult to injury, we also fail to
match the cast expression to the available statistics, so that we don't
get quite the right selectivity estimate.
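
(For comparison, a query that sidesteps the cast by going straight to the
catalogs is able to use the index on relname, since the name column is then
compared against a name value:

select c.relname, n.nspname
  from pg_catalog.pg_class c
  join pg_catalog.pg_namespace n on n.oid = c.relnamespace
 where c.relname = 'pg_class';

but of course users shouldn't have to query the catalogs directly just to get
a sane plan.)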

The most straightforward way to fix that would be to add some cross-type
comparison operators to the name_ops operator family.  While we only
directly need 'name = text' to make this case work, the opr_sanity
regression test will complain if we don't supply a fairly complete set of
operators, and I think not without reason.  So I experimented with doing
that, as in the very-much-WIP 0001 patch below.  A name index can only
cope with strcmp-style comparison semantics, not strcoll-style, so while
it's okay to call the equality operator = I felt we'd better call the
inequality operators ~<~ and so on.  While the patch as presented fixes
the case shown above, it's not all good: the problem with having a new
'text = name' operator is that it will also capture cases where you would
like to have a text index working with a comparison value of type "name".
If 'text = name' isn't part of the text_ops opclass then that doesn't
work.  I think that the reason for the join.sql regression test failure
shown in patch 0001 is likewise that since 'text = name' isn't part of the
text_ops opclass, the join elimination stuff is unable to prove that it
can eliminate a join to a unique text column.  This could probably be
fixed by creating yet another dozen cross-type operators that implement
text vs name comparisons using strcoll semantics (hence, using the normal
'<' operator names), and adding that set to the text_ops opfamily.
I didn't do that here (yet), because it seems pretty tedious.

Also worth noting is that the name_ops and text_pattern_ops opfamilies
are implementing identical semantics.  I wonder whether we could merge
them.

While most of the other regression test output changes shown in the 0001
patch are unsurprising, one that is surprising is that a merge join on
two text columns has started sorting USING ~<~ rather than the normal
text ordering.  The reason for this seems to be that the 'text = text'
operator is now a member of name_ops as well as text_ops, and
select_outer_pathkeys_for_merge arbitrarily uses the lowest-number
opfamily OID if it has a choice.  We could avoid that by renumbering
name_ops to have an OID higher than text_ops, though that's certainly
klugy.  Or we might just decide that we like that plan better anyway,
since strcmp-based comparison is probably faster than strcoll-based
comparison.  (It'd be kind of nice if the choice were based on something
more principled than OID order, though I don't especially want to do
anything about that right now.)

Now, 0001 by itself doesn't do much for Pavel's complaint, because
the view is still forcing a cast to sql_identifier to be inserted
atop pg_class.relname, even though we no longer need any cast for
simple name-vs-text cases.

0002 is a really preliminary POC for eliminating CoerceToDomain nodes
at plan time if the domain has no constraints to check.  While this is
enough to check the effects on plan selection, it's certainly not

Re: proposal: plpgsql pragma statement

2018-12-05 Thread Pavel Stehule
On Wed, 5 Dec 2018 at 15:03, Pavel Luzanov wrote:

>
> But maybe your extension could read the PERFORM statement preceding it and
> treat it as an annotation hint for the following statement.
>
>
> In this case, comment line in some format will be better than real PERFORM
> statement. Like this:
>
> /*+PRAGMA cmdtype CREATE; */
> EXECUTE format('CREATE TABLE xxx ...
>

I looked there and it is not possible to implement it - plpgsql uses the SQL
lexer, and the content of comments is just ignored, so I cannot read the
comments. There is no way to read them simply from plpgsql. Unfortunately,
though it is expected, there is nothing like a query string for a plpgsql
statement.

Regards

Pavel

> -
> Pavel Luzanov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: don't mark indexes invalid unnecessarily

2018-12-05 Thread Alvaro Herrera
Pushed, thanks for the review.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: psql display of foreign keys

2018-12-05 Thread Alvaro Herrera
I added this patch to commitfest in order to get more opinions,
particularly on whether to backpatch this.  I might commit sooner than
that if others care to comment.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Hint and detail punctuation

2018-12-05 Thread Daniel Gustafsson
While looking at error messages downstream, I noticed a few hints and details
in postgres which aren’t punctuated as per the style guide.  The attached patch
fixes the ones where it seemed reasonable to end with a period.

cheers ./daniel



errhint_punctuation.patch
Description: Binary data


Re: minor leaks in pg_dump (PG tarball 10.6)

2018-12-05 Thread Tom Lane
Stephen Frost  writes:
> * Pavel Raiskup (prais...@redhat.com) wrote:
>> -attrdefs = (AttrDefInfo *) pg_malloc(numDefaults * 
>> sizeof(AttrDefInfo));
>> ...
>> +attrdefs = (AttrDefInfo *) 
>> pg_malloc(numDefaults * sizeof(AttrDefInfo));

> This change doesn't seem to make any sense to me..?  If anything, seems
> like we'd end up overallocating memory *after* this change, where we
> don't today (though an analyzer tool might complain because we don't
> free the memory from it and instead copy the pointer from each of these
> items into the tbinfo structure).

Yeah, Coverity is exceedingly not smart about the method pg_dump uses
(in lots of places, not just here) of allocating an array and then
entering pointers to individual array elements into its long-lived
data structures.  I concur that the proposed change is giving up a
lot of malloc overhead to silence an invalid complaint, and we
shouldn't do it.

The other two points seem probably valid, so I wonder why our own
Coverity runs haven't noticed them.

regards, tom lane



Re: minor leaks in pg_dump (PG tarball 10.6)

2018-12-05 Thread Stephen Frost
Greetings,

* Pavel Raiskup (prais...@redhat.com) wrote:
> Among other reports (IMO clearly non-issues), I'm sending patch which
> fixes/points to a few resource leaks detected by Coverity that might be
> worth fixing.  If they are not, feel free to ignore this mail.

> diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
> index 8a93ace9fa..475d6dbd73 100644
> --- a/src/bin/pg_dump/dumputils.c
> +++ b/src/bin/pg_dump/dumputils.c
> @@ -95,6 +95,8 @@ buildACLCommands(const char *name, const char *subname, 
> const char *nspname,
>   {
> 		if (!parsePGArray(racls, &raclitems, &nraclitems))
>   {
> + if (aclitems)
> + free(aclitems);
>   if (raclitems)
>   free(raclitems);
>   return false;

Yeah, that could be fixed.

> diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
> index d583154fba..731a08c15c 100644
> --- a/src/bin/pg_dump/pg_dump.c
> +++ b/src/bin/pg_dump/pg_dump.c
> @@ -8284,7 +8284,6 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int 
> numTables)
>   res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
>  
>   numDefaults = PQntuples(res);
> - attrdefs = (AttrDefInfo *) pg_malloc(numDefaults * 
> sizeof(AttrDefInfo));
>  
>   for (j = 0; j < numDefaults; j++)
>   {
> @@ -8304,6 +8303,8 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int 
> numTables)
>   if (tbinfo->attisdropped[adnum - 1])
>   continue;
>  
> + attrdefs = (AttrDefInfo *) 
> pg_malloc(numDefaults * sizeof(AttrDefInfo));
> +

This change doesn't seem to make any sense to me..?  If anything, seems
like we'd end up overallocating memory *after* this change, where we
don't today (though an analyzer tool might complain because we don't
free the memory from it and instead copy the pointer from each of these
items into the tbinfo structure).

Moving the allocation into the loop would also add unnecessary malloc
traffic, so I don't think we should add this.

> @@ -15951,6 +15952,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
> 
> tbinfo->attfdwoptions[j]);
>   }
>   }
> +
> + free(ftoptions);
> + free(srvname);
>   }

Yes, those could be free'd too.

I'll see about making those changes.

Thanks!

Stephen


signature.asc
Description: PGP signature


minor leaks in pg_dump (PG tarball 10.6)

2018-12-05 Thread Pavel Raiskup
Among other reports (IMO clearly non-issues), I'm sending a patch which
fixes/points to a few resource leaks detected by Coverity that might be
worth fixing.  If they are not, feel free to ignore this mail.

Pavel

diff --git a/src/bin/pg_dump/dumputils.c b/src/bin/pg_dump/dumputils.c
index 8a93ace9fa..475d6dbd73 100644
--- a/src/bin/pg_dump/dumputils.c
+++ b/src/bin/pg_dump/dumputils.c
@@ -95,6 +95,8 @@ buildACLCommands(const char *name, const char *subname, const 
char *nspname,
{
if (!parsePGArray(racls, &raclitems, &nraclitems))
{
+   if (aclitems)
+   free(aclitems);
if (raclitems)
free(raclitems);
return false;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d583154fba..731a08c15c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -8284,7 +8284,6 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int 
numTables)
res = ExecuteSqlQuery(fout, q->data, PGRES_TUPLES_OK);
 
numDefaults = PQntuples(res);
-   attrdefs = (AttrDefInfo *) pg_malloc(numDefaults * 
sizeof(AttrDefInfo));
 
for (j = 0; j < numDefaults; j++)
{
@@ -8304,6 +8303,8 @@ getTableAttrs(Archive *fout, TableInfo *tblinfo, int 
numTables)
if (tbinfo->attisdropped[adnum - 1])
continue;
 
+   attrdefs = (AttrDefInfo *) 
pg_malloc(numDefaults * sizeof(AttrDefInfo));
+
attrdefs[j].dobj.objType = DO_ATTRDEF;
attrdefs[j].dobj.catId.tableoid = 
atooid(PQgetvalue(res, j, 0));
attrdefs[j].dobj.catId.oid = 
atooid(PQgetvalue(res, j, 1));
@@ -15951,6 +15952,9 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
  
tbinfo->attfdwoptions[j]);
}
}
+
+   free(ftoptions);
+   free(srvname);
}
 
/*






Re: don't mark indexes invalid unnecessarily

2018-12-05 Thread Alvaro Herrera
On 2018-Dec-05, Amit Langote wrote:

> Hi,
> 
> On 2018/12/04 7:50, Alvaro Herrera wrote:
> > While working on FKs pointing to partitioned tables, I noticed that in
> > PG11 we fail to produce a working dump in the case of a partitioned
> > table that doesn't have partitions.
> > 
> > The attached patch fixes that.
> 
> The fix makes sense.  I see that the fix is similar in spirit to:
> 
> commit 9139aa19423b736470f669e566f8ef6a7f19b801

Ooh, right, it's pretty much the same.  I wasn't aware of this commit,
thanks for pointing it out.

> > I patched it so that it continues
> > to work (and now it tests the correct thing).  To be precise, the test
> > was doing this:
> > 
> >  create table parted_conflict (a int, b text) partition by range (a);
> >  create table parted_conflict_1 partition of parted_conflict for values 
> > from (0) to (1000) partition by range (a);
> >  create unique index on only parted_conflict_1 (a);
> >  create unique index on only parted_conflict (a);
> >  alter index parted_conflict_a_idx attach partition parted_conflict_1_a_idx;
> >  create table parted_conflict_1_1 partition of parted_conflict_1 for values 
> > from (0) to (500);
> > 
> > with the expectation that the partition would not have a proper index on
> > which to run an INSERT INTO parted_conflict_1 ON CONFLICT (a), expecting
> > that partition parted_conflict_1_1 would not have the index.  But in
> > reality parted_conflict_1_1 does have the index (and it only fails
> > because the index in its parent is marked invalid).  So the patch moves
> > the CREATE TABLE parted_conflict_1_1 to before the indexes creation, so
> > that the partition really does not have the index, and then it gets the
> > expected error.
> 
> ... isn't the test really trying to show that invalid unique index on
> table mentioned in the insert command cannot be used to proceed with the
> on conflict clause, and so isn't broken per se?  But maybe I misunderstood
> you.

You're right, I misinterpreted one bit: I was thinking that when we
create the first partition parted_conflict_1_1, then the index
automatically created in it causes the index in parted_conflict_1 to
become valid.  But that's not so: the index remains invalid.  This seems
a bit broken in itself, but it's not really important because that's
precisely what's getting fixed by my patch, namely that the index
created ONLY is valid if there are no partitions.  So if we want the
index to appear invalid (to continue to reproduce the scenario), we need
to create the partition first.
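
Concretely, the reordered test looks roughly like this, so that
parted_conflict_1_1 really does lack the index:

create table parted_conflict (a int, b text) partition by range (a);
create table parted_conflict_1 partition of parted_conflict
  for values from (0) to (1000) partition by range (a);
create table parted_conflict_1_1 partition of parted_conflict_1
  for values from (0) to (500);
create unique index on only parted_conflict_1 (a);
create unique index on only parted_conflict (a);
alter index parted_conflict_a_idx attach partition parted_conflict_1_a_idx;
-- INSERT INTO parted_conflict_1 ... ON CONFLICT (a) now raises the expected error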

Thanks for reviewing.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Connection slots reserved for replication

2018-12-05 Thread Oleksii Kliukin


> On 30. Nov 2018, at 13:58, Alexander Kukushkin  wrote:
> 
> attaching the new version of the patch.
> Now it simply reserves max_wal_senders slots in the ProcGlobal, what
> guarantees that only walsender process could use them.

With this patch It looks like InitProcess will trigger the generic error about 
'too many clients' before the more specific error message in InitWalSenderSlot 
when exceeding the number of max_wal_senders.

Does excluding WAL senders from the max_connections limit and including 
max_wal_senders in MaxBackends also imply that we need to add max_wal_senders 
to the list at xlog.c: CheckRequiredParameterValues, requiring its value on the 
replica to be not lower than the one on the primary? 

Cheers,
Oleksii


Re: NOTIFY and pg_notify performance when deduplicating notifications

2018-12-05 Thread Julien Demoor
Is there a particular format that is needed for the benchmark? Here's a 
quick benchmark below.


Unpatched, generating N notifications takes t milliseconds:
N= 1, t= 348
N= 2, t=1419
N= 3, t=3253
N= 4, t=6170

Patched, with the 'never' collapse mode, on another (much faster) machine:
N= 1, t=   6
N= 2, t=  32
N= 3, t=  11
N= 4, t=  14
N=10, t=  37
N=20, t=  86

The benchmark shows that the time to send notifications grows 
quadratically with the number of notifications per transaction without 
the patch while it grows linearly with the patch applied and 
notification collapsing disabled.



The actual psql output is below.

-- Unpatched

jdemoor=# listen my_channel;
LISTEN
Time: 0.100 ms
jdemoor=#
jdemoor=#
jdemoor=# begin;
BEGIN
Time: 0.034 ms
jdemoor=# select count(*) from (select pg_notify('my_channel', g::text) 
from generate_series(1, 1) g) x;

 count
---
 1
(1 row)

Time: 348.214 ms
jdemoor=# rollback;
ROLLBACK
Time: 0.054 ms
jdemoor=#
jdemoor=# begin;
BEGIN
Time: 0.050 ms
jdemoor=# select count(*) from (select pg_notify('my_channel', g::text) 
from generate_series(1, 2) g) x;

 count
---
 2
(1 row)

Time: 1419.723 ms (00:01.420)
jdemoor=# rollback;
ROLLBACK
Time: 0.062 ms
jdemoor=#
jdemoor=# begin;
BEGIN
Time: 0.020 ms
jdemoor=# select count(*) from (select pg_notify('my_channel', g::text) 
from generate_series(1, 3) g) x;

 count
---
 3
(1 row)

Time: 3253.845 ms (00:03.254)
jdemoor=# rollback;
ROLLBACK
Time: 0.064 ms
jdemoor=#
jdemoor=# begin;
BEGIN
Time: 0.020 ms
jdemoor=# select count(*) from (select pg_notify('my_channel', g::text) 
from generate_series(1, 4) g) x;

 count
---
 4
(1 row)

Time: 6170.646 ms (00:06.171)
jdemoor=# rollback;
ROLLBACK
Time: 0.063 ms
jdemoor=#



-- Patched

postgres=# listen my_channel;
LISTEN
Time: 0.164 ms
postgres=#
postgres=#
postgres=# begin;
BEGIN
Time: 0.099 ms
postgres=# select count(*) from (select pg_notify('my_channel', g::text, 
'never') from generate_series(1, 1) g) x;

 count
---
 1
(1 row)

Time: 6.092 ms
postgres=# rollback;
ROLLBACK
Time: 0.112 ms
postgres=#
postgres=# begin;
BEGIN
Time: 0.032 ms
postgres=# select count(*) from (select pg_notify('my_channel', g::text, 
'never') from generate_series(1, 2) g) x;

 count
---
 2
(1 row)

Time: 7.378 ms
postgres=# rollback;
ROLLBACK
Time: 0.258 ms
postgres=#
postgres=# begin;
BEGIN
Time: 0.070 ms
postgres=# select count(*) from (select pg_notify('my_channel', g::text, 
'never') from generate_series(1, 3) g) x;

 count
---
 3
(1 row)

Time: 11.782 ms
postgres=# rollback;
ROLLBACK
Time: 0.256 ms
postgres=#
postgres=# begin;
BEGIN
Time: 0.073 ms
postgres=# select count(*) from (select pg_notify('my_channel', g::text, 
'never') from generate_series(1, 4) g) x;

 count
---
 4
(1 row)

Time: 14.269 ms
postgres=# rollback;
ROLLBACK
Time: 0.144 ms
postgres=# begin;
BEGIN
Time: 0.204 ms
postgres=# select count(*) from (select pg_notify('my_channel', g::text, 
'never') from generate_series(1, 10) g) x;

 count

 10
(1 row)

Time: 37.199 ms
postgres=# rollback;
ROLLBACK
Time: 0.864 ms
postgres=#
postgres=#
postgres=# begin;
BEGIN
Time: 0.126 ms
postgres=# select count(*) from (select pg_notify('my_channel', g::text, 
'never') from generate_series(1, 20) g) x;

 count

 20
(1 row)

Time: 86.477 ms
postgres=# rollback;
ROLLBACK
Time: 1.292 ms




On 01/12/2018 02:35, Dmitry Dolgov wrote:

On Mon, Nov 19, 2018 at 8:30 AM Julien Demoor  wrote:

Thank you for the review. I've addressed all your points in the attached
patch. The patch was made against release 11.1.


I've noticed that cfbot complains about this patch [1], since:

Duplicate OIDs detected:
3423
found 1 duplicate OID(s) in catalog data

At the same time it's quite a minor problem, so I'm moving it to the next CF
as "Needs review".

Also, since it's a performance-related patch, and the latest tests I see in
the history of this topic were posted around the time of the initial thread
(which was quite some time ago), could we expect to see some new benchmarks
with this patch and without? Probably the positive difference would be
obvious, but still.

[1]: https://travis-ci.org/postgresql-cfbot/postgresql/builds/461617098






Re: proposal: plpgsql pragma statement

2018-12-05 Thread Pavel Stehule
On Wed, 5 Dec 2018 at 15:03, Pavel Luzanov wrote:

>
> But maybe your extension could read the PERFORM statement preceding it and
> treat it as an annotation hint for the following statement.
>
>
> In this case, comment line in some format will be better than real PERFORM
> statement. Like this:
>
> /*+PRAGMA cmdtype CREATE; */
> EXECUTE format('CREATE TABLE xxx ...
>

I thought about it, but it is significantly harder to implement. Now, the
content of comments is just ignored. If we push PRAGMA into comments, then we
have to do some work with the comments.

PRAGMA as a statement is much easier to implement. But using PRAGMA inside
comments is a very good alternative.

Regards

Pavel



> -
> Pavel Luzanov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>


Re: proposal: plpgsql pragma statement

2018-12-05 Thread Pavel Stehule
On Wed, 5 Dec 2018 at 14:42, Alexey Bashtanov wrote:

>
>
>> You can use PERFORM as a workaround:
>>
>> PERFORM 'PRAGMA', 'cmdtype', 'CREATE';
>>
>> There's some overhead when executing, but probably not too much.
>>
>
> Thank you for tip, but I have not any idea, how it can work?
>
>
> Well, I thought you were for a comment-like thing that remains there when
> compiled and can act as a hint for your static analysis extension.
> Surely this PERFORM won't impose any hints on the following statement
> itself.
> But maybe your extension could read the PERFORM statement preceding it and
> treat it as an annotation hint for the following statement.
>

PERFORM is +/- SELECT, so if I use PERFORM, I have to modify this statement.



> Best,
>   Alex
>


Re: proposal: plpgsql pragma statement

2018-12-05 Thread Pavel Luzanov


But maybe your extension could read the PERFORM statement preceding it 
and treat it as an annotation hint for the following statement.


In this case, comment line in some format will be better than real 
PERFORM statement. Like this:


/*+PRAGMA cmdtype CREATE; */
EXECUTE format('CREATE TABLE xxx ...

-
Pavel Luzanov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: proposal: plpgsql pragma statement

2018-12-05 Thread Alexey Bashtanov




You can use PERFORM as a workaround:

PERFORM 'PRAGMA', 'cmdtype', 'CREATE';

There's some overhead when executing, but probably not too much.


Thank you for tip, but I have not any idea, how it can work?



Well, I thought you were for a comment-like thing that remains there 
when compiled and can act as a hint for your static analysis extension.
Surely this PERFORM won't impose any hints on the following statement 
itself.
But maybe your extension could read the PERFORM statement preceding it 
and treat it as an annotation hint for the following statement.


Best,
  Alex


Re: proposal: plpgsql pragma statement

2018-12-05 Thread Pavel Stehule
Hi

On Wed, 5 Dec 2018 at 12:42, Alexey Bashtanov wrote:

> Hello Pavel,
>
> >
> > The PRAGMA statement does nothing in runtime. It works only in compile
> > time, and add a pair of key, value to next non pragma statement. This
> > information can be used by some plpgsql extensions.
> >
> > What do you think about this proposal?
> >
>
> You can use PERFORM as a workaround:
>
> PERFORM 'PRAGMA', 'cmdtype', 'CREATE';
>
> There's some overhead when executing, but probably not too much.
>

Thank you for the tip, but I have no idea how it would work?

Regards

Pavel


>
> Best regards,
>Alexey
>


Re: proposal: plpgsql pragma statement

2018-12-05 Thread Alexey Bashtanov

Hello Pavel,



The PRAGMA statement does nothing at runtime. It works only at compile
time, and adds a pair of key, value to the next non-pragma statement. This
information can be used by some plpgsql extensions.


What do you think about this proposal?



You can use PERFORM as a workaround:

PERFORM 'PRAGMA', 'cmdtype', 'CREATE';

There's some overhead when executing, but probably not too much.

Best regards,
  Alexey



Re: [PROPOSAL]a new data type 'bytea' for ECPG

2018-12-05 Thread Michael Meskes
Matsumura-san,

> Sorry to bother you, but I hope for any comment of yours.

It is no bother.

I'm fine with the patch if it does not work against the standard.

I do think, though, we should change the debug output for
ecpg_free_params(). The way it is now it prints binary values which we
also have in our test suite. I'm afraid this will come back to haunt
us.

> > The patch does not support ECPG.bytea in sqltype of "struct
> > sqlvar_struct"
> > because of compatibility.

Sorry I do not really understand what you mean. Could you please
explain?

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: don't mark indexes invalid unnecessarily

2018-12-05 Thread Amit Langote
Hi,

On 2018/12/04 7:50, Alvaro Herrera wrote:
> While working on FKs pointing to partitioned tables, I noticed that in
> PG11 we fail to produce a working dump in the case of a partitioned
> table that doesn't have partitions.
> 
> The attached patch fixes that.

The fix makes sense.  I see that the fix is similar in spirit to:

commit 9139aa19423b736470f669e566f8ef6a7f19b801
Author: Stephen Frost 
Date:   Tue Apr 25 16:57:43 2017 -0400

  Allow ALTER TABLE ONLY on partitioned tables

  There is no need to forbid ALTER TABLE ONLY on partitioned tables,
  when no partitions exist yet.  This can be handy for users who are
  building up their partitioned table independently and will create actual
  partitions later.

  In addition, this is how pg_dump likes to operate in certain instances.

> In doing so, it breaks a test ... and
> analyzing that, it turns out that the test was broken, it wasn't testing
> what it was supposed to be testing.

Hadn't considered it this closely at the time, but...

> I patched it so that it continues
> to work (and now it tests the correct thing).  To be precise, the test
> was doing this:
> 
>  create table parted_conflict (a int, b text) partition by range (a);
>  create table parted_conflict_1 partition of parted_conflict for values from 
> (0) to (1000) partition by range (a);
>  create unique index on only parted_conflict_1 (a);
>  create unique index on only parted_conflict (a);
>  alter index parted_conflict_a_idx attach partition parted_conflict_1_a_idx;
>  create table parted_conflict_1_1 partition of parted_conflict_1 for values 
> from (0) to (500);
> 
> with the expectation that the partition would not have a proper index on
> which to run an INSERT INTO parted_conflict_1 ON CONFLICT (a), expecting
> that partition parted_conflict_1_1 would not have the index.  But in
> reality parted_conflict_1_1 does have the index (and it only fails
> because the index in its parent is marked invalid).  So the patch moves
> the CREATE TABLE parted_conflict_1_1 to before the indexes creation, so
> that the partition really does not have the index, and then it gets the
> expected error.

... isn't the test really trying to show that invalid unique index on
table mentioned in the insert command cannot be used to proceed with the
on conflict clause, and so isn't broken per se?  But maybe I misunderstood
you.

Thanks,
Amit